* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Jeff Garzik @ 2004-10-27 15:14 UTC
To: Linux Kernel, linux-mm
Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
Jens Axboe
Bartlomiej Zolnierkiewicz wrote:
> We have struct page of the first page and an offset.
> We need to obtain struct page of the current page and map it.
Opening this question to a wider audience.
struct scatterlist gives us struct page*, and an offset+length pair.
The struct page* is the _starting_ page of a potentially multi-page run
of data.
The question: how does one get the struct page* for the second and
successive pages in a known-contiguous multi-page run, if one only knows
the first page?
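For concreteness, the per-entry loop a PIO driver has to run looks roughly
like the sketch below (2.6-era sg fields and kmap_atomic(); the function name,
the xfer callback and the next_page_in_run() helper are made up for
illustration, and that helper is exactly the missing piece being asked about):

/* Sketch: push out one scatterlist entry a page at a time (2.6-era API). */
#include <linux/highmem.h>
#include <asm/scatterlist.h>

extern struct page *next_page_in_run(struct page *page);  /* hypothetical */

static void pio_write_sg_entry(struct scatterlist *sg,
                               void (*xfer)(void *buf, unsigned int len))
{
        struct page *page = sg->page;
        unsigned int offset = sg->offset;
        unsigned int left = sg->length;

        while (left) {
                unsigned int count = left;
                unsigned char *buf;

                if (count > PAGE_SIZE - offset)
                        count = PAGE_SIZE - offset;

                /* IRQ masking for KM_IRQ0 omitted for brevity */
                buf = kmap_atomic(page, KM_IRQ0);
                xfer(buf + offset, count);      /* feed the device's data register */
                kunmap_atomic(buf, KM_IRQ0);

                left -= count;
                offset = 0;
                page = next_page_in_run(page);  /* "page + 1"?  see the replies below */
        }
}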
Jeff
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Martin J. Bligh @ 2004-10-27 15:52 UTC
To: Jeff Garzik, Linux Kernel, linux-mm
Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
Jens Axboe
> Bartlomiej Zolnierkiewicz wrote:
>> We have struct page of the first page and an offset.
>> We need to obtain struct page of the current page and map it.
>
>
> Opening this question to a wider audience.
>
> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>
> The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
If it's a higher order allocation, just page+1 should be safe. If it just
happens to be contig, it might cross a discontig boundary, and not obey
that rule. Very unlikely, but possible.
M.
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Jeff Garzik @ 2004-10-27 15:59 UTC
To: Martin J. Bligh
Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
William Lee Irwin III, Jens Axboe, Andrew Morton
Martin J. Bligh wrote:
>>Bartlomiej Zolnierkiewicz wrote:
>>
>>>We have struct page of the first page and an offset.
>>>We need to obtain struct page of the current page and map it.
>>
>>
>>Opening this question to a wider audience.
>>
>>struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>>
>>The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
>
>
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.
Unfortunately, it's not.
The block layer just tells us "it's a contiguous run of memory", which
implies nothing really about the allocation size.
Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x
too!) that won't break in the face of NUMA/etc.
Alternatively (or additionally), we may need to make sure the block
layer doesn't merge across zones or NUMA boundaries or whatnot.
Jeff
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Martin J. Bligh @ 2004-10-27 16:01 UTC
To: Jeff Garzik, Linux Kernel, linux-mm
Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
Jens Axboe
--"Martin J. Bligh" <mbligh@aracnet.com> wrote (on Wednesday, October 27, 2004 08:52:39 -0700):
>> Bartlomiej Zolnierkiewicz wrote:
>>> We have struct page of the first page and an offset.
>>> We need to obtain struct page of the current page and map it.
>>
>>
>> Opening this question to a wider audience.
>>
>> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>>
>> The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
>
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.
To repeat what I said in IRC ... ;-)
Actually, you could check this by testing whether the pfns are the same when shifted right by MAX_ORDER-1.
We should be aligned on a MAX_ORDER boundary, I think.
However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
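Spelled out, the two variants look roughly like this (helper names are made
up for illustration):

#include <linux/mm.h>
#include <linux/mmzone.h>

/* Always safe, if slower: go through pfns, so a split mem_map doesn't matter. */
static inline struct page *next_page_slow(struct page *page)
{
        return pfn_to_page(page_to_pfn(page) + 1);
}

/* Cheaper check: pfns in the same MAX_ORDER-aligned block should share one
 * contiguous mem_map chunk, so plain "page + 1" ought to be fine within it. */
static inline int same_max_order_block(unsigned long pfn1, unsigned long pfn2)
{
        return (pfn1 >> (MAX_ORDER - 1)) == (pfn2 >> (MAX_ORDER - 1));
}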
M.
* [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Jeff Garzik @ 2004-10-27 16:35 UTC
To: Martin J. Bligh, Andrew Morton
Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
William Lee Irwin III, Jens Axboe
Martin J. Bligh wrote:
> To repeat what I said in IRC ... ;-)
>
> Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
> We should be aligned on a MAX_ORDER boundary, I think.
>
> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
Is this patch acceptable to everyone? Andrew?
It uses the publicly-exported pfn_to_page/page_to_pfn abstraction, which
seems to be the only way to accomplish what we want to do in IDE/libata.
Jeff
[-- Attachment #2: patch --]
===== include/linux/mm.h 1.193 vs edited =====
--- 1.193/include/linux/mm.h 2004-10-20 04:37:06 -04:00
+++ edited/include/linux/mm.h 2004-10-27 12:33:28 -04:00
@@ -41,6 +41,8 @@
#define MM_VM_SIZE(mm) TASK_SIZE
#endif
+#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
+
/*
* Linux kernel virtual memory manager primitives.
* The idea being to have a "virtual" mm in the same way
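For reference, with nth_page() a PIO path can locate any byte of an sg entry
roughly as follows (an illustrative sketch; the helper name is made up and
error handling is omitted). The returned page can then be kmap_atomic()'d and
the data moved one page at a time:

#include <linux/mm.h>
#include <asm/scatterlist.h>

/* Find the page and in-page offset of byte index "done" within an sg entry. */
static struct page *sg_byte_to_page(struct scatterlist *sg, unsigned int done,
                                    unsigned int *offset)
{
        unsigned int pos = sg->offset + done;
        unsigned int n = pos >> PAGE_SHIFT;     /* which page of the run */

        *offset = pos & ~PAGE_MASK;             /* offset within that page */
        return nth_page(sg->page, n);
}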
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Martin J. Bligh @ 2004-10-27 17:36 UTC
To: Jeff Garzik
Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
William Lee Irwin III, Jens Axboe, Andrew Morton
> Unfortunately, it's not.
>
> The block layer just tells us "it's a contiguous run of memory", which implies nothing really about the allocation size.
>
> Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x too!), that won't break in the face of NUMA/etc.
>
> Alternatively (or additionally), we may need to make sure the block layer doesn't merge across zones or NUMA boundaries or whatnot.
The latter would be rather more efficient. I don't know how often you
end up doing each operation though ... the page+1 vs the attempted merge.
Depends on the ratio, I guess.
M.
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Christoph Hellwig @ 2004-10-27 18:08 UTC
To: Martin J. Bligh
Cc: Jeff Garzik, Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz,
Randy.Dunlap, William Lee Irwin III, Jens Axboe
> To repeat what I said in IRC ... ;-)
>
> Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
> We should be aligned on a MAX_ORDER boundary, I think.
>
> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
I think this is the wrong level of interface to expose. Just add two helpers,
kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
even if it's bigger than a page.
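Presumably the interface in mind would look something like the prototypes
below (purely hypothetical; nothing like this was merged, and the signatures
are guesses). Note that mapping an entry larger than a page in one atomic go
would need a run of consecutive fixmap slots sized to the maximum segment:

/* Hypothetical prototypes only, sketched from the suggestion above. */
#include <linux/highmem.h>
#include <asm/scatterlist.h>

void *kmap_atomic_sg(struct scatterlist *sg, enum km_type type);
void kunmap_atomic_sg(void *vaddr, struct scatterlist *sg, enum km_type type);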
* Re: news about IDE PIO HIGHMEM bug
From: Jeff Garzik @ 2004-10-27 18:33 UTC
To: Christoph Hellwig
Cc: Martin J. Bligh, Linux Kernel, linux-mm,
Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
Jens Axboe
Christoph Hellwig wrote:
>>To repeat what I said in IRC ... ;-)
>>
>>Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
>>We should be aligned on a MAX_ORDER boundary, I think.
>>
>>However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
>
>
> I think this is the wrong level of interface to expose. Just add two helpers,
> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
> even if it's bigger than a page.
Why bother mapping anything larger than a page, when none of the users
need it?
Jeff
P.S. In your scheme you would need four helpers; you forgot kmap_sg()
and kunmap_sg().
* Re: news about IDE PIO HIGHMEM bug
From: William Lee Irwin III @ 2004-10-27 18:48 UTC
To: Jeff Garzik
Cc: Christoph Hellwig, Linux Kernel, linux-mm,
Bartlomiej Zolnierkiewicz, Randy.Dunlap, Jens Axboe,
James Bottomley
Christoph Hellwig wrote:
>> I think this is the wrong level of interface to expose. Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.
On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg()
> and kunmap_sg().
This is all a non-issue. The page structure represents little more than
a physical address to the block layer in the context of merging, so the
pfn_to_page(page_to_pfn(...) + ...) arithmetic calculates the right page.
There is nothing interesting going on here. Generate the page structure
for the piece of the segment, kmap_atomic() it, and it's done.
-- wli
* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Andrew Morton @ 2004-10-27 21:29 UTC
To: Jeff Garzik
Cc: mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, wli, axboe
Jeff Garzik <jgarzik@pobox.com> wrote:
>
> > However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
>
>
> Is this patch acceptable to everyone? Andrew?
spose so. The scatterlist API is being a bit silly there.
It might be worthwhile doing:
#ifdef CONFIG_DISCONTIGMEM
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
#else
#define nth_page(page,n) ((page)+(n))
#endif
* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: Jeff Garzik @ 2004-10-27 21:31 UTC
To: Andrew Morton
Cc: mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, wli, axboe
On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so. The scatterlist API is being a bit silly there.
Well, it depends on your perspective :)
Each scatterlist entry is supposed to map to a physical segment to be
passed to h/w. Hardware S/G tables just want to see an addr/len pair,
and don't care about machine page size. scatterlist follows a similar
model.
dma_map_sg() and other helpers create a favorable situation, where >90%
of the drivers don't have to care about the VM-size details.
Unfortunately, those drivers that need to do their own data transfer
(like ATA's PIO, instead of DMA) need direct access to each member of an
s/g list.
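For contrast, the common DMA-driven path looks roughly like the sketch below
(2.6-era PCI DMA calls; the function name and hw_tbl layout are made up) and
never needs to look inside a page:

/* Sketch: the >90% case, handing the hardware bus addr/len pairs. */
#include <linux/pci.h>
#include <asm/scatterlist.h>

static int fill_hw_sg_table(struct pci_dev *pdev, struct scatterlist *sg,
                            int nents, u64 *hw_tbl)
{
        int i, n;

        n = pci_map_sg(pdev, sg, nents, PCI_DMA_TODEVICE);
        for (i = 0; i < n; i++) {
                hw_tbl[2 * i]     = sg_dma_address(&sg[i]);     /* bus address */
                hw_tbl[2 * i + 1] = sg_dma_len(&sg[i]);         /* segment length */
        }
        return n;       /* number of h/w segments after any iommu merging */
}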
Jeff
* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
From: William Lee Irwin III @ 2004-10-27 21:34 UTC
To: Andrew Morton
Cc: Jeff Garzik, mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, axboe
Jeff Garzik <jgarzik@pobox.com> wrote:
>> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If
>> rather slower. Is this patch acceptable to everyone? Andrew?
On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so. The scatterlist API is being a bit silly there.
> It might be worthwhile doing:
> #ifdef CONFIG_DISCONTIGMEM
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
> #else
> #define nth_page(page,n) ((page)+(n))
> #endif
This is actually not quite good enough. Zones are not guaranteed
to have adjacent mem_map[]'s even with CONFIG_DISCONTIGMEM=n. It may
make sense to prevent merging from spanning zones, but frankly the
overhead of pfn_to_page()/page_to_pfn() is negligible in comparison
to the data movement and (when applicable) virtual windowing, whereas
in the merging code CPU overhead is a greater concern, particularly
for devices that don't require manual data movement.
-- wli
* Re: news about IDE PIO HIGHMEM bug
From: William Lee Irwin III @ 2004-10-28 0:18 UTC
To: Jeff Garzik
Cc: Christoph Hellwig, Martin J. Bligh, Linux Kernel, linux-mm,
Bartlomiej Zolnierkiewicz, Randy.Dunlap, Jens Axboe
Christoph Hellwig wrote:
>> I think this is the wrong level of interface to expose. Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.
On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg()
> and kunmap_sg().
The scheme hch suggested is highly invasive in the area of architecture-
specific fixmap layout, and it introduces a dependency of the fixmap layout
on the maximum segment size, which may make current normal maximum segment
sizes use prohibitive amounts of vmalloc space on 32-bit architectures.
So I'd drop that suggestion, though it's not particularly far-fetched.
-- wli