linux-mm.kvack.org archive mirror
* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
       [not found] <58cb370e041027074676750027@mail.gmail.com>
@ 2004-10-27 15:14 ` Jeff Garzik
  2004-10-27 15:52   ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2004-10-27 15:14 UTC (permalink / raw)
  To: Linux Kernel, linux-mm
  Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
	Jens Axboe

Bartlomiej Zolnierkiewicz wrote:
> We have struct page of the first page and an offset.
> We need to obtain the struct page of the current page and map it.


Opening this question to a wider audience.

struct scatterlist gives us struct page*, and an offset+length pair. 
The struct page* is the _starting_ page of a potentially multi-page run 
of data.

The question:  how does one get struct page* for the second, and 
successive pages in a known-contiguous multi-page run, if one only knows 
the first page?

	Jeff


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href="mailto:aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 15:14 ` news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1) Jeff Garzik
@ 2004-10-27 15:52   ` Martin J. Bligh
  2004-10-27 15:59     ` Jeff Garzik
  2004-10-27 16:01     ` Martin J. Bligh
  0 siblings, 2 replies; 13+ messages in thread
From: Martin J. Bligh @ 2004-10-27 15:52 UTC (permalink / raw)
  To: Jeff Garzik, Linux Kernel, linux-mm
  Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
	Jens Axboe

> Bartlomiej Zolnierkiewicz wrote:
>> We have struct page of the first page and an offset.
>> We need to obtain the struct page of the current page and map it.
> 
> 
> Opening this question to a wider audience.
> 
> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
> 
> The question:  how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?

If it's a higher order allocation, just page+1 should be safe. If it just
happens to be contig, it might cross a discontig boundary, and not obey
that rule. Very unlikely, but possible.

M.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 15:52   ` Martin J. Bligh
@ 2004-10-27 15:59     ` Jeff Garzik
  2004-10-27 17:36       ` Martin J. Bligh
  2004-10-27 16:01     ` Martin J. Bligh
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2004-10-27 15:59 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
	William Lee Irwin III, Jens Axboe, Andrew Morton

Martin J. Bligh wrote:
>>Bartlomiej Zolnierkiewicz wrote:
>>
>>>We have struct page of the first page and an offset.
>>>We need to obtain the struct page of the current page and map it.
>>
>>
>>Opening this question to a wider audience.
>>
>>struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>>
>>The question:  how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
> 
> 
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.


Unfortunately, it's not necessarily a higher-order allocation.

The block layer just tells us "it's a contiguous run of memory", which 
implies nothing really about the allocation size.

Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x 
too!), that won't break in the face of NUMA/etc.

Alternatively (or additionally), we may need to make sure the block 
layer doesn't merge across zones or NUMA boundaries or whatnot.

	Jeff



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 15:52   ` Martin J. Bligh
  2004-10-27 15:59     ` Jeff Garzik
@ 2004-10-27 16:01     ` Martin J. Bligh
  2004-10-27 16:35       ` [PATCH] " Jeff Garzik
  2004-10-27 18:08       ` Christoph Hellwig
  1 sibling, 2 replies; 13+ messages in thread
From: Martin J. Bligh @ 2004-10-27 16:01 UTC (permalink / raw)
  To: Jeff Garzik, Linux Kernel, linux-mm
  Cc: Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
	Jens Axboe

--"Martin J. Bligh" <mbligh@aracnet.com> wrote (on Wednesday, October 27, 2004 08:52:39 -0700):

>> Bartlomiej Zolnierkiewicz wrote:
>>> We have struct page of the first page and an offset.
>>> We need to obtain the struct page of the current page and map it.
>> 
>> 
>> Opening this question to a wider audience.
>> 
>> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>> 
>> The question:  how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
> 
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.

To repeat what I said in IRC ... ;-)

Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
We should be aligned on a MAX_ORDER boundary, I think.

However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.

M.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 16:01     ` Martin J. Bligh
@ 2004-10-27 16:35       ` Jeff Garzik
  2004-10-27 21:29         ` Andrew Morton
  2004-10-27 18:08       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2004-10-27 16:35 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton
  Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
	William Lee Irwin III, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

Martin J. Bligh wrote:
> To repeat what I said in IRC ... ;-)
> 
> Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
> We should be aligned on a MAX_ORDER boundary, I think.
> 
> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.


Is this patch acceptable to everyone?  Andrew?

It uses the publicly-exported pfn_to_page/page_to_pfn abstraction, which 
seems to be the only way to accomplish what we want to do in IDE/libata.

	Jeff



[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 401 bytes --]

===== include/linux/mm.h 1.193 vs edited =====
--- 1.193/include/linux/mm.h	2004-10-20 04:37:06 -04:00
+++ edited/include/linux/mm.h	2004-10-27 12:33:28 -04:00
@@ -41,6 +41,8 @@
 #define MM_VM_SIZE(mm)	TASK_SIZE
 #endif
 
+#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 15:59     ` Jeff Garzik
@ 2004-10-27 17:36       ` Martin J. Bligh
  0 siblings, 0 replies; 13+ messages in thread
From: Martin J. Bligh @ 2004-10-27 17:36 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz, Randy.Dunlap,
	William Lee Irwin III, Jens Axboe, Andrew Morton

> Unfortunately, it's not.
> 
> The block layer just tells us "it's a contiguous run of memory", which implies nothing really about the allocation size.
> 
> Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x too!), that won't break in the face of NUMA/etc.
> 
> Alternatively (or additionally), we may need to make sure the block layer doesn't merge across zones or NUMA boundaries or whatnot.


The latter would be rather more efficient. I don't know how often you 
end up doing each operation, though ... the page+1 vs. the attempted merge.
Depends on the ratio, I guess.

M.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 16:01     ` Martin J. Bligh
  2004-10-27 16:35       ` [PATCH] " Jeff Garzik
@ 2004-10-27 18:08       ` Christoph Hellwig
  2004-10-27 18:33         ` news about IDE PIO HIGHMEM bug Jeff Garzik
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2004-10-27 18:08 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Jeff Garzik, Linux Kernel, linux-mm, Bartlomiej Zolnierkiewicz,
	Randy.Dunlap, William Lee Irwin III, Jens Axboe

> To repeat what I said in IRC ... ;-)
> 
> Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
> We should be aligned on a MAX_ORDER boundary, I think.
> 
> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.

I think this is the wrong level of interface to expose.  Just add two helpers,
kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
even if it's bigger than a page.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug
  2004-10-27 18:08       ` Christoph Hellwig
@ 2004-10-27 18:33         ` Jeff Garzik
  2004-10-27 18:48           ` William Lee Irwin III
  2004-10-28  0:18           ` William Lee Irwin III
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Garzik @ 2004-10-27 18:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin J. Bligh, Linux Kernel, linux-mm,
	Bartlomiej Zolnierkiewicz, Randy.Dunlap, William Lee Irwin III,
	Jens Axboe

Christoph Hellwig wrote:
>>To repeat what I said in IRC ... ;-)
>>
>>Actually, you could check this with the pfns being the same when >> MAX_ORDER-1.
>>We should be aligned on a MAX_ORDER boundary, I think.
>>
>>However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
> 
> 
> I think this is the wrong level of interface to expose.  Just add two helpers,
> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
> even if it's bigger than a page.

Why bother mapping anything larger than a page, when none of the users 
need it?

	Jeff



P.S. In your scheme you would need four helpers; you forgot kmap_sg() 
and kunmap_sg().

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug
  2004-10-27 18:33         ` news about IDE PIO HIGHMEM bug Jeff Garzik
@ 2004-10-27 18:48           ` William Lee Irwin III
  2004-10-28  0:18           ` William Lee Irwin III
  1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2004-10-27 18:48 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Christoph Hellwig, Linux Kernel, linux-mm,
	Bartlomiej Zolnierkiewicz, Randy.Dunlap, Jens Axboe,
	James Bottomley

Christoph Hellwig wrote:
>> I think this is the wrong level of interface to expose.  Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.

On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users 
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg() 
> and kunmap_sg().

This is all a non-issue. The page structure just represents little more
than a physical address to the block layer in the context of merging,
so the pfn_to_page(page_to_pfn(...) + ...) bits calculate this properly.
There is just nothing interesting going on here. Generate the page
structure for the piece of the segment, kmap_atomic() it, and it's done.


-- wli

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 16:35       ` [PATCH] " Jeff Garzik
@ 2004-10-27 21:29         ` Andrew Morton
  2004-10-27 21:31           ` Jeff Garzik
  2004-10-27 21:34           ` William Lee Irwin III
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2004-10-27 21:29 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, wli, axboe

Jeff Garzik <jgarzik@pobox.com> wrote:
>
> > However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
> 
> 
> Is this patch acceptable to everyone?  Andrew?

spose so.  The scatterlist API is being a bit silly there.

It might be worthwhile doing:

#ifdef CONFIG_DISCONTIGMEM
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
#else
#define nth_page(page,n) ((page)+(n))
#endif


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 21:29         ` Andrew Morton
@ 2004-10-27 21:31           ` Jeff Garzik
  2004-10-27 21:34           ` William Lee Irwin III
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2004-10-27 21:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, wli, axboe

On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so.  The scatterlist API is being a bit silly there.

Well, it depends on your perspective :)

Each scatterlist entry is supposed to map to a physical segment to be
passed to h/w.  Hardware S/G tables just want to see an addr/len pair,
and don't care about the machine page size.  scatterlist follows a
similar model.

dma_map_sg() and other helpers create a favorable situation, where >90%
of the drivers don't have to care about the VM page-size details.
Unfortunately, those drivers that need to do their own data transfer
(like ATA's PIO, instead of DMA) need direct access to each member of an
s/g list.

	Jeff




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Re: news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1)
  2004-10-27 21:29         ` Andrew Morton
  2004-10-27 21:31           ` Jeff Garzik
@ 2004-10-27 21:34           ` William Lee Irwin III
  1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2004-10-27 21:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, mbligh, linux-kernel, linux-mm, bzolnier, rddunlap, axboe

Jeff Garzik <jgarzik@pobox.com> wrote:
>> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If
>> rather slower. Is this patch acceptable to everyone?  Andrew?

On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so.  The scatterlist API is being a bit silly there.
> It might be worthwhile doing:
> #ifdef CONFIG_DISCONTIGMEM
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + n)
> #else
> #define nth_page(page,n) ((page)+(n))
> #endif

This is actually not quite good enough. Zones are not guaranteed
to have adjacent mem_map[]'s even with CONFIG_DISCONTIGMEM=n. It may
make sense to prevent merging from spanning zones, but frankly the
overhead of pfn_to_page()/page_to_pfn() is negligible in comparison
to the data movement and (when applicable) virtual windowing. CPU
overhead is a greater concern in the merging code, particularly for
devices that don't require manual data movement.


-- wli

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: news about IDE PIO HIGHMEM bug
  2004-10-27 18:33         ` news about IDE PIO HIGHMEM bug Jeff Garzik
  2004-10-27 18:48           ` William Lee Irwin III
@ 2004-10-28  0:18           ` William Lee Irwin III
  1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2004-10-28  0:18 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Christoph Hellwig, Martin J. Bligh, Linux Kernel, linux-mm,
	Bartlomiej Zolnierkiewicz, Randy.Dunlap, Jens Axboe

Christoph Hellwig wrote:
>> I think this is the wrong level of interface to expose.  Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.

On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users 
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg() 
> and kunmap_sg().

The scheme hch suggested is highly invasive in the area of architecture-
specific fixmap layout, and introduces a dependency of fixmap layout on
maximum segment size, which may make current normal maximum segment
sizes use prohibitive amounts of vmalloc space on 32-bit architectures.

So I'd drop that suggestion, though it's not particularly farfetched.


-- wli

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-10-28  0:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <58cb370e041027074676750027@mail.gmail.com>
2004-10-27 15:14 ` news about IDE PIO HIGHMEM bug (was: Re: 2.6.9-mm1) Jeff Garzik
2004-10-27 15:52   ` Martin J. Bligh
2004-10-27 15:59     ` Jeff Garzik
2004-10-27 17:36       ` Martin J. Bligh
2004-10-27 16:01     ` Martin J. Bligh
2004-10-27 16:35       ` [PATCH] " Jeff Garzik
2004-10-27 21:29         ` Andrew Morton
2004-10-27 21:31           ` Jeff Garzik
2004-10-27 21:34           ` William Lee Irwin III
2004-10-27 18:08       ` Christoph Hellwig
2004-10-27 18:33         ` news about IDE PIO HIGHMEM bug Jeff Garzik
2004-10-27 18:48           ` William Lee Irwin III
2004-10-28  0:18           ` William Lee Irwin III
