* allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Christoph Hellwig @ 2005-12-08 13:42 UTC (permalink / raw)
To: FUJITA Tomonori, michaelc, hch, linux-fsdevel, ext2-devel,
open-iscsi, linux-mm, linux-kernel
On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
> What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
> This will allocate a whole page for each block (which may be considerable
> overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).
Yes. How often do we trigger this codepath?
The problem we're trying to solve here is how to implement network block
devices (nbd, iscsi) efficiently. The zero-copy codepath in the networking
layer needs to grab additional references to pages, so to use sendpage
we need a refcountable page. Pages used by the slab allocator are not
normally refcounted, so trying to do get_page/put_page on them will break.
One way to work around that would be to detect kmalloced pages and use
a slowpath for them. The major issue with that is that we don't have a
reliable way to detect whether a given struct page comes from the slab
allocator. The minor problem is that even with such an indicator it means
having a separate and lightly tested slowpath for this rare case.
All in all I think we should document that the block layer only accepts
properly refcounted pages, which is everything but kmalloced pages (even
vmalloc is totally fine).
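To make the workaround described above concrete, here is a minimal userspace
sketch of the fast-path/slow-path split. It is only a model: struct toy_page,
struct toy_tx, toy_send() and the from_slab flag are all hypothetical names
invented for illustration, and nothing here uses the real sendpage interface.
It assumes a reliable "is this a slab page?" test exists, which is exactly
the open question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page: a refcount, an origin flag, data. */
struct toy_page {
    int refcount;       /* models the page reference count */
    bool from_slab;     /* models a reliable "is this a slab page?" test */
    char data[64];
};

/* What the toy transport holds until the data has gone out on the wire. */
struct toy_tx {
    struct toy_page *held;  /* fast path: pinned page, no copy */
    char copy[64];          /* slow path: private copy of the payload */
    bool copied;
};

/*
 * Queue a page for transmission. Refcountable pages are pinned and sent
 * in place; slab-backed pages are copied instead of being pinned.
 */
static void toy_send(struct toy_tx *tx, struct toy_page *page)
{
    if (page->from_slab) {
        memcpy(tx->copy, page->data, sizeof(tx->copy));
        tx->held = NULL;
        tx->copied = true;
    } else {
        page->refcount++;       /* models get_page() */
        tx->held = page;
        tx->copied = false;
    }
}

/* Transmission finished: drop the reference taken on the fast path. */
static void toy_tx_done(struct toy_tx *tx)
{
    if (tx->held) {
        tx->held->refcount--;   /* models put_page() */
        tx->held = NULL;
    }
}
```

The sketch also shows the "minor problem": the copy branch is a second code
path that only runs for the rare slab-backed buffer, so it would see far less
testing than the pinned path.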
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Pekka Enberg @ 2005-12-08 13:58 UTC (permalink / raw)
To: Christoph Hellwig, FUJITA Tomonori, michaelc, linux-fsdevel,
ext2-devel, open-iscsi, linux-mm, linux-kernel
Hi Christoph,
On 12/8/05, Christoph Hellwig <hch@infradead.org> wrote:
> One way to work around that would be to detect kmalloced pages and use
> a slowpath for that. The major issues with that is that we don't have a
> reliable way to detect if a given struct page comes from the slab allocator
> or not.
Why doesn't PageSlab work for you?
Pekka
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Mike Christie @ 2005-12-08 18:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: FUJITA Tomonori, linux-fsdevel, ext2-devel, open-iscsi, linux-mm,
linux-kernel
Christoph Hellwig wrote:
> On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
>
>>What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
>>This will allocate a whole page for each block (which may be considerable
>>overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).
>
>
> Yes. How often do we trigger this codepath?
>
> The problem we're trying to solve here is how do implement network block
> devices (nbd, iscsi) efficiently. The zero copy codepath in the networking
> layer does need to grab additional references to pages. So to use sendpage
> we need a refcountable page. pages used by the slab allocator are not
> normally refcounted so try to do get_page/pub_page on them will break.
>
> One way to work around that would be to detect kmalloced pages and use
> a slowpath for that. The major issues with that is that we don't have a
> reliable way to detect if a given struct page comes from the slab allocator
> or not. The minor problem is that even with such an indicator it means
> having a separate and lightly tested slowpath for this rare case.
>
> All in all I think we should document that the block layer only accepts
> properly refcounted pages, which is everything but kmalloced pages (even
> vmalloc is totally fine)
Is this a problem any time kmalloc is used? For example, when scsi uses
scsi_execute* for something like scanning (the report luns result is
kmalloced), would this be a problem?
If PageSlab() does work, could we have a request queue flag that
bounces those pages for all block layer drivers? Pretty slow and yucky,
but if we have to convert SCSI and maybe other parts of the block layer,
maybe it will be easiest for now.
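A rough userspace sketch of that bounce idea, assuming a reliable slab test
is available. The names (struct toy_queue, its bounce_slab flag, toy_submit())
are made up for illustration and do not correspond to real block-layer flags
or functions.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
    bool slab;                          /* models PageSlab() */
    unsigned char data[TOY_PAGE_SIZE];
};

struct toy_queue {
    bool bounce_slab;                   /* hypothetical per-queue flag */
};

/*
 * Return the page the driver should see. If the queue asks for bouncing
 * and the page is slab-backed, copy len bytes into a freshly allocated
 * non-slab page; the caller frees that page when the I/O completes.
 * Returns NULL if the bounce page cannot be allocated.
 */
static struct toy_page *toy_submit(const struct toy_queue *q,
                                   struct toy_page *page, size_t len)
{
    struct toy_page *bounce;

    if (!q->bounce_slab || !page->slab)
        return page;                    /* pass through untouched */

    bounce = calloc(1, sizeof(*bounce));
    if (!bounce)
        return NULL;
    if (len > TOY_PAGE_SIZE)
        len = TOY_PAGE_SIZE;
    memcpy(bounce->data, page->data, len);
    return bounce;                      /* bounce->slab stays false */
}
```

This only models the write direction. For a read, such as the report luns
result, the data would have to be copied back from the bounce page into the
kmalloced buffer on completion.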
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Mike Christie @ 2005-12-08 18:22 UTC (permalink / raw)
To: open-iscsi
Cc: Christoph Hellwig, FUJITA Tomonori, linux-fsdevel, ext2-devel,
linux-mm, linux-kernel
Mike Christie wrote:
>
> Christoph Hellwig wrote:
>
>> On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
>>
>>> What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
>>> This will allocate a whole page for each block (which may be
>>> considerable
>>> overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).
>>
>>
>>
>> Yes. How often do we trigger this codepath?
>>
>> The problem we're trying to solve here is how do implement network block
>> devices (nbd, iscsi) efficiently. The zero copy codepath in the
>> networking
>> layer does need to grab additional references to pages. So to use
>> sendpage
>> we need a refcountable page. pages used by the slab allocator are not
>> normally refcounted so try to do get_page/pub_page on them will break.
>>
>> One way to work around that would be to detect kmalloced pages and use
>> a slowpath for that. The major issues with that is that we don't have a
>> reliable way to detect if a given struct page comes from the slab
>> allocator
>> or not. The minor problem is that even with such an indicator it means
>> having a separate and lightly tested slowpath for this rare case.
>>
>> All in all I think we should document that the block layer only accepts
>> properly refcounted pages, which is everything but kmalloced pages (even
>> vmalloc is totally fine)
>
>
> Is it anytime kmalloc is used? For scsi when it uses scsi_execute* for
> something like scanning (report luns result is kmallocd) would this be a
> problem?
>
> If PageSlab() does work, then could we have a request queue flag that
> bounces those pages for all block layer drivers. Pretty slow and yucky
> but if we have to convert SCSI and maybe other parts of the block layer
> maybe it will be easiest for now.
>
Or is there no way to do a kmalloc(GFP_BLK) that gives us the right
type of memory?
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Pekka Enberg @ 2005-12-08 19:20 UTC (permalink / raw)
To: Mike Christie
Cc: open-iscsi, Christoph Hellwig, FUJITA Tomonori, linux-fsdevel,
ext2-devel, linux-mm, linux-kernel
Hi,
On 12/8/05, Mike Christie <michaelc@cs.wisc.edu> wrote:
> Or there is not a way to do kmalloc(GFP_BLK) that gives us the right
> type of memory is there?
The slab allocator uses page->lru for special purposes. See
page_{set|get}_{cache|slab} in mm/slab.c. They are used by kfree(),
ksize() and the slab debugging code to look up the cache and slab a void
pointer belongs to.
But, if you just need put_page and get_page, couldn't you do something
like the following?
Pekka
Index: 2.6/mm/swap.c
===================================================================
--- 2.6.orig/mm/swap.c
+++ 2.6/mm/swap.c
@@ -36,6 +36,9 @@ int page_cluster;
 void put_page(struct page *page)
 {
+	if (unlikely(PageSlab(page)))
+		return;
+
 	if (unlikely(PageCompound(page))) {
 		page = (struct page *)page_private(page);
 		if (put_page_testzero(page)) {
Index: 2.6/include/linux/mm.h
===================================================================
--- 2.6.orig/include/linux/mm.h
+++ 2.6/include/linux/mm.h
@@ -322,6 +322,9 @@ static inline int page_count(struct page
 static inline void get_page(struct page *page)
 {
+	if (unlikely(PageSlab(page)))
+		return;
+
 	if (unlikely(PageCompound(page)))
 		page = (struct page *)page_private(page);
 	atomic_inc(&page->_count);
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Andrew Morton @ 2005-12-11 0:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: fujita.tomonori, michaelc, linux-fsdevel, ext2-devel, open-iscsi,
linux-mm, linux-kernel
Christoph Hellwig <hch@infradead.org> wrote:
>
> The problem we're trying to solve here is how do implement network block
> devices (nbd, iscsi) efficiently. The zero copy codepath in the networking
> layer does need to grab additional references to pages. So to use sendpage
> we need a refcountable page. pages used by the slab allocator are not
> normally refcounted so try to do get_page/pub_page on them will break.
I don't get it. Doing get_page/put_page on a slab-allocated page should do
the right thing?
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Arjan van de Ven @ 2005-12-11 8:44 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, open-iscsi, ext2-devel, linux-fsdevel,
michaelc, fujita.tomonori, Christoph Hellwig
On Sat, 2005-12-10 at 16:47 -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > The problem we're trying to solve here is how do implement network block
> > devices (nbd, iscsi) efficiently. The zero copy codepath in the networking
> > layer does need to grab additional references to pages. So to use sendpage
> > we need a refcountable page. pages used by the slab allocator are not
> > normally refcounted so try to do get_page/pub_page on them will break.
>
> I don't get it. Doing get_page/put_page on a slab-allocated page should do
> the right thing?
But it doesn't stop the kfree from freeing the memory; zero copy needs
the content of the memory to stay around afterwards, e.g. it wants to
delay the kfree until the data is over the wire, which is an
asynchronous event relative to the actual send command in a zero-copy
situation.
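A small userspace model of that point, as a sketch only. The toy allocator
below (struct toy_slab_page, toy_kmalloc(), toy_kfree(), toy_reuse_demo()) is
invented for illustration and is far simpler than mm/slab.c; it just keeps
the page-level count and the per-object bookkeeping separate, which is the
property that matters here.

```c
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 32
#define OBJS_PER_PAGE 4

/* Toy slab page: a page-level refcount plus per-object "in use" marks. */
struct toy_slab_page {
    int page_refs;                      /* models the struct page count */
    int in_use[OBJS_PER_PAGE];          /* the slab's own object tracking */
    char mem[OBJS_PER_PAGE][OBJ_SIZE];
};

/* Hand out the first free object, like a very naive kmalloc(). */
static char *toy_kmalloc(struct toy_slab_page *s)
{
    for (int i = 0; i < OBJS_PER_PAGE; i++) {
        if (!s->in_use[i]) {
            s->in_use[i] = 1;
            return s->mem[i];
        }
    }
    return NULL;
}

/* Release an object. Note that page_refs plays no part in this. */
static void toy_kfree(struct toy_slab_page *s, char *obj)
{
    for (int i = 0; i < OBJS_PER_PAGE; i++)
        if (s->mem[i] == obj)
            s->in_use[i] = 0;
}

/* Returns 1 if data queued for sending was overwritten despite the pin. */
static int toy_reuse_demo(void)
{
    struct toy_slab_page s = { .page_refs = 1 };
    char *a = toy_kmalloc(&s);
    const char *in_flight;
    char *b;

    strcpy(a, "payload");
    s.page_refs++;              /* zero-copy sender pins the page */
    in_flight = a;              /* and remembers where the data lives */

    toy_kfree(&s, a);           /* owner frees before the send completes */
    b = toy_kmalloc(&s);        /* same slot is handed out again */
    strcpy(b, "other");

    return s.page_refs == 2 && strcmp(in_flight, "payload") != 0;
}
```

In this model the extra page reference is still held when the object slot is
reused, so the page itself stays around while the bytes the sender was
pointing at have already changed.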
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Christoph Hellwig @ 2005-12-12 17:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Christoph Hellwig, fujita.tomonori, michaelc, linux-fsdevel,
ext2-devel, open-iscsi, linux-mm, linux-kernel
On Sat, Dec 10, 2005 at 04:47:36PM -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > The problem we're trying to solve here is how do implement network block
> > devices (nbd, iscsi) efficiently. The zero copy codepath in the networking
> > layer does need to grab additional references to pages. So to use sendpage
> > we need a refcountable page. pages used by the slab allocator are not
> > normally refcounted so try to do get_page/pub_page on them will break.
>
> I don't get it. Doing get_page/put_page on a slab-allocated page should do
> the right thing?
As Arjan mentioned, what would be the right thing? Delaying the return of
the page to the page pool and disallowing reuse until the page count reaches
zero? All this seems highly impractical.
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Christoph Hellwig @ 2005-12-12 17:27 UTC (permalink / raw)
To: Pekka Enberg
Cc: Christoph Hellwig, FUJITA Tomonori, michaelc, linux-fsdevel,
ext2-devel, open-iscsi, linux-mm, linux-kernel
On Thu, Dec 08, 2005 at 03:58:46PM +0200, Pekka Enberg wrote:
> Hi Christoph,
>
> On 12/8/05, Christoph Hellwig <hch@infradead.org> wrote:
> > One way to work around that would be to detect kmalloced pages and use
> > a slowpath for that. The major issues with that is that we don't have a
> > reliable way to detect if a given struct page comes from the slab allocator
> > or not.
>
> Why doesn't PageSlab work for you?
When I last looked it was a no-op without slab debugging enabled,
but that's no longer the case in current mainline.
If the VM people agree with that usage we could at least use it to fall
back to the slow path. Even better would be to require normal pages, though.
* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
From: Andrew Morton @ 2005-12-12 20:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: fujita.tomonori, michaelc, linux-fsdevel, ext2-devel, open-iscsi,
linux-mm, linux-kernel
Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, Dec 10, 2005 at 04:47:36PM -0800, Andrew Morton wrote:
> > Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > The problem we're trying to solve here is how do implement network block
> > > devices (nbd, iscsi) efficiently. The zero copy codepath in the networking
> > > layer does need to grab additional references to pages. So to use sendpage
> > > we need a refcountable page. pages used by the slab allocator are not
> > > normally refcounted so try to do get_page/pub_page on them will break.
> >
> > I don't get it. Doing get_page/put_page on a slab-allocated page should do
> > the right thing?
>
> As Arjan mentioned, what would be the right thing? Delaying returning the
> page to the page pool and disallow reuse until page count reaches zero?
Yes, that's what'll happen. slab will put its final ref to the page, so
whoever did that intervening get_page() ends up owning the page.
> All this seems highly impractical.
Well, as Arjan points out, doing get_page() won't prevent slab from
"freeing" a part of the page and reusing it for another object of the same
type.
Thread overview: 10+ messages
[not found] <20051208180900T.fujita.tomonori@lab.ntt.co.jp>
[not found] ` <20051208101833.GM14509@schatzie.adilger.int>
2005-12-08 13:42 ` allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages Christoph Hellwig
2005-12-08 13:58 ` Pekka Enberg
2005-12-12 17:27 ` Christoph Hellwig
2005-12-08 18:18 ` Mike Christie
2005-12-08 18:22 ` Mike Christie
2005-12-08 19:20 ` Pekka Enberg
2005-12-11 0:47 ` Andrew Morton
2005-12-11 8:44 ` Arjan van de Ven
2005-12-12 17:25 ` Christoph Hellwig
2005-12-12 20:12 ` Andrew Morton