linux-mm.kvack.org archive mirror
* allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
       [not found] ` <20051208101833.GM14509@schatzie.adilger.int>
@ 2005-12-08 13:42   ` Christoph Hellwig
  2005-12-08 13:58     ` Pekka Enberg
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Christoph Hellwig @ 2005-12-08 13:42 UTC (permalink / raw)
  To: FUJITA Tomonori, michaelc, hch, linux-fsdevel, ext2-devel,
	open-iscsi, linux-mm, linux-kernel

On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
> What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
> This will allocate a whole page for each block (which may be considerable
> overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).

Yes.  How often do we trigger this codepath?

The problem we're trying to solve here is how to implement network block
devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
layer does need to grab additional references to pages.  So to use sendpage
we need a refcountable page.  Pages used by the slab allocator are not
normally refcounted, so trying to do get_page/put_page on them will break.

One way to work around that would be to detect kmalloced pages and use
a slowpath for that.  The major issue with that is that we don't have a
reliable way to detect if a given struct page comes from the slab allocator
or not.  The minor problem is that even with such an indicator it means
having a separate and lightly tested slowpath for this rare case.

All in all I think we should document that the block layer only accepts
properly refcounted pages, which is everything but kmalloced pages (even
vmalloc is totally fine).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 13:42   ` allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages Christoph Hellwig
@ 2005-12-08 13:58     ` Pekka Enberg
  2005-12-12 17:27       ` Christoph Hellwig
  2005-12-08 18:18     ` Mike Christie
  2005-12-11  0:47     ` Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: Pekka Enberg @ 2005-12-08 13:58 UTC (permalink / raw)
  To: Christoph Hellwig, FUJITA Tomonori, michaelc, linux-fsdevel,
	ext2-devel, open-iscsi, linux-mm, linux-kernel

Hi Christoph,

On 12/8/05, Christoph Hellwig <hch@infradead.org> wrote:
> One way to work around that would be to detect kmalloced pages and use
> a slowpath for that.  The major issue with that is that we don't have a
> reliable way to detect if a given struct page comes from the slab allocator
> or not.

Why doesn't PageSlab work for you?

                                                          Pekka


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 13:42   ` allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages Christoph Hellwig
  2005-12-08 13:58     ` Pekka Enberg
@ 2005-12-08 18:18     ` Mike Christie
  2005-12-08 18:22       ` Mike Christie
  2005-12-11  0:47     ` Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: Mike Christie @ 2005-12-08 18:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: FUJITA Tomonori, linux-fsdevel, ext2-devel, open-iscsi, linux-mm,
	linux-kernel

Christoph Hellwig wrote:
> On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
> 
>>What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
>>This will allocate a whole page for each block (which may be considerable
>>overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).
> 
> 
> Yes.  How often do we trigger this codepath?
> 
> The problem we're trying to solve here is how to implement network block
> devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
> layer does need to grab additional references to pages.  So to use sendpage
> we need a refcountable page.  Pages used by the slab allocator are not
> normally refcounted, so trying to do get_page/put_page on them will break.
> 
> One way to work around that would be to detect kmalloced pages and use
> a slowpath for that.  The major issue with that is that we don't have a
> reliable way to detect if a given struct page comes from the slab allocator
> or not.  The minor problem is that even with such an indicator it means
> having a separate and lightly tested slowpath for this rare case.
> 
> All in all I think we should document that the block layer only accepts
> properly refcounted pages, which is everything but kmalloced pages (even
> vmalloc is totally fine)

Is it any time kmalloc is used?  For SCSI, when it uses scsi_execute* for
something like scanning (the REPORT LUNS result is kmalloced), would this
be a problem?

If PageSlab() does work, could we have a request queue flag that bounces
those pages for all block layer drivers?  Pretty slow and yucky, but if
we have to convert SCSI and maybe other parts of the block layer, maybe
it will be the easiest option for now.
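
The bounce idea above can be sketched in userspace C.  This is only a model
under stated assumptions, not kernel code: struct model_page, struct
model_queue, the bounce_slab flag and model_prepare_page() are all
hypothetical names, and the boolean slab field merely stands in for whatever
PageSlab() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Userspace stand-in for struct page; not the kernel structure. */
struct model_page {
	int count;				/* reference count */
	bool slab;				/* models PageSlab() */
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Hypothetical per-queue flag: bounce slab pages before submission. */
struct model_queue {
	bool bounce_slab;
};

/*
 * Return the page that should be handed to the driver.
 *
 * Fast path: the original page is returned untouched.
 * Slow path: if the queue asks for bouncing and the page is slab-backed,
 * len bytes at offset are copied into a fresh, refcounted page, which the
 * caller must release with free().
 *
 * Returns NULL if the range is out of bounds or the allocation fails.
 */
struct model_page *model_prepare_page(const struct model_queue *q,
				      struct model_page *page,
				      size_t offset, size_t len)
{
	struct model_page *bounce;

	assert(q != NULL && page != NULL);

	if (offset > MODEL_PAGE_SIZE || len > MODEL_PAGE_SIZE - offset)
		return NULL;

	if (!q->bounce_slab || !page->slab)
		return page;

	bounce = calloc(1, sizeof(*bounce));
	if (bounce == NULL)
		return NULL;

	bounce->count = 1;
	bounce->slab = false;
	memcpy(bounce->data + offset, page->data + offset, len);
	return bounce;
}
```

The cost is one allocation and one copy per slab-backed buffer, which is why
this only makes sense if such buffers are rare on the I/O path.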


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 18:18     ` Mike Christie
@ 2005-12-08 18:22       ` Mike Christie
  2005-12-08 19:20         ` Pekka Enberg
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Christie @ 2005-12-08 18:22 UTC (permalink / raw)
  To: open-iscsi
  Cc: Christoph Hellwig, FUJITA Tomonori, linux-fsdevel, ext2-devel,
	linux-mm, linux-kernel

Mike Christie wrote:
>
> Christoph Hellwig wrote:
>
>> On Thu, Dec 08, 2005 at 03:18:33AM -0700, Andreas Dilger wrote:
>>
>>> What happens on 1kB or 2kB block filesystems (i.e. b_size != PAGE_SIZE)?
>>> This will allocate a whole page for each block (which may be considerable
>>> overhead on e.g. a 64kB PAGE_SIZE ia64 or PPC system).
>>
>> Yes.  How often do we trigger this codepath?
>>
>> The problem we're trying to solve here is how to implement network block
>> devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
>> layer does need to grab additional references to pages.  So to use sendpage
>> we need a refcountable page.  Pages used by the slab allocator are not
>> normally refcounted, so trying to do get_page/put_page on them will break.
>>
>> One way to work around that would be to detect kmalloced pages and use
>> a slowpath for that.  The major issue with that is that we don't have a
>> reliable way to detect if a given struct page comes from the slab allocator
>> or not.  The minor problem is that even with such an indicator it means
>> having a separate and lightly tested slowpath for this rare case.
>>
>> All in all I think we should document that the block layer only accepts
>> properly refcounted pages, which is everything but kmalloced pages (even
>> vmalloc is totally fine).
>
> Is it any time kmalloc is used?  For SCSI, when it uses scsi_execute* for
> something like scanning (the REPORT LUNS result is kmalloced), would this
> be a problem?
>
> If PageSlab() does work, could we have a request queue flag that bounces
> those pages for all block layer drivers?  Pretty slow and yucky, but if
> we have to convert SCSI and maybe other parts of the block layer, maybe
> it will be the easiest option for now.

Or is there no way to do something like kmalloc(GFP_BLK) that gives us
the right type of memory?


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 18:22       ` Mike Christie
@ 2005-12-08 19:20         ` Pekka Enberg
  0 siblings, 0 replies; 10+ messages in thread
From: Pekka Enberg @ 2005-12-08 19:20 UTC (permalink / raw)
  To: Mike Christie
  Cc: open-iscsi, Christoph Hellwig, FUJITA Tomonori, linux-fsdevel,
	ext2-devel, linux-mm, linux-kernel

Hi,

On 12/8/05, Mike Christie <michaelc@cs.wisc.edu> wrote:
> Or is there no way to do something like kmalloc(GFP_BLK) that gives us
> the right type of memory?

The slab allocator uses page->lru for special purposes. See
page_{set|get}_{cache|slab} in mm/slab.c. They are used by kfree(),
ksize() and the slab debugging code to look up the cache and slab a void
pointer belongs to.

But, if you just need put_page and get_page, couldn't you do something
like the following?

                                       Pekka

Index: 2.6/mm/swap.c
===================================================================
--- 2.6.orig/mm/swap.c
+++ 2.6/mm/swap.c
@@ -36,6 +36,9 @@ int page_cluster;

 void put_page(struct page *page)
 {
+	if (unlikely(PageSlab(page)))
+		return;
+
 	if (unlikely(PageCompound(page))) {
 		page = (struct page *)page_private(page);
 		if (put_page_testzero(page)) {
Index: 2.6/include/linux/mm.h
===================================================================
--- 2.6.orig/include/linux/mm.h
+++ 2.6/include/linux/mm.h
@@ -322,6 +322,9 @@ static inline int page_count(struct page

 static inline void get_page(struct page *page)
 {
+	if (unlikely(PageSlab(page)))
+		return;
+
 	if (unlikely(PageCompound(page)))
 		page = (struct page *)page_private(page);
 	atomic_inc(&page->_count);
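
The effect of the two hunks above can be modelled in a few lines of userspace
C.  This is a sketch under assumptions, not the kernel implementation: struct
guard_page, guard_get_page() and guard_put_page() are hypothetical names, the
slab field stands in for PageSlab(), and compound pages are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the two struct page fields that matter here. */
struct guard_page {
	int count;	/* models page->_count */
	bool slab;	/* models PageSlab(page) */
};

/* Take a reference, unless the page belongs to the slab allocator. */
void guard_get_page(struct guard_page *page)
{
	if (page->slab)
		return;		/* mirrors the early return added by the patch */
	page->count++;
}

/*
 * Drop a reference, unless the page belongs to the slab allocator.
 * Returns true when the caller dropped the last reference, i.e. when
 * the page would go back to the page allocator.
 */
bool guard_put_page(struct guard_page *page)
{
	if (page->slab)
		return false;	/* slab pages are never released from here */
	assert(page->count > 0);
	return --page->count == 0;
}
```

With the guard in place, a get/put pair on a slab page leaves the count
untouched, so the calls become harmless.  They also no longer pin anything,
which is the price of the shortcut.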


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 13:42   ` allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages Christoph Hellwig
  2005-12-08 13:58     ` Pekka Enberg
  2005-12-08 18:18     ` Mike Christie
@ 2005-12-11  0:47     ` Andrew Morton
  2005-12-11  8:44       ` Arjan van de Ven
  2005-12-12 17:25       ` Christoph Hellwig
  2 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2005-12-11  0:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: fujita.tomonori, michaelc, linux-fsdevel, ext2-devel, open-iscsi,
	linux-mm, linux-kernel

Christoph Hellwig <hch@infradead.org> wrote:
>
> The problem we're trying to solve here is how to implement network block
>  devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
>  layer does need to grab additional references to pages.  So to use sendpage
>  we need a refcountable page.  Pages used by the slab allocator are not
>  normally refcounted, so trying to do get_page/put_page on them will break.

I don't get it.  Doing get_page/put_page on a slab-allocated page should do
the right thing?


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-11  0:47     ` Andrew Morton
@ 2005-12-11  8:44       ` Arjan van de Ven
  2005-12-12 17:25       ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Arjan van de Ven @ 2005-12-11  8:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, open-iscsi, ext2-devel, linux-fsdevel,
	michaelc, fujita.tomonori, Christoph Hellwig

On Sat, 2005-12-10 at 16:47 -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > The problem we're trying to solve here is how to implement network block
> >  devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
> >  layer does need to grab additional references to pages.  So to use sendpage
> >  we need a refcountable page.  Pages used by the slab allocator are not
> >  normally refcounted, so trying to do get_page/put_page on them will break.
> 
> I don't get it.  Doing get_page/put_page on a slab-allocated page should do
> the right thing?

But it doesn't stop the kfree from freeing the memory.  Zero copy needs
the content of the memory to stay around afterwards, i.e. it wants to
delay the kfree until the data is over the wire, which is an
asynchronous event relative to the actual send command in a zero-copy
situation.
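
The hazard described above can be replayed with a toy allocator in userspace
C.  This is a sketch under assumptions, not the kernel slab allocator: struct
toy_slab_page, toy_kmalloc(), toy_kfree() and toy_send_survives_kfree() are
hypothetical names, and the "page" is just a small array carved into
fixed-size objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_OBJ_SIZE		32
#define TOY_OBJS_PER_PAGE	4

/* One "page" carved into fixed-size objects, as a slab cache would do. */
struct toy_slab_page {
	int count;				/* page reference count */
	int in_use[TOY_OBJS_PER_PAGE];		/* per-object allocation state */
	char mem[TOY_OBJS_PER_PAGE * TOY_OBJ_SIZE];
};

/* Hand out the first free object, or NULL if the page is full. */
char *toy_kmalloc(struct toy_slab_page *p)
{
	for (size_t i = 0; i < TOY_OBJS_PER_PAGE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return &p->mem[i * TOY_OBJ_SIZE];
		}
	}
	return NULL;
}

/* Mark an object free.  The page reference count is never consulted. */
void toy_kfree(struct toy_slab_page *p, char *obj)
{
	ptrdiff_t off = obj - p->mem;

	assert(off >= 0 && off < (ptrdiff_t)sizeof(p->mem));
	assert(off % TOY_OBJ_SIZE == 0);
	p->in_use[off / TOY_OBJ_SIZE] = 0;
}

/*
 * Replay the sequence: queue a buffer for a zero-copy send, free it, and
 * let another allocation reuse the slot before the data is transmitted.
 * Returns 1 if the pending send still sees the original payload, else 0.
 */
int toy_send_survives_kfree(void)
{
	struct toy_slab_page page = { .count = 1 };
	char *buf = toy_kmalloc(&page);
	char *other;

	strcpy(buf, "payload");
	page.count++;			/* get_page() taken for the send */
	toy_kfree(&page, buf);		/* caller believes the send is done */

	other = toy_kmalloc(&page);	/* the same slot is handed out again */
	strcpy(other, "garbage");

	/* The asynchronous transmit finally reads the buffer. */
	return strcmp(buf, "payload") == 0;
}
```

In this model the extra page reference is still held when the transmit
runs, yet the payload has already been overwritten, because the reference
pins the page and not the object inside it.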


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-11  0:47     ` Andrew Morton
  2005-12-11  8:44       ` Arjan van de Ven
@ 2005-12-12 17:25       ` Christoph Hellwig
  2005-12-12 20:12         ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2005-12-12 17:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, fujita.tomonori, michaelc, linux-fsdevel,
	ext2-devel, open-iscsi, linux-mm, linux-kernel

On Sat, Dec 10, 2005 at 04:47:36PM -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > The problem we're trying to solve here is how to implement network block
> >  devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
> >  layer does need to grab additional references to pages.  So to use sendpage
> >  we need a refcountable page.  Pages used by the slab allocator are not
> >  normally refcounted, so trying to do get_page/put_page on them will break.
> 
> I don't get it.  Doing get_page/put_page on a slab-allocated page should do
> the right thing?

As Arjan mentioned, what would be the right thing?  Delaying the return of the
page to the page pool and disallowing reuse until the page count reaches zero?
All this seems highly impractical.


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-08 13:58     ` Pekka Enberg
@ 2005-12-12 17:27       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2005-12-12 17:27 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Hellwig, FUJITA Tomonori, michaelc, linux-fsdevel,
	ext2-devel, open-iscsi, linux-mm, linux-kernel

On Thu, Dec 08, 2005 at 03:58:46PM +0200, Pekka Enberg wrote:
> Hi Christoph,
> 
> On 12/8/05, Christoph Hellwig <hch@infradead.org> wrote:
> > One way to work around that would be to detect kmalloced pages and use
> > a slowpath for that.  The major issue with that is that we don't have a
> > reliable way to detect if a given struct page comes from the slab allocator
> > or not.
> 
> Why doesn't PageSlab work for you?

When I looked last time it was a no-op without slab debugging enabled,
but that's not the case in current mainline anymore.
If the VM people agree with that usage we could at least use it to fall
back to the slow path.  Even better would be to require normal pages, though.


* Re: allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages
  2005-12-12 17:25       ` Christoph Hellwig
@ 2005-12-12 20:12         ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2005-12-12 20:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: fujita.tomonori, michaelc, linux-fsdevel, ext2-devel, open-iscsi,
	linux-mm, linux-kernel

Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, Dec 10, 2005 at 04:47:36PM -0800, Andrew Morton wrote:
> > Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > The problem we're trying to solve here is how to implement network block
> > >  devices (nbd, iscsi) efficiently.  The zero copy codepath in the networking
> > >  layer does need to grab additional references to pages.  So to use sendpage
> > >  we need a refcountable page.  Pages used by the slab allocator are not
> > >  normally refcounted, so trying to do get_page/put_page on them will break.
> > 
> > I don't get it.  Doing get_page/put_page on a slab-allocated page should do
> > the right thing?
> 
> As Arjan mentioned, what would be the right thing?  Delaying the return of the
> page to the page pool and disallowing reuse until the page count reaches zero?

Yes, that's what'll happen.  slab will put its final ref to the page, so
whoever did that intervening get_page() ends up owning the page.

> All this seems highly impractical.

Well, as Arjan points out, doing get_page() won't prevent slab from
"freeing" a part of the page and reusing it for another object of the same
type.


Thread overview: 10+ messages
     [not found] <20051208180900T.fujita.tomonori@lab.ntt.co.jp>
     [not found] ` <20051208101833.GM14509@schatzie.adilger.int>
2005-12-08 13:42   ` allowed pages in the block later, was Re: [Ext2-devel] [PATCH] ext3: avoid sending down non-refcounted pages Christoph Hellwig
2005-12-08 13:58     ` Pekka Enberg
2005-12-12 17:27       ` Christoph Hellwig
2005-12-08 18:18     ` Mike Christie
2005-12-08 18:22       ` Mike Christie
2005-12-08 19:20         ` Pekka Enberg
2005-12-11  0:47     ` Andrew Morton
2005-12-11  8:44       ` Arjan van de Ven
2005-12-12 17:25       ` Christoph Hellwig
2005-12-12 20:12         ` Andrew Morton
