linux-mm.kvack.org archive mirror
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	nathalie.furmento@labri.fr
Subject: Re: [PATCH] mm: use a radix-tree to make do_move_pages() complexity linear
Date: Sat, 11 Oct 2008 19:58:12 +1100
Message-ID: <200810111958.12848.nickpiggin@yahoo.com.au>
In-Reply-To: <48F069B8.6050709@inria.fr>

On Saturday 11 October 2008 19:54, Brice Goglin wrote:
> Christoph Lameter wrote:
> > Would it be possible to restructure this in such a way that we work in
> > chunks of 100 or so pages each so that we can avoid the vmalloc?
> >
> > We also could do a kmalloc for each individual struct pm_struct with the
> > radix tree which would also avoid the vmalloc but still keep the need to
> > allocate 4MB for temporary struct pm_structs.
> >
> > Or get rid of the pm_struct altogether by storing the address of the node
> > vector somewhere and retrieve the node as needed from the array. This
> > would allow storing the struct page * pointers in the radix tree.
>
> I have been thinking about all this, and here's some ideas:
> * do_pages_stat() may easily be rewritten without the huge pm array. It
> just needs the user-space pointers to the page and status arrays. It
> traverses the first array, and for each page it calls get_user() on the
> address, retrieves the page's node, and put_user()s the result into the
> status array. No page_to_node structure needs to be allocated at all.
> * If we split the migration into small chunks (such as one page of pm's),
> the quadratic complexity doesn't matter that much. There will be at most
> several dozen pm entries in each chunk array, so the linear search in
> new_page_node() won't be that slow; it may even be faster than the
> overhead of adding a radix-tree to speed up this search. So we can
> keep the internal code unchanged and just add the chunking around it.
> * One thing that bothers me is move_pages() returning -ENOENT when no
> pages are given to migrate_pages(). I don't see why having 100/100 pages
> not migrated would return a different error than having only 99/100
> pages not migrated. We have the status array to place -ENOENT for all
> these pages. If the user doesn't know where his pages are allocated, he
> shouldn't get a different return value depending on how many pages were
> already on the right node. And actually, this convention makes
> user-space applications harder to write, since you need to treat -ENOENT
> as a success unless you already knew for sure where your pages were
> allocated. And the big thing is that this convention makes the chunking
> needlessly and painfully more complex. Breaking user-ABI is bad, but fixing
> crazy ABI...
>
> Here are some other numbers from the above (dirty) implementation,
> migrating from node #2 to #3 on a quad-quad-core Opteron 2347HE with
> vanilla 2.6.27:
>
> length   move_pages (us)   move_pages with patch (us)
> 4kB      126               98
> 40kB     198               168
> 400kB    963               937
> 4MB      12503             11930
> 40MB     246867            11848
>
> It seems to be even slightly better than the previous patch (but the
> kernels are a bit different). And I quickly checked that it scales well
> up to 4GB buffers.

If you are worried about vmalloc overhead, I'd suggest testing with -mm.
I've rewritten the vmap code so it is now slightly scalable and sane to
use.
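The first idea quoted above, a do_pages_stat() that walks the user's page array directly instead of copying it into a temporary page_to_node array, can be sketched as a user-space simulation. This is a minimal sketch under stated assumptions, not kernel code: sim_get_user(), sim_put_user(), lookup_page_node() and pages_stat_sketch() are hypothetical stand-ins for get_user(), put_user() and the kernel's page lookup, the page array is simplified to plain unsigned long addresses, and the node layout is invented for the simulation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's page lookup: returns the NUMA node
 * holding the page at addr, or -ENOENT if no page is present. For the
 * simulation, each 4 kB page below 1 MB lives on node (page index % 4). */
static int lookup_page_node(unsigned long addr)
{
    if (addr >= (1UL << 20))
        return -ENOENT;
    return (int)((addr >> 12) % 4);
}

/* Hypothetical stand-ins for get_user()/put_user(). The real helpers can
 * fail with -EFAULT; these plain memory accesses always succeed. */
static int sim_get_user(unsigned long *dst, const unsigned long *src)
{
    *dst = *src;
    return 0;
}

static int sim_put_user(int val, int *dst)
{
    *dst = val;
    return 0;
}

/* Sketch of a do_pages_stat() without a temporary page_to_node array:
 * read one address, look up its node, write one status, repeat. */
static int pages_stat_sketch(unsigned long nr_pages,
                             const unsigned long *pages, int *status)
{
    /* The simulation uses ordinary pointers, so require valid arrays. */
    assert(nr_pages == 0 || (pages != NULL && status != NULL));

    for (unsigned long i = 0; i < nr_pages; i++) {
        unsigned long addr;

        if (sim_get_user(&addr, pages + i))
            return -EFAULT;
        if (sim_put_user(lookup_page_node(addr), status + i))
            return -EFAULT;
    }
    return 0;
}
```

Memory use stays constant whatever nr_pages is, because each entry is read, resolved and written back before the next one is touched.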

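The second quoted idea, keeping the linear search of new_page_node() but bounding it by running the migration in small chunks, can be sketched the same way. This is a hypothetical simulation: CHUNK_NR (an arbitrary 64 here), struct pm_entry, find_target_node() and move_pages_chunked() are illustrative names, the -1 sentinel stands in for the MAX_NUMNODES terminator, and the call into migrate_pages() is reduced to the per-page lookup alone. A counter records how many entries the searches visit.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_NR 64          /* hypothetical chunk size */
#define SENTINEL_NODE (-1)   /* stands in for the MAX_NUMNODES terminator */

/* Simplified page_to_node entry; the kernel's version also carries a
 * struct page pointer and a status field. Nodes are assumed validated,
 * so no real entry uses SENTINEL_NODE. */
struct pm_entry {
    unsigned long addr;
    int node;
};

static unsigned long comparisons;  /* counts linear-search steps */

/* Linear search modelled on new_page_node(): scan the chunk until the
 * matching address or the sentinel is found. */
static int find_target_node(const struct pm_entry *pm, unsigned long addr)
{
    for (; pm->node != SENTINEL_NODE; pm++) {
        comparisons++;
        if (pm->addr == addr)
            return pm->node;
    }
    return SENTINEL_NODE;
}

/* Walk the request in chunks of at most CHUNK_NR entries. For each chunk,
 * build a sentinel-terminated pm array, then look every page up in it and
 * write the chosen node into result[]. */
static void move_pages_chunked(unsigned long nr, const unsigned long *addrs,
                               const int *nodes, int *result)
{
    struct pm_entry pm[CHUNK_NR + 1];

    for (unsigned long start = 0; start < nr; start += CHUNK_NR) {
        unsigned long n = nr - start < CHUNK_NR ? nr - start : CHUNK_NR;

        assert(n <= CHUNK_NR);
        for (unsigned long i = 0; i < n; i++) {
            pm[i].addr = addrs[start + i];
            pm[i].node = nodes[start + i];
        }
        pm[n].node = SENTINEL_NODE;

        /* Stand-in for migrate_pages() calling new_page_node() per page. */
        for (unsigned long i = 0; i < n; i++)
            result[start + i] = find_target_node(pm, addrs[start + i]);
    }
}
```

With 1000 distinct pages and 64-entry chunks, the searches visit 32020 entries in total, against 500500 if all 1000 entries sat in a single array: the per-page cost is bounded by the chunk size rather than by the length of the request.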

Thread overview: 9+ messages
2008-10-09 12:32 Brice Goglin
2008-10-10 19:50 ` Andrew Morton
2008-10-10 20:11   ` Brice Goglin
2008-10-10 20:32     ` Christoph Lameter
2008-10-11  8:54       ` Brice Goglin
2008-10-11  8:58         ` Nick Piggin [this message]
2008-10-11  9:19           ` Brice Goglin
2008-10-13 16:09         ` Christoph Lameter
2008-10-10 20:59     ` Brice Goglin
