From: Brice Goglin <Brice.Goglin@inria.fr>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
nathalie.furmento@labri.fr
Subject: Re: [PATCH] mm: use a radix-tree to make do_move_pages() complexity linear
Date: Sat, 11 Oct 2008 10:54:16 +0200 [thread overview]
Message-ID: <48F069B8.6050709@inria.fr> (raw)
In-Reply-To: <48EFBBE9.5000703@linux-foundation.org>
Christoph Lameter wrote:
> Would it be possible to restructure this in such a way that we work in chunks
> of 100 or so pages each so that we can avoid the vmalloc?
>
> We also could do a kmalloc for each individual struct pm_struct with the radix
> tree which would also avoid the vmalloc but still keep the need to allocate
> 4MB for temporary struct pm_structs.
>
> Or get rid of the pm_struct altogether by storing the address of the node
> vector somewhere and retrieve the node as needed from the array. This would
> allow storing the struct page * pointers in the radix tree.
>
I have been thinking about all this, and here's some ideas:
* do_pages_stat() may easily be rewritten without the huge pm array. It
just need to user-space pointers to the page and status arrays. It
traverses the first array , and for each page does: doing get_user() the
address, retrieve the page node, and put_user() the result in the status
array. No need to allocate any single page_to_node structure.
* If we split the migration in small chunks (such as one page of pm's),
the quadratic complexity doesn't matter that much. There will be at most
several dozens of pm in the chunk array, so the linear search in
new_page_node() won't be that slow, it may even be faster than the
overall cost of adding a radix-tree to improve this search. So we can
keep the internal code unchanged and just add the chunking around it.
* One thing that bothers me is move_pages() returning -ENOENT when no
page are given to migrate_pages(). I don't see why having 100/100 pages
not migrated would return a different error than having only 99/100
pages not migrated. We have the status array to place -ENOENT for all
these pages. If the user doesn't know where his pages are allocated, he
shouldn't get a different return value depending on how many pages were
already on the right node. And actually, this convention makes
user-space application harder to write since you need to treat -ENOENT
as a success unless you already knew for sure where your pages were
allocated. And the big thing is that this convention makes the chunking
painfully/uselessly more complex. Breaking user-ABI is bad, but fixing
crazy ABI...
Here are some other numbers from the above (dirty) implementation,
migrating from node #2 to #3 on a quad-quad-core opteron 2347HE with
vanilla 2.6.27:
length move_pages (us) move_pages with patch (us)
4kB 126 98
40kB 198 168
400kB 963 937
4MB 12503 11930
40MB 246867 11848
It seems to be even slightly better than the previous patch (but the
kernel are a bit different). And I quickly checked that it scales well
up to 4GB buffers.
Brice
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-10-11 8:54 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-09 12:32 Brice Goglin
2008-10-10 19:50 ` Andrew Morton
2008-10-10 20:11 ` Brice Goglin
2008-10-10 20:32 ` Christoph Lameter
2008-10-11 8:54 ` Brice Goglin [this message]
2008-10-11 8:58 ` Nick Piggin
2008-10-11 9:19 ` Brice Goglin
2008-10-13 16:09 ` Christoph Lameter
2008-10-10 20:59 ` Brice Goglin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48F069B8.6050709@inria.fr \
--to=brice.goglin@inria.fr \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nathalie.furmento@labri.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox