* [PATCH] mm: make do_move_pages() complexity linear
@ 2008-09-12 12:31 Brice Goglin
From: Brice Goglin @ 2008-09-12 12:31 UTC
To: Christoph Lameter; +Cc: linux-mm, LKML, Andrew Morton, Nathalie Furmento
Page migration is currently very slow because its overhead grows
quadratically with the number of pages. This is caused by each single
page migration doing a linear lookup in the page array in new_page_node().

Since pages are stored in array order in the pagelist and do_move_pages()
processes this list in order, new_page_node() can advance the "pm" pointer
into the page array so that the next iteration finds the next page in
zero or only a few lookup steps.
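To make the cost difference concrete, here is a minimal user-space sketch
of the two lookup strategies (illustrative only, not kernel code; NODE_END
and the struct layout stand in for MAX_NUMNODES and struct page_to_node):

/*
 * Minimal sketch of the lookup before and after the patch.
 */
#include <stddef.h>

#define NODE_END (-1)

struct page_to_node {
	void *page;	/* page being migrated; table ends at NODE_END */
	int node;	/* destination node */
};

/* Before: every call rescans pm[] from the start, so the lookup is
 * O(N) per page and O(N^2) over the whole array. */
static struct page_to_node *lookup_from_start(struct page_to_node *pm,
					      void *p)
{
	while (pm->node != NODE_END && pm->page != p)
		pm++;
	return pm->node == NODE_END ? NULL : pm;
}

/* After: resume from the cursor left by the previous call; since the
 * pagelist preserves array order, a whole run costs O(N). */
static struct page_to_node *lookup_from_cursor(struct page_to_node **cursor,
					       void *p)
{
	struct page_to_node *pm = *cursor;

	while (pm->node != NODE_END && pm->page != p)
		pm++;
	*cursor = pm + 1;	/* the next call starts one entry later */
	return pm->node == NODE_END ? NULL : pm;
}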
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Nathalie Furmento <Nathalie.Furmento@labri.fr>
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -837,14 +837,23 @@ struct page_to_node {
 	int status;
 };
 
+/*
+ * Allocate a page on the node given as a page_to_node in private.
+ * Increase private to point to the next page_to_node so that the
+ * next iteration does not have to traverse the whole pm array.
+ */
 static struct page *new_page_node(struct page *p, unsigned long private,
 		int **result)
 {
-	struct page_to_node *pm = (struct page_to_node *)private;
+	struct page_to_node **pmptr = (struct page_to_node **)private;
+	struct page_to_node *pm = *pmptr;
 
 	while (pm->node != MAX_NUMNODES && pm->page != p)
 		pm++;
 
+	/* prepare for the next iteration */
+	*pmptr = pm + 1;
+
 	if (pm->node == MAX_NUMNODES)
 		return NULL;
 
@@ -926,10 +935,12 @@ set_status:
 		pp->status = err;
 	}
 
-	if (!list_empty(&pagelist))
+	if (!list_empty(&pagelist)) {
+		/* new_page_node() will modify tmp */
+		struct page_to_node *tmp = pm;
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm);
-	else
+				(unsigned long)&tmp);
+	} else
 		err = -ENOENT;
 
 	up_read(&mm->mmap_sem);
* Re: [PATCH] mm: make do_move_pages() complexity linear
@ 2008-09-12 13:45 ` Christoph Lameter
From: Christoph Lameter @ 2008-09-12 13:45 UTC
To: Brice Goglin; +Cc: linux-mm, LKML, Andrew Morton, Nathalie Furmento
Brice Goglin wrote:
> Page migration is currently very slow because its overhead grows
> quadratically with the number of pages. This is caused by each single
> page migration doing a linear lookup in the page array in new_page_node().
Page migration in general is not affected by this issue. This is specific to
the sys_move_pages() system call. The API has so far only been used to
migrate a limited number of pages. For more, one would use either the cpuset
or the sys_migrate_pages() APIs, since these do not require an array that
describes how every single page needs to be moved.
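For reference, a minimal user-space sketch of what a sys_move_pages() caller
looks like, via libnuma's move_pages() wrapper (link with -lnuma; page count,
buffer size and the target node are arbitrary here). The point is that the
caller passes one array entry per page, which is why the per-page lookup
cost matters for large counts:

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define COUNT	4
#define PAGE_SZ	4096

int main(void)
{
	void *pages[COUNT];
	int nodes[COUNT], status[COUNT];
	char *buf;
	int i;

	if (posix_memalign((void **)&buf, PAGE_SZ, COUNT * PAGE_SZ))
		return 1;

	for (i = 0; i < COUNT; i++) {
		buf[i * PAGE_SZ] = 1;		/* fault each page in */
		pages[i] = buf + i * PAGE_SZ;
		nodes[i] = 0;			/* request node 0 */
	}

	/* one array entry per page: this is the array that
	 * new_page_node() has to search for every migrated page */
	if (move_pages(getpid(), COUNT, pages, nodes, status,
		       MPOL_MF_MOVE) < 0)
		perror("move_pages");
	else
		for (i = 0; i < COUNT; i++)
			printf("page %d: status %d\n", i, status[i]);
	return 0;
}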
> Since pages are stored in array order in the pagelist and do_move_pages()
> processes this list in order, new_page_node() can advance the "pm" pointer
> into the page array so that the next iteration finds the next page in
> zero or only a few lookup steps.
I agree. It would be good to increase the speed of sys_move_pages().

However, note that your patch assumes that new_page_node() is called in
sequence for each of the pages in the page descriptor array. new_page_node()
is skipped in the loop if:
1. The page is not present
2. The page is reserved
3. The page is already on the intended node
4. The page is shared between processes.
If any of those cases happens, your patch will associate the wrong
page descriptors with the remaining pages in the array.
* Re: [PATCH] mm: make do_move_pages() complexity linear
@ 2008-09-12 13:54 ` Brice Goglin
From: Brice Goglin @ 2008-09-12 13:54 UTC
To: Christoph Lameter; +Cc: linux-mm, LKML, Andrew Morton, Nathalie Furmento
Christoph Lameter wrote:
> Brice Goglin wrote:
>> Page migration is currently very slow because its overhead grows
>> quadratically with the number of pages. This is caused by each single
>> page migration doing a linear lookup in the page array in new_page_node().
>
> Page migration in general is not affected by this issue. This is specific to
> the sys_move_pages() system call. The API has so far only been used to
> migrate a limited number of pages. For more, one would use either the cpuset
> or the sys_migrate_pages() APIs, since these do not require an array that
> describes how every single page needs to be moved.
>
>> Since pages are stored in array order in the pagelist and do_move_pages()
>> processes this list in order, new_page_node() can advance the "pm" pointer
>> into the page array so that the next iteration finds the next page in
>> zero or only a few lookup steps.
>
> I agree. It would be good to increase the speed of sys_move_pages().
>
> However, note that your patch assumes that new_page_node() is called in
> sequence for each of the pages in the page descriptor array.
No, it assumes that pages are stored in the pagelist in order, but some
of them may be missing compared to the page array.
> new_page_node() is skipped in the loop if:
>
> 1. The page is not present
> 2. The page is reserved
> 3. The page is already on the intended node
> 4. The page is shared between processes.
>
> If any of those cases happens, your patch will associate the wrong
> page descriptors with the remaining pages in the array.
I don't think so. If that happens, the while loop will simply skip those
pages (whereas in the regular case, the while loop does zero iterations).
The while loop is still there to make sure we are processing the right pm
entry. The only thing the patch changes is that we no longer uselessly scan
the already-processed beginning of pm.
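To illustrate the resynchronization, here is a small stand-alone sketch
(illustrative names, not kernel code): page 'B' never makes it onto the
pagelist, yet the cursor-based lookup still pairs 'C' and 'D' with the
right entries because the pagelist preserves the order of the pm array:

#include <stdio.h>

#define NODE_END (-1)

struct page_to_node { char page; int node; };

static struct page_to_node *lookup(struct page_to_node **cursor, char p)
{
	struct page_to_node *pm = *cursor;

	while (pm->node != NODE_END && pm->page != p)
		pm++;		/* steps over entries absent from the list */
	*cursor = pm + 1;
	return pm->node == NODE_END ? NULL : pm;
}

int main(void)
{
	struct page_to_node pm[] = {
		{ 'A', 0 }, { 'B', 1 }, { 'C', 0 }, { 'D', 1 }, { 0, NODE_END },
	};
	struct page_to_node *cursor = pm;
	char pagelist[] = { 'A', 'C', 'D' };	/* 'B' was skipped */
	int i;

	for (i = 0; i < 3; i++)	/* prints A -> 0, C -> 0, D -> 1 */
		printf("page %c -> node %d\n", pagelist[i],
		       lookup(&cursor, pagelist[i])->node);
	return 0;
}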
thanks,
Brice
* Re: [PATCH] mm: make do_move_pages() complexity linear
@ 2008-09-12 14:21 ` Christoph Lameter
From: Christoph Lameter @ 2008-09-12 14:21 UTC
To: Brice Goglin; +Cc: linux-mm, LKML, Andrew Morton, Nathalie Furmento
Brice Goglin wrote:
> I don't think so. If that happens, the while loop will simply skip those
> pages (whereas in the regular case, the while loop does zero iterations).
> The while loop is still there to make sure we are processing the right pm
> entry. The only thing the patch changes is that we no longer uselessly scan
> the already-processed beginning of pm.
Ahh.. I missed that.
Acked-by: Christoph Lameter <cl@linux-foundation.org>
* Re: [PATCH] mm: make do_move_pages() complexity linear
@ 2008-09-25 12:58 ` Brice Goglin
From: Brice Goglin @ 2008-09-25 12:58 UTC
To: Christoph Lameter; +Cc: linux-mm, LKML, Andrew Morton, Nathalie Furmento
Brice Goglin wrote:
> Page migration is currently very slow because its overhead grows
> quadratically with the number of pages. This is caused by each single
> page migration doing a linear lookup in the page array in new_page_node().
>
> Since pages are stored in array order in the pagelist and do_move_pages()
> processes this list in order, new_page_node() can advance the "pm" pointer
> into the page array so that the next iteration finds the next page in
> zero or only a few lookup steps.
>
> [...]
>
> +/*
> + * Allocate a page on the node given as a page_to_node in private.
> + * Increase private to point to the next page_to_node so that the
> + * next iteration does not have to traverse the whole pm array.
> + */
>  static struct page *new_page_node(struct page *p, unsigned long private,
>  		int **result)
>  {
> -	struct page_to_node *pm = (struct page_to_node *)private;
> +	struct page_to_node **pmptr = (struct page_to_node **)private;
> +	struct page_to_node *pm = *pmptr;
>
>  	while (pm->node != MAX_NUMNODES && pm->page != p)
>  		pm++;
>
> +	/* prepare for the next iteration */
> +	*pmptr = pm + 1;
> +
>
Actually, this "pm + 1" breaks the case where migrate_pages() calls
unmap_and_move() multiple times on the same page. In that case, we need
the while loop to look at pm first, not pm + 1. So we cannot cache
pm + 1 in private, but caching pm itself is fine: there will be one
while-loop iteration instead of zero in the regular case. An updated
patch (with more comments) is coming soon.
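A minimal sketch of that adjustment, relative to the hunk quoted above
(an assumed shape only, since the updated patch is not in this thread):

-	/* prepare for the next iteration */
-	*pmptr = pm + 1;
+	/* Prepare for the next iteration: cache pm itself, not pm + 1,
+	 * so a page retried by unmap_and_move() re-matches its own
+	 * entry in a single comparison. */
+	*pmptr = pm;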
Brice