From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: linux-mm <linux-mm@kvack.org>
Subject: [PATCH 2.6.17-rc1-mm1 0/6] Migrate-on-fault - Overview
Date: Fri, 07 Apr 2006 16:18:28 -0400
Message-ID: <1144441108.5198.36.camel@localhost.localdomain>
This is a reposting of the migrate-on-fault series, against
the 2.6.17-rc1-mm1 tree. I would love to get some feedback on
these patches--especially regarding criteria for getting them
into the mm tree for wider testing.
I will send the remainder of the series as responses to this
message. Auto-migrate series V0.2 to follow.
Lee
----------------------------------------------------------------------
Migrate-on-fault prototype 0/6 V0.2 - Overview
V0.2 - refreshed against 2.6.17-rc1-mm1 with Christoph's migration
code reorg.
Some rework to 'mpol_misplaced'. See comments therein.
TODO:
+ make a Kconfig sub-option of MIGRATION?
+ add a sysctl to enable/disable migrate on fault?
separate controls for anon, page cache?
This series of patches, against 2.6.17-rc1-mm1, implements page migration
in the fault path. Based on discussions with Christoph Lameter, this
seems like the next logical step in page migration.
The basic idea is that when a fault handler [do_swap_page, filemap_nopage,
...] finds a cached page with zero mappings that is otherwise "stable"--
i.e., no writebacks--this is a good opportunity to check whether the
page resides on the node indicated by the policy in the current context.
We only want to check if there are zero mappings because 1) we can easily
migrate the page--don't have to go through the effort of removing all
mappings and 2) default policy--a common case--can give different answers
from different tasks running on different nodes. Checking the policy
when there are zero mappings effectively implements a "first touch"
placement policy.
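
As a rough illustration of the check described above, here is a minimal user-space sketch. It is not code from the patches: struct page_model, its fields and should_migrate_on_fault() are hypothetical stand-ins for the kernel's struct page and the real tests, and the policy's node is assumed to have been computed already.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct page. */
struct page_model {
	int  mapcount;         /* number of page table mappings */
	bool under_writeback;  /* page is not "stable" while this is set */
	int  node;             /* node the page currently resides on */
};

/*
 * Return true if the fault path should try to migrate the page to
 * policy_node: zero mappings, no writeback, and on the wrong node.
 */
static bool should_migrate_on_fault(const struct page_model *page,
				    int policy_node)
{
	if (page->mapcount != 0)
		return false;   /* mapped elsewhere: leave it alone */
	if (page->under_writeback)
		return false;   /* not stable */
	return page->node != policy_node;
}
```

With default policy, policy_node would be the faulting task's local node, which is why applying the check only at zero mappings behaves like first touch.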
Note that this mechanism can be used to migrate page cache pages that
were read in earlier, are no longer referenced, but are about to be
used by a new task on a node other than the one where the page
resides. The same mechanism can be used to pull anon pages along
with a task when
the load balancer decides to move it to another node. However, that
will require a bit more mechanism, and is the subject of another
patch series.
The current [2.6.17-rc*] direct migration facility supports most of the
mechanism that is required to implement this "migration on fault".
Some of the necessary operations are combined in functions with other
code that isn't required [must not be executed] in the fault path,
so these have been separated out in a couple of cases.
Then we need to add the function[s] to test the current page in the
fault path for zero mapping, no writebacks, misplacement; and the
function[s] to actually migrate the page contents to a newly
allocated page using the [modified] migratepage address space
operations of the direct migration mechanism.
The Patches:
The patches are broken out in the order I implemented them. Each
should build and boot on its own. [at least they did at one time!]
migrate-on-fault-01-separate-unmap-replace.patch
Separates the mm/migrate.c:migrate_page_remove_references()
function into its 2 distinct operations: removing references
[try_to_unmap()], and replacing the old page in the radix
tree of the page's "mapping". Only the second part is
needed in the fault path, as the page is already completely
unmapped.
A wrapper function that calls both operations is provided,
and the 2 places that call migrate_page_remove_references()
have been modified to call that wrapper.
migrate-on-fault-02-mpol_misplaced.patch
This patch implements the function mpol_misplaced() in
mm/mempolicy.c to check whether a page resides on the
node indicated by the vma and address arguments. If
so, it returns 0 [!misplaced]. If not, it returns an
indication of whether the policy was interleaved or not
[for properly accounting later allocation] and passes the
node indicated by the policy through a pointer argument.
Because this will be called in the fault path, I don't
want to go through the effort of actually allocating a
page--e.g., via alloc_page_vma()--only to find that the
current page is on the correct node. However, I wanted
to come to the same answer that alloc_page_vma() would.
So, mpol_misplaced() mimics the node computation logic
of alloc_page_vma().
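
To make the return convention concrete, here is a small user-space sketch, assuming a much simplified policy model. It is not the patch's code: struct policy_model, the POL_* modes, the MISPLACED_* values and the modulo-based interleave are all hypothetical; the kernel's node computation, including bind policy, is more involved.

```c
/* Hypothetical, simplified policy model. */
enum policy_mode { POL_DEFAULT, POL_PREFERRED, POL_INTERLEAVE };

struct policy_model {
	enum policy_mode mode;
	int        preferred_node;  /* used by POL_PREFERRED */
	const int *nodes;           /* used by POL_INTERLEAVE */
	int        nr_nodes;
};

/* Illustrative return values; 0 means "not misplaced". */
#define MISPLACED_PLAIN        1
#define MISPLACED_INTERLEAVED  2

/* Compute the node the policy selects, without allocating a page. */
static int policy_node(const struct policy_model *pol, int local_node,
		       unsigned long pgoff)
{
	switch (pol->mode) {
	case POL_PREFERRED:
		return pol->preferred_node;
	case POL_INTERLEAVE:
		return pol->nodes[pgoff % pol->nr_nodes];
	default:
		return local_node;  /* default policy: local allocation */
	}
}

/*
 * Return 0 if page_node matches the policy.  Otherwise store the
 * policy's node in *target and report whether the policy interleaves.
 */
static int misplaced_model(int page_node, const struct policy_model *pol,
			   int local_node, unsigned long pgoff, int *target)
{
	int nid = policy_node(pol, local_node, pgoff);

	if (nid == page_node)
		return 0;
	*target = nid;
	return pol->mode == POL_INTERLEAVE ? MISPLACED_INTERLEAVED
					   : MISPLACED_PLAIN;
}
```

Note that *target is left untouched when the page is correctly placed, so a caller only needs to look at it after a non-zero return.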
migrate-on-fault-03-migrate_misplaced_page.patch
This patch contains the main migrate on fault functions:
check_migrate_misplaced_page() is implemented as a static
inline function in mempolicy.h when MIGRATION is configured.
If the page has zero mappings, is stable and misplaced,
check_*() will call migrate_misplaced_page() in mm/migrate.c
to do the dirty work. If for any reason the page can't
or shouldn't be migrated, these functions will return the
old page in the state it was found.
Note that when a page is NOT found in the cache, and the fault
handler has to allocate one and read it in, it will have zero
mappings, so check_migrate_misplaced_page() WILL call
mpol_misplaced() to see if it needs migration. Of course, it
should have been allocated on the correct node, so no migration
should be necessary. However, it's possible that the node
indicated by the policy has no free pages so the newly
allocated page may be on a different node. In this case, I
guess check_migrate_misplaced_page() will attempt to migrate
it. In either case, the "unnecessary" calls to mpol_misplaced()
and to migrate_misplaced_page(), if the original allocation
"overflowed", occur after an IO, so this is the slow path
anyway.
When MIGRATION is NOT configured, check_migrate_misplaced_page()
becomes a macro that evaluates to its argument page.
More details with the patch.
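
The configured/unconfigured split can be sketched as below. This is only an illustration of the idiom: MODEL_CONFIG_MIGRATION, struct page_model and check_migrate_model() are hypothetical names, the two-argument signature is an assumption, and the "migration" here just rewrites a field rather than copying to a newly allocated page.

```c
/* Hypothetical, simplified stand-in for the kernel's struct page. */
struct page_model {
	int node;      /* node the page resides on */
	int mapcount;  /* number of page table mappings */
};

#ifdef MODEL_CONFIG_MIGRATION
/*
 * Migration configured: check the page and "migrate" it if needed.
 * The node assignment stands in for the real migration work.
 */
static inline struct page_model *
check_migrate_model(struct page_model *page, int policy_node)
{
	if (page->mapcount == 0 && page->node != policy_node)
		page->node = policy_node;
	return page;
}
#else
/* Migration not configured: evaluate to the page argument unchanged. */
#define check_migrate_model(page, policy_node) (page)
#endif
```

Either way a call site can be written as page = check_migrate_model(page, nid); with no #ifdef of its own.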
migrate-on-fault-04.1-misplaced-anon-pages.patch
This is a simple one-liner [OK, 2, counting an empty line]
to call check_migrate_misplaced_page() from do_swap_page()
in memory.c.
Patches to hook other fault paths [filemap_nopage(), etc.]
are still TBD.
migrate-on-fault-05-mbind-lazy-migrate.patch
This patch adds an MPOL_MF_LAZY [maybe should be '_DEFERRED?]
flag to modify the behavior of MPOL_MF_MOVE[_ALL]. When
the 'LAZY flag is specified, mbind() simply unmaps eligible
pages in the specified range, moving anon pages to the
swap cache, if not already there. Then, when the task
touches the pages, or queries their location via
get_mempolicy(..., MPOL_F_NODE|MPOL_F_ADDR), it will take a
fault, find the page in the cache and migrate it, if the
policy so indicates. Actually, this will only happen for
anon pages, until additional fault paths are hooked up.
This patch allows me to test the migrate on fault mechanism
by forcing pages to be unmapped.
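
The way the 'LAZY flag modifies 'MOVE[_ALL] can be modeled as a small flag decoder. This is a sketch under assumptions, not the patch: the MODEL_ prefixed names are hypothetical, the MOVE and MOVE_ALL bit values mirror the existing mbind() flags, the LAZY bit value is a placeholder, and treating 'LAZY without 'MOVE[_ALL] as "no action" is a guess, since the overview does not say how that case is handled.

```c
/* Bit values for MOVE/MOVE_ALL mirror the existing mbind() flags. */
#define MODEL_MPOL_MF_MOVE      (1u << 1)
#define MODEL_MPOL_MF_MOVE_ALL  (1u << 2)
/* Placeholder value: the bit actually used by the patch is not given. */
#define MODEL_MPOL_MF_LAZY      (1u << 3)

enum mbind_action {
	ACT_NONE,         /* no page movement requested */
	ACT_MIGRATE_NOW,  /* direct migration inside mbind() */
	ACT_UNMAP_ONLY    /* unmap now, migrate on the next fault */
};

/* Decide what mbind() would do with eligible pages in the range. */
static enum mbind_action mbind_move_action(unsigned int flags)
{
	if (!(flags & (MODEL_MPOL_MF_MOVE | MODEL_MPOL_MF_MOVE_ALL)))
		return ACT_NONE;  /* assumption: LAZY alone does nothing */
	if (flags & MODEL_MPOL_MF_LAZY)
		return ACT_UNMAP_ONLY;
	return ACT_MIGRATE_NOW;
}
```

In the ACT_UNMAP_ONLY case the pages stay where they are until the task touches them, at which point the fault path check decides whether to migrate.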
migrate-on-fault-06-mbind-noop-policy.patch
This patch adds a "NO-OP" policy to mbind() so that the
"'MOVE+'LAZY" unmap-only function can be performed on a
range of task memory without changing the policy.
Testing:
I have tested migrate-on-fault of anon pages using the MPOL_MF_LAZY
extension to mbind() discussed in patch 5 above on 2.6.17-rc1-mm1.
I have an ad hoc [odd hack?] test program, called memtoy, available at:
http://free.linux.hp.com/~lts/Tools/memtoy-latest.tar.gz
The Xpm-tests subdirectory in the tarball contains memtoy test
scripts for "manual page migration"--i.e., the migrate_pages()
syscall, "direct migration" using mbind(MPOL_MF_MOVE) and
migrate-on-fault using mbind(MPOL_MF_MOVE+MPOL_MF_LAZY).
I have also tested with the "automigration" series layered on top
of this one. In that environment, whenever the scheduler migrates
a task to a new node, the task unmaps pages with default policy and
migrates them, if necessary, on first touch after unmap. Running
kernel builds in this environment provides a fairly good stress test
of the migrate-on-fault mechanism.