From: Pavel Emelyanov <xemul@parallels.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>, Hugh Dickins <hughd@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [RFC PATCH 0/2] mm: Add ability to monitor task's memory changes
Date: Thu, 06 Dec 2012 10:32:24 +0400
Message-ID: <50C03BF8.7050508@parallels.com>
In-Reply-To: <20121205140602.1d8340a8.akpm@linux-foundation.org>
>> For what is required for checkpoint-restore is -- we want to query the kernel
>> for "what pages has been written to since moment X". But this "moment X" is
>> a little bit more tricky than just "mark all pages r/o". Consider we're doing
>> this periodically. So when defining the moment X for the 2nd time we should
>> query the "changed" state and remap the respective page r/o atomically. Full
>> snapshot is actually not required, since we don't need to keep the old copy
>> of a page that is written to. Just a sign, that this page was modified is OK.
>
> How is all this going to work, btw? What is the interface to query
> page states and set them read-only? How will dirty pagecache and dirty
> swapcache be handled? And anonymous memory?
To begin with -- criu currently dumps a lot of information about a process by
injecting parasite code into it [1] and working on the process state as if
the process were dumping itself.
That said, the API proposed in this set is intended to be used like this:
1. A daemon is started that turns tracing on, enables the proposed mmu.* events
and starts listening for them.
2. The parasite code gets injected into the target task. This parasite knows
which mapping(s) we're about to take into the image.
3. The parasite first sends the needed pages [2] to the image file.
4. Then the parasite calls the proposed madvise(MADV_TRACE) on the mapping. When
called, the respective mapping is marked with the VM_TRACE bit and all its
pages are remapped read-only.
5. After this the parasite can be removed and the target task is resumed.
If the process writes to some page after this, a #PF occurs and the respective
event is sent via the tracing engine. Next time, when we want to take an
incremental dump, we repeat steps 2 through 5 with a small change -- in step 3
the parasite asks the daemon from step 1 which pages have been changed since
last time and dumps only those into a new image.
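
As a rough illustration of the flow above, here is a small userspace model in
Python. It does not touch the kernel: MADV_TRACE and the mmu.* events are only
proposed, so the write events are fed in by hand. All names (WriteTracker,
on_write_event, take_changed, dump_pass) are hypothetical and exist only for
this sketch. The one point it tries to capture is that querying the changed
set and re-arming it happen as a single step.

```python
import threading

PAGE_SIZE = 4096  # assumed page size for this model


class WriteTracker:
    """Hypothetical stand-in for the daemon from step 1."""

    def __init__(self):
        self._lock = threading.Lock()
        self._changed = set()

    def on_write_event(self, addr):
        # Called once per (modelled) write-fault event; record the page address.
        with self._lock:
            self._changed.add(addr & ~(PAGE_SIZE - 1))

    def take_changed(self):
        # Return the pages written since the last call and reset the set in
        # one step, so no event can fall between "query" and "re-arm".
        with self._lock:
            changed, self._changed = self._changed, set()
        return changed


def dump_pass(memory, tracker, first):
    """Return {page_addr: contents} for one dump pass.

    memory maps page address -> contents and models the target mapping.
    """
    if first:
        tracker.take_changed()  # drop events recorded before the full dump
        pages = set(memory)
    else:
        pages = tracker.take_changed() & set(memory)
    return {addr: memory[addr] for addr in sorted(pages)}


# Full dump, then one write, then an incremental dump.
memory = {0x1000: b"a", 0x2000: b"b", 0x3000: b"c"}
tracker = WriteTracker()
full = dump_pass(memory, tracker, first=True)
memory[0x2000] = b"B"
tracker.on_write_event(0x2010)  # a write somewhere inside the page at 0x2000
incr = dump_pass(memory, tracker, first=False)
```

In this model the second pass contains only the page at 0x2000, and a third
pass with no writes in between would be empty.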
The state of the swapcache (clean or dirty) doesn't matter in this case. If the
page is in swap and the pte contains a swap entry, we'll see this in the pagemap
file and will take the page into the image in the first pass. If a process later
writes to the page, it will go through do_swap_page -> do_wp_page and the
modification event will be sent and caught by the daemon from step 1.
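
For reference, a minimal sketch of how a first pass could tell present and
swapped pages apart from /proc/PID/pagemap. The bit layout (bit 63 present,
bit 62 swapped, bits 0-54 PFN or swap type/offset) is the documented pagemap
format. The helper names (decode_entry, needs_dump, read_entries) and the fixed
4096-byte page size are assumptions made for this example; this is not how
criu itself is written.

```python
import struct

PAGE_SIZE = 4096             # assumed; query the real value on a live system
PM_PRESENT = 1 << 63         # page is present in RAM
PM_SWAPPED = 1 << 62         # page is in swap
PM_LOW_MASK = (1 << 55) - 1  # bits 0-54: PFN, or swap type and offset


def decode_entry(entry):
    """Split one 64-bit pagemap entry into the fields used here."""
    present = bool(entry & PM_PRESENT)
    swapped = bool(entry & PM_SWAPPED)
    low = entry & PM_LOW_MASK
    info = {"present": present, "swapped": swapped}
    if swapped:
        info["swap_type"] = low & 0x1F  # bits 0-4
        info["swap_offset"] = low >> 5  # bits 5-54
    elif present:
        info["pfn"] = low  # may read as 0 without sufficient privileges
    return info


def needs_dump(entry):
    # First pass: take every page that has contents, in RAM or in swap.
    return bool(entry & (PM_PRESENT | PM_SWAPPED))


def read_entries(pid, start, npages):
    """Read the pagemap entries for npages pages starting at address start."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((start // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        data = f.read(npages * 8)
    return list(struct.unpack(f"={len(data) // 8}Q", data))
```

A caller would walk the target mapping with read_entries() and keep the pages
for which needs_dump() is true, regardless of whether the swapcache copy is
clean or dirty.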
The pagecache is completely out of scope since criu doesn't dump the
contents of file mappings and doesn't snapshot filesystem state. It only
works with the process state. The filesystem state that corresponds to the
process state should be created by other means, e.g. an lvm snapshot or rsync
while the tasks are stopped. I've tried to explain this in more detail here [3].
Thanks,
Pavel
[1] http://lwn.net/Articles/454304/
[2] Looking at the /proc/PID/pagemap file
[3] https://plus.google.com/103175467322423551911/posts/UAtVKaQcKsx