From: Hugh Dickins <hugh@veritas.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Sun, 22 Jun 2008 18:11:02 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0806221742330.31172@blonde.site>
In-Reply-To: <20080622153035.GA31114@wotan.suse.de>
On Sun, 22 Jun 2008, Nick Piggin wrote:
>
> Can someone please review my thinking here? (no this is not a fix for
> Robin's recent issue, but something completely different)
You have a wicked mind, and I think you're right, and the fix right.

One thing though, in moving the page_remove_rmap() in that way, aren't
you assuming that there's an appropriate write barrier (wmb) between the
two locations? If that is necessarily so (there's plenty happening in
between), it may deserve a comment to say just where that barrier is.
Hugh
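
The ordering question above can be illustrated in user space as a
message-passing pattern. What follows is only a sketch under stated
assumptions, not kernel code: `pte`, `mapcount`, `unmapper()`, `observer()`
and `pte_seen_after_mapcount_drop()` are hypothetical stand-ins, C11
release/acquire ordering stands in for whatever barrier the kernel path
provides, and TLB flushing is not modeled at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { OLD_PAGE = 0, NEW_PAGE = 1 };

static atomic_int pte;      /* hypothetical stand-in for proc2's pte */
static atomic_int mapcount; /* hypothetical stand-in for the old page's mapcount */

/* Models the faulting thread: switch the pte, then drop the mapcount. */
static void *unmapper(void *arg)
{
	(void)arg;
	atomic_store_explicit(&pte, NEW_PAGE, memory_order_relaxed);
	/* Release: the pte store above is ordered before this decrement. */
	atomic_fetch_sub_explicit(&mapcount, 1, memory_order_release);
	return NULL;
}

/* Models the other process: act only once it sees mapcount == 1. */
static void *observer(void *arg)
{
	int *seen = arg;

	/* Acquire: pairs with the release decrement in unmapper(). */
	while (atomic_load_explicit(&mapcount, memory_order_acquire) != 1)
		;
	*seen = atomic_load_explicit(&pte, memory_order_relaxed);
	return NULL;
}

/* Runs both threads once and returns the pte value the observer saw. */
int pte_seen_after_mapcount_drop(void)
{
	pthread_t a, b;
	int seen = -1;
	int rc;

	atomic_store(&pte, OLD_PAGE);
	atomic_store(&mapcount, 2);

	rc = pthread_create(&b, NULL, observer, &seen);
	assert(rc == 0);
	rc = pthread_create(&a, NULL, unmapper, NULL);
	assert(rc == 0);

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return seen;
}
```

With the release/acquire pair in place, an observer that sees the
decremented count is guaranteed by the C11 memory model to see the switched
pte as well. If both were relaxed, the language would give no such guarantee,
which is the kind of gap a comment naming the barrier would close.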
>
> Thanks,
> Nick
>
>
> There is a race in the COW logic. It contains a shortcut to avoid the
> COW and reuse the page if we have the sole reference on the page. However,
> it is possible to have two racing do_wp_page()ers, with one causing the
> other to mistakenly believe it is safe to take the shortcut when it is not.
> This could lead to data corruption.
>
> Process 1 and process 2 each have a wp pte of the same anon page (i.e. one
> forked the other). The page's mapcount is 2. Then they both attempt to write
> to it around the same time...
>
> proc1                   proc2 thr1                    proc2 thr2
> CPU0                    CPU1                          CPU3
> do_wp_page()            do_wp_page()
>                         trylock_page()
>                           can_share_swap_page()
>                             load page mapcount (==2)
>                           reuse = 0
>                         pte unlock
>                         copy page to new_page
>                         pte lock
>                         page_remove_rmap(page);
> trylock_page()
>   can_share_swap_page()
>     load page mapcount (==1)
>   reuse = 1
> ptep_set_access_flags (allow W)
>
> write private key into page
>                                                       read from page
>                         ptep_clear_flush()
>                         set_pte_at(pte of new_page)
>
>
> Fix this by moving the page_remove_rmap of the old page after the pte clear
> and flush. Potentially the entire branch could be moved down here, but in
> order to stay consistent, I won't (should probably move all the *_mm_counter
> stuff with one patch).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1766,7 +1766,6 @@ gotten:
>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
> -			page_remove_rmap(old_page, vma);
>  			if (!PageAnon(old_page)) {
>  				dec_mm_counter(mm, file_rss);
>  				inc_mm_counter(mm, anon_rss);
> @@ -1788,6 +1787,24 @@ gotten:
>  		lru_cache_add_active(new_page);
>  		page_add_new_anon_rmap(new_page, vma, address);
>  
> +		if (old_page) {
> +			/*
> +			 * Only after switching the pte to the new page may
> +			 * we remove the mapcount here. Otherwise another
> +			 * process may come and find the rmap count decremented
> +			 * before the pte is switched to the new page, and
> +			 * "reuse" the old page writing into it while our pte
> +			 * here still points into it and can be read by other
> +			 * threads.
> +			 *
> +			 * The ptep_clear_flush should be enough to prevent
> +			 * any possible reordering making the old page visible
> +			 * to other threads afterwards, so just executing
> +			 * after it is fine.
> +			 */
> +			page_remove_rmap(old_page, vma);
> +		}
> +
>  		/* Free the old page.. */
>  		new_page = old_page;
>  		ret |= VM_FAULT_WRITE;
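
The interleaving in the quoted changelog can be replayed deterministically
in user space. The following is a minimal single-threaded sketch, not kernel
code: pages are modeled as plain ints, ptes as pointers, and every name
(`rmap_order`, `proc2_reads_private_key()`, `SHARED_DATA`, `PRIVATE_KEY`)
is hypothetical. It only replays the one schedule from the diagram, with the
mapcount decrement placed either before or after the pte switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Where proc2 thr1 drops the old page's mapcount (hypothetical model). */
enum rmap_order {
	RMAP_BEFORE_PTE_SWITCH, /* ordering before the patch */
	RMAP_AFTER_PTE_SWITCH   /* ordering after the patch */
};

enum { SHARED_DATA = 1, PRIVATE_KEY = 42 };

/* Replays the diagram; returns true if proc2 thr2 reads proc1's write. */
bool proc2_reads_private_key(enum rmap_order order)
{
	int mapcount = 2;                 /* both processes map old_page */
	int old_page = SHARED_DATA;
	int proc2_new_page;
	int proc1_new_page;
	const int *proc2_pte = &old_page; /* proc2's pte, shared by its threads */
	int *proc1_pte;
	int seen;

	assert(order == RMAP_BEFORE_PTE_SWITCH ||
	       order == RMAP_AFTER_PTE_SWITCH);

	/* proc2 thr1: sees mapcount == 2, so no reuse; copy the page. */
	proc2_new_page = old_page;
	if (order == RMAP_BEFORE_PTE_SWITCH)
		mapcount--; /* count dropped while proc2's pte still maps old_page */

	/* proc1: reuse the page only if it appears to be the sole mapper. */
	if (mapcount == 1) {
		proc1_pte = &old_page;
	} else {
		proc1_new_page = old_page;
		proc1_pte = &proc1_new_page;
	}
	*proc1_pte = PRIVATE_KEY; /* write private key into page */

	/* proc2 thr2: reads through proc2's not-yet-switched pte. */
	seen = *proc2_pte;

	/* proc2 thr1: clear/flush the pte and point it at the copy. */
	proc2_pte = &proc2_new_page;
	if (order == RMAP_AFTER_PTE_SWITCH)
		mapcount--; /* count dropped only once the pte is switched */

	(void)proc2_pte;
	(void)mapcount;
	return seen == PRIVATE_KEY;
}
```

In this model, calling `proc2_reads_private_key(RMAP_BEFORE_PTE_SWITCH)`
reports the leak, while `RMAP_AFTER_PTE_SWITCH` does not, because proc1 still
sees a mapcount of 2 and takes its own copy.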