From: Peter Zijlstra <peterz@infradead.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Fri, 27 Jun 2008 11:19:26 +0200
Message-ID: <1214558366.2801.26.camel@twins.programming.kicks-ass.net>
In-Reply-To: <20080623123030.GB26555@wotan.suse.de>

On Mon, 2008-06-23 at 14:30 +0200, Nick Piggin wrote:
> On Mon, Jun 23, 2008 at 02:18:31PM +0200, Nick Piggin wrote:
> > On Mon, Jun 23, 2008 at 11:04:31AM +0100, Hugh Dickins wrote:
> > > moving the page_remove_rmap down was to be fully effective, it needed
> > > to move through a suitable barrier; it hadn't occurred to me that it
> > > was carrying the suitable barrier with it. But if that is indeed
> > > correct, I think it would be better to rely upon that, than resort
> > > to more difficult arguments.
> >
> > No I actually think you make a good point, and I'll resubmit the
> > patch with a replacement comment to say we've got the ordering
> > covered if nothing else then by the atomic op in rmap.
>
> OK, this is a new comment. I don't actually know if it is any good.
> It is hard to be coherent if you write these things in English.
> Maybe it is best to illustrate with the interleaving diagram in the
> changelog?
>
> --
> There is a race in the COW logic. It contains a shortcut to avoid the
> COW and reuse the page if we have the sole reference on the page; however,
> it is possible to have two racing do_wp_page()ers, with one causing the
> other to mistakenly believe it is safe to take the shortcut when it is
> not. This could lead to data corruption.
>
> Process 1 and process 2 each have a wp pte of the same anon page (i.e. one
> forked the other). The page's mapcount is 2. Then they both attempt to write
> to it around the same time...
>
>   proc1               proc2 thr1                    proc2 thr2
>   CPU0                CPU1                          CPU3
>   do_wp_page()        do_wp_page()
>                       trylock_page()
>                         can_share_swap_page()
>                           load page mapcount (==2)
>                         reuse = 0
>                       pte unlock
>                       copy page to new_page
>                       pte lock
>                       page_remove_rmap(page);
>   trylock_page()
>     can_share_swap_page()
>       load page mapcount (==1)
>     reuse = 1
>     ptep_set_access_flags (allow W)
>
>   write private key into page
>                                                     read from page
>                       ptep_clear_flush()
>                       set_pte_at(pte of new_page)
>
>
> Fix this by moving the page_remove_rmap of the old page after the pte clear
> and flush. Potentially the entire branch could be moved down here, but in
> order to stay consistent, I won't (should probably move all the *_mm_counter
> stuff with one patch).
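
For anyone else wading through the thread, the interleaving above can be
replayed with a small user-space toy. This is only a sketch under loud
assumptions: every name here (toy_state, toy_step, toy_cow_race, ...) is
made up for illustration, it is single-threaded, and it models nothing
but the mapcount and whether proc2's pte still points at the old page.
It lets proc1 run its "sole mapper" check after each of proc2 thr1's two
steps and reports whether proc1 could reuse the old page while proc2 can
still read it.

```c
#include <stdbool.h>

/* Hypothetical toy state; none of these names exist in the kernel. */
struct toy_state {
	int  old_mapcount;	/* ptes mapping the old page (starts at 2) */
	bool proc2_maps_old;	/* proc2's pte still points at the old page */
};

enum toy_step { TOY_REMOVE_RMAP, TOY_SWITCH_PTE };

/* proc1's shortcut: reuse the old page if it looks like the sole mapper. */
static bool toy_proc1_would_reuse(const struct toy_state *s)
{
	return s->old_mapcount == 1;
}

/*
 * Replay proc2 thr1's two steps in the given order and let proc1 run its
 * check after each one. Returns true if proc1 could reuse (and write to)
 * the old page while proc2's pte still maps it.
 */
bool toy_cow_race(bool remove_rmap_first)
{
	struct toy_state s = { .old_mapcount = 2, .proc2_maps_old = true };
	enum toy_step order[2];
	bool exposed = false;
	int i;

	order[0] = remove_rmap_first ? TOY_REMOVE_RMAP : TOY_SWITCH_PTE;
	order[1] = remove_rmap_first ? TOY_SWITCH_PTE : TOY_REMOVE_RMAP;

	for (i = 0; i < 2; i++) {
		if (order[i] == TOY_REMOVE_RMAP)
			s.old_mapcount--;		/* stands in for page_remove_rmap() */
		else
			s.proc2_maps_old = false;	/* clear, flush, set new pte */

		/* proc1 runs its check in the window after this step */
		if (toy_proc1_would_reuse(&s) && s.proc2_maps_old)
			exposed = true;
	}
	return exposed;
}
```

toy_cow_race(true) models the old placement of page_remove_rmap() and
reports the window; toy_cow_race(false) models the patched placement and
does not.
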
Since I bothered to read all the way through this thread, I might as
well provide an ack.

Acked-by: Peter Zijlstra <peterz@infradead.org>

> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1766,7 +1766,6 @@ gotten:
>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
> -			page_remove_rmap(old_page, vma);
>  			if (!PageAnon(old_page)) {
>  				dec_mm_counter(mm, file_rss);
>  				inc_mm_counter(mm, anon_rss);
> @@ -1788,6 +1787,32 @@ gotten:
>  		lru_cache_add_active(new_page);
>  		page_add_new_anon_rmap(new_page, vma, address);
> 
> +		if (old_page) {
> +			/*
> +			 * Only after switching the pte to the new page may
> +			 * we remove the mapcount here. Otherwise another
> +			 * process may come and find the rmap count decremented
> +			 * before the pte is switched to the new page, and
> +			 * "reuse" the old page writing into it while our pte
> +			 * here still points into it and can be read by other
> +			 * threads.
> +			 *
> +			 * The critical issue is to order this
> +			 * page_remove_rmap with the ptep_clear_flush above.
> +			 * Those stores are ordered by (if nothing else,)
> +			 * the barrier present in the atomic_add_negative
> +			 * in page_remove_rmap.
> +			 *
> +			 * Then the TLB flush in ptep_clear_flush ensures that
> +			 * no process can access the old page before the
> +			 * decremented mapcount is visible. And the old page
> +			 * cannot be reused until after the decremented
> +			 * mapcount is visible. So transitively, TLBs to
> +			 * old page will be flushed before it can be reused.
> +			 */
> +			page_remove_rmap(old_page, vma);
> +		}
> +
>  		/* Free the old page.. */
>  		new_page = old_page;
>  		ret |= VM_FAULT_WRITE;
>
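
The ordering argument in the comment can be sketched in user space with
C11 atomics. This is an approximation, not the kernel implementation:
toy_atomic_add_negative() and toy_unmap_old_page() are hypothetical
names, memory_order_seq_cst is assumed as a rough stand-in for the full
barrier implied by the kernel's atomic_add_negative(), and the TLB flush
is not modelled at all. As with page->_mapcount, the counter is biased
by -1, so a negative result means the last mapping went away.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Rough stand-in for the kernel's atomic_add_negative(): add i to *v and
 * report whether the result is negative. seq_cst is an assumption used
 * here to approximate the kernel's full-barrier semantics.
 */
static bool toy_atomic_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add_explicit(v, i, memory_order_seq_cst) + i < 0;
}

/*
 * Hypothetical model of the patched order: clear the pte first, then drop
 * the mapcount. The ordered read-modify-write keeps the earlier pte store
 * from becoming visible after the decrement, so anyone who observes the
 * lower mapcount also observes the cleared pte.
 * Returns true when the last mapping was removed.
 */
bool toy_unmap_old_page(atomic_int *pte_maps_old, atomic_int *mapcount)
{
	atomic_store_explicit(pte_maps_old, 0, memory_order_relaxed);
	return toy_atomic_add_negative(-1, mapcount);
}
```

With the biased counter starting at 1 (two mappings), the first call
leaves it at 0 and returns false; a second call takes it to -1 and
returns true.
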
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>