Re: [patch] mm: fix race in COW logic

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Fri, 27 Jun 2008 11:19:26 +0200	[thread overview]
Message-ID: <1214558366.2801.26.camel@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20080623123030.GB26555@wotan.suse.de>

On Mon, 2008-06-23 at 14:30 +0200, Nick Piggin wrote:
> On Mon, Jun 23, 2008 at 02:18:31PM +0200, Nick Piggin wrote:
> > On Mon, Jun 23, 2008 at 11:04:31AM +0100, Hugh Dickins wrote:
> > > moving the page_remove_rmap down was to be fully effective, it needed
> > > to move through a suitable barrier; it hadn't occurred to me that it
> > > was carrying the suitable barrier with it.  But if that is indeed
> > > correct, I think it would be better to rely upon that, than resort
> > > to more difficult arguments.
> > 
> > No I actually think you make a good point, and I'll resubmit the
> > patch with a replacement comment to say we've got the ordering
> > covered if nothing else then by the atomic op in rmap.
> 
> OK, this is a new comment. I don't actually know if it is any good.
> It is hard to be coherent if you write these things in English.
> Maybe it is best to illustrate with the interleaving diagram in the
> changelog?
> 
> --
> There is a race in the COW logic. It contains a shortcut to avoid the
> COW and reuse the page if we have the sole reference on the page, however it
> is possible to have two racing do_wp_page()ers with one causing the other to
> mistakenly believe it is safe to take the shortcut when it is not. This could
> lead to data corruption.
> 
> Process 1 and process2 each have a wp pte of the same anon page (ie. one
> forked the other). The page's mapcount is 2. Then they both attempt to write
> to it around the same time...
> 
>   proc1				proc2 thr1			proc2 thr2
>   CPU0				CPU1				CPU3
>   do_wp_page()			do_wp_page()
> 				 trylock_page()
> 				  can_share_swap_page()
> 				   load page mapcount (==2)
> 				  reuse = 0
> 				 pte unlock
> 				 copy page to new_page
> 				 pte lock
> 				 page_remove_rmap(page);
>    trylock_page()	
>     can_share_swap_page()
>      load page mapcount (==1)
>     reuse = 1
>    ptep_set_access_flags (allow W)
> 
>   write private key into page
> 								read from page
> 				ptep_clear_flush()
> 				set_pte_at(pte of new_page)
> 
> 
> Fix this by moving the page_remove_rmap of the old page after the pte clear
> and flush. Potentially the entire branch could be moved down here, but in
> order to stay consistent, I won't (should probably move all the *_mm_counter
> stuff with one patch).

Since I bothered to read all the way through this thread, I might as
well provide an ack,..

Acked-by: Peter Zijlstra <peterz@infradead.org>

> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1766,7 +1766,6 @@ gotten:
>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
> -			page_remove_rmap(old_page, vma);
>  			if (!PageAnon(old_page)) {
>  				dec_mm_counter(mm, file_rss);
>  				inc_mm_counter(mm, anon_rss);
> @@ -1788,6 +1787,32 @@ gotten:
>  		lru_cache_add_active(new_page);
>  		page_add_new_anon_rmap(new_page, vma, address);
>  
> +		if (old_page) {
> +			/*
> +			 * Only after switching the pte to the new page may
> +			 * we remove the mapcount here. Otherwise another
> +			 * process may come and find the rmap count decremented
> +			 * before the pte is switched to the new page, and
> +			 * "reuse" the old page writing into it while our pte
> +			 * here still points into it and can be read by other
> +			 * threads.
> +			 *
> +			 * The critical issue is to order this
> +			 * page_remove_rmap with the ptp_clear_flush above.
> +			 * Those stores are ordered by (if nothing else,)
> +			 * the barrier present in the atomic_add_negative
> +			 * in page_remove_rmap.
> +			 *
> +			 * Then the TLB flush in ptep_clear_flush ensures that
> +			 * no process can access the old page before the
> +			 * decremented mapcount is visible. And the old page
> +			 * cannot be reused until after the decremented
> +			 * mapcount is visible. So transitively, TLBs to
> +			 * old page will be flushed before it can be reused.
> +			 */
> +			page_remove_rmap(old_page, vma);
> +		}
> +
>  		/* Free the old page.. */
>  		new_page = old_page;
>  		ret |= VM_FAULT_WRITE;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-06-27  9:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-22 15:30 Nick Piggin
2008-06-22 17:11 ` Hugh Dickins
2008-06-22 17:35   ` Linus Torvalds
2008-06-22 18:10     ` Hugh Dickins
2008-06-22 18:18       ` Linus Torvalds
2008-06-23  1:49       ` Nick Piggin
2008-06-23 10:04         ` Hugh Dickins
2008-06-23 12:18           ` Nick Piggin
2008-06-23 12:30             ` Nick Piggin
2008-06-23 15:39               ` Hugh Dickins
2008-06-27  9:19               ` Peter Zijlstra [this message]
2008-06-27  9:13             ` Peter Zijlstra
2008-06-23  1:52     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1214558366.2801.26.camel@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox