Date: Sun, 22 Jun 2008 18:11:02 +0100 (BST)
From: Hugh Dickins
Subject: Re: [patch] mm: fix race in COW logic
In-Reply-To: <20080622153035.GA31114@wotan.suse.de>
References: <20080622153035.GA31114@wotan.suse.de>
To: Nick Piggin
Cc: Linux Memory Management List, Andrew Morton, Linus Torvalds

On Sun, 22 Jun 2008, Nick Piggin wrote:
> 
> Can someone please review my thinking here? (no this is not a fix for
> Robin's recent issue, but something completely different)

You have a wicked mind, and I think you're right, and the fix is right.

One thing though: in moving the page_remove_rmap in that way, aren't you
assuming that there's an appropriate write memory barrier between the two
locations?  If that is necessarily so (there's plenty happening in
between), it may deserve a comment to say just where that barrier is.
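
To be concrete, the ordering I have in mind is sketched below -- purely
illustrative, not a patch: the function names are the ones already used
on that path, "entry" stands for the new pte value, and the explicit
smp_wmb() is only there to mark the spot where the ordering matters if
ptep_clear_flush() does not already guarantee it:

        /* switch the pte over to new_page first... */
        ptep_clear_flush(vma, address, page_table);
        set_pte_at(mm, address, page_table, entry);
        page_add_new_anon_rmap(new_page, vma, address);

        /*
         * ...and make sure those stores are visible to other CPUs
         * before old_page's mapcount drops, otherwise a racing
         * do_wp_page() may see mapcount == 1 and "reuse" old_page
         * while another pte still maps it.
         */
        smp_wmb();              /* if not already implied above */

        page_remove_rmap(old_page, vma);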

Hugh

> 
> Thanks,
> Nick
> 
> 
> There is a race in the COW logic.  It contains a shortcut to avoid the
> COW and reuse the page if we have the sole reference on the page,
> however it is possible to have two racing do_wp_page()ers with one
> causing the other to mistakenly believe it is safe to take the shortcut
> when it is not.  This could lead to data corruption.
> 
> Process 1 and process2 each have a wp pte of the same anon page (ie.
> one forked the other).  The page's mapcount is 2.  Then they both
> attempt to write to it around the same time...
> 
> proc1                   proc2 thr1                      proc2 thr2
> CPU0                    CPU1                            CPU3
> do_wp_page()            do_wp_page()
>                          trylock_page()
>                          can_share_swap_page()
>                           load page mapcount (==2)
>                           reuse = 0
>                          pte unlock
>                          copy page to new_page
>                          pte lock
>                          page_remove_rmap(page);
>  trylock_page()
>  can_share_swap_page()
>   load page mapcount (==1)
>   reuse = 1
>  ptep_set_access_flags (allow W)
> 
>  write private key into page
>                                                          read from page
>                          ptep_clear_flush()
>                          set_pte_at(pte of new_page)
> 
> 
> Fix this by moving the page_remove_rmap of the old page after the pte
> clear and flush.  Potentially the entire branch could be moved down
> here, but in order to stay consistent, I won't (should probably move
> all the *_mm_counter stuff with one patch).
> 
> Signed-off-by: Nick Piggin
> ---
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1766,7 +1766,6 @@ gotten:
>          page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>          if (likely(pte_same(*page_table, orig_pte))) {
>                  if (old_page) {
> -                        page_remove_rmap(old_page, vma);
>                          if (!PageAnon(old_page)) {
>                                  dec_mm_counter(mm, file_rss);
>                                  inc_mm_counter(mm, anon_rss);
> @@ -1788,6 +1787,24 @@ gotten:
>                  lru_cache_add_active(new_page);
>                  page_add_new_anon_rmap(new_page, vma, address);
>  
> +                if (old_page) {
> +                        /*
> +                         * Only after switching the pte to the new page may
> +                         * we remove the mapcount here. Otherwise another
> +                         * process may come and find the rmap count decremented
> +                         * before the pte is switched to the new page, and
> +                         * "reuse" the old page writing into it while our pte
> +                         * here still points into it and can be read by other
> +                         * threads.
> +                         *
> +                         * The ptep_clear_flush should be enough to prevent
> +                         * any possible reordering making the old page visible
> +                         * to other threads afterwards, so just executing
> +                         * after it is fine.
> +                         */
> +                        page_remove_rmap(old_page, vma);
> +                }
> +
>                  /* Free the old page.. */
>                  new_page = old_page;
>                  ret |= VM_FAULT_WRITE;
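
PS: for anyone who finds the three-column diagram hard to picture, the
same ordering problem can be replayed in plain C.  This is only an
analogy with made-up names -- the "pte" is just a pointer, the mapcount
a plain int, and the interleaving is stepped through by hand in the
order the diagram shows, so it illustrates the ordering rather than the
kernel code itself:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 64

struct page {
        char data[PAGE_SIZE];
        int mapcount;
};

int main(void)
{
        struct page *old_page = calloc(1, sizeof(*old_page));
        struct page *new_page = calloc(1, sizeof(*new_page));

        /* proc1 and proc2 both map the same anon page: mapcount == 2 */
        struct page *proc1_pte = old_page;
        struct page *proc2_pte = old_page;
        old_page->mapcount = 2;

        /* proc2 thr1 starts the COW: copy old_page into new_page... */
        memcpy(new_page->data, old_page->data, PAGE_SIZE);

        /*
         * ...and, as in the unpatched code, drop the rmap before proc2's
         * pte has been switched over to new_page.
         */
        old_page->mapcount--;

        /* proc1 sees mapcount == 1 and takes the reuse shortcut */
        if (old_page->mapcount == 1)
                strcpy(proc1_pte->data, "proc1's private key");

        /* proc2 thr2 still reads through proc2's pte, which maps old_page */
        printf("proc2 thr2 reads: \"%s\"\n", proc2_pte->data);

        /* only now does proc2 thr1 switch the pte over to new_page */
        proc2_pte = new_page;
        printf("after the pte switch: \"%s\"\n", proc2_pte->data);

        free(old_page);
        free(new_page);
        return 0;
}

With the patched ordering (pte switch first, mapcount drop after), proc1
does not see mapcount == 1 until proc2's pte already points at new_page
(assuming the write ordering discussed above), so there is nothing left
for proc2 thr2 to read through the old mapping.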