From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Sat, 18 Oct 2008 04:25:41 +0200 From: Nick Piggin Subject: Re: [patch] mm: fix anon_vma races Message-ID: <20081018022541.GA19018@wotan.suse.de> References: <20081016041033.GB10371@wotan.suse.de> <1224285222.10548.22.camel@lappy.programming.kicks-ass.net> <20081018013258.GA3595@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org Return-Path: To: Linus Torvalds Cc: Peter Zijlstra , Hugh Dickins , Linux Memory Management List List-ID: On Fri, Oct 17, 2008 at 07:11:38PM -0700, Linus Torvalds wrote: > > > On Sat, 18 Oct 2008, Nick Piggin wrote: > > > > > > Side note: it would be nicer if we had a "spin_lock_init_locked()", so > > > that we could avoid the more expensive "true lock" when doing the initial > > > allocation, but we don't. That said, the case of having to allocate a new > > > anon_vma _should_ be the rare one. > > > > We can't do that, unfortuantely, because anon_vmas are allocated with > > SLAB_DESTROY_BY_RCU. > > Aughh. I see what you're saying. We don't _free_ them by RCU, we just > destroy the page allocation. So an anon_vma can get _re-allocated_ for > another page (without being destroyed), concurrently with somebody > optimistically being busy with that same anon_vma that they got through > that optimistic 'page_lock_anon_vma()' thing. > > So if we were to just set the lock, we might actually be messing with > something that is still actively used by the previous page that was > unmapped concurrently and still being accessed by try_to_unmap_anon. So > even though we allocated a "new" anon_vma, it might still be busy. > > Yes? No? That's what I'm thinking, yes. But I admit the last time I looked at this really closely was probably reading through Hugh's patches and changelogs (which at the time must have convinced me ;)). So I could be wrong. > That thing really is too subtle for words. But if that's actually what you > are alluding to, then doesn't that mean that we _really_ should be doing > that "spin_lock(&anon_vma->lock)" even for the first allocation, and that > the current code is broken? Because otherwise that other concurrent user > that found the stale vma through page_lock_anon_vma() will now try to > follow the linked list and _think_ it's stable (thanks to the lock), but > we're actually inserting entries into it without holding any locks at all. Yes, that's what I meant by "has other problems". Another thing is also that even if we have the lock here, I can't see why page_lock_anon_vma is safe against finding an anon_vma which has been deallocated then allocated for something else (and had vmas inserted into it etc.). I think most of our memory ordering problems can be solved by locking. Note that I don't think we need any barriers there, and callers don't need any read_barrier_depends either, because AFAIKS they all take the lock too. (it shouldn't actually need the reordering of the assignments either, which shows it is a bit more robust than relying on ordering, but I think it is neater if we reorder them). Whether this page_lock_anon_vma is really a problem or not... I've added a test in there that I think may be a problem (at least, I'd like to know why I'm wrong and have a comment in there). > But I'm hoping I actually am totally *not* understanding what you meant, > and am actually just terminally confused. > > Hugh, this is very much your code. Can you please tell me I'm really > confused here, and un-confuse me. Pretty please? Ditto ;) --- Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c +++ linux-2.6/mm/rmap.c @@ -63,32 +63,42 @@ int anon_vma_prepare(struct vm_area_stru might_sleep(); if (unlikely(!anon_vma)) { struct mm_struct *mm = vma->vm_mm; - struct anon_vma *allocated, *locked; + struct anon_vma *allocated; anon_vma = find_mergeable_anon_vma(vma); if (anon_vma) { allocated = NULL; - locked = anon_vma; - spin_lock(&locked->lock); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) return -ENOMEM; allocated = anon_vma; - locked = NULL; } + /* + * The lock is required even for new anon_vmas, because as + * soon as we store vma->anon_vma = anon_vma, then the + * anon_vma becomes visible via the vma. This means another + * CPU can find the anon_vma, then store it into the struct + * page with page_add_anon_rmap. At this point, anon_vma can + * be loaded from the page with page_lock_anon_vma. + * + * So long as the anon_vma->lock is taken before looking at + * any fields in the anon_vma, the lock should take care of + * races and memory ordering issues WRT anon_vma fields. + */ + spin_lock(&anon_vma->lock); + /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { - vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); + vma->anon_vma = anon_vma; allocated = NULL; } spin_unlock(&mm->page_table_lock); + spin_lock(&anon_vma->lock); - if (locked) - spin_unlock(&locked->lock); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -171,6 +181,10 @@ static struct anon_vma *page_lock_anon_v anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); spin_lock(&anon_vma->lock); + + if (anon_mapping != (unsigned long)page->mapping) + goto out; + return anon_vma; out: rcu_read_unlock(); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org