From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nick Piggin Subject: Re: Can get_user_pages( ,write=1, force=1, ) result in a read-only pte and _count=2? Date: Thu, 19 Jun 2008 13:34:32 +1000 References: <20080618164158.GC10062@sgi.com> <200806191331.32056.nickpiggin@yahoo.com.au> In-Reply-To: <200806191331.32056.nickpiggin@yahoo.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200806191334.32418.nickpiggin@yahoo.com.au> Sender: owner-linux-mm@kvack.org Return-Path: To: Hugh Dickins Cc: Robin Holt , Ingo Molnar , Christoph Lameter , Jack Steiner , linux-mm@kvack.org List-ID: On Thursday 19 June 2008 13:31, Nick Piggin wrote: > On Thursday 19 June 2008 07:46, Hugh Dickins wrote: > > contain COWs - I used to rail against it for that reason, but in the > > end did an audit and couldn't find any place where that violation of > > our assumptions actually mattered enough to get so excited. > > Still, they're slightly troublesome, as our get_user_pages problems > demonstrate :) > > > Hugh > > > > --- 2.6.26-rc6/mm/memory.c 2008-05-26 20:00:39.000000000 +0100 > > +++ linux/mm/memory.c 2008-06-18 22:06:46.000000000 +0100 > > @@ -1152,9 +1152,15 @@ int get_user_pages(struct task_struct *t > > * do_wp_page has broken COW when necessary, > > * even if maybe_mkwrite decided not to set > > * pte_write. We can thus safely do subsequent > > - * page lookups as if they were reads. > > + * page lookups as if they were reads. But only > > + * do so when looping for pte_write is futile: > > + * in some cases userspace may also be wanting > > + * to write to the gotten user page, which a > > + * read fault here might prevent (a readonly > > + * page would get reCOWed by userspace write). > > */ > > - if (ret & VM_FAULT_WRITE) > > + if ((ret & VM_FAULT_WRITE) && > > + !(vma->vm_flags & VM_WRITE)) > > foll_flags &= ~FOLL_WRITE; > > > > cond_resched(); > > Hmm, doesn't this give the same problem for !VM_WRITE vmas? If you > called get_user_pages again, isn't that going to cause another COW > on the already-COWed page that we're hoping to write into? (not sure > about mprotect either, could that be used to make the vma writeable > afterwards and then write to it?) > > I would rather (if my reading of the code is correct) make the > trylock page into a full lock_page. The indeterminism of the trylock > has always bugged me anyway... Shouldn't that cause a swap page not > to get reCOWed if we have the only mapping to it? Although, I have to qualify that. I do like this patch if nothing else than because it avoids this crazy corner case completely for the common force=0 case. So I would like to see this patch go in (hopefully it doesn't just push the obscure bugs further under the radar!) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org