From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Piggin
Subject: Re: Can get_user_pages( ,write=1, force=1, ) result in a read-only pte and _count=2?
Date: Thu, 19 Jun 2008 13:31:31 +1000
References: <20080618164158.GC10062@sgi.com> <20080618203300.GA10123@sgi.com>
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200806191331.32056.nickpiggin@yahoo.com.au>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Hugh Dickins
Cc: Robin Holt, Ingo Molnar, Christoph Lameter, Jack Steiner, linux-mm@kvack.org
List-ID:

On Thursday 19 June 2008 07:46, Hugh Dickins wrote:
> contain COWs - I used to rail against it for that reason, but in the
> end did an audit and couldn't find any place where that violation of
> our assumptions actually mattered enough to get so excited.

Still, they're slightly troublesome, as our get_user_pages problems
demonstrate :)

>
> Hugh
>
> --- 2.6.26-rc6/mm/memory.c	2008-05-26 20:00:39.000000000 +0100
> +++ linux/mm/memory.c	2008-06-18 22:06:46.000000000 +0100
> @@ -1152,9 +1152,15 @@ int get_user_pages(struct task_struct *t
>  				 * do_wp_page has broken COW when necessary,
>  				 * even if maybe_mkwrite decided not to set
>  				 * pte_write. We can thus safely do subsequent
> -				 * page lookups as if they were reads.
> +				 * page lookups as if they were reads. But only
> +				 * do so when looping for pte_write is futile:
> +				 * in some cases userspace may also be wanting
> +				 * to write to the gotten user page, which a
> +				 * read fault here might prevent (a readonly
> +				 * page would get reCOWed by userspace write).
>  				 */
> -				if (ret & VM_FAULT_WRITE)
> +				if ((ret & VM_FAULT_WRITE) &&
> +				    !(vma->vm_flags & VM_WRITE))
>  					foll_flags &= ~FOLL_WRITE;
>  
>  				cond_resched();

Hmm, doesn't this give the same problem for !VM_WRITE vmas? If you called
get_user_pages again, isn't that going to cause another COW on the
already-COWed page that we're hoping to write into?
(not sure about mprotect either, could that be used to make the vma
writeable afterwards and then write to it?)

I would rather (if my reading of the code is correct) make the trylock
page into a full lock_page. The indeterminism of the trylock has always
bugged me anyway... Shouldn't that cause a swap page not to get reCOWed
if we have the only mapping to it?

If the lock_page cost bothers you, we could do a quick unlocked check on
page_mapcount > 1 before taking the lock (which would also avoid the extra
atomic ops and barriers in many cases where the page really is shared).