From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 1 Aug 2005 22:06:15 +0100 (BST) From: Hugh Dickins Subject: Re: [patch 2.6.13-rc4] fix get_user_pages bug In-Reply-To: Message-ID: References: <20050801032258.A465C180EC0@magilla.sf.frob.com> <42EDDB82.1040900@yahoo.com.au> <20050801091956.GA3950@elte.hu> <42EDEAFE.1090600@yahoo.com.au> <20050801101547.GA5016@elte.hu> <42EE0021.3010208@yahoo.com.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Linus Torvalds Cc: Nick Piggin , Ingo Molnar , Robin Holt , Andrew Morton , Roland McGrath , linux-mm@kvack.org, linux-kernel , Martin Schwidefsky List-ID: On Mon, 1 Aug 2005, Linus Torvalds wrote: > On Mon, 1 Aug 2005, Hugh Dickins wrote: > > > > > Aside, that brings up an interesting question - why should readonly > > > mappings of writeable files (with VM_MAYWRITE set) disallow ptrace > > > write access while readonly mappings of readonly files not? Or am I > > > horribly confused? > > > > Either you or I. You'll have to spell that out to me in more detail, > > I don't see it that way. > > We have always just done a COW if it's read-only - even if it's shared. > > The point being that if a process mapped did a read-only mapping, and a > tracer wants to modify memory, the tracer is always allowed to do so, but > it's _not_ going to write anything back to the filesystem. Writing > something back to an executable just because the user happened to mmap it > with MAP_SHARED (but read-only) _and_ the user had the right to write to > that fd is _not_ ok. I'll need to think that through, but not right now. It's a surprise to me, and it's likely to surprise the current kernel too. I'd prefer to say that if the executable was mapped shared from a writable fd, then the tracer will write back to it; but you're clearly against that. We may need to split the vma to handle your semantics correctly. > Using the dirty flag for a "page is _really_ writable" is admittedly kind > of hacky, but it does have the advantage of working even when the -real- > write bit isn't set due to "maybe_mkwrite()". If it forces the s390 people > to add some more hacks for their strange VM, so be it.. I'll concentrate on this for now. s390 used to have the same semantics as everyone else, they made a conscious choice to diverge, and keep dirty bit in separate array indexed by struct page, and page_test_and_clear_dirty macro they use instead of clearing pte_dirty. It works all right, and I don't think it will prevent us communicating between maybe_mkwrite and get_user_pages by a pte flag that would be the usual pte_dirty on every other architecture. Just need to be careful to handle the right one right. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org