From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ve0-f179.google.com (mail-ve0-f179.google.com [209.85.128.179]) by kanga.kvack.org (Postfix) with ESMTP id 8F63C6B0031 for ; Fri, 13 Jun 2014 13:23:45 -0400 (EDT) Received: by mail-ve0-f179.google.com with SMTP id sa20so1777439veb.24 for ; Fri, 13 Jun 2014 10:23:45 -0700 (PDT) Received: from mail-ve0-f180.google.com (mail-ve0-f180.google.com [209.85.128.180]) by mx.google.com with ESMTPS id x4si1611060ved.52.2014.06.13.10.23.44 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 13 Jun 2014 10:23:44 -0700 (PDT) Received: by mail-ve0-f180.google.com with SMTP id jw12so3569444veb.25 for ; Fri, 13 Jun 2014 10:23:44 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <1402655819-14325-1-git-send-email-dh.herrmann@gmail.com> <1402655819-14325-8-git-send-email-dh.herrmann@gmail.com> From: Andy Lutomirski Date: Fri, 13 Jun 2014 10:23:23 -0700 Message-ID: Subject: Re: [RFC v3 7/7] shm: isolate pinned pages when sealing files Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: David Herrmann Cc: "linux-kernel@vger.kernel.org" , Michael Kerrisk , Ryan Lortie , Linus Torvalds , Andrew Morton , "linux-mm@kvack.org" , Linux FS Devel , Linux API , Greg Kroah-Hartman , John Stultz , Lennart Poettering , Daniel Mack , Kay Sievers , Hugh Dickins , Tony Battersby On Fri, Jun 13, 2014 at 8:27 AM, David Herrmann wrote: > Hi > > On Fri, Jun 13, 2014 at 5:06 PM, Andy Lutomirski wrote: >> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote: >>> When setting SEAL_WRITE, we must make sure nobody has a writable reference >>> to the pages (via GUP or similar). We currently check references and wait >>> some time for them to be dropped. This, however, might fail for several >>> reasons, including: >>> - the page is pinned for longer than we wait >>> - while we wait, someone takes an already pinned page for read-access >>> >>> Therefore, this patch introduces page-isolation. When sealing a file with >>> SEAL_WRITE, we copy all pages that have an elevated ref-count. The newpage >>> is put in place atomically, the old page is detached and left alone. It >>> will get reclaimed once the last external user dropped it. >>> >>> Signed-off-by: David Herrmann >> >> Won't this have unexpected effects? >> >> Thread 1: start read into mapping backed by fd >> >> Thread 2: SEAL_WRITE >> >> Thread 1: read finishes. now the page doesn't match the sealed page > > Just to be clear: you're talking about read() calls that write into > the memfd? (like my FUSE example does) Your language might be > ambiguous to others as "read into" actually implies a write. > > No, this does not have unexpected effects. But yes, your conclusion is > right. To be clear, this behavior would be part of the API. Any > asynchronous write might be cut off by SEAL_WRITE _iff_ you unmap your > buffer before the write finishes. But you actually have to extend your > example: > > Thread 1: p = mmap(memfd, SIZE); > Thread 1: h = async_read(some_fd, p, SIZE); > Thread 1: munmap(p, SIZE); > Thread 2: SEAL_WRITE > Thread 1: async_wait(h); > > If you don't do the unmap(), then SEAL_WRITE will fail due to an > elevated i_mmap_writable. I think this is fine. In fact, I remember > reading that async-IO is not required to resolve user-space addresses > at the time of the syscall, but might delay it to the time of the > actual write. But you're right, it would be misleading that the AIO > operation returns success. This would be part of the memfd-API, > though. And if you mess with your address space while running an > async-IO operation on it, you're screwed anyway. Ok, I missed the part where you had to munmap to trigger the oddity. That seems fine to me. > > Btw., your sealing use-case is really odd. No-one guarantees that the > SEAL_WRITE happens _after_ you schedule your async-read. In case you > have some synchronization there, you just have to move it after > waiting for your async-io to finish. > > Does that clear things up? I think so. --Andy -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org