Message-ID: <89ae59de-5b74-22b6-0076-c1a9a6fa62e7@redhat.com>
Date: Wed, 9 Mar 2022 08:37:26 +0100
From: David Hildenbrand
To: John Hubbard, Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Hugh Dickins,
    Linus Torvalds, David Rientjes, Shakeel Butt, Mike Kravetz,
    Mike Rapoport, Yang Shi, "Kirill A . Shutemov", Matthew Wilcox,
    Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit, Rik van Riel,
    Roman Gushchin, Andrea Arcangeli, Peter Xu, Donald Dutile,
    Christoph Hellwig, Oleg Nesterov, Jan Kara, Liang Zhang, Pedro Gomes,
    Oded Gabbay, linux-mm@kvack.org
References: <20220224122614.94921-1-david@redhat.com>
    <20220224122614.94921-13-david@redhat.com>
    <20220302165559.GU219866@nvidia.com>
    <0a159b65-cb80-c8eb-7ad1-24b83813531f@nvidia.com>
    <461e4d2b-9aa2-50d4-2c78-3f7fb3f6a2f6@redhat.com>
In-Reply-To: <461e4d2b-9aa2-50d4-2c78-3f7fb3f6a2f6@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH RFC 12/13] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page

On 03.03.22 09:06, David Hildenbrand wrote:
> On 03.03.22 02:47, John Hubbard wrote:
>> On 3/2/22 12:38, David Hildenbrand wrote:
>> ...
>>> BUT, once we actually write to the private mapping via the page table,
>>> the GUP pin would go out of sync with the now-anonymous page mapped into
>>> the page table. However, I'm having a hard time answering what's
>>> actually expected?
>>>
>>> It's really hard to tell what the user wants with MAP_PRIVATE file
>>> mappings and stumbles over a !anon page (no modifications so far):
>>>
>>> (a) I want a R/O pin to observe file modifications.
>>> (b) I want the R/O pin to *not* observe file modifications but observe
>>>     my (eventual? if any) private modifications,
>>>
>>
>> On this aspect, I think it is easier than trying to discern user
>> intentions. Because it is less a question of what the user wants, and
>> more a question of how mmap(2) is specified. And the man page clearly
>> indicates that the user has no right to expect to see file
>> modifications. Here's the excerpt:
>>
>> "MAP_PRIVATE
>>
>> Create a private copy-on-write mapping. Updates to the mapping are not
>> visible to other processes mapping the same file, and are not carried
>> through to the underlying file. It is unspecified whether changes made
>> to the file after the mmap() call are visible in the mapped region.
>> "
>>
>>> Of course, if we already wrote to that page and now have an anon page,
>>> it's easy: we are already no longer following file changes.
>>
>> Yes, and in fact, I've always thought that the way this was written
>> means that it should be treated as a snapshot of the file contents,
>> and no longer reliably connected in either direction to the page(s).
>
> Thanks John, that's extremely helpful. I forgot about these MAP_PRIVATE
> mmap() details -- they help a lot to clarify which semantics to provide.
>
> So what we could do is:
>
> a) Extend FAULT_FLAG_UNSHARE to also unshare an !anon page in
>    a MAP_PRIVATE mapping, replacing it with an (exclusive) anon page.
>    R/O PTE permissions are maintained, just like unsharing in the
>    context of this series.
>
> b) Similarly trigger FAULT_FLAG_UNSHARE from GUP when trying to take a
>    R/O pin (FOLL_PIN) on a R/O-mapped !anon page in a MAP_PRIVATE
>    mapping.
>
> c) Make R/O pins consistently use "FOLL_PIN" instead, getting rid of
>    FOLL_FORCE|FOLL_WRITE.
>
>
> Of course, we can't detect MAP_PRIVATE vs. MAP_SHARED in GUP-fast (no
> VMA), so we'd always have to fall back in GUP-fast in case we intend to
> FOLL_PIN a R/O-mapped !anon page. That would imply that essentially any
> R/O pins (FOLL_PIN) would have to fall back to ordinary GUP. BUT, I mean
> we require FOLL_FORCE|FOLL_WRITE right now, which is not any different,
> so ...
>
> One optimization would be to trigger b) only for FOLL_LONGTERM. For
> !FOLL_LONGTERM there are "in theory" absolutely no guarantees which data
> will be observed if we modify concurrently to e.g., O_DIRECT IMHO. But
> that would require some more thought.
>
> Of course, that's all material for another journey, although it should
> be mostly straightforward.
>

Just a slight clarification after stumbling over shared zeropage code in
follow_page_pte(): we do seem to support pinning the shared zeropage, at
least on the GUP-slow path. While I haven't played with it, I assume we'd
have to implement and trigger unsharing if we wanted to take a R/O pin on
the shared zeropage.

Of course, similar to file-backed MAP_PRIVATE handling, this is out of
the scope of this series ("This change implies that whenever user space
wrote to a private mapping (IOW, we have an anonymous page mapped), that
GUP pins will always remain consistent: reliable R/O GUP pins of
anonymous pages.").
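
For reference, the part of the mmap(2) excerpt quoted above that *is*
specified (stores through a MAP_PRIVATE mapping are not carried through
to the file) can be observed from user space with a minimal sketch like
the one below. It only illustrates the user-visible semantics; it does
not exercise GUP, FOLL_PIN or FAULT_FLAG_UNSHARE, which are kernel
internals. The function name and the temporary file path are made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch series.
 *
 * Returns 1 if a store through a MAP_PRIVATE file mapping stayed private
 * (the file still holds its original bytes while the mapping shows the
 * new ones), 0 if that is not the case, and -1 on a setup error.
 */
int private_write_stays_private(void)
{
    char path[] = "/tmp/map_private_demo_XXXXXX"; /* assumed scratch location */
    char from_file[4];
    char *map;
    long page_size = sysconf(_SC_PAGESIZE);
    int fd, ret = -1;

    if (page_size <= 0)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open fd keeps the file alive until close() */

    if (write(fd, "file", 4) != 4)
        goto out_close;

    map = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    /*
     * This store takes a copy-on-write fault: the mapping now refers to
     * a private copy of the page instead of the file's page.
     */
    memcpy(map, "priv", 4);

    /* Read the file through the fd, bypassing the private mapping. */
    if (pread(fd, from_file, 4, 0) == 4)
        ret = memcmp(from_file, "file", 4) == 0 &&
              memcmp(map, "priv", 4) == 0;

    munmap(map, (size_t)page_size);
out_close:
    close(fd);
    return ret;
}
```

Calling private_write_stays_private() from a small main() and checking
for a return value of 1 is enough to see the behaviour. The case that
mmap(2) leaves unspecified (file changes made after mmap() but before
any private write) is deliberately not tested here.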
-- 
Thanks,

David / dhildenb