From: Linus Torvalds
Date: Fri, 17 Dec 2021 13:36:41 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: David Hildenbrand
Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
    David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
    Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A. Shutemov",
    Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko,
    Nadav Amit, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
    Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov,
    Jan Kara, Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK",
    "open list:DOCUMENTATION"
References: <20211217113049.23850-1-david@redhat.com>
    <20211217113049.23850-7-david@redhat.com>
    <9c3ba92e-9e36-75a9-9572-a08694048c1d@redhat.com>

On Fri, Dec 17, 2021 at 12:55 PM David Hildenbrand wrote:
>
> If we have a shared anonymous page we cannot have GUP references, not
> even R/O ones. Because GUP would have unshared and copied the page,
> resulting in a R/O mapped anonymous page.

Doing a GUP on an actual shared page is wrong to begin with.

You even know that, you try to use "page_mapcount() > 1" to disallow
it.

My point is that it's wrong regardless, and that "mapcount" is
dubious, and that COW cannot - and must not - use mapcount, and that I
think your shared case should strive to avoid it for the exact same
reason.

So, what I think should happen is:

 (a) GUP makes sure that it only ever looks up pages that can be
     shared with this VM. This may involve breaking COW early with any
     past fork().

 (b) it marks such pages so that any future work will not cause them
     to COW either

Note that (a) is not necessarily "always COW and have to allocate and
copy new page". In particular, if the page is already writable, you
know you already have exclusive access to it and don't need to COW.

And if it isn't writable, then the other common case is "the cow has
only one user, and it's us" - that's the "refcount == 1" case.

And (b) is what we do with that page_maybe_dma_pinned() logic for
fork(), but also for things like swap cache creation (eg see commit
feb889fb40fa: "mm: don't put pinned pages into the swap cache").

Note that this code all already exists, and already works - even
without getting the (very expensive) mmap_sem. So it works with
fast-GUP and it can race with concurrent forking by another thread,
which is why we also have that seqcount thing.

As far as I can tell, your "mapcount" logic fundamentally requires
mmap_sem for the fork() race avoidance, for example.

So this is why I don't like the mapcount games - I think they are very
fragile, and not at all as logical as the two simple rules a/b above.

I believe you can make mapcount games _work_ - we used to have
something like that. It was incredibly fragile, and it had its own set
of bugs, but with enough care it's doable.

But my argument really is that I think it's the wrong approach, and
that we should simply strive to follow the two simple conceptual rules
above.

Linus
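
For readers following the thread, rules (a) and (b) above can be
sketched as a toy userspace model. This is only an illustration under
stated assumptions, not kernel code: struct toy_page, the toy_gup_*()
helpers and the toy_fork_must_copy() check are hypothetical names made
up for this sketch, and the "pinned" marker is a plain flag here,
whereas the kernel's page_maybe_dma_pinned() derives its answer
differently. Locking, fast-GUP and the seqcount are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page. Hypothetical; not the kernel's struct page. */
struct toy_page {
    int refcount;   /* every reference to the page, ours included */
    bool writable;  /* already mapped writable in this VM */
    bool pinned;    /* rule (b) marker; a plain flag in this model */
};

enum toy_gup_action {
    TOY_GUP_USE_AS_IS,  /* writable: we already have exclusive access */
    TOY_GUP_REUSE,      /* read-only, but we are the only user */
    TOY_GUP_COPY,       /* possibly shared: break COW before the lookup */
};

/* Rule (a): decide what must happen before GUP may hand out the page. */
static enum toy_gup_action toy_gup_prepare(const struct toy_page *page)
{
    assert(page->refcount >= 1);  /* the caller's own mapping holds one */

    if (page->writable)
        return TOY_GUP_USE_AS_IS;
    if (page->refcount == 1)
        return TOY_GUP_REUSE;     /* the "refcount == 1" case */
    return TOY_GUP_COPY;
}

/*
 * Rule (b): mark the page that toy_gup_prepare() settled on (the fresh
 * copy in the TOY_GUP_COPY case) so later operations leave it alone.
 */
static void toy_gup_pin(struct toy_page *page)
{
    page->pinned = true;
    page->refcount++;             /* the GUP reference itself */
}

/* A later fork() in this model copies a marked page instead of sharing it. */
static bool toy_fork_must_copy(const struct toy_page *page)
{
    return page->pinned;
}
```

Note that nothing in this model consults a mapcount: the decision in
toy_gup_prepare() depends only on the writable bit and the reference
count, and the fork-time check depends only on the marker set by rule
(b).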