From: Linus Torvalds
Date: Sat, 18 Dec 2021 16:35:45 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: Nadav Amit
Cc: Jason Gunthorpe, David Hildenbrand, Linux Kernel Mailing List,
    Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt,
    John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi,
    "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn,
    Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
    Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
    Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK",
    "open list:DOCUMENTATION"

On Sat, Dec 18, 2021 at 4:19 PM Nadav Amit wrote:
>
> I have always felt that the PTE software-bits limit is very artificial.
> We can just allocate two adjacent pages when needed, one for PTEs and
> one for extra software bits. A software bit in the PTE can indicate
> “extra software bits” are relevant (to save cache-misses), and a bit
> in the PTEs' page-struct indicate whether there is adjacent “extra
> software bits” page.

Hmm. That doesn't sound very bad, no. And it would be nice to have
more software bits (and have them portably).

> I don’t think that I am following. The write-protection of UFFD means
> that the userspace wants to intervene before anything else (including
> COW).

The point I was making (badly) is that UFFD_WP is only needed for
the case where the pte isn't already non-writable for other reasons.

> UFFD_WP indications are recorded per PTE (i.e., not VMA).

Changing those bits is basically a bastardized 'mprotect()', and
does already require the vma to be marked VM_UFFD_WP.

And the way you set (or clear) the bits is with a range operation. It
really could have been done with mprotect(), and with actual explicit
vma bits.

The fact that it now uses the page table bit is rather random. I think
it would actually be cleaner to make that userfaultfd_writeprotect
truly *be* a vma range.

Right now it's kind of "half this, half that".

Of course, it's possible that because of this situation, some users do
a lot of fine-grained VM_UFFD_WP setting, and they kind of expect to
not have issues with lots of vma fragments. So practical concerns may
have set the implementation in stone.

(I have only ever seen the kernel side of uffd, not the actual user
side, so I'm not sure about the use patterns).

That said, your suggestion of a shadow sw page table bit thing would
also work. And it would solve some problems we have in core areas
(notably "page_special()" which right now has that
ARCH_HAS_PTE_SPECIAL thing).

It would make it really easy to have that "this page table entry is
pinned" flag too.

                Linus
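
For illustration only, the "shadow sw page table bit" idea discussed
above could be modelled in plain userspace C roughly as follows. This
is a toy sketch, not kernel code: struct pt_page, PTE_SW_EXTRA,
SHADOW_PINNED and the pt_*() helpers are all hypothetical names made
up for this example, the bit positions are arbitrary, and the
has_shadow field merely stands in for the bit in the PTE page's
struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_PTE  512           /* entries per table in this toy model */
#define PTE_SW_EXTRA  (1ULL << 9)   /* hypothetical in-PTE hint: shadow entry is relevant */
#define SHADOW_PINNED (1ULL << 0)   /* hypothetical shadow bit: entry is pinned */

/* One PTE "page", optionally followed by an adjacent shadow "page". */
struct pt_page {
	uint64_t *pte;     /* PTRS_PER_PTE entries, or twice that with a shadow */
	bool has_shadow;   /* stands in for the bit in the PTE page's struct page */
};

/* Allocate the PTE page, plus the adjacent shadow page when asked to. */
static int pt_page_init(struct pt_page *pt, bool with_shadow)
{
	size_t n = with_shadow ? 2 * PTRS_PER_PTE : PTRS_PER_PTE;

	pt->pte = calloc(n, sizeof(*pt->pte));
	if (!pt->pte)
		return -1;
	pt->has_shadow = with_shadow;
	return 0;
}

static void pt_page_free(struct pt_page *pt)
{
	free(pt->pte);
	pt->pte = NULL;
	pt->has_shadow = false;
}

/* Set the pinned flag in the shadow entry and raise the in-PTE hint. */
static bool pt_set_pinned(struct pt_page *pt, unsigned int idx)
{
	assert(pt != NULL);
	if (!pt->has_shadow || idx >= PTRS_PER_PTE)
		return false;
	pt->pte[PTRS_PER_PTE + idx] |= SHADOW_PINNED;
	pt->pte[idx] |= PTE_SW_EXTRA;
	return true;
}

/* Query the pinned flag; the shadow page is read only if the hint is set. */
static bool pt_is_pinned(const struct pt_page *pt, unsigned int idx)
{
	assert(pt != NULL);
	if (idx >= PTRS_PER_PTE)
		return false;
	if (!(pt->pte[idx] & PTE_SW_EXTRA))
		return false;   /* fast path: no extra cache miss */
	return pt->has_shadow &&
	       (pt->pte[PTRS_PER_PTE + idx] & SHADOW_PINNED);
}
```

The in-PTE hint bit is what keeps the common case cheap: a lookup
only touches the second page when the hint says there is something
there, which is the cache-miss saving the quoted proposal mentions.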