From: Linus Torvalds
Date: Sat, 18 Dec 2021 14:53:38 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)
To: Nadav Amit
Cc: Jason Gunthorpe, David Hildenbrand, Linux Kernel Mailing List,
 Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt,
 John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi,
 "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn,
 Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
 Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
 Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK",
 "open list:DOCUMENTATION"

On Sat, Dec 18, 2021 at 1:49 PM Nadav Amit wrote:
>
> Yes, I guess that you pin the pages early for RDMA registration, which
> is also something you may do for IO-uring buffers. This would render
> userfaultfd unusable.

I think this is all on userfaultfd.

That code literally stole two of the bits from the page table layout -
bits that we could have used for better things.

And guess what? Because it required those two bits in the page tables,
and because that's non-portable, it turns out that UFFD_WP can only be
enabled and only works on x86-64 in the first place.

So UFFD_WP is fundamentally non-portable. Don't use it.

Anyway, the good news is that I think that exactly because uffd_wp
stole two bits from the page table layout, it already has all the
knowledge it needs to handle this entirely on its own. It's just too
lazy to do so now.

In particular, it has that special UFFD_WP bit that basically says
"this page is actually writable, but I've made it read-only just to
get the fault for soft-dirty".

And the hint here is that if the page truly *was* writable, then COW
just shouldn't happen, and all that the page fault code should do is
set soft-dirty and return with the page set writable.

And if the page was *not* writable, then UFFD_WP wasn't actually
needed in the first place, but the uffd code just sets it blindly.

Notice? It _should_ be just an operation based purely on the page
table contents, never even looking at the page AT ALL. Not even the
page count, much less some mapcount thing.

Done right, that soft-dirty thing could work even with no page backing
at all, I think.

But as far as I know, we've actually never seen a workload that does
all this, so.. Does anybody even have a test-case?

Because I do think that UFFD_WP really should never really look at the
page, and this issue is actually independent of the "page_count() vs
page_mapcount()" discussion.

(Somewhat related aside: Looking up the page is actually one of the
more expensive operations of a page fault and a lot of other page
table manipulation functions - it's where most of the cache misses
happen. That's true on the page fault side, but it's also true for
things like copy_page_range() etc)

             Linus
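
The PTE-only handling described above can be sketched as a toy userspace
model. This is only an illustration under stated assumptions, not kernel
code: the TOY_PTE_* bit positions, the toy_fault_result enum and both
toy_* functions are hypothetical names made up for this sketch, and the
bit values do not match the real x86-64 page table layout. It only shows
that both the write-protect step and the fault fixup can be decided from
the PTE value alone, without looking at the page or any of its counts.

```c
#include <stdint.h>

/* Toy PTE bits: positions are invented for this sketch (assumption). */
#define TOY_PTE_WRITE      (UINT64_C(1) << 0)  /* hardware-writable */
#define TOY_PTE_UFFD_WP    (UINT64_C(1) << 1)  /* "was writable, made read-only" */
#define TOY_PTE_SOFT_DIRTY (UINT64_C(1) << 2)  /* written since last clear */

enum toy_fault_result {
    TOY_FAULT_RESOLVED,   /* fixed up from the PTE bits alone */
    TOY_FAULT_NEEDS_COW,  /* genuinely read-only: normal COW path */
    TOY_FAULT_SPURIOUS,   /* PTE was already writable */
};

/*
 * Write-protect step: only mark UFFD_WP if the PTE really was writable.
 * A PTE that is already read-only needs no marker at all.
 */
static void toy_uffd_wp_protect(uint64_t *pte)
{
    if (*pte & TOY_PTE_WRITE) {
        *pte &= ~TOY_PTE_WRITE;
        *pte |= TOY_PTE_UFFD_WP;
    }
}

/*
 * Write-fault step: decided purely from the PTE value.
 * No page lookup, no page count, no mapcount.
 */
static enum toy_fault_result toy_write_fault(uint64_t *pte)
{
    if (*pte & TOY_PTE_WRITE)
        return TOY_FAULT_SPURIOUS;

    if (*pte & TOY_PTE_UFFD_WP) {
        /* The page truly was writable: no COW, just restore and mark. */
        *pte &= ~TOY_PTE_UFFD_WP;
        *pte |= TOY_PTE_WRITE | TOY_PTE_SOFT_DIRTY;
        return TOY_FAULT_RESOLVED;
    }

    /* Never was writable: the marker was not involved, do regular COW. */
    return TOY_FAULT_NEEDS_COW;
}
```

In this model a writable PTE that goes through toy_uffd_wp_protect() and
then takes a write fault ends up writable again with the soft-dirty bit
set, while a read-only PTE is left untouched by both steps and falls
through to whatever the normal COW path would do.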