From: Linus Torvalds
Date: Tue, 21 Dec 2021 09:05:23 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: David Hildenbrand
Cc: Jason Gunthorpe, Nadav Amit, Linux Kernel Mailing List,
    Andrew Morton, Hugh Dickins, David Rientjes, Shakeel Butt,
    John Hubbard, Mike Kravetz, Mike Rapoport, Yang Shi,
    "Kirill A . Shutemov", Matthew Wilcox, Vlastimil Babka, Jann Horn,
    Michal Hocko, Rik van Riel, Roman Gushchin, Andrea Arcangeli,
    Peter Xu, Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
    Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK",
    "open list:DOCUMENTATION"

On Tue, Dec 21, 2021 at 12:58 AM David Hildenbrand wrote:
>
> On 21.12.21 02:03, Jason Gunthorpe wrote:
>
> > I'm having a hard time imagining how gup_fast can maintain any sort of
> > bit - it lacks all forms of locks so how can we do an atomic test and
> > set between two pieces of data?
>
> And exactly that is to be figured out.

So my preference would be to just always maintain the "exclusive to
this VM" bit in the 'struct page', because that makes things easier to
think about.

[ Of course - the bit could be reversed, and be a 'not exclusive to
this VM' bit, semantically the set-or-cleared issue doesn't matter.
Also, when I talk about some "exclusive to this VM" bit, I'm purely
talking about pages that are marked PageAnon(), so the bit may or may
not even exist for other page types ]

And then all GUP-fast would need to do is to refuse to look up a page
that isn't exclusive to that VM. We already have the situation that
GUP-fast can fail for non-writable pages etc, so it's just another
test.
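
[ Editorial illustration, not part of any patch: a minimal userspace
sketch of the "just another test" idea above. It is a toy model and not
kernel code. Every identifier in it (toy_page, toy_pte, toy_gup_fast) is
hypothetical, and the real lockless lookup has to deal with races that
this sketch ignores entirely. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for 'struct page'; both fields are assumptions. */
struct toy_page {
	bool anon;           /* stands in for PageAnon() */
	bool anon_exclusive; /* the proposed "exclusive to this VM" bit */
};

/* Toy stand-in for a page table entry. */
struct toy_pte {
	bool present;
	bool writable;
	struct toy_page *page;
};

/*
 * Returns the page on success, or NULL to tell the caller to fall back
 * to the slow path, which can take a fault and get an exclusive copy.
 */
static struct toy_page *toy_gup_fast(const struct toy_pte *pte, bool want_write)
{
	if (!pte->present)
		return NULL;
	assert(pte->page != NULL);   /* a present entry must map a page */

	if (want_write && !pte->writable)
		return NULL;         /* the kind of failure that exists today */

	if (pte->page->anon && !pte->page->anon_exclusive)
		return NULL;         /* the one additional test */

	return pte->page;
}
```

In this model a NULL return is not an error. It only means "use the
slow path", the same way a write lookup on a read-only entry does.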
> Note that I am trying to make also any kind of R/O pins on an anonymous
> page work as expected as well, to fix any kind of GUP after fork() and
> GUP before fork(). So taking a R/O pin on an !PageAnonExclusive() page
> similarly has to make sure that the page is exclusive -- even if it's
> mapped R/O (!).

I do think the existing "maybe_pinned()" logic is fine for that. The
"exclusive to this VM" bit can be used to *help* that decision -
because only an exclusive page can be pinned - but I don't think it
should _replace_ that logic.

There's a quite fundamental difference between

 (a) COW and GUP: these two operations _have_ to know that they get an
     exclusive page in order to re-use or look up the page respectively

 (b) the pre-cow logic in fork() or the "add this to the swap cache"
     logic in vmscan that decides whether a page can be turned into a
     COW page by adding a reference count to it (whether due to fork or
     swap cache doesn't matter - the end result is the same).

The difference is that in (a) the thing we *have* to get right is
whether a page is exclusively owned by that VM or not. We can COW too
much, but we can never share a page unless it's exclusive. That's true
whether it's pinned or not.

In (b), the "have to get right" is different. In (b), it's perfectly
ok to COW an exclusive page and turn it non-exclusive. But we must
never COW a pinned page.

So (a) and (b) are very different situations, and have different logic.

If we always maintain an exclusive bit for PageAnon() pages, then both
(a) and (b) can use that bit, but they'll use it very differently. In
(a) we'll refuse to look it up and will force a 'handle_mm_fault()' to
get an exclusive copy. And in (b), we just use it as a "we know only
exclusive pages can be pinned", so it's just another check for
page_needs_cow_for_dma(), the same way we currently check
"MMF_HAS_PINNED" to narrow down the whole "page count indicates this
may be a pinned page" question.
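
[ Editorial illustration: one way to picture how (a) and (b) would
consult the same bit for different purposes. This is a toy userspace
model under stated assumptions, not a proposed implementation. All
sketch_* names are made up, and 'maybe_pinned' merely stands in for the
existing page-count heuristic. ]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy anonymous page; both fields are assumptions for this sketch. */
struct sketch_page {
	bool exclusive;    /* the proposed "exclusive to this VM" bit */
	bool maybe_pinned; /* stand-in for the page-count based guess */
};

/* (a) COW fault or GUP: may the page be re-used or looked up as-is? */
static bool sketch_can_use_in_place(const struct sketch_page *p)
{
	/* false means: go through the fault path and get an exclusive copy */
	return p->exclusive;
}

/* (b) fork() or swap cache: must we copy now instead of sharing? */
static bool sketch_needs_copy_before_sharing(const struct sketch_page *p)
{
	if (!p->exclusive)
		return false;  /* only exclusive pages can be pinned */
	return p->maybe_pinned;
}

/* (b) continued: sharing an unpinned exclusive page clears the bit. */
static void sketch_share(struct sketch_page *p)
{
	assert(!sketch_needs_copy_before_sharing(p));
	p->exclusive = false;
}
```

Note the asymmetry in the model: in (a) a clear bit forces a copy, while
in (b) a clear bit is what lets the pin check be skipped.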
And the "page is exclusive" would actually be the *common* case for
almost all pages. Any time you've written to a page and you haven't
forked after the write (and it hasn't been turned into a swap page),
that page would be exclusive to that VM.

Doesn't this seem like really straightforward semantics to maintain
(and think about)?

I'd like the exclusive page bit to *not* be directly about "has this
page been pinned" exactly because we already have too many special
cases for GUP. It would be nicer to have a page bit that has very
clear semantics even in the absence of GUP.

             Linus