From: Linus Torvalds
Date: Sat, 18 Dec 2021 11:52:41 -0800
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
To: David Hildenbrand
Cc: Linux Kernel Mailing List, Andrew Morton, Hugh Dickins,
 David Rientjes, Shakeel Butt, John Hubbard, Jason Gunthorpe,
 Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A. Shutemov",
 Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
 Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
 Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara, Linux-MM,
 "open list:KERNEL SELFTEST FRAMEWORK", "open list:DOCUMENTATION"
References: <20211217113049.23850-1-david@redhat.com>
 <20211217113049.23850-7-david@redhat.com>
 <9c3ba92e-9e36-75a9-9572-a08694048c1d@redhat.com>
 <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com>
 <40e7e0ab-0828-b2e7-339f-35f68a228b3d@redhat.com>

On Sat, Dec 18, 2021 at 11:21 AM Linus Torvalds wrote:
>
> To recap:
>  (1) is important, and page_count() is the only thing that guarantees
> "you get full access to a page only when it's *obviously* exclusively
> yours".
>  (2) is NOT important, but could be a performance issue, but we have
> real data from the past year that it isn't.
>  (3) is important, and has a really spectacularly simple conceptual
> fix with quite simple code too.
>
> In contrast, with the "mapcount" games you can't even explain why they
> should work, and the patches I see are actively buggy because
> everything is so subtle.

So to challenge you, please explain exactly how mapcount works to
solve (1) and (3), and how it incidentally guarantees that (2) doesn't
happen.

And that really involves explaining the actual code too. I can explain
the high-level concepts in literally a couple of sentences.

For (1), "the page_count()==1 guarantees you are the only owner, so a
COW event can re-use the page" really explains it. And the code is
pretty simple too.
There's nothing subtle about "goto copy" when
pagecount is not 1. And even the locking is simple: "we hold the page
table lock, we found a page, it has only one ref to it, we own it".

Our VM is *incredibly* complicated. There really are serious
advantages to having simple rules in place.

And for (2), the simple rule is "yeah, we can cause spurious cow
events". That's not only simple to explain, it's simple to code for.
Suddenly you don't need to worry. "Copying the page is always safe".
That's a really really powerful statement.

Now, admittedly (3) is the one that ends up being more complicated,
but the *concept* sure is simple. "If you don't want to COW this page,
then don't mark it for COW".

The *code* for (3) is admittedly a bit more complicated. The "don't
mark it for COW" is simple to say, but we do have that fairly odd
locking thing with fork() doing a seqcount_write_begin/end, and then
GUP does the read-seqcount thing with retry. So it's a bit unusual,
and I don't think we have that particular pattern anywhere else, but
it's one well-defined lock and while unusual it's not *complicated* as
far as kernel locking rules go. It's unusual and perhaps not trivial,
but in the end those seqcount code sequences are maybe 10 lines total,
and they don't interact with anything else.

And yes, the "don't mark it for COW" means that write-protecting
something is special, mainly because we sadly do not have extra bits
in the page tables. It would be *really* easy if we could just hide
this "don't COW this page" in the page table. Truly trivial. We don't,
because of portability across different architectures ;(

So I'll freely give you that my (3) is somewhat painful, but it's
painful with a really simple concept.

And the places that get (3) wrong are generally places that nobody has
been able to care about.
I didn't realize the problem with creating a swap
page after the fact for a while, so that commit feb889fb40fa ("mm:
don't put pinned pages into the swap cache") came later, but it's
literally a very simple two-liner.

The commit message for commit feb889fb40fa may be worth reading. It
very much explains the spirit of the thing, and is much longer than
the trivial patch itself.

Simple and clear concepts matter. Code gets complicated even then, but
complex code with complex concepts is a bad combination.

              Linus
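
For illustration only, rules (1) to (3) above can be modeled in a few
lines of user-space C. This is a toy sketch, not kernel code: every
name in it (toy_page, toy_seqcount, toy_try_pin and so on) is
hypothetical, the "page" is just a reference count, and the reader
side is simplified so that it reports a retry instead of spinning
while a writer is active. The write side stands in for fork()
write-protecting pages, the read side for GUP taking a pin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page: only the reference count matters here. */
struct toy_page {
    int refcount;
};

/*
 * Rule (1): a COW fault may reuse the page only when the faulting
 * mapping holds the sole reference.
 * Rule (2): in every other case the caller copies, which is always safe
 * even if the copy turns out to have been unnecessary.
 */
static bool toy_cow_can_reuse(const struct toy_page *page)
{
    assert(page->refcount > 0); /* a mapped page has at least one ref */
    return page->refcount == 1;
}

/* Toy sequence counter: an odd value means a writer is in progress. */
struct toy_seqcount {
    atomic_uint seq;
};

/* Writer side (the fork() role): bracket the write-protect pass. */
static void toy_write_begin(struct toy_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1); /* even -> odd */
}

static void toy_write_end(struct toy_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1); /* odd -> even */
}

/* Reader side (the GUP role): sample the counter, then validate it. */
static unsigned toy_read_begin(struct toy_seqcount *s)
{
    return atomic_load(&s->seq);
}

static bool toy_read_retry(struct toy_seqcount *s, unsigned start)
{
    /* Retry if a writer was active at the start or ran since then. */
    return (start & 1u) || atomic_load(&s->seq) != start;
}

/*
 * Rule (3): take the pin, then back it out if a write-protect pass
 * overlapped, so a pinned page is never left marked for COW.
 */
static bool toy_try_pin(struct toy_seqcount *s, struct toy_page *page)
{
    unsigned start = toy_read_begin(s);

    page->refcount++;
    if (toy_read_retry(s, start)) {
        page->refcount--; /* undo; the caller retries or falls back */
        return false;
    }
    return true;
}
```

Note how a successful pin raises the count above 1, so a later COW
fault in this model takes the copy path instead of reusing the page.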