From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D887C433DB for ; Sun, 10 Jan 2021 04:00:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E6AE92311E for ; Sun, 10 Jan 2021 04:00:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6AE92311E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1A43C6B0160; Sat, 9 Jan 2021 23:00:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 12CB36B0161; Sat, 9 Jan 2021 23:00:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F363C6B0162; Sat, 9 Jan 2021 23:00:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0140.hostedemail.com [216.40.44.140]) by kanga.kvack.org (Postfix) with ESMTP id D63BA6B0160 for ; Sat, 9 Jan 2021 23:00:25 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9B458282B for ; Sun, 10 Jan 2021 04:00:25 +0000 (UTC) X-FDA: 77688513210.03.road58_560f91127501 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id 71BFD28A4E9 for ; Sun, 10 Jan 2021 04:00:25 +0000 (UTC) X-HE-Tag: road58_560f91127501 X-Filterd-Recvd-Size: 8304 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Sun, 10 Jan 2021 04:00:24 +0000 (UTC) Received: by mail-ej1-f42.google.com with SMTP id d17so19912464ejy.9 for ; Sat, 09 Jan 2021 20:00:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=cV8ZqZdlJCrqMkTrujQl9EAYyHQiNVsrSGFMRnZRJxw=; b=eRAa1kQCQANkeqLwLL3/tJy2lwSmtWHajnrcMYSJ5nN7o46BCeUw9zUVaxBi9eifOH 4n3qby9mFGgnOJUzJDVcvsK7YndJvWNIqhupAk8zKFJEYSwhMiPIXqgPAdTwcsByyFdr RPMJOy9Z0j1BumyB3nXDXFy4/7VPYYJbf5R8w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cV8ZqZdlJCrqMkTrujQl9EAYyHQiNVsrSGFMRnZRJxw=; b=K9HTRpATrXc26iQsoPbDMqTQeLCEtv5KF8q+GD6UGtku54ms3emEiXCb4yvHgJ0AIQ igCfBvgq89WvoqwCN+a1AjUNSBboQ/MA+BSaQ5S+l+4vmAvcEiarvZkuCozJpTrghW5c sX7dTOO9WtNuKGRoqIXiv4ZrILBB/6yZjNC1ScLW53NI90AH72D2gDHbtE6YC63T4Meq jfBfJBTZjIJaeJFG5g40DDFESy25D5d/cAqocLsBJFuCsaHjhmqBf9F/PTgPceUOCfXN F/+fEUu67M/0aG5mWx/6rH50Msl3sTPceXnCPSPRfkGfOnjTDu4tD+u9OPl0uaIv7NV5 422Q== X-Gm-Message-State: AOAM5326JA4UAH5Wcy0EuwhLpMEm9MuZQOMfrnZNOyJ8bvRBhV36sUd3 0pE/+MReemps8PvdI3c+pOLy4PcEEOZmxQ== X-Google-Smtp-Source: ABdhPJzHZcMxwOvZiDmvuTQkpksg6S1k+iv+4bFhVRYXbgJUBJGnURELc2jt8f3/plNW7Yp+T6HN6g== X-Received: by 2002:a17:906:7c49:: with SMTP id g9mr7257757ejp.185.1610251223305; Sat, 09 Jan 2021 20:00:23 -0800 (PST) Received: from mail-ej1-f46.google.com (mail-ej1-f46.google.com. [209.85.218.46]) by smtp.gmail.com with ESMTPSA id b7sm5274875ejz.4.2021.01.09.20.00.22 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 09 Jan 2021 20:00:23 -0800 (PST) Received: by mail-ej1-f46.google.com with SMTP id t16so19838528ejf.13 for ; Sat, 09 Jan 2021 20:00:22 -0800 (PST) X-Received: by 2002:a05:6512:338f:: with SMTP id h15mr4358383lfg.40.1610250728515; Sat, 09 Jan 2021 19:52:08 -0800 (PST) MIME-Version: 1.0 References: <20210110004435.26382-1-aarcange@redhat.com> In-Reply-To: From: Linus Torvalds Date: Sat, 9 Jan 2021 19:51:52 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse To: Andrea Arcangeli Cc: Andrew Morton , Linux-MM , Linux Kernel Mailing List , Yu Zhao , Andy Lutomirski , Peter Xu , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , Minchan Kim , Will Deacon , Peter Zijlstra , Hugh Dickins , "Kirill A. Shutemov" , Matthew Wilcox , Oleg Nesterov , Jann Horn , Kees Cook , John Hubbard , Leon Romanovsky , Jason Gunthorpe , Jan Kara , Kirill Tkhai , Nadav Amit , Jens Axboe Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Jan 9, 2021 at 6:51 PM Andrea Arcangeli wrote: > > I just don't see the simplification coming from > 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking > page_mapcount above as an optimization, to me it looks much simpler to > check it in a single place, in do_wp_page, that in addition of > optimizing away the superfluous copy, would optimize away the above > complexity as well. Here's the difference: (a) in COW, "page_mapcount()" is pure and utter garbage, and has zero meaning. Why? Because MAPCOUNT DOES NOT MATTER FOR COW. COW is about "I'm about to write to this page, and that means I need an _exclusive_ page so that I don't write to a page that somebody else is using". Can you admit that fundamental fact? Notice how "page_mapcount()" has absolutely NOTHING to do with "exclusive page". There are lots of other ways the page can be used aside from mapcount. The page cache may have a reference to the page. Somebody that did a GUP may have a reference to the page. So what actually matters at COW time? The only thing that matters is "am I the exclusive owner". And guess what? We have a count of page references. It's "page_count()". That's *EXACTLY* the thing that says "are there maybe other references to this page". In other words, COW needs to use page_count(). It really is that easy. End of story. So, given that, why do I then do > + if (page_mapcount(page) != 1) > + return false; in my patch, when I just told you that "page_mapcount()" is irrelevant for COW? Guess what? The above isn't about COW. The above isn't about whether we need to do a copy in order to be able to write to the page without anybody else being affected by it. No, at fork time, and at this clear_refs time, the question is entirely different. The question is not "Do I have exclusive access to the page", but it is "Did I _already_ made sure that I have exclusive access to the page because I pinned it"? See how different the question is? Because *if* you have done a pinned COW for soem direct-IO read, and *if* that page is dirty, then you know it's mapped only in your address space. You're basically doing the _reverse_ of the COW test, and asking yourself "is this my own private pinned page"? And then it's actually perfectly sane to do a check that says "obviously, if somebody else has this page mapped, then that's not the case". See? For COW, "page_mapcount()" is pure and utter garbage, and entirely meaningless. How many places it's mapped in doesn't matter. You may have to COW even if it's only mapped in your address space (page cache, GUP, whatever). But for "did I already make this exclusive", then it's actually meaningful to say "is it mapped somewhere else". We know it has other users - that "page_may_be_pinned()" in fact *guarantees* that it has other users. But we're double-checking that the other users aren't other mappings. That said, I did just realize that that "page_mapcount()" check is actually pointless. Because we do have a simpler one. Instead of checking whether all those references that made us go "page_might_be_pinned()" aren't other mappings, the simple check for "pte_writable()" would already have told us that we had already done the COW. So you are actually right that the page_mapcount() test in my patch is not the best way to check for this. By the time we see "page_may_be_pinned()", we might as well just say "Oh, it's a private mapping and the pte is already writable, so we know we were the exclusive mapper, because COW and fork() already guarantee that". > And I won't comment if it's actually safe to skip random pages or > not. All I know is for mprotect and uffd-wp, definitely the above > approach wouldn't work. Why do you say that? You say ":definitely know", but I think you're full of it. The fact is, if you have a pinned page, why wouldn't we say "you can't turn it read-only"? It's pinned in the VM address space - and it's pinned writable. Simple and clear semantics. You can *remove* it, but you can't change the pinning. Linus