From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Hildenbrand <david@redhat.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
David Rientjes <rientjes@google.com>,
Shakeel Butt <shakeelb@google.com>,
John Hubbard <jhubbard@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Yang Shi <shy828301@gmail.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
Jan Kara <jack@suse.cz>, Linux-MM <linux-mm@kvack.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@vger.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
Date: Sat, 18 Dec 2021 11:52:41 -0800 [thread overview]
Message-ID: <CAHk-=wjevjeL44qafYd8=cJHZgNUOUuWVJ28vkS4U4v_Af-xaQ@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wg95CiyT45ZOxtnWQ7cdKmejXcOydEyJcTTNnp5-nd+xg@mail.gmail.com>
On Sat, Dec 18, 2021 at 11:21 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> To recap:
> (1) is important, and page_count() is the only thing that guarantees
> "you get full access to a page only when it's *obviously* exclusively
> yours".
> (2) is NOT important, but could be a performance issue, but we have
> real data from the past year that it isn't.
> (3) is important, and has a really spectacularly simple conceptual
> fix with quite simple code too.
>
> In contrast, with the "mapcount" games you can't even explain why they
> should work, and the patches I see are actively buggy because
> everything is so subtle.
So to challenge you, please explain exactly how mapcount works to
solve (1) and (3), and how it incidentally guarantees that (2) doesn't
happen.
And that really involves explaining the actual code too. I can explain
the high-level concepts in literally a couple of sentences.
For (1), "the page_count()==1 guarantees you are the only owner, so a
COW event can re-use the page" really explains it. And the code is
pretty simple too. There's nothing subtle about "goto copy" when
pagecount is not 1. And even the locking is simple: "we hold the page
table lock, we found a page, it has only one ref to it, we own it"
Our VM is *incredibly* complicated. There really are serious
advantages to having simple rules in place.
And for (2), the simple rule is "yeah, we can cause spurious cow
events". That's not only simple to explain, it's simple to code for.
Suddenly you don't need to worry. "Copying the page is always safe".
That's a really really powerful statement.
Now, admittedly (3) is the one that ends up being more complicated,
but the *concept* sure is simple. "If you don't want to COW this page,
then don't mark it for COW".
The *code* for (3) is admittedly a bit more complicated. The "don't
mark it for COW" is simple to say, but we do have that fairly odd
locking thing with fork() doing a seqcount_write_begin/end, and then
GIP does the read-seqcount thing with retry. So it's a bit unusual,
and I don't think we have that particular pattern anywhere else, but
it's one well-defined lock and while unusual it's not *complicated* as
far as kernel locking rules go. It's unusual and perhaps not trivial,
but in the end those seqcount code sequences are maybe 10 lines total,
and they don't interact with anything else.
And yes, the "don't mark it for COW" means that write-protecting
something is special, mainly because we sadly do not have extra bits
in the page tables. It would be *really* easy if we could just hide
this "don't COW this page" in the page table. Truly trivial. We don't,
because of portability across different architectures ;(
So I'll freely give you that my (3) is somewhat painful, but it's
painful with a really simple concept.
And the places that get (3) wrong are generally places that nobody has
been able to care about. I didn't realize the problem with creating a
swap page after the fact for a while, so that commit feb889fb40fa
("mm: don't put pinned pages into the swap cache") came later, but
it's literally a very simple two-liner.
The commit message for commit feb889fb40fa may be worth reading. It
very much explains the spirit of the thing, and is much longer than
the trivial patch itself.
Simple and clear concepts matter. Code gets complicated even then, but
complex code with complex concepts is a bad combination.
Linus
next prev parent reply other threads:[~2021-12-18 19:53 UTC|newest]
Thread overview: 127+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-17 11:30 [PATCH v1 00/11] mm: COW fixes part 1: fix the COW security issue for THP and hugetlb David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 01/11] seqlock: provide lockdep-free raw_seqcount_t variant David Hildenbrand
2021-12-17 17:02 ` Nadav Amit
2021-12-17 17:29 ` David Hildenbrand
2021-12-17 17:49 ` David Hildenbrand
2021-12-17 18:01 ` Nadav Amit
2021-12-17 21:28 ` Thomas Gleixner
2021-12-17 22:02 ` David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 02/11] mm: thp: consolidate mapcount logic on THP split David Hildenbrand
2021-12-17 19:06 ` Yang Shi
2021-12-18 14:24 ` Kirill A. Shutemov
2021-12-17 11:30 ` [PATCH v1 03/11] mm: simplify hugetlb and file-THP handling in __page_mapcount() David Hildenbrand
2021-12-17 17:16 ` Nadav Amit
2021-12-17 17:30 ` David Hildenbrand
2021-12-17 18:06 ` Mike Kravetz
2021-12-17 18:11 ` David Hildenbrand
2021-12-17 19:07 ` Yang Shi
2021-12-18 14:31 ` Kirill A. Shutemov
2021-12-17 11:30 ` [PATCH v1 04/11] mm: thp: simlify total_mapcount() David Hildenbrand
2021-12-17 19:12 ` Yang Shi
2021-12-18 14:35 ` Kirill A. Shutemov
2021-12-17 11:30 ` [PATCH v1 05/11] mm: thp: allow for reading the THP mapcount atomically via a raw_seqlock_t David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb) David Hildenbrand
2021-12-17 19:04 ` Linus Torvalds
2021-12-17 19:22 ` Linus Torvalds
2021-12-17 20:17 ` David Hildenbrand
2021-12-17 20:36 ` Linus Torvalds
2021-12-17 20:39 ` Linus Torvalds
2021-12-17 20:43 ` Linus Torvalds
2021-12-17 20:42 ` David Hildenbrand
2021-12-17 20:45 ` Linus Torvalds
2021-12-18 22:52 ` Kirill A. Shutemov
2021-12-18 23:05 ` Linus Torvalds
2021-12-17 20:47 ` Jason Gunthorpe
2021-12-17 20:56 ` Linus Torvalds
2021-12-17 21:17 ` David Hildenbrand
2021-12-17 21:04 ` David Hildenbrand
2021-12-18 0:50 ` Jason Gunthorpe
2021-12-17 21:15 ` Nadav Amit
2021-12-17 21:20 ` David Hildenbrand
2021-12-18 0:50 ` Jason Gunthorpe
2021-12-18 1:53 ` Linus Torvalds
2021-12-18 2:17 ` Linus Torvalds
2021-12-18 2:42 ` Linus Torvalds
2021-12-18 3:36 ` Linus Torvalds
2021-12-18 3:05 ` Jason Gunthorpe
2021-12-18 3:30 ` Nadav Amit
2021-12-18 3:38 ` Linus Torvalds
2021-12-18 18:42 ` Jason Gunthorpe
2021-12-18 21:48 ` Nadav Amit
2021-12-18 22:53 ` Linus Torvalds
2021-12-19 0:19 ` Nadav Amit
2021-12-19 0:35 ` Linus Torvalds
2021-12-19 6:02 ` Nadav Amit
2021-12-19 8:01 ` John Hubbard
2021-12-19 11:30 ` Matthew Wilcox
2021-12-19 17:27 ` Linus Torvalds
2021-12-19 17:44 ` David Hildenbrand
2021-12-19 17:44 ` Linus Torvalds
2021-12-19 17:59 ` David Hildenbrand
2021-12-19 21:12 ` Matthew Wilcox
2021-12-19 21:27 ` Linus Torvalds
2021-12-19 21:47 ` Matthew Wilcox
2021-12-19 21:53 ` Linus Torvalds
2021-12-19 22:02 ` Matthew Wilcox
2021-12-19 22:12 ` Linus Torvalds
2021-12-19 22:26 ` Matthew Wilcox
2021-12-20 18:37 ` Matthew Wilcox
2021-12-20 18:52 ` Matthew Wilcox
2021-12-20 19:38 ` Linus Torvalds
2021-12-20 19:15 ` Linus Torvalds
2021-12-20 21:02 ` Matthew Wilcox
2021-12-20 21:27 ` Linus Torvalds
2021-12-21 1:03 ` Jason Gunthorpe
2021-12-21 3:29 ` Matthew Wilcox
2021-12-21 8:58 ` David Hildenbrand
2021-12-21 14:28 ` Jason Gunthorpe
[not found] ` <303f21d3-42b4-2f11-3f22-28f89f819080@redhat.com>
2021-12-21 23:54 ` Jason Gunthorpe
2021-12-21 17:05 ` Linus Torvalds
2021-12-21 17:40 ` David Hildenbrand
2021-12-21 18:00 ` Linus Torvalds
[not found] ` <dda021c8-69ec-c660-46be-793ae345a5bb@redhat.com>
2021-12-21 21:11 ` John Hubbard
2021-12-21 18:07 ` Jan Kara
2021-12-21 18:30 ` Linus Torvalds
[not found] ` <d23ede12-5df7-2f28-00fd-ea58d85ae400@redhat.com>
2021-12-21 18:58 ` Linus Torvalds
2021-12-21 21:16 ` John Hubbard
2021-12-21 19:07 ` Jason Gunthorpe
[not found] ` <3e0868e6-c714-1bf8-163f-389989bf5189@redhat.com>
[not found] ` <dfe1c8d5-6fac-9040-0272-6d77bafa6a16@redhat.com>
2021-12-22 12:41 ` Jan Kara
[not found] ` <4a28e8a0-2efa-8b5e-10b5-38f1fc143a98@redhat.com>
2021-12-22 14:42 ` Jan Kara
[not found] ` <505d3d0f-23ee-0eec-0571-8058b8eedb97@redhat.com>
2021-12-22 16:08 ` Jan Kara
2021-12-22 16:44 ` Matthew Wilcox
2021-12-22 18:40 ` Linus Torvalds
2021-12-23 12:54 ` Jan Kara
2021-12-23 17:18 ` Linus Torvalds
2021-12-23 0:21 ` Matthew Wilcox
2021-12-24 2:53 ` Jason Gunthorpe
2021-12-24 4:53 ` Matthew Wilcox
2022-01-04 0:33 ` Jason Gunthorpe
2021-12-21 23:59 ` Jason Gunthorpe
2021-12-22 12:44 ` Jan Kara
2021-12-17 20:45 ` David Hildenbrand
2021-12-17 20:51 ` Linus Torvalds
2021-12-17 20:55 ` David Hildenbrand
2021-12-17 21:36 ` Linus Torvalds
2021-12-17 21:47 ` David Hildenbrand
2021-12-17 21:50 ` Linus Torvalds
2021-12-17 22:29 ` David Hildenbrand
2021-12-17 22:58 ` Linus Torvalds
2021-12-17 23:29 ` David Hildenbrand
2021-12-17 23:53 ` Nadav Amit
2021-12-18 4:02 ` Linus Torvalds
2021-12-18 4:52 ` Nadav Amit
2021-12-18 5:03 ` Matthew Wilcox
2021-12-18 5:23 ` Nadav Amit
2021-12-18 18:37 ` Linus Torvalds
2021-12-17 22:18 ` Linus Torvalds
2021-12-17 22:43 ` David Hildenbrand
2021-12-17 23:20 ` Linus Torvalds
2021-12-18 9:57 ` David Hildenbrand
2021-12-18 19:21 ` Linus Torvalds
2021-12-18 19:52 ` Linus Torvalds [this message]
2021-12-19 8:43 ` David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 07/11] mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required (!hugetlb) David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 08/11] mm: hugetlb: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 09/11] mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required (hugetlb) David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 10/11] mm: thp: introduce and use page_trans_huge_anon_shared() David Hildenbrand
2021-12-17 11:30 ` [PATCH v1 11/11] selftests/vm: add tests for the known COW security issues David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wjevjeL44qafYd8=cJHZgNUOUuWVJ28vkS4U4v_Af-xaQ@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=ddutile@redhat.com \
--cc=guro@fb.com \
--cc=hch@lst.de \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=namit@vmware.com \
--cc=oleg@redhat.com \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=rppt@linux.ibm.com \
--cc=shakeelb@google.com \
--cc=shy828301@gmail.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox