From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Yu Zhao <yuzhao@google.com>, Andy Lutomirski <luto@kernel.org>,
Peter Xu <peterx@redhat.com>, Pavel Emelyanov <xemul@openvz.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Minchan Kim <minchan@kernel.org>, Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Hugh Dickins <hughd@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
Kees Cook <keescook@chromium.org>,
John Hubbard <jhubbard@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>, Jan Kara <jack@suse.cz>,
Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy
Date: Thu, 7 Jan 2021 13:05:19 -0800
Message-ID: <CAHk-=wjDkyom4haQu6OU_yykkCFqMi98qO2gUPgZBF-11krRAA@mail.gmail.com>
In-Reply-To: <CAHk-=wjTuS9JB=Ms4WAMaOkGuLmvYwaf2W0JhXxNPdcv4NWZUA@mail.gmail.com>
On Thu, Jan 7, 2021 at 12:32 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Which is really why I think this needs to be fixed by just fixing UFFD
> to take the write lock.
Side note, and not really related to UFFD, but the mmap_sem in
general: I was at one point actually hoping that we could make the
mmap_sem a spinlock, or at least make the rule be that we never do any
IO under it. At which point a write lock hopefully really shouldn't be
such a huge deal.
The main source of IO under the mmap lock was traditionally the page
faults obviously needing to read the data in, but we now try to handle
that with the whole notion of page fault restart instead.
But I'm 100% sure we don't do as good a job of it as we _could_ do,
and there are probably a lot of other cases where we end up doing IO
under the mmap lock simply because we can and nobody has looked at it
very much.
So if taking the mmap_sem for writing is a huge deal - because it ends
up serializing with IO by people who take it for reading - I think
that is something that might be worth really looking into.
For example, right now I think we (still) only do the page fault retry
once - and the second time if the page still isn't available, we'll
actually wait with the mmap_sem held. That goes back to the very
original page fault retry logic, when I was worried that some infinite
retry would cause busy-waiting because somebody didn't do the proper
"drop mmap_sem, then wait, then return retry".
And if that actually causes problems, maybe we should just make sure
to fix it? Remove that FAULT_FLAG_TRIED bit entirely, and make the
rule be that we always drop the mmap_sem and retry?
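
To make the difference concrete, here is a rough userspace sketch of the
two rules described above: retry once and then wait with the lock held,
versus always dropping the lock and retrying. It is a toy model, not
kernel code. Every name in it (toy_mm, toy_fault, the TOY_FLAG_* values)
is made up for illustration and only loosely mirrors
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag values for this sketch only; not the kernel's definitions. */
enum {
    TOY_FLAG_ALLOW_RETRY = 1 << 0,
    TOY_FLAG_TRIED       = 1 << 1,
};

enum toy_result { TOY_DONE, TOY_RETRY };

struct toy_mm {
    bool read_locked;     /* models mmap_sem held for reading */
    int  io_left;         /* IO completions still needed for the page */
    int  waits_unlocked;  /* IO waits done after dropping the lock */
    int  waits_locked;    /* IO waits done with the lock still held */
};

/* One pass of the fault handler; the caller holds the lock on entry. */
static enum toy_result toy_handle_fault(struct toy_mm *mm, unsigned flags,
                                        bool always_retry)
{
    assert(mm->read_locked);

    if (mm->io_left == 0)
        return TOY_DONE;

    /* Retry-once rule: the lock may only be dropped on the first attempt. */
    bool may_drop = (flags & TOY_FLAG_ALLOW_RETRY) &&
                    (always_retry || !(flags & TOY_FLAG_TRIED));

    if (may_drop) {
        mm->read_locked = false;  /* drop the lock, then wait for one IO */
        mm->io_left--;
        mm->waits_unlocked++;
        return TOY_RETRY;
    }

    /* Not allowed to drop again: wait for the rest with the lock held. */
    mm->waits_locked += mm->io_left;
    mm->io_left = 0;
    return TOY_DONE;
}

/* Models the caller's loop: take the lock, fault, retake on retry. */
static void toy_fault(struct toy_mm *mm, bool always_retry)
{
    unsigned flags = TOY_FLAG_ALLOW_RETRY;

    for (;;) {
        mm->read_locked = true;
        if (toy_handle_fault(mm, flags, always_retry) == TOY_DONE)
            break;
        flags |= TOY_FLAG_TRIED;  /* remember that we already retried */
    }
    mm->read_locked = false;
}
```

In this model, a fault that needs two IO completions ends up with one
wait under the lock when always_retry is false, and none when it is
true. That wait under the lock is what a writer would serialize against.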
Similarly, if there are users that don't set FAULT_FLAG_ALLOW_RETRY at
all (because they don't have the logic to check if it's a re-try and
re-do the mmap_sem etc), maybe we can just fix them. I think all the
architectures do it properly in their page fault paths (I think Peter
Xu converted them all - no?), but maybe there are cases of GUP that
don't have it.
Or maybe there is something else that I just didn't notice, where we
end up having bad latencies on the mmap_sem.
I think those would very much be worth fixing, so that if
UFFDIO_WRITEPROTECT taking the mmap_sem for writing causes problems,
we can _fix_ those problems.
But I think it's entirely wrong to treat UFFDIO_WRITEPROTECT as
specially as Andrea seems to want to treat it. Particularly with
absolutely zero use cases to back it up.
Linus