From: Vlastimil Babka <vbabka@suse.cz>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-mm@kvack.org, Laurent Dufour <ldufour@linux.ibm.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Michel Lespinasse <walken@google.com>,
Davidlohr Bueso <dbueso@suse.de>
Subject: Re: Splitting the mmap_sem
Date: Tue, 10 Dec 2019 19:09:39 +0100
Message-ID: <934d0938-3b15-c355-0390-71afffba6fe0@suse.cz>
In-Reply-To: <20191210160725.GB5257@redhat.com>

On 12/10/19 5:07 PM, Jerome Glisse wrote:
> On Tue, Dec 10, 2019 at 04:26:40PM +0100, Vlastimil Babka wrote:
>> On 12/5/19 6:21 PM, Jerome Glisse wrote:
>>>>
>>>> So calling mmap() looks like this:
>>>>
>>>> 1 allocate a new VMA
>>>> 2 update pointer(s) in maple tree
>>>> 3 sleep until old VMAs have a zero refcount
>>>> 4 synchronize_rcu()
>>>> 5 free old VMAs
>>>> 6 flush caches for affected range
>>>> 7 return to userspace
>>>>
>>>> While one thread is calling mmap(MAP_FIXED), two other threads which are
>>>> accessing the same address may see different data from each other and
>>>> have different page translations in their respective CPU caches until
>>>> the thread calling mmap() returns. I believe this is OK, but would
>>>> greatly appreciate hearing from people who know better.
>>>
>>> I do not believe this is OK, i believe this is wrong (not even considering
>>> possible hardware issues that can arise from such aliasing).
>>
>> But is it true that the races can happen in the above such that multiple CPU's
>> have different translations? I think it's impossible to tell from above - there
>> are no details about when and which pte modifications happen, where ptl lock is
>> taken... perhaps after filling those details, we could be able to see that
>> there's no race.
>>
>
> My assumption reading Matthew was that as step 6 is making progress
> (flushing caches and i assume TLB too) then you can have a CPU which
> is already flushed and that do take a fault against the new VMA and
> thus get a new TLB entry that do not match a CPU which is not yet
> flushed.

We already have to protect against CPUs whose hardware walks the page
tables (and thus fills their TLBs) without taking mmap_sem at all. For
the case of mmap(MAP_FIXED) overwriting an existing mapping, this means
that the old mapping is munmapped first: zap_pte_range() takes the pte
lock, clears the pte and flushes the TLBs before any new mappings are
installed.

A parallel fault trying to install a pte for the new VMA should thus
serialize on the pte lock and only install the new pte after everyone
has been flushed, and this should be fine?
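
To make the ordering above concrete, here is a toy single-threaded model
in C. It is not kernel code: struct toy_mm and every toy_* function are
hypothetical names invented for illustration, each function body stands
in for one pte-lock critical section, and the flush is modelled as
happening inside that section, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4
#define PTE_NONE 0UL

/* Toy model (assumption): one PTE slot plus one cached entry per CPU. */
struct toy_mm {
	unsigned long pte;        /* PTE_NONE means "not present" */
	unsigned long tlb[NCPUS]; /* PTE_NONE means "nothing cached" */
};

/* Hardware walk: a CPU caches whatever PTE is present, taking no lock. */
static void toy_tlb_fill(struct toy_mm *mm, int cpu)
{
	if (mm->pte != PTE_NONE)
		mm->tlb[cpu] = mm->pte;
}

/* Unmap side; the whole body stands for one pte-lock critical section. */
static void toy_zap(struct toy_mm *mm)
{
	mm->pte = PTE_NONE;              /* clear first: no CPU can refill */
	for (int cpu = 0; cpu < NCPUS; cpu++)
		mm->tlb[cpu] = PTE_NONE; /* then flush every CPU */
}

/* Fault side; also one critical section. Never overwrites a live PTE. */
static bool toy_fault_install(struct toy_mm *mm, unsigned long new_pte)
{
	if (mm->pte != PTE_NONE)
		return false;
	mm->pte = new_pte;
	return true;
}

/* True if two CPUs cache different translations at the same time. */
static bool toy_tlbs_disagree(const struct toy_mm *mm)
{
	unsigned long seen = PTE_NONE;

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (mm->tlb[cpu] == PTE_NONE)
			continue;
		if (seen != PTE_NONE && mm->tlb[cpu] != seen)
			return true;
		seen = mm->tlb[cpu];
	}
	return false;
}
```

In this model a new pte can only appear after toy_zap() has cleared the
old one and emptied every cached entry, so toy_tlbs_disagree() cannot
become true through a zap followed by an install. Whether the real code
keeps that property depends on exactly where the flush happens relative
to the pte lock, which the model simply assumes.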

What might IMHO need care is a parallel fault that started with the old
VMA because some PTE was unpopulated. We need to make sure it doesn't
end up being the last to install a PTE for the VMA that's going away.
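
One conceivable way to handle that case, sketched as a toy model in C
and not taken from any existing kernel code: the fault re-checks, inside
the pte-lock critical section, whether the VMA it started with is still
live, and retries otherwise. The 'detached' flag, struct toy_vma and
toy_finish_fault() are hypothetical, and the sketch assumes the
unmapping side sets the flag before (or under) the same pte lock.

```c
#include <assert.h>
#include <stdbool.h>

#define PTE_NONE 0UL

/* Hypothetical VMA: 'detached' is set once the VMA is being replaced. */
struct toy_vma {
	bool detached;
	unsigned long prot_pte; /* PTE value a fault on this VMA installs */
};

enum toy_fault_result { TOY_FAULT_DONE, TOY_FAULT_RETRY };

/*
 * Final step of a fault that looked up 'vma' earlier. The whole body
 * stands for one pte-lock critical section.
 */
static enum toy_fault_result
toy_finish_fault(const struct toy_vma *vma, unsigned long *pte)
{
	/* The VMA went away while we were faulting: install nothing. */
	if (vma->detached)
		return TOY_FAULT_RETRY;

	/* Only populate an empty slot; a racing fault may have won. */
	if (*pte == PTE_NONE)
		*pte = vma->prot_pte;
	return TOY_FAULT_DONE;
}
```

A fault that loses the race gets TOY_FAULT_RETRY, leaves the pte
untouched, and would look up the VMA again, finding the new one.
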
> Today this can not happens because page fault will serialize on the
> mmap_sem (ie until the write mode is release when returning to user-
> space).