From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Subject: Re: Splitting the mmap_sem
Date: Tue, 7 Jan 2020 15:34:15 +0300
Message-ID: <20200107123415.gqklwca4qilva2yr@box>
In-Reply-To: <20200106220910.GK6788@bombadil.infradead.org>
On Mon, Jan 06, 2020 at 02:09:10PM -0800, Matthew Wilcox wrote:
> On Thu, Dec 12, 2019 at 07:40:02AM -0800, Matthew Wilcox wrote:
> > > > We currently only have one ->map_pages() callback, and it's
> > > > filemap_map_pages(). It only needs to sleep in one place -- to allocate
> > > > a PTE table. I think that can be allocated ahead of time if needed.
> > >
> > > No, filemap_map_pages() doesn't sleep. It cannot. Whole body of the
> > > function is under rcu_read_lock(). It uses pre-allocated page table.
> > > See do_fault_around().
> >
> > Oh, thank you! That makes the ->map_pages() optimisation already workable
> > with no changes.
>
> I've been thinking about this some more, and we have a bit of a tough time
> allocating page table entries while holding the RCU read lock. There's
> no GFP flags to the p??_alloc() functions, so we can't specify GFP_NOWAIT.
>
> Option 1: Add 'prealloc_pmd' and 'prealloc_pud' to the vm_fault (to go
> with prealloc_pte). Allocate them before taking the RCU lock to walk
> the VMA tree. This will be a bit of reordering as we currently take
> the mmap_sem, walk the VMA tree, then walk the page tables once we know
> we have a good VMA. I don't see a problem with doing that, but others
> may differ.
I expect preallocating all these page tables just in case would have a
measurable performance impact. The current code only preallocates a PTE page
table if it sees pmd_none().

We may first check if this branch of the tree is present. But I'm not sure
how efficient that can be. And we still need to protect against these page
tables being freed from under us.
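To make the "check the branch first" idea concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy three-level table, and every name in it (toy_pud_table, toy_prealloc, toy_prealloc_missing) is hypothetical. It allocates only the levels that are absent on the faulting branch. It does not address the second concern above: nothing here stops another thread from freeing a table between the check and its use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy three-level table: pud -> pmd -> pte. Illustration only. */
#define TOY_ENTRIES 4

struct toy_pte_table { unsigned long pte[TOY_ENTRIES]; };
struct toy_pmd_table { struct toy_pte_table *pmd[TOY_ENTRIES]; };
struct toy_pud_table { struct toy_pmd_table *pud[TOY_ENTRIES]; };

/* Hypothetical analogue of the prealloc_* fields discussed for vm_fault. */
struct toy_prealloc {
	struct toy_pmd_table *pmd;	/* set only if the PMD table is missing */
	struct toy_pte_table *pte;	/* set only if the PTE table is missing */
};

/*
 * Inspect the branch selected by (pud_idx, pmd_idx) and allocate only
 * the tables that are not present yet. Returns false if an allocation
 * fails; in that case nothing is left allocated in *pre.
 */
static bool toy_prealloc_missing(const struct toy_pud_table *top,
				 size_t pud_idx, size_t pmd_idx,
				 struct toy_prealloc *pre)
{
	const struct toy_pmd_table *pmd_table;

	assert(pud_idx < TOY_ENTRIES && pmd_idx < TOY_ENTRIES);
	pmd_table = top->pud[pud_idx];
	pre->pmd = NULL;
	pre->pte = NULL;

	if (!pmd_table) {
		pre->pmd = calloc(1, sizeof(*pre->pmd));
		if (!pre->pmd)
			return false;
	}

	/* A missing PMD table implies the PTE table below it is missing too. */
	if (!pmd_table || !pmd_table->pmd[pmd_idx]) {
		pre->pte = calloc(1, sizeof(*pre->pte));
		if (!pre->pte) {
			free(pre->pmd);
			pre->pmd = NULL;
			return false;
		}
	}
	return true;
}
```

On a fully populated branch the function allocates nothing, which is the case where unconditional preallocation would be wasted work.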
> Option 2: Add a memalloc_nowait_save/restore API to go along
> with nofs and noio. That way, we can take the RCU read lock, call
> memalloc_nowait_save(), and walk the VMA tree and the page tables in
> the current order. There's an increased chance of memory allocation of
> page tables failing, so we'll have to risk that and do a retry with the
> reference count held on the VMA if we need to sleep to allocate memory.
>
> Option 3: Variant of 2 where we add GFP flags to the p??_alloc()
> functions.
I think this is the most reasonable way. If we are low on memory, latency
is not at the top of our priorities.
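A rough userspace sketch of how Option 3 could behave, under stated assumptions: the flag names, struct toy_mem, toy_table_alloc() and toy_fault_alloc() are all hypothetical stand-ins, not existing kernel interfaces. The allocation is first tried with a no-wait flag, as it would be inside the RCU read-side section. Only if that fails does the caller take the slow path, which is allowed to sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flags standing in for GFP_NOWAIT and a sleeping allocation. */
enum toy_gfp { TOY_GFP_NOWAIT, TOY_GFP_MAY_SLEEP };

/* Toy memory state. */
struct toy_mem {
	int free_tables;	/* tables available without reclaim */
	int reclaims;		/* how many times the slow path "slept" */
};

/* Toy analogue of a p??_alloc() that takes allocation flags. */
static void *toy_table_alloc(struct toy_mem *mem, enum toy_gfp gfp)
{
	if (mem->free_tables == 0) {
		if (gfp == TOY_GFP_NOWAIT)
			return NULL;	/* must not sleep: fail fast */
		mem->reclaims++;	/* stands in for sleeping in reclaim */
		mem->free_tables = 1;
	}
	mem->free_tables--;
	return calloc(1, 64);
}

enum toy_path { TOY_FAST_PATH, TOY_SLOW_PATH };

/*
 * Try the non-sleeping allocation first. On failure, fall back to an
 * allocation that may sleep. In the scheme quoted above, the fallback
 * would run outside the RCU section with a reference held on the VMA;
 * that part is not modelled here.
 */
static void *toy_fault_alloc(struct toy_mem *mem, enum toy_path *path)
{
	void *table = toy_table_alloc(mem, TOY_GFP_NOWAIT);

	if (table) {
		*path = TOY_FAST_PATH;
		return table;
	}
	*path = TOY_SLOW_PATH;
	return toy_table_alloc(mem, TOY_GFP_MAY_SLEEP);
}
```

The slow path is only reached when the no-wait attempt fails, that is, when memory is already tight, which is the case where extra latency is argued above to be acceptable.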
> Option 4: Variant of 2 where we make taking the RCU read lock magically
> set the nowait bit, or we have the page allocator check the RCU preempt
> depth. I don't particularly like this one, particularly since the
> preempt depth is not knowable in most kernel configurations.
>
> Other thoughts on this?
--
Kirill A. Shutemov