From: Peter Xu <peterx@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, Vishal Moola <vishal.moola@gmail.com>,
Hugh Dickins <hughd@google.com>, Rik van Riel <riel@surriel.com>,
David Hildenbrand <david@redhat.com>,
"Yin, Fengwei" <fengwei.yin@intel.com>
Subject: Re: Folio mapcount
Date: Wed, 8 Feb 2023 15:58:54 -0500 [thread overview]
Message-ID: <Y+QNDqASslZr6DFU@x1n> (raw)
In-Reply-To: <Y+QFJpvh/G+ekvzI@casper.infradead.org>
On Wed, Feb 08, 2023 at 08:25:10PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 08, 2023 at 02:40:11PM -0500, Peter Xu wrote:
> > On Tue, Feb 07, 2023 at 11:27:17PM +0000, Matthew Wilcox wrote:
> > > I've been thinking about this one, and I wonder if we can do it
> > > without taking any pgtable locks. The locking environment we're in
> > > is the page fault handler, so we have the mmap_lock for read (for now
> > > anyway ...). We also hold the folio lock, so _if_ the folio is mapped,
> > > those entries can't disappear under us.
> >
> > Could MADV_DONTNEED do that from another pgtable that we don't hold the
> > pgtable lock?
>
> Oh, ugh, yes. And zap_pte_range() has the PTL first, so we can't sleep
> to get the folio lock. And we can't decline to zap the pte on a failed
> folio_trylock() (well, we could for MADV_DONTNEED, but not in general).
>
> So ... how about this for a solution:
>
> - If the folio overlaps into the next PMD table, spin_lock it.
> - If the folio overlaps into the previous PMD table, unlock our
> PTL, lock the previous PTL, re-lock our PTL.
> - Do the pvmw, telling it we already have the PTLs held (new PVMW flag).
>
> [explanation simplified; if there is no prior PMD table or if the VMA
> limits how far to search, we can skip this]
>
> We have prior art for taking two PTLs in copy_page_range(). There,
> the hierarchy is clear; one VMA belongs to the process parent and one
> to the child. I don't believe we have precedent for taking two PTLs
> in the same VMA, but I think my proposal (order by ascending address in
> the process) is the obvious order to choose.
Maybe it'll work? Not sure, but seems be something we'd be extremely
careful with. Having a single mmap read lock covering both seems to
guarantee that the order of the lock is stable, which is a good start..
But I have no good idea on other implications across the whole kernel.
IMHO copy_page_range() is not a great example for proving deadlocks,
because the dst_mm should not be exposed to the whole world yet at all when
copying. Say, I don't see any case some thread can try to take the dst mm
pgtable lock at all until it's all set. I'm even wondering whether it's
safe to not take the dst mm pgtable lock at all during a fork()..
--
Peter Xu
next prev parent reply other threads:[~2023-02-08 20:59 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-24 18:13 Matthew Wilcox
2023-01-24 18:35 ` David Hildenbrand
2023-01-24 18:37 ` David Hildenbrand
2023-01-24 18:35 ` Yang Shi
2023-02-02 3:45 ` Mike Kravetz
2023-02-02 15:31 ` Matthew Wilcox
2023-02-07 16:19 ` Zi Yan
2023-02-07 16:44 ` Matthew Wilcox
2023-02-06 20:34 ` Matthew Wilcox
2023-02-06 22:55 ` Yang Shi
2023-02-06 23:09 ` Matthew Wilcox
2023-02-07 3:06 ` Yin, Fengwei
2023-02-07 4:08 ` Matthew Wilcox
2023-02-07 22:39 ` Peter Xu
2023-02-07 23:27 ` Matthew Wilcox
2023-02-08 19:40 ` Peter Xu
2023-02-08 20:25 ` Matthew Wilcox
2023-02-08 20:58 ` Peter Xu [this message]
2023-02-09 15:10 ` Chih-En Lin
2023-02-09 15:43 ` Peter Xu
2023-02-07 22:56 ` James Houghton
2023-02-07 23:08 ` Matthew Wilcox
2023-02-07 23:27 ` James Houghton
2023-02-07 23:35 ` Matthew Wilcox
2023-02-08 0:35 ` James Houghton
2023-02-08 2:26 ` Matthew Wilcox
2023-02-07 16:23 ` Zi Yan
2023-02-07 16:51 ` Matthew Wilcox
2023-02-08 19:36 ` Zi Yan
2023-02-08 19:54 ` Matthew Wilcox
2023-02-10 15:15 ` Zi Yan
2023-03-29 14:02 ` Yin, Fengwei
2023-07-01 1:17 ` Zi Yan
2023-07-02 9:50 ` Yin, Fengwei
2023-07-02 11:45 ` David Hildenbrand
2023-07-02 12:26 ` Matthew Wilcox
2023-07-03 20:54 ` David Hildenbrand
2023-07-02 19:51 ` Zi Yan
2023-07-03 1:09 ` Yin, Fengwei
2023-07-03 13:24 ` Zi Yan
2023-07-03 20:46 ` David Hildenbrand
2023-07-04 1:22 ` Yin, Fengwei
2023-07-04 2:25 ` Matthew Wilcox
2023-07-03 21:09 ` David Hildenbrand
-- strict thread matches above, loose matches on Subject: below --
2021-12-15 21:55 folio mapcount Matthew Wilcox
2021-12-16 9:37 ` Kirill A. Shutemov
2021-12-16 13:56 ` Matthew Wilcox
2021-12-16 15:19 ` Jason Gunthorpe
2021-12-16 15:54 ` Matthew Wilcox
2021-12-16 16:45 ` David Hildenbrand
2021-12-16 17:01 ` Jason Gunthorpe
2021-12-16 18:56 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y+QNDqASslZr6DFU@x1n \
--to=peterx@redhat.com \
--cc=david@redhat.com \
--cc=fengwei.yin@intel.com \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=riel@surriel.com \
--cc=vishal.moola@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox