From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	yuzhao@google.com, ryan.roberts@arm.com, shy828301@gmail.com,
	akpm@linux-foundation.org
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
Date: Fri, 7 Jul 2023 20:26:52 +0100
Message-ID: <ZKhm/LDJ0X/o3BYG@casper.infradead.org>
In-Reply-To: <5c9bf622-0866-168f-a1cd-4e4a98322127@redhat.com>

On Fri, Jul 07, 2023 at 09:15:02PM +0200, David Hildenbrand wrote:
> > > Sure, any time we PTE-map a THP we might just say "let's put that on the
> > > deferred split queue" and cross fingers that we can eventually split it
> > > later. (I was recently thinking about that in the context of the mapcount
> > > ...)
> > >
> > > It's all a big mess ...
> >
> > Oh, I agree, there are always going to be circumstances where we realise
> > we've made a bad decision and can't (easily) undo it. Unless we have a
> > per-page pincount, and I Would Rather Not Do That.
>
> I agree ...
>
> But we should _try_
> > to do that because it's the right model -- that's what I meant by "Tell
>
> Try to have per-page pincounts? :/ or do you mean, try to split on VMA
> split? I hope the latter (although I'm not sure about performance) :)

Sorry, try to split a folio on VMA split.

> > me why I'm wrong"; what scenarios do we have where a user temporarily
> > mlocks (or mprotects or ...) a range of memory, but wants that memory
> > to be aged in the LRU exactly the same way as the adjacent memory that
> > wasn't mprotected?
>
> Let me throw in a "fun one".
>
> Parent process has a 2 MiB range populated by a THP. fork() a child process.
> Child process mprotects half the VMA.
>
> Should we split the (COW-shared) THP? Or should we COW/unshare in the child
> process (ugh!) during the VMA split.
>
> It all makes my brain hurt.

OK, so this goes back to what I wrote earlier about attempting to choose
what size of folio to allocate on COW:

https://lore.kernel.org/linux-mm/Y%2FU8bQd15aUO97vS@casper.infradead.org/

: the parent had already established
: an appropriate size folio to use for this VMA before calling fork().
: Whether it is the parent or the child causing the COW, it should probably
: inherit that choice and we should default to the same size folio that
: was already found.

You've come up with a usefully different case here. I think we should
COW the folio at the point of the mprotect(). That will allow the parent
to become the sole owner of the folio once again and ensure that when
the parent modifies the folio, it _doesn't_ have to COW.

(This is also a rare case, surely)
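
For anyone who wants to poke at this case from userspace, here is a
minimal sketch of the sequence David describes. It is only an
illustration under assumptions: the helper name fork_then_mprotect_half()
and the RANGE_SIZE constant are made up for this example, 2 MiB is
assumed to be the PMD size, and MADV_HUGEPAGE is only a hint, so the
range may well end up backed by small pages. It shows the syscall
sequence that forces the VMA split in the child; it does not show or
assert what the kernel does with the folio.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define RANGE_SIZE (2UL << 20)	/* 2 MiB; assumed PMD size (hypothetical name) */

/*
 * Hypothetical helper: populate a 2 MiB-aligned anonymous range in the
 * parent, fork(), and have the child mprotect() the second half.
 * Returns 0 if the child's mprotect() succeeded, -1 otherwise.
 */
static int fork_then_mprotect_half(void)
{
	/* Over-allocate so that a 2 MiB-aligned start address exists. */
	char *raw = mmap(NULL, 2 * RANGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	char *buf = (char *)(((uintptr_t)raw + RANGE_SIZE - 1) &
			     ~(RANGE_SIZE - 1));

	/* Advisory only; failure (e.g. THP not configured) is ignored. */
	(void)madvise(buf, RANGE_SIZE, MADV_HUGEPAGE);

	/* Populate the range in the parent before fork(). */
	memset(buf, 0xaa, RANGE_SIZE);

	pid_t pid = fork();
	if (pid < 0) {
		munmap(raw, 2 * RANGE_SIZE);
		return -1;
	}

	if (pid == 0) {
		/* Child: new protection on half the range splits the VMA. */
		int ret = mprotect(buf + RANGE_SIZE / 2, RANGE_SIZE / 2,
				   PROT_READ);
		_exit(ret == 0 ? 0 : 1);
	}

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);

	munmap(raw, 2 * RANGE_SIZE);

	if (waited < 0)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling fork_then_mprotect_half() from a main() and checking for a 0
return is enough to drive it; whether a THP was really in place can be
checked separately, e.g. via AnonHugePages in /proc/<pid>/smaps.
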
> >
> > GUP-pinning is different, and I don't think GUP-pinning should split
> > a folio. That's a temporary use (not FOLL_LONGTERM), eg, we're doing
> > tcp zero-copy or it's the source/target of O_DIRECT. That's not an
> > instruction that this memory is different from its neighbours.
> >
> > Maybe we end up deciding to split folios on GUP-pin. That would be
> > regrettable.
>
> That would probably never be accepted, because the ones that heavily rely on
> THP (databases, VMs), typically also end up using a lot of features that use
> (long-term) page pinning. Don't get me started on io_uring with fixed
> buffers.

I do think that something like a long-term pin should split a folio.
Otherwise we're condemning the rest of the folio to be pinned along
with it. Short term pins shouldn't split.