From: Harry Yoo <harry.yoo@oracle.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm <linux-mm@kvack.org>, Dennis Zhou <dennis@kernel.org>
Subject: Re: a case for a destructor for slub: mm_struct
Date: Fri, 14 Mar 2025 17:34:31 +0900
Message-ID: <Z9PqFx_vOPllIVey@harry>
In-Reply-To: <Z9PodYC5YQTEtUQw@harry>
On Fri, Mar 14, 2025 at 05:27:33PM +0900, Harry Yoo wrote:
> On Thu, Mar 13, 2025 at 12:23:55PM +0100, Mateusz Guzik wrote:
> > On Thu, Mar 13, 2025 at 9:59 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > > There is other expensive work which can also be modified this way.
> > >
> > > Not sure what you're referring to?
> > >
> >
> > There are pgd_alloc/free calls which end up taking the global pgd_list
> > lock. Instead, this could be patched to let the mm hang out on the list
> > and get skipped by the pgd list iterating machinery as needed.
>
> Okay, IIUC, you want to remove it from the list in the destructor
> rather than on every free, and the code iterating over the list then
> needs to verify whether each entry is still allocated.
>
> Would this be with SLAB_TYPESAFE_BY_RCU, or by taking a lock within
> the destructor?
Nah, slipped my mind: a lock needs to be acquired
even with SLAB_TYPESAFE_BY_RCU.
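
To make sure we're on the same page, here is a rough sketch of the
destructor I have in mind -- purely hypothetical, assuming an old-style
void (*dtor)(void *) hook; mm_dtor(), pgd_list_lock and the list entry
are made-up names:

static void mm_dtor(void *obj)
{
	struct mm_struct *mm = obj;

	/*
	 * Unlink only when the slab itself is torn down, so regular
	 * mm alloc/free cycles no longer bounce the global lock.
	 */
	spin_lock(&pgd_list_lock);		/* made-up lock name */
	list_del(&mm->pgd_list_entry);		/* made-up list entry */
	spin_unlock(&pgd_list_lock);
}

The pgd list iterator would then have to skip entries whose object is
currently free, e.g. via a flag flipped on alloc/free.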
> The RFC patch [1] that removes the destructor states that "taking a spinlock
> in a destructor is a bit risky since the slab allocators may run the
> destructors anytime they decide a slab is no longer needed",
> but I need to think a bit more about how risky that actually is.
>
> [1] https://lore.kernel.org/linux-kernel/Pine.LNX.4.64.0705101156190.10663@schroedinger.engr.sgi.com/#t
>
> > > > there may be spurious mm_structs hanging out and eating pcpu resources.
> > > > Something can be added to reclaim those by the pcpu allocator.
> > >
> > > Not sure if I follow. What do you mean by spurious mm_struct, and how
> > > does the pcpu allocator reclaim that?
> > >
> >
> > Suppose a workload was run which created tons of mm_structs. The
> > workload is done and they can be reclaimed, but they hang around just in case.
> >
> > Another workload showed up, but one which wants to do many percpu
> > allocs and is not mm_struct-heavy.
> >
> > In case of resource shortage it would be good if the percpu allocator
> > knew how to reclaim the known cached-but-not-used memory instead of
> > grabbing new pages.
> >
> > As for how to get there, it so happens the primary consumer (percpu
> > counters) already has a global list of all allocated objects. The
> > allocator could walk it and reclaim as needed.
>
> You mean reclaiming per-cpu objects along with the slab objects that use them?
> That sounds like a new slab shrinker for mm_struct?
>
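
If we go that route, I'd imagine something roughly like the below --
just a sketch against the shrinker_alloc()/shrinker_register()
interface, where nr_cached_mms and reclaim_cached_mms() are made-up
mm-side helpers:

static unsigned long mm_pcpu_count(struct shrinker *shrink,
				   struct shrink_control *sc)
{
	/* number of cached-but-unused objects still holding pcpu memory */
	return atomic_long_read(&nr_cached_mms);	/* made up */
}

static unsigned long mm_pcpu_scan(struct shrinker *shrink,
				  struct shrink_control *sc)
{
	/* walk the pcpu counter list, releasing memory of free objects */
	return reclaim_cached_mms(sc->nr_to_scan);	/* made up */
}

	...
	shrinker = shrinker_alloc(0, "mm-pcpu");
	if (!shrinker)
		return -ENOMEM;
	shrinker->count_objects = mm_pcpu_count;
	shrinker->scan_objects = mm_pcpu_scan;
	shrinker_register(shrinker);
	...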
> > > > So that's it for making the case, as for the APIs, I think it would be
> > > > best if both dtor and ctor accepted a batch of objects to operate on,
> > > > but that's a lot of extra churn due to pre-existing ctor users.
> > >
> > > Why do you want to pass a batch of objects, instead of calling the
> > > hook once per object when a slab folio is allocated/freed?
> > >
> > > Is it solely to reduce the overhead of extra function calls when
> > > allocating or freeing a slab folio?
> > >
> >
> > The single-threaded overhead is one thing, some contention is another.
> > Back-to-back acquires are a degenerate case from a scalability
> > standpoint.
> >
> > While these codepaths should execute rarely, there is still no
> > need to get spikes when they do.
> >
> > Even so, that's a minor thing which can be patched up later -- even a
> > ctor/dtor pair which operates one obj at a time is almost the entirety of
> > the improvement.
>
> Hmm, but I think even a ctor/dtor pair that operates on one object at
> a time requires changing the semantics of how the ctor works :'(
>
> For now the constructor is not expected to fail, but looking at
> mm_init()... it can fail. There is no way to let the slab allocator know
> that it failed.
>
> Looks like some churn is needed anyway?
>
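At minimum the ctor would probably have to start returning an error.
Hypothetical sketch (today's hook is void (*ctor)(void *), and I'm
ignoring the question of which gfp flags a ctor could even use here):

static int mm_ctor(void *obj)
{
	struct mm_struct *mm = obj;

	/*
	 * Move the pcpu rss counter allocation into the ctor so it
	 * survives across free/alloc cycles of the object; slab would
	 * have to unwind (or discard the slab) when this fails.
	 */
	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL,
				     NR_MM_COUNTERS))
		return -ENOMEM;
	return 0;
}

A batched variant could later take an array of objects plus a count, as
you suggested, but that can be bolted on top.
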
> > > > ACHTUNG: I think this particular usage would still want some buy-in
> > > > from the mm folk and at least Dennis (the percpu allocator
> > > > maintainer), but one has to start somewhere. There were 2 different
> > > > patchsets posted to move rss counters away from the current pcpu
> > > > scheme, but both had different tradeoffs and ultimately died off.
> > > >
> > > > Should someone(tm) commit to sorting this out, I'll handle the percpu
> > > > thing. There are some other tweaks warranted here (e.g., depessimizing
> > > > the rss counter validation loop at exit).
> > > >
> > > > So what do you think?
> > >
> > > I'd love to take the project and work on it; it makes sense to revive
> > > the destructor feature if that turns out to be beneficial.
> > >
> > > I'll do the slab part.
> >
> > Cool, thanks.
>
> --
> Cheers,
> Harry
--
Cheers,
Harry