linux-mm.kvack.org archive mirror
From: Harry Yoo <harry.yoo@oracle.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	linux-mm <linux-mm@kvack.org>, Dennis Zhou <dennis@kernel.org>
Subject: Re: a case for a destructor for slub: mm_struct
Date: Fri, 14 Mar 2025 17:27:33 +0900	[thread overview]
Message-ID: <Z9PodYC5YQTEtUQw@harry>
In-Reply-To: <CAGudoHFguN8GWmx496c+xeSKUp=57MKaCsgnqk6wN4ebRFYpEA@mail.gmail.com>

On Thu, Mar 13, 2025 at 12:23:55PM +0100, Mateusz Guzik wrote:
> On Thu, Mar 13, 2025 at 9:59 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > There is other expensive work which can also be modified this way.
> >
> > Not sure what you're referring to?
> >
> 
> There are pgd_alloc/free calls which end up taking the global pgd_list
> lock. Instead, this could be patched to let the mm hang out on the list and
> get skipped by the pgd list iterating machinery as needed.

Okay, IIUC, you want to remove it from the list in the destructor rather
than when freeing it, and the code iterating over the list then needs to
verify whether each entry is still allocated.

Would this be with SLAB_TYPESAFE_BY_RCU, or by taking a lock within
the destructor?
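
To make sure I understand the idea, here is a rough sketch of the
lock-in-destructor variant -- not actual kernel code, the list/lock names
and the mm_struct fields (cached_list, cached_unused) are made up:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mm_types.h>

static LIST_HEAD(cached_mm_list);
static DEFINE_SPINLOCK(cached_mm_lock);

/* "free" path: no list manipulation, just flag the entry as unused */
static void mm_mark_unused(struct mm_struct *mm)
{
        WRITE_ONCE(mm->cached_unused, true);
}

/* runs only when SLUB decides the backing slab can go away */
static void mm_cache_dtor(void *object)
{
        struct mm_struct *mm = object;

        spin_lock(&cached_mm_lock);
        list_del(&mm->cached_list);
        spin_unlock(&cached_mm_lock);
}

Anything walking the list would then skip entries with cached_unused set
(or, with SLAB_TYPESAFE_BY_RCU, revalidate the entry under RCU).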

The RFC patch [1] that removes the destructor states that "taking a spinlock
in a destructor is a bit risky since the slab allocators may run the
destructors anytime they decide a slab is no longer needed",
but I need to think a bit more about how risky that actually is.

[1] https://lore.kernel.org/linux-kernel/Pine.LNX.4.64.0705101156190.10663@schroedinger.engr.sgi.com/#t

> > > there may be spurious mm_structs hanging out and eating pcpu resources.
> > > Something can be added to reclaim those by the pcpu allocator.
> >
> > Not sure if I follow. What do you mean by spurious mm_struct, and how
> > does the pcpu allocator reclaim that?
> >
> 
> Suppose a workload was run which created tons of mm_structs. The
> workload is done and they could be reclaimed, but they hang around just
> in case.
>
> Another workload showed up, but one which wants to do many percpu
> allocs and is not mm_struct-heavy.
> 
> In case of resource shortage it would be good if the percpu allocator
> knew how to reclaim the known cached-but-not-used memory instead of
> grabbing new pages.
> 
> As for how to get there, so happens the primary consumer (percpu
> counters) already has a global list of all allocated objects. The
> allocator could walk it and reclaim as needed.

You mean reclaiming per-cpu objects along with the slab objects that use them?
That sounds like a new slab shrinker for mm_struct?
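
If so, I imagine something along these lines -- just a sketch to check we
are talking about the same thing, assuming the shrinker_alloc() /
shrinker_register() interface, with count_cached_mms() and
reclaim_cached_mms() being hypothetical helpers that would walk the
cached list:

#include <linux/shrinker.h>

static unsigned long mm_cache_count(struct shrinker *s,
                                    struct shrink_control *sc)
{
        unsigned long nr = count_cached_mms();  /* hypothetical */

        return nr ?: SHRINK_EMPTY;
}

static unsigned long mm_cache_scan(struct shrinker *s,
                                   struct shrink_control *sc)
{
        return reclaim_cached_mms(sc->nr_to_scan);  /* hypothetical */
}

static int __init mm_cache_shrinker_init(void)
{
        struct shrinker *s = shrinker_alloc(0, "mm_struct-cache");

        if (!s)
                return -ENOMEM;
        s->count_objects = mm_cache_count;
        s->scan_objects = mm_cache_scan;
        shrinker_register(s);
        return 0;
}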

> > > So that's it for making the case. As for the APIs, I think it would be
> > > best if both dtor and ctor accepted a batch of objects to operate on,
> > > but that's a lot of extra churn due to pre-existing ctor users.
> >
> > Why do you want to pass a batch of objects, instead of calling it once
> > per object when a slab folio is allocated/freed?
> >
> > Is it solely to reduce the overhead of extra function calls when
> > allocating or freeing a slab folio?
> >
> 
> The single-threaded overhead is one thing, some contention is another.
> Back-to-back acquires are a degenerate case from a scalability
> standpoint.
> 
> While these codepaths should be executing rarely, there is still no
> need to get spikes when they do.
> 
> Even so, that's a minor thing which can be patched up later -- even a
> ctor/dtor pair which operates on one obj at a time gives almost the
> entirety of the improvement.

Hmm, but I think even a ctor/dtor pair that operates on one object at
a time requires changing the semantics of how the ctor works :'(

Currently the constructor is not expected to fail, but looking at
mm_init()... it can fail. There is no way to let the slab allocator know
that it failed.
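
To illustrate: the ctor hook passed to kmem_cache_create() is
void (*ctor)(void *), so roughly the kind of percpu allocation mm_init()
does today has nowhere to return -ENOMEM. A fallible ctor would
presumably need to look more like this (mm_cache_ctor is a hypothetical
name, the percpu_counter_init() call is only illustrative):

/* today's hook: cannot report failure */
void (*ctor)(void *object);

/* what a fallible ctor would have to look like */
static int mm_cache_ctor(void *object)
{
        struct mm_struct *mm = object;

        /* e.g. one of the percpu allocations currently done in mm_init() */
        if (percpu_counter_init(&mm->rss_stat[MM_FILEPAGES], 0, GFP_KERNEL))
                return -ENOMEM;
        return 0;
}

And the slab allocation paths would then have to cope with construction
failing partway through a new slab.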

Looks like some churn is needed anyway?

> > > ACHTUNG: I think this particular usage would still want some buy in
> > > from the mm folk and at least Dennis (the percpu allocator
> > > maintainer), but one has to start somewhere. There were 2 different
> > > patchsets posted to move rss counters away from the current pcpu
> > > scheme, but both had different tradeoffs and ultimately died off.
> > >
> > > Should someone(tm) commit to sorting this out, I'll handle the percpu
> > > thing. There are some other tweaks warranted here (e.g., depessimizing
> > > the rss counter validation loop at exit).
> > >
> > > So what do you think?
> >
> > I'd love to take on the project and work on it; it makes sense to revive
> > the destructor feature if that turns out to be beneficial.
> >
> > I'll do the slab part.
> 
> Cool, thanks.

-- 
Cheers,
Harry


Thread overview: 11+ messages
2025-03-12 10:33 Mateusz Guzik
2025-03-13  8:59 ` Harry Yoo
2025-03-13 11:23   ` Mateusz Guzik
2025-03-13 11:25     ` Mateusz Guzik
2025-03-14  8:27     ` Harry Yoo [this message]
2025-03-14  8:34       ` Harry Yoo
2025-03-14 12:32       ` Mateusz Guzik
2025-03-17  5:42         ` Harry Yoo
2025-03-17  9:02           ` Vlastimil Babka
2025-03-17  9:17             ` Mateusz Guzik
2025-03-17  9:23           ` Mateusz Guzik
