From: Mateusz Guzik <mjguzik@gmail.com>
To: "Christoph Lameter (Ampere)" <cl@gentwo.org>
Cc: Harry Yoo <harry.yoo@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Vlad Buslov <vladbu@nvidia.com>,
Yevgeny Kliteynik <kliteyn@nvidia.com>, Jan Kara <jack@suse.cz>,
Byungchul Park <byungchul@sk.com>,
linux-mm@kvack.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
Date: Thu, 24 Apr 2025 19:26:35 +0200
Message-ID: <CAGudoHHbSKLxHgXfFYFdz5nXFBOQPh5EkCX8C7770vfMH-SLeA@mail.gmail.com>
In-Reply-To: <cd7de95e-96b6-b957-2889-bf53d0a019e2@gentwo.org>
On Thu, Apr 24, 2025 at 6:39 PM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> On Thu, 24 Apr 2025, Mateusz Guzik wrote:
>
> > > You could allocate larger percpu areas for a batch of them and
> > > then assign as needed.
> >
> > I was considering a mechanism like that earlier, but the changes
> > needed to make it happen would leave the alloc/free path in a worse
> > state.
> >
> > RSS counters are embedded into mm_struct, with only the per-cpu areas
> > being pointers. The machinery maintains a global list of all counter
> > instances, i.e. of objects internal to each mm_struct. That is to say
> > even if you deserialized allocation of the percpu memory itself, you
> > would still globally serialize on adding/removing the counters to/from
> > that global list.
> >
> > But suppose this got reworked somehow and this bit ceased to be a problem.
> >
> > Another spot where mm alloc/free globally serializes (at least on
> > x86_64) is pgd_alloc/free on the global pgd_lock.
> >
> > Suppose you managed to decompose the lock into finer granularity, to
> > the point where it no longer poses a problem from a contention
> > standpoint. Even then, that's work which does not have to happen there.
> >
> > The general theme is that there is a lot of expensive work happening
> > when dealing with the mm lifecycle (*both* from a single- and a
> > multi-threaded standpoint), and preferably it would only be done once
> > per object's existence.
>
> Maybe change the lifecycle? Allocate a batch of entries initially from
> the slab allocator and use them for multiple mm_structs as the need
> arises.
>
> Do not free them to the slab allocator until you have too many sitting
> around doing nothing?
>
> You may also want to avoid counter updates with this scheme if you only
> count the batches used. It will become a bit fuzzy, but you improve
> scalability.
>
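
To make the serialization point above concrete, the percpu counter
registration path boils down to roughly this (a simplified sketch of
lib/percpu_counter.c; error handling and config ifdefs elided):

        static DEFINE_SPINLOCK(percpu_counters_lock);
        static LIST_HEAD(percpu_counters);

        int percpu_counter_init(struct percpu_counter *fbc, s64 amount,
                                gfp_t gfp)
        {
                unsigned long flags;

                fbc->count = amount;
                fbc->counters = alloc_percpu_gfp(s32, gfp);
                if (!fbc->counters)
                        return -ENOMEM;

                /* every counter in the system funnels through this lock */
                spin_lock_irqsave(&percpu_counters_lock, flags);
                list_add(&fbc->list, &percpu_counters);
                spin_unlock_irqrestore(&percpu_counters_lock, flags);
                return 0;
        }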
If I get this right, the proposal boils down to caching all the state
but hiding the objects from reclaim?
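
I.e. something along these lines (a hypothetical sketch, all names made
up):

        /* A per-cpu stash of fully constructed mm_structs (pgd and
         * percpu counters already set up), invisible to the slab
         * allocator and thus to reclaim. */
        #define MM_CACHE_SIZE 8 /* arbitrary */

        struct mm_cache {
                struct mm_struct *objs[MM_CACHE_SIZE];
                int nr;
        };
        static DEFINE_PER_CPU(struct mm_cache, mm_cache);

        static struct mm_struct *mm_cache_get(void)
        {
                struct mm_cache *c = get_cpu_ptr(&mm_cache);
                struct mm_struct *mm = NULL;

                if (c->nr > 0)
                        mm = c->objs[--c->nr];
                put_cpu_ptr(&mm_cache);
                return mm; /* NULL means take the slow construction path */
        }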
If going this kind of route, perhaps it would be simpler to exempt mm
objects from direct reclaim altogether and instead, when there is a
memory shortage, let a different thread take care of them?
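
Roughly like so (again just a sketch, nothing here is existing code):

        /* A shrinker that never frees anything synchronously; it only
         * kicks a worker, and the worker performs the actual (expensive,
         * globally serializing) teardown of the cached objects. */
        static void mm_cache_drain_fn(struct work_struct *work)
        {
                /* walk the per-cpu caches, run the dtor and
                 * kmem_cache_free() from here */
        }
        static DECLARE_WORK(mm_cache_drain_work, mm_cache_drain_fn);

        static unsigned long mm_cache_scan(struct shrinker *s,
                                           struct shrink_control *sc)
        {
                schedule_work(&mm_cache_drain_work);
                return SHRINK_STOP; /* nothing freed in this context */
        }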
--
Mateusz Guzik <mjguzik gmail.com>