From: Mateusz Guzik <mjguzik@gmail.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
bpf <bpf@vger.kernel.org>, Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] SLUB allocator, mainly the sheaves caching layer
Date: Mon, 24 Feb 2025 19:46:52 +0100
Message-ID: <svy4dxxdgbt4mnapfrqod7c2imufgb4daao7id3j5p7tgeok4j@jtknbmybpqsg>
In-Reply-To: <e2fz26kcbni37rp2rdqvac7mljvrglvtzmkivfpsnibubu3g3t@blz27xo4honn>
On Mon, Feb 24, 2025 at 10:02:09AM -0800, Shakeel Butt wrote:
> What about pre-memcg-charged sheaves? We had to disable memcg charging
> of some kernel allocations and I think sheaves can help in reenabling
> it.
It has been several months since last I looked at memcg, so details are
fuzzy and I don't have time to refresh everything.
However, if memory serves right, the primary problem was the irq on/off
trip associated with them (sometimes happening twice, the second time in
refill_obj_stock()).
I think the real fix(tm) would be to recognize that only some allocations
need interrupt safety -- as in, some slabs should not be allowed to be
used outside of process context. This is somewhat what sheaves is doing,
but it could be applied without fronting the current kmem caching
mechanism. This may be a tough sell, and even then it plays whack-a-mole
with patching up all the consumers.
Suppose it is not an option.
Then there are 2 ways that I considered.
The easiest splits memcg accounting between irq and process level --
similar to what the localtry thing is doing. This would only cost a
preemption off/on trip in the common case and a branch on the current
state. But suppose this is a no-go as well.
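For illustration only, a minimal sketch of the split -- the names
(task_stock, irq_stock, get_ctx_stock()) are made up and this is not
what the localtry patches actually look like:

/* one stock per context class, each only ever touched from that context */
static DEFINE_PER_CPU(struct memcg_stock_pcp, task_stock);
static DEFINE_PER_CPU(struct memcg_stock_pcp, irq_stock);

static struct memcg_stock_pcp *get_ctx_stock(void)
{
        /* caller holds preemption off; irqs stay enabled throughout */
        return in_task() ? this_cpu_ptr(&task_stock) : this_cpu_ptr(&irq_stock);
}

An irq landing in the middle of a task-context consume only ever touches
irq_stock, so neither side needs the irq off/on trip.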
My primary idea was using hand-rolled sequence counters and a local
8-byte cmpxchg (*without* the lock prefix, and not to be confused with
the 16-byte one used by the current slub fast path). Should this work,
it would be significantly faster than the irq trips.
The irq thing is there only to facilitate several fields being updated
or memcg itself getting replaced in an atomic manner for process vs
interrupt context.
The observation is that all the values getting updated are 4 bytes.
Then perhaps an additional counter can be added next to each one so
that an 8-byte cmpxchg will fail should an irq swoop in and change
things from under us.
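That is, each such value gets packed together with the counter into one
cmpxchg-able word; a sketch with made-up names, nr_bytes standing in for
whichever 4-byte field gets touched:

union stock_word {
        u64 full;               /* updated with one 8-byte cmpxchg */
        struct {
                u32 nr_bytes;   /* the actual fast path value */
                u32 seq;        /* the additional counter paired with it */
        };
};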
The percpu state would have a sequence counter associated with the
assigned memcg_stock_pcp. The memcg_stock_pcp object would have the same
value replicated inside for every var which can be updated in the fast
path.
Then the fast path would only succeed if the sequence counter read off
from per-cpu still matches the copy stored in the stock object.
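Roughly, and again with made-up names (stock_seq, stock_consume_fast(),
a 'union stock_word word' member added to memcg_stock_pcp); cmpxchg_local()
is the no-lock-prefix variant, which suffices here since the only
competitor is an irq on the same CPU:

static DEFINE_PER_CPU(u32, stock_seq);

static bool stock_consume_fast(u32 nr_bytes)
{
        struct memcg_stock_pcp *stock;
        union stock_word old, new;
        bool ret = false;
        u32 seq;

        preempt_disable();
        stock = this_cpu_ptr(&memcg_stock);
        seq = this_cpu_read(stock_seq);
        old.full = READ_ONCE(stock->word.full);

        /* a stale seq means the slow path ran; fall back to it ourselves */
        if (old.seq != seq || old.nr_bytes < nr_bytes)
                goto out;

        new.nr_bytes = old.nr_bytes - nr_bytes;
        new.seq = seq;

        /* fails if an irq modified the stock (and bumped the seq) in between */
        ret = cmpxchg_local(&stock->word.full, old.full, new.full) == old.full;
out:
        preempt_enable();
        return ret;
}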
Any change to memcg_stock_pcp (e.g., rolling up bytes after passing the
page size threshold) would disable interrupts and modify all these
counters.
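In other words every slow path update invalidates any fast path snapshot
taken before it, along the lines of (stock_slow_update() being another
made-up name):

static void stock_slow_update(struct memcg_stock_pcp *stock)
{
        unsigned long flags;

        local_irq_save(flags);
        /* invalidate snapshots taken by any fast path we interrupted */
        this_cpu_inc(stock_seq);
        stock->word.seq = this_cpu_read(stock_seq);
        /* ... the actual slow path work: drain, refill, replace the memcg ... */
        local_irq_restore(flags);
}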
There is some more work needed to make sure the stock obj can be safely
swapped out for a new one without accidentally ending up with a value
which lines up with the previous one; I don't remember what I had for
that (and yes, I recognize a 4-byte value will invariably roll over and
*in principle* a conflict will be possible).
This is a rough outline since Vlasta keeps prodding me about it.
That said, maybe someone will have a better idea. The above is up for
grabs if someone wants to do it; I can't commit to looking at it.