From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: bpf <bpf@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Sebastian Sewior <bigeasy@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Hou Tao <houtao1@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Jann Horn <jannh@google.com>, Tejun Heo <tj@kernel.org>,
linux-mm <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v5 3/7] locking/local_lock: Introduce local_trylock_irqsave()
Date: Wed, 15 Jan 2025 18:20:36 -0800 [thread overview]
Message-ID: <CAADnVQKxBDaoFxav_WCgQ7sDuV_esTgQVp4Ci+L3e3UCz98e3g@mail.gmail.com> (raw)
In-Reply-To: <e6685465-9e72-4822-95a1-990f0893f206@suse.cz>
On Wed, Jan 15, 2025 at 6:22 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/15/25 03:17, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Similar to local_lock_irqsave() introduce local_trylock_irqsave().
> > This is inspired by 'struct local_tryirq_lock' in:
> > https://lore.kernel.org/all/20241112-slub-percpu-caches-v1-5-ddc0bdc27e05@suse.cz/
>
> Let's see what locking maintainers say about adding the flag to every
> local_lock even if it doesn't use the trylock operation.
As I replied to Sebastian there are very few users and
hot users of local_lock like networking use it in RT only.
local_lock_nested_bh() stays true nop in !RT.
This patch doesn't change it.
So active flag on !RT is not in critical path
(at least as much I could study the code)
> > Use spin_trylock in PREEMPT_RT when not in hard IRQ and not in NMI
> > and fail instantly otherwise, since spin_trylock is not safe from IRQ
> > due to PI issues.
> >
> > In !PREEMPT_RT use simple active flag to prevent IRQs or NMIs
> > reentering locked region.
> >
> > Note there is no need to use local_inc for active flag.
> > If IRQ handler grabs the same local_lock after READ_ONCE(lock->active)
>
> IRQ handler AFAICS can't do that since __local_trylock_irqsave() is the only
> trylock operation and it still does local_irq_save(). Could you have added a
> __local_trylock() operation instead? Guess not for your use case because I
> see in Patch 4 you want to use the local_unlock_irqrestore() universally for
> sections that are earlier locked either by local_trylock_irqsave() or
> local_lock_irqsave(). But I wonder if those can be changed (will reply on
> that patch).
Pasting your reply from patch 4 here to reply to both...
Yes, I'm only adding local_trylock_irqsave() and not local_trylock(),
since memcg and slab are using local_lock_irqsave() in numerous
places, and adding local_trylock() here would be just dead code.
> The motivation in my case was to avoid the overhead of irqsave/restore on
> !PREEMPT_RT. If there was a separate "flavor" of local_lock that would
> support the trylock operation, I think it would not need the _irq and
> _irqsave variants at all, and it would also avoid adding the "active" flag
> on !PREEMPT_RT. Meanwhile on PREEMPT_RT, a single implementation could
> likely handle both flavors with no downsides?
I agree with the desire to use local_lock() in slab and memcg long term,
but this is something that definitely should _not_ be done in this patch set.
try_alloc_page() needs to learn to walk before we teach it to run.
> The last line can practially only happen on RT, right? On non-RT irqsave
> means we could only fail the trylock from a nmi and then we should have
> gfp_flags that don't allow spinning.
Correct.
> So suppose we used local_trylock(), local_lock() and local_unlock() (no
> _irqsave) instead, as I mentioned in reply to 3/7. The RT implementation
> would be AFAICS the same. On !RT the trylock could now fail from a IRQ
> context in addition to NMI context, but that should also have a gfp_mask
> that does not allow spinning, so it should work fine.
Also correct.
> It would however mean converting all users of the lock, i.e. also
> consume_obj_stock() etc., but AFAIU that will be necessary anyway to have
> opportunistic slab allocations?
Exactly. And as soon as we do that we start to conflict between trees.
But the main concern is that change like that needs to be
thoroughly analyzed.
I'm not convinced that stock_lock as preempt_disable() will work for memcg.
People do GFP_NOWAIT allocations from IRQ and assume it works.
If memcg local_irqsave (aka local_lock_irqsave) is replaced
with preempt_disable the IRQ can happen in the middle of memcg
update of the counters,
so ALL of stock_lock operations would have to local_TRYlock()
with fallback in case IRQ kmalloc(GFP_NOWAIT) happens to reenter.
Same issue with slub.
local_lock_irqsave(&s->cpu_slab->lock)
as irq disabled region works for kmalloc(GPF_NOWAIT) users.
If it becomes preempt_disable I suspect it will break things.
Like perf and bpf use irq_work do wakeups and allocations.
slub's s->cpu_slab protected by preempt_disable
would mean that 'perf record -a' will be triggering in the
middle of slab partial, deactivate slab logic and
perf will be doing wakups right there. I suspect it will be sad.
While right now irq work handler will be called only
after the last local_unlock_irqrestore enables irqs.
So replacing local_lock_irqsave in slab and memcg with
local_lock is not something to take lightly.
next prev parent reply other threads:[~2025-01-16 2:20 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 2:17 [PATCH bpf-next v5 0/7] bpf, mm: Introduce try_alloc_pages() Alexei Starovoitov
2025-01-15 2:17 ` [PATCH bpf-next v5 1/7] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation Alexei Starovoitov
2025-01-15 11:19 ` Vlastimil Babka
2025-01-15 23:00 ` Alexei Starovoitov
2025-01-15 23:47 ` Shakeel Butt
2025-01-16 2:44 ` Alexei Starovoitov
2025-01-15 23:16 ` Shakeel Butt
2025-01-17 18:19 ` Sebastian Andrzej Siewior
2025-01-15 2:17 ` [PATCH bpf-next v5 2/7] mm, bpf: Introduce free_pages_nolock() Alexei Starovoitov
2025-01-15 11:47 ` Vlastimil Babka
2025-01-15 23:15 ` Alexei Starovoitov
2025-01-16 8:31 ` Vlastimil Babka
2025-01-17 18:20 ` Sebastian Andrzej Siewior
2025-01-15 2:17 ` [PATCH bpf-next v5 3/7] locking/local_lock: Introduce local_trylock_irqsave() Alexei Starovoitov
2025-01-15 2:23 ` Alexei Starovoitov
2025-01-15 7:22 ` Sebastian Sewior
2025-01-15 14:22 ` Vlastimil Babka
2025-01-16 2:20 ` Alexei Starovoitov [this message]
2025-01-17 20:33 ` Sebastian Andrzej Siewior
2025-01-21 15:59 ` Vlastimil Babka
2025-01-21 16:43 ` Sebastian Andrzej Siewior
2025-01-22 1:35 ` Alexei Starovoitov
2025-01-15 2:17 ` [PATCH bpf-next v5 4/7] memcg: Use trylock to access memcg stock_lock Alexei Starovoitov
2025-01-15 16:07 ` Vlastimil Babka
2025-01-16 0:12 ` Shakeel Butt
2025-01-16 2:22 ` Alexei Starovoitov
2025-01-16 20:07 ` Joshua Hahn
2025-01-17 17:36 ` Johannes Weiner
2025-01-15 2:17 ` [PATCH bpf-next v5 5/7] mm, bpf: Use memcg in try_alloc_pages() Alexei Starovoitov
2025-01-15 17:51 ` Vlastimil Babka
2025-01-16 0:24 ` Shakeel Butt
2025-01-15 2:17 ` [PATCH bpf-next v5 6/7] mm: Make failslab, kfence, kmemleak aware of trylock mode Alexei Starovoitov
2025-01-15 17:57 ` Vlastimil Babka
2025-01-16 2:23 ` Alexei Starovoitov
2025-01-15 2:17 ` [PATCH bpf-next v5 7/7] bpf: Use try_alloc_pages() to allocate pages for bpf needs Alexei Starovoitov
2025-01-15 18:02 ` Vlastimil Babka
2025-01-16 2:25 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAADnVQKxBDaoFxav_WCgQ7sDuV_esTgQVp4Ci+L3e3UCz98e3g@mail.gmail.com \
--to=alexei.starovoitov@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=houtao1@huawei.com \
--cc=jannh@google.com \
--cc=kernel-team@fb.com \
--cc=linux-mm@kvack.org \
--cc=memxor@gmail.com \
--cc=mhocko@suse.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox