From: Shakeel Butt <shakeel.butt@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Shakeel Butt <shakeel.butt@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
linux-mm <linux-mm@kvack.org>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
Meta kernel team <kernel-team@meta.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
netdev@vger.kernel.org
Subject: Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Date: Mon, 5 May 2025 13:49:46 -0700 [thread overview]
Message-ID: <jilyoryfq7cg6xp4cxbipct5vfbhu7ivp2jmzzigufqd6r5uss@h2cmibfg3fdf> (raw)
In-Reply-To: <ek6ptpggcmnp5kyt37ytriu6d4gj5grpfwcok3rupu5tbjoil3@6cqmoj43bsum>
On Mon, May 05, 2025 at 10:13:37AM -0700, Shakeel Butt wrote:
> Ccing networking folks.
>
> Background: https://lore.kernel.org/dvyyqubghf67b3qsuoreegqk4qnuuqfkk7plpfhhrck5yeeuic@xbn4c6c7yc42/
>
> On Mon, May 05, 2025 at 12:28:43PM +0200, Vlastimil Babka wrote:
> > On 5/3/25 01:03, Shakeel Butt wrote:
> > >> > index cd81c70d144b..f8b9c7aa6771 100644
> > >> > --- a/mm/memcontrol.c
> > >> > +++ b/mm/memcontrol.c
> > >> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> > {
> > >> > struct memcg_stock_pcp *stock;
> > >> > uint8_t stock_pages;
> > >> > - unsigned long flags;
> > >> > bool ret = false;
> > >> > int i;
> > >> >
> > >> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> > return ret;
> > >> >
> > >> > if (gfpflags_allow_spinning(gfp_mask))
> > >> > - local_lock_irqsave(&memcg_stock.lock, flags);
> > >> > - else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> > >> > + local_lock(&memcg_stock.lock);
> > >> > + else if (!local_trylock(&memcg_stock.lock))
> > >> > return ret;
> > >>
> > >> I don't think it works.
> > >> When there is a normal irq and something doing regular GFP_NOWAIT
> > >> allocation gfpflags_allow_spinning() will be true and
> > >> local_lock() will reenter and complain that lock->acquired is
> > >> already set... but only with lockdep on.
> > >
> > > Yes indeed. I dropped the first patch and didn't fix this one
> > > accordingly. I think the fix can be as simple as checking for
> > > in_task() here instead of gfp_mask. That should work for both RT and
> > > non-RT kernels.
> >
> > These in_task() checks seem hacky to me. I think the patch 1 in v1 was the
> > correct way how to use the local_trylock() to avoid these.
> >
> > As for the RT concerns, AFAIK RT isn't about being fast, but about being
> > preemptible, and the v1 approach didn't violate that - taking the slowpaths
> > more often shouldn't be an issue.
> >
> > Let me quote Shakeel's scenario from the v1 thread:
> >
> > > I didn't really think too much about PREEMPT_RT kernels as I assume
> > > performance is not top priority but I think I get your point. Let me
> >
> > Agreed.
> >
> > > explain and correct me if I am wrong. On PREEMPT_RT kernel, the local
> > > lock is a spin lock which is actually a mutex but with priority
> > > inheritance. A task having the local lock can still get context switched
> >
> > Let's say (seems implied already) this is a low prio task.
> >
> > > (but will remain on same CPU run queue) and the newer task can try to
> >
> > And this is a high prio task.
> >
> > > acquire the memcg stock local lock. If we just do trylock, it will
> > > always go to the slow path but if we do local_lock() then it will sleeps
> > > and possibly gives its priority to the task owning the lock and possibly
> > > make that task to get the CPU. Later the task slept on memcg stock lock
> > > will wake up and go through fast path.
> >
> > I think from RT latency perspective it could very much be better for the
> > high prio task just skip the fast path and go for the slowpath, instead of
> > going to sleep while boosting the low prio task to let the high prio task
> > use the fast path later. It's not really a fast path anymore I'd say.
>
> Thanks Vlastimil, this is actually a very good point. Slow path of memcg
> charging is couple of atomic operations while the alternative here is at
> least two context switches (and possibly scheduler delay). So, it does
> not seem like a fast path anymore.
>
> I have cc'ed networking folks to get their take as well. Orthogonally I
> will do some netperf benchmarking on v1 with RT kernel.
Let me share the result with PREEMPT_RT config on next-20250505 with and
without the v1 of this series.
I ran varying number of netperf clients in different cgroups on a 72 CPU
machine.
$ netserver -6
$ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K
number of clients | Without series | With series
6 | 38559.1 Mbps | 38652.6 Mbps
12 | 37388.8 Mbps | 37560.1 Mbps
18 | 30707.5 Mbps | 31378.3 Mbps
24 | 25908.4 Mbps | 26423.9 Mbps
30 | 22347.7 Mbps | 22326.5 Mbps
36 | 20235.1 Mbps | 20165.0 Mbps
I don't see any significant performance difference for the network
intensive workload with this series.
I am going to send out v3 which will be rebased version of v1 with all
these details unless someone has concerns about this.
prev parent reply other threads:[~2025-05-05 20:49 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-02 0:17 [PATCH v2 0/3] memcg: decouple memcg and objcg stocks Shakeel Butt
2025-05-02 0:17 ` [PATCH v2 1/3] memcg: separate local_trylock for memcg and obj Shakeel Butt
2025-05-02 0:17 ` [PATCH v2 2/3] memcg: completely decouple memcg and obj stocks Shakeel Butt
2025-05-02 0:17 ` [PATCH v2 3/3] memcg: no irq disable for memcg stock lock Shakeel Butt
2025-05-02 18:29 ` Alexei Starovoitov
2025-05-02 23:03 ` Shakeel Butt
2025-05-02 23:28 ` Alexei Starovoitov
2025-05-02 23:40 ` Shakeel Butt
2025-05-05 9:06 ` Sebastian Andrzej Siewior
2025-05-05 10:28 ` Vlastimil Babka
2025-05-05 17:13 ` Shakeel Butt
2025-05-05 20:49 ` Shakeel Butt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=jilyoryfq7cg6xp4cxbipct5vfbhu7ivp2jmzzigufqd6r5uss@h2cmibfg3fdf \
--to=shakeel.butt@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@gmail.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox