linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Shakeel Butt <shakeel.butt@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Meta kernel team <kernel-team@meta.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Date: Mon, 5 May 2025 13:49:46 -0700
Message-ID: <jilyoryfq7cg6xp4cxbipct5vfbhu7ivp2jmzzigufqd6r5uss@h2cmibfg3fdf>
In-Reply-To: <ek6ptpggcmnp5kyt37ytriu6d4gj5grpfwcok3rupu5tbjoil3@6cqmoj43bsum>

On Mon, May 05, 2025 at 10:13:37AM -0700, Shakeel Butt wrote:
> Ccing networking folks.
> 
> Background: https://lore.kernel.org/dvyyqubghf67b3qsuoreegqk4qnuuqfkk7plpfhhrck5yeeuic@xbn4c6c7yc42/
> 
> On Mon, May 05, 2025 at 12:28:43PM +0200, Vlastimil Babka wrote:
> > On 5/3/25 01:03, Shakeel Butt wrote:
> > >> > index cd81c70d144b..f8b9c7aa6771 100644
> > >> > --- a/mm/memcontrol.c
> > >> > +++ b/mm/memcontrol.c
> > >> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> >  {
> > >> >         struct memcg_stock_pcp *stock;
> > >> >         uint8_t stock_pages;
> > >> > -       unsigned long flags;
> > >> >         bool ret = false;
> > >> >         int i;
> > >> >
> > >> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> >                 return ret;
> > >> >
> > >> >         if (gfpflags_allow_spinning(gfp_mask))
> > >> > -               local_lock_irqsave(&memcg_stock.lock, flags);
> > >> > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> > >> > +               local_lock(&memcg_stock.lock);
> > >> > +       else if (!local_trylock(&memcg_stock.lock))
> > >> >                 return ret;
> > >>
> > >> I don't think it works.
> > >> When there is a normal irq and something doing regular GFP_NOWAIT
> > >> allocation, gfpflags_allow_spinning() will be true and
> > >> local_lock() will reenter and complain that lock->acquired is
> > >> already set... but only with lockdep on.
> > > 
> > > Yes indeed. I dropped the first patch and didn't fix this one
> > > accordingly. I think the fix can be as simple as checking for
> > > in_task() here instead of gfp_mask. That should work for both RT and
> > > non-RT kernels.
> > 
> > These in_task() checks seem hacky to me. I think patch 1 in v1 was the
> > correct way to use local_trylock() to avoid these.
> > 
> > As for the RT concerns, AFAIK RT isn't about being fast, but about being
> > preemptible, and the v1 approach didn't violate that - taking the slowpaths
> > more often shouldn't be an issue.
> > 
> > Let me quote Shakeel's scenario from the v1 thread:
> > 
> > > I didn't really think too much about PREEMPT_RT kernels as I assume
> > > performance is not the top priority, but I think I get your point. Let me
> > 
> > Agreed.
> > 
> > > explain, and correct me if I am wrong. On a PREEMPT_RT kernel, the local
> > > lock is a spinlock, which is actually a mutex with priority
> > > inheritance. A task holding the local lock can still get context switched
> > 
> > Let's say (seems implied already) this is a low prio task.
> > 
> > > (but will remain on the same CPU's run queue) and the newer task can try to
> > 
> > And this is a high prio task.
> > 
> > > acquire the memcg stock local lock. If we just do a trylock, it will
> > > always go to the slow path, but if we do local_lock(), then it will sleep
> > > and possibly give its priority to the task owning the lock, possibly
> > > making that task get the CPU. Later, the task sleeping on the memcg stock
> > > lock will wake up and go through the fast path.
> > 
> > I think from an RT latency perspective it could very much be better for the
> > high prio task to just skip the fast path and go for the slowpath, instead of
> > going to sleep while boosting the low prio task to let the high prio task
> > use the fast path later. It's not really a fast path anymore, I'd say.
> 
> Thanks Vlastimil, this is actually a very good point. The slow path of memcg
> charging is a couple of atomic operations, while the alternative here is at
> least two context switches (and possibly scheduler delay). So, it does
> not seem like a fast path anymore.
> 
> I have cc'ed networking folks to get their take as well. Orthogonally, I
> will do some netperf benchmarking on v1 with an RT kernel.

Here are the results with a PREEMPT_RT config on next-20250505, with and
without v1 of this series.

I ran a varying number of netperf clients in different cgroups on a 72-CPU
machine:

 $ netserver -6
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

 clients | without series (Mbps) | with series (Mbps)
 --------+-----------------------+-------------------
       6 |               38559.1 |            38652.6
      12 |               37388.8 |            37560.1
      18 |               30707.5 |            31378.3
      24 |               25908.4 |            26423.9
      30 |               22347.7 |            22326.5
      36 |               20235.1 |            20165.0

I don't see any significant performance difference for this network-intensive
workload with the series: the per-row differences range from about -0.35%
to +2.2%.

Unless someone has concerns, I am going to send out v3, which will be a
rebased version of v1 with all these details.



Thread overview: 12+ messages
2025-05-02  0:17 [PATCH v2 0/3] memcg: decouple memcg and objcg stocks Shakeel Butt
2025-05-02  0:17 ` [PATCH v2 1/3] memcg: separate local_trylock for memcg and obj Shakeel Butt
2025-05-02  0:17 ` [PATCH v2 2/3] memcg: completely decouple memcg and obj stocks Shakeel Butt
2025-05-02  0:17 ` [PATCH v2 3/3] memcg: no irq disable for memcg stock lock Shakeel Butt
2025-05-02 18:29   ` Alexei Starovoitov
2025-05-02 23:03     ` Shakeel Butt
2025-05-02 23:28       ` Alexei Starovoitov
2025-05-02 23:40         ` Shakeel Butt
2025-05-05  9:06           ` Sebastian Andrzej Siewior
2025-05-05 10:28       ` Vlastimil Babka
2025-05-05 17:13         ` Shakeel Butt
2025-05-05 20:49           ` Shakeel Butt [this message]
