linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: "Greg Thelen" <gthelen@google.com>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Eric Dumazet" <edumzaet@google.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Eric Dumazet" <edumazet@google.com>
Subject: Re: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
Date: Thu, 27 Mar 2025 17:17:01 +0000	[thread overview]
Message-ID: <Z-WIDWP1o4g-N5mg@google.com> (raw)
In-Reply-To: <2vznaaotzkgkrfoi2qitiwdjinpl7ozhpz7w6n7577kaa2hpki@okh2mkqqhbkq>

On Thu, Mar 27, 2025 at 03:38:50PM +0100, Mateusz Guzik wrote:
> On Wed, Mar 19, 2025 at 05:18:05PM +0000, Yosry Ahmed wrote:
> > On Wed, Mar 19, 2025 at 11:47:32AM +0100, Mateusz Guzik wrote:
> > > Is not this going a little too far?
> > > 
> > > the lock + irq trip is quite expensive in its own right and now is
> > > going to be paid for each cpu, as in the total time spent executing
> > > cgroup_rstat_flush_locked is going to go up.
> > > 
> > > Would your problem go away toggling this every -- say -- 8 cpus?
> > 
> > I was concerned about this too, and about more lock bouncing, but the
> > testing suggests that this actually overall improves the latency of
> > cgroup_rstat_flush_locked() (at least on tested HW).
> > 
> > So I don't think we need to do something like this unless a regression
> > is observed.
> > 
> 
> To my reading it reduces max time spent with irq disabled, which of
> course it does -- after all it toggles it for every CPU.
> 
> Per my other e-mail in the thread the irq + lock trips remain not cheap
> at least on Sapphire Rapids.
> 
> In my testing outlined below I see 11% increase in total execution time
> with the irq + lock trip for every CPU in a 24-way vm.
> 
> So I stand by instead doing this every n CPUs, call it 8 or whatever.
> 
> How to repro:
> 
> I employed a poor-man's profiler like so:
> 
> bpftrace -e 'kprobe:cgroup_rstat_flush_locked { @start[tid] = nsecs; } kretprobe:cgroup_rstat_flush_locked /@start[tid]/ { print(nsecs - @start[tid]); delete(@start[tid]); } interval:s:60 { exit(); }'
> 
> This patch or not, execution time varies wildly even while the box is idle.
> 
> The above runs for a minute, collecting 23 samples (you may get
> "lucky" and get one extra, in that case remove it for comparison).
> 
> A sysctl was added to toggle the new behavior vs old one. Patch at the
> end.
> 
> "enabled"(1) means new behavior, "disabled"(0) means the old one.
> 
> Sum of nsecs (results piped to: awk '{ sum += $1 } END { print sum }'):
> disabled:	903610
> enabled:	1006833 (+11.4%)

IIUC this calculates the amount of elapsed time between start and
finish, not necessarily the function's own execution time. Is it
possible that the increase in time is due to more interrupts arriving
during the function execution (which is what we want), rather than more
time being spent on disabling/enabling IRQs?


  reply	other threads:[~2025-03-27 17:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-19  7:13 Greg Thelen
2025-03-19  7:17 ` Greg Thelen
2025-03-19 10:20 ` Michal Koutný
2025-03-19 10:47 ` Mateusz Guzik
2025-03-19 17:18   ` Yosry Ahmed
2025-03-27 14:38     ` Mateusz Guzik
2025-03-27 17:17       ` Yosry Ahmed [this message]
2025-03-27 17:47         ` Mateusz Guzik
2025-04-01 15:00           ` Michal Koutný
2025-04-01 15:46             ` Mateusz Guzik
2025-04-01 16:59               ` Michal Koutný
2025-03-19 17:26   ` Tejun Heo
2025-03-26 23:57     ` Mateusz Guzik
2025-03-19 17:16 ` Yosry Ahmed
2025-03-19 18:06   ` Johannes Weiner
2025-03-19 18:35     ` Yosry Ahmed
2025-03-19 19:10       ` Tejun Heo
2025-03-19 19:16       ` Johannes Weiner
2025-03-19 19:46         ` Johannes Weiner
2025-03-19 17:26 ` Tejun Heo
2025-03-19 17:35 ` Johannes Weiner
2025-03-19 17:53 ` Shakeel Butt
2025-03-19 18:37   ` Eric Dumazet
2025-03-20 14:43   ` Greg Thelen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-WIDWP1o4g-N5mg@google.com \
    --to=yosry.ahmed@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=edumzaet@google.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mjguzik@gmail.com \
    --cc=mkoutny@suse.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox