Subject: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
From: Greg Thelen @ 2025-03-19  7:13 UTC
  To: Tejun Heo, Johannes Weiner, Michal Koutný
  Cc: Andrew Morton, Eric Dumazet, Yosry Ahmed, cgroups, linux-mm,
	linux-kernel, Greg Thelen, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

cgroup_rstat_flush_locked() grabs the irq-safe cgroup_rstat_lock while
iterating all possible cpus. It only drops the lock if there is
scheduler or spin lock contention. If neither, then interrupts can be
disabled for a long time. On large machines this is long enough to
drop network packets. On 400+ CPU machines I've seen interrupts
disabled for over 40 msec.

Prevent rstat from disabling interrupts while processing all possible
cpus. Instead, drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though that was in the context of a non-irq rstat spin lock.
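
For illustration, a simplified sketch of the flush loop's locking
pattern before and after this change (kernel-style C; flush_cpu_stats()
is a hypothetical stand-in for the per-cpu flush body, not the actual
kernel source, which lives in cgroup_rstat_flush_locked()):

  /* Before: irqs stay disabled across all possible CPUs unless there
   * is scheduler or spinlock contention.
   */
  for_each_possible_cpu(cpu) {
          flush_cpu_stats(cgrp, cpu);                     /* irqs off */
          if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
                  __cgroup_rstat_unlock(cgrp, cpu);       /* irqs back on */
                  if (!cond_resched())
                          cpu_relax();
                  __cgroup_rstat_lock(cgrp, cpu);         /* irqs off again */
          }
  }

  /* After: the lock (and the irq disable) is dropped after every CPU,
   * bounding each irqs-off window to one CPU's worth of flushing.
   */
  for_each_possible_cpu(cpu) {
          flush_cpu_stats(cgrp, cpu);             /* irqs off */
          __cgroup_rstat_unlock(cgrp, cpu);       /* irqs back on */
          if (!cond_resched())
                  cpu_relax();
          __cgroup_rstat_lock(cgrp, cpu);         /* irqs off again */
  }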

Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
   memcg's memory.stat repeatedly for 10 seconds (a sketch of such a
   reader follows this list).
2) 400 memory hog processes running in the test memcg, each repeatedly
   charging memory until oom killed, then repeating the charge and oom
   kill cycle.
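
A minimal sketch of what such a stat_reader could look like (the
cgroup path, buffer size, and accounting details are assumptions for
illustration, not the exact benchmark binary):

  /* stat_reader sketch: NR_THREADS threads re-read a memcg's
   * memory.stat for DURATION_SEC seconds and count completed reads.
   * Build: gcc -O2 -pthread stat_reader.c -o stat_reader
   */
  #include <fcntl.h>
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NR_THREADS   400
  #define DURATION_SEC 10

  /* assumed path of the test memcg's stat file */
  static const char *stat_path = "/sys/fs/cgroup/test/memory.stat";
  static atomic_long nr_reads;
  static _Atomic int stop;

  static void *reader(void *arg)
  {
          char buf[8192];
          int fd = open(stat_path, O_RDONLY);

          (void)arg;
          if (fd < 0)
                  return NULL;
          while (!stop) {
                  /* re-read from offset 0, exercising the flush path */
                  if (pread(fd, buf, sizeof(buf), 0) > 0)
                          atomic_fetch_add(&nr_reads, 1);
          }
          close(fd);
          return NULL;
  }

  int main(void)
  {
          pthread_t tids[NR_THREADS];

          for (int i = 0; i < NR_THREADS; i++)
                  pthread_create(&tids[i], NULL, reader, NULL);
          sleep(DURATION_SEC);
          stop = 1;
          for (int i = 0; i < NR_THREADS; i++)
                  pthread_join(tids[i], NULL);
          printf("%ld memory.stat reads in %d seconds\n",
                 atomic_load(&nr_reads), DURATION_SEC);
          return 0;
  }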

On v6.14-rc6 with CONFIG_IRQSOFF_TRACER, running stat_reader and the
hogs, the tracer finds interrupts disabled by rstat for 45341 usec:
  #  => started at: _raw_spin_lock_irq
  #  => ended at:   cgroup_rstat_flush
  #
  #
  #                    _------=> CPU#
  #                   / _-----=> irqs-off/BH-disabled
  #                  | / _----=> need-resched
  #                  || / _---=> hardirq/softirq
  #                  ||| / _--=> preempt-depth
  #                  |||| / _-=> migrate-disable
  #                  ||||| /     delay
  #  cmd     pid     |||||| time  |   caller
  #     \   /        ||||||  \    |    /
  stat_rea-96532    52d....    0us*: _raw_spin_lock_irq
  stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
  stat_rea-96532    52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
  stat_rea-96532    52d.... 45343us : <stack trace>
   => memcg1_stat_format
   => memory_stat_format
   => memory_stat_show
   => seq_read_iter
   => vfs_read
   => ksys_read
   => do_syscall_64
   => entry_SYSCALL_64_after_hwframe

With this patch CONFIG_IRQSOFF_TRACER no longer finds rstat to be the
longest irqs-off holder. The longest holder now has irqs disabled for
4142 usec, a huge reduction from the 45341 usec previously attributed
to rstat.

Running the stat_reader for 10 seconds (memory.stat accesses, before
=> after this patch):
- without memory hogs: 9.84M accesses => 12.7M accesses
-    with memory hogs: 9.46M accesses => 11.1M accesses
memory.stat access throughput improves.

The mode of memory.stat access latency, after grouping samples into
power-of-2 buckets (before => after this patch):
- without memory hogs: 64 usec => 16 usec
-    with memory hogs: 64 usec =>  8 usec
memory.stat latency improves.
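
For clarity, one illustrative way such power-of-2 bucketing could be
computed (a hypothetical helper, not the actual measurement tooling):
each sample maps to its largest power-of-2 lower bound, and the mode
is the most populated bucket.

  /* Illustrative only: map a latency sample in usec to a power-of-2
   * bucket label, e.g. 9 -> 8, 45 -> 32, 64..127 -> 64.
   */
  static unsigned long bucket_of(unsigned long usec)
  {
          unsigned long b = 1;

          while (b * 2 <= usec)
                  b *= 2;
          return b;
  }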

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Tested-by: Greg Thelen <gthelen@google.com>
---
 kernel/cgroup/rstat.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index aac91466279f..976c24b3671a 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -323,13 +323,11 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 			rcu_read_unlock();
 		}
 
-		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(cgrp, cpu);
-			if (!cond_resched())
-				cpu_relax();
-			__cgroup_rstat_lock(cgrp, cpu);
-		}
+		/* play nice and avoid disabling interrupts for a long time */
+		__cgroup_rstat_unlock(cgrp, cpu);
+		if (!cond_resched())
+			cpu_relax();
+		__cgroup_rstat_lock(cgrp, cpu);
 	}
 }
 
-- 
2.49.0.rc1.451.g8f38331e32-goog


