linux-mm.kvack.org archive mirror
* [RFC next v2 0/2] ucounts: turn the atomic rlimit to percpu_counter
@ 2025-05-19 13:11 Chen Ridong
  2025-05-19 13:11 ` [RFC next v2 1/2] ucounts: free ucount only count and rlimit are zero Chen Ridong
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Chen Ridong @ 2025-05-19 13:11 UTC
  To: akpm, Liam.Howlett, lorenzo.stoakes, vbabka, jannh, pfalcato,
	bigeasy, paulmck, chenridong, roman.gushchin, brauner, pmladek,
	geert, mingo, rrangel, francesco, kpsingh, guoweikang.kernel,
	link, viro, neil, nichen, tglx, frederic, peterz, oleg,
	joel.granados, linux, avagin, legion
  Cc: linux-kernel, linux-mm, lujialin4

From: Chen Ridong <chenridong@huawei.com>

While running the will-it-scale test case signal1 [1], we observed that
the signal-sending system call does not scale linearly. To investigate
further, we ran a series of tests that launched varying numbers of Docker
containers, and we monitored the throughput of each individual container.
The results are as follows:

	| Containers           | 1      | 4      | 8      | 16     | 32     | 64     |
	| Throughput/container | 380068 | 353204 | 308948 | 306453 | 180659 | 129152 |
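As a quick check on the trend, the per-container figures from the table can be normalized against the single-container run. This is a small illustrative computation only: the array names and the pct_of_baseline() helper are hypothetical, and the throughput values are kept in whatever units signal1 reports.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the table above (units as reported by signal1). */
static const int containers[] = {1, 4, 8, 16, 32, 64};
static const long throughput[] = {380068, 353204, 308948, 306453, 180659, 129152};
#define NR_RUNS (sizeof(throughput) / sizeof(throughput[0]))

/* Per-container throughput of run i as a percentage of the 1-container run. */
static double pct_of_baseline(size_t i)
{
	assert(i < NR_RUNS);
	return 100.0 * (double)throughput[i] / (double)throughput[0];
}
```

With 64 containers, each container retains only about a third of the single-container throughput.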

The data shows a clear trend: as the number of containers increases, the
throughput per container steadily declines, from 380068 with a single
container to 129152 with 64. Analysis identified the root cause of this
degradation: the ucounts module tracks rlimit usage with atomic counters,
which involves a large number of atomic operations. When many CPUs
perform atomic operations on the same variable, the cache line holding it
bounces between cores, causing frequent cache misses and remote accesses
and ultimately reducing performance.

This patch set addresses the scalability problem in ucounts rlimit
accounting by replacing the atomic rlimit counters with percpu_counter,
which distributes counts across CPUs to reduce cache contention under
heavy load.
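The batching idea behind percpu_counter can be sketched in userspace C: each CPU accumulates a private delta and folds it into the shared count only once that delta reaches a batch threshold, so most updates never touch the shared cache line. This is a simplified model, not kernel code. NR_SLOTS, struct sketch_pcpu_counter, the sketch_* helpers and the explicit slot argument (standing in for "the current CPU") are hypothetical. The real API (percpu_counter_add_batch(), percpu_counter_read(), percpu_counter_sum()) also deals with preemption and folds under a lock, whereas this sketch uses a C11 atomic and assumes each slot has a single owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_SLOTS 4 /* hypothetical: stands in for the number of CPUs */

struct sketch_pcpu_counter {
	_Atomic int64_t global;   /* shared total, touched only on a fold */
	int64_t local[NR_SLOTS];  /* per-"CPU" deltas, one owner per slot */
};

static void sketch_init(struct sketch_pcpu_counter *c, int64_t v)
{
	atomic_init(&c->global, v);
	for (int i = 0; i < NR_SLOTS; i++)
		c->local[i] = 0;
}

/* Add to the caller's slot; fold into the shared word once |delta| >= batch. */
static void sketch_add_batch(struct sketch_pcpu_counter *c, int slot,
			     int64_t amount, int64_t batch)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	int64_t v = c->local[slot] + amount;

	if ((v < 0 ? -v : v) >= batch) {
		atomic_fetch_add(&c->global, v);  /* rare shared-cache-line write */
		c->local[slot] = 0;
	} else {
		c->local[slot] = v;               /* common case: CPU-local only */
	}
}

/* Cheap but approximate: ignores deltas still held in the slots. */
static int64_t sketch_read(struct sketch_pcpu_counter *c)
{
	return atomic_load(&c->global);
}

/* Exact but costs O(NR_SLOTS): adds every slot to the shared word. */
static int64_t sketch_sum(struct sketch_pcpu_counter *c)
{
	int64_t s = atomic_load(&c->global);

	for (int i = 0; i < NR_SLOTS; i++)
		s += c->local[i];
	return s;
}
```

This is the trade-off percpu_counter makes: updates stay CPU-local most of the time, and in exchange the cheap read is approximate while an exact answer requires a summation over all CPUs.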

Patch 1 changes the lifetime rule so that a ucount is freed only once both
its refcount and its rlimit counts have dropped to zero, minimizing
redundant summations. Patch 2 converts the atomic rlimit counters to
percpu_counter, as suggested by Andrew.
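The lifetime rule from patch 1 can be modeled in a few lines. This is a hypothetical, single-threaded model rather than the patch's code: struct sketch_ucount, NR_SKETCH_RLIMITS and the sketch_* helpers are invented for illustration, the freed flag stands in for actually releasing the object, and real kernel code must also handle locking and concurrent updates.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SKETCH_RLIMITS 3 /* hypothetical number of rlimit types */

/* Hypothetical, simplified stand-in for struct ucounts. */
struct sketch_ucount {
	long refcount;                  /* references held on the object */
	long rlimit[NR_SKETCH_RLIMITS]; /* charged usage per rlimit type */
	bool freed;                     /* set instead of freeing memory */
};

static bool sketch_all_zero(const struct sketch_ucount *u)
{
	if (u->refcount != 0)
		return false;
	for (int i = 0; i < NR_SKETCH_RLIMITS; i++)
		if (u->rlimit[i] != 0)
			return false;
	return true;
}

/* Free only when the refcount AND every rlimit count have reached zero. */
static void sketch_maybe_free(struct sketch_ucount *u)
{
	if (!u->freed && sketch_all_zero(u))
		u->freed = true;
}

static void sketch_put(struct sketch_ucount *u)
{
	assert(u->refcount > 0);
	u->refcount--;
	sketch_maybe_free(u);
}

static void sketch_dec_rlimit(struct sketch_ucount *u, int type, long v)
{
	assert(type >= 0 && type < NR_SKETCH_RLIMITS && u->rlimit[type] >= v);
	u->rlimit[type] -= v;
	sketch_maybe_free(u);
}
```

Dropping the last reference while rlimit usage is still charged leaves the object alive; it is released only when the final charge is uncharged.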

[1] https://github.com/antonblanchard/will-it-scale/blob/master/tests/

---
v2: use percpu_counter instead of caching the rlimit.

v1: https://lore.kernel.org/lkml/20250509072054.148257-1-chenridong@huaweicloud.com/

Chen Ridong (2):
  ucounts: free ucount only count and rlimit are zero
  ucounts: turn the atomic rlimit to percpu_counter

 include/linux/user_namespace.h |  17 +++-
 init/main.c                    |   1 +
 ipc/mqueue.c                   |   6 +-
 kernel/signal.c                |   8 +-
 kernel/ucount.c                | 169 +++++++++++++++++++++++----------
 mm/mlock.c                     |   5 +-
 6 files changed, 138 insertions(+), 68 deletions(-)

-- 
2.34.1





Thread overview: 8+ messages
2025-05-19 13:11 [RFC next v2 0/2] ucounts: turn the atomic rlimit to percpu_counter Chen Ridong
2025-05-19 13:11 ` [RFC next v2 1/2] ucounts: free ucount only count and rlimit are zero Chen Ridong
2025-05-19 13:11 ` [RFC next v2 2/2] ucounts: turn the atomic rlimit to percpu_counter Chen Ridong
2025-05-19 19:32 ` [RFC next v2 0/2] " Jann Horn
2025-05-19 21:01   ` Alexey Gladkov
2025-05-19 21:24     ` Jann Horn
2025-05-21  1:48       ` Chen Ridong
2025-05-21  1:41   ` Chen Ridong
