Date: Mon, 19 May 2025 23:01:27 +0200
From: Alexey Gladkov
To: Jann Horn
Cc: Chen Ridong, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
 lorenzo.stoakes@oracle.com, vbabka@suse.cz, pfalcato@suse.de,
 bigeasy@linutronix.de, paulmck@kernel.org, chenridong@huawei.com,
 roman.gushchin@linux.dev, brauner@kernel.org, pmladek@suse.com,
 geert@linux-m68k.org, mingo@kernel.org, rrangel@chromium.org,
 francesco@valla.it, kpsingh@kernel.org, guoweikang.kernel@gmail.com,
 link@vivo.com, viro@zeniv.linux.org.uk, neil@brown.name,
 nichen@iscas.ac.cn, tglx@linutronix.de, frederic@kernel.org,
 peterz@infradead.org, oleg@redhat.com, joel.granados@kernel.org,
 linux@weissschuh.net, avagin@google.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, lujialin4@huawei.com, "Serge E. Hallyn",
 David Howells
Subject: Re: [RFC next v2 0/2] ucounts: turn the atomic rlimit to percpu_counter
References: <20250519131151.988900-1-chenridong@huaweicloud.com>

On Mon, May 19, 2025 at 09:32:17PM +0200, Jann Horn wrote:
> On Mon, May 19, 2025 at 3:25 PM Chen Ridong wrote:
> > From: Chen Ridong
> >
> > The will-it-scale test case signal1 [1] has been observed, and the test
> > results reveal that the signal sending system call lacks linearity.
> > To further investigate this issue, we initiated a series of tests by
> > launching varying numbers of dockers and closely monitored the throughput
> > of each individual docker. The detailed test outcomes are presented as
> > follows:
> >
> > | Dockers    |1      |4      |8      |16     |32     |64     |
> > | Throughput |380068 |353204 |308948 |306453 |180659 |129152 |
> >
> > The data clearly demonstrates a discernible trend: as the quantity of
> > dockers increases, the throughput per container progressively declines.
>
> But is that actually a problem? Do you have real workloads that
> concurrently send so many signals, or create inotify watches so
> quickly, that this has an actual performance impact?
>
> > In-depth analysis has identified the root cause of this performance
> > degradation. The ucounts module conducts statistics on rlimit, which
> > involves a significant number of atomic operations. These atomic
> > operations, when acting on the same variable, trigger a substantial number
> > of cache misses or remote accesses, ultimately resulting in a drop in
> > performance.
>
> You're probably running into the namespace-associated ucounts here? So
> the issue is probably that Docker creates all your containers with the
> same owner UID (EUID at namespace creation), causing them all to
> account towards a single ucount, while normally outside of containers,
> each RUID has its own ucount instance?
>
> Sharing of rlimits between containers is probably normally undesirable
> even without the cacheline bouncing, because it means that too much
> resource usage in one container can cause resource allocations in
> another container to fail... so I think the real problem here is at a
> higher level, in the namespace setup code. Maybe root should be able
> to create a namespace that doesn't inherit ucount limits of its owner
> UID, or something like that...

If we allow rlimits not to be inherited in the userns being created, the
user will be able to bypass their rlimits by running a fork bomb inside
the new userns. Or did I miss your point?

In init_user_ns all rlimits that are bound to it are set to RLIM_INFINITY.
So root can only reduce rlimits.

https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c#n1091

-- 
Rgrds, legion
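
The inheritance argument in the reply can be illustrated with a toy
user-space model. This is only a sketch, not kernel code: the names
Ucount, charge and spawn_until_refused are hypothetical, and it assumes
that a charge made in a child userns is also applied to the owner's
counter in the parent namespace and is refused if any level goes over
its limit.

```python
# Toy model of hierarchical rlimit accounting. NOT kernel code; every
# name below is hypothetical and exists only for this illustration.

RLIM_INFINITY = float("inf")


class Ucount:
    """Counter for one (user namespace, UID) pair, with an optional parent."""

    def __init__(self, limit=RLIM_INFINITY, parent=None):
        self.limit = limit      # maximum allowed value of count at this level
        self.parent = parent    # owner's counter in the parent namespace
        self.count = 0


def charge(uc):
    """Charge one unit at every level; roll back and fail if any is over."""
    charged = []
    node = uc
    while node is not None:
        node.count += 1
        charged.append(node)
        if node.count > node.limit:
            # Undo the partial charge so a refused request leaves no trace.
            for c in charged:
                c.count -= 1
            return False
        node = node.parent
    return True


def spawn_until_refused(uc, attempts):
    """Try `attempts` charges (think: fork calls); return how many succeed."""
    return sum(1 for _ in range(attempts) if charge(uc))


if __name__ == "__main__":
    owner = Ucount(limit=3)            # the user's limit in the parent ns

    inherited = Ucount(parent=owner)   # new userns that inherits the limit
    print(spawn_until_refused(inherited, 10))   # prints 3

    detached = Ucount()                # new userns that does not inherit
    print(spawn_until_refused(detached, 10))    # prints 10
```

In the detached case the owner's counter never moves, so the owner's
limit is never checked. That is the fork bomb bypass described above.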