From: Chen Ridong
To: akpm@linux-foundation.org, Liam.Howlett@oracle.com,
	lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com,
	pfalcato@suse.de, bigeasy@linutronix.de, paulmck@kernel.org,
	chenridong@huawei.com, roman.gushchin@linux.dev, brauner@kernel.org,
	pmladek@suse.com, geert@linux-m68k.org, mingo@kernel.org,
	rrangel@chromium.org, francesco@valla.it, kpsingh@kernel.org,
	guoweikang.kernel@gmail.com, link@vivo.com, viro@zeniv.linux.org.uk,
	neil@brown.name, nichen@iscas.ac.cn, tglx@linutronix.de,
	frederic@kernel.org, peterz@infradead.org, oleg@redhat.com,
	joel.granados@kernel.org, linux@weissschuh.net, avagin@google.com,
	legion@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, lujialin4@huawei.com
Subject: [RFC next v2 1/2] ucounts: free ucount only count and rlimit are zero
Date: Mon, 19 May 2025 13:11:50 +0000
Message-Id: <20250519131151.988900-2-chenridong@huaweicloud.com>
In-Reply-To: <20250519131151.988900-1-chenridong@huaweicloud.com>
References: <20250519131151.988900-1-chenridong@huaweicloud.com>

From: Chen Ridong

After commit fda31c50292a ("signal: avoid double atomic counter
increments for user accounting") and commit 15bc01effefe ("ucounts: Fix
signal ucount refcounting"), reference counting for ucounts behaves as
follows: the reference count is incremented when the first pending
signal pins the ucounts, and it is decremented when the last pending
signal is dequeued. As long as any pending signal is pinned to the
ucounts, the ucounts cannot be freed.

To address the scalability issue described in the next patch,
ucounts.rlimits will be converted to percpu_counter. However, summing up
percpu counters is expensive. To avoid that cost, this patch changes the
conditions for freeing ucounts: instead of complex checks on whether a
pending signal is the first or the last one, the ucounts can now be
freed only when both the refcount and the rlimits are zero.

This change not only simplifies the logic but also reduces the number of
atomic operations.

Signed-off-by: Chen Ridong
---
 include/linux/user_namespace.h |  1 +
 kernel/ucount.c                | 75 ++++++++++++++++++++++++++--------
 2 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a0bb6d012137..6e2229ea4673 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -122,6 +122,7 @@ struct ucounts {
 	kuid_t uid;
 	struct rcu_head rcu;
 	rcuref_t count;
+	atomic_long_t freed;
 	atomic_long_t ucount[UCOUNT_COUNTS];
 	atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
 };
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 8686e329b8f2..125471af7d59 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -185,18 +185,61 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid)
 	return new;
 }
 
-void put_ucounts(struct ucounts *ucounts)
+/*
+ * Whether all the rlimits are zero.
+ * For now, only UCOUNT_RLIMIT_SIGPENDING is considered.
+ * Other rlimit can be added.
+ */
+static bool rlimits_are_zero(struct ucounts *ucounts)
+{
+	int rtypes[] = { UCOUNT_RLIMIT_SIGPENDING };
+	int rtype;
+
+	for (int i = 0; i < sizeof(rtypes)/sizeof(int); ++i) {
+		rtype = rtypes[i];
+		if (atomic_long_read(&ucounts->rlimit[rtype]) > 0)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Ucounts can be freed only when the ucount->count is released
+ * and the rlimits are zero.
+ * The caller should hold rcu_read_lock();
+ */
+static bool ucounts_can_be_freed(struct ucounts *ucounts)
+{
+	if (rcuref_read(&ucounts->count) > 0)
+		return false;
+	if (!rlimits_are_zero(ucounts))
+		return false;
+	/* Prevent double free */
+	return atomic_long_cmpxchg(&ucounts->freed, 0, 1) == 0;
+}
+
+static void free_ucounts(struct ucounts *ucounts)
 {
 	unsigned long flags;
 
-	if (rcuref_put(&ucounts->count)) {
-		spin_lock_irqsave(&ucounts_lock, flags);
-		hlist_nulls_del_rcu(&ucounts->node);
-		spin_unlock_irqrestore(&ucounts_lock, flags);
+	spin_lock_irqsave(&ucounts_lock, flags);
+	hlist_nulls_del_rcu(&ucounts->node);
+	spin_unlock_irqrestore(&ucounts_lock, flags);
+
+	put_user_ns(ucounts->ns);
+	kfree_rcu(ucounts, rcu);
+}
 
-		put_user_ns(ucounts->ns);
-		kfree_rcu(ucounts, rcu);
+void put_ucounts(struct ucounts *ucounts)
+{
+	rcu_read_lock();
+	if (rcuref_put(&ucounts->count) &&
+	    ucounts_can_be_freed(ucounts)) {
+		rcu_read_unlock();
+		free_ucounts(ucounts);
+		return;
 	}
+	rcu_read_unlock();
 }
 
 static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
@@ -281,11 +324,17 @@ static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
 {
 	struct ucounts *iter, *next;
 	for (iter = ucounts; iter != last; iter = next) {
+		bool to_free;
+
+		rcu_read_lock();
 		long dec = atomic_long_sub_return(1, &iter->rlimit[type]);
 		WARN_ON_ONCE(dec < 0);
 		next = iter->ns->ucounts;
-		if (dec == 0)
-			put_ucounts(iter);
+		to_free = ucounts_can_be_freed(iter);
+		rcu_read_unlock();
+		/* If ucounts->count is zero and the rlimits are zero, free ucounts */
+		if (to_free)
+			free_ucounts(iter);
 	}
 }
 
@@ -310,14 +359,6 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
 			ret = new;
 		if (!override_rlimit)
 			max = get_userns_rlimit_max(iter->ns, type);
-		/*
-		 * Grab an extra ucount reference for the caller when
-		 * the rlimit count was previously 0.
-		 */
-		if (new != 1)
-			continue;
-		if (!get_ucounts(iter))
-			goto dec_unwind;
 	}
 	return ret;
 dec_unwind:
-- 
2.34.1
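
For readers who want to experiment with the freeing rule outside the
kernel, the following is a minimal user-space sketch of the condition
introduced above, written with C11 atomics. It is not kernel code and
makes several simplifying assumptions: struct demo_ucounts and the
demo_* helpers are hypothetical names, a plain atomic counter stands in
for rcuref_t, a single rlimit counter stands in for
rlimit[UCOUNT_RLIMIT_SIGPENDING], and RCU, the hash list and the
namespace walk are left out entirely. It only shows that whichever path
brings the second counter to zero is the one that claims the free, and
that the freed flag lets exactly one caller win.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct ucounts; not the kernel type. */
struct demo_ucounts {
	atomic_long count;	/* references held (stands in for rcuref_t count) */
	atomic_long rlimit;	/* outstanding charges (stands in for one rlimit slot) */
	atomic_long freed;	/* 0 until one caller has claimed the free */
};

/*
 * Returns true for exactly one caller, and only once both counters are zero.
 * Loosely models ucounts_can_be_freed() from the patch.
 */
static bool demo_can_be_freed(struct demo_ucounts *u)
{
	long expected = 0;

	if (atomic_load(&u->count) > 0)
		return false;
	if (atomic_load(&u->rlimit) > 0)
		return false;
	/* Only the caller that flips freed from 0 to 1 may free the object. */
	return atomic_compare_exchange_strong(&u->freed, &expected, 1);
}

/* Drop one reference; returns true if this caller should free the object. */
static bool demo_put(struct demo_ucounts *u)
{
	/* atomic_fetch_sub returns the old value, so 1 means "last reference". */
	if (atomic_fetch_sub(&u->count, 1) != 1)
		return false;
	return demo_can_be_freed(u);
}

/* Drop one rlimit charge; returns true if this caller should free the object. */
static bool demo_dec_rlimit(struct demo_ucounts *u)
{
	long old = atomic_fetch_sub(&u->rlimit, 1);

	/* Plays the role that WARN_ON_ONCE(dec < 0) plays in the patch. */
	assert(old > 0);
	return demo_can_be_freed(u);
}
```

With count = 1 and rlimit = 1, demo_put() drops the last reference but
reports false because a charge is still outstanding; the following
demo_dec_rlimit() then reports true, and any later call to
demo_can_be_freed() reports false because the freed flag is already set.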