From: Chen Ridong
Date: Wed, 21 May 2025 09:48:19 +0800
Subject: Re: [RFC next v2 0/2] ucounts: turn the atomic rlimit to percpu_counter
To: Jann Horn, Alexey Gladkov
Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
    vbabka@suse.cz, pfalcato@suse.de, bigeasy@linutronix.de, paulmck@kernel.org,
    chenridong@huawei.com, roman.gushchin@linux.dev, brauner@kernel.org,
    pmladek@suse.com, geert@linux-m68k.org, mingo@kernel.org, rrangel@chromium.org,
    francesco@valla.it, kpsingh@kernel.org, guoweikang.kernel@gmail.com,
    link@vivo.com, viro@zeniv.linux.org.uk, neil@brown.name, nichen@iscas.ac.cn,
    tglx@linutronix.de, frederic@kernel.org, peterz@infradead.org, oleg@redhat.com,
    joel.granados@kernel.org, linux@weissschuh.net, avagin@google.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, lujialin4@huawei.com,
    "Serge E. Hallyn", David Howells

On 2025/5/20 5:24, Jann Horn wrote:
> On Mon, May 19, 2025 at 11:01 PM Alexey Gladkov wrote:
>> On Mon, May 19, 2025 at 09:32:17PM +0200, Jann Horn wrote:
>>> On Mon, May 19, 2025 at 3:25 PM Chen Ridong wrote:
>>>> From: Chen Ridong
>>>>
>>>> The will-it-scale test case signal1 [1] was run, and the test
>>>> results reveal that the signal-sending system call does not scale
>>>> linearly.
>>>>
>>>> To further investigate this issue, we ran a series of tests,
>>>> launching varying numbers of Docker containers and monitoring the
>>>> throughput of each individual container. The results are as follows:
>>>>
>>>> | Containers |      1 |      4 |      8 |     16 |     32 |     64 |
>>>> | Throughput | 380068 | 353204 | 308948 | 306453 | 180659 | 129152 |
>>>>
>>>> The data shows a clear trend: as the number of containers
>>>> increases, the throughput per container progressively declines.
>>>
>>> But is that actually a problem? Do you have real workloads that
>>> concurrently send so many signals, or create inotify watches so
>>> quickly, that this has an actual performance impact?
>>>
>>>> In-depth analysis has identified the root cause of this performance
>>>> degradation. The ucounts module keeps rlimit statistics, which
>>>> involves a significant number of atomic operations. When these atomic
>>>> operations act on the same variable, they trigger a substantial number
>>>> of cache misses or remote accesses, ultimately resulting in a drop in
>>>> performance.
>>>
>>> You're probably running into the namespace-associated ucounts here? So
>>> the issue is probably that Docker creates all your containers with the
>>> same owner UID (EUID at namespace creation), causing them all to
>>> account towards a single ucount, while normally outside of containers,
>>> each RUID has its own ucount instance?
>>>
>>> Sharing of rlimits between containers is probably normally undesirable
>>> even without the cacheline bouncing, because it means that too much
>>> resource usage in one container can cause resource allocations in
>>> another container to fail... so I think the real problem here is at a
>>> higher level, in the namespace setup code. Maybe root should be able
>>> to create a namespace that doesn't inherit ucount limits of its owner
>>> UID, or something like that...
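The root-cause paragraph above blames atomic operations that all hit one shared variable. The RFC, per its subject line, replaces those atomic rlimit counters with percpu_counter. A minimal userspace model of the batching idea behind that: each thread (standing in for a CPU) accumulates into its own cache-line-aligned slot and folds into the shared total only once its local delta reaches a batch size, so most updates touch no shared cache line. This is a sketch under stated assumptions, not the kernel's percpu_counter code: `pcpu_model`, `model_add`, `run_demo`, `NSLOTS` and `BATCH` are hypothetical, and the fold uses a C11 atomic add where the kernel uses its own per-CPU primitives and locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NSLOTS 8   /* hypothetical number of "CPUs" (one slot per thread) */
#define BATCH  32  /* hypothetical fold threshold, like a percpu_counter batch */

/* Userspace model: a shared approximate total plus per-slot local deltas.
 * Each slot is aligned to its own 64-byte cache line. */
struct pcpu_model {
	_Atomic long count;
	struct { _Alignas(64) long delta; } slot[NSLOTS];
};

/* Add v from slot s; touch the shared total only when |delta| >= BATCH. */
static void model_add(struct pcpu_model *m, int s, long v)
{
	long d = m->slot[s].delta + v;

	if (labs(d) >= BATCH) {
		atomic_fetch_add(&m->count, d);  /* rare shared update */
		m->slot[s].delta = 0;
	} else {
		m->slot[s].delta = d;            /* private, no cacheline bouncing */
	}
}

/* Exact value: shared total plus every unfolded delta.
 * Only safe here once all writer threads have been joined. */
static long model_sum(struct pcpu_model *m)
{
	long sum = atomic_load(&m->count);

	for (int i = 0; i < NSLOTS; i++)
		sum += m->slot[i].delta;
	return sum;
}

struct worker { struct pcpu_model *m; int slot; long iters; };

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (long i = 0; i < w->iters; i++)
		model_add(w->m, w->slot, 1);  /* one unit charged per "event" */
	return NULL;
}

/* Run nthreads workers that each add 1 iters times.  Stores the approximate
 * (shared-only) value in *approx if non-NULL and returns the exact sum. */
static long run_demo(int nthreads, long iters, long *approx)
{
	struct pcpu_model m = { 0 };
	pthread_t tid[NSLOTS];
	struct worker w[NSLOTS];

	assert(nthreads > 0 && nthreads <= NSLOTS);
	for (int i = 0; i < nthreads; i++) {
		w[i] = (struct worker){ &m, i, iters };
		pthread_create(&tid[i], NULL, worker_fn, &w[i]);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	if (approx)
		*approx = atomic_load(&m.count);
	return model_sum(&m);
}
```

For example, `run_demo(4, 100000, NULL)` returns 400000, while the shared counter receives one update per 32 increments per thread instead of one per increment. The cost is accuracy: `count` alone may lag the exact value by up to NSLOTS * (BATCH - 1), which is the trade-off any rlimit check built on such a counter has to handle.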
>>
>> If we allow rlimits not to be inherited in the userns being created, the
>> user will be able to bypass their rlimits by running a fork bomb inside
>> the new userns.
>>
>> Or did I miss your point?
>
> You're right, I guess it would actually still be necessary to have one
> shared limit across the entire container, so rather than not having a
> namespace-level ucount, maybe it would make more sense to have a
> private ucount instance for a container...
>

Is a private ucount essentially what I tried to implement in version 1?
That version took counts from the parent in batches for each user
namespace, but the approach is complex.

Best regards,
Ridong

> (But to be clear I'm not invested in this suggestion at all, I just
> looked at that patch and was wondering about alternatives if that is
> actually a real performance problem...)
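Alexey's fork-bomb objection rests on charges made inside a user namespace also counting against the ucount of its owner in the parent namespace. A minimal, single-threaded model of that hierarchical charging is sketched below, assuming one rlimit and a per-level limit; `uc_node`, `uc_charge`, `uc_uncharge`, `uc_fill` and the demo functions are hypothetical and much simpler than the kernel's ucounts code, which uses atomics and its own limit checks. In this model, a charge that would push any level over its limit is rolled back and refused.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of one user namespace's ucount for a single rlimit. */
struct uc_node {
	struct uc_node *parent;  /* ucount in the parent namespace, or NULL */
	long count;              /* current charge at this level */
	long max;                /* limit enforced at this level */
};

/* Charge v at n and at every ancestor.  If any level would exceed its max,
 * roll back the levels already charged and return 0; otherwise return 1. */
static int uc_charge(struct uc_node *n, long v)
{
	struct uc_node *it;

	for (it = n; it; it = it->parent) {
		if (it->count > it->max - v)  /* overflow-safe count + v > max */
			break;
		it->count += v;
	}
	if (!it)
		return 1;
	for (struct uc_node *u = n; u != it; u = u->parent)
		u->count -= v;
	return 0;
}

/* Release v at n and every ancestor. */
static void uc_uncharge(struct uc_node *n, long v)
{
	for (; n; n = n->parent) {
		assert(n->count >= v);  /* releasing more than charged is a bug */
		n->count -= v;
	}
}

/* Attempt `attempts` unit charges against n; return how many succeeded. */
static long uc_fill(struct uc_node *n, long attempts)
{
	long ok = 0;

	for (long i = 0; i < attempts; i++)
		ok += uc_charge(n, 1);
	return ok;
}

/* A child namespace with no limit of its own under a parent limited to 100.
 * With inherit == 0 the child is detached from its parent's accounting. */
static long demo_fork_bomb(int inherit)
{
	struct uc_node parent = { NULL, 0, 100 };
	struct uc_node child = { inherit ? &parent : NULL, 0, LONG_MAX };

	return uc_fill(&child, 150);
}

/* Two per-container nodes under one parent: each has its own count, but
 * both still draw on the parent's limit of 100. */
static long demo_siblings(void)
{
	struct uc_node parent = { NULL, 0, 100 };
	struct uc_node a = { &parent, 0, 60 };
	struct uc_node b = { &parent, 0, LONG_MAX };

	uc_fill(&a, 80);         /* a stops at its own limit of 60 */
	uc_uncharge(&a, 10);     /* a releases 10; parent drops to 50 */
	return uc_fill(&b, 80);  /* b gets only what is left of the parent */
}
```

`demo_fork_bomb(1)` stops at the parent's limit of 100, while `demo_fork_bomb(0)` lets all 150 charges through, which is the bypass Alexey describes. `demo_siblings()` returns 50: separate per-container nodes keep their own counts, as in Jann's private-ucount suggestion, yet still compete for the shared parent limit. The batched scheme from version 1 mentioned above is not modeled here.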