From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 2/2] lib/percpu_counter: fix dying cpu compare race
To: Yury Norov, Ye Bin
References: <20230404014206.3752945-1-yebin@huaweicloud.com> <20230404014206.3752945-3-yebin@huaweicloud.com>
From: "yebin (H)"
Message-ID: <642BCC61.70507@huawei.com>
Date: Tue, 4 Apr 2023 15:06:09 +0800
Sender: owner-linux-mm@kvack.org

On 2023/4/4 10:50, Yury Norov wrote:
> On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
>> From: Ye Bin
>>
>> In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
>> condition between a cpu dying and percpu_counter_sum() iterating online CPUs
>> was identified.
>> Actually, there's the same race condition between a cpu dying and
>> __percpu_counter_compare(). Here, use 'num_online_cpus()' for quick judgment.
>> But 'num_online_cpus()' will be decreased before call 'percpu_counter_cpu_dead()',
>> then maybe return incorrect result.
>> To solve above issue, also need to add dying CPUs count when do quick judgment
>> in __percpu_counter_compare().
> Not sure I completely understood the race you are describing. All CPU
> accounting is protected with percpu_counters_lock. Is it a real race
> that you've faced, or hypothetical? If it's real, can you share stack
> traces?
>
>> Signed-off-by: Ye Bin
>> ---
>>  lib/percpu_counter.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
>> index 5004463c4f9f..399840cb0012 100644
>> --- a/lib/percpu_counter.c
>> +++ b/lib/percpu_counter.c
>> @@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
>>  	return 0;
>>  }
>>
>> +static __always_inline unsigned int num_count_cpus(void)
> This doesn't look like a good name. Maybe num_offline_cpus?

num_count_cpus() includes both online CPUs and offline CPUs, so the name
num_offline_cpus() doesn't seem appropriate either.

>
>> +{
>> +#ifdef CONFIG_HOTPLUG_CPU

Perhaps we need to add a memory barrier when setting and reading
__num_dying_cpu.

>> +	return (num_online_cpus() + num_dying_cpus());
>              ^                                    ^
> 'return' is not a function. Braces are not needed
>
> Generally speaking, a sequence of atomic operations is not an atomic
> operation, so the above doesn't look correct. I don't think that it
> would be possible to implement raceless accounting based on 2 separate
> counters.
>
> Most probably, you'd have to use the same approach as in 8b57b11cca88:
>
> 	lock();
> 	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
> 		cnt++;
> 	unlock();
>
> And if so, I'd suggest to implement cpumask_weight_or() for that.
>
>> +#else
>> +	return num_online_cpus();
>> +#endif
>> +}
>> +
>>  /*
>>   * Compare counter against given value.
>>   * Return 1 if greater, 0 if equal and -1 if less
>> @@ -237,7 +246,7 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
>>
>>  	count = percpu_counter_read(fbc);
>>  	/* Check to see if rough count will be sufficient for comparison */
>> -	if (abs(count - rhs) > (batch * num_online_cpus())) {
>> +	if (abs(count - rhs) > (batch * num_count_cpus())) {
>>  		if (count > rhs)
>>  			return 1;
>>  		else
>> --
>> 2.31.1
> .
>
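
For illustration only, here is a userspace sketch (not kernel code) of the
two ideas discussed above: counting the union of the online and dying masks
in a single walk, as the suggested cpumask_weight_or() would, and using that
count in the quick check. The names word_weight(), mask_weight_or() and
rough_compare() are made up for this example, the masks are plain 64-bit
words rather than struct cpumask, and the locking that would keep both masks
stable is assumed to be done by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the bits set in one 64-bit word (portable, no compiler builtins). */
static unsigned int word_weight(uint64_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical userspace stand-in for the suggested cpumask_weight_or():
 * weight of (a | b), computed word by word. A CPU set in both masks is
 * counted once. The caller is assumed to hold whatever lock keeps both
 * masks stable for the duration of the walk.
 */
static unsigned int mask_weight_or(const uint64_t *a, const uint64_t *b,
				   size_t nwords)
{
	unsigned int n = 0;
	size_t i;

	for (i = 0; i < nwords; i++)
		n += word_weight(a[i] | b[i]);
	return n;
}

/*
 * Model of the quick check: returns 1 or -1 when the rough count is
 * conclusive, 0 when the caller must fall back to the precise sum.
 */
static int rough_compare(int64_t count, int64_t rhs, int32_t batch,
			 unsigned int ncpus)
{
	assert(batch > 0);
	if (llabs(count - rhs) > (int64_t)batch * ncpus)
		return count > rhs ? 1 : -1;
	return 0;
}
```

With count = 100, rhs = 200 and batch = 32, passing only 3 online CPUs makes
the quick check conclusive (100 > 96) and it answers "less". Passing 4 CPUs
(3 online plus 1 dying) gives 100 <= 128, so the caller falls back to the
precise sum. If a CPU is present in both masks, adding the two weights
separately would count it twice, while the union counts it once.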