From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAFD6C7EE30 for ; Thu, 26 Jun 2025 23:38:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 598258D0002; Thu, 26 Jun 2025 19:38:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5224A8D0001; Thu, 26 Jun 2025 19:38:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3E8DD8D0002; Thu, 26 Jun 2025 19:38:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 272F98D0001 for ; Thu, 26 Jun 2025 19:38:34 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 9A22B1A0454 for ; Thu, 26 Jun 2025 23:38:33 +0000 (UTC) X-FDA: 83599168506.04.58A9B2F Received: from out-186.mta1.migadu.com (out-186.mta1.migadu.com [95.215.58.186]) by imf14.hostedemail.com (Postfix) with ESMTP id 9679A100004 for ; Thu, 26 Jun 2025 23:38:31 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="Xlm/+sHJ"; spf=pass (imf14.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.186 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1750981112; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=x+p1GkYHskRpx352iokJ2/ibgcF8JqaS68YqWzxLlIw=; b=imI/vpp4aW/07ny0mlywrvc/dMmYhAE8OMau3QkbG0w9dg8ocmXs2AQTJz9a6HJ2blqeRY cNs3cDKzm3XnT0OJkliT88k2ihQ01Fh/tvri7/D69fmpC7FFmmD68/ELrcu0U76GWnqEJl H2YrqoTczI9dVFAyiVAyzNjX6OfArbs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1750981112; a=rsa-sha256; cv=none; b=8EhLF6SClFLbSglv0Pc6Z3WYTSqvGeG8aZYhxFAFQhIdzq1Q44RKfenncTLHiX6nizLhw5 YFZFAFlgqi7UBtS5FrwDPIvVJx4+xh2JYratU2iDu+ZAoMh7QhC3L8KDRFGIpO190eE5xL nWg40R/scZ3kHLC8zd7rkMjTSCKvGoQ= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="Xlm/+sHJ"; spf=pass (imf14.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.186 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev; dmarc=pass (policy=none) header.from=linux.dev Date: Thu, 26 Jun 2025 16:38:18 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1750981109; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=x+p1GkYHskRpx352iokJ2/ibgcF8JqaS68YqWzxLlIw=; b=Xlm/+sHJ9a9AvjfJxvljezL9tKVFP6Hw7NIxfinr8LnTmmjWSVV0Lmf5HOIoclE4Y0QD0Q 95G06zuBU+BPZInr+tIAOEQpETahHM0iwa71DHp5wUuoxB/PzjQeZ7pyZSNRIS91f6I1zR J4ctzmswx4CBctLkhdeHLeneRGyqEh4= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Shakeel Butt To: Tejun Heo Cc: "Paul E . McKenney" , Andrew Morton , JP Kobryn , Johannes Weiner , Ying Huang , Vlastimil Babka , Alexei Starovoitov , Sebastian Andrzej Siewior , Michal =?utf-8?Q?Koutn=C3=BD?= , bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team Subject: Re: [PATCH] llist: add [READ|WRITE]_ONCE tags for llist_node Message-ID: References: <20250626190550.4170599-1-shakeel.butt@linux.dev> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250626190550.4170599-1-shakeel.butt@linux.dev> X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 9679A100004 X-Stat-Signature: 9eh5frufd1kdurjem88egazjctj7q3zu X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1750981111-5008 X-HE-Meta: U2FsdGVkX19eOyOfbjf61pGoxm3sBNIrvwf5PQ+Tt+VgPMKgmCIaj0NISrRSYLMG6OBBvef3QnPl2lAIHIQN6UTV+nSBhS2ewsLBAIIsoFpFSOgrvFrI/Rm8nnCMYELTTDyg20EQzF2bj5zbqv0Rohyr2wlgNLXOKPa2bhVZ5cJeLsXk/w2ySdI89jTi/pr693ryPo91FoGZfsD7q9gMXjN68CXqu084e8n4MrxLdjFgOQm5WyjnobeAJDH4s4K62L8AIgEFMeYr/AQVR9QiAJh1rxgKdT++VsZ9YKEo2NiIphbqxOGb+pOCmYf19xK0WiILAdWw8p4LuBx9Z18pJ7sVYxUaIedQ9NzoNKzYRwFHyjC23PAhRxYE4FFnT9ck6s3I4ChKXOJ6zIxhV0CbHOiGoHZo8ZDFmiGQayJtS02Q5fQKP61+HO0vMhlBsPbZmgBeMH0OC1DgxOh1TWMEZJfkIHdPGqfWMWlvy+Es6irbfAwxDKfsiEDpugqV/SnyKt0fkiLEYLbLI+3hGGOemEPi2k1/YerrtnKq1v9HfnXRSAQ/K5VIg8g8dSZq4sON+m9tHfS6y1GtOsNyENGdZafjgmAY/rQ8J/73tUxU5wDSy3ECRfAZmZNRYlLRlWKwbQO/LH8BdSk70di37a7xvr1GPcS8/ypr0/smZWTVQz8AHuylnzVUSGoeqmKjDi3/XH0JhFqU89sP6As+kY0hU2WYuolR1hSeFFhCPvx9JIVDKI7XLnWcqNl0LH3PUAPOlk908UiOIDhX8KrAV91tiyacmagA67ceyhBQggU/QOs4pz95VzFUR7B7iClBKm1vRIU3bS5nl3Z4TCDIER9V98ipdfVRu3F+3/jusuca5aRgKHq/zySEtDTIfN1iOWfb8Y39/77mBfQXnXr0VRofTTPDmRZstXinRr7pZ23Nbdq+6eRN0gpjZQmEqQLAq125dNdbW39ydA1Ii3NgtSe 1NzAIZ9f rwHnM2WzvT5qQzu+Ky68spVdL0zhNY9nKtdn4LDuVFulnonf3u3VBg7LocdM9uGp2oTisz6VvmfBq0sUEZ97MqwniM5KaUqUD5Gs8xvXVbQZlhJrk/hv7siiqUvGfK79HIwz9HEIedpwZoelrgIpL4k3QZl1mQv6WUt1bLUJaFxvVBAA9qK5+H1WcBIe4Fsho2TimPAHk0k7zWewDPyqLsy6YsDtfVhiS0n1HIGmS4R8u+0fsS2xF+Ramhr07+6iYB2eyvJt/LV7Tc63ZRXAx9UPeMdhoXsDh144S6bHgilxcYGpIjNAcjn4C8pZ6cBJWAQ9igwx3CdTVC3k= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jun 26, 2025 at 12:05:50PM -0700, Shakeel Butt wrote: > Before the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi > safe"), the struct llist_node is expected to be private to the one > inserting the node to the lockless list or the one removing the node > from the lockless list. After the mentioned commit, the llist_node in > the rstat code is per-cpu shared between the stacked contexts i.e. > process, softirq, hardirq & nmi. > > KCSAN reported the following race: > > Reported by Kernel Concurrency Sanitizer on: > CPU: 60 UID: 0 PID: 5425 ... 6.16.0-rc3-next-20250626 #1 NONE > Tainted: [E]=UNSIGNED_MODULE > Hardware name: ... > ================================================================== > ================================================================== > BUG: KCSAN: data-race in css_rstat_flush / css_rstat_updated > write to 0xffffe8fffe1c85f0 of 8 bytes by task 1061 on cpu 1: > css_rstat_flush+0x1b8/0xeb0 > __mem_cgroup_flush_stats+0x184/0x190 > flush_memcg_stats_dwork+0x22/0x50 > process_one_work+0x335/0x630 > worker_thread+0x5f1/0x8a0 > kthread+0x197/0x340 > ret_from_fork+0xd3/0x110 > ret_from_fork_asm+0x11/0x20 > read to 0xffffe8fffe1c85f0 of 8 bytes by task 3551 on cpu 15: > css_rstat_updated+0x81/0x180 > mod_memcg_lruvec_state+0x113/0x2d0 > __mod_lruvec_state+0x3d/0x50 > lru_add+0x21e/0x3f0 > folio_batch_move_lru+0x80/0x1b0 > __folio_batch_add_and_move+0xd7/0x160 > folio_add_lru_vma+0x42/0x50 > do_anonymous_page+0x892/0xe90 > __handle_mm_fault+0xfaa/0x1520 > handle_mm_fault+0xdc/0x350 > do_user_addr_fault+0x1dc/0x650 > exc_page_fault+0x5c/0x110 > asm_exc_page_fault+0x22/0x30 > value changed: 0xffffe8fffe18e0d0 -> 0xffffe8fffe1c85f0 > > $ ./scripts/faddr2line vmlinux css_rstat_flush+0x1b8/0xeb0 > css_rstat_flush+0x1b8/0xeb0: > init_llist_node at include/linux/llist.h:86 > (inlined by) llist_del_first_init at include/linux/llist.h:308 > (inlined by) css_process_update_tree at kernel/cgroup/rstat.c:148 > (inlined by) css_rstat_updated_list at kernel/cgroup/rstat.c:258 > (inlined by) css_rstat_flush at kernel/cgroup/rstat.c:389 > > $ ./scripts/faddr2line vmlinux css_rstat_updated+0x81/0x180 > css_rstat_updated+0x81/0x180: > css_rstat_updated at kernel/cgroup/rstat.c:90 (discriminator 1) > > These are expected race and a simple READ_ONCE/WRITE_ONCE resolves these > reports. Tejun privately communicated that though the race is benign, we should document it better instead of just silencing kcsan. More specifically the llist_on_list() check on the update side and the init_llist_node() reset on the flush side needs to coornidate to guarantee that either the updater should get false from llist_on_list() and it adds the node to the lockless list or the flush side get the updated stats from concurrent updaters. To guarantee that, on the update side we need a barrier between stats update and llist_on_list() check and on the flush side, a barrier in between init_llist_node() and the actual stats flush. However do we really need such a guarantee and can we be fine with the race? Particularly the update side is very performance critical path and adding a barrier might be very costly. I think this race is benign and we can just document it and then ignore it. Tejun, is something like following acceptable? I know you mentioned to add the barrier on the flush but I am wondering if we are not adding barrier on the update side, what's the point to add it on the flush side. Let me know if you still prefer to add the barrier on the flush side. diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index c8a48cf83878..02258b43abb3 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -86,8 +86,12 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu) return; rstatc = css_rstat_cpu(css, cpu); - /* If already on list return. */ - if (llist_on_list(&rstatc->lnode)) + /* + * If already on list return. + * + * TODO: add detailed comment on the potential race. + */ + if (data_race(llist_on_list(&rstatc->lnode))) return; /* @@ -145,9 +149,13 @@ static void css_process_update_tree(struct cgroup_subsys *ss, int cpu) struct llist_head *lhead = ss_lhead_cpu(ss, cpu); struct llist_node *lnode; - while ((lnode = llist_del_first_init(lhead))) { + while ((lnode = data_race(llist_del_first_init(lhead)))) { struct css_rstat_cpu *rstatc; + /* + * TODO: add detailed comment and maybe smp_mb(). + */ + rstatc = container_of(lnode, struct css_rstat_cpu, lnode); __css_process_update_tree(rstatc->owner, cpu); }