From: Yosry Ahmed
To: Shakeel Butt
Cc: Tejun Heo, Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Michal Koutný, Vlastimil Babka, Sebastian Andrzej Siewior, JP Kobryn, bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: Re: [RFC PATCH 3/3] cgroup: make css_rstat_updated nmi safe
Date: Wed, 7 May 2025 06:52:35 +0000
References: <20250429061211.1295443-1-shakeel.butt@linux.dev> <20250429061211.1295443-4-shakeel.butt@linux.dev> <6u7ccequ5ye3e4iqblcdeqsigindo3xjpsvkdb6hyaw7cpjddc@u2ujv7ymlxc6>

On Tue, May 06, 2025 at 12:30:18PM -0700, Shakeel Butt wrote:
> On Tue, May 06, 2025 at 09:41:04AM +0000, Yosry Ahmed wrote:
> > On Thu, May 01, 2025 at 03:10:20PM -0700, Shakeel Butt wrote:
> > > On Wed, Apr 30, 2025 at 06:14:28AM -0700, Yosry Ahmed wrote:
> > > [...]
> > > > > +
> > > > > +	if (!_css_rstat_cpu_trylock(css, cpu, &flags)) {
> > > > >
> > > >
> > > > IIUC this trylock will only fail if a BPF program runs in NMI context
> > > > and tries to update cgroup stats, interrupting a context that is already
> > > > holding the lock (i.e. updating or flushing stats).
> > > >
> > >
> > > Correct (though note that flushing side can be on a different CPU).
> > >
> > > > How often does this happen in practice tho? Is it worth the complexity?
> > >
> > > This is about correctness, so even a chance of occurrence needs a
> > > solution.
> >
> > Right, my question was more about the need to special case NMIs, see
> > below.
> >
> > >
> > > >
> > > > I wonder if it's better if we make css_rstat_updated() inherently
> > > > lockless instead.
> > > >
> > > > What if css_rstat_updated() always just adds to a lockless tree,
> > >
> > > Here I assume you meant lockless list instead of tree.
> >
> > Yeah, in a sense. I meant using lockless lists to implement the rstat
> > tree instead of normal linked lists.
> >
> > >
> > > > and we
> > > > defer constructing the proper tree to the flushing side? This should
> > > > make updates generally faster and avoids locking or disabling interrupts
> > > > in the fast path. We essentially push more work to the flushing side.
> > > >
> > > > We may be able to consolidate some of the code too if all the logic
> > > > manipulating the tree is on the flushing side.
> > > >
> > > > WDYT? Am I missing something here?
> > > >
> > >
> > > Yes this can be done but I don't think we need to tie that to current
> > > series. I think we can start with lockless in the nmi context and then
> > > iteratively make css_rstat_updated() lockless for all contexts.
> >
> > My question is basically whether it would be simpler to actually make it
> > all lockless than special casing NMIs. With this patch we have two
> > different paths and a deferred list that we process at a later point. I
> > think it may be simpler if we just make it all lockless to begin with.
> > Then we would have a single path and no special deferred processing.
> >
> > WDYT?
>
> So, in the update side, always add to the lockless list (if not already)
> and on the flush side, build the update tree from the lockless list and
> flush it.

Exactly, yes.

> Hopefully this tree building and flushing can be done in a
> more optimized way. Is this what you are suggesting?

Yes, but this latter part can be a follow-up if it's not straightforward.
For now we can just use a lockless list on the update side and move
updating the tree (i.e. updated_next and updated_children) to the flush
side.
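
To make the idea concrete, here is a rough userspace sketch using C11
atomics. It is not kernel code and not taken from the series. All names
(struct rstat_node, struct rstat_list, rstat_updated(), rstat_flush(),
the "queued" flag) are made up for illustration; an in-kernel version
would presumably use the existing llist or per-cpu cmpxchg helpers
instead. The update side only pushes a node onto a lock-free list if it
is not already there, and the flush side detaches the whole list at once
and does the tree work afterwards (shown only as a comment).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup, per-CPU node; not the kernel's structure. */
struct rstat_node {
	struct rstat_node *next;	/* link in the lockless list */
	atomic_bool queued;		/* true while the node is on the list */
	int flush_count;		/* stand-in for real flush work */
};

/* Hypothetical per-CPU list head. */
struct rstat_list {
	_Atomic(struct rstat_node *) first;
};

/* Update side: no lock taken and no interrupts disabled. */
static void rstat_updated(struct rstat_list *list, struct rstat_node *node)
{
	struct rstat_node *old;

	/* Only the context that flips queued from false to true links it. */
	if (atomic_exchange(&node->queued, true))
		return;

	old = atomic_load(&list->first);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&list->first, &old, node));
}

/* Flush side: detach everything in one step, then process privately. */
static int rstat_flush(struct rstat_list *list)
{
	struct rstat_node *node = atomic_exchange(&list->first, NULL);
	int flushed = 0;

	while (node) {
		/* Read the link first: the node may be re-queued once cleared. */
		struct rstat_node *next = node->next;

		assert(atomic_load(&node->queued));
		atomic_store(&node->queued, false);

		/*
		 * This is where the flusher would link the node into the
		 * updated tree (updated_next / updated_children) and flush it.
		 */
		node->flush_count++;
		flushed++;
		node = next;
	}
	return flushed;
}
```

With this shape, calling rstat_updated() twice on the same node before a
flush queues it only once, and an update that interrupts another update
on the same node simply sees queued already set and returns. The cost is
that all of the ordering and tree construction moves to the flusher.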