From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74166C3ABCC for ; Tue, 13 May 2025 11:41:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F26206B00C8; Tue, 13 May 2025 07:41:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EB0DB6B00C9; Tue, 13 May 2025 07:41:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D271C6B00CA; Tue, 13 May 2025 07:41:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id A8F606B00C8 for ; Tue, 13 May 2025 07:41:31 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id EE55ABB8AA for ; Tue, 13 May 2025 11:41:33 +0000 (UTC) X-FDA: 83437694466.19.82A0C48 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf04.hostedemail.com (Postfix) with ESMTP id 1FBE54000A for ; Tue, 13 May 2025 11:41:30 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=u+zoIWHe; spf=none (imf04.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747136492; a=rsa-sha256; cv=none; b=Tmov4Uvu0f/BOr/a+TnZafc2yBOEz0WRl7uvQWwxEcOkL6d480pkzgteghidYwABzf3ss6 XbIAWS1s+E3ic05lwG1JISIg6demL5ae0jbIADu7TY8egi+OZ0G9UlZPJ0DNZamZvObO5p /fw876dBUUDU2Zkz9n1MunmvPsiDH9U= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=u+zoIWHe; spf=none (imf04.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747136492; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BLCbEb2NTTzv4/ODVyjqYAPhHDxBZDeZKEtBF7viCvw=; b=Ey4mMNGo9AwEjMvtcxnWXomQAMIEbHt865RmC1Ix1JeHXXXWN8Wo73QEJ0fn56RgTbH1eN zGnp/NNrLgzxx4pbXssrNs0nN6RTksenJoGqLP+1sgBzFIlh1MpHiDCZNW+ayZxzzbQv7o 4WFZtYaaEr4rlT3XYxmp0XT+S3BYCNk= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=BLCbEb2NTTzv4/ODVyjqYAPhHDxBZDeZKEtBF7viCvw=; b=u+zoIWHef6JFSMGtTvvxGch/iJ awCg/a3tBJSi76YIT0JHmabu6gBZHE7JjSzk5Qvmnfm59gMkRP7Lo0ZTpQEPeDijYGq10YWfM7onf IFSBjnUnhDQbdr+dld3GHJB3G/OiA5UQD8uUwLjwfB1E8gXVDaVFZUc/iIo4bkYI/f4ezMiYW4BXh aGzQ3SocmH5QgN5qnFGoZdjDtZnbbs82Gi845Bghn0Fiwa/BVdoCA9RfXBparJWErWzNgmq5NCmFr PVzj+37HcJq3ak06M7Qyn35+jMadKBfgIr6GGvPg9DJV6b9pYRAZdpCjxsFLmmT9sSpcz8YDSgRuu NfKBM76g==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uEo0k-0000000ArbM-31WB; Tue, 13 May 2025 11:41:26 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id 2ABD930066A; Tue, 13 May 2025 13:41:26 +0200 (CEST) Date: Tue, 13 May 2025 13:41:25 +0200 From: Peter Zijlstra To: Vlastimil Babka Cc: Shakeel Butt , Andrew Morton , Tejun Heo , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Alexei Starovoitov , Sebastian Andrzej Siewior , bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team , Mathieu Desnoyers Subject: Re: [PATCH 0/4] memcg: nmi-safe kmem charging Message-ID: <20250513114125.GE25763@noisy.programming.kicks-ass.net> References: <20250509232859.657525-1-shakeel.butt@linux.dev> <2e2f0568-3687-4574-836d-c23d09614bce@suse.cz> <07e4e8d9-2588-41bf-89d4-328ca6afd263@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <07e4e8d9-2588-41bf-89d4-328ca6afd263@suse.cz> X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1FBE54000A X-Stat-Signature: 7foc75pt9hparer45ew84b6yjrxj7tpf X-HE-Tag: 1747136490-274984 X-HE-Meta: U2FsdGVkX1+W5/1HMrsEV8MMTBODET9SspmHbLYblOOYd+RVY3fM3qXHUSUSv0StRdC3xgzSItOYQwuHvfYhdDpwcPWfnro07U4piD9Rn3/XqLucNTUHplnFXFHE59W/UXSRqGW47OADiem3xvPvKAt+GvZNeiOpe8KvArycnkjuXVkPOy3a6MSlohjPLBRr65h/XgYJcvl4n8tUBFKLImksbBY0hEyaKMNkoiMeMSUOTB0Fag6DmfXNmIJ1Ygi5sv40DCBbdB/R1snmfwnAuOO6lZFrUddSErEYcrGmKQq4BQdbeDDr8A2X6i7kW8q9TIbGW4PVaBooGUvuxQbopUnqozxNIUdn2AoNCeKrF0lCHN3yZZiZM/1Q/J4m/K8FbRJdvXoC/ETnL1D/0NOwciCPDjm0wS/0gAc1YoVljMrAZGIcW5zup92rG3K2lzi3pZwpkBO4UIPzTjFRJiQ9qHxjWfz8tS0WODp2DtQ0s1u5LJwYwHAPkzIMBF/Y70q7yxRaFT1Pnyu1lPTV2sXrINEtPO7C2wsm2tLTkI0ROFcdR5Q/C3RmfM4778QFN7lc0IyjH1r9quueP1LnERDzyq9qdd+gE09Uel+tvWshnmvw8C9BlpOtchh5a0XDsjwY1SflQ7wnFB4v1Y/1Zp9kBWSwAGMqdu7mOW5uQ92VVmJUiHpL7V1EA8uZXUYqXpnh+YEFUjblwe0Olafwe6ToY9/b2Sletq5kNkmbyq4RDQD4fUPt0C9IkIxMTx2PA8rQhMRjq60pedbYhcnwGz2mKplLwdwRsxr2c0ljNPgNo602zuqOjlZUvvqdMAozqBPZNIMHeHZzcyqCqWdvy98r+/YyEQTRopY8gZOHZoE/8S2f+8L56gj7Xnzkna6GY6+tdSWIJ3FdSBNeLLQOilHQVBNxSET+AgKkuHF9UVGxQSlK/4T2vT+1z7AgR6xskrwhMomdW/Sg/yK5tcpCkpE 9/QibJzc wj9CdkIvX2E5f/SdLnmWm2t6y486zh8anx5xinEVcHG9MYiDy2ekEzaXfISXv+TbT6HYND62C4zL5JfluOVgGHvWiDALaoHEgAJGhZa16dN19DgBRRtczLz+VBEhqsyoO2fdJYcixyzX6esMfij5DpnRnYX5F/ZhAovBjq/zKcchTxO4S7LFPwLkZcHHiBy79NOvzbi8wdZRixwkc1K4szkiK9CzhrC5p70PK5DapwnXa2Ls= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, May 13, 2025 at 09:15:23AM +0200, Vlastimil Babka wrote: > >> > The initial prototype tried to make memcg charging infra for kernel > >> > memory re-entrant against irq and nmi. However upon realizing that > >> > this_cpu_* operations are not safe on all architectures (Tejun), this > >> > >> I assume it was an off-list discussion? > >> Could we avoid this for the architectures where these are safe, which should > >> be the major ones I hope? IIRC Power64 has issues here, 'funnily' their local_t is NMI safe. Perhaps we could do the same for their this_cpu_*(), but ideally someone with actual power hardware should do this ;-) > > Yes it was an off-list discussion. The discussion was more about the > > this_cpu_* ops vs atomic_* ops as on x86 this_cpu_* does not have lock > > prefix and how I should prefer this_cpu_* over atomic_* for my series on > > objcg charging without disabling irqs. Tejun pointed out this_cpu_* are > > not nmi safe for some archs and it would be better to handle nmi context > > separately. So, I am not that worried about optimizing for NMI context > > Well, we're introducing in_nmi() check and different execution paths to all > charging. This could be e.g. compiled out for architectures where this_cpu* > is NMI safe or they don't have NMIs in the first place. Very few architectures one would care about do not have NMIs. Risc-V seems to be the exception here ?!? > > but your next comment on generic_atomic64_* ops is giving me headache. > > > >> > >> > series took a different approach targeting only nmi context. Since the > >> > number of stats that are updated in kernel memory charging path are 3, > >> > this series added special handling of those stats in nmi context rather > >> > than making all >100 memcg stats nmi safe. > >> > >> Hmm so from patches 2 and 3 I see this relies on atomic64_add(). > >> But AFAIU lib/atomic64.c has the generic fallback implementation for > >> architectures that don't know better, and that would be using the "void > >> generic_atomic64_##op" macro, which AFAICS is doing: > >> > >> local_irq_save(flags); \ > >> arch_spin_lock(lock); \ > >> v->counter c_op a; \ > >> arch_spin_unlock(lock); \ > >> local_irq_restore(flags); \ > >> > >> so in case of a nmi hitting after the spin_lock this can still deadlock? > >> > >> Hm or is there some assumption that we only use these paths when already > >> in_nmi() and then another nmi can't come in that context? > >> > >> But even then, flush_nmi_stats() in patch 1 isn't done in_nmi() and uses > >> atomic64_xchg() which in generic_atomic64_xchg() implementation also has the > >> irq_save+spin_lock. So can't we deadlock there? > > > > I was actually assuming that atomic_* ops are safe against nmis for all > > archs. We have HAVE_NMI_SAFE_CMPXCHG for this -- there are architectures where this is not the case -- but again, those are typically oddball archs you don't much care about. But yes, *64 on 32bit archs is generally not NMI safe. > I looked at atomic_* ops in include/asm-generic/atomic.h and it > > is using arch_cmpxchg() for CONFIG_SMP and it seems like for archs with > > cmpxchg should be fine against nmi. I am not sure why atomic64_* are not > > using arch_cmpxchg() instead. I will dig more. Not many 32bit architectures have 64bit cmpxchg. We're only now dropping support for x86 chips without CMPXCHG8b. > Yeah I've found https://docs.kernel.org/core-api/local_ops.html and since it > listed Mathieu we discussed on IRC and he mentioned the same thing that > atomic_ ops are fine, but the later added 64bit variant isn't, which PeterZ > (who added it) acknowledged. > > But there could be way out if we could somehow compile-time assert that > either is true: > - CONFIG_HAVE_NMI=n - we can compile out all the nmi code Note that in_nmi() is not depending on HAVE_NMI -- nor can it be. Many architectures treat various traps as NMI-like, even though they might not have real NMIs. > - this_cpu is safe on that arch - we can also compile out the nmi code There is no config symbol for this presently. > - (if the above leaves any 64bit arch) its 64bit atomics implementation is safe True, only because HPPA does not in fact have NMIs. > - (if there are any 32bit applicable arch left) 32bit atomics should be > enough for the nmi counters even with >4GB memory as we flush them? and we > know the 32bit ops are safe Older ARM might qualify here.