Message-ID: <07e4e8d9-2588-41bf-89d4-328ca6afd263@suse.cz>
Date: Tue, 13 May 2025 09:15:23 +0200
Subject: Re: [PATCH 0/4] memcg: nmi-safe kmem charging
From: Vlastimil Babka
To: Shakeel Butt
Cc: Andrew Morton, Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Muchun Song, Alexei Starovoitov, Sebastian Andrzej Siewior,
 bpf@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, Meta kernel team, Mathieu Desnoyers,
 Peter Zijlstra
References: <20250509232859.657525-1-shakeel.butt@linux.dev>
 <2e2f0568-3687-4574-836d-c23d09614bce@suse.cz>

On 5/12/25 21:12, Shakeel Butt wrote:
> I forgot to CC Tejun, so doing it now.
>
> On Mon, May 12, 2025 at 05:56:09PM +0200, Vlastimil Babka wrote:
>> On 5/10/25 01:28, Shakeel Butt wrote:
>> > BPF programs can trigger memcg charged kernel allocations in nmi
>> > context. However memcg charging infra for kernel memory is not equipped
>> > to handle nmi context. This series adds support for kernel memory
>> > charging for nmi context.
>> >
>> > The initial prototype tried to make memcg charging infra for kernel
>> > memory re-entrant against irq and nmi. However upon realizing that
>> > this_cpu_* operations are not safe on all architectures (Tejun), this
>>
>> I assume it was an off-list discussion?
>> Could we avoid this for the architectures where these are safe, which should
>> be the major ones I hope?
>
> Yes it was an off-list discussion. The discussion was more about the
> this_cpu_* ops vs atomic_* ops as on x86 this_cpu_* does not have lock
> prefix and how I should prefer this_cpu_* over atomic_* for my series on
> objcg charging without disabling irqs. Tejun pointed out this_cpu_* are
> not nmi safe for some archs and it would be better to handle nmi context
> separately. So, I am not that worried about optimizing for NMI context

Well, we're introducing an in_nmi() check and different execution paths to all
charging. This could be e.g.
compiled out for architectures where this_cpu_* is NMI safe or that don't have
NMIs in the first place.

> but your next comment on generic_atomic64_* ops is giving me headache.
>
>>
>> > series took a different approach targeting only nmi context. Since the
>> > number of stats that are updated in kernel memory charging path are 3,
>> > this series added special handling of those stats in nmi context rather
>> > than making all >100 memcg stats nmi safe.
>>
>> Hmm so from patches 2 and 3 I see this relies on atomic64_add().
>> But AFAIU lib/atomic64.c has the generic fallback implementation for
>> architectures that don't know better, and that would be using the "void
>> generic_atomic64_##op" macro, which AFAICS is doing:
>>
>>         local_irq_save(flags);                                  \
>>         arch_spin_lock(lock);                                   \
>>         v->counter c_op a;                                      \
>>         arch_spin_unlock(lock);                                 \
>>         local_irq_restore(flags);                               \
>>
>> so in case of a nmi hitting after the spin_lock this can still deadlock?
>>
>> Hm or is there some assumption that we only use these paths when already
>> in_nmi() and then another nmi can't come in that context?
>>
>> But even then, flush_nmi_stats() in patch 1 isn't done in_nmi() and uses
>> atomic64_xchg() which in generic_atomic64_xchg() implementation also has the
>> irq_save+spin_lock. So can't we deadlock there?
>
> I was actually assuming that atomic_* ops are safe against nmis for all
> archs. I looked at atomic_* ops in include/asm-generic/atomic.h and it
> is using arch_cmpxchg() for CONFIG_SMP and it seems like for archs with
> cmpxchg should be fine against nmi. I am not sure why atomic64_* are not
> using arch_cmpxchg() instead. I will dig more.

Yeah, I've found https://docs.kernel.org/core-api/local_ops.html and since it
listed Mathieu, we discussed this on IRC. He mentioned the same thing: atomic_*
ops are fine, but the later added 64bit variant isn't, which PeterZ (who added
it) acknowledged.
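
To make the deadlock concern concrete, here is a minimal userspace sketch of the quoted fallback pattern. It is only a model under stated assumptions: `model_lock`, `model_counter`, `model_atomic64_add()` and `model_nmi_during_update()` are hypothetical names, not kernel symbols, and a try-lock stands in for arch_spin_lock() so that the "would spin forever" case is reported instead of hanging. Disabling irqs is not modelled because it would not keep an NMI out anyway.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the spinlock taken by the generic fallback. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static int64_t model_counter;

/*
 * Models the locked read-modify-write of the generic fallback.
 * Returns false in the case where a real spinlock would spin forever
 * because the lock is already held by the context we interrupted.
 */
static bool model_atomic64_add(int64_t a)
{
	if (atomic_flag_test_and_set(&model_lock))
		return false;		/* lock held: this is the deadlock */
	model_counter += a;
	atomic_flag_clear(&model_lock);
	return true;
}

/* Models an NMI landing between the lock and unlock of an update. */
static bool model_nmi_during_update(void)
{
	bool was_held, nmi_ok;

	/* The interrupted context takes the lock first. */
	was_held = atomic_flag_test_and_set(&model_lock);
	assert(!was_held);

	/* The "NMI handler" runs the same operation on the same CPU. */
	nmi_ok = model_atomic64_add(1);

	/* The interrupted context resumes and unlocks. */
	atomic_flag_clear(&model_lock);
	return nmi_ok;
}
```

In this model a plain `model_atomic64_add(1)` succeeds, while `model_nmi_during_update()` returns false, which corresponds to the NMI handler spinning on a lock that its own CPU can never release.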

But there could be a way out if we could somehow compile-time assert that one
of the following is true:

- CONFIG_HAVE_NMI=n - we can compile out all the nmi code
- this_cpu is safe on that arch - we can also compile out the nmi code
- (if the above leaves any 64bit arch) its 64bit atomics implementation is safe
- (if there are any 32bit applicable arch left) 32bit atomics should be enough
  for the nmi counters even with >4GB memory as we flush them? and we know the
  32bit ops are safe

> I also have the followup series on objcg charging without irq almost
> ready. I will send it out as rfc soon.
>
> Thanks a lot for awesome and insightful comments.
> Shakeel
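
The four conditions listed above could be expressed roughly as follows. This is a sketch, not a proposal for the actual patch: all `MODEL_*` macros and the `nmi_charge_mode` enum are hypothetical placeholders for whatever Kconfig symbols or arch headers would really provide these facts, and the defaults chosen here (an arch with NMIs, this_cpu ops not NMI safe, 32bit) are arbitrary.

```c
#include <assert.h>

/* Hypothetical per-arch facts; in the kernel these would come from Kconfig. */
#ifndef MODEL_HAVE_NMI
#define MODEL_HAVE_NMI 1		/* stands in for CONFIG_HAVE_NMI */
#endif
#ifndef MODEL_THIS_CPU_NMI_SAFE
#define MODEL_THIS_CPU_NMI_SAFE 0	/* this_cpu_* ops safe against NMI? */
#endif
#ifndef MODEL_64BIT
#define MODEL_64BIT 0
#endif
#ifndef MODEL_ATOMIC64_NMI_SAFE
#define MODEL_ATOMIC64_NMI_SAFE 0	/* i.e. not the spinlock fallback */
#endif

enum nmi_charge_mode {
	NMI_CODE_COMPILED_OUT,	/* no NMIs, or this_cpu_* ops are NMI safe */
	NMI_USE_ATOMIC64,	/* 64bit arch with NMI-safe atomic64 ops */
	NMI_USE_ATOMIC32,	/* 32bit arch: rely on 32bit atomics + flushing */
};

/* The compile-time assert: a 64bit arch needing nmi code must have safe atomic64. */
static_assert(!(MODEL_HAVE_NMI && !MODEL_THIS_CPU_NMI_SAFE && MODEL_64BIT) ||
	      MODEL_ATOMIC64_NMI_SAFE,
	      "64bit arch with NMIs needs NMI-safe atomic64 ops");

#if !MODEL_HAVE_NMI || MODEL_THIS_CPU_NMI_SAFE
#define MODEL_NMI_MODE NMI_CODE_COMPILED_OUT
#elif MODEL_64BIT
#define MODEL_NMI_MODE NMI_USE_ATOMIC64
#else
#define MODEL_NMI_MODE NMI_USE_ATOMIC32
#endif

/* Which path the charging code would take on this (modelled) arch. */
static inline enum nmi_charge_mode model_nmi_mode(void)
{
	return MODEL_NMI_MODE;
}
```

With the default values the sketch selects `NMI_USE_ATOMIC32`; building with `-DMODEL_THIS_CPU_NMI_SAFE=1` selects `NMI_CODE_COMPILED_OUT`, and `-DMODEL_64BIT=1` without `-DMODEL_ATOMIC64_NMI_SAFE=1` fails the build at the static assertion.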