Date: Fri, 25 Apr 2025 11:42:27 +0100
From: Pedro Falcato
To: Harry Yoo
Cc: Vlastimil Babka, Christoph Lameter, David Rientjes, Andrew Morton,
 Dennis Zhou, Tejun Heo, Mateusz Guzik, Jamal Hadi Salim, Cong Wang,
 Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park,
 linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the
 percpu allocator scalability problem
References: <20250424080755.272925-1-harry.yoo@oracle.com>

On Fri, Apr 25, 2025 at 07:12:02PM +0900, Harry Yoo wrote:
> On Thu, Apr 24, 2025 at 12:28:37PM +0100, Pedro Falcato wrote:
> > On Thu, Apr 24, 2025 at 05:07:48PM +0900, Harry Yoo wrote:
> > > Overview
> > > ========
> > >
> > > The slab destructor feature existed in early days of slab allocator(s).
> > > It was removed by the commit c59def9f222d ("Slab allocators: Drop support
> > > for destructors") in 2007 due to lack of serious use cases at that time.
> > >
> > > Eighteen years later, Mateusz Guzik proposed [1] re-introducing a slab
> > > constructor/destructor pair to mitigate the global serialization point
> > > (pcpu_alloc_mutex) that occurs when each slab object allocates and frees
> > > percpu memory during its lifetime.
> > >
> > > Consider mm_struct: it allocates two percpu regions (mm_cid and rss_stat),
> > > so each allocate–free cycle requires two expensive acquire/release on
> > > that mutex.
> > >
> > > We can mitigate this contention by retaining the percpu regions after
> > > the object is freed and releasing them only when the backing slab pages
> > > are freed.
> > >
> > > How to do this with slab constructors and destructors: the constructor
> > > allocates percpu memory, and the destructor frees it when the slab pages
> > > are reclaimed; this slightly alters the constructor’s semantics,
> > > as it can now fail.
> > >
> >
> > I really really really really don't like this. We're opening a pandora's box
> > of locking issues for slab deadlocks and other subtle issues. IMO the best
> > solution there would be, what, failing dtors? which says a lot about the whole
> > situation...
> >
> > Case in point:
>
> <...snip...>
>
> > Then there are obviously other problems like: whatever you're calling must
> > not ever require the slab allocator (directly or indirectly) and must not
> > do direct reclaim (ever!), at the risk of a deadlock. The pcpu allocator
> > is a no-go (AIUI!) already because of such issues.
>
> Could you please elaborate more on this?

Well, as discussed multiple times both on- and off-list, the pcpu allocator is
not a problem here because the freeing path takes a spinlock, not a mutex. But
obviously you can see the fun locking horror dependency chains we're creating
with this patchset.
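
As a purely illustrative aside (not code from the patchset or from the kernel),
the scheme in the quoted overview can be modeled in userspace. The sketch below
assumes a hypothetical ToyCache class in which a single object pool stands in
for slab pages, a ctor that is allowed to fail, and a dtor that runs only when
the pool is disposed of. It is written in Python for brevity, and every name in
it is made up for this example; real SLUB tracks objects per slab and behaves
quite differently.

```python
class ToyCache:
    """Userspace toy model of a slab cache with a ctor/dtor pair (not kernel code)."""

    def __init__(self, objs_per_slab, ctor, dtor):
        self.objs_per_slab = objs_per_slab
        self.ctor = ctor        # returns False to signal failure
        self.dtor = dtor
        self.free_objs = []     # constructed objects waiting for reuse
        self.in_use = 0

    def alloc(self):
        if not self.free_objs:
            # "Grow" the cache: construct a whole slab's worth of objects.
            slab = []
            for _ in range(self.objs_per_slab):
                obj = {}
                if not self.ctor(obj):
                    # ctor failed: tear down the partially built slab.
                    for done in slab:
                        self.dtor(done)
                    return None
                slab.append(obj)
            self.free_objs = slab
        self.in_use += 1
        return self.free_objs.pop()

    def free(self, obj):
        # The object keeps its constructed state while it sits on the free list.
        self.free_objs.append(obj)
        self.in_use -= 1
        if self.in_use == 0:
            # Cache is idle: dispose of the pool, running dtor on every object.
            for o in self.free_objs:
                self.dtor(o)
            self.free_objs = []


events = []

def ctor(obj):
    obj["pcpu"] = object()      # stands in for a percpu allocation
    events.append("ctor")
    return True

def dtor(obj):
    obj.pop("pcpu")             # stands in for freeing the percpu region
    events.append("dtor")

cache = ToyCache(2, ctor, dtor)
a = cache.alloc()               # builds the pool: ctor runs twice
b = cache.alloc()
cache.free(a)
c = cache.alloc()               # reuses a's object, no ctor call
cache.free(b)
cache.free(c)                   # last free disposes of the pool: dtor runs twice
print(events)
```

In this model the final free() is the call that ends up running both
destructors. That is harmless here, but it is the shape of the concern: the
caller of a plain free has no say in what the dtor does or which locks it
takes.
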

->ctor() needs to be super careful calling things, avoiding any sort of loop.
->dtor() needs to be super careful calling things, avoiding _any_ sort of
direct reclaim possibilities. You also now need to pass a gfp_t to both ->ctor
and ->dtor.

With regards to "leaf locks", I still don't really understand what you/Mateusz
mean or how that's even enforceable from the get-go.

So basically:
- ->ctor takes more args, can fail, can do fancier things (multiple
  allocations, lock holding, etc., which can be hidden behind a normal
  kmem_cache_alloc; certain caches become GFP_ATOMIC-incompatible)
- ->dtor *will* do fancy things like recursing back onto the slab allocator
  and grabbing locks
- a normal kmem_cache_free can suddenly attempt to grab !SLUB locks as it
  tries to dispose of slabs. It can also uncontrollably do $whatever.
- a normal kmem_cache_alloc can call vast swaths of code, uncontrollably, due
  to ->ctor. It can also set off direct reclaim, and thus run into all sorts
  of kmem_cache_free/slab disposal issues
- a normal, possibly-unrelated GFP_KERNEL allocation can also run into all of
  these issues purely by starting up shrinkers on direct reclaim as well.
- the whole original "Slab object caching allocator" idea from 1992 is
  extremely confusing and works super poorly with various debugging features
  (like, e.g., KASAN). IMO it should really be reserved (in a limited
  capacity!) for stuff like TYPESAFE_BY_RCU, that we *really* need.

These are basically my issues with the whole idea. I highly disagree that we
should open this Pandora's box for problems in *other places*.

-- 
Pedro