From: Mateusz Guzik <mjguzik@gmail.com>
Date: Wed, 30 Apr 2025 21:49:02 +0200
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
To: Pedro Falcato
Cc: Harry Yoo, Vlastimil Babka, Christoph Lameter, David Rientjes, Andrew Morton, Dennis Zhou, Tejun Heo, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20250424080755.272925-1-harry.yoo@oracle.com>
On Fri, Apr 25, 2025 at 12:42 PM Pedro Falcato wrote:
> With regards to "leaf locks", I still don't really understand what
> you/Mateusz mean or how that's even enforceable from the get-go.
>
> So basically:
>
> - ->ctor takes more args, can fail, can do fancier things (multiple
>   allocations, lock holding, etc.; can be hidden behind a normal
>   kmem_cache_alloc; certain caches become GFP_ATOMIC-incompatible)
>
> - ->dtor *will* do fancy things like recursing back onto the slab
>   allocator and grabbing locks
>
> - a normal kmem_cache_free can suddenly attempt to grab !SLUB locks as
>   it tries to dispose of slabs. It can also uncontrollably do $whatever.
>
> - a normal kmem_cache_alloc can call vast swaths of code,
>   uncontrollably, due to ->ctor. It can also set off direct reclaim,
>   and thus run into all sorts of kmem_cache_free/slab disposal issues
>
> - a normal, possibly-unrelated GFP_KERNEL allocation can also run into
>   all of these issues purely by starting up shrinkers on direct reclaim
>
> - the whole original "slab object caching allocator" idea from 1992 is
>   extremely confusing and works super poorly with various debugging
>   features (e.g., KASAN). IMO it should really be reserved (in a
>   limited capacity!) for stuff like TYPESAFE_BY_RCU, which we *really*
>   need.
>
> These are basically my issues with the whole idea. I highly disagree
> that we should open this Pandora's box for problems in *other places*.

Apologies for the late reply.

It looks like your primary apprehension concerns the dtor, so I'm going
to address that below. But first a quick remark: the headline problem
here is expensive single-threaded work which keeps happening on every mm
alloc/free and which does not have to, extending beyond the percpu
allocator. Having a memory allocator which can handle it would be most
welcome.

Now to business.

I'll start by pointing out that dtors callable from any context are not
an *inherent* requirement of the idea.
Given that sheaves apparently don't do direct reclaim, and that
Christoph's idea does not do it either, I think there is some support
for objects with unsafe dtors *not* being directly reclaimable (instead
a dedicated workqueue or some other mechanism can sort them out). I did
not realize something like this would be considered fine. It is the
easiest way out and is perfectly fine with me.

However, suppose objects with dtors do need to be reclaimable the usual
way. I claim that writing dtors which are safe to use in that context is
not a significant challenge. Moreover, it is possible to extend lockdep
to validate correct behavior, and test code can trigger ctor and dtor
calls for all slabs so that all of this code executes at least once with
lockdep enabled. So while *honest* mistakes with locking are very much
possible, they will be trivially caught, and I don't believe the box
being opened here belongs to Pandora.

So here is another attempt at explaining leaf spinlocks.

Suppose you have a global lock named "crapper". Further suppose the lock
is only taken with _irqsave *and* no locks are taken while holding it.
Say this is the only consumer:

void crapperbump(void)
{
        unsigned long flags;

        spin_lock_irqsave(&crapper, flags);
        mehvar++;
        spin_unlock_irqrestore(&crapper, flags);
}

Perhaps you can agree *anyone* can call here at any point and not risk
deadlocking. That's an example of a leaf lock.

Aight, so how does one combat cases where the code turns into:

        spin_lock_irqsave(&crapper, flags);
        spin_lock_irqsave(&meh, flags2);

In this case "crapper" is no longer a leaf lock, and in principle there
may be a lock ordering involving "meh" which does deadlock. Here is an
example way out: when initializing the "crapper" lock, mark it as a leaf
lock so that lockdep can check for it. Then, on a lockdep-enabled
kernel, you get a splat the moment the routine gets to locking "meh".

This sorts out the ctor side of things. How does one validate the dtor?
lockdep can have an "only leaf locks allowed in this area" tunable
around calls to dtors. Then, should a dtor get ideas about acquiring a
lock which is not a leaf lock, you are once more going to get a splat.

And of course you can just force-call all ctors and dtors on a debug
kernel (no need to trigger any memory pressure; just walk the list of
slab caches with ctor + dtor pairs and call them).

This would require some effort to implement, but no rocket science
degree is required.

However, given that direct reclaim for mm is apparently not a strict
requirement, my preferred way out is to simply not provide it.

-- 
Mateusz Guzik