From: Mateusz Guzik <mjguzik@gmail.com>
Date: Thu, 24 Apr 2025 18:11:31 +0200
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
To: Pedro Falcato
Cc: Harry Yoo, Vlastimil Babka, Christoph Lameter, David Rientjes, Andrew Morton, Dennis Zhou, Tejun Heo, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20250424080755.272925-1-harry.yoo@oracle.com>
On Thu, Apr 24, 2025 at 5:20 PM Mateusz Guzik wrote:
>
> On Thu, Apr 24, 2025 at 1:28 PM Pedro Falcato wrote:
> > > How to do this with slab constructors and destructors: the constructor
> > > allocates percpu memory, and the destructor frees it when the slab pages
> > > are reclaimed; this slightly alters the constructor's semantics,
> > > as it can now
> > > fail.
> >
> > I really really really really don't like this. We're opening a Pandora's box
> > of locking issues for slab deadlocks and other subtle issues. IMO the best
> > solution there would be, what, failing dtors? Which says a lot about the whole
> > situation...
> >
>
> I noted the need to use leaf spin locks in my IRC conversations with
> Harry and later in this very thread; it is a bummer this bit did not
> make it into the cover letter -- hopefully it would have avoided this
> exchange.
>
> I'm going to summarize this again here:
>
> By API contract the dtor can only take a leaf spinlock, in this case one which:
> 1. disables irqs
> 2. is the last lock in the dependency chain, as in no locks are taken
>    while holding it
>
> That way there is no possibility of a deadlock.
>
> This poses the question of how to enforce it, and this bit is easy: for
> example, one can add a leaf-spinlock notion to lockdep. Then a misuse on
> the allocation side is going to complain immediately, even without
> triggering reclaim. Further, if one felt so inclined, a test module
> could walk the list of all slab caches and do a populate/reclaim cycle
> on those with a ctor/dtor pair.
>
> Then there is the matter of particular consumers being ready to do what
> they need on the dtor side with only the spinlock held. That does not
> sound like a fundamental problem.
>
> > Case in point:
> > What happens if you allocate a slab and start ->ctor()-ing objects, and then
> > one of the ctors fails? We need to free the slab, but not without ->dtor()-ing
> > everything back (AIUI this is not handled in this series, yet). Besides this
> > complication, if failing dtors were added into the mix, we'd be left with a
> > half-initialized slab(!!) in the middle of the cache waiting to get freed,
> > without being able to.
> >
>
> Per my previous paragraph, failing dtors would be a self-induced problem.
>
> I can agree one has to roll things back if ctors don't work out, but I
> don't think this poses a significant problem.
>
> > Then there are obviously other problems like: whatever you're calling must
> > not ever require the slab allocator (directly or indirectly) and must not
> > do direct reclaim (ever!), at the risk of a deadlock. The pcpu allocator
> > is a no-go (AIUI!) already because of such issues.
> >
>
> I don't see how that's true.
>
> > Then there's the separate (but adjacent, particularly as we're considering
> > this series due to performance improvements) issue that the ctor() and
> > dtor() interfaces are terrible, in the sense that they do not let you batch
> > in any way, shape or form (requiring us to lock/unlock many times, allocate
> > many times, etc). If this is done for performance improvements, I would prefer
> > a superior ctor/dtor interface that takes something like a slab iterator and
> > lets you do these things.
> >
>
> Batching is also something I mentioned and indeed is a "nice to have"
> change. Note however that the work you are suggesting to batch is
> currently done on every alloc/free cycle, so doing it once per creation
> of a given object is already a win.

Whether the ctor/dtor thing lands or not, I would like to point out that
the current state is quite nasty and something(tm) needs to be done.

The mm object is allocated from a per-cpu cache, only to have the
mandatory initialization globally serialize *several* times, including
twice on the same lock. This is so bad that performance would be better
if someone created a globally-locked cache with mms still holding onto
per-cpu memory et al.

Or to put it differently, the existence of a per-cpu cache of mm objects
is defeated by the global locking endured on each alloc/free cycle (and
this goes beyond percpu memory allocs for counters). So another idea
would be to instead create a cache with *some* granularity (say 4 or 8
CPU threads per instance).
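As a rough userspace illustration of that sharded-cache idea (hypothetical names throughout; the 8-CPUs-per-shard ratio is just one of the numbers floated above): one freelist per group of CPUs, so an alloc/free cycle contends only within its shard rather than on one global lock.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define CPUS_PER_SHARD	8	/* the "4 or 8 cpu threads per instance" knob */
#define NR_SHARDS	64

struct shard {
	pthread_mutex_t lock;
	void *freelist;
};

static struct shard shards[NR_SHARDS];

static void shards_init(void)
{
	for (int i = 0; i < NR_SHARDS; i++)
		pthread_mutex_init(&shards[i].lock, NULL);
}

/* Pure mapping from CPU number to shard index, kept separate so the
 * granularity policy is easy to see (and to test). */
static int shard_index(int cpu)
{
	return (cpu / CPUS_PER_SHARD) % NR_SHARDS;
}

static struct shard *my_shard(void)
{
	int cpu = sched_getcpu();	/* Linux-specific; -1 on failure */

	return &shards[shard_index(cpu < 0 ? 0 : cpu)];
}

static void *shard_alloc(void)
{
	struct shard *s = my_shard();
	void *p;

	pthread_mutex_lock(&s->lock);
	p = s->freelist;
	if (p)
		s->freelist = *(void **)p;
	pthread_mutex_unlock(&s->lock);
	return p ? p : malloc(64);	/* slow path outside the shard lock */
}

static void shard_free(void *p)
{
	struct shard *s = my_shard();

	pthread_mutex_lock(&s->lock);
	*(void **)p = s->freelist;
	s->freelist = p;
	pthread_mutex_unlock(&s->lock);
}
```

Compared to a strict per-cpu cache, at most one freelist exists per CPUS_PER_SHARD CPUs, so fewer objects sit allocated-but-unused while the lock is still far less contended than a single global one.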
Note this should reduce the total number of mms allocated (but unused)
in the system. If mms hanging out there would still be populated like in
this patchset, perhaps the reduction in "wasted" objects would be
sufficient to ignore direct reclaim? If need be, this would instead be
reclaimable from a dedicated thread (whatever it is in Linux).

-- 
Mateusz Guzik