From: Mateusz Guzik
Date: Thu, 24 Apr 2025 17:20:59 +0200
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
To: Pedro Falcato
Cc: Harry Yoo, Vlastimil Babka, Christoph Lameter, David Rientjes, Andrew Morton, Dennis Zhou, Tejun Heo, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20250424080755.272925-1-harry.yoo@oracle.com>

On Thu, Apr 24, 2025 at 1:28 PM Pedro Falcato wrote:
> > How to do this with slab constructors and destructors: the constructor
> > allocates percpu memory, and the destructor frees it when the slab pages
> > are reclaimed; this slightly alters the constructor’s semantics,
> > as it can now fail.
> >
>
> I really really really really don't like this.
> We're opening a pandora's box of locking issues for slab deadlocks and
> other subtle issues. IMO the best solution there would be, what, failing
> dtors? which says a lot about the whole situation...
>

I noted the need to use leaf spinlocks in my IRC conversations with
Harry and later in this very thread. It is a bummer this bit did not
make it into the cover letter; hopefully it would have avoided this
exchange.

I'm going to summarize this again here:

By API contract the dtor can only take a leaf spinlock, in this case one
which:
1. disables irqs
2. is the last lock in the dependency chain, as in no locks are taken
   while holding it

That way there is no possibility of a deadlock.

This poses the question of how to enforce it, and this bit is easy: for
example, one can add a leaf-spinlock notion to lockdep. Then a misuse on
the allocation side is going to complain immediately, even without
triggering reclaim. Further, if one felt so inclined, a test module
could walk the list of all slab caches and do a populate/reclaim cycle
on those with the ctor/dtor pair.

Then there is the matter of particular consumers being ready to do what
they need to on the dtor side with only the spinlock. This does not
sound like a fundamental problem.

> Case in point:
> What happens if you allocate a slab and start ->ctor()-ing objects, and then
> one of the ctors fails? We need to free the ctor, but not without ->dtor()-ing
> everything back (AIUI this is not handled in this series, yet). Besides this
> complication, if failing dtors were added into the mix, we'd be left with a
> half-initialized slab(!!) in the middle of the cache waiting to get freed,
> without being able to.
>

Per my previous paragraph, failing dtors would be a self-induced problem.

I can agree one has to roll things back if ctors don't work out, but I
don't think this poses a significant problem.
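
For illustration, the rollback could look roughly like the userspace
sketch below. This is not code from the series: obj_ctor(), obj_dtor(),
slab_construct_all(), fail_at and live_objs are all made-up names, the
boolean stands in for the percpu allocation, and fail_at is just a fault
injection knob. The only point is that a ctor failure at index i means
running the (non-failing) dtor on objects [0, i) before giving the slab
back.

```c
#include <assert.h>
#include <stdbool.h>

#define OBJS_PER_SLAB 8

/* Hypothetical object; 'constructed' stands in for "owns percpu memory". */
struct obj {
	bool constructed;
};

int fail_at = -1;	/* hypothetical fault injection: index whose ctor fails */
int live_objs;		/* number of currently constructed objects */

/* Hypothetical ctor that is allowed to fail: 0 on success, -1 on failure. */
int obj_ctor(struct obj *o, int idx)
{
	if (idx == fail_at)
		return -1;
	o->constructed = true;
	live_objs++;
	return 0;
}

/*
 * Hypothetical dtor: cannot fail. In the kernel it would be restricted
 * to taking a leaf spinlock; there is no locking in this sketch.
 */
void obj_dtor(struct obj *o)
{
	assert(o->constructed);
	o->constructed = false;
	live_objs--;
}

/* Construct every object in the slab, or none of them. */
int slab_construct_all(struct obj *slab, int n)
{
	for (int i = 0; i < n; i++) {
		if (obj_ctor(&slab[i], i) != 0) {
			/* Only objects [0, i) were constructed; undo in reverse. */
			while (--i >= 0)
				obj_dtor(&slab[i]);
			return -1;
		}
	}
	return 0;
}
```

With fail_at set to 5, slab_construct_all() on an 8-object slab returns
-1 and leaves live_objs at 0, so the caller never sees a
half-initialized slab.
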
> Then there are obviously other problems like: whatever you're calling must
> not ever require the slab allocator (directly or indirectly) and must not
> do direct reclaim (ever!), at the risk of a deadlock. The pcpu allocator
> is a no-go (AIUI!) already because of such issues.
>

I don't see how that's true.

> Then there's the separate (but adjacent, particularly as we're considering
> this series due to performance improvements) issue that the ctor() and
> dtor() interfaces are terrible, in the sense that they do not let you batch
> in any way shape or form (requiring us to lock/unlock many times, allocate
> many times, etc). If this is done for performance improvements, I would prefer
> a superior ctor/dtor interface that takes something like a slab iterator and
> lets you do these things.
>

Batching this is also something I mentioned and indeed is a "nice to
have" change. Note however that the work you are suggesting to batch is
currently done on every alloc/free cycle, so doing it once per creation
of a given object instead is already a win.

-- 
Mateusz Guzik