From: Mateusz Guzik
Date: Thu, 24 Apr 2025 18:03:11 +0200
Subject: Re: [RFC PATCH 0/7] Reviving the slab destructor to tackle the percpu allocator scalability problem
To: "Christoph Lameter (Ampere)"
Cc: Harry Yoo, Vlastimil Babka, David Rientjes, Andrew Morton, Dennis Zhou, Tejun Heo, Jamal Hadi Salim, Cong Wang, Jiri Pirko, Vlad Buslov, Yevgeny Kliteynik, Jan Kara, Byungchul Park, linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20250424080755.272925-1-harry.yoo@oracle.com> <80208a6c-ec42-6260-5f6f-b3c5c2788fcd@gentwo.org>
In-Reply-To: <80208a6c-ec42-6260-5f6f-b3c5c2788fcd@gentwo.org>

On Thu, Apr 24, 2025 at 5:50 PM Christoph Lameter (Ampere) wrote:
>
> On Thu, 24 Apr 2025, Harry Yoo wrote:
>
> > Consider mm_struct: it allocates two percpu regions (mm_cid and rss_stat),
> > so each allocate–free cycle requires two expensive acquire/release on
> > that mutex.
>
> > We can mitigate this contention by retaining the percpu regions after
> > the object is freed and releasing them only when the backing slab pages
> > are freed.
>
> Could you keep a cache of recently used per cpu regions so that you can
> avoid frequent percpu allocation operation?
>
> You could allocate larger percpu areas for a batch of them and
> then assign as needed.

I was considering a mechanism like that earlier, but the changes needed
to make it happen would leave the alloc/free path in a worse state.

RSS counters are embedded into mm, with only the per-cpu areas being a
pointer. The machinery maintains a global list of all of their
instances, i.e. pointers to memory internal to mm_struct. That is to
say, even if you deserialized allocation of the percpu memory itself,
you would still globally serialize on adding/removing the counters
to/from the global list.

But suppose this got reworked somehow and this bit ceased to be a
problem. Another spot where mm alloc/free globally serializes (at least
on x86_64) is pgd_alloc/free, on the global pgd_lock. Suppose you
managed to decompose the lock into a finer granularity, to the point
where it does not pose a problem from a contention standpoint. Even
then, that's work which does not have to happen there.

The general theme is that there is a lot of expensive work happening
when dealing with the mm lifecycle (*both* from a single- and a
multi-threaded standpoint), and preferably it would only be dealt with
once per object's existence.

-- 
Mateusz Guzik
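
To illustrate the global-list point above, here is a minimal userspace
sketch. It is an analogy under stated assumptions, not kernel code: every
name in it (struct counter, list_lock, counter_init, counter_destroy,
run_cycles) is made up for illustration, and a pthread mutex stands in
for whatever lock protects the real list. The per-cpu area is handed in
already allocated, which models a perfect cache of percpu regions.

```c
/* Userspace analogy only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct counter {
    long *pcpu;                    /* stands in for the per-cpu area pointer */
    struct counter *prev, *next;   /* linkage on the global list */
};

/* One lock and one list shared by every counter in the system. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct counter list_head = { NULL, &list_head, &list_head };
static unsigned long list_lock_acquisitions;

/* The area comes from a cache, so no allocator lock is involved here. */
static void counter_init(struct counter *c, long *cached_area)
{
    c->pcpu = cached_area;

    pthread_mutex_lock(&list_lock);    /* still a global serialization point */
    list_lock_acquisitions++;
    c->next = list_head.next;
    c->prev = &list_head;
    list_head.next->prev = c;
    list_head.next = c;
    pthread_mutex_unlock(&list_lock);
}

static void counter_destroy(struct counter *c)
{
    assert(c->prev != NULL && c->next != NULL);

    pthread_mutex_lock(&list_lock);    /* and again on teardown */
    list_lock_acquisitions++;
    c->prev->next = c->next;
    c->next->prev = c->prev;
    pthread_mutex_unlock(&list_lock);

    c->prev = c->next = NULL;
    c->pcpu = NULL;                    /* the area goes back to the cache */
}

/* Run n init/destroy cycles; return how often the global lock was taken. */
static unsigned long run_cycles(int n)
{
    long area[8] = { 0 };              /* the "cached" per-cpu area */
    struct counter c;
    unsigned long before = list_lock_acquisitions;

    for (int i = 0; i < n; i++) {
        counter_init(&c, area);
        counter_destroy(&c);
    }
    return list_lock_acquisitions - before;
}
```

In this sketch run_cycles(n) takes the global lock 2*n times even though
no allocation ever happens, which is the serialization that caching the
percpu regions alone would not remove.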