From: Mateusz Guzik
Date: Tue, 8 Apr 2025 09:46:15 +0200
Subject: Re: [RFC PATCH v2] mm: use per-numa-node atomics instead of percpu_counters
To: Kairui Song
Cc: Sweet Tea Dorminy, Andrew Morton, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Dennis Zhou, Tejun Heo, Christoph Lameter, Martin Liu,
 David Rientjes, Christian König, Shakeel Butt, Johannes Weiner,
 Lorenzo Stoakes, "Liam R. Howlett", Suren Baghdasaryan, Vlastimil Babka,
 Christian Brauner, Wei Yang, David Hildenbrand, Miaohe Lin, Al Viro,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, Yu Zhao, Roman Gushchin
References: <20250331223516.7810-2-sweettea-kernel@dorminy.me>

On Fri, Apr 4, 2025 at 6:51 PM Kairui Song wrote:
>
> On Thu, Apr 3, 2025 at 10:31 PM Mateusz Guzik wrote:
> > Note there are 2 unrelated components in that patchset:
> > - one per-cpu instance of rss counters which is rolled up on context
> >   switches, avoiding the costly counter alloc/free on mm
> >   creation/teardown
> > - cpu iteration in get_mm_counter
> >
> > The allocation problem is fixable without abandoning the counters, see
> > my other e-mail (tl;dr let mm's hanging out in slab caches *keep* the
> > counters). This aspect has to be solved anyway due to mm_alloc_cid().
> > Providing a way to sort it out covers *both* the rss counters and the
> > cid thing.
>
> It's not just about the fork performance, on some servers there could
> be ~100K processes and ~200 CPUs, that will be hundreds of MBs of
> memory just for the counters.
>
> And nowadays it's not something uncommon for a desktop to have ~64
> CPUs and ~10K processes.
>
> If we use a single shared "per-cpu" counter (as in the patch), the
> total consumption will always be only about just dozens of bytes.
>

I agree there is a tradeoff here and your approach saves memory in
exchange for more work during a context switch. I have no opinion
which way to go here.

> >
> > In your patchset the accuracy increase comes at the expense of walking
> > all CPUs every time, while a big part of the point of using percpu
> > counters is to have a good enough approximation somewhere that this is
> > not necessary.
>
> It usually doesn't walk all CPUs, only the CPUs that actually used
> that mm_struct, by checking mm_struct's cpu_bitmap. I didn't check if
> all arch uses that bitmap though.
>
> It's true that one CPU having its bit set on one mm_struct's
> cpu_bitmap doesn't mean it updated the RSS counter so there will be
> false positives, the false positive rate is low as schedulers don't
> shuffle processes between processors randomly, and not every process
> will be ran at a period.
>
> Also per my observation the reader side is much colder compared to
> updater for /proc.
>

Per my comment, the read happens a lot on mmap and munmap, so it
cannot be taken lightly. You can check yourself with bpftrace.

While I can agree that the vast majority of processes are not very
thread-heavy and the vast majority of machines out there don't have
hundreds of cores, this does have to behave sanely for the cases which
*do* exhibit these conditions. For example, a box with > 200 cores and
200+ threads to boot, all running on the entirety of the box.

In your patch as posted, fetching the value will force the walk *a lot*
and is consequently a no-go.

This aspect needs to be dealt with for the patchset to be ok.
Otherwise, a few months down the road someone else will show up and
complain about a new slowdown stemming from it.

-- 
Mateusz Guzik
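
To make the read-side concern above concrete, here is a toy userspace
model in Python. It is only a sketch, not kernel code and not taken from
the patch: read_rss(), the list of per-CPU deltas and the integer used as
a bitmap are all made up for illustration. It assumes a read has to visit
every CPU whose bit is set in the mm's cpu_bitmap, which is how the
approach is described in this thread.

```python
# Toy userspace model; every name below is hypothetical, not a kernel API.

def read_rss(percpu_deltas, cpu_bitmap):
    """Sum the per-CPU deltas for each CPU whose bit is set in cpu_bitmap.

    Returns (total, cpus_visited) so the cost of a single read is visible.
    """
    total = 0
    visited = 0
    cpu = 0
    bits = cpu_bitmap
    while bits:
        if bits & 1:
            total += percpu_deltas[cpu]
            visited += 1
        bits >>= 1
        cpu += 1
    return total, visited


NR_CPUS = 256  # assumed box size, in line with the "> 200 cores" case

# Case 1: a process that only ever ran on CPUs 3 and 7.
deltas = [0] * NR_CPUS
deltas[3], deltas[7] = 10, -2
small_total, small_visited = read_rss(deltas, (1 << 3) | (1 << 7))

# Case 2: 200+ threads running on the entire box, so every bit is set.
deltas = [1] * NR_CPUS
big_total, big_visited = read_rss(deltas, (1 << NR_CPUS) - 1)

print(small_total, small_visited)
print(big_total, big_visited)
```

In the first case the bitmap keeps the walk down to two CPUs. In the
second case the bitmap filters nothing and every read visits all 256
slots, which is the situation that makes frequent reads expensive.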