Date: Thu, 12 Oct 2023 16:22:50 +0100
From: Mel Gorman
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven,
	Sudeep Holla, Andrew Morton, Vlastimil Babka, David Hildenbrand,
	Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
	Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 02/10] cacheinfo: calculate per-CPU data cache size
Message-ID: <20231012152250.xuu5mvghwtonpvp2@techsingularity.net>
References: <20230920061856.257597-1-ying.huang@intel.com>
	<20230920061856.257597-3-ying.huang@intel.com>
	<20231011122027.pw3uw32sdxxqjsrq@techsingularity.net>
	<87h6mwf3gf.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<20231012125253.fpeehd6362c5v2sj@techsingularity.net>
	<87v8bcdly7.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87v8bcdly7.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Oct 12, 2023 at 09:12:00PM +0800, Huang, Ying wrote:
> Mel Gorman writes:
>
> > On Thu, Oct 12, 2023 at 08:08:32PM +0800, Huang, Ying wrote:
> >> Mel Gorman writes:
> >>
> >> > On Wed, Sep 20, 2023 at 02:18:48PM +0800, Huang Ying wrote:
> >> >> Per-CPU data cache size is useful information. For example, it can be
> >> >> used to determine per-CPU cache size. So, in this patch, the data
> >> >> cache size for each CPU is calculated via data_cache_size /
> >> >> shared_cpu_weight.
> >> >>
> >> >> A brute-force algorithm to iterate all online CPUs is used to avoid
> >> >> to allocate an extra cpumask, especially in offline callback.
> >> >>
> >> >> Signed-off-by: "Huang, Ying"
> >> >
> >> > It's not necessarily relevant to the patch, but at least the scheduler
> >> > also stores some per-cpu topology information such as sd_llc_size -- the
> >> > number of CPUs sharing the same last-level-cache as this CPU. It may be
> >> > worth unifying this at some point if it's common that per-cpu
> >> > information is too fine and per-zone or per-node information is too
> >> > coarse.
> >> > This would be particularly true when considering locking
> >> > granularity,
> >> >
> >> >> Cc: Sudeep Holla
> >> >> Cc: Andrew Morton
> >> >> Cc: Mel Gorman
> >> >> Cc: Vlastimil Babka
> >> >> Cc: David Hildenbrand
> >> >> Cc: Johannes Weiner
> >> >> Cc: Dave Hansen
> >> >> Cc: Michal Hocko
> >> >> Cc: Pavel Tatashin
> >> >> Cc: Matthew Wilcox
> >> >> Cc: Christoph Lameter
> >> >> ---
> >> >>  drivers/base/cacheinfo.c  | 42 ++++++++++++++++++++++++++++++++++++++-
> >> >>  include/linux/cacheinfo.h |  1 +
> >> >>  2 files changed, 42 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> >> >> index cbae8be1fe52..3e8951a3fbab 100644
> >> >> --- a/drivers/base/cacheinfo.c
> >> >> +++ b/drivers/base/cacheinfo.c
> >> >> @@ -898,6 +898,41 @@ static int cache_add_dev(unsigned int cpu)
> >> >>  	return rc;
> >> >>  }
> >> >>
> >> >> +static void update_data_cache_size_cpu(unsigned int cpu)
> >> >> +{
> >> >> +	struct cpu_cacheinfo *ci;
> >> >> +	struct cacheinfo *leaf;
> >> >> +	unsigned int i, nr_shared;
> >> >> +	unsigned int size_data = 0;
> >> >> +
> >> >> +	if (!per_cpu_cacheinfo(cpu))
> >> >> +		return;
> >> >> +
> >> >> +	ci = ci_cacheinfo(cpu);
> >> >> +	for (i = 0; i < cache_leaves(cpu); i++) {
> >> >> +		leaf = per_cpu_cacheinfo_idx(cpu, i);
> >> >> +		if (leaf->type != CACHE_TYPE_DATA &&
> >> >> +		    leaf->type != CACHE_TYPE_UNIFIED)
> >> >> +			continue;
> >> >> +		nr_shared = cpumask_weight(&leaf->shared_cpu_map);
> >> >> +		if (!nr_shared)
> >> >> +			continue;
> >> >> +		size_data += leaf->size / nr_shared;
> >> >> +	}
> >> >> +	ci->size_data = size_data;
> >> >> +}
> >> >
> >> > This needs comments.
> >> >
> >> > It would be nice to add a comment on top describing the limitation of
> >> > CACHE_TYPE_UNIFIED here in the context of
> >> > update_data_cache_size_cpu().
> >>
> >> Sure. Will do that.
> >>
> >
> > Thanks.
> >
> >> > The L2 cache could be unified but much smaller than a L3 or other
> >> > last-level-cache.
> >> > It's not clear from the code what level of cache is being
> >> > used due to a lack of familiarity of the cpu_cacheinfo code but size_data
> >> > is not the size of a cache, it appears to be the share of a cache a CPU
> >> > would have under ideal circumstances.
> >>
> >> Yes. And it isn't for one specific level of cache. It's sum of per-CPU
> >> shares of all levels of cache. But the calculation is inaccurate. More
> >> details are in the below reply.
> >>
> >> > However, as it appears to also be
> >> > iterating hierarchy then this may not be accurate. Caches may or may not
> >> > allow data to be duplicated between levels so the value may be inaccurate.
> >>
> >> Thank you very much for pointing this out! The cache can be inclusive
> >> or not. So, we cannot calculate the per-CPU slice of all-level caches
> >> via adding them together blindly. I will change this in a follow-on
> >> patch.
> >>
> >
> > Please do, I would strongly suggest basing this on LLC only because it's
> > the only value you can be sure of. This change is the only change that may
> > warrant a respin of the series as the history will be somewhat confusing
> > otherwise.
>
> I am still checking whether it's possible to get cache inclusive
> information via cpuid.
>

cpuid may be x86-specific so that potentially leads to different behaviours
on different architectures.

> If there's no reliable way to do that. We can use the max value of
> per-CPU share of each level of cache. For inclusive cache, that will be
> the value of LLC. For non-inclusive cache, the value will be more
> accurate. For example, on Intel Sapphire Rapids, the L2 cache is 2 MB
> per core, while LLC is 1.875 MB per core according to [1].
>

Be that as it may, it still opens the possibility of significantly different
behaviour depending on the CPU family.
I would strongly recommend that you start with LLC
only because LLC is also the topology level of interest used by the
scheduler and it's information that is generally available. Trying to
get accurate information on every level and the complexity of dealing with
inclusive vs exclusive cache or write-back vs write-through should be a
separate patch, with separate justification and notes on how it can lead
to behaviour specific to the CPU family or architecture.

-- 
Mel Gorman
SUSE Labs
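
The thread weighs three ways of defining a per-CPU cache share: the sum over all levels (what the quoted patch computes), the maximum per-level share (the fallback proposed in the quoted reply), and the LLC share only (the recommendation above). What follows is a minimal userspace sketch of that arithmetic. It is not kernel code and not taken from the patch. struct cache_leaf and the share_*() helpers are hypothetical names, and "LLC" is assumed to mean the leaf with the highest level number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one data or unified cache leaf. */
struct cache_leaf {
	unsigned int level;	/* 1 = L1, 2 = L2, 3 = L3, ... */
	unsigned int size_kib;	/* total size of this cache instance */
	unsigned int nr_shared;	/* CPUs sharing it, cf. cpumask_weight() */
};

/* Per-CPU share of one leaf; a leaf with no sharers contributes 0. */
static unsigned int leaf_share(const struct cache_leaf *leaf)
{
	return leaf->nr_shared ? leaf->size_kib / leaf->nr_shared : 0;
}

/* Sum over all levels, as in the quoted patch; overcounts if inclusive. */
unsigned int share_sum(const struct cache_leaf *leaves, size_t n)
{
	unsigned int total = 0;

	assert(leaves || n == 0);
	for (size_t i = 0; i < n; i++)
		total += leaf_share(&leaves[i]);
	return total;
}

/* Largest per-level share, the fallback proposed in the quoted reply. */
unsigned int share_max(const struct cache_leaf *leaves, size_t n)
{
	unsigned int best = 0;

	assert(leaves || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned int share = leaf_share(&leaves[i]);

		if (share > best)
			best = share;
	}
	return best;
}

/* Share of the highest-level leaf only, assumed here to be the LLC. */
unsigned int share_llc(const struct cache_leaf *leaves, size_t n)
{
	const struct cache_leaf *llc = NULL;

	assert(leaves || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!llc || leaves[i].level > llc->level)
			llc = &leaves[i];
	}
	return llc ? leaf_share(llc) : 0;
}
```

As an illustration, take the per-core figures quoted above (2 MB of L2, 1.875 MB of LLC) and assume, purely for the example, a 60-CPU part with SMT ignored: a private 2048 KiB L2 and a 115200 KiB L3 shared by 60 CPUs. The sum is 3968 KiB, the maximum is 2048 KiB and the LLC-only share is 1920 KiB, so the three definitions disagree even on a single CPU family.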