From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B36AC2BD09 for ; Wed, 3 Jul 2024 07:38:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EEAFC6B008A; Wed, 3 Jul 2024 03:38:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E9A6F6B008C; Wed, 3 Jul 2024 03:38:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D3A556B0092; Wed, 3 Jul 2024 03:38:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B73F66B008A for ; Wed, 3 Jul 2024 03:38:56 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 69CBB1A0300 for ; Wed, 3 Jul 2024 07:38:56 +0000 (UTC) X-FDA: 82297639872.01.A4B82C3 Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by imf26.hostedemail.com (Postfix) with ESMTP id 5E582140006 for ; Wed, 3 Jul 2024 07:38:53 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=ZIqJGEin; dmarc=none; spf=pass (imf26.hostedemail.com: domain of tvrtko.ursulin@igalia.com designates 178.60.130.6 as permitted sender) smtp.mailfrom=tvrtko.ursulin@igalia.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719992307; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=otQl1juZa2F5D4LyOMjrn0FzANzEGfAwgkR3egn5RdE=; b=uccjHXBqH4fWQRH0ljBKrNAn0n+9bTV2FyLbnicIOmWjaqeIdTh9/Bm1jiEWNsIDNkbi72 NeFvhF3zLUpjFbLzB0++C+djugErRt/syzxBkqHxNrsLrnnUkhoeqOE9epwe7d3oYZqINn QMAInMtXnlzRA43JGt6rz3dDAmR1M+M= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719992307; a=rsa-sha256; cv=none; b=Qqvgncg2w/n6npbrrzxi4SgzXfMjiYkjORrWEMEexRXc3NuuMN9kpTHybMFfA6dUm+QHH0 VO8ljz20nQqmFT4jQUsUCzuVjHKgTkKso2MxNGUjIr/Etlfu3T2Jq2bA07cYqKYcTeX3q2 lHn3F7/DhoPOfOlE03TFSucvLPceHEI= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=ZIqJGEin; dmarc=none; spf=pass (imf26.hostedemail.com: domain of tvrtko.ursulin@igalia.com designates 178.60.130.6 as permitted sender) smtp.mailfrom=tvrtko.ursulin@igalia.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=otQl1juZa2F5D4LyOMjrn0FzANzEGfAwgkR3egn5RdE=; b=ZIqJGEinDLZVR5SqFLwN38DAf7 ehl3in5Y5MpHyWTu1aFVKYQUgxtXCvPf+I//QoATd5L1TlOAeg9ObdgXUJ4UggbEmEzVo8XzDG8DC igclkVUg9zbYWvaQYnMb9axcTnVO4zGbdGoF9OA9iLC8BsVEwzzI9sfVTgZhgRhfBuga1fg/D5Q8I FytGlbe5Asdm+ITZ4NNMo4hj023x4zm4XAz8UgMqdeV1UXadkJm0soBfEHVGB/NRH+tYt4hetNVB5 fj51Qh09vuyfldczOZpPW+YTrZOYpNe+go7S2hb1f8s/ulD8akpcR01HmiPRGBx9d+bCwepW8QNi1 3TxJybjQ==; Received: from [84.69.19.168] (helo=[192.168.0.101]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim) id 1sOuZb-00AbMq-Na; Wed, 03 Jul 2024 09:38:39 +0200 Message-ID: <2fe66068-4419-4bfc-a92b-2ece3cfcb2ad@igalia.com> Date: Wed, 3 Jul 2024 08:38:38 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] mm/numa_balancing: Teach mpol_to_str about the balancing mode To: "Huang, Ying" , Tvrtko Ursulin Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, Mel Gorman , Peter Zijlstra , Ingo Molnar , Rik van Riel , Johannes Weiner , "Matthew Wilcox (Oracle)" , Dave Hansen , Andi Kleen , Michal Hocko , David Rientjes References: <20240702150006.35206-1-tursulin@igalia.com> <87o77fkprp.fsf@yhuang6-desk2.ccr.corp.intel.com> Content-Language: en-GB From: Tvrtko Ursulin In-Reply-To: <87o77fkprp.fsf@yhuang6-desk2.ccr.corp.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 5E582140006 X-Stat-Signature: pdae9yehqskooksqnpft31zqmcouqz75 X-Rspam-User: X-HE-Tag: 1719992333-887369 X-HE-Meta: U2FsdGVkX1/oKjmjrfHNfwUXrfiZzgxF4J13FiroYUwKWy83l20EnYGKeqwXXvI1oUMOPo7PN59UVpslB9PPm9AbcTBZys6FupNF0Ap5El08DBTxc4fvXejdasEw9MZwUHQr2gnWgsudR6/5sktcF4hLJj0CUFtM6B/yqMtkc5pq2uqt7A81+YVdoaLFBk1QV1Usbq+FVsw3VY5iipRizycDV0RHnw9spcz+X+QmlOA3vwFNTiimdsdmHB6iprsY74AvZRT+CvKraw6wwvlll4uaHcIfBD2q0pTLbEXoMV56q9S7xmopeS1RJipbColzO6P262b6c5TaBzR5MOqYDH/4HYNyKUgv38Ho7yh4vGpFz+L2SNuKnekLWT2rmvrIOFdUsb7jQV7h3qEiarhspnOkWdwpu+YHSxspXUo4bdMMIkHW87/X6SE2Nknl8p6PKxFr0lVO+Q/UX1hqVoF6WovwifT6CzBznR8DFeJdS3j2+YYedNApOedTwuO0X8TOYbbcYEvx0LWM/sBzuQBf6tomKlrn+uRaT05whRHA1gtU1WGzrkMiLiK3dAjmESa+Lx+MDyva3dNTnyFEjPRaF7UBtKyCPYwJ+RspRnmEk2F2x3shWD0LjvoZrB3dx2aSAOEcnF/KyDiFf+xpP3+7YNgEkJmyNJoV0WxDXkbZb4JvSWp+NpzGsU13hB8xHs2wmaRkXqxI1lKOb7wgBK/pjWkCPChTmISKrAsmqT/gMKO9xak3FlLP6MdEETOr5ffa3strMX0FgizNTNMEEUDiTEW/njUmE2AwZcdc5Q+QmNL/KyjQXzjyPuGtXlUxnK+M5JJAz+rBArVX9Dtp4mLVQn4XfcgsZ6PLN0FbQT7tMQIsJc2rz/NLh/9yeHgXdXeICssks/qqOUea8rV8rqpPwFWfbU1gwhG6NDZrVMmeCeuS+eDasdNpmBPV/Xm/Yhn6kpWOK5kh9JifZGKWxMp e+OHn3k2 +3LCBV6sCMbu6HPdmc3rcohmVc9W8KzioZHjEXm0+Mue6U80eKV8/REXLvdkCPSNKNaZTopL1lVNLSlunPwOAJ+/EY3U1RXtkt+DpPXkifSmuXjhG2Zc6G7Y4VkqB1YAUhgvTFpBbmKaKwPpX5y89OzcHcG7OnK8tU5YgUsOSWQ3YJBaddgDKEI/3UDqmibUgJemJpjLu8bxhO6CeHOqTrbQ+2DNnqlWzInPkV/ojijiRiB6fnaa0vov16MXoK9ozF/Rh1MFmPW3rfxcIGl67U8PNr8F0x9c/Ej6R/Q18BN20hGIga8IDgitHcUOWslru/EKZmzG0Fep5xV+TnddvtBG8Rmbuadytxh+CnM9qrrxf3SqleTjX7yfWkeb9FsmwDRAz7UMn2mA043f7Eq5mGIv+RjZGUthAszbwJIjt7UQL6FgvLt5PrsOunL09n9HaBci0yh5GTB8eWKwQsC4knS5AmGjroluXgYw1s6mnAwsNTajw8XX46yoARfU7QPNVDMUuh/Y/xSjVkpb096qczx93W/UlQvy7CWz49pBZPWMS/hHJp598DLtgwBGcYETBRIq4QI43miqOQ/sEAB0NAPrysCucPr/G5WIwFOM3w/x0MTDQdJmdu5vqBUgBJxavjJVEdzwYdvR+63lWrkpal+KmZA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 03/07/2024 06:28, Huang, Ying wrote: > Tvrtko Ursulin writes: > >> From: Tvrtko Ursulin >> >> Since balancing mode was added in >> bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"), >> it was possible to set this mode but it wouldn't be shown in >> /proc//numa_maps since there was no support for it in the >> mpol_to_str() helper. >> >> Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it >> would be displayed as 'default' due a workaround introduced a few years >> earlier in >> 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps"). >> >> To tidy this up we implement two changes: >> >> First we introduce a new internal flag MPOL_F_KERNEL and with it mark the >> kernel's internal default and fallback policies (for tasks and/or VMAs >> with no explicit policy set). By doing this we generalise the current >> special casing and replace the incorrect 'default' with the correct >> 'bind'. >> >> Secondly, we add a string representation and corresponding handling for >> MPOL_F_NUMA_BALANCING. We do this by adding a sparse mapping array of >> flags to names. With the sparseness being the downside, but with the >> advantage of generalising and removing the "policy" from flags display. > > Please split these 2 changes into 2 patches. Because we will need to > back port the first one to -stable kernel. Why two? AFAICT there wasn't a issue until bda420b98505, and to fix it all changes from this patch are needed. >> End result: >> >> $ numactl -b -m 0-1,3 cat /proc/self/numa_maps >> 555559580000 bind=balancing:0-1,3 file=/usr/bin/cat mapped=3 active=0 N0=3 kernelpagesize_kB=16 >> ... >> >> v2: >> * Fully fix by introducing MPOL_F_KERNEL. >> >> Signed-off-by: Tvrtko Ursulin >> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") >> References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps") >> Cc: Huang Ying >> Cc: Mel Gorman >> Cc: Peter Zijlstra >> Cc: Ingo Molnar >> Cc: Rik van Riel >> Cc: Johannes Weiner >> Cc: "Matthew Wilcox (Oracle)" >> Cc: Dave Hansen >> Cc: Andi Kleen >> Cc: Michal Hocko >> Cc: David Rientjes >> --- >> include/uapi/linux/mempolicy.h | 1 + >> mm/mempolicy.c | 44 ++++++++++++++++++++++++---------- >> 2 files changed, 32 insertions(+), 13 deletions(-) >> >> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h >> index 1f9bb10d1a47..bcf56ce9603b 100644 >> --- a/include/uapi/linux/mempolicy.h >> +++ b/include/uapi/linux/mempolicy.h >> @@ -64,6 +64,7 @@ enum { >> #define MPOL_F_SHARED (1 << 0) /* identify shared policies */ >> #define MPOL_F_MOF (1 << 3) /* this policy wants migrate on fault */ >> #define MPOL_F_MORON (1 << 4) /* Migrate On protnone Reference On Node */ >> +#define MPOL_F_KERNEL (1 << 5) /* Kernel's internal policy */ >> >> /* >> * These bit locations are exposed in the vm.zone_reclaim_mode sysctl >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index aec756ae5637..8ecc6d9f100a 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -134,6 +134,7 @@ enum zone_type policy_zone = 0; >> static struct mempolicy default_policy = { >> .refcnt = ATOMIC_INIT(1), /* never free it */ >> .mode = MPOL_LOCAL, >> + .flags = MPOL_F_KERNEL, >> }; >> >> static struct mempolicy preferred_node_policy[MAX_NUMNODES]; >> @@ -3095,7 +3096,7 @@ void __init numa_policy_init(void) >> preferred_node_policy[nid] = (struct mempolicy) { >> .refcnt = ATOMIC_INIT(1), >> .mode = MPOL_PREFERRED, >> - .flags = MPOL_F_MOF | MPOL_F_MORON, >> + .flags = MPOL_F_MOF | MPOL_F_MORON | MPOL_F_KERNEL, >> .nodes = nodemask_of_node(nid), >> }; >> } >> @@ -3150,6 +3151,12 @@ static const char * const policy_modes[] = >> [MPOL_PREFERRED_MANY] = "prefer (many)", >> }; >> >> +static const char * const policy_flags[] = { >> + [ilog2(MPOL_F_STATIC_NODES)] = "static", >> + [ilog2(MPOL_F_RELATIVE_NODES)] = "relative", >> + [ilog2(MPOL_F_NUMA_BALANCING)] = "balancing", >> +}; >> + >> #ifdef CONFIG_TMPFS >> /** >> * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option. >> @@ -3293,17 +3300,18 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) >> * @pol: pointer to mempolicy to be formatted >> * >> * Convert @pol into a string. If @buffer is too short, truncate the string. >> - * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the >> - * longest flag, "relative", and to display at least a few node ids. >> + * Recommend a @maxlen of at least 42 for the longest mode, "weighted >> + * interleave", the longest flag, "balancing", and to display at least a few >> + * node ids. >> */ >> void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) >> { >> char *p = buffer; >> nodemask_t nodes = NODE_MASK_NONE; >> unsigned short mode = MPOL_DEFAULT; >> - unsigned short flags = 0; >> + unsigned long flags = 0; >> >> - if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) { >> + if (!(pol->flags & MPOL_F_KERNEL)) { > > Can we avoid to introduce a new flag? Whether the following code work? > > if (pol && pol != &default_policy && !(pol->mode != > MPOL_PREFERRED) && !(pol->flags & MPOL_F_MORON)) > > But I think that this is kind of fragile. A flag is better. But > personally, I don't think MPOL_F_KERNEL is a good name, maybe > MPOL_F_DEFAULT? I thought along the same lines, but as you have also shown we need to exclude both default and preferred fallbacks so naming the flag default did not feel best. MPOL_F_INTERNAL? MPOL_F_FALLBACK? MPOL_F_SHOW_AS_DEFAULT? :)) What I dislike about the flag more is the fact internal flags are for some reason in the uapi headers. And presumably we cannot zap them. But I don't think we can check for MPOL_PREFERRED since it can be a legitimate user set policy. We could check for the address of preferred_node_policy[] members with a loop covering all possible nids? If that will be the consensus I am happy to change it. But flag feels more elegant and robust. >> mode = pol->mode; >> flags = pol->flags; >> } >> @@ -3328,15 +3336,25 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) >> p += snprintf(p, maxlen, "%s", policy_modes[mode]); >> >> if (flags & MPOL_MODE_FLAGS) { >> - p += snprintf(p, buffer + maxlen - p, "="); >> + unsigned int bit, cnt = 0; >> >> - /* >> - * Currently, the only defined flags are mutually exclusive >> - */ >> - if (flags & MPOL_F_STATIC_NODES) >> - p += snprintf(p, buffer + maxlen - p, "static"); >> - else if (flags & MPOL_F_RELATIVE_NODES) >> - p += snprintf(p, buffer + maxlen - p, "relative"); >> + for_each_set_bit(bit, &flags, ARRAY_SIZE(policy_flags)) { >> + if (bit <= ilog2(MPOL_F_KERNEL)) >> + continue; >> + >> + if (cnt == 0) >> + p += snprintf(p, buffer + maxlen - p, "="); >> + else >> + p += snprintf(p, buffer + maxlen - p, ","); >> + >> + if (WARN_ON_ONCE(!policy_flags[bit])) >> + p += snprintf(p, buffer + maxlen - p, "bit%u", >> + bit); >> + else >> + p += snprintf(p, buffer + maxlen - p, >> + policy_flags[bit]); >> + cnt++; >> + } > > Please refer to commit 2291990ab36b ("mempolicy: clean-up mpol-to-str() > mempolicy formatting") for the original format. That was in 2008 so long time ago and in the meantime there were no bars. The format in this patch tries to align with the input format and I think it manages, apart from deciding to print unknown flags as bit numbers (which is most probably an irrelevant difference). Why do you think the pre-2008 format is better? Regards, Tvrtko > >> } >> >> if (!nodes_empty(nodes)) > > -- > Best Regards, > Huang, Ying