From: "Huang, Ying"
To: Tvrtko Ursulin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com,
    Tvrtko Ursulin, Mel Gorman, Peter Zijlstra, Ingo Molnar, Rik van Riel,
    Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen,
    Michal Hocko, David Rientjes
Subject: Re: [PATCH v2] mm/numa_balancing: Teach mpol_to_str about the balancing mode
In-Reply-To: <20240702150006.35206-1-tursulin@igalia.com> (Tvrtko Ursulin's message of "Tue, 2 Jul 2024 16:00:06 +0100")
References: <20240702150006.35206-1-tursulin@igalia.com>
Date: Wed, 03 Jul 2024 13:28:42 +0800
Message-ID: <87o77fkprp.fsf@yhuang6-desk2.ccr.corp.intel.com>
Tvrtko Ursulin writes:

> From: Tvrtko Ursulin
>
> Since balancing mode was added in
> bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"),
> it was possible to set this mode but it wouldn't be shown in
> /proc//numa_maps since there was no support for it in the
> mpol_to_str() helper.
>
> Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
> would be displayed as 'default' due to a workaround introduced a few years
> earlier in
> 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps").
>
> To tidy this up we implement two changes:
>
> First we introduce a new internal flag MPOL_F_KERNEL and with it mark the
> kernel's internal default and fallback policies (for tasks and/or VMAs
> with no explicit policy set). By doing this we generalise the current
> special casing and replace the incorrect 'default' with the correct
> 'bind'.
>
> Secondly, we add a string representation and corresponding handling for
> MPOL_F_NUMA_BALANCING. We do this by adding a sparse mapping array of
> flags to names. With the sparseness being the downside, but with the
> advantage of generalising and removing the "policy" from flags display.
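The sparse flag-to-name mapping described above can be sketched in userspace C. This is a minimal illustration, not the patch's code. The flag values are copied from include/uapi/linux/mempolicy.h, and MPOL_F_KERNEL is the bit the patch proposes. The array is indexed by literal bit numbers instead of the kernel's ilog2(), and the helper name format_flags() is hypothetical. The patch explicitly skips bits up to MPOL_F_KERNEL and prints "bit%u" with a WARN_ON_ONCE() for unnamed bits. The sketch instead silently skips any bit that has no name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bit values mirrored from include/uapi/linux/mempolicy.h; MPOL_F_KERNEL
 * is the new internal flag proposed by this patch. */
#define MPOL_F_MOF            (1 << 3)
#define MPOL_F_MORON          (1 << 4)
#define MPOL_F_KERNEL         (1 << 5)
#define MPOL_F_NUMA_BALANCING (1 << 13)
#define MPOL_F_RELATIVE_NODES (1 << 14)
#define MPOL_F_STATIC_NODES   (1 << 15)

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Sparse table indexed by bit number: slots 0..12 are implicitly NULL. */
static const char *const policy_flags[] = {
	[13] = "balancing", /* ilog2(MPOL_F_NUMA_BALANCING) */
	[14] = "relative",  /* ilog2(MPOL_F_RELATIVE_NODES) */
	[15] = "static",    /* ilog2(MPOL_F_STATIC_NODES) */
};

/* Hypothetical helper: write "=name[,name...]" for every named flag set in
 * @flags into @buf, or an empty string if none is set. Stops appending once
 * @buf is full (snprintf() truncates the last item). */
static void format_flags(char *buf, size_t len, unsigned long flags)
{
	size_t n = 0;
	unsigned int cnt = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (unsigned int bit = 0; bit < ARRAY_SIZE(policy_flags); bit++) {
		/* Skip clear bits and unnamed internal bits such as MOF/MORON. */
		if (!(flags & (1UL << bit)) || !policy_flags[bit])
			continue;
		if (n >= len)
			break;
		n += snprintf(buf + n, len - n, "%s%s",
			      cnt++ ? "," : "=", policy_flags[bit]);
	}
}
```

For the balancing bind policy from the "End result" example, the flags field MPOL_F_NUMA_BALANCING | MPOL_F_MOF | MPOL_F_MORON formats as "=balancing", which matches `bind=balancing:0-1,3`. Names are emitted in ascending bit order, so "balancing" always precedes "relative" or "static".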
Please split these two changes into two patches, because we will need to
backport the first one to the -stable kernels.

> End result:
>
> $ numactl -b -m 0-1,3 cat /proc/self/numa_maps
> 555559580000 bind=balancing:0-1,3 file=/usr/bin/cat mapped=3 active=0 N0=3 kernelpagesize_kB=16
> ...
>
> v2:
>  * Fully fix by introducing MPOL_F_KERNEL.
>
> Signed-off-by: Tvrtko Ursulin
> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
> References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps")
> Cc: Huang Ying
> Cc: Mel Gorman
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Rik van Riel
> Cc: Johannes Weiner
> Cc: "Matthew Wilcox (Oracle)"
> Cc: Dave Hansen
> Cc: Andi Kleen
> Cc: Michal Hocko
> Cc: David Rientjes
> ---
>  include/uapi/linux/mempolicy.h |  1 +
>  mm/mempolicy.c                 | 44 ++++++++++++++++++++++++----------
>  2 files changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 1f9bb10d1a47..bcf56ce9603b 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -64,6 +64,7 @@ enum {
>  #define MPOL_F_SHARED  (1 << 0)	/* identify shared policies */
>  #define MPOL_F_MOF	(1 << 3) /* this policy wants migrate on fault */
>  #define MPOL_F_MORON	(1 << 4) /* Migrate On protnone Reference On Node */
> +#define MPOL_F_KERNEL	(1 << 5) /* Kernel's internal policy */
>
>  /*
>   * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index aec756ae5637..8ecc6d9f100a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -134,6 +134,7 @@ enum zone_type policy_zone = 0;
>  static struct mempolicy default_policy = {
>  	.refcnt = ATOMIC_INIT(1), /* never free it */
>  	.mode = MPOL_LOCAL,
> +	.flags = MPOL_F_KERNEL,
>  };
>
>  static struct mempolicy preferred_node_policy[MAX_NUMNODES];
> @@ -3095,7 +3096,7 @@ void __init numa_policy_init(void)
>  		preferred_node_policy[nid] = (struct mempolicy) {
>  			.refcnt = ATOMIC_INIT(1),
>  			.mode = MPOL_PREFERRED,
> -			.flags = MPOL_F_MOF | MPOL_F_MORON,
> +			.flags = MPOL_F_MOF | MPOL_F_MORON | MPOL_F_KERNEL,
>  			.nodes = nodemask_of_node(nid),
>  		};
>  	}
> @@ -3150,6 +3151,12 @@ static const char * const policy_modes[] =
>  	[MPOL_PREFERRED_MANY]  = "prefer (many)",
>  };
>
> +static const char * const policy_flags[] = {
> +	[ilog2(MPOL_F_STATIC_NODES)] = "static",
> +	[ilog2(MPOL_F_RELATIVE_NODES)] = "relative",
> +	[ilog2(MPOL_F_NUMA_BALANCING)] = "balancing",
> +};
> +
>  #ifdef CONFIG_TMPFS
>  /**
>   * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
> @@ -3293,17 +3300,18 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
>   * @pol:  pointer to mempolicy to be formatted
>   *
>   * Convert @pol into a string.  If @buffer is too short, truncate the string.
> - * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the
> - * longest flag, "relative", and to display at least a few node ids.
> + * Recommend a @maxlen of at least 42 for the longest mode, "weighted
> + * interleave", the longest flag, "balancing", and to display at least a few
> + * node ids.
>   */
>  void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
>  {
>  	char *p = buffer;
>  	nodemask_t nodes = NODE_MASK_NONE;
>  	unsigned short mode = MPOL_DEFAULT;
> -	unsigned short flags = 0;
> +	unsigned long flags = 0;
>
> -	if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) {
> +	if (!(pol->flags & MPOL_F_KERNEL)) {

Can we avoid introducing a new flag? Would the following code work?

	if (pol && pol != &default_policy &&
	    !(pol->mode == MPOL_PREFERRED && (pol->flags & MPOL_F_MORON)))

That said, I think this is kind of fragile, and a flag is better.
Personally, though, I don't think MPOL_F_KERNEL is a good name; maybe
MPOL_F_DEFAULT?
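The flag-free check suggested above can be exercised in isolation. This is a minimal sketch that uses a stripped-down stand-in for struct mempolicy with only the mode and flags fields. shows_own_mode() is a hypothetical name, and the enum and flag values mirror include/uapi/linux/mempolicy.h. The check assumes that the kernel's preferred_node_policy[] entries are the only MPOL_PREFERRED policies carrying MPOL_F_MORON. That dependency is the fragility mentioned: if a user-visible MPOL_PREFERRED policy ever carried MPOL_F_MORON, it would be printed as "default".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mode values in the order of the uapi enum; later modes omitted. */
enum { MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_INTERLEAVE, MPOL_LOCAL };

#define MPOL_F_MOF   (1 << 3)
#define MPOL_F_MORON (1 << 4)

/* Stand-in for struct mempolicy with only the fields the check reads. */
struct mempolicy {
	unsigned short mode;
	unsigned short flags;
};

/* Stand-in for the kernel's static default_policy (MPOL_LOCAL). */
static struct mempolicy default_policy = { .mode = MPOL_LOCAL };

/*
 * True if mpol_to_str() should print @pol's own mode and flags. False for
 * NULL, default_policy, and the per-node fallback policies, which are
 * MPOL_PREFERRED with MPOL_F_MORON set. A user bind policy created with
 * MPOL_F_NUMA_BALANCING also carries MPOL_F_MORON, so testing MORON alone
 * (the pre-patch check) misreports it as "default".
 */
static bool shows_own_mode(const struct mempolicy *pol)
{
	return pol && pol != &default_policy &&
	       !(pol->mode == MPOL_PREFERRED && (pol->flags & MPOL_F_MORON));
}
```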
>  		mode = pol->mode;
>  		flags = pol->flags;
>  	}
> @@ -3328,15 +3336,25 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
>  	p += snprintf(p, maxlen, "%s", policy_modes[mode]);
>
>  	if (flags & MPOL_MODE_FLAGS) {
> -		p += snprintf(p, buffer + maxlen - p, "=");
> +		unsigned int bit, cnt = 0;
>
> -		/*
> -		 * Currently, the only defined flags are mutually exclusive
> -		 */
> -		if (flags & MPOL_F_STATIC_NODES)
> -			p += snprintf(p, buffer + maxlen - p, "static");
> -		else if (flags & MPOL_F_RELATIVE_NODES)
> -			p += snprintf(p, buffer + maxlen - p, "relative");
> +		for_each_set_bit(bit, &flags, ARRAY_SIZE(policy_flags)) {
> +			if (bit <= ilog2(MPOL_F_KERNEL))
> +				continue;
> +
> +			if (cnt == 0)
> +				p += snprintf(p, buffer + maxlen - p, "=");
> +			else
> +				p += snprintf(p, buffer + maxlen - p, ",");
> +
> +			if (WARN_ON_ONCE(!policy_flags[bit]))
> +				p += snprintf(p, buffer + maxlen - p, "bit%u",
> +					      bit);
> +			else
> +				p += snprintf(p, buffer + maxlen - p,
> +					      policy_flags[bit]);
> +			cnt++;
> +		}

Please refer to commit 2291990ab36b ("mempolicy: clean-up mpol-to-str()
mempolicy formatting") for the original format.

>  	}
>
>  	if (!nodes_empty(nodes))

-- 
Best Regards,
Huang, Ying
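The @maxlen recommendation in the quoted kernel-doc comment can be checked with a little arithmetic. This sketch assumes the string has the form mode=flag:nodes, as in the "bind=balancing:0-1,3" example, with a single flag, and it counts the terminating NUL. prefix_len() and node_room() are hypothetical helpers. Under those assumptions, the old figure of 32 and the new figure of 42 both leave the same 11 bytes for the node list.

```c
#include <assert.h>
#include <string.h>

/* Bytes used by "mode=flag:" plus the terminating NUL, before any node ids. */
static size_t prefix_len(const char *mode, const char *flag)
{
	return strlen(mode) + 1 /* '=' */ + strlen(flag) + 1 /* ':' */ + 1 /* NUL */;
}

/* Bytes left for the node list in a buffer of @maxlen bytes. */
static size_t node_room(size_t maxlen, const char *mode, const char *flag)
{
	return maxlen - prefix_len(mode, flag);
}
```

For example, node_room(32, "interleave", "relative") and node_room(42, "weighted interleave", "balancing") both evaluate to 11. The new comma-separated format can print several flags at once, and each additional flag would eat into that margin.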