linux-mm.kvack.org archive mirror
From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gourry.memverge@gmail.com>
Cc: linux-mm@kvack.org,  linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org,  x86@kernel.org,
	 akpm@linux-foundation.org, arnd@arndb.de,  tglx@linutronix.de,
	 luto@kernel.org,  mingo@redhat.com, bp@alien8.de,
	 dave.hansen@linux.intel.com,  hpa@zytor.com, mhocko@kernel.org,
	 tj@kernel.org,  gregory.price@memverge.com, corbet@lwn.net,
	 rakie.kim@sk.com,  hyeongtak.ji@sk.com, honggyu.kim@sk.com,
	 vtavarespetr@micron.com,  peterz@infradead.org,
	jgroves@micron.com,  ravis.opensrc@micron.com,
	 sthanneeru@micron.com, emirakhur@micron.com,
	 Hasan.Maruf@amd.com,  seungjun.ha@samsung.com
Subject: Re: [PATCH v4 11/11] mm/mempolicy: extend set_mempolicy2 and mbind2 to support weighted interleave
Date: Tue, 19 Dec 2023 11:07:10 +0800
Message-ID: <87sf3ynb4x.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20231218194631.21667-12-gregory.price@memverge.com> (Gregory Price's message of "Mon, 18 Dec 2023 14:46:31 -0500")

Gregory Price <gourry.memverge@gmail.com> writes:

> Extend set_mempolicy2 and mbind2 to support weighted interleave, and
> demonstrate the extensibility of the mpol_args structure.
>
> To support weighted interleave we add interleave weight fields to the
> following structures:
>
> Kernel Internal:  (include/linux/mempolicy.h)
> struct mempolicy {
> 	/* task-local weights to apply to weighted interleave */
> 	unsigned char weights[MAX_NUMNODES];
> }
> struct mempolicy_args {
> 	/* Optional: interleave weights for MPOL_WEIGHTED_INTERLEAVE */
> 	unsigned char *il_weights;	/* of size MAX_NUMNODES */
> }
>
> UAPI: (include/uapi/linux/mempolicy.h)
> struct mpol_args {
> 	/* Optional: interleave weights for MPOL_WEIGHTED_INTERLEAVE */
> 	unsigned char *il_weights;	/* of size pol_maxnodes */
> }
>
> The task-local weights are a single, one-dimensional array of weights
> that apply to all possible nodes on the system.  If a node is set in
> the mempolicy nodemask, the weight in `il_weights` must be >= 1,
> otherwise set_mempolicy2() will return -EINVAL.  If a node is not
> set in pol_nodemask, the weight will default to `1` in the task policy.
>
> The default value of `1` is required to handle the situation where a
> task migrates to a set of nodes for which weights were not set (up to
> and including the local NUMA node).  For example, a migrated task whose
> nodemask changes entirely will have all of its weights default back
> to `1`.  If the nodemask instead changes to include a mix of nodes,
> some of which were not previously accounted for, the weighted
> interleave may be suboptimal.
>
> If migrations are expected, a task should prefer not to use task-local
> interleave weights, and instead utilize the global settings for natural
> re-weighting on migration.
>
> To support global vs local weighting, we add the kernel-internal flag:
> MPOL_F_GWEIGHT (1 << 5) /* Utilize global weights */
>
> This flag is set when il_weights is omitted by set_mempolicy2(), or
> when MPOL_WEIGHTED_INTERLEAVE is set by set_mempolicy(). This internal
> mode_flag dictates whether global weights or task-local weights are
> utilized by the various weighted interleave functions:
>
> * weighted_interleave_nodes
> * weighted_interleave_nid
> * alloc_pages_bulk_array_weighted_interleave
>
> if (pol->flags & MPOL_F_GWEIGHT)
> 	pol_weights = iw_table;
> else
> 	pol_weights = pol->wil.weights;
>
> To simplify creation and duplication of mempolicies, the weights are
> added as a structure directly within mempolicy. This allows the
> existing logic in __mpol_dup to copy the weights without additional
> allocations:
>
> if (old == current->mempolicy) {
> 	task_lock(current);
> 	*new = *old;
> 	task_unlock(current);
> } else
> 	*new = *old;
>
> Suggested-by: Rakie Kim <rakie.kim@sk.com>
> Suggested-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Suggested-by: Honggyu Kim <honggyu.kim@sk.com>
> Suggested-by: Vinicius Tavares Petrucci <vtavarespetr@micron.com>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> Co-developed-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Co-developed-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Co-developed-by: Vinicius Tavares Petrucci <vtavarespetr@micron.com>
> Signed-off-by: Vinicius Tavares Petrucci <vtavarespetr@micron.com>
> ---
>  .../admin-guide/mm/numa_memory_policy.rst     |  10 ++
>  include/linux/mempolicy.h                     |   2 +
>  include/uapi/linux/mempolicy.h                |   2 +
>  mm/mempolicy.c                                | 129 +++++++++++++++++-
>  4 files changed, 139 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 99e1f732cade..0e91efe9e769 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -254,6 +254,8 @@ MPOL_WEIGHTED_INTERLEAVE
>  	This mode operates the same as MPOL_INTERLEAVE, except that
>  	interleaving behavior is executed based on weights set in
>  	/sys/kernel/mm/mempolicy/weighted_interleave/
> +	when configured to utilize global weights, or based on task-local
> +	weights configured with set_mempolicy2(2) or mbind2(2).
>  
>  	Weighted interleave allocations pages on nodes according to
>  	their weight.  For example if nodes [0,1] are weighted [5,2]
> @@ -261,6 +263,13 @@ MPOL_WEIGHTED_INTERLEAVE
>  	2 pages allocated on node1.  This can better distribute data
>  	according to bandwidth on heterogeneous memory systems.
>  
> +	When utilizing task-local weights, weights are not rebalanced
> +	in the event of a task migration.  If a weight has not been
> +	explicitly set for a node set in the new nodemask, the
> +	value of that weight defaults to "1".  For this reason, if
> +	migrations are expected or possible, users should consider
> +	utilizing global interleave weights.
> +
>  NUMA memory policy supports the following optional mode flags:
>  
>  MPOL_F_STATIC_NODES
> @@ -514,6 +523,7 @@ Extended Mempolicy Arguments::
>  		__u16 mode_flags;
>  		__s32 home_node; /* mbind2: policy home node */
>  		__aligned_u64 pol_nodes; /* nodemask pointer */
> +		__aligned_u64 il_weights;  /* u8 buf of size pol_maxnodes */
>  		__u64 pol_maxnodes;
>  		__s32 policy_node; /* get_mempolicy2: policy node information */
>  	};
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index aeac19dfc2b6..387c5c418a66 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -58,6 +58,7 @@ struct mempolicy {
>  	/* Weighted interleave settings */
>  	struct {
>  		unsigned char cur_weight;
> +		unsigned char weights[MAX_NUMNODES];
>  	} wil;
>  };
>  
> @@ -70,6 +71,7 @@ struct mempolicy_args {
>  	unsigned short mode_flags;	/* policy mode flags */
>  	int home_node;			/* mbind: use MPOL_MF_HOME_NODE */
>  	nodemask_t *policy_nodes;	/* get/set/mbind */
> +	unsigned char *il_weights;	/* for mode MPOL_WEIGHTED_INTERLEAVE */
>  	int policy_node;		/* get: policy node information */
>  };
>  
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index ec1402dae35b..16fedf966166 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -33,6 +33,7 @@ struct mpol_args {
>  	__u16 mode_flags;
>  	__s32 home_node;	/* mbind2: policy home node */
>  	__aligned_u64 pol_nodes;
> +	__aligned_u64 il_weights; /* size: pol_maxnodes * sizeof(char) */
>  	__u64 pol_maxnodes;
>  	__s32 policy_node;	/* get_mempolicy: policy node info */
>  };

This breaks the ABI you introduced earlier in the patchset.  Although
both changes are made within the same patchset, I don't think that's a
good idea.  I suggest finalizing the ABI in the patch that first
introduces it.  Otherwise, people checking the git log will be confused
by the ABI breakage.  Finalizing it up front also makes the series
easier to review.

> @@ -75,6 +76,7 @@ struct mpol_args {
>  #define MPOL_F_SHARED  (1 << 0)	/* identify shared policies */
>  #define MPOL_F_MOF	(1 << 3) /* this policy wants migrate on fault */
>  #define MPOL_F_MORON	(1 << 4) /* Migrate On protnone Reference On Node */
> +#define MPOL_F_GWEIGHT	(1 << 5) /* Utilize global weights */
>  
>  /*
>   * These bit locations are exposed in the vm.zone_reclaim_mode sysctl

--
Best Regards,
Huang, Ying

[snip]



Thread overview: 13+ messages
2023-12-18 19:46 [PATCH v4 00/11] mempolicy2, mbind2, and " Gregory Price
2023-12-18 19:46 ` [PATCH v4 07/11] mm/mempolicy: add userland mempolicy arg structure Gregory Price
2023-12-18 19:46 ` [PATCH v4 10/11] mm/mempolicy: add the mbind2 syscall Gregory Price
2023-12-19 12:24   ` kernel test robot
2023-12-20  0:48   ` kernel test robot
2023-12-19  3:04 ` [PATCH v4 00/11] mempolicy2, mbind2, and weighted interleave Huang, Ying
2023-12-19 18:09   ` Gregory Price
2023-12-20  2:27     ` Huang, Ying
2023-12-26  7:26       ` Gregory Price
2024-01-02  4:08         ` Huang, Ying
     [not found] ` <20231218194631.21667-12-gregory.price@memverge.com>
2023-12-19  3:07   ` Huang, Ying [this message]
2023-12-19 18:12     ` [PATCH v4 11/11] mm/mempolicy: extend set_mempolicy2 and mbind2 to support " Gregory Price
2024-01-03 11:16   ` Dan Carpenter
