From: Michal Hocko <mhocko@suse.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Ben Widawsky <ben.widawsky@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Feng Tang <feng.tang@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Mike Kravetz <mike.kravetz@oracle.com>,
Randy Dunlap <rdunlap@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>, Andi Kleen <ak@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Huang Ying <ying.huang@intel.com>,
linux-api@vger.kernel.org
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
Date: Wed, 13 Oct 2021 12:42:30 +0200 [thread overview]
Message-ID: <YWa4FoicH0VztGTl@dhcp22.suse.cz> (raw)
In-Reply-To: <20211013094539.962357-1-aneesh.kumar@linux.ibm.com>
[Cc linux-api]
On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote:
> This mempolicy mode can be used with either the set_mempolicy(2)
> or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it
> allows an application to set a preference node from which the kernel
> will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode,
> it takes a set of nodes. The nodes in the nodemask are used as fallback
> allocation nodes if memory is not available on the preferred node.
> Unlike MPOL_PREFERRED_MANY, it will not fall back to all nodes in the
> system for memory allocations. Like the MPOL_BIND interface, it operates
> over a set of nodes and will cause a SIGSEGV or invoke the OOM killer if
> memory is not available on those preferred nodes.
>
> This patch lets applications hint a preferred memory allocation node and
> fall back to _only_ a specified set of nodes if memory is not available
> on the preferred node. Fallback allocation is attempted from the node in
> that set which is nearest to the preferred node.
>
> This new memory policy gives applications explicit control over slow
> memory allocation and avoids the default fallback to slow-memory NUMA nodes.
> The difference from MPOL_BIND is the ability to specify a preferred node,
> which is the first node in the nodemask argument passed.
>
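As a usage illustration (not part of the patch): with this RFC applied, a
process could request the proposed policy through set_mempolicy(2) as in
the minimal sketch below. MPOL_PREFERRED_STRICT and the numeric value used
here are assumptions derived from this patch's uapi enum; neither exists
upstream. Link with -lnuma for the set_mempolicy(2) wrapper in <numaif.h>.

/*
 * Minimal sketch, assuming a kernel and uapi headers carrying this RFC:
 * prefer node 2, with fallback restricted to node 3 only.
 */
#include <numaif.h>
#include <stdio.h>

#ifndef MPOL_PREFERRED_STRICT
#define MPOL_PREFERRED_STRICT 6	/* assumed value, implied by the patched enum */
#endif

int main(void)
{
	/*
	 * Node 2 is the preferred node (the first node in the mask);
	 * node 3 is the only permitted fallback.
	 */
	unsigned long nodemask = (1UL << 2) | (1UL << 3);

	if (set_mempolicy(MPOL_PREFERRED_STRICT, &nodemask,
			  sizeof(nodemask) * 8) != 0) {
		perror("set_mempolicy");
		return 1;
	}
	/*
	 * Subsequent allocations by this task are served from node 2,
	 * falling back to node 3 only, never to other nodes.
	 */
	return 0;
}

If neither node in the set can satisfy the allocation, the task sees
MPOL_BIND-like behavior (SIGSEGV or OOM killer), per the description above.
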
> Cc: Ben Widawsky <ben.widawsky@intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> .../admin-guide/mm/numa_memory_policy.rst | 7 +++
> include/uapi/linux/mempolicy.h | 1 +
> mm/mempolicy.c | 43 +++++++++++++++++--
> 3 files changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 64fd0ba0d057..4dfdcbd22d67 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -252,6 +252,13 @@ MPOL_PREFERRED_MANY
> can fall back to all existing numa nodes. This is effectively
> MPOL_PREFERRED allowed for a mask rather than a single node.
>
> +MPOL_PREFERRED_STRICT
> + This mode specifies that the allocation should be attempted
> + from the first node specified in the nodemask of the policy.
> + If that allocation fails, the kernel will search other nodes
> + in the nodemask, in order of increasing distance from the
> + preferred node based on information provided by the platform firmware.
> +
> NUMA memory policy supports the following optional mode flags:
>
> MPOL_F_STATIC_NODES
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 046d0ccba4cd..8aa1d1963235 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -23,6 +23,7 @@ enum {
> MPOL_INTERLEAVE,
> MPOL_LOCAL,
> MPOL_PREFERRED_MANY,
> + MPOL_PREFERRED_STRICT,
> MPOL_MAX, /* always last member of enum */
> };
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1592b081c58e..59080dd1ea69 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -407,6 +407,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
> .create = mpol_new_nodemask,
> .rebind = mpol_rebind_preferred,
> },
> + [MPOL_PREFERRED_STRICT] = {
> + .create = mpol_new_nodemask,
> + .rebind = mpol_rebind_preferred,
> + },
> };
>
> static int migrate_page_add(struct page *page, struct list_head *pagelist,
> @@ -900,6 +904,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
> case MPOL_INTERLEAVE:
> case MPOL_PREFERRED:
> case MPOL_PREFERRED_MANY:
> + case MPOL_PREFERRED_STRICT:
> *nodes = p->nodes;
> break;
> case MPOL_LOCAL:
> @@ -1781,7 +1786,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> cpuset_nodemask_valid_mems_allowed(&policy->nodes))
> return &policy->nodes;
>
> - if (mode == MPOL_PREFERRED_MANY)
> + if (mode == MPOL_PREFERRED_MANY || mode == MPOL_PREFERRED_STRICT)
> return &policy->nodes;
>
> return NULL;
> @@ -1796,7 +1801,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> */
> static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
> {
> - if (policy->mode == MPOL_PREFERRED) {
> + if (policy->mode == MPOL_PREFERRED || policy->mode == MPOL_PREFERRED_STRICT) {
> nd = first_node(policy->nodes);
> } else {
> /*
> @@ -1840,6 +1845,7 @@ unsigned int mempolicy_slab_node(void)
>
> switch (policy->mode) {
> case MPOL_PREFERRED:
> + case MPOL_PREFERRED_STRICT:
> return first_node(policy->nodes);
>
> case MPOL_INTERLEAVE:
> @@ -1952,7 +1958,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
> huge_page_shift(hstate_vma(vma)));
> } else {
> nid = policy_node(gfp_flags, *mpol, numa_node_id());
> - if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY)
> + if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY ||
> + mode == MPOL_PREFERRED_STRICT)
> *nodemask = &(*mpol)->nodes;
> }
> return nid;
> @@ -1986,6 +1993,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
> switch (mempolicy->mode) {
> case MPOL_PREFERRED:
> case MPOL_PREFERRED_MANY:
> + case MPOL_PREFERRED_STRICT:
> case MPOL_BIND:
> case MPOL_INTERLEAVE:
> *mask = mempolicy->nodes;
> @@ -2072,6 +2080,23 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
> return page;
> }
>
> +static struct page *alloc_pages_preferred_strict(gfp_t gfp, unsigned int order,
> + struct mempolicy *pol)
> +{
> + int nid;
> + gfp_t preferred_gfp;
> +
> + /*
> + * With MPOL_PREFERRED_STRICT first node in the policy nodemask
> + * is picked as the preferred node id and the fallback allocation
> + * is still restricted to the preferred nodes in the nodemask.
> + */
> + preferred_gfp = gfp | __GFP_NOWARN;
> + preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
> + nid = first_node(pol->nodes);
> + return __alloc_pages(preferred_gfp, order, nid, &pol->nodes);
> +}
> +
> /**
> * alloc_pages_vma - Allocate a page for a VMA.
> * @gfp: GFP flags.
> @@ -2113,6 +2138,12 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> goto out;
> }
>
> + if (pol->mode == MPOL_PREFERRED_STRICT) {
> + page = alloc_pages_preferred_strict(gfp, order, pol);
> + mpol_cond_put(pol);
> + goto out;
> + }
> +
> if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
> int hpage_node = node;
>
> @@ -2193,6 +2224,8 @@ struct page *alloc_pages(gfp_t gfp, unsigned order)
> else if (pol->mode == MPOL_PREFERRED_MANY)
> page = alloc_pages_preferred_many(gfp, order,
> numa_node_id(), pol);
> + else if (pol->mode == MPOL_PREFERRED_STRICT)
> + page = alloc_pages_preferred_strict(gfp, order, pol);
> else
> page = __alloc_pages(gfp, order,
> policy_node(gfp, pol, numa_node_id()),
> @@ -2265,6 +2298,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
> case MPOL_INTERLEAVE:
> case MPOL_PREFERRED:
> case MPOL_PREFERRED_MANY:
> + case MPOL_PREFERRED_STRICT:
> return !!nodes_equal(a->nodes, b->nodes);
> case MPOL_LOCAL:
> return true;
> @@ -2405,6 +2439,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
> break;
>
> case MPOL_PREFERRED:
> + case MPOL_PREFERRED_STRICT:
> if (node_isset(curnid, pol->nodes))
> goto out;
> polnid = first_node(pol->nodes);
> @@ -2866,6 +2901,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
> err = 0;
> goto out;
> case MPOL_PREFERRED_MANY:
> + case MPOL_PREFERRED_STRICT:
> case MPOL_BIND:
> /*
> * Insist on a nodelist
> @@ -2953,6 +2989,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
> break;
> case MPOL_PREFERRED:
> case MPOL_PREFERRED_MANY:
> + case MPOL_PREFERRED_STRICT:
> case MPOL_BIND:
> case MPOL_INTERLEAVE:
> nodes = pol->nodes;
> --
> 2.31.1
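
The same policy can also be requested for a single mapping via mbind(2).
A minimal sketch under the same assumptions as above (this RFC's headers;
the MPOL_PREFERRED_STRICT value is hypothetical, not upstream):

/*
 * Minimal sketch: bind one anonymous mapping to the proposed
 * MPOL_PREFERRED_STRICT policy, preferring node 0 with fallback
 * restricted to node 1. Link with -lnuma for mbind(2) from <numaif.h>.
 */
#include <numaif.h>
#include <sys/mman.h>
#include <stdio.h>

#ifndef MPOL_PREFERRED_STRICT
#define MPOL_PREFERRED_STRICT 6	/* assumed value, implied by the patched enum */
#endif

int main(void)
{
	size_t len = 1UL << 21;	/* 2MB region */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Node 0 is preferred (the first node in the mask); node 1 is
	 * the only permitted fallback for pages faulted into buf.
	 */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);

	if (mbind(buf, len, MPOL_PREFERRED_STRICT, &nodemask,
		  sizeof(nodemask) * 8, 0) != 0) {
		perror("mbind");
		return 1;
	}
	return 0;
}

Pages faulted into the region would then go through the
alloc_pages_preferred_strict() path added above: allocated from node 0
when possible, spilling only to node 1.
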
--
Michal Hocko
SUSE Labs
Thread overview: 22+ messages
2021-10-13 9:45 Aneesh Kumar K.V
2021-10-13 10:42 ` Michal Hocko [this message]
2021-10-13 10:48 ` Michal Hocko
2021-10-13 12:35 ` Aneesh Kumar K.V
2021-10-13 12:50 ` Michal Hocko
2021-10-13 12:58 ` Aneesh Kumar K.V
2021-10-13 13:07 ` Michal Hocko
2021-10-13 13:10 ` Aneesh Kumar K.V
2021-10-13 14:22 ` Michal Hocko
2021-10-13 13:57 ` Aneesh Kumar K.V
2021-10-13 14:26 ` Michal Hocko
2021-10-13 13:16 ` Andi Kleen
2021-10-13 13:23 ` Aneesh Kumar K.V
2021-10-13 14:21 ` Michal Hocko
2021-10-14 9:30 ` Aneesh Kumar K.V
2021-10-14 9:38 ` Michal Hocko
2021-10-14 10:28 ` Aneesh Kumar K.V
2021-10-14 11:41 ` Michal Hocko
2021-10-14 13:29 ` Aneesh Kumar K.V
2021-10-14 14:56 ` Michal Hocko
2021-10-14 15:50 ` Aneesh Kumar K.V
2021-10-19 9:38 ` Michal Hocko