From: Feng Tang <feng.tang@intel.com>
To: Michal Hocko <mhocko@suse.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <hansen.dave@intel.com>,
Ben Widawsky <bwidawsk@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Feng Tang <feng.tang@intel.com>
Subject: [PATCH] mm/hugetlb: add dedicated func to get 'allowed' nodemask for current process
Date: Thu, 4 Aug 2022 21:03:42 +0800 [thread overview]
Message-ID: <20220804130342.63355-1-feng.tang@intel.com> (raw)
In-Reply-To: <YuuitMLOYqBcXMR3@dhcp22.suse.cz>
Muchun Song found that after MPOL_PREFERRED_MANY policy was introduced
in commit b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes")
[1], the policy_nodemask_current()'s semantics for this new policy
has been changed, which returns 'preferred' nodes instead of 'allowed'
nodes, and could hurt the usage of its caller in hugetlb:
allowed_mems_nr().
Michal found the policy_nodemask_current() is only used by hugetlb,
and suggested to move it to hugetlb code with more explicit name to
enforce the 'allowed' semantics for which only MPOL_BIND policy
matters.
One note for the new policy_mbind_nodemask() is, the cross check
from MPOL_BIND, gfp flags and cpuset configuration can lead to
a no available node case, which is considered to be broken
configuration, and 'NULL' (equals all nodes) will be returned.
apply_policy_zone() is made extern to be called in hugetlb code
and its return value is changed to bool.
[1]. https://lore.kernel.org/lkml/20220801084207.39086-1-songmuchun@bytedance.com/t/
Reported-by: Muchun Song <songmuchun@bytedance.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
include/linux/mempolicy.h | 13 +------------
mm/hugetlb.c | 24 ++++++++++++++++++++----
mm/mempolicy.c | 2 +-
3 files changed, 22 insertions(+), 17 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 668389b4b53d..d232de7cdc56 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -151,13 +151,6 @@ extern bool mempolicy_in_oom_domain(struct task_struct *tsk,
const nodemask_t *mask);
extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
-static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
-{
- struct mempolicy *mpol = get_task_policy(current);
-
- return policy_nodemask(gfp, mpol);
-}
-
extern unsigned int mempolicy_slab_node(void);
extern enum zone_type policy_zone;
@@ -189,6 +182,7 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
return (pol->mode == MPOL_PREFERRED_MANY);
}
+extern bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone);
#else
@@ -294,11 +288,6 @@ static inline void mpol_put_task_policy(struct task_struct *task)
{
}
-static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
-{
- return NULL;
-}
-
static inline bool mpol_is_preferred_many(struct mempolicy *pol)
{
return false;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18c071c294e..ad84bb85b6de 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4330,18 +4330,34 @@ static int __init default_hugepagesz_setup(char *s)
}
__setup("default_hugepagesz=", default_hugepagesz_setup);
+static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
+{
+#ifdef CONFIG_NUMA
+ struct mempolicy *mpol = get_task_policy(current);
+
+ /*
+ * Only enforce MPOL_BIND policy which overlaps with cpuset policy
+ * (from policy_nodemask) specifically for hugetlb case
+ */
+ if (mpol->mode == MPOL_BIND &&
+ (apply_policy_zone(mpol, gfp_zone(gfp)) &&
+ cpuset_nodemask_valid_mems_allowed(&mpol->nodes)))
+ return &mpol->nodes;
+#endif
+ return NULL;
+}
+
static unsigned int allowed_mems_nr(struct hstate *h)
{
int node;
unsigned int nr = 0;
- nodemask_t *mpol_allowed;
+ nodemask_t *mbind_nodemask;
unsigned int *array = h->free_huge_pages_node;
gfp_t gfp_mask = htlb_alloc_mask(h);
- mpol_allowed = policy_nodemask_current(gfp_mask);
-
+ mbind_nodemask = policy_mbind_nodemask(gfp_mask);
for_each_node_mask(node, cpuset_current_mems_allowed) {
- if (!mpol_allowed || node_isset(node, *mpol_allowed))
+ if (!mbind_nodemask || node_isset(node, *mbind_nodemask))
nr += array[node];
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..9f15bc533601 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1805,7 +1805,7 @@ bool vma_policy_mof(struct vm_area_struct *vma)
return pol->flags & MPOL_F_MOF;
}
-static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
+bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
{
enum zone_type dynamic_policy_zone = policy_zone;
--
2.27.0
next prev parent reply other threads:[~2022-08-04 13:03 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-01 8:42 [PATCH] mm: mempolicy: fix policy_nodemask() for MPOL_PREFERRED_MANY case Muchun Song
2022-08-01 9:06 ` Michal Hocko
2022-08-01 9:26 ` Feng Tang
2022-08-02 3:42 ` Muchun Song
2022-08-02 5:52 ` Feng Tang
2022-08-02 6:40 ` Muchun Song
2022-08-02 7:39 ` Feng Tang
2022-08-02 9:02 ` Michal Hocko
2022-08-03 6:41 ` Feng Tang
2022-08-03 7:36 ` Michal Hocko
2022-08-03 17:14 ` Feng Tang
2022-08-03 11:28 ` Michal Hocko
2022-08-03 20:43 ` Feng Tang
2022-08-03 12:56 ` Michal Hocko
2022-08-03 21:08 ` Feng Tang
2022-08-03 13:21 ` Michal Hocko
2022-08-04 8:27 ` Feng Tang
2022-08-04 10:43 ` Michal Hocko
2022-08-04 13:03 ` Feng Tang [this message]
2022-08-04 13:36 ` [PATCH] mm/hugetlb: add dedicated func to get 'allowed' nodemask for current process Michal Hocko
2022-08-04 22:37 ` Andrew Morton
2022-08-05 0:06 ` Feng Tang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220804130342.63355-1-feng.tang@intel.com \
--to=feng.tang@intel.com \
--cc=akpm@linux-foundation.org \
--cc=bwidawsk@kernel.org \
--cc=hansen.dave@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=mike.kravetz@oracle.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox