* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-13 8:29 [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING Jinjiang Tu
@ 2025-12-15 0:04 ` Andrew Morton
2025-12-15 1:40 ` Jinjiang Tu
2025-12-19 19:20 ` Gregory Price
2025-12-19 19:23 ` Gregory Price
2025-12-21 7:06 ` Huang, Ying
2 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2025-12-15 0:04 UTC (permalink / raw)
To: Jinjiang Tu
Cc: david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, mgorman, linux-mm, wangkefeng.wang
On Sat, 13 Dec 2025 16:29:11 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
> When mempolicy is rebound due to the process moves to a different cpuset
> context, or the set of nodes allowed by current cpuset context changes,
> mpol_rebind_nodemask() remaps the nodemask according to the old and new
> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
> to store the old nodemask allowed by cpuset.
>
> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
> passed nodemask and nodes allowed by new cpuset context.
> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
> are relative to the set of node IDs allowed by the process's current
> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
> nodemask.
>
> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
> for MPOL_BIND, the behaviour of rebinding should be the same as the default
> behaviour. However, mpol_store_user_nodemask() returns true for
> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
> the user's passed nodemask instead of cpuset_current_mems_allowed, and
> mpol_rebind_nodemask() remaps wrongly.
Thanks.
I find the changelog hard to follow, unfortunately. It's odd that the
problem description comes in the final paragraph!
I cheekily changed that and then fed the text into Gemini, which
I think helped. What do you think of the below?
I won't merge the patch at this time - I'll await reviewer input.
## Bug Fix: Corrected `MPOL_BIND` Rebinding with `MPOL_F_NUMA_BALANCING`
### Problem
The commit `bda420b98505` ("numa balancing: migrate on fault among
multiple bound nodes") introduced the new flag
**`MPOL_F_NUMA_BALANCING`** to enable NUMA balancing for the
**`MPOL_BIND`** memory policy.
The intended behavior was for the rebinding logic to remain the same as
the default `MPOL_BIND` behavior. However, the function
`mpol_store_user_nodemask()` was incorrectly returning `true` for
policies containing `MPOL_F_NUMA_BALANCING`.
This led to a bug where:
1. `mempolicy.w.cpuset_mems_allowed` stored the **user's passed
nodemask** instead of the actual nodemask allowed by the current
cpuset context (`cpuset_current_mems_allowed`).
2. Consequently, **`mpol_rebind_nodemask()` performed incorrect
remapping** when the mempolicy was rebound.
### Analysis of Correct Rebinding Logic
When a memory policy is rebound (e.g., because the process moves to a
different cpuset context, or the allowed nodes within the current
cpuset change), `mpol_rebind_nodemask()`, by default, remaps the
policy's nodemask based on the transition between the **old** and
**new** `cpuset_mems_allowed` sets.
To support this mechanism correctly, `mempolicy.w.cpuset_mems_allowed`
**must store the old nodemask allowed by the cpuset** before the
transition.
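
A minimal userspace sketch of the default remap may help make the "old to new" transition concrete. It is not the kernel implementation: `mask_t` (a 64-bit stand-in for `nodemask_t`) and the helpers `mask_weight()`, `pos_to_ord()`, `ord_to_pos()` and `remap()` are hypothetical names, and the logic is only loosely modeled on how the kernel's bitmap remapping is understood to work (each node keeps its ordinal position when moving from the old allowed set to the new one).

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, at most 64 nodes. */
typedef uint64_t mask_t;

/* Number of set bits in m. */
static int mask_weight(mask_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/* Ordinal of 'bit' among the set bits of m, or -1 if 'bit' is not set. */
static int pos_to_ord(mask_t m, int bit)
{
	if (!(m & (1ULL << bit)))
		return -1;
	return mask_weight(m & ((1ULL << bit) - 1));
}

/* Position of the set bit with ordinal 'ord' in m, or -1 if there is none. */
static int ord_to_pos(mask_t m, int ord)
{
	int bit;

	for (bit = 0; bit < 64; bit++) {
		if ((m & (1ULL << bit)) && ord-- == 0)
			return bit;
	}
	return -1;
}

/*
 * Move every node in 'src' from its ordinal position in 'old_allowed' to
 * the same ordinal position (modulo the weight) in 'new_allowed'.
 * Nodes that are not part of 'old_allowed' are left where they are.
 */
static mask_t remap(mask_t src, mask_t old_allowed, mask_t new_allowed)
{
	mask_t dst = 0;
	int w = mask_weight(new_allowed);
	int bit;

	for (bit = 0; bit < 64; bit++) {
		int n;

		if (!(src & (1ULL << bit)))
			continue;
		n = pos_to_ord(old_allowed, bit);
		if (n < 0 || w == 0)
			dst |= 1ULL << bit;	/* identity map */
		else
			dst |= 1ULL << ord_to_pos(new_allowed, n % w);
	}
	return dst;
}
```

With a policy bound to nodes 2-3 (`0xC`), an old cpuset of nodes 0-3 (`0xF`) and a new cpuset of nodes 4-7 (`0xF0`), `remap(0xC, 0xF, 0xF0)` gives `0xC0` (nodes 6-7). If the "old" argument wrongly holds the user's nodemask instead, `remap(0xC, 0xC, 0xF0)` gives `0x30` (nodes 4-5), which is the kind of incorrect remapping described in the Problem section.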
### Context for Other Flags
* **`MPOL_F_STATIC_NODES`**: This flag suppresses the node remap and
simply intersects the user's passed nodemask with the nodes allowed
by the new cpuset context.
* **`MPOL_F_RELATIVE_NODES`**: For this policy, the user's passed
nodemask represents node IDs **relative** to the set of node IDs
allowed by the process's current cpuset. Therefore,
`mempolicy.w.user_nodemask` is correctly used to store the user's
original relative nodemask.
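
The two flag behaviours can be sketched in userspace as follows. This is an illustration under stated assumptions, not kernel code: `mask_t`, `mask_weight()`, `rebind_static()` and `rebind_relative()` are hypothetical names, masks are limited to 64 nodes, and the relative case assumes that user bit k selects the k-th allowed node, wrapping around when k exceeds the number of allowed nodes.

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, at most 64 nodes. */
typedef uint64_t mask_t;

/* Number of set bits in m. */
static int mask_weight(mask_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/* MPOL_F_STATIC_NODES: no remap, intersect the user mask with the new set. */
static mask_t rebind_static(mask_t user, mask_t new_allowed)
{
	return user & new_allowed;
}

/*
 * MPOL_F_RELATIVE_NODES: treat each user bit as an index into the allowed
 * set. Indices are first folded modulo the number of allowed nodes, then
 * index k is mapped onto the k-th set bit of 'new_allowed'.
 */
static mask_t rebind_relative(mask_t user, mask_t new_allowed)
{
	mask_t folded = 0, dst = 0;
	int w = mask_weight(new_allowed);
	int bit, ord = 0;

	if (w == 0)
		return 0;

	for (bit = 0; bit < 64; bit++) {
		if (user & (1ULL << bit))
			folded |= 1ULL << (bit % w);
	}

	for (bit = 0; bit < 64; bit++) {
		if (!(new_allowed & (1ULL << bit)))
			continue;
		if (folded & (1ULL << ord))
			dst |= 1ULL << bit;
		ord++;
	}
	return dst;
}
```

In both cases the result depends only on the user's original nodemask and the new allowed set, which is why these two flags need `user_nodemask` preserved rather than the old cpuset mask.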
### Proposed Fix
Ensure that `mpol_store_user_nodemask()` handles
`MPOL_F_NUMA_BALANCING` correctly so that
`mempolicy.w.cpuset_mems_allowed` stores the correct cpuset-allowed
nodemask, thereby restoring the proper remapping behavior in
`mpol_rebind_nodemask()`.
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-15 0:04 ` Andrew Morton
@ 2025-12-15 1:40 ` Jinjiang Tu
2025-12-19 19:20 ` Gregory Price
1 sibling, 0 replies; 8+ messages in thread
From: Jinjiang Tu @ 2025-12-15 1:40 UTC (permalink / raw)
To: Andrew Morton
Cc: david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
gourry, ying.huang, apopple, mgorman, linux-mm, wangkefeng.wang
On 2025/12/15 8:04, Andrew Morton wrote:
> On Sat, 13 Dec 2025 16:29:11 +0800 Jinjiang Tu<tujinjiang@huawei.com> wrote:
>
>> When mempolicy is rebound due to the process moves to a different cpuset
>> context, or the set of nodes allowed by current cpuset context changes,
>> mpol_rebind_nodemask() remaps the nodemask according to the old and new
>> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
>> to store the old nodemask allowed by cpuset.
>>
>> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
>> passed nodemask and nodes allowed by new cpuset context.
>> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
>> are relative to the set of node IDs allowed by the process's current
>> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
>> nodemask.
>>
>> commit bda420b98505 ("numa balancing: migrate on fault among multiple
>> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
>> for MPOL_BIND, the behaviour of rebinding should be the same as the default
>> behaviour. However, mpol_store_user_nodemask() returns true for
>> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
>> the user's passed nodemask instead of cpuset_current_mems_allowed, and
>> mpol_rebind_nodemask() remaps wrongly.
> Thanks.
>
> I find the changelog hard to follow, unfortunately. It's odd that the
> problem description comes in the final paragraph!
>
> I cheekily changed that and then fed the text into Gemini, which
> I think helped. What do you think of the below?
>
Thanks, it describes the problem more clearly. I will update the
commit log in the next version.
> I won't merge the patch at this time - I'll await reviewer input.
>
>
> ## Bug Fix: Corrected `MPOL_BIND` Rebinding with `MPOL_F_NUMA_BALANCING`
>
> ### Problem
>
> The commit `bda420b98505` ("numa balancing: migrate on fault among
> multiple bound nodes") introduced the new flag
> **`MPOL_F_NUMA_BALANCING`** to enable NUMA balancing for the
> **`MPOL_BIND`** memory policy.
>
> The intended behavior was for the rebinding logic to remain the same as
> the default `MPOL_BIND` behavior. However, the function
> `mpol_store_user_nodemask()` was incorrectly returning `true` for
> policies containing `MPOL_F_NUMA_BALANCING`.
>
> This led to a bug where:
>
> 1. `mempolicy.w.cpuset_mems_allowed` stored the **user's passed
> nodemask** instead of the actual nodemask allowed by the current
> cpuset context (`cpuset_current_mems_allowed`).
>
> 2. Consequently, **`mpol_rebind_nodemask()` performed incorrect
> remapping** when the mempolicy was rebound.
>
> ### Analysis of Correct Rebinding Logic
>
> When a memory policy is rebound (e.g., because the process moves to a
> different cpuset context, or the allowed nodes within the current
> cpuset change), `mpol_rebind_nodemask()`, by default, remaps the
> policy's nodemask based on the transition between the **old** and
> **new** `cpuset_mems_allowed` sets.
>
> To support this mechanism correctly, `mempolicy.w.cpuset_mems_allowed`
> **must store the old nodemask allowed by the cpuset** before the
> transition.
>
> ### Context for Other Flags
>
> * **`MPOL_F_STATIC_NODES`**: This flag suppresses the node remap and
> simply intersects the user's passed nodemask with the nodes allowed
> by the new cpuset context.
>
> * **`MPOL_F_RELATIVE_NODES`**: For this policy, the user's passed
> nodemask represents node IDs **relative** to the set of node IDs
> allowed by the process's current cpuset. Therefore,
> `mempolicy.w.user_nodemask` is correctly used to store the user's
> original relative nodemask.
>
> ### Proposed Fix
>
> Ensure that `mpol_store_user_nodemask()` handles
> `MPOL_F_NUMA_BALANCING` correctly so that
> `mempolicy.w.cpuset_mems_allowed` stores the correct cpuset-allowed
> nodemask, thereby restoring the proper remapping behavior in
> `mpol_rebind_nodemask()`.
>
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-15 0:04 ` Andrew Morton
2025-12-15 1:40 ` Jinjiang Tu
@ 2025-12-19 19:20 ` Gregory Price
2025-12-20 6:49 ` Jinjiang Tu
1 sibling, 1 reply; 8+ messages in thread
From: Gregory Price @ 2025-12-19 19:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Jinjiang Tu, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, ying.huang, apopple, mgorman, linux-mm,
wangkefeng.wang
On Sun, Dec 14, 2025 at 04:04:59PM -0800, Andrew Morton wrote:
> On Sat, 13 Dec 2025 16:29:11 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
...
> The intended behavior was for the rebinding logic to remain the same as
> the default `MPOL_BIND` behavior. However, the function
> `mpol_store_user_nodemask()` was incorrectly returning `true` for
> policies containing `MPOL_F_NUMA_BALANCING`.
>
> This led to a bug where:
>
> 1. `mempolicy.w.cpuset_mems_allowed` stored the **user's passed
> nodemask** instead of the actual nodemask allowed by the current
> cpuset context (`cpuset_current_mems_allowed`).
>
Hm... these things are a union.
It's probably simpler to state the following
----
Setting MPOL_F_NUMA_BALANCING causes pol->w.cpuset_mems_allowed to be
erroneously overwritten, causing mpol_rebind_nodemask to rebind the
policy based on the wrong nodemask.
1. The intended rebind behavior of MPOL_F_NUMA_BALANCING when neither
MPOL_F_STATIC_NODES nor MPOL_F_RELATIVE_NODES is present is
to remap nodes based on mempolicy.w.cpuset_mems_allowed.
2. `mempolicy.w.cpuset_mems_allowed` is overwritten by mpol_set_nodemask()
setting `mempolicy.w.user_nodemask` (these are unioned) when
MPOL_F_NUMA_BALANCING is set, because mpol_store_user_nodemask()
checks for any mode flag.

    union {
        nodemask_t cpuset_mems_allowed; /* relative to these nodes */
        nodemask_t user_nodemask;       /* nodemask passed by user */
    } w;

    static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
    {
        return pol->flags & MPOL_MODE_FLAGS;
    }

    static int mpol_set_nodemask(...)
    {
        if (mpol_store_user_nodemask(pol))
            pol->w.user_nodemask = *nodes;
        else
            pol->w.cpuset_mems_allowed = cpuset_current_mems_allowed;
    }

3. `mpol_rebind_nodemask()` consequently ends up rebinding based on the
user-passed nodemask rather than the cpuset_mems_allowed nodemask
as intended.

    static void mpol_rebind_nodemask()
    {
        if (pol->flags & MPOL_F_STATIC_NODES)
            nodes_and(tmp, pol->w.user_nodemask, *nodes);
        else if (pol->flags & MPOL_F_RELATIVE_NODES)
            mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
        else
            nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
                        *nodes);
        ...
    }

To fix this, only store the user nodemask if MPOL_F_STATIC_NODES or
MPOL_F_RELATIVE_NODES is present.
-----------
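
A small userspace sketch of the overwrite, under stated assumptions: `mask_t`, `struct sketch_pol`, `set_nodemask()` and the two `store_user_nodemask_*()` helpers are hypothetical stand-ins, the flag bit positions are illustrative (only their distinctness matters here), and the `nodes` assignment is a simplification of what mpol_set_nodemask() does. It only shows how the choice of flag mask decides which value lands in the shared storage.

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, at most 64 nodes. */
typedef uint64_t mask_t;

/* Illustrative flag bits; only their distinctness matters for the sketch. */
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_F_NUMA_BALANCING	(1 << 13)

#define MPOL_MODE_FLAGS \
	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_NUMA_BALANCING)
#define MPOL_USER_NODEMASK_FLAGS \
	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)

/* Reduced model of struct mempolicy: just the fields the rebind path uses. */
struct sketch_pol {
	unsigned int flags;
	mask_t nodes;
	union {
		mask_t cpuset_mems_allowed;	/* relative to these nodes */
		mask_t user_nodemask;		/* nodemask passed by user */
	} w;
};

/* Old check: any mode flag, including MPOL_F_NUMA_BALANCING, counts. */
static int store_user_nodemask_old(const struct sketch_pol *pol)
{
	return pol->flags & MPOL_MODE_FLAGS;
}

/* Fixed check: only the flags that really need the user's nodemask. */
static int store_user_nodemask_fixed(const struct sketch_pol *pol)
{
	return pol->flags & MPOL_USER_NODEMASK_FLAGS;
}

/*
 * Fill pol->w following the if/else shown for mpol_set_nodemask() above.
 * 'fixed' selects which predicate is used. Restricting pol->nodes to the
 * cpuset is an assumption made to keep the model small.
 */
static void set_nodemask(struct sketch_pol *pol, mask_t user,
			 mask_t cpuset_allowed, int fixed)
{
	int store_user = fixed ? store_user_nodemask_fixed(pol)
			       : store_user_nodemask_old(pol);

	pol->nodes = user & cpuset_allowed;
	if (store_user)
		pol->w.user_nodemask = user;
	else
		pol->w.cpuset_mems_allowed = cpuset_allowed;
}
```

With `flags = MPOL_F_NUMA_BALANCING`, a user mask of `0xC` and a cpuset of `0xF`, the old check leaves `0xC` in `w.cpuset_mems_allowed`, while the fixed check leaves `0xF` there, which is what the final `else` branch of the rebind expects.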
On another note... what's even the reason for this union to exist if you
need to know the flag state to determine which one to access????
and they're both nodemask_t!
May as well call it `pol->w.rebind_mask` or something and let the flags do
the talking.
Otherwise the fix looks good, I will respond to the original with a
review tag.
~Gregory
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-19 19:20 ` Gregory Price
@ 2025-12-20 6:49 ` Jinjiang Tu
0 siblings, 0 replies; 8+ messages in thread
From: Jinjiang Tu @ 2025-12-20 6:49 UTC (permalink / raw)
To: Gregory Price, Andrew Morton
Cc: david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
ying.huang, apopple, mgorman, linux-mm, wangkefeng.wang
On 2025/12/20 3:20, Gregory Price wrote:
> On Sun, Dec 14, 2025 at 04:04:59PM -0800, Andrew Morton wrote:
>> On Sat, 13 Dec 2025 16:29:11 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:
> ...
>> The intended behavior was for the rebinding logic to remain the same as
>> the default `MPOL_BIND` behavior. However, the function
>> `mpol_store_user_nodemask()` was incorrectly returning `true` for
>> policies containing `MPOL_F_NUMA_BALANCING`.
>>
>> This led to a bug where:
>>
>> 1. `mempolicy.w.cpuset_mems_allowed` stored the **user's passed
>> nodemask** instead of the actual nodemask allowed by the current
>> cpuset context (`cpuset_current_mems_allowed`).
>>
> Hm... these things are a union.
>
> It's probably simpler to state the following
>
> ----
>
> Setting MPOL_F_NUMA_BALANCING causes pol->w.cpuset_mems_allowed to be
> erroneously overwritten, causing mpol_rebind_nodemask to rebind the
> policy based on the wrong nodemask.
>
> 1. The intended rebind behavior of MPOL_F_NUMA_BALANCING when neither
> MPOL_F_STATIC_NODES nor MPOL_F_RELATIVE_NODES is present is
> to remap nodes based on mempolicy.w.cpuset_mems_allowed.
>
> 2. `mempolicy.w.cpuset_mems_allowed` is overwritten by mpol_set_nodemask()
> setting `mempolicy.w.user_nodemask` (these are unioned) when
> MPOL_F_NUMA_BALANCING is set, because mpol_store_user_nodemask()
> checks for any mode flag.
>
>     union {
>         nodemask_t cpuset_mems_allowed; /* relative to these nodes */
>         nodemask_t user_nodemask;       /* nodemask passed by user */
>     } w;
>
>     static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
>     {
>         return pol->flags & MPOL_MODE_FLAGS;
>     }
>
>     static int mpol_set_nodemask(...)
>     {
>         if (mpol_store_user_nodemask(pol))
>             pol->w.user_nodemask = *nodes;
>         else
>             pol->w.cpuset_mems_allowed = cpuset_current_mems_allowed;
>     }
>
>
> 3. `mpol_rebind_nodemask()` consequently ends up rebinding based on the
> user-passed nodemask rather than the cpuset_mems_allowed nodemask
> as intended.
>
>     static void mpol_rebind_nodemask()
>     {
>         if (pol->flags & MPOL_F_STATIC_NODES)
>             nodes_and(tmp, pol->w.user_nodemask, *nodes);
>         else if (pol->flags & MPOL_F_RELATIVE_NODES)
>             mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
>         else
>             nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
>                         *nodes);
>         ...
>     }
>
> To fix this, only store the user nodemask if MPOL_F_STATIC_NODES or
> MPOL_F_RELATIVE_NODES are present.
>
> -----------
Thanks for the review. I will update the changelog and send v2.
> On another note... what's even the reason for this union to exist if you
> need to know the flag state to determine which one to access????
>
> and they're both nodemask_t!
>
> May as well call it `pol->w.rebind_mask` or something and let the flags do
> the talking.
Yes, the only difference is which type of nodemask is stored.
> Otherwise the fix looks good, I will respond to the original with a
> review tag.
>
> ~Gregory
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-13 8:29 [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING Jinjiang Tu
2025-12-15 0:04 ` Andrew Morton
@ 2025-12-19 19:23 ` Gregory Price
2025-12-21 7:06 ` Huang, Ying
2 siblings, 0 replies; 8+ messages in thread
From: Gregory Price @ 2025-12-19 19:23 UTC (permalink / raw)
To: Jinjiang Tu
Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, ying.huang, apopple, mgorman, linux-mm,
wangkefeng.wang
On Sat, Dec 13, 2025 at 04:29:11PM +0800, Jinjiang Tu wrote:
> When mempolicy is rebound due to the process moves to a different cpuset
> context, or the set of nodes allowed by current cpuset context changes,
> mpol_rebind_nodemask() remaps the nodemask according to the old and new
> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
> to store the old nodemask allowed by cpuset.
>
> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
> passed nodemask and nodes allowed by new cpuset context.
> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
> are relative to the set of node IDs allowed by the process's current
> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
> nodemask.
>
> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
> for MPOL_BIND, the behaviour of rebinding should be the same as the default
> behaviour. However, mpol_store_user_nodemask() returns true for
> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
> the user's passed nodemask instead of cpuset_current_mems_allowed, and
> mpol_rebind_nodemask() remaps wrongly.
>
> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Fix looks good. Thank you!
With changelog updates discussed in the thread with Andrew:
Reviewed-by: Gregory Price <gourry@gourry.net>
~Gregory
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-13 8:29 [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING Jinjiang Tu
2025-12-15 0:04 ` Andrew Morton
2025-12-19 19:23 ` Gregory Price
@ 2025-12-21 7:06 ` Huang, Ying
2025-12-22 3:08 ` Jinjiang Tu
2 siblings, 1 reply; 8+ messages in thread
From: Huang, Ying @ 2025-12-21 7:06 UTC (permalink / raw)
To: Jinjiang Tu
Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, apopple, mgorman, linux-mm, wangkefeng.wang
Hi, Jinjiang,
Jinjiang Tu <tujinjiang@huawei.com> writes:
> When mempolicy is rebound due to the process moves to a different cpuset
> context, or the set of nodes allowed by current cpuset context changes,
> mpol_rebind_nodemask() remaps the nodemask according to the old and new
> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
> to store the old nodemask allowed by cpuset.
>
> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
> passed nodemask and nodes allowed by new cpuset context.
> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
> are relative to the set of node IDs allowed by the process's current
> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
> nodemask.
>
> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
> for MPOL_BIND, the behaviour of rebinding should be the same as the default
> behaviour. However, mpol_store_user_nodemask() returns true for
> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
> the user's passed nodemask instead of cpuset_current_mems_allowed, and
> mpol_rebind_nodemask() remaps wrongly.
Good catch! Thanks for fixing this.
> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
> include/uapi/linux/mempolicy.h | 6 ++++++
> mm/mempolicy.c | 2 +-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 8fbbe613611a..1802b6c89603 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -39,6 +39,12 @@ enum {
> #define MPOL_MODE_FLAGS \
> (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_NUMA_BALANCING)
>
> +/*
> + * MPOL_USER_NODEMASK_FLAGS is used to determine if nodemask passed by
> + * users should be used in mpol_rebind_nodemask().
> + */
This sounds a little internal for a user API header file. How about
something like the below?
/* Whether is a nodemask specified by user */
> +#define MPOL_USER_NODEMASK_FLAGS (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)
> +
> /* Flags for get_mempolicy */
> #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
> #define MPOL_F_ADDR (1<<1) /* look up vma using address */
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 68a98ba57882..76da50425712 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -365,7 +365,7 @@ static const struct mempolicy_operations {
>
>  static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
>  {
> -	return pol->flags & MPOL_MODE_FLAGS;
> +	return pol->flags & MPOL_USER_NODEMASK_FLAGS;
>  }
>
> static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig,
---
Best Regards,
Huang, Ying
* Re: [PATCH] mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING
2025-12-21 7:06 ` Huang, Ying
@ 2025-12-22 3:08 ` Jinjiang Tu
0 siblings, 0 replies; 8+ messages in thread
From: Jinjiang Tu @ 2025-12-22 3:08 UTC (permalink / raw)
To: Huang, Ying
Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, gourry, apopple, mgorman, linux-mm, wangkefeng.wang
On 2025/12/21 15:06, Huang, Ying wrote:
> Hi, Jinjiang,
>
> Jinjiang Tu <tujinjiang@huawei.com> writes:
>
>> When mempolicy is rebound due to the process moves to a different cpuset
>> context, or the set of nodes allowed by current cpuset context changes,
>> mpol_rebind_nodemask() remaps the nodemask according to the old and new
>> cpuset_mems_allowed by default. So, use mempolicy.w.cpuset_mems_allowed
>> to store the old nodemask allowed by cpuset.
>>
>> MPOL_F_STATIC_NODES suppresses the node remap and intersects the user's
>> passed nodemask and nodes allowed by new cpuset context.
>> For MPOL_F_RELATIVE_NODES, the user's passed nodemask means node IDs that
>> are relative to the set of node IDs allowed by the process's current
>> cpuset. So, use mempolicy.w.user_nodemask to store the user's passed
>> nodemask.
>>
>> commit bda420b98505 ("numa balancing: migrate on fault among multiple
>> bound nodes") adds new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing
>> for MPOL_BIND, the behaviour of rebinding should be the same as the default
>> behaviour. However, mpol_store_user_nodemask() returns true for
>> MPOL_F_NUMA_BALANCING, leading to mempolicy.w.cpuset_mems_allowed stores
>> the user's passed nodemask instead of cpuset_current_mems_allowed, and
>> mpol_rebind_nodemask() remaps wrongly.
> Good catch! Thanks for fixing this.
>
>> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>> ---
>> include/uapi/linux/mempolicy.h | 6 ++++++
>> mm/mempolicy.c | 2 +-
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
>> index 8fbbe613611a..1802b6c89603 100644
>> --- a/include/uapi/linux/mempolicy.h
>> +++ b/include/uapi/linux/mempolicy.h
>> @@ -39,6 +39,12 @@ enum {
>> #define MPOL_MODE_FLAGS \
>> (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_NUMA_BALANCING)
>>
>> +/*
>> + * MPOL_USER_NODEMASK_FLAGS is used to determine if nodemask passed by
>> + * users should be used in mpol_rebind_nodemask().
>> + */
> This sounds a little internal for a user API header file. How about
> something like the below?
Thanks, I have updated it in v2.
>
> /* Whether is a nodemask specified by user */
>
>> +#define MPOL_USER_NODEMASK_FLAGS (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES)
>> +
>> /* Flags for get_mempolicy */
>> #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
>> #define MPOL_F_ADDR (1<<1) /* look up vma using address */
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 68a98ba57882..76da50425712 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -365,7 +365,7 @@ static const struct mempolicy_operations {
>>
>> static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
>> {
>> - return pol->flags & MPOL_MODE_FLAGS;
>> + return pol->flags & MPOL_USER_NODEMASK_FLAGS;
>> }
>>
>> static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig,
> ---
> Best Regards,
> Huang, Ying