linux-mm.kvack.org archive mirror
* [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
@ 2024-11-23 19:09 Junjie Fu
  2024-11-23 22:15 ` Matthew Wilcox
  2024-11-25 11:33 ` Michal Hocko
  0 siblings, 2 replies; 7+ messages in thread
From: Junjie Fu @ 2024-11-23 19:09 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm, dave.hansen, mhocko, fujunjie1

When handling a page fault caused by NUMA balancing (do_numa_page), it is
necessary to decide whether to migrate the current page to another node or
keep it on its current node. For pages with the MPOL_PREFERRED memory
policy, it is sufficient to check whether the first node set in the
nodemask is the same as the node where the page is currently located. If
this is the case, the page should remain in its current state. Otherwise,
migration to another node should be attempted.

Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
preferred node for allocation. The kernel will try to allocate pages from
this node first and fall back to nearby nodes if the preferred node is low
on free memory. If the nodemask specifies more than one node ID, the first
node in the mask will be selected as the preferred node."

Thus, if the node where the current page resides is not the first node in
the nodemask, it is not the PREFERRED node, and memory migration can be
attempted.

However, in the original code, the check only verifies whether the current
node exists in the nodemask (which may or may not be the first node in the
mask). This could lead to a scenario where, if the current node is not the
first node in the nodemask, the code incorrectly decides not to attempt
migration to other nodes.

This behavior is clearly incorrect. If the target node for migration and
the page's current NUMA node are both within the nodemask but neither is
the first node, they should be treated with the same priority, and
migration attempts should proceed.
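
To make the claimed difference concrete (illustrative only, not part of the
patch): suppose the folio currently sits on node 2 and pol->nodes contains
nodes {1, 2}. The following user-space sketch emulates the two checks with a
plain bitmask instead of the kernel's nodemask_t helpers:

#include <stdio.h>

#define NODE_BIT(n)		(1UL << (n))
#define NODE_ISSET(n, mask)	(((mask) >> (n)) & 1UL)

/* lowest set bit, mimicking first_node() */
static int first_node_of(unsigned long mask)
{
	return __builtin_ctzl(mask);
}

int main(void)
{
	unsigned long pol_nodes = NODE_BIT(1) | NODE_BIT(2);	/* nodes {1, 2} */
	int curnid = 2;						/* folio's current node */

	/* Original check: any node in the mask keeps the folio where it is. */
	printf("node_isset check: %s\n",
	       NODE_ISSET(curnid, pol_nodes) ?
	       "keep on node 2" : "attempt migration");

	/* Patched check: only the first node in the mask keeps the folio. */
	printf("first_node check: %s\n",
	       curnid == first_node_of(pol_nodes) ?
	       "keep on node 2" : "attempt migration to node 1");
	return 0;
}

With the original check the folio stays on node 2; with the patched check a
migration to node 1 would be attempted.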

Signed-off-by: Junjie Fu <fujunjie1@qq.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..3454dfc7da8d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2769,7 +2769,7 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		break;
 
 	case MPOL_PREFERRED:
-		if (node_isset(curnid, pol->nodes))
+		if (curnid == first_node(pol->nodes))
 			goto out;
 		polnid = first_node(pol->nodes);
 		break;
-- 
2.34.1




* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-23 19:09 [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing Junjie Fu
@ 2024-11-23 22:15 ` Matthew Wilcox
  2024-11-25 11:33 ` Michal Hocko
  1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2024-11-23 22:15 UTC (permalink / raw)
  To: Junjie Fu; +Cc: linux-mm, linux-kernel, akpm, dave.hansen, mhocko

On Sun, Nov 24, 2024 at 03:09:35AM +0800, Junjie Fu wrote:
> Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
> preferred node for allocation. The kernel will try to allocate pages from
> this node first and fall back to nearby nodes if the preferred node is low
> on free memory. If the nodemask specifies more than one node ID, the first
> node in the mask will be selected as the preferred node."
> 
> Thus, if the node where the current page resides is not the first node in
> the nodemask, it is not the PREFERRED node, and memory migration can be
> attempted.

I think you've found poor documentation, not a kernel bug.  If multiple
nodes are set in PREFERRED, then _new_ allocations should come from the
first node, but _existing_ allocations do not need to be moved to the
new node.  At least IMO that was the original intent of allowing
multiple nodes to be set.  Otherwise, what is the point?



* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-23 19:09 [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing Junjie Fu
  2024-11-23 22:15 ` Matthew Wilcox
@ 2024-11-25 11:33 ` Michal Hocko
  2024-11-25 16:06   ` Gregory Price
  2024-11-25 19:45   ` Junjie Fu
  1 sibling, 2 replies; 7+ messages in thread
From: Michal Hocko @ 2024-11-25 11:33 UTC (permalink / raw)
  To: Junjie Fu; +Cc: linux-mm, linux-kernel, akpm, dave.hansen

On Sun 24-11-24 03:09:35, Junjie Fu wrote:
> When handling a page fault caused by NUMA balancing (do_numa_page), it is
> necessary to decide whether to migrate the current page to another node or
> keep it on its current node. For pages with the MPOL_PREFERRED memory
> policy, it is sufficient to check whether the first node set in the
> nodemask is the same as the node where the page is currently located. If
> this is the case, the page should remain in its current state. Otherwise,
> migration to another node should be attempted.
> 
> Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
> preferred node for allocation. The kernel will try to allocate pages from
> this node first and fall back to nearby nodes if the preferred node is low
> on free memory. If the nodemask specifies more than one node ID, the first
> node in the mask will be selected as the preferred node."
> 
> Thus, if the node where the current page resides is not the first node in
> the nodemask, it is not the PREFERRED node, and memory migration can be
> attempted.
> 
> However, in the original code, the check only verifies whether the current
> node exists in the nodemask (which may or may not be the first node in the
> mask). This could lead to a scenario where, if the current node is not the
> first node in the nodemask, the code incorrectly decides not to attempt
> migration to other nodes.
> 
> This behavior is clearly incorrect. If the target node for migration and
> the page's current NUMA node are both within the nodemask but neither is
> the first node, they should be treated with the same priority, and
> migration attempts should proceed.

The code is clearly confusing, but is there any actual problem to be
solved? IIRC, although we do keep a nodemask for the MPOL_PREFERRED
policy, we do not allow more than a single node to be set there.
Have a look at mpol_new_preferred.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-25 11:33 ` Michal Hocko
@ 2024-11-25 16:06   ` Gregory Price
  2024-11-25 19:45   ` Junjie Fu
  1 sibling, 0 replies; 7+ messages in thread
From: Gregory Price @ 2024-11-25 16:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Junjie Fu, linux-mm, linux-kernel, akpm, dave.hansen

On Mon, Nov 25, 2024 at 12:33:52PM +0100, Michal Hocko wrote:
> On Sun 24-11-24 03:09:35, Junjie Fu wrote:
> > When handling a page fault caused by NUMA balancing (do_numa_page), it is
> > necessary to decide whether to migrate the current page to another node or
> > keep it on its current node. For pages with the MPOL_PREFERRED memory
> > policy, it is sufficient to check whether the first node set in the
> > nodemask is the same as the node where the page is currently located. If
> > this is the case, the page should remain in its current state. Otherwise,
> > migration to another node should be attempted.
> > 
> > Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
> > preferred node for allocation. The kernel will try to allocate pages from
> > this node first and fall back to nearby nodes if the preferred node is low
> > on free memory. If the nodemask specifies more than one node ID, the first
> > node in the mask will be selected as the preferred node."
> > 
> > Thus, if the node where the current page resides is not the first node in
> > the nodemask, it is not the PREFERRED node, and memory migration can be
> > attempted.
> > 
> > However, in the original code, the check only verifies whether the current
> > node exists in the nodemask (which may or may not be the first node in the
> > mask). This could lead to a scenario where, if the current node is not the
> > first node in the nodemask, the code incorrectly decides not to attempt
> > migration to other nodes.
> > 
> > This behavior is clearly incorrect. If the target node for migration and
> > the page's current NUMA node are both within the nodemask but neither is
> > the first node, they should be treated with the same priority, and
> > migration attempts should proceed.
> 
> The code is clearly confusing, but is there any actual problem to be
> solved? IIRC, although we do keep a nodemask for the MPOL_PREFERRED
> policy, we do not allow more than a single node to be set there.
> Have a look at mpol_new_preferred.
>

concur here - the proposed patch doesn't actually change any behavior
(or it shouldn't, at least).

Is there a migration error being observed that this patch fixes, or is
this just an `observational fix`?

~Gregory



* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-25 11:33 ` Michal Hocko
  2024-11-25 16:06   ` Gregory Price
@ 2024-11-25 19:45   ` Junjie Fu
  2024-11-25 20:18     ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Junjie Fu @ 2024-11-25 19:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm, akpm, dave.hansen, gourry

On November 25, 2024 at 19:33, Michal Hocko wrote:
> On Sun 24-11-24 03:09:35, Junjie Fu wrote:
>> When handling a page fault caused by NUMA balancing (do_numa_page), it is
>> necessary to decide whether to migrate the current page to another node or
>> keep it on its current node. For pages with the MPOL_PREFERRED memory
>> policy, it is sufficient to check whether the first node set in the
>> nodemask is the same as the node where the page is currently located. If
>> this is the case, the page should remain in its current state. Otherwise,
>> migration to another node should be attempted.
>>
>> Because the definition of MPOL_PREFERRED is as follows: "This mode sets the
>> preferred node for allocation. The kernel will try to allocate pages from
>> this node first and fall back to nearby nodes if the preferred node is low
>> on free memory. If the nodemask specifies more than one node ID, the first
>> node in the mask will be selected as the preferred node."
>>
>> Thus, if the node where the current page resides is not the first node in
>> the nodemask, it is not the PREFERRED node, and memory migration can be
>> attempted.
>>
>> However, in the original code, the check only verifies whether the current
>> node exists in the nodemask (which may or may not be the first node in the
>> mask). This could lead to a scenario where, if the current node is not the
>> first node in the nodemask, the code incorrectly decides not to attempt
>> migration to other nodes.
>>
>> This behavior is clearly incorrect. If the target node for migration and
>> the page's current NUMA node are both within the nodemask but neither is
>> the first node, they should be treated with the same priority, and
>> migration attempts should proceed.
> 
> The code is clearly confusing, but is there any actual problem to be
> solved? IIRC, although we do keep a nodemask for the MPOL_PREFERRED
> policy, we do not allow more than a single node to be set there.
> Have a look at mpol_new_preferred.
> 

I apologize for the oversight in my review of how the nodemask is set up 
for the MPOL_PREFERRED memory policy. After looking at the 
mpol_new_preferred function, I see that when the policy is created, only 
the first node of the user's nodemask is copied into the policy's 
nodemask, as shown in the following code:

static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
{
	if (nodes_empty(*nodes))
		return -EINVAL;

	nodes_clear(pol->nodes);
	node_set(first_node(*nodes), pol->nodes);	/* only the first node is set */
	return 0;
}

Due to my previous oversight, I mistakenly assumed that multiple nodes 
could remain set in pol->nodes. The original code is therefore correct. 
Thank you all for your responses.
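
For completeness, a quick user-space check would also make this visible
(hypothetical sketch, not something I have run here; it assumes nodes 1 and 2
are online, that libnuma's <numaif.h> is available, and that the program is
linked with -lnuma):

#include <numaif.h>
#include <stdio.h>

int main(void)
{
	unsigned long request = (1UL << 1) | (1UL << 2);	/* ask for nodes {1, 2} */
	unsigned long got = 0;
	int mode = 0;

	if (set_mempolicy(MPOL_PREFERRED, &request, 8 * sizeof(request)))
		perror("set_mempolicy");

	if (get_mempolicy(&mode, &got, 8 * sizeof(got), NULL, 0))
		perror("get_mempolicy");

	/* Expected, per mpol_new_preferred above: only bit 1 survives, i.e. 0x2. */
	printf("mode=%d nodemask=0x%lx\n", mode, got);
	return 0;
}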




* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-25 19:45   ` Junjie Fu
@ 2024-11-25 20:18     ` Michal Hocko
  2024-11-25 20:41       ` Junjie Fu
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2024-11-25 20:18 UTC (permalink / raw)
  To: Junjie Fu; +Cc: linux-kernel, linux-mm, akpm, dave.hansen, gourry

On Tue 26-11-24 03:45:01, Junjie Fu wrote:
[...]
> I apologize for the oversight when reviewing the code regarding the process
> of setting only the first node in the nodemask for the MPOL_PREFERRED memory
> policy. 

There is no need to apologize! Really, this code is far from
straightforward and it is not easy to get all the loose ends together.
What helps, though, is giving reviewers a clear problem statement. It
often helps to state whether the fix is based on code review or on a
real-life or even an artificial workload.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/mempolicy: Fix decision-making issues for memory migration during NUMA balancing
  2024-11-25 20:18     ` Michal Hocko
@ 2024-11-25 20:41       ` Junjie Fu
  0 siblings, 0 replies; 7+ messages in thread
From: Junjie Fu @ 2024-11-25 20:41 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, dave.hansen, gourry, linux-kernel, linux-mm, willy



On November 26, 2024 at 4:18, Michal Hocko wrote:
> On Tue 26-11-24 03:45:01, Junjie Fu wrote:
> [...]
>> I apologize for the oversight when reviewing the code regarding the process
>> of setting only the first node in the nodemask for the MPOL_PREFERRED memory
>> policy.
> 
> There is no need to apologize! Really, this code is far from
> straightforward and it is not easy to get all the loose ends together.
> What helps, though, is giving reviewers a clear problem statement. It
> often helps to state whether the fix is based on code review or on a
> real-life or even an artificial workload.

Thank you very much for your understanding and kind words. I truly 
appreciate your helpful suggestion, and I will make sure to provide more 
context and clarity for reviewers in future submissions to avoid any 
confusion.



