* [PATCH 0/2] handle memoryless nodes more appropriately
@ 2023-02-15 15:24 Qi Zheng
2023-02-15 15:24 ` [PATCH 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Qi Zheng @ 2023-02-15 15:24 UTC (permalink / raw)
To: akpm, vbabka, david, rppt, mhocko
Cc: willy, mgorman, osalvador, linux-mm, linux-kernel, Qi Zheng
Hi all,
Currently, in the process of initialization or offline memory, memoryless
nodes will still be built into the fallback list of itself or other nodes.
This is not what we expected, so this patch series removes memoryless
nodes from the fallback list entirely.
Comments and suggestions are welcome.
Thanks,
Qi
Qi Zheng (2):
mm: page_alloc: skip memoryless nodes entirely
mm: memory_hotplug: drop memoryless node from fallback lists
mm/memory_hotplug.c | 2 +-
mm/page_alloc.c | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/2] mm: page_alloc: skip memoryless nodes entirely 2023-02-15 15:24 [PATCH 0/2] handle memoryless nodes more appropriately Qi Zheng @ 2023-02-15 15:24 ` Qi Zheng 2023-02-15 15:24 ` [PATCH 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 2023-02-15 16:36 ` [PATCH 0/2] handle memoryless nodes more appropriately Michal Hocko 2 siblings, 0 replies; 9+ messages in thread From: Qi Zheng @ 2023-02-15 15:24 UTC (permalink / raw) To: akpm, vbabka, david, rppt, mhocko Cc: willy, mgorman, osalvador, linux-mm, linux-kernel, Qi Zheng In find_next_best_node(), We skipped the memoryless nodes when building the zonelists of other normal nodes (N_NORMAL), but did not skip the memoryless node itself when building the zonelist. This will cause it to be traversed at runtime. For example, say we have node0 and node1, node0 is memoryless node, then the fallback order of node0 and node1 as follows: [ 0.153005] Fallback order for Node 0: 0 1 [ 0.153564] Fallback order for Node 1: 1 After this patch, we skip memoryless node0 entirely, then the fallback order of node0 and node1 as follows: [ 0.155236] Fallback order for Node 0: 1 [ 0.155806] Fallback order for Node 1: 1 So it becomes completely invisible, which will reduce runtime overhead. Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- mm/page_alloc.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0745aedebb37..f0b17dd71bec 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6377,8 +6377,11 @@ int find_next_best_node(int node, nodemask_t *used_node_mask) int min_val = INT_MAX; int best_node = NUMA_NO_NODE; - /* Use the local node if we haven't already */ - if (!node_isset(node, *used_node_mask)) { + /* + * Use the local node if we haven't already. But for memoryless local + * node, we should skip it and fallback to other nodes. + */ + if (!node_isset(node, *used_node_mask) && node_state(node, N_MEMORY)) { node_set(node, *used_node_mask); return node; } -- 2.20.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-02-15 15:24 [PATCH 0/2] handle memoryless nodes more appropriately Qi Zheng 2023-02-15 15:24 ` [PATCH 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng @ 2023-02-15 15:24 ` Qi Zheng 2023-02-15 16:36 ` [PATCH 0/2] handle memoryless nodes more appropriately Michal Hocko 2 siblings, 0 replies; 9+ messages in thread From: Qi Zheng @ 2023-02-15 15:24 UTC (permalink / raw) To: akpm, vbabka, david, rppt, mhocko Cc: willy, mgorman, osalvador, linux-mm, linux-kernel, Qi Zheng In offline_pages(), if a node becomes memoryless, we will clear its N_MEMORY state by calling node_states_clear_node(). But we do this after rebuilding the zonelists by calling build_all_zonelists(), which will cause this memoryless node to still be in the fallback list of other nodes. This will incur some runtime overhead. To drop memoryless node from fallback lists in this case, just call node_states_clear_node() before calling build_all_zonelists(). Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> --- mm/memory_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index fd40f7e9f176..1a5e5e8f7e13 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1931,12 +1931,12 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min(); + node_states_clear_node(node, &arg); if (!populated_zone(zone)) { zone_pcp_reset(zone); build_all_zonelists(NULL); } - node_states_clear_node(node, &arg); if (arg.status_change_nid >= 0) { kcompactd_stop(node); kswapd_stop(node); -- 2.20.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-15 15:24 [PATCH 0/2] handle memoryless nodes more appropriately Qi Zheng 2023-02-15 15:24 ` [PATCH 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng 2023-02-15 15:24 ` [PATCH 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng @ 2023-02-15 16:36 ` Michal Hocko 2023-02-15 23:11 ` Qi Zheng 2 siblings, 1 reply; 9+ messages in thread From: Michal Hocko @ 2023-02-15 16:36 UTC (permalink / raw) To: Qi Zheng Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel On Wed 15-02-23 23:24:10, Qi Zheng wrote: > Hi all, > > Currently, in the process of initialization or offline memory, memoryless > nodes will still be built into the fallback list of itself or other nodes. > > This is not what we expected, so this patch series removes memoryless > nodes from the fallback list entirely. > > Comments and suggestions are welcome. This is a tricky area full of surprises and it is really easy to introduce new problems. What kind of problem/issue are you trying to solve/handle by these changes? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-15 16:36 ` [PATCH 0/2] handle memoryless nodes more appropriately Michal Hocko @ 2023-02-15 23:11 ` Qi Zheng 2023-02-16 7:51 ` Michal Hocko 0 siblings, 1 reply; 9+ messages in thread From: Qi Zheng @ 2023-02-15 23:11 UTC (permalink / raw) To: Michal Hocko Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel, Muchun Song On 2023/2/16 00:36, Michal Hocko wrote: > On Wed 15-02-23 23:24:10, Qi Zheng wrote: >> Hi all, >> >> Currently, in the process of initialization or offline memory, memoryless >> nodes will still be built into the fallback list of itself or other nodes. >> >> This is not what we expected, so this patch series removes memoryless >> nodes from the fallback list entirely. >> >> Comments and suggestions are welcome. Hi Michal, > > This is a tricky area full of surprises and it is really easy to Would you mind giving an example of a "new problem"? > introduce new problems. What kind of problem/issue are you trying to > solve/handle by these changes? IIUC, I think there are two reasons: Firstly, as mentioned in commit message, the memoryless node has no memory to allocate (If it can be allocated, it may also cause the panic I mentioned in [1]), so we should not continue to traverse it when allocating memory at runtime, which will have a certain overhead. Secondly, from the perspective of semantic correctness, why do we remove the memoryless node from the fallback list of other normal nodes (N_MEMORY), but not from its own fallback list (PATCH[1/2])? Why should an upcoming memoryless node continue exist in the fallback list of itself and other normal nodes (PATCH[2/2])? Please let me know if I missed something. [1] https://lore.kernel.org/lkml/20230212110305.93670-1-zhengqi.arch@bytedance.com/ Thanks, Qi > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-15 23:11 ` Qi Zheng @ 2023-02-16 7:51 ` Michal Hocko 2023-02-16 8:21 ` Qi Zheng 0 siblings, 1 reply; 9+ messages in thread From: Michal Hocko @ 2023-02-16 7:51 UTC (permalink / raw) To: Qi Zheng Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel, Muchun Song On Thu 16-02-23 07:11:19, Qi Zheng wrote: > > > On 2023/2/16 00:36, Michal Hocko wrote: > > On Wed 15-02-23 23:24:10, Qi Zheng wrote: > > > Hi all, > > > > > > Currently, in the process of initialization or offline memory, memoryless > > > nodes will still be built into the fallback list of itself or other nodes. > > > > > > This is not what we expected, so this patch series removes memoryless > > > nodes from the fallback list entirely. > > > > > > Comments and suggestions are welcome. > > Hi Michal, > > > > > This is a tricky area full of surprises and it is really easy to > > Would you mind giving an example of a "new problem"? The initialization is spread over several places and it is quite easy to introduce bugs because it is hard to review this area. Been there done that. Just look into the git log. > > introduce new problems. What kind of problem/issue are you trying to > > solve/handle by these changes? > > IIUC, I think there are two reasons: > > Firstly, as mentioned in commit message, the memoryless node has no > memory to allocate (If it can be allocated, it may also cause the panic > I mentioned in [1]), so we should not continue to traverse it when > allocating memory at runtime, which will have a certain overhead. Sure that is not the most optimal implementation but does this matter in practice? Can you observe any actual measurable performance penalty? Currently we are just sacrificing some tiny performance for a simplicity. > Secondly, from the perspective of semantic correctness, why do we remove > the memoryless node from the fallback list of other normal nodes > (N_MEMORY), but not from its own fallback list (PATCH[1/2])? Why should > an upcoming memoryless node continue exist in the fallback list of > itself and other normal nodes (PATCH[2/2])? I am not sure I follow. What is the semantic correctness issue? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-16 7:51 ` Michal Hocko @ 2023-02-16 8:21 ` Qi Zheng 2023-02-16 8:37 ` Michal Hocko 0 siblings, 1 reply; 9+ messages in thread From: Qi Zheng @ 2023-02-16 8:21 UTC (permalink / raw) To: Michal Hocko Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel, Muchun Song On 2023/2/16 15:51, Michal Hocko wrote: > On Thu 16-02-23 07:11:19, Qi Zheng wrote: >> >> >> On 2023/2/16 00:36, Michal Hocko wrote: >>> On Wed 15-02-23 23:24:10, Qi Zheng wrote: >>>> Hi all, >>>> >>>> Currently, in the process of initialization or offline memory, memoryless >>>> nodes will still be built into the fallback list of itself or other nodes. >>>> >>>> This is not what we expected, so this patch series removes memoryless >>>> nodes from the fallback list entirely. >>>> >>>> Comments and suggestions are welcome. >> >> Hi Michal, >> >>> >>> This is a tricky area full of surprises and it is really easy to >> >> Would you mind giving an example of a "new problem"? > > The initialization is spread over several places and it is quite easy to > introduce bugs because it is hard to review this area. Been there done > that. Just look into the git log. I understand your concern, but should we therefore reject all revisions to this? > >>> introduce new problems. What kind of problem/issue are you trying to >>> solve/handle by these changes? >> >> IIUC, I think there are two reasons: >> >> Firstly, as mentioned in commit message, the memoryless node has no >> memory to allocate (If it can be allocated, it may also cause the panic >> I mentioned in [1]), so we should not continue to traverse it when >> allocating memory at runtime, which will have a certain overhead. > > Sure that is not the most optimal implementation but does this matter in > practice? Can you observe any actual measurable performance penalty? No, and the original reason for noticing this place was the panic I mentioned in [1] (< NODE_MIN_SIZE). And if we had handled the memoryless node's zonelist correctly before, we wouldn't have had that panic at all. > Currently we are just sacrificing some tiny performance for a > simplicity. Hmm, I don't think my modification complicates the code. > >> Secondly, from the perspective of semantic correctness, why do we remove >> the memoryless node from the fallback list of other normal nodes >> (N_MEMORY), but not from its own fallback list (PATCH[1/2])? Why should >> an upcoming memoryless node continue exist in the fallback list of >> itself and other normal nodes (PATCH[2/2])? > > I am not sure I follow. What is the semantic correctness issue? Sorry for the ambiguity, what I meant was that memoryless nodes should never have been built into any fallback list, not just for performance optimizations. > -- Thanks, Qi ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-16 8:21 ` Qi Zheng @ 2023-02-16 8:37 ` Michal Hocko 2023-02-16 10:50 ` Qi Zheng 0 siblings, 1 reply; 9+ messages in thread From: Michal Hocko @ 2023-02-16 8:37 UTC (permalink / raw) To: Qi Zheng Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel, Muchun Song On Thu 16-02-23 16:21:54, Qi Zheng wrote: > > > On 2023/2/16 15:51, Michal Hocko wrote: > > On Thu 16-02-23 07:11:19, Qi Zheng wrote: > > > > > > > > > On 2023/2/16 00:36, Michal Hocko wrote: > > > > On Wed 15-02-23 23:24:10, Qi Zheng wrote: > > > > > Hi all, > > > > > > > > > > Currently, in the process of initialization or offline memory, memoryless > > > > > nodes will still be built into the fallback list of itself or other nodes. > > > > > > > > > > This is not what we expected, so this patch series removes memoryless > > > > > nodes from the fallback list entirely. > > > > > > > > > > Comments and suggestions are welcome. > > > > > > Hi Michal, > > > > > > > > > > > This is a tricky area full of surprises and it is really easy to > > > > > > Would you mind giving an example of a "new problem"? > > > > The initialization is spread over several places and it is quite easy to > > introduce bugs because it is hard to review this area. Been there done > > that. Just look into the git log. > > I understand your concern, but should we therefore reject all revisions > to this? No, but either somebode is willing to invest a non-trivial amount of time and unify the NUMA initialization code that is spread over arch specific code in different places or we should just focus on addressing bugs. > > > > introduce new problems. What kind of problem/issue are you trying to > > > > solve/handle by these changes? > > > > > > IIUC, I think there are two reasons: > > > > > > Firstly, as mentioned in commit message, the memoryless node has no > > > memory to allocate (If it can be allocated, it may also cause the panic > > > I mentioned in [1]), so we should not continue to traverse it when > > > allocating memory at runtime, which will have a certain overhead. > > > > Sure that is not the most optimal implementation but does this matter in > > practice? Can you observe any actual measurable performance penalty? > > No, and the original reason for noticing this place was the panic I > mentioned in [1] (< NODE_MIN_SIZE). And if we had handled the memoryless > node's zonelist correctly before, we wouldn't have had that panic at > all. Yes, this is another good example of how subtle the code is. Mike has posted a patch that simply drops the NODE_MIN_SIZE constrain and I believe that is the right thing to do at this stage. There is a non-zero risk of regression but at least we will be forced to fix the original problem properly or at least document is properly. > > Currently we are just sacrificing some tiny performance for a > > simplicity. > Hmm, I don't think my modification complicates the code. > > > > Secondly, from the perspective of semantic correctness, why do we remove > > > the memoryless node from the fallback list of other normal nodes > > > (N_MEMORY), but not from its own fallback list (PATCH[1/2])? Why should > > > an upcoming memoryless node continue exist in the fallback list of > > > itself and other normal nodes (PATCH[2/2])? > > > > I am not sure I follow. What is the semantic correctness issue? > > Sorry for the ambiguity, what I meant was that memoryless nodes should > never have been built into any fallback list, not just for performance > optimizations. Well, I am not 100% sure I agree with you here. The performance would be the only reason why to drop those nodes from zonelists. Other than that zonelists are a useful abstraction for the node distance ordering. Even if those nodes do not have any memory at all in principle there is no big difference from depleted nodes. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] handle memoryless nodes more appropriately 2023-02-16 8:37 ` Michal Hocko @ 2023-02-16 10:50 ` Qi Zheng 0 siblings, 0 replies; 9+ messages in thread From: Qi Zheng @ 2023-02-16 10:50 UTC (permalink / raw) To: Michal Hocko Cc: akpm, vbabka, david, rppt, willy, mgorman, osalvador, linux-mm, linux-kernel, Muchun Song On 2023/2/16 16:37, Michal Hocko wrote: > On Thu 16-02-23 16:21:54, Qi Zheng wrote: >> >> >> On 2023/2/16 15:51, Michal Hocko wrote: >>> On Thu 16-02-23 07:11:19, Qi Zheng wrote: >>>> >>>> >>>> On 2023/2/16 00:36, Michal Hocko wrote: >>>>> On Wed 15-02-23 23:24:10, Qi Zheng wrote: >>>>>> Hi all, >>>>>> >>>>>> Currently, in the process of initialization or offline memory, memoryless >>>>>> nodes will still be built into the fallback list of itself or other nodes. >>>>>> >>>>>> This is not what we expected, so this patch series removes memoryless >>>>>> nodes from the fallback list entirely. >>>>>> >>>>>> Comments and suggestions are welcome. >>>> >>>> Hi Michal, >>>> >>>>> >>>>> This is a tricky area full of surprises and it is really easy to >>>> >>>> Would you mind giving an example of a "new problem"? >>> >>> The initialization is spread over several places and it is quite easy to >>> introduce bugs because it is hard to review this area. Been there done >>> that. Just look into the git log. >> >> I understand your concern, but should we therefore reject all revisions >> to this? > > No, but either somebode is willing to invest a non-trivial amount of > time and unify the NUMA initialization code that is spread over arch > specific code in different places or we should just focus on addressing > bugs. > >>>>> introduce new problems. What kind of problem/issue are you trying to >>>>> solve/handle by these changes? >>>> >>>> IIUC, I think there are two reasons: >>>> >>>> Firstly, as mentioned in commit message, the memoryless node has no >>>> memory to allocate (If it can be allocated, it may also cause the panic >>>> I mentioned in [1]), so we should not continue to traverse it when >>>> allocating memory at runtime, which will have a certain overhead. >>> >>> Sure that is not the most optimal implementation but does this matter in >>> practice? Can you observe any actual measurable performance penalty? >> >> No, and the original reason for noticing this place was the panic I >> mentioned in [1] (< NODE_MIN_SIZE). And if we had handled the memoryless >> node's zonelist correctly before, we wouldn't have had that panic at >> all. > > Yes, this is another good example of how subtle the code is. Mike has > posted a patch that simply drops the NODE_MIN_SIZE constrain and I > believe that is the right thing to do at this stage. There is a non-zero > risk of regression but at least we will be forced to fix the original > problem properly or at least document is properly. > >>> Currently we are just sacrificing some tiny performance for a >>> simplicity. >> Hmm, I don't think my modification complicates the code. >> >>>> Secondly, from the perspective of semantic correctness, why do we remove >>>> the memoryless node from the fallback list of other normal nodes >>>> (N_MEMORY), but not from its own fallback list (PATCH[1/2])? Why should >>>> an upcoming memoryless node continue exist in the fallback list of >>>> itself and other normal nodes (PATCH[2/2])? >>> >>> I am not sure I follow. What is the semantic correctness issue? >> >> Sorry for the ambiguity, what I meant was that memoryless nodes should >> never have been built into any fallback list, not just for performance >> optimizations. > > Well, I am not 100% sure I agree with you here. The performance would be > the only reason why to drop those nodes from zonelists. Other than that > zonelists are a useful abstraction for the node distance ordering. Even > if those nodes do not have any memory at all in principle there is no > big difference from depleted nodes. I see what you mean, no more code for no more bugs (in cases where is no obvious gain). But I still feel that the current implementation is rather weird (deleted some, and kept some), and my changes are actually very small. Anyway, let's wait for other people's opinions. :) -- Thanks, Qi ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-02-16 10:51 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-02-15 15:24 [PATCH 0/2] handle memoryless nodes more appropriately Qi Zheng 2023-02-15 15:24 ` [PATCH 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng 2023-02-15 15:24 ` [PATCH 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 2023-02-15 16:36 ` [PATCH 0/2] handle memoryless nodes more appropriately Michal Hocko 2023-02-15 23:11 ` Qi Zheng 2023-02-16 7:51 ` Michal Hocko 2023-02-16 8:21 ` Qi Zheng 2023-02-16 8:37 ` Michal Hocko 2023-02-16 10:50 ` Qi Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox