* [PATCH 0/4] sched fixes
@ 2023-08-10 16:24 Yury Norov
2023-08-10 16:24 ` [PATCH 1/4] numa: generalize numa_map_to_online_node() Yury Norov
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Yury Norov @ 2023-08-10 16:24 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Yury Norov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
Jacob Keller, Jakub Kicinski, Juri Lelli, Mel Gorman,
Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Andy Shevchenko, Rasmus Villemoes
Fixes for recently introduced sched_numa_find_nth_cpu(), and minor
improvements in sched/fair.
Yury Norov (4):
numa: generalize numa_map_to_online_node()
sched/fair: fix opencoded numa_nearest_node()
sched: fix sched_numa_find_nth_cpu() in CPU-less case
sched: fix sched_numa_find_nth_cpu() in non-NUMA case
include/linux/numa.h | 7 +++++--
include/linux/topology.h | 2 +-
kernel/sched/fair.c | 14 +-------------
kernel/sched/topology.c | 6 +++++-
mm/mempolicy.c | 18 +++++++++++-------
5 files changed, 23 insertions(+), 24 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/4] numa: generalize numa_map_to_online_node()
2023-08-10 16:24 [PATCH 0/4] sched fixes Yury Norov
@ 2023-08-10 16:24 ` Yury Norov
2023-08-11 8:48 ` Andy Shevchenko
2023-08-10 16:24 ` [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node() Yury Norov
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Yury Norov @ 2023-08-10 16:24 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Yury Norov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
Jacob Keller, Jakub Kicinski, Juri Lelli, Mel Gorman,
Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Andy Shevchenko, Rasmus Villemoes
The function is in fact searches for the nearest node for a given one,
based on a N_ONLINE state. This is a common pattern to search for a
nearest node.
This patch converts numa_map_to_online_node(node) to
numa_nearest_node(node, state), so that others won't need opencode the
logic. The following patches apply it where applicable.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/numa.h | 7 +++++--
mm/mempolicy.c | 18 +++++++++++-------
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/include/linux/numa.h b/include/linux/numa.h
index 59df211d051f..fb30a42f0700 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -25,7 +25,7 @@
#include <asm/sparsemem.h>
/* Generic implementation available */
-int numa_map_to_online_node(int node);
+int numa_nearest_node(int node, unsigned int state);
#ifndef memory_add_physaddr_to_nid
static inline int memory_add_physaddr_to_nid(u64 start)
@@ -44,10 +44,11 @@ static inline int phys_to_target_node(u64 start)
}
#endif
#else /* !CONFIG_NUMA */
-static inline int numa_map_to_online_node(int node)
+static inline int numa_nearest_node(int node, unsigned int state)
{
return NUMA_NO_NODE;
}
+
static inline int memory_add_physaddr_to_nid(u64 start)
{
return 0;
@@ -58,6 +59,8 @@ static inline int phys_to_target_node(u64 start)
}
#endif
+#define numa_map_to_online_node(node) numa_nearest_node(node, N_ONLINE)
+
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
extern const struct attribute_group arch_node_dev_group;
#endif
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c53f8beeb507..db74dd957d2c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -131,22 +131,26 @@ static struct mempolicy default_policy = {
static struct mempolicy preferred_node_policy[MAX_NUMNODES];
/**
- * numa_map_to_online_node - Find closest online node
+ * numa_nearest_node - Find nearest node by state
* @node: Node id to start the search
+ * @state: State to filter the search
*
- * Lookup the next closest node by distance if @nid is not online.
+ * Lookup the closest node by distance if @nid is not in state.
*
- * Return: this @node if it is online, otherwise the closest node by distance
+ * Return: this @node if it is in state, otherwise the closest node by distance
*/
-int numa_map_to_online_node(int node)
+int numa_nearest_node(int node, unsigned int state)
{
int min_dist = INT_MAX, dist, n, min_node;
- if (node == NUMA_NO_NODE || node_online(node))
+ if (state >= NR_NODE_STATES)
+ return -EINVAL;
+
+ if (node == NUMA_NO_NODE || node_state(node, state))
return node;
min_node = node;
- for_each_online_node(n) {
+ for_each_node_state(n, state) {
dist = node_distance(node, n);
if (dist < min_dist) {
min_dist = dist;
@@ -156,7 +160,7 @@ int numa_map_to_online_node(int node)
return min_node;
}
-EXPORT_SYMBOL_GPL(numa_map_to_online_node);
+EXPORT_SYMBOL_GPL(numa_nearest_node);
struct mempolicy *get_task_policy(struct task_struct *p)
{
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node()
2023-08-10 16:24 [PATCH 0/4] sched fixes Yury Norov
2023-08-10 16:24 ` [PATCH 1/4] numa: generalize numa_map_to_online_node() Yury Norov
@ 2023-08-10 16:24 ` Yury Norov
2023-08-11 8:50 ` Andy Shevchenko
2023-08-10 16:24 ` [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case Yury Norov
2023-08-10 16:24 ` [PATCH 4/4] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
3 siblings, 1 reply; 8+ messages in thread
From: Yury Norov @ 2023-08-10 16:24 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Yury Norov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
Jacob Keller, Jakub Kicinski, Juri Lelli, Mel Gorman,
Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Andy Shevchenko, Rasmus Villemoes
task_numa_placement() searches for a nearest node by calling
for_each_node_state(). Now that we have numa_nearest_node(), switch to
using it.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
kernel/sched/fair.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b3e25be58e2b..e7b7cf87937b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2645,19 +2645,7 @@ static void task_numa_placement(struct task_struct *p)
}
/* Cannot migrate task to CPU-less node */
- if (max_nid != NUMA_NO_NODE && !node_state(max_nid, N_CPU)) {
- int near_nid = max_nid;
- int distance, near_distance = INT_MAX;
-
- for_each_node_state(nid, N_CPU) {
- distance = node_distance(max_nid, nid);
- if (distance < near_distance) {
- near_nid = nid;
- near_distance = distance;
- }
- }
- max_nid = near_nid;
- }
+ max_nid = numa_nearest_node(max_nid, N_CPU);
if (ng) {
numa_group_count_active_nodes(ng);
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case
2023-08-10 16:24 [PATCH 0/4] sched fixes Yury Norov
2023-08-10 16:24 ` [PATCH 1/4] numa: generalize numa_map_to_online_node() Yury Norov
2023-08-10 16:24 ` [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node() Yury Norov
@ 2023-08-10 16:24 ` Yury Norov
2023-08-14 8:43 ` Yicong Yang
2023-08-10 16:24 ` [PATCH 4/4] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
3 siblings, 1 reply; 8+ messages in thread
From: Yury Norov @ 2023-08-10 16:24 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Yury Norov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
Jacob Keller, Jakub Kicinski, Juri Lelli, Mel Gorman,
Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Andy Shevchenko, Rasmus Villemoes, Guenter Roeck
When the node provided by user is CPU-less, corresponding record in
sched_domains_numa_masks is not set. Trying to dereference it in the
following code leads to kernel crash.
To avoid it, start searching from the nearest node with CPUs.
Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Reported-by: Yicong Yang <yangyicong@hisilicon.com>
Closes: https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/ZMHSNQfv39HN068m@yury-ThinkPad/T/#mf6431cb0b7f6f05193c41adeee444bc95bf2b1c4
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
This has been discovered and fixed by Yicong Yang:
https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/
When discovering Guenter's failure report for sparc64, I found it's due to
the same problem. And while fixing, I found an opportunity to generalize
nearest NUMA node search and avoid code duplication.
Yicong, if you like this approach, please feel free to add your co-developed-by
or any appropriate tags.
kernel/sched/topology.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d3a3b2646ec4..66b387172b6f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2113,10 +2113,14 @@ static int hop_cmp(const void *a, const void *b)
*/
int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
{
- struct __cmp_key k = { .cpus = cpus, .node = node, .cpu = cpu };
+ struct __cmp_key k = { .cpus = cpus, .cpu = cpu };
struct cpumask ***hop_masks;
int hop, ret = nr_cpu_ids;
+ /* CPU-less node entries are uninitialized in sched_domains_numa_masks */
+ node = numa_nearest_node(node, N_CPU);
+ k.node = node;
+
rcu_read_lock();
k.masks = rcu_dereference(sched_domains_numa_masks);
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 4/4] sched: fix sched_numa_find_nth_cpu() in non-NUMA case
2023-08-10 16:24 [PATCH 0/4] sched fixes Yury Norov
` (2 preceding siblings ...)
2023-08-10 16:24 ` [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case Yury Norov
@ 2023-08-10 16:24 ` Yury Norov
3 siblings, 0 replies; 8+ messages in thread
From: Yury Norov @ 2023-08-10 16:24 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Yury Norov, Ingo Molnar, Peter Zijlstra, Andrew Morton,
Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
Jacob Keller, Jakub Kicinski, Juri Lelli, Mel Gorman,
Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Andy Shevchenko, Rasmus Villemoes
When CONFIG_NUMA is enabled, sched_numa_find_nth_cpu() searches for a
CPU in sched_domains_numa_masks. The masks includes only online CPUs,
so effectively offline CPUs are skipped.
When CONFIG_NUMA is disabled, the fallback function should be consistent.
Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/topology.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/topology.h b/include/linux/topology.h
index fea32377f7c7..52f5850730b3 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -251,7 +251,7 @@ extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int
#else
static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
{
- return cpumask_nth(cpu, cpus);
+ return cpumask_nth_and(cpu, cpus, cpu_online_mask);
}
static inline const struct cpumask *
--
2.39.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/4] numa: generalize numa_map_to_online_node()
2023-08-10 16:24 ` [PATCH 1/4] numa: generalize numa_map_to_online_node() Yury Norov
@ 2023-08-11 8:48 ` Andy Shevchenko
0 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-11 8:48 UTC (permalink / raw)
To: Yury Norov
Cc: linux-kernel, linux-mm, Ingo Molnar, Peter Zijlstra,
Andrew Morton, Ben Segall, Daniel Bristot de Oliveira,
Dietmar Eggemann, Jacob Keller, Jakub Kicinski, Juri Lelli,
Mel Gorman, Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Rasmus Villemoes
On Thu, Aug 10, 2023 at 09:24:39AM -0700, Yury Norov wrote:
> The function is in fact searches for the nearest node for a given one,
> based on a N_ONLINE state. This is a common pattern to search for a
> nearest node.
>
> This patch converts numa_map_to_online_node(node) to
> numa_nearest_node(node, state), so that others won't need opencode the
> logic.
> The following patches apply it where applicable.
Doesn't sound like should be part of the commit message.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node()
2023-08-10 16:24 ` [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node() Yury Norov
@ 2023-08-11 8:50 ` Andy Shevchenko
0 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-11 8:50 UTC (permalink / raw)
To: Yury Norov
Cc: linux-kernel, linux-mm, Ingo Molnar, Peter Zijlstra,
Andrew Morton, Ben Segall, Daniel Bristot de Oliveira,
Dietmar Eggemann, Jacob Keller, Jakub Kicinski, Juri Lelli,
Mel Gorman, Peter Lafreniere, Steven Rostedt, Tariq Toukan,
Valentin Schneider, Vincent Guittot, shiju.jose,
jonathan.cameron, prime.zeng, linuxarm, yangyicong,
Rasmus Villemoes
On Thu, Aug 10, 2023 at 09:24:40AM -0700, Yury Norov wrote:
> task_numa_placement() searches for a nearest node by calling
> for_each_node_state(). Now that we have numa_nearest_node(), switch to
> using it.
For better looking you may rewrap the above as
---8<---
task_numa_placement() searches for a nearest node by calling
for_each_node_state(). Now that we have numa_nearest_node(),
switch to using
---8<---
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case
2023-08-10 16:24 ` [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case Yury Norov
@ 2023-08-14 8:43 ` Yicong Yang
0 siblings, 0 replies; 8+ messages in thread
From: Yicong Yang @ 2023-08-14 8:43 UTC (permalink / raw)
To: Yury Norov, linux-kernel, linux-mm
Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, Ben Segall,
Daniel Bristot de Oliveira, Dietmar Eggemann, Jacob Keller,
Jakub Kicinski, Juri Lelli, Mel Gorman, Peter Lafreniere,
Steven Rostedt, Tariq Toukan, Valentin Schneider,
Vincent Guittot, shiju.jose, jonathan.cameron, prime.zeng,
yangyicong, Andy Shevchenko, Rasmus Villemoes, Guenter Roeck
Hi Yury,
On 2023/8/11 0:24, Yury Norov wrote:
> When the node provided by user is CPU-less, corresponding record in
> sched_domains_numa_masks is not set. Trying to dereference it in the
> following code leads to kernel crash.
>
> To avoid it, start searching from the nearest node with CPUs.
>
> Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
> Reported-by: Yicong Yang <yangyicong@hisilicon.com>
> Closes: https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/lkml/ZMHSNQfv39HN068m@yury-ThinkPad/T/#mf6431cb0b7f6f05193c41adeee444bc95bf2b1c4
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>
> This has been discovered and fixed by Yicong Yang:
>
> https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/
>
> When discovering Guenter's failure report for sparc64, I found it's due to
> the same problem. And while fixing, I found an opportunity to generalize
> nearest NUMA node search and avoid code duplication.
>
> Yicong, if you like this approach, please feel free to add your co-developed-by
> or any appropriate tags.
>
Looks fine to me. One nit below.
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
> kernel/sched/topology.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index d3a3b2646ec4..66b387172b6f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2113,10 +2113,14 @@ static int hop_cmp(const void *a, const void *b)
> */
> int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> {
> - struct __cmp_key k = { .cpus = cpus, .node = node, .cpu = cpu };
> + struct __cmp_key k = { .cpus = cpus, .cpu = cpu };
> struct cpumask ***hop_masks;
> int hop, ret = nr_cpu_ids;
>
> + /* CPU-less node entries are uninitialized in sched_domains_numa_masks */
> + node = numa_nearest_node(node, N_CPU);
> + k.node = node;
> +
We may also have problem if node == NUMA_NO_NODE, is it better to mention this
in the function comment or check it before we going on? Currently this function
is only used in cpumask_local_spread() and the caller has already checked it, but
considering this is an export function so somebody may use it directly.
I wondering whether we should put this block within the protection of rcu_read_lock()
for some issues like hotplug or not. Is it possible if @node become CPU-less subsequently?
> rcu_read_lock();
>
> k.masks = rcu_dereference(sched_domains_numa_masks);
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-14 8:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-10 16:24 [PATCH 0/4] sched fixes Yury Norov
2023-08-10 16:24 ` [PATCH 1/4] numa: generalize numa_map_to_online_node() Yury Norov
2023-08-11 8:48 ` Andy Shevchenko
2023-08-10 16:24 ` [PATCH 2/4] sched/fair: fix opencoded numa_nearest_node() Yury Norov
2023-08-11 8:50 ` Andy Shevchenko
2023-08-10 16:24 ` [PATCH 3/4] sched: fix sched_numa_find_nth_cpu() in CPU-less case Yury Norov
2023-08-14 8:43 ` Yicong Yang
2023-08-10 16:24 ` [PATCH 4/4] sched: fix sched_numa_find_nth_cpu() in non-NUMA case Yury Norov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox