* [PATCH] mm/fake-numa: fix under-allocation detection in uniform split
@ 2026-04-17 11:41 Sang-Heon Jeon
2026-04-17 12:49 ` Sang-Heon Jeon
0 siblings, 1 reply; 2+ messages in thread
From: Sang-Heon Jeon @ 2026-04-17 11:41 UTC (permalink / raw)
To: akpm, rppt, djbw, mingo
Cc: linux-mm, Sang-Heon Jeon, Donghyeon Lee, Munhui Chae
When splitting NUMA nodes uniformly, split_nodes_size_interleave_uniform()
returns the next absolute node ID, not the number of nodes created.

The existing under-allocation check compares the next absolute node ID
(ret) against the requested count (n), which only works when nid starts
at 0.

For example, on a system with 2 physical NUMA nodes (node 0: 2GB, node
1: 128MB) and numa=fake=8U, 8 fake nodes are successfully created from
node 0 and split_nodes_size_interleave_uniform() returns 8. For node 1,
fake node IDs start at 8, but only 4 fake nodes are created because
FAKE_NODE_MIN_SIZE is currently 32MB, and
split_nodes_size_interleave_uniform() returns 12. With the existing
check, "ret < n" (12 < 8) is false, so the under-allocation goes
undetected.

Fix the check to compare the number of nodes actually created
(ret - nid) against the requested count (n).

Also, fix the outdated comment to match the actual return value.

Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reported-by: Donghyeon Lee <asd142513@gmail.com>
Reported-by: Munhui Chae <mochae@student.42seoul.kr>
Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability") # 4.19
---
Changes from RFC v1 [1]
- Merge the patchset into a single patch.
- Change base from linux-next to mm-unstable.

Changes from RFC v2 [2]
- Fix the error message to use the number of created nodes instead of
  the returned node ID.
- Define an nr_created variable to explicitly show the number of created
  nodes.

[1] https://lore.kernel.org/all/20260413154438.396031-1-ekffu200098@gmail.com/
[2] https://lore.kernel.org/all/20260416102558.575210-1-ekffu200098@gmail.com/
---
QEMU-based test results for the commit message scenario.
1) AS-IS (before fix)
[ 0.001878] NUMA: Node 0 [mem 0x00001000-0x0009ffff] + [mem 0x00100000-0x7fffffff] ]
[ 0.001881] Fake node size 255MB too small, increasing to 256MB
[ 0.001882] Faking node 0 at [mem 0x0000000000001000-0x0000000010000fff] (256MB)
[ 0.001883] Faking node 1 at [mem 0x0000000010001000-0x0000000020000fff] (256MB)
[ 0.001883] Faking node 2 at [mem 0x0000000020001000-0x0000000030000fff] (256MB)
[ 0.001884] Faking node 3 at [mem 0x0000000030001000-0x0000000040000fff] (256MB)
[ 0.001884] Faking node 4 at [mem 0x0000000040001000-0x0000000050000fff] (256MB)
[ 0.001884] Faking node 5 at [mem 0x0000000050001000-0x0000000060000fff] (256MB)
[ 0.001885] Faking node 6 at [mem 0x0000000060001000-0x0000000070000fff] (256MB)
[ 0.001885] Faking node 7 at [mem 0x0000000070001000-0x000000007fffffff] (255MB)
[ 0.001885] Fake node size 15MB too small, increasing to 32MB
[ 0.001886] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
[ 0.001886] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
[ 0.001887] Faking node 10 at [mem 0x0000000084000000-0x0000000087fdcfff] (63MB)
[ 0.001924] NODE_DATA(0) allocated [mem 0x0fffd6c0-0x10000fff]
[ 0.019852] NODE_DATA(1) allocated [mem 0x1fffd6c0-0x20000fff]
[ 0.022458] NODE_DATA(2) allocated [mem 0x2dffc6c0-0x2dffffff]
[ 0.023293] NODE_DATA(3) allocated [mem 0x3fffd6c0-0x40000fff]
[ 0.028522] NODE_DATA(4) allocated [mem 0x4fffd6c0-0x50000fff]
[ 0.032397] NODE_DATA(5) allocated [mem 0x5fffd6c0-0x60000fff]
[ 0.036552] NODE_DATA(6) allocated [mem 0x6fffd6c0-0x70000fff]
[ 0.038746] NODE_DATA(7) allocated [mem 0x7fffc6c0-0x7fffffff]
[ 0.040286] NODE_DATA(8) allocated [mem 0x81ffc6c0-0x81ffffff]
[ 0.041517] NODE_DATA(9) allocated [mem 0x83ffc6c0-0x83ffffff]
[ 0.043678] NODE_DATA(10) allocated [mem 0x87fd86c0-0x87fdbfff]
2) TO-BE (after fix)
[ 0.001858] NUMA: Node 0 [mem 0x00001000-0x0009ffff] + [mem 0x00100000-0x7fffffff] ]
[ 0.001860] Fake node size 255MB too small, increasing to 256MB
[ 0.001861] Faking node 0 at [mem 0x0000000000001000-0x0000000010000fff] (256MB)
[ 0.001861] Faking node 1 at [mem 0x0000000010001000-0x0000000020000fff] (256MB)
[ 0.001862] Faking node 2 at [mem 0x0000000020001000-0x0000000030000fff] (256MB)
[ 0.001862] Faking node 3 at [mem 0x0000000030001000-0x0000000040000fff] (256MB)
[ 0.001863] Faking node 4 at [mem 0x0000000040001000-0x0000000050000fff] (256MB)
[ 0.001863] Faking node 5 at [mem 0x0000000050001000-0x0000000060000fff] (256MB)
[ 0.001863] Faking node 6 at [mem 0x0000000060001000-0x0000000070000fff] (256MB)
[ 0.001864] Faking node 7 at [mem 0x0000000070001000-0x000000007fffffff] (255MB)
[ 0.001864] Fake node size 15MB too small, increasing to 32MB
[ 0.001864] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
[ 0.001865] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
[ 0.001865] Faking node 10 at [mem 0x0000000084000000-0x0000000087fdcfff] (63MB)
[ 0.001866] numa_emulation: phys: 1 only got 3 of 8 nodes, failing
[ 0.001867] NODE_DATA(0) allocated [mem 0x7fffc6c0-0x7fffffff]
[ 0.001940] NODE_DATA(1) allocated [mem 0x87fd96c0-0x87fdcfff]
Donghyeon also tested another scenario. [1]
[1] https://lore.kernel.org/all/CAFPTC5e1OLpHa3HqwhtSPjS_PTQz+iG=ovM2cZ=VnOZ_5z7oxg@mail.gmail.com/
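
For readers who want to see the comparison in isolation, here is a
minimal user-space sketch of the old and new checks. It is not the kernel
code: fake_split() is a hypothetical stand-in that only mimics the return
convention described above (next absolute node ID), and the counts are
taken from the QEMU log (8 fake nodes from physical node 0, then 3 from
physical node 1).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for split_nodes_size_interleave_uniform():
 * pretends to create `created` fake nodes starting at `nid` and returns
 * the next absolute node ID.
 */
static int fake_split(int nid, int created)
{
	return nid + created;
}

/* Old check: compares the next absolute node ID with the request. */
static bool old_check_fails(int ret, unsigned long n)
{
	return (unsigned long)ret < n;
}

/* New check: compares the number of nodes actually created. */
static bool new_check_fails(int ret, int nid, unsigned long n)
{
	int nr_created = ret - nid;

	return (unsigned long)nr_created < n;
}
```

With n = 8, physical node 0 gives ret = fake_split(0, 8) = 8 and neither
check fires. Physical node 1 then starts at nid = 8 and gives
ret = fake_split(8, 3) = 11: the old check evaluates 11 < 8 and misses
the shortfall, while the new check evaluates 3 < 8 and reports it.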
---
mm/numa_emulation.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 703c8fa05048..bc2f163e7c45 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -214,7 +214,7 @@ static u64 uniform_size(u64 max_addr, u64 base, u64 hole, int nr_nodes)
* Sets up fake nodes of `size' interleaved over physical nodes ranging from
* `addr' to `max_addr'.
*
- * Returns zero on success or negative on error.
+ * Returns absolute node ID on success or negative on error.
*/
static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
struct numa_meminfo *pi,
@@ -398,7 +398,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
*/
if (strchr(emu_cmdline, 'U')) {
unsigned long n;
- int nid = 0;
+ int nid = 0, nr_created;
n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
ret = -1;
@@ -416,9 +416,11 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
n, &pi.blk[0], nid);
if (ret < 0)
break;
- if (ret < n) {
+
+ nr_created = ret - nid;
+ if (nr_created < n) {
pr_info("%s: phys: %d only got %d of %ld nodes, failing\n",
- __func__, i, ret, n);
+ __func__, i, nr_created, n);
ret = -1;
break;
}
--
2.43.0
* Re: [PATCH] mm/fake-numa: fix under-allocation detection in uniform split
2026-04-17 11:41 [PATCH] mm/fake-numa: fix under-allocation detection in uniform split Sang-Heon Jeon
@ 2026-04-17 12:49 ` Sang-Heon Jeon
0 siblings, 0 replies; 2+ messages in thread
From: Sang-Heon Jeon @ 2026-04-17 12:49 UTC (permalink / raw)
To: akpm, rppt, djbw, mingo; +Cc: linux-mm, Donghyeon Lee, Munhui Chae
On Fri, Apr 17, 2026 at 8:41 PM Sang-Heon Jeon <ekffu200098@gmail.com> wrote:
>
> When splitting NUMA nodes uniformly, split_nodes_size_interleave_uniform()
> returns the next absolute node ID, not the number of nodes created.
>
> The existing under-allocation check compares the next absolute node ID
> (ret) against the requested count (n), which only works when nid starts
> at 0.
>
> For example, on a system with 2 physical NUMA nodes (node 0: 2GB, node
> 1: 128MB) and numa=fake=8U, 8 fake nodes are successfully created from
> node 0 and split_nodes_size_interleave_uniform() returns 8. For node 1,
> fake node IDs start at 8, but only 4 fake nodes are created because
> FAKE_NODE_MIN_SIZE is currently 32MB, and
> split_nodes_size_interleave_uniform() returns 12. With the existing
> check, "ret < n" (12 < 8) is false, so the under-allocation goes
> undetected.
>
> Fix the check to compare the number of nodes actually created
> (ret - nid) against the requested count (n).
>
> Also, fix the outdated comment to match the actual return value.
>
> Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
> Reported-by: Donghyeon Lee <asd142513@gmail.com>
> Reported-by: Munhui Chae <mochae@student.42seoul.kr>
> Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability") # 4.19
> ---
> Changes from RFC v1 [1]
> - Merge the patchset into a single patch.
> - Change base from linux-next to mm-unstable.
>
> Changes from RFC v2 [2]
> - Fix the error message to use the number of created nodes instead of
>   the returned node ID.
> - Define an nr_created variable to explicitly show the number of created
>   nodes.
>
> [1] https://lore.kernel.org/all/20260413154438.396031-1-ekffu200098@gmail.com/
> [2] https://lore.kernel.org/all/20260416102558.575210-1-ekffu200098@gmail.com/
> ---
> QEMU-based test results for the commit message scenario.
>
> 1) AS-IS (before fix)
> [ 0.001878] NUMA: Node 0 [mem 0x00001000-0x0009ffff] + [mem 0x00100000-0x7fffffff] ]
> [ 0.001881] Fake node size 255MB too small, increasing to 256MB
> [ 0.001882] Faking node 0 at [mem 0x0000000000001000-0x0000000010000fff] (256MB)
> [ 0.001883] Faking node 1 at [mem 0x0000000010001000-0x0000000020000fff] (256MB)
> [ 0.001883] Faking node 2 at [mem 0x0000000020001000-0x0000000030000fff] (256MB)
> [ 0.001884] Faking node 3 at [mem 0x0000000030001000-0x0000000040000fff] (256MB)
> [ 0.001884] Faking node 4 at [mem 0x0000000040001000-0x0000000050000fff] (256MB)
> [ 0.001884] Faking node 5 at [mem 0x0000000050001000-0x0000000060000fff] (256MB)
> [ 0.001885] Faking node 6 at [mem 0x0000000060001000-0x0000000070000fff] (256MB)
> [ 0.001885] Faking node 7 at [mem 0x0000000070001000-0x000000007fffffff] (255MB)
> [ 0.001885] Fake node size 15MB too small, increasing to 32MB
> [ 0.001886] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
> [ 0.001886] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
> [ 0.001887] Faking node 10 at [mem 0x0000000084000000-0x0000000087fdcfff] (63MB)
> [ 0.001924] NODE_DATA(0) allocated [mem 0x0fffd6c0-0x10000fff]
> [ 0.019852] NODE_DATA(1) allocated [mem 0x1fffd6c0-0x20000fff]
> [ 0.022458] NODE_DATA(2) allocated [mem 0x2dffc6c0-0x2dffffff]
> [ 0.023293] NODE_DATA(3) allocated [mem 0x3fffd6c0-0x40000fff]
> [ 0.028522] NODE_DATA(4) allocated [mem 0x4fffd6c0-0x50000fff]
> [ 0.032397] NODE_DATA(5) allocated [mem 0x5fffd6c0-0x60000fff]
> [ 0.036552] NODE_DATA(6) allocated [mem 0x6fffd6c0-0x70000fff]
> [ 0.038746] NODE_DATA(7) allocated [mem 0x7fffc6c0-0x7fffffff]
> [ 0.040286] NODE_DATA(8) allocated [mem 0x81ffc6c0-0x81ffffff]
> [ 0.041517] NODE_DATA(9) allocated [mem 0x83ffc6c0-0x83ffffff]
> [ 0.043678] NODE_DATA(10) allocated [mem 0x87fd86c0-0x87fdbfff]
>
> 2) TO-BE (after fix)
> [ 0.001858] NUMA: Node 0 [mem 0x00001000-0x0009ffff] + [mem 0x00100000-0x7fffffff] ]
> [ 0.001860] Fake node size 255MB too small, increasing to 256MB
> [ 0.001861] Faking node 0 at [mem 0x0000000000001000-0x0000000010000fff] (256MB)
> [ 0.001861] Faking node 1 at [mem 0x0000000010001000-0x0000000020000fff] (256MB)
> [ 0.001862] Faking node 2 at [mem 0x0000000020001000-0x0000000030000fff] (256MB)
> [ 0.001862] Faking node 3 at [mem 0x0000000030001000-0x0000000040000fff] (256MB)
> [ 0.001863] Faking node 4 at [mem 0x0000000040001000-0x0000000050000fff] (256MB)
> [ 0.001863] Faking node 5 at [mem 0x0000000050001000-0x0000000060000fff] (256MB)
> [ 0.001863] Faking node 6 at [mem 0x0000000060001000-0x0000000070000fff] (256MB)
> [ 0.001864] Faking node 7 at [mem 0x0000000070001000-0x000000007fffffff] (255MB)
> [ 0.001864] Fake node size 15MB too small, increasing to 32MB
> [ 0.001864] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
> [ 0.001865] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
> [ 0.001865] Faking node 10 at [mem 0x0000000084000000-0x0000000087fdcfff] (63MB)
> [ 0.001866] numa_emulation: phys: 1 only got 3 of 8 nodes, failing
> [ 0.001867] NODE_DATA(0) allocated [mem 0x7fffc6c0-0x7fffffff]
> [ 0.001940] NODE_DATA(1) allocated [mem 0x87fd96c0-0x87fdcfff]
>
> Donghyeon also tested another scenario. [1]
>
> [1] https://lore.kernel.org/all/CAFPTC5e1OLpHa3HqwhtSPjS_PTQz+iG=ovM2cZ=VnOZ_5z7oxg@mail.gmail.com/
>
> ---
> mm/numa_emulation.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
> index 703c8fa05048..bc2f163e7c45 100644
> --- a/mm/numa_emulation.c
> +++ b/mm/numa_emulation.c
> @@ -214,7 +214,7 @@ static u64 uniform_size(u64 max_addr, u64 base, u64 hole, int nr_nodes)
> * Sets up fake nodes of `size' interleaved over physical nodes ranging from
> * `addr' to `max_addr'.
> *
> - * Returns zero on success or negative on error.
> + * Returns absolute node ID on success or negative on error.
> */
> static int __init split_nodes_size_interleave_uniform(struct numa_meminfo *ei,
> struct numa_meminfo *pi,
> @@ -398,7 +398,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
> */
> if (strchr(emu_cmdline, 'U')) {
> unsigned long n;
> - int nid = 0;
> + int nid = 0, nr_created;
>
> n = simple_strtoul(emu_cmdline, &emu_cmdline, 0);
> ret = -1;
> @@ -416,9 +416,11 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
> n, &pi.blk[0], nid);
> if (ret < 0)
> break;
> - if (ret < n) {
> +
> + nr_created = ret - nid;
> + if (nr_created < n) {
sashiko [1] found a false-failure case with memoryless nodes. I'll fix
it and send a v2 patch soon.
[1] https://sashiko.dev/#/patchset/20260417114127.1664283-1-ekffu200098%40gmail.com
> pr_info("%s: phys: %d only got %d of %ld nodes, failing\n",
> - __func__, i, ret, n);
> + __func__, i, nr_created, n);
> ret = -1;
> break;
> }
> --
> 2.43.0
>
Best Regards,
Sang-Heon Jeon