linux-mm.kvack.org archive mirror
From: Sang-Heon Jeon <ekffu200098@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: akpm@linux-foundation.org, djbw@kernel.org, mingo@kernel.org,
	 linux-mm@kvack.org, Donghyeon Lee <asd142513@gmail.com>,
	 Munhui Chae <mochae@student.42seoul.kr>
Subject: Re: [PATCH v2] mm/fake-numa: fix under-allocation detection in uniform split
Date: Mon, 20 Apr 2026 22:50:47 +0900
Message-ID: <CABFDxMGvGU6Vaq-AX9NXatHA9fTRDfB4yiHZBEQoJf4wsz-XqA@mail.gmail.com>
In-Reply-To: <aeXIKM9BeRmaCZ_d@kernel.org>

Hi

On Mon, Apr 20, 2026 at 3:31 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> Hi,
>
> On Fri, Apr 17, 2026 at 10:58:05PM +0900, Sang-Heon Jeon wrote:
> > When split NUMA node uniformly, split_nodes_size_interleave_uniform()
> > returns the next absolute node ID, not the number of nodes created.
> >
> > The existing under-allocation detection logic compares next absolute node
> > ID (ret) and request count (n), which only works when nid starts at 0.
> >
> > For example, on a system with 2 physical NUMA nodes (node 0: 2GB, node
> > 1: 128MB) and numa=fake=8U, 8 fake nodes are successfully created from
> > node 0 and split_nodes_size_interleave_uniform() returns 8. For node 1,
> > fake node nid starts at 8, but only 4 fake nodes are created due to
> > current FAKE_NODE_MIN_SIZE being 32MB, and
> > split_nodes_size_interleave_uniform() returns 12. By existing
> > under-allocation detection logic, "ret < n" (12 < 8) is false, so the
>
> In this example it would be 11, won't it?
> I'll update when applying.

I think 12 is correct for the scenario described in the commit message.
The QEMU test log I attached earlier does not match that scenario,
because I did not account for the memory (~140KB) reserved for the
ACPI tables at the end of node 1.

1) Previously attached QEMU test scenario

QEMU command line:
    qemu-system-x86_64 \
    ...
    -m 2176M \
    -object memory-backend-ram,id=mem0,size=2G \
    -object memory-backend-ram,id=mem1,size=128M \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
    -numa node,nodeid=1,cpus=2-3,memdev=mem1 \

log:
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] System RAM
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] device reserved
[    0.000000] BIOS-e820: [gap 0x00000000000a0000-0x00000000000effff]
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] device reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000087fdcfff] System RAM
[    0.000000] BIOS-e820: [mem 0x0000000087fdd000-0x0000000087ffffff] device reserved
[    0.000000] BIOS-e820: [gap 0x0000000088000000-0x00000000feffbfff]
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] device reserved
[    0.000000] BIOS-e820: [gap 0x00000000ff000000-0x00000000fffbffff]
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] device reserved
[    0.000000] BIOS-e820: [gap 0x0000000100000000-0x000000fcffffffff]
[    0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] device reserved

...

[    0.001919] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.001921] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
[    0.001922] ACPI: SRAT: Node 1 PXM 1 [mem 0x80000000-0x87ffffff]

...

[    0.001930] Fake node size 15MB too small, increasing to 32MB
[    0.001930] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
[    0.001931] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
[    0.001931] Faking node 10 at [mem 0x0000000084000000-0x0000000087fdcfff] (63MB)

So the actual usable memory of node 1 is [0x80000000-0x87fdcfff]
(~127.86MB), which is less than 4 * 32MB. That's why node 1 was
split into only 3 fake nodes.

2) Commit message scenario (Log is newly attached)

QEMU command line (mem1 is 128MB + 256KB for margin):
    ...
    -m 2228480K \
    -object memory-backend-ram,id=mem0,size=2G \
    -object memory-backend-ram,id=mem1,size=131328K \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
    -numa node,nodeid=1,cpus=2-3,memdev=mem1 \

log:
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] System RAM
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] device reserved
[    0.000000] BIOS-e820: [gap 0x00000000000a0000-0x00000000000effff]
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] device reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000008801cfff] System RAM
[    0.000000] BIOS-e820: [mem 0x000000008801d000-0x000000008803ffff] device reserved
[    0.000000] BIOS-e820: [gap 0x0000000088040000-0x00000000feffbfff]
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] device reserved
[    0.000000] BIOS-e820: [gap 0x00000000ff000000-0x00000000fffbffff]
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] device reserved
[    0.000000] BIOS-e820: [gap 0x0000000100000000-0x000000fcffffffff]
[    0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] device reserved

...

[    0.001870] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.001871] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
[    0.001872] ACPI: SRAT: Node 1 PXM 1 [mem 0x80000000-0x8803ffff]

...

[    0.001888] Fake node size 16MB too small, increasing to 32MB
[    0.001888] Faking node 8 at [mem 0x0000000080000000-0x0000000081ffffff] (32MB)
[    0.001888] Faking node 9 at [mem 0x0000000082000000-0x0000000083ffffff] (32MB)
[    0.001889] Faking node 10 at [mem 0x0000000084000000-0x0000000085ffffff] (32MB)
[    0.001889] Faking node 11 at [mem 0x0000000086000000-0x000000008801cfff] (32MB) # (~32.11MB)

So the actual usable memory of node 1 is [0x80000000-0x8801cfff]
(~128.11MB), and node 1 is successfully split into 4 fake nodes.

Sorry for the confusion caused by the mismatched example.

> > under-allocation will not be detected.
> >
> > Fix under-allocation detection logic to compare the number of actually
> > created nodes (ret - nid) against the request count (n). Also skip
> > under-allocation detection logic for memoryless physical nodes where no
> > fake nodes are created.
> >
> > Also, fix the outdated comment to match the actual return value.
> >
> > Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
> > Reported-by: Donghyeon Lee <asd142513@gmail.com>
> > Reported-by: Munhui Chae <mochae@student.42seoul.kr>
> > Fixes: cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability") # 4.19
>
> ...
>
> > @@ -416,9 +416,18 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
> >                                       n, &pi.blk[0], nid);
> >                       if (ret < 0)
> >                               break;
> > -                     if (ret < n) {
> > +
> > +                     /*
> > +                      * If no memory was found for this physical node,
> > +                      * skip the under-allocation check.
>
> checkpatch complains about trailing white space here.
> I'll fix it up when applying.

Oops, I missed it. Thank you for pointing it out!

> > +                      */
> > +                     if (ret == nid)
> > +                             continue;
> > +
> > +                     nr_created = ret - nid;
> > +                     if (nr_created < n) {
> >                               pr_info("%s: phys: %d only got %d of %ld nodes, failing\n",
> > -                                             __func__, i, ret, n);
> > +                                             __func__, i, nr_created, n);
> >                               ret = -1;
> >                               break;
> >                       }
> > --
> > 2.43.0
> >
>
> --
> Sincerely yours,
> Mike.

I really appreciate your detailed review :)

Best Regards,
Sang-Heon Jeon

