Re: [PATCH] mm/fake-numa: per-phys node fake size

From: Mike Rapoport <rppt@kernel.org>
To: Bruno Faccini <bfaccini@nvidia.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Zi Yan <ziy@nvidia.com>, Timur Tabi <ttabi@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>
Subject: Re: [PATCH] mm/fake-numa: per-phys node fake size
Date: Wed, 25 Sep 2024 12:28:56 +0300
Message-ID: <ZvPX2J7D9w0EJTUo@kernel.org>
In-Reply-To: <A846613E-A2B4-4B56-B368-5786F572F168@nvidia.com>

Hi Bruno,

Please reply inline to mails on the Linux kernel mailing lists.

On Tue, Sep 24, 2024 at 03:27:52PM +0000, Bruno Faccini wrote:
> On 24/09/2024 12:43, "Mike Rapoport" <rppt@kernel.org> wrote:

> > On Sat, Sep 21, 2024 at 01:13:49AM -0700, Bruno Faccini wrote:
> > > Determine the fake NUMA node size on a per-physical-node basis to
> > > handle cases where the amount of reserved memory differs
> > > greatly between physical nodes. This makes it possible to get
> > > the expected number of nodes, evenly interleaved.
> > >
> > > Consider a system with 2 physical NUMA nodes where almost
> > > all reserved memory sits in a single node. Computing the
> > > fake-numa node (fake=N) size as the ratio of all
> > > available/non-reserved memory can make it impossible to
> > > create N/2 fake-numa nodes in that physical node.
> > 
> > 
> > I'm not sure I understand the problem you are trying to solve.
> > Can you provide a more specific example?
>
> I will try to be more precise about the situation I have encountered with
> your original set of patches and how I thought it could be solved.
> 
> On a system with 2 physical NUMA nodes, each with 480GB of local memory,
> most of the reserved memory (~309MB) is in node 0 and only a small part
> (~51MB) is in node 1, which leads to a fake node size of slightly less
> than 120GB being determined.
>
> But when allocating fake nodes from the physical nodes, with, let's say,
> the fake=8 boot parameter, we ended up with fewer (7) than expected,
> because there was not enough room to allocate 8/2 fake nodes in physical
> node 0, as the computed size was too big.

The ability to split a physical node into emulated nodes depends not only
on the node sizes and hole sizes, but also on where the holes are located
inside the nodes, and it's quite possible that for some memory layouts
split_nodes_interleave() will fail to create the requested number of
emulated nodes.
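
To make the numbers in the quoted example concrete, here is a rough sketch
of the arithmetic. It is a simplified model and not what
split_nodes_interleave() actually does: it assumes each physical node is one
contiguous range, ignores where the holes sit and any size rounding, and
only counts how many whole fake nodes fit. The helper name fit_counts() and
the variable names are made up for illustration.

```python
# Simplified model of the sizing problem; sizes are in MiB.
# This is NOT the kernel algorithm: hole placement and rounding are ignored.
MIB_PER_GIB = 1024


def fit_counts(usable_mib, sizes_mib):
    """Count how many whole fake nodes of the given size fit in each node."""
    return [u // s for u, s in zip(usable_mib, sizes_mib)]


# Figures quoted in this thread (the reserved amounts are approximate).
phys = [480 * MIB_PER_GIB, 480 * MIB_PER_GIB]
reserved = [309, 51]
usable = [p - r for p, r in zip(phys, reserved)]
nr_fake = 8

# One global size computed from all non-reserved memory.
global_size = sum(usable) // nr_fake
global_counts = fit_counts(usable, [global_size] * len(usable))

# One size per physical node, in the spirit of the proposed patch.
share = nr_fake // len(usable)
per_node_sizes = [u // share for u in usable]
per_node_counts = fit_counts(usable, per_node_sizes)

print(global_size, global_counts, sum(global_counts))   # 122835 [3, 4] 7
print(per_node_sizes, per_node_counts, sum(per_node_counts))
```

Under these assumptions the global size leaves room for only three fake
nodes in node 0, which matches the 7 nodes reported above, while a size
computed per physical node fits four in each.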

> I don't think that the fake=N allocation method is intended to get fake
> nodes of equal size, but to get this exact number of nodes. This is why I
> think we should use a per-phys-node size for the fake nodes it will host.

IMO your change adds too much complexity for a feature that by definition
should be used only for debugging.

Also, there is a variant of the numa=fake parameter, numa=fake=<N>U, that
divides each node into N emulated nodes.
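
As a rough illustration of what that means for the layout discussed in this
thread, the sketch below models a uniform per-node split. It is an
assumption-laden model, not the kernel code: each physical node is treated
as one contiguous range of usable memory, rounding is ignored, and
split_uniform() is a hypothetical helper name.

```python
# Simplified model of numa=fake=<N>U: every physical node is divided into
# N emulated nodes sized from that node's own usable memory (MiB).
def split_uniform(usable_mib, n_per_node):
    """Return, per physical node, the list of emulated node sizes."""
    return [[u // n_per_node] * n_per_node for u in usable_mib]


# Usable memory from the example in this thread (approximate figures).
usable = [480 * 1024 - 309, 480 * 1024 - 51]

layout = split_uniform(usable, 4)          # models numa=fake=4U
total = sum(len(nodes) for nodes in layout)
print(total, [nodes[0] for nodes in layout])
```

In this model numa=fake=4U yields eight emulated nodes on the two-node
system, with slightly different sizes on each physical node.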
 
> Hope this clarifies the reason and intent for my patch, have a good day,
> Bruno 
> 
> 
> > Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
> > ---
> > mm/numa_emulation.c | 66 ++++++++++++++++++++++++++-------------------
> > 1 file changed, 39 insertions(+), 27 deletions(-)

-- 
Sincerely yours,
Mike.
