linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Karim Manaouil <kmanaouil.dev@gmail.com>
Cc: Gregory Price <gourry@gourry.net>,
	 Aneesh Kumar <AneeshKumar.KizhakeVeetil@arm.com>,
	 David Hildenbrand <david@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	 Kirill Shutemov <k.shutemov@gmail.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mel.gorman@gmail.com>,
	 "Rao, Bharata Bhasker" <bharata@amd.com>,
	Rik van Riel <riel@surriel.com>,
	 RaghavendraKT <Raghavendra.KodsaraThimmappa@amd.com>,
	 Wei Xu <weixugc@google.com>,
	Suyeon Lee <leesuyeon0506@gmail.com>,
	 Lei Chen <leillc@google.com>,
	"Shukla, Santosh" <santosh.shukla@amd.com>,
	 "Grimm, Jon" <jon.grimm@amd.com>,
	sj@kernel.org, shy828301@gmail.com,  Zi Yan <ziy@nvidia.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	 Gregory Price <gregory.price@memverge.com>,
	linux-mm@kvack.org
Subject: Re: Slow-tier Page Promotion discussion recap and open questions
Date: Sun, 29 Dec 2024 21:36:41 -0800 (PST)
Message-ID: <e65d9afd-7fee-56d6-d2e0-e8379c5f9988@google.com>
In-Reply-To: <20241226012833.rmmbkws4wdhzdht6@ed.ac.uk>

On Thu, 26 Dec 2024, Karim Manaouil wrote:

> On Wed, Dec 18, 2024 at 07:56:19PM -0500, Gregory Price wrote:
> > On Tue, Dec 17, 2024 at 08:19:56PM -0800, David Rientjes wrote:
> > > ----->o-----
> > > Raghu noted the current promotion destination is node 0 by default.  Wei
> > > noted we could get some page owner information to determine things like
> > > mempolicies or compute the distance between nodes and, if multiple nodes
> > > have the same distance, choose one of them just as we do for demotions.
> > > 
> > > Gregory Price noted some downsides to using mempolicies for this based on
> > > per-task, per-vma, and cross socket policies, so using the kernel's
> > > memory tiering policies is probably the best way to go about it.
> > > 
> > 
> > Slightly elaborating here:
> > - In an async context, associating a page with a specific task is not
> >   presently possible (that I know of). The most we know is the last
> >   accessing CPU - maybe - in the page/folio struct.  Right now this
> >   is disabled in favor of a timestamp when tiering is enabled.
> > 
> >   A process with 2 tasks which have access to the page may not run
> >   on the same socket, so we run the risk of migrating to a bad target.
> >   Best effort here would suggest either socket is fine - since they're
> >   both "fast nodes" - but this requires that we record the last
> >   accessing CPU for a page at identification time.
> > 
> 
> This can be solved with a two-step migration: first, you promote the
> page from CXL to a NUMA node, then you rely on NUMA balancing to
> further place the page into the right NUMA node. NUMA hint faults can
> still be enabled for pages allocated from NUMA nodes, but not for CXL.
> 

I think it would be a shame to promote to the wrong top-tier NUMA node and 
rely on NUMA Balancing to fix it up with yet another migration :/

Since these CPU-less memory nodes should each have an associated promotion
node, which by default is chosen from the latencies the HMAT reports to us,
can we make that the default promotion target when the memory is accessed?
The "normal mode" of NUMA Balancing could then fix up placement after the
promotion, but only if it is enabled.

Raghu noted in the session that the current patch series only promotes to 
node 0 but that choice is only for the RFC.  I *assume* that every CXL 
memory node will have a standard top-tier node to promote to *or* that we 
stash that promotion node information at the time of demotion so memory 
comes back to the same node it was demoted from.

Either way, this feels like a solvable problem?



Thread overview: 18+ messages
2024-12-18  4:19 David Rientjes
2024-12-18 14:50 ` Zi Yan
2024-12-19  6:38   ` Shivank Garg
2024-12-30  5:30     ` David Rientjes
2024-12-30 17:33       ` Zi Yan
2025-01-06  9:14       ` Shivank Garg
2024-12-18 15:21 ` Nadav Amit
2024-12-20 11:28   ` Raghavendra K T
2024-12-18 19:23 ` SeongJae Park
2024-12-19  0:56 ` Gregory Price
2024-12-26  1:28   ` Karim Manaouil
2024-12-30  5:36     ` David Rientjes [this message]
2024-12-30  6:51       ` Raghavendra K T
2025-01-06 17:02       ` Gregory Price
2024-12-20 11:21 ` Raghavendra K T
2025-01-02  4:44   ` David Rientjes
2025-01-06  6:29     ` Raghavendra K T
2025-01-08  5:43     ` Raghavendra K T
