Re: Slow-tier Page Promotion discussion recap and open questions

From: Raghavendra K T <rkodsara@amd.com>
To: David Rientjes <rientjes@google.com>,
	Karim Manaouil <kmanaouil.dev@gmail.com>
Cc: Gregory Price <gourry@gourry.net>,
	Aneesh Kumar <AneeshKumar.KizhakeVeetil@arm.com>,
	David Hildenbrand <david@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Kirill Shutemov <k.shutemov@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mel.gorman@gmail.com>,
	"Rao, Bharata Bhasker" <bharata@amd.com>,
	Rik van Riel <riel@surriel.com>,
	RaghavendraKT <Raghavendra.KodsaraThimmappa@amd.com>,
	Wei Xu <weixugc@google.com>, Suyeon Lee <leesuyeon0506@gmail.com>,
	Lei Chen <leillc@google.com>,
	"Shukla, Santosh" <santosh.shukla@amd.com>,
	"Grimm, Jon" <jon.grimm@amd.com>,
	sj@kernel.org, shy828301@gmail.com, Zi Yan <ziy@nvidia.com>,
	Liam Howlett <liam.howlett@oracle.com>,
	Gregory Price <gregory.price@memverge.com>,
	linux-mm@kvack.org
Subject: Re: Slow-tier Page Promotion discussion recap and open questions
Date: Mon, 30 Dec 2024 12:21:26 +0530
Message-ID: <8c857d4b-4cb6-41b3-8766-fe293d0fdbf0@amd.com>
In-Reply-To: <e65d9afd-7fee-56d6-d2e0-e8379c5f9988@google.com>

On 12/30/2024 11:06 AM, David Rientjes wrote:
> On Thu, 26 Dec 2024, Karim Manaouil wrote:
> 
>> On Wed, Dec 18, 2024 at 07:56:19PM -0500, Gregory Price wrote:
>>> On Tue, Dec 17, 2024 at 08:19:56PM -0800, David Rientjes wrote:
>>>> ----->o-----
>>>> Raghu noted the current promotion destination is node 0 by default.  Wei
>>>> noted we could get some page owner information to determine things like
>>>> mempolicies or compute the distance between nodes and, if multiple nodes
>>>> have the same distance, choose one of them just as we do for demotions.
>>>>
>>>> Gregory Price noted some downsides to using mempolicies for this based on
>>>> per-task, per-vma, and cross socket policies, so using the kernel's
>>>> memory tiering policies is probably the best way to go about it.
>>>>
>>>
>>> Slightly elaborating here:
>>> - In an async context, associating a page with a specific task is not
>>>    presently possible (that I know of). The most we know is the last
>>>    accessing CPU - maybe - in the page/folio struct.  Right now this
>>>    is disabled in favor of a timestamp when tiering is enabled.
>>>
>>>    A process with 2 tasks which have access to the page may not run
>>>    on the same socket, so we run the risk of migrating to a bad target.
>>>    Best effort here would suggest either socket is fine - since they're
>>>    both "fast nodes" - but this requires that we record the last
>>>    accessing CPU for a page at identification time.
>>>
>>
>> This can be solved with a two-step migration: first, you promote the
>> page from CXL to a NUMA node, then you rely on NUMA balancing to
>> further place the page into the right NUMA node. NUMA hint faults can
>> still be enabled for pages allocated from NUMA nodes, but not for CXL.
>>
> 
> I think it would be a shame to promote to the wrong top-tier NUMA node and
> rely on NUMA Balancing to fix it up with yet another migration :/

Agreed. The advantage of promotion is lost, considering the typical
access latency of CXL vs. a regular node today.

> 
> Since these cpuless memory nodes should have a promotion node associated
> with them, which defaults to the latency given to us by the HMAT, can we
> make that the default promotion target when memory is accessed?  The
> "normal mode" for NUMA Balancing could fix this up subsequent to the
> promotion, but only if enabled.
> 
> Raghu noted in the session that the current patch series only promotes to
> node 0 but that choice is only for the RFC.  I *assume* that every CXL
> memory node will have a standard top-tier node to promote to *or* that we
> stash that promotion node information at the time of demotion so memory
> comes back to the same node it was demoted from.
> 
> Either way, this feels like a solvable problem?
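
To make the two options above concrete (promote to the nearest top-tier
node by distance, or back to the node the page was demoted from), here
is a minimal user-space sketch. It is not kernel code and not what the
RFC does. The names nearest_top_tier(), promotion_target() and
demoted_from are hypothetical, the distance table stands in for
whatever HMAT/SLIT-derived values the kernel would use, and the
lowest-node-id tie-break is only an assumption to keep the example
deterministic.

```python
from typing import Dict, List, Optional


def nearest_top_tier(slow_node: int,
                     top_tier: List[int],
                     distance: Dict[int, Dict[int, int]]) -> int:
    """Return the top-tier node with the smallest distance to slow_node.

    Ties are broken by lowest node id (an assumption of this sketch).
    """
    return min(top_tier, key=lambda n: (distance[slow_node][n], n))


def promotion_target(slow_node: int,
                     top_tier: List[int],
                     distance: Dict[int, Dict[int, int]],
                     demoted_from: Optional[int] = None) -> int:
    """Choose where to promote a page that currently lives on slow_node.

    If a valid top-tier node was stashed at demotion time, send the page
    back there; otherwise fall back to the nearest top-tier node.
    """
    if demoted_from is not None and demoted_from in top_tier:
        return demoted_from
    return nearest_top_tier(slow_node, top_tier, distance)


# Hypothetical topology: nodes 0 and 1 are top-tier, nodes 2 and 3 are
# cpuless CXL nodes. The distance values are made up for illustration.
TOP_TIER = [0, 1]
DISTANCE = {
    2: {0: 20, 1: 30},
    3: {0: 30, 1: 20},
}

print(promotion_target(2, TOP_TIER, DISTANCE))                  # nearest node
print(promotion_target(3, TOP_TIER, DISTANCE, demoted_from=0))  # stashed node
```

With a default like this, NUMAB "normal mode" would only be needed to
correct cases where the nearest node is not where the accessing tasks
actually run.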

How about sharing hints between NUMAB mode=1 and the kernel thread? For
example, NUMAB mode=1 needs help choosing which hot VMAs to scan (the
kernel thread can supply that), whereas the promotion target is kept at
the VMA level as a hint derived from hint faults. (Thinking out loud
here.)

Even a top-tier node associated with each CXL node might work, but I
need to think more about it.

PS: I ran my experiment with NUMAB mode=1, and the benefit of the
kernel thread was intact.

Thanks and Regards
- Raghu
