Re: Slow-tier Page Promotion discussion recap and open questions

From: "Zi Yan" <ziy@nvidia.com>
To: "David Rientjes" <rientjes@google.com>,
	"Shivank Garg" <shivankg@amd.com>
Cc: "Aneesh Kumar" <AneeshKumar.KizhakeVeetil@arm.com>,
	"David Hildenbrand" <david@redhat.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Kirill Shutemov" <k.shutemov@gmail.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mel.gorman@gmail.com>,
	"Rao, Bharata Bhasker" <bharata@amd.com>,
	"Rik van Riel" <riel@surriel.com>,
	"RaghavendraKT" <Raghavendra.KodsaraThimmappa@amd.com>,
	"Wei Xu" <weixugc@google.com>,
	"Suyeon Lee" <leesuyeon0506@gmail.com>,
	"Lei Chen" <leillc@google.com>,
	"Shukla, Santosh" <santosh.shukla@amd.com>,
	"Grimm, Jon" <jon.grimm@amd.com>, <sj@kernel.org>,
	<shy828301@gmail.com>, "Liam Howlett" <liam.howlett@oracle.com>,
	"Gregory Price" <gregory.price@memverge.com>,
	<linux-mm@kvack.org>, "Kefeng Wang" <wangkefeng.wang@huawei.com>
Subject: Re: Slow-tier Page Promotion discussion recap and open questions
Date: Mon, 30 Dec 2024 12:33:05 -0500
Message-ID: <D6P7RAH0KMU3.XTM8E3TLIHEY@nvidia.com>
In-Reply-To: <edfcb05e-090c-bdef-88f2-00a87aff6a9b@google.com>

On Mon Dec 30, 2024 at 12:30 AM EST, David Rientjes wrote:
> On Thu, 19 Dec 2024, Shivank Garg wrote:
>
> > On 12/18/2024 8:20 PM, Zi Yan wrote:
> > > On 17 Dec 2024, at 23:19, David Rientjes wrote:
> > > 
> > >> Hi everybody,
> > >>
> > >> We had a very interactive discussion last week led by RaghavendraKT on
> > >> slow-tier page promotion intended for memory tiering platforms, thank
> > >> you!  Thanks as well to everybody who attended and provided great
> > >> questions, suggestions, and feedback.
> > >>
> > >> The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
> > >> is a proposal to allow for asynchronous page promotion based on memory
> > >> accesses as an alternative to NUMA Balancing based promotions.  There was
> > >> widespread interest in this topic and the discussion surfaced multiple
> > >> use cases and requirements, very focused on CXL use cases.
> > >>
> > > <snip>
> > >> ----->o-----
> > >> I asked about offloading the migration to a data mover, such as the PSP
> > >> for AMD, DMA engine, etc and whether that should be treated entirely
> > >> separately as a topic.  Bharata said there was a proof-of-concept
> > >> available from AMD that does just that but the initial results were not
> > >> that encouraging.
> > >>
> > >> Zi asked if the DMA engine saturated the link between the slow and fast
> > >> tiers.  If we want to offload to a copy engine, we need to verify that
> > >> the throughput is sufficient or we may be better off using idle cpus to
> > >> perform the migration for us.
> > > 
> > > <snip>
> > >>
> > >>  - we likely want to reconsider the single threaded nature of the kthread
> > >>    even if only for NUMA purposes
> > >>
> > > 
> > > Related to using DMA engine and/or multi threads for page migration, I had
> > > a patchset accelerating page migration[1] back in 2019. It showed good
> > > throughput speedup, ~4x using 16 threads to copy multiple 2MB THP. I think
> > > it is time to revisit the topic.
> > > 
> > > 
> > > [1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@sent.com/
> > 
> > Hi All,
> > 
> > I wanted to provide some additional context regarding the AMD DMA offloading
> > POC mentioned by Bharata:
> > https://lore.kernel.org/linux-mm/20240614221525.19170-1-shivankg@amd.com
> > 
> > While the initial results weren't as encouraging as hoped, I plan to improve this
> > in the next versions of the patchset.
> > 
> > The core idea in my RFC patchset is restructuring the folio move operation
> > to better leverage DMA hardware. Instead of the current folio-by-folio approach:
> > 
> > for_each_folio() {
> >     copy metadata + content + update PTEs
> > }
> > 
> > We batch the operations to minimize overhead:
> > 
> > for_each_folio() {
> >     copy metadata
> > }
> > DMA batch copy all content
> > for_each_folio() {
> >     update PTEs
> > }
> > 
> > My experiment showed that folio copy can consume up to 26.6% of total migration
> > cost when moving data between NUMA nodes. This suggests significant room for
> > improvement through DMA offloading, particularly for the larger transfers expected
> > in CXL scenarios.
> > 
> > It would be interesting to work on combining these approaches for optimized page
> > promotion.
> > 
>
> This is very exciting, thanks Shivank and Zi!  The reason I brought this 
> topic up during the session on asynchronous page promotion for memory 
> tiering was because page migration is likely going to become *much* more 
> popular and will be in the critical path under system-wide memory 
> pressure.  Hardware assist and any software optimizations that can go 
> along with it would certainly be very interesting to discuss.
>
> Shivank, do you have an estimated timeline for when that patch series will 
> be refreshed?  Any planned integration with TMPM?
>
> Zi, are you looking to refresh your series and continue discussing page 
> migration offload?  We could set up another Linux MM Alignment Session 
> topic focused exactly on this and get representatives from the vendors 
> involved.

Sure. I have recently been redoing the experiments with multiple threads
and see a larger throughput increase (up to 10x throughput with 32 threads)
on NVIDIA Grace CPUs.
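
For a rough sense of what a faster copy buys end to end, here is a small
Amdahl-style estimate. It is only a sketch: migration_speedup() is a
hypothetical helper, and plugging in Shivank's 26.6% copy share together with
the 10x copy speedup assumes those two numbers carry over to the same system,
which has not been measured.

```c
#include <assert.h>

/*
 * Amdahl-style estimate (illustrative, not from any patchset): overall
 * migration speedup when only the folio copy, which accounts for
 * copy_share of the total cost, is made copy_speedup times faster.
 */
static double migration_speedup(double copy_share, double copy_speedup)
{
	assert(copy_share >= 0.0 && copy_share <= 1.0);
	assert(copy_speedup > 0.0);

	/* Unaccelerated part stays as-is; copy part shrinks by copy_speedup. */
	return 1.0 / ((1.0 - copy_share) + copy_share / copy_speedup);
}
```

Under those assumptions, migration_speedup(0.266, 10.0) comes out at about
1.31x overall, and even an arbitrarily fast copy is bounded by
1 / (1 - 0.266), about 1.36x, which suggests the non-copy overhead deserves
attention as well.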

Shivank's approach, using MIGRATE_SYNC_NO_COPY, looks simpler
than what I have done, which splits migrate_folio() into two parts[1]. I
am planning to rebuild my multithreaded folio copy patches on top of
Shivank's patches with some modifications. One thing to note is that
MIGRATE_SYNC_NO_COPY was recently removed by Kefeng (cc'd)[2], so I will
need to bring it back.
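
To make the combined idea concrete, here is a minimal user-space sketch of
batched migration with a multithreaded copy phase. It is not kernel code and
not taken from either patchset: struct fake_folio, batch_migrate(), the map[]
array standing in for PTEs, and MAX_COPY_THREADS are all hypothetical, and
plain pthreads plus memcpy() stand in for kernel threads or a DMA engine.
Each dst[i].data is assumed to already point at a buffer at least
src[i].size bytes long.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_COPY_THREADS 32	/* hypothetical cap, for illustration only */

/* Hypothetical stand-in for a folio: some metadata plus its content. */
struct fake_folio {
	unsigned long flags;	/* stand-in for folio metadata */
	size_t size;		/* content size in bytes */
	unsigned char *data;	/* content buffer */
};

/* One worker copies the contents of folios in [first, last). */
struct copy_job {
	struct fake_folio *src, *dst;
	size_t first, last;
};

static void *copy_worker(void *arg)
{
	struct copy_job *job = arg;

	for (size_t i = job->first; i < job->last; i++)
		memcpy(job->dst[i].data, job->src[i].data, job->src[i].size);
	return NULL;
}

/* Returns 0 on success, -1 on bad arguments or thread creation failure. */
static int batch_migrate(struct fake_folio *src, struct fake_folio *dst,
			 struct fake_folio **map, size_t n, unsigned nthreads)
{
	pthread_t tid[MAX_COPY_THREADS];
	struct copy_job job[MAX_COPY_THREADS];
	unsigned started = 0;
	int ret = 0;

	if (nthreads == 0 || nthreads > MAX_COPY_THREADS)
		return -1;

	/* Phase 1: copy metadata for every folio in the batch. */
	for (size_t i = 0; i < n; i++) {
		dst[i].flags = src[i].flags;
		dst[i].size = src[i].size;
	}

	/* Phase 2: copy all contents, split evenly across worker threads. */
	size_t chunk = (n + nthreads - 1) / nthreads;
	for (unsigned t = 0; t < nthreads; t++) {
		size_t first = (size_t)t * chunk;

		if (first >= n)
			break;
		size_t last = first + chunk > n ? n : first + chunk;

		job[t] = (struct copy_job){ src, dst, first, last };
		if (pthread_create(&tid[t], NULL, copy_worker, &job[t]) != 0) {
			ret = -1;
			break;
		}
		started++;
	}
	for (unsigned t = 0; t < started; t++)
		pthread_join(tid[t], NULL);
	if (ret)
		return ret;

	/* Phase 3: repoint every mapping (the "update PTEs" step). */
	for (size_t i = 0; i < n; i++)
		map[i] = &dst[i];
	return 0;
}
```

The point of the three-phase shape is that phase 2 is the only part that
touches folio contents, so it can be swapped for a DMA engine or a different
thread count without changing phases 1 and 3. On failure the sketch leaves
map[] pointing at the source folios.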


[1] https://github.com/x-y-z/linux-dev/tree/batched_page_migration_copy-v6.12
[2] https://lore.kernel.org/all/20240524052843.182275-6-wangkefeng.wang@huawei.com/

-- 
Best Regards,
Yan, Zi
