Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache

linux-mm.kvack.org archive mirror

* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
       [not found] <20240803094715.23900-1-gourry@gourry.net>
@ 2024-08-08 23:20 ` Andrew Morton
  2024-08-13 15:04   ` Gregory Price
  2024-08-19  7:46 ` Huang, Ying
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2024-08-08 23:20 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar,
	abhishekd

On Sat,  3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote:

> Unmapped pagecache pages can be demoted to low-tier memory, but 
> they can only be promoted if a process maps the pages into the
> memory space (so that NUMA hint faults can be caught).  This can
> cause significant performance degradation as the pagecache ages
> and unmapped, cached files are accessed.

It would be helpful to share some testing results so the magnitude of
this degradation can be understood.

What is the potential downside to this change?  The local node now gets
stuffed full of pagecache and other things get evicted?

> This patch series enables the pagecache to request a promotion of
> a folio when it is accessed via the pagecache.
> 
> We add a new `numa_hint_page_cache` counter in vmstat to capture
> information on when these migrations occur.
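
For anyone who wants to watch the counter quoted above on a kernel that
carries this series, a minimal userspace sketch follows.  It assumes only
the usual "name value" line layout of /proc/vmstat.  The helper names
vmstat_match_line and vmstat_read_counter are hypothetical and not part of
the series, and numa_hint_page_cache will simply be absent on kernels
without the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name value" line as found in /proc/vmstat.
 * Returns 1 and stores the value if the line is for `name`, else 0.
 */
int vmstat_match_line(const char *line, const char *name,
                      unsigned long long *value)
{
    char key[128];
    unsigned long long v;

    if (sscanf(line, "%127s %llu", key, &v) != 2)
        return 0;
    if (strcmp(key, name) != 0)
        return 0;
    *value = v;
    return 1;
}

/*
 * Scan an already-open stream for `name`.
 * Returns 0 on success, -1 if the counter is not present.
 */
int vmstat_read_counter(FILE *stream, const char *name,
                        unsigned long long *value)
{
    char line[256];

    while (fgets(line, sizeof(line), stream)) {
        if (vmstat_match_line(line, name, value))
            return 0;
    }
    return -1;
}
```

Opening /proc/vmstat with fopen() and passing the stream to
vmstat_read_counter() with "numa_hint_page_cache" returns -1 when the
counter does not exist, which doubles as a cheap check for whether the
running kernel has the patch.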



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton
@ 2024-08-13 15:04   ` Gregory Price
  2024-08-14 16:09     ` Gregory Price
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-08-13 15:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar,
	abhishekd

On Thu, Aug 08, 2024 at 04:20:11PM -0700, Andrew Morton wrote:
> On Sat,  3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote:
> 
> > Unmapped pagecache pages can be demoted to low-tier memory, but 
> > they can only be promoted if a process maps the pages into the
> > memory space (so that NUMA hint faults can be caught).  This can
> > cause significant performance degradation as the pagecache ages
> > and unmapped, cached files are accessed.
> 
> It would be helpful to share some testing results so the magnitude of
> this degradation can be understood.

Apologies, this should have been an RFC - testing results forthcoming.

> 
> What is the potential downside to this change?  The local node now gets
> stuffed full of pagecache and other things get evicted?
> 

That is one possible degenerate case if there exists a large amount of
free memory in the local node.  We're testing it now against TPP demotion
logic, but the expectation should be that if the local node is already
pressured the pagecache would be trapped on CXL until TPP frees up local
node pages.

> > This patch series enables the pagecache to request a promotion of
> > a folio when it is accessed via the pagecache.
> > 
> > We add a new `numa_hint_page_cache` counter in vmstat to capture
> > information on when these migrations occur.



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-08-13 15:04   ` Gregory Price
@ 2024-08-14 16:09     ` Gregory Price
  0 siblings, 0 replies; 14+ messages in thread
From: Gregory Price @ 2024-08-14 16:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar,
	abhishekd

On Tue, Aug 13, 2024 at 11:04:59AM -0400, Gregory Price wrote:
> On Thu, Aug 08, 2024 at 04:20:11PM -0700, Andrew Morton wrote:
> > On Sat,  3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote:
> > 
> > > Unmapped pagecache pages can be demoted to low-tier memory, but 
> > > they can only be promoted if a process maps the pages into the
> > > memory space (so that NUMA hint faults can be caught).  This can
> > > cause significant performance degradation as the pagecache ages
> > > and unmapped, cached files are accessed.
> > 
> > It would be helpful to share some testing results so the magnitude of
> > this degradation can be understood.
> 
> Apologies, this should have been an RFC - testing results forthcoming.
> 
> > 
> > What is the potential downside to this change?  The local node now gets
> > stuffed full of pagecache and other things get evicted?
> > 
> 
> That is one possible degenerate case if there exists a large amount of
> free memory in the local node.  We're testing it now against TPP demotion
> logic, but the expectation should be that if the local node is already
> pressured the pagecache would be trapped on CXL until TPP frees up local
> node pages.
> 
> > > This patch series enables the pagecache to request a promotion of
> > > a folio when it is accessed via the pagecache.
> > > 
> > > We add a new `numa_hint_page_cache` counter in vmstat to capture
> > > information on when these migrations occur.

Worth noting for interested parties: This patch is not stable.  After some
extended testing, we found some soft lockups.  So please disregard until v2+.

~Gregory



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
       [not found] <20240803094715.23900-1-gourry@gourry.net>
  2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton
@ 2024-08-19  7:46 ` Huang, Ying
  2024-08-19 15:15   ` Gregory Price
  1 sibling, 1 reply; 14+ messages in thread
From: Huang, Ying @ 2024-08-19  7:46 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner

Gregory Price <gourry@gourry.net> writes:

> Unmapped pagecache pages can be demoted to low-tier memory, but 
> they can only be promoted if a process maps the pages into the
> memory space (so that NUMA hint faults can be caught).  This can
> cause significant performance degradation as the pagecache ages
> and unmapped, cached files are accessed.
>
> This patch series enables the pagecache to request a promotion of
> a folio when it is accessed via the pagecache.
>
> We add a new `numa_hint_page_cache` counter in vmstat to capture
> information on when these migrations occur.

It appears that you will promote a page cache page on its second access.
Do you have a better way to distinguish hot pages from not-so-hot
pages?  How do you balance between unmapped and mapped pages?  We have hot
page selection for hot pages.

[snip]

--
Best Regards,
Huang, Ying



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-08-19  7:46 ` Huang, Ying
@ 2024-08-19 15:15   ` Gregory Price
  2024-09-02  6:53     ` Huang, Ying
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-08-19 15:15 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner

On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
> Gregory Price <gourry@gourry.net> writes:
> 
> > Unmapped pagecache pages can be demoted to low-tier memory, but 
> > they can only be promoted if a process maps the pages into the
> > memory space (so that NUMA hint faults can be caught).  This can
> > cause significant performance degradation as the pagecache ages
> > and unmapped, cached files are accessed.
> >
> > This patch series enables the pagecache to request a promotion of
> > a folio when it is accessed via the pagecache.
> >
> > We add a new `numa_hint_page_cache` counter in vmstat to capture
> > information on when these migrations occur.
> 
> It appears that you will promote page cache page on the second access.
> Do you have some better way to identify hot pages from the not-so-hot
> pages?  How to balance between unmapped and mapped pages?  We have hot
> page selection for hot pages.
> 
> [snip]
> 

I've since explored moving this down under a (referenced && active) check.

This would be more like promotion on third access within an LRU shrink
round (the LRU should, in theory, hack off the active bits on some decent
time interval when the system is pressured).

Barring adding new counters to folios to track hits, I don't see a clear
and obvious way to track hotness.  The primary observation here is
that pagecache is un-mapped, and so cannot use numa-fault hints.

This is more complicated with MGLRU, but I'm saving that for after I
figure out the plan for plain old LRU.
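
To make the "third access" reasoning concrete, here is a small userspace
model.  It assumes the classic (non-MGLRU) two-bit progression in which an
access first sets the referenced bit, a later access to a referenced folio
activates it and clears referenced, and a further access sets referenced
again.  struct folio_model and the model_* functions are hypothetical
stand-ins rather than kernel code.  Checking for a candidate after the
access has been recorded, and aging clearing both bits, are also
assumptions made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two LRU bits of a folio. */
struct folio_model {
    bool referenced;
    bool active;
};

/* Record one access using the assumed two-bit progression. */
void model_mark_accessed(struct folio_model *f)
{
    if (!f->referenced) {
        f->referenced = true;
    } else if (!f->active) {
        f->active = true;
        f->referenced = false;
    }
    /* referenced && active: nothing further to record */
}

/* The (referenced && active) check discussed above. */
bool model_promotion_candidate(const struct folio_model *f)
{
    return f->referenced && f->active;
}

/* Coarse model of an LRU shrink round aging the folio back to cold. */
void model_lru_age(struct folio_model *f)
{
    f->active = false;
    f->referenced = false;
}

/*
 * Count accesses until the folio becomes a candidate.
 * Returns 0 if it does not happen within max_accesses.
 */
unsigned int model_accesses_until_candidate(struct folio_model *f,
                                            unsigned int max_accesses)
{
    unsigned int i;

    for (i = 1; i <= max_accesses; i++) {
        model_mark_accessed(f);
        if (model_promotion_candidate(f))
            return i;
    }
    return 0;
}
```

Starting from a cold folio, the model reports a candidate on the third
access, and an aging pass between accesses restarts the count.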

~Gregory



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-08-19 15:15   ` Gregory Price
@ 2024-09-02  6:53     ` Huang, Ying
  2024-09-03 13:36       ` Gregory Price
  2024-11-04 18:12       ` Gregory Price
  0 siblings, 2 replies; 14+ messages in thread
From: Huang, Ying @ 2024-09-02  6:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

Gregory Price <gourry@gourry.net> writes:

> On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
>> Gregory Price <gourry@gourry.net> writes:
>> 
>> > Unmapped pagecache pages can be demoted to low-tier memory, but 
>> > they can only be promoted if a process maps the pages into the
>> > memory space (so that NUMA hint faults can be caught).  This can
>> > cause significant performance degradation as the pagecache ages
>> > and unmapped, cached files are accessed.
>> >
>> > This patch series enables the pagecache to request a promotion of
>> > a folio when it is accessed via the pagecache.
>> >
>> > We add a new `numa_hint_page_cache` counter in vmstat to capture
>> > information on when these migrations occur.
>> 
>> It appears that you will promote page cache page on the second access.
>> Do you have some better way to identify hot pages from the not-so-hot
>> pages?  How to balance between unmapped and mapped pages?  We have hot
>> page selection for hot pages.
>> 
>> [snip]
>> 
>
> I've since explored moving this down under a (referenced && active) check.
>
> This would be more like promotion on third access within an LRU shrink
> round (the LRU should, in theory, hack off the active bits on some decent
> time interval when the system is pressured).
>
> Barring adding new counters to folios to track hits, I don't see a clear
> and obvious way to track hotness.  The primary observation here is
> that pagecache is un-mapped, and so cannot use numa-fault hints.
>
> This is more complicated with MGLRU, but I'm saving that for after I
> figure out the plan for plain old LRU.

Several years ago, we tried to use the access time tracking
mechanism of NUMA balancing to track the access time latency of unmapped
file cache folios.  The original implementation is as follows:

https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329

What do you think about this?

--
Best Regards,
Huang, Ying



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-09-02  6:53     ` Huang, Ying
@ 2024-09-03 13:36       ` Gregory Price
  2024-11-04 18:12       ` Gregory Price
  1 sibling, 0 replies; 14+ messages in thread
From: Gregory Price @ 2024-09-03 13:36 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote:
> Gregory Price <gourry@gourry.net> writes:
> 
> > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
> >> Gregory Price <gourry@gourry.net> writes:
> >> 
> >> > Unmapped pagecache pages can be demoted to low-tier memory, but 
> >> > they can only be promoted if a process maps the pages into the
> >> > memory space (so that NUMA hint faults can be caught).  This can
> >> > cause significant performance degradation as the pagecache ages
> >> > and unmapped, cached files are accessed.
> >> >
> >> > This patch series enables the pagecache to request a promotion of
> >> > a folio when it is accessed via the pagecache.
> >> >
> >> > We add a new `numa_hint_page_cache` counter in vmstat to capture
> >> > information on when these migrations occur.
> >> 
> >> It appears that you will promote page cache page on the second access.
> >> Do you have some better way to identify hot pages from the not-so-hot
> >> pages?  How to balance between unmapped and mapped pages?  We have hot
> >> page selection for hot pages.
> >> 
> >> [snip]
> >> 
> >
> > I've since explored moving this down under a (referenced && active) check.
> >
> > This would be more like promotion on third access within an LRU shrink
> > round (the LRU should, in theory, hack off the active bits on some decent
> > time interval when the system is pressured).
> >
> > Barring adding new counters to folios to track hits, I don't see a clear
> > and obvious way to track hotness.  The primary observation here is
> > that pagecache is un-mapped, and so cannot use numa-fault hints.
> >
> > This is more complicated with MGLRU, but I'm saving that for after I
> > figure out the plan for plain old LRU.
> 
> Several years ago, we have tried to use the access time tracking
> mechanism of NUMA balancing to track the access time latency of unmapped
> file cache folios.  The original implementation is as follows,
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
> 
> What do you think about this?
>

Also seems like an interesting option. I've been looking at another old
proposal to simply add a new LRU that was implemented by kbusch a few
years back.

https://git.kernel.org/pub/scm/linux/kernel/git/kbusch/linux.git/commit/?h=lru-promote&id=6616afe9a722f6ebedbb27ade3848cf07b9a3af7

I may spend a little time to add a few different methods in with a switch
I can flip to test them side by side / with each other and see what results
we can get.
 
> --
> Best Regards,
> Huang, Ying



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-09-02  6:53     ` Huang, Ying
  2024-09-03 13:36       ` Gregory Price
@ 2024-11-04 18:12       ` Gregory Price
  2024-11-05  2:00         ` Huang, Ying
  1 sibling, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-11-04 18:12 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote:
> Gregory Price <gourry@gourry.net> writes:
> 
> > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
> >> Gregory Price <gourry@gourry.net> writes:
> >> 
> >> > Unmapped pagecache pages can be demoted to low-tier memory, but 
> >> > they can only be promoted if a process maps the pages into the
> >> > memory space (so that NUMA hint faults can be caught).  This can
> >> > cause significant performance degradation as the pagecache ages
> >> > and unmapped, cached files are accessed.
> >> >
> >> > This patch series enables the pagecache to request a promotion of
> >> > a folio when it is accessed via the pagecache.
> >> >
> >> > We add a new `numa_hint_page_cache` counter in vmstat to capture
> >> > information on when these migrations occur.
> >> 
> >> It appears that you will promote page cache page on the second access.
> >> Do you have some better way to identify hot pages from the not-so-hot
> >> pages?  How to balance between unmapped and mapped pages?  We have hot
> >> page selection for hot pages.
> >> 
> >> [snip]
> >> 
> >
> > I've since explored moving this down under a (referenced && active) check.
> >
> > This would be more like promotion on third access within an LRU shrink
> > round (the LRU should, in theory, hack off the active bits on some decent
> > time interval when the system is pressured).
> >
> > Barring adding new counters to folios to track hits, I don't see a clear
> > and obvious way to track hotness.  The primary observation here is
> > that pagecache is un-mapped, and so cannot use numa-fault hints.
> >
> > This is more complicated with MGLRU, but I'm saving that for after I
> > figure out the plan for plain old LRU.
> 
> Several years ago, we have tried to use the access time tracking
> mechanism of NUMA balancing to track the access time latency of unmapped
> file cache folios.  The original implementation is as follows,
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
> 
> What do you think about this?
> 

Coming back around to explore this topic a bit more, I dug into this old
patch and the LRU patch by Keith - I'm struggling to find a good option
that doesn't over-complicate or propose something contentious.

I did a browse through lore and did not see any discussion on this patch
or on Keith's LRU patch, so I presume discussion on this happened largely
off-list.  So if you have any context as to why this wasn't RFC'd officially
I would like more information.

My observations between these 3 proposals:

- The page-lock state is complex while trying to interpose in mark_folio_accessed,
  meaning inline promotion inside that interface is a non-starter.

  We found one deadlock during task exit due to the PTL being held. 

  This worries me more generally, but we did find some success changing certain
  calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than
  modifying mark_folio_accessed. This ends up changing code in similar places
  to your hook - but catches more conditions that mark a page accessed.

- For Keith's proposal, promotion via LRU requires memory pressure on the lower
  tier to cause a shrink and therefore promotions. I'm not well versed in LRU
  semantics, but it seems we could try proactive reclaim here.

  Doing promote-reclaim and demote/swap/evict reclaim on the same triggers
  seems counter-intuitive.

- Doing promotions inline with access creates overhead.  I've seen some research
  suggesting 60us+ per migration - so aggressiveness could harm performance.

  Doing it async would alleviate inline access overheads - but it could also make
  promotion pointless if time-to-promote is too far from liveliness of the pages.

- Doing async-promotion may also require something like PG_PROMOTABLE (as proposed
  by Keith's patch), which will obviously be a very contentious topic.
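
As a rough way to reason about the 60us figure above, the following sketch
computes how many later accesses a promoted folio must receive before an
inline migration pays for itself.  The 60us cost is the number from the
research mentioned above; the per-access latency saving between tiers is a
placeholder that has to be measured on the target system, and the function
name is hypothetical.

```c
#include <stdint.h>

/*
 * Number of accesses from the faster tier needed to amortize one
 * migration.  Returns UINT64_MAX when promotion saves nothing per
 * access, i.e. the migration can never pay off.
 */
uint64_t promo_break_even_accesses(uint64_t migrate_cost_ns,
                                   uint64_t saved_ns_per_access)
{
    if (saved_ns_per_access == 0)
        return UINT64_MAX;

    /* Ceiling division written to avoid overflow in the numerator. */
    return migrate_cost_ns / saved_ns_per_access +
           (migrate_cost_ns % saved_ns_per_access != 0);
}
```

With a 60,000 ns migration and an assumed saving of 100 ns per access, the
folio needs 600 further accesses from the faster tier before the promotion
is a net win.  Below that, the migration cost dominates, which is the
sense in which aggressiveness could harm performance.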

tl;dr: I'm leaning towards a solution like you have here, but we may need to
make a sysfs switch similar to demotion_enabled in case of poor performance due
to heuristically degenerate access patterns, and we may need to expose some
form of adjustable aggressiveness value to make it tunable.
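
A sketch of what such a switch and aggressiveness knob could look like,
modeled in userspace.  Everything here is hypothetical: the thread does
not define tunable names or semantics, and a per-interval promotion budget
is only one possible reading of "adjustable aggressiveness".

```c
#include <stdbool.h>

/* Hypothetical tunables, in the spirit of demotion_enabled. */
struct promo_tunables {
    bool enabled;                     /* global on/off switch */
    unsigned int budget_per_interval; /* max promotions per interval */
};

/* Hypothetical per-interval accounting. */
struct promo_state {
    unsigned int used;
};

/* Called at the start of each accounting interval. */
void promo_interval_reset(struct promo_state *s)
{
    s->used = 0;
}

/*
 * Decide whether one more promotion may proceed.  Consumes one unit of
 * budget when it returns true.
 */
bool promo_allowed(const struct promo_tunables *t, struct promo_state *s)
{
    if (!t->enabled)
        return false;
    if (s->used >= t->budget_per_interval)
        return false;
    s->used++;
    return true;
}
```

A budget of zero behaves like the switch being off, so a degenerate access
pattern can be contained either by disabling promotion outright or by
lowering the budget.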

Reading more into the code surrounding this and other migration logic, I also
think we should explore an optimization to mempolicy that tries to aggressively
keep certain classes of memory on the local node (RX memory and stack for example).

Other areas of reclaim try to actively prevent demoting this type of memory, so we
should try not to allocate it there in the first place.

~Gregory

> --
> Best Regards,
> Huang, Ying


* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-04 18:12       ` Gregory Price
@ 2024-11-05  2:00         ` Huang, Ying
  2024-11-05 15:16           ` Gregory Price
  2024-11-08 18:00           ` Gregory Price
  0 siblings, 2 replies; 14+ messages in thread
From: Huang, Ying @ 2024-11-05  2:00 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

Hi, Gregory,

Gregory Price <gourry@gourry.net> writes:

> On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote:
>> Gregory Price <gourry@gourry.net> writes:
>> 
>> > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote:
>> >> Gregory Price <gourry@gourry.net> writes:
>> >> 
>> >> > Unmapped pagecache pages can be demoted to low-tier memory, but 
>> >> > they can only be promoted if a process maps the pages into the
>> >> > memory space (so that NUMA hint faults can be caught).  This can
>> >> > cause significant performance degradation as the pagecache ages
>> >> > and unmapped, cached files are accessed.
>> >> >
>> >> > This patch series enables the pagecache to request a promotion of
>> >> > a folio when it is accessed via the pagecache.
>> >> >
>> >> > We add a new `numa_hint_page_cache` counter in vmstat to capture
>> >> > information on when these migrations occur.
>> >> 
>> >> It appears that you will promote page cache page on the second access.
>> >> Do you have some better way to identify hot pages from the not-so-hot
>> >> pages?  How to balance between unmapped and mapped pages?  We have hot
>> >> page selection for hot pages.
>> >> 
>> >> [snip]
>> >> 
>> >
>> > I've since explored moving this down under a (referenced && active) check.
>> >
>> > This would be more like promotion on third access within an LRU shrink
>> > round (the LRU should, in theory, hack off the active bits on some decent
>> > time interval when the system is pressured).
>> >
>> > Barring adding new counters to folios to track hits, I don't see a clear
>> > and obvious way to track hotness.  The primary observation here is
>> > that pagecache is un-mapped, and so cannot use numa-fault hints.
>> >
>> > This is more complicated with MGLRU, but I'm saving that for after I
>> > figure out the plan for plain old LRU.
>> 
>> Several years ago, we have tried to use the access time tracking
>> mechanism of NUMA balancing to track the access time latency of unmapped
>> file cache folios.  The original implementation is as follows,
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
>> 
>> What do you think about this?
>> 
>
> Coming back around to explore this topic a bit more, dug into this old
> patch and the LRU patch by Keith - I'm struggling to find a good option
> that doesn't over-complicate or propose something contentious.
>
>
> I did a browse through lore and did not see any discussion on this patch
> or on Keith's LRU patch, so I presume discussion on this happened largely
> off-list.  So if you have any context as to why this wasn't RFC'd officially
> I would like more information.

Thanks for doing this.  There wasn't much discussion offline.  We just
don't have enough time to work on the solution.

> My observations between these 3 proposals:
>
> - The page-lock state is complex while trying to interpose in mark_folio_accessed,
>   meaning inline promotion inside that interface is a non-starter.
>
>   We found one deadlock during task exit due to the PTL being held. 
>
>   This worries me more generally, but we did find some success changing certain
>   calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than
>   modifying mark_folio_accessed. This ends up changing code in similar places
>   to your hook - but catches more conditions that mark a page accessed.
>
> - For Keith's proposal, promotions via LRU requires memory pressure on the lower
>   tier to cause a shrink and therefore promotions. I'm not well versed in LRU
>   semantics, but it seems we could try proactive reclaim here.
>   
>   Doing promote-reclaim and demote/swap/evict reclaim on the same triggers
>   seems counter-intuitive.

IIUC, in the TPP paper (https://arxiv.org/abs/2206.02878), a similar method
is proposed for page promotion.  I guess that it works together with
proactive reclaim.

> - Doing promotions inline with access creates overhead.  I've seen some research
>   suggesting 60us+ per migration - so aggressiveness could harm performance.
>
>   Doing it async would alleviate inline access overheads - but it could also make
>   promotion pointless if time-to-promote is too far from liveliness of the pages.

Async promotion needs to deal with the resource (CPU/memory) charging
too.  You do some work for a task, so you need to charge the consumed
resource for the task.

> - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed
>   by Keith's patch), which will obviously be a very contentious topic.

Some additional data structure can be used to record pages.

> tl;dr: I'm leaning towards a solution like you have here, but we may need to
> make a sysfs switch similar to demotion_enabled in case of poor performance due
> to heuristically degenerate access patterns, and we may need to expose some
> form of adjustable aggressiveness value to make it tunable.

Yes.  We may need that, because the performance benefit may be lower
than the overhead introduced.

> Reading more into the code surrounding this and other migration logic, I also
> think we should explore an optimization to mempolicy that tries to aggressively
> keep certain classes of memory on the local node (RX memory and stack
> for example).
>
> Other areas of reclaim try to actively prevent demoting this type of memory, so we
> should try not to allocate it there in the first place.

We already use a DRAM-first allocation policy.  So, we need to
measure its effect first.

--
Best Regards,
Huang, Ying



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-05  2:00         ` Huang, Ying
@ 2024-11-05 15:16           ` Gregory Price
  2024-11-08 18:00           ` Gregory Price
  1 sibling, 0 replies; 14+ messages in thread
From: Gregory Price @ 2024-11-05 15:16 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote:
> Hi, Gregory,
> 
> Gregory Price <gourry@gourry.net> writes:
> 
> > My observations between these 3 proposals:
> >
> > - The page-lock state is complex while trying to interpose in mark_folio_accessed,
> >   meaning inline promotion inside that interface is a non-starter.
> >
> >   We found one deadlock during task exit due to the PTL being held. 
> >
> >   This worries me more generally, but we did find some success changing certain
> >   calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than
> >   modifying mark_folio_accessed. This ends up changing code in similar places
> >   to your hook - but catches more conditions that mark a page accessed.
> >
> > - For Keith's proposal, promotions via LRU requires memory pressure on the lower
> >   tier to cause a shrink and therefore promotions. I'm not well versed in LRU
> >   semantics, but it seems we could try proactive reclaim here.
> >   
> >   Doing promote-reclaim and demote/swap/evict reclaim on the same triggers
> >   seems counter-intuitive.
> 
> IIUC, in TPP paper (https://arxiv.org/abs/2206.02878), a similar method
> is proposed for page promoting.  I guess that it works together with
> proactive reclaiming.
> 

Each process is responsible for doing page table scanning for numa hint faults
and producing a promotion.  Since the structure used there is the page tables
themselves, there isn't an existing recording mechanism for us to piggy-back on
to defer migrations to later.

> > - Doing promotions inline with access creates overhead.  I've seen some research
> >   suggesting 60us+ per migration - so aggressiveness could harm performance.
> >
> >   Doing it async would alleviate inline access overheads - but it could also make
> >   promotion pointless if time-to-promote is too far from liveliness of the pages.
> 
> Async promotion needs to deal with the resource (CPU/memory) charging
> too.  You do some work for a task, so you need to charge the consumed
> resource for the task.
> 

This is a good point, and would heavily complicate things. Simple is better,
let's avoid that.

> > - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed
> >   by Keith's patch), which will obviously be a very contentious topic.
> 
> Some additional data structure can be used to record pages.
> 

I have an idea inspired by these three sets, I'll bumble my way through a prototype.

> > Reading more into the code surrounding this and other migration logic, I also
> > think we should explore an optimization to mempolicy that tries to aggressively
> > keep certain classes of memory on the local node (RX memory and stack
> > for example).
> >
> > Other areas of reclaim try to actively prevent demoting this type of memory, so we
> > should try not to allocate it there in the first place.
> 
> We have already used DRAM first allocation policy.  So, we need to
> measure its effect firstly.
> 

Yes, but also as the weighted interleave patch set demonstrated, it can be beneficial
to change this to distribute allocations from the outset - however, distributing all
allocations leads to less reliable performance than just distributing the heap.

Another topic for another thread.
~Gregory



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-05  2:00         ` Huang, Ying
  2024-11-05 15:16           ` Gregory Price
@ 2024-11-08 18:00           ` Gregory Price
  2024-11-11  1:35             ` Huang, Ying
  1 sibling, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-11-08 18:00 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote:
> Hi, Gregory,
> >> 
> >> Several years ago, we have tried to use the access time tracking
> >> mechanism of NUMA balancing to track the access time latency of unmapped
> >> file cache folios.  The original implementation is as follows,
> >> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
> >> 
> >> What do you think about this?
> >> 
> >
> > Coming back around to explore this topic a bit more, dug into this old
> > patch and the LRU patch by Keith - I'm struggling to find a good option
> > that doesn't over-complicate or propose something contentious.
> >
> >
> > I did a browse through lore and did not see any discussion on this patch
> > or on Keith's LRU patch, so I presume discussion on this happened largely
> > off-list.  So if you have any context as to why this wasn't RFC'd officially
> > I would like more information.
> 
> Thanks for doing this.  There wasn't much discussion offline.  We just
> don't have enough time to work on the solution.
> 

Exploring and testing this a little further, I brought this up to current
folio work in 6.9 and found this solution to be unstable as-is.

After some work to fix lock/reference issues, Johannes pointed out that
__filemap_get_folio can be called from an atomic context - which means it
may not be safe to do migrations in this context.

We're back to looking at something like an LRU-esque system, but now we're
thinking about isolating the folios in folio_mark_accessed into a task-local
list, and then processing the list on resume.

Basically we're thinking

1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
   the page is a promotion candidate.
2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
   already does this elsewhere, and place it onto current->promo_queue
3) set_notify_resume
4) add logic to resume_user_mode_work() to run through current->promo_queue and
   either promote the pages accordingly, or do folio_putback_lru on failure.
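
A rough userspace model of the four steps above, just to make the control
flow concrete.  This is a sketch under assumptions, not kernel code:
struct sim_folio, struct sim_task, sim_mark_accessed() and sim_resume_work()
are hypothetical names, the "referenced" bool stands in for the
PG_ACTIVE/PG_ACCESSED check, and "fast_free" fakes whether a migration
would succeed.

```c
/* Userspace model only: none of these types or helpers are the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TIER_FAST = 0, TIER_SLOW = 1 };

struct sim_folio {
	int tier;               /* memory tier the folio currently lives on */
	bool referenced;        /* stand-in for the "accessed before" state */
	bool on_lru;            /* false while isolated on a promo queue */
	struct sim_folio *next; /* link used only while queued */
};

struct sim_task {
	struct sim_folio *promo_queue; /* models current->promo_queue */
	bool notify_resume;            /* models the effect of set_notify_resume */
};

/* Steps 1-3: runs on every pagecache access and never migrates. */
static void sim_mark_accessed(struct sim_task *t, struct sim_folio *f)
{
	if (!f->referenced) {
		/* First access: remember it, not yet a candidate. */
		f->referenced = true;
		return;
	}
	if (f->tier != TIER_SLOW || !f->on_lru)
		return; /* already on the fast tier, or already queued */

	f->on_lru = false;        /* step 2: isolate from the LRU */
	f->next = t->promo_queue;
	t->promo_queue = f;
	t->notify_resume = true;  /* step 3: ask for work on resume */
}

/* Step 4: runs in a sleepable context; returns the number promoted. */
static int sim_resume_work(struct sim_task *t, int fast_free)
{
	int promoted = 0;

	while (t->promo_queue) {
		struct sim_folio *f = t->promo_queue;

		assert(!f->on_lru); /* queued folios must have been isolated */
		t->promo_queue = f->next;
		f->next = NULL;
		if (fast_free > 0) {
			/* Pretend the migration succeeded. */
			f->tier = TIER_FAST;
			fast_free--;
			promoted++;
		}
		f->on_lru = true; /* put back on the LRU in either case */
	}
	t->notify_resume = false;
	return promoted;
}
```

The point of the split is that sim_mark_accessed() only touches flags and a
list, so it is fine in an atomic context, while everything that could sleep
is deferred to sim_resume_work().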

Going to RFC this up

~Gregory


* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-08 18:00           ` Gregory Price
@ 2024-11-11  1:35             ` Huang, Ying
  2024-11-11 14:25               ` Gregory Price
  0 siblings, 1 reply; 14+ messages in thread
From: Huang, Ying @ 2024-11-11  1:35 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

Gregory Price <gourry@gourry.net> writes:

> On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote:
>> Hi, Gregory,
>> >> 
>> >> Several years ago, we have tried to use the access time tracking
>> >> mechanism of NUMA balancing to track the access time latency of unmapped
>> >> file cache folios.  The original implementation is as follows,
>> >> 
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
>> >> 
>> >> What do you think about this?
>> >> 
>> >
>> > Coming back around to explore this topic a bit more, dug into this old
>> > patch and the LRU patch by Keith - I'm struggling find a good option
>> > that doesn't over-complicate or propose something contentious.
>> >
>> >
>> > I did a browse through lore and did not see any discussion on this patch
>> > or on Keith's LRU patch, so i presume discussion on this happened largely
>> > off-list.  So if you have any context as to why this wasn't RFC'd officially
>> > I would like more information.
>> 
>> Thanks for doing this.  There's no much discussion offline.  We just
>> don't have enough time to work on the solution.
>> 
>
> Exploring and testing this a little further, I brought this up to current
> folio work in 6.9 and found this solution to be unstable as-is.
>
> After some work to fix lock/reference issues, Johannes pointed out that
> __filemap_get_folio can be called from an atomic context - which means it
> may not be safe to do migrations in this context.

Sorry, I don't understand this.  The above patch changes
filemap_get_pages() and grab_cache_page_write_begin(), not
__filemap_get_folio().

> We're back to looking at something like an LRU-esque system, but now we're
> thinking about isolating the folios in folio_mark_accessed into a task-local
> list, and then process the list on resume.

If necessary, we can use a similar method for the above solution too.  We
can also first filter out folios that have been accessed only once, using
folio_mark_accessed().  That is,

- record the folio access time in folio_mark_accessed() only
- when the folio is accessed again and "access_time - record_time <
  threshold", promote the folio.
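
A minimal userspace sketch of the two-access filter described above.  It is
only an illustration under assumptions: struct sim_folio_time and
sim_should_promote() are hypothetical names, time is an abstract
monotonic counter, and refreshing the recorded time on every access is
one possible choice rather than something the old patch is known to do.

```c
/* Userspace model only: the struct and helper below are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sim_folio_time {
	bool recorded;        /* has any access been recorded yet? */
	uint64_t record_time; /* time of the most recent recorded access */
};

/*
 * Called on each access (the folio_mark_accessed() hook in the proposal).
 * Returns true when this access follows the previous one closely enough
 * that the folio should be treated as a promotion candidate.
 */
static bool sim_should_promote(struct sim_folio_time *f, uint64_t now,
			       uint64_t threshold)
{
	bool had_record = f->recorded;
	uint64_t prev = f->record_time;

	/* Always remember the latest access for the next comparison. */
	f->recorded = true;
	f->record_time = now;

	if (!had_record)
		return false; /* accessed once: filtered out */

	assert(now >= prev); /* the model assumes a monotonic clock */
	return now - prev < threshold;
}
```

With a threshold of 50, accesses at times 100 and 120 make the second access
a candidate, while a later access at time 500 does not.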

> Basically we're thinking
>
> 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
>    the page is a promotion candidate.
> 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
>    already does this elsewhere, and place it onto current->promo_queue
> 3) set_notify_resume
> 4) add logic to resume_user_mode_work() to run through current->promo_queue and
>    either promote the pages accordingly, or do folio_putback_lru on failure.

Use a task_work?

> Going to RFC this up

--
Best Regards,
Huang, Ying



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-11  1:35             ` Huang, Ying
@ 2024-11-11 14:25               ` Gregory Price
  2024-11-12  0:33                 ` Huang, Ying
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-11-11 14:25 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

On Mon, Nov 11, 2024 at 09:35:09AM +0800, Huang, Ying wrote:
> Gregory Price <gourry@gourry.net> writes:
> 
> >
> > Exploring and testing this a little further, I brought this up to current
> > folio work in 6.9 and found this solution to be unstable as-is.
> >
> > After some work to fix lock/reference issues, Johannes pointed out that
> > __filemap_get_folio can be called from an atomic context - which means it
> > may not be safe to do migrations in this context.
> 
> Sorry, I don't understand this, the above patch changes
> filemap_get_pages() and grab_cache_page_write_begin() instead of
> __filemap_get_folio().
>

On newer kernels, grab_cache_page_write_begin is a compat wrapper for
__filemap_get_folio and folio_file_page.  This chunk of code has actually
changed quite significantly.
 
> > We're back to looking at something like an LRU-esque system, but now we're
> > thinking about isolating the folios in folio_mark_accessed into a task-local
> > list, and then process the list on resume.
> 
> If necessary, we can use a similar method for above solution too.  And
> we can filter accessed once folios with folio_mark_accessed() firstly.
> That is, only promote a page if,
> 
> - record the folio access time in folio_mark_accessed() only
> - when the folio are accessed again, and "access_time - record_time <
>   threshold", promote the folio.
> 

Yes, this was the thought.

> > Basically we're thinking
> >
> > 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
> >    the page is a promotion candidate.
> > 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
> >    already does this elsewhere, and place it onto current->promo_queue
> > 3) set_notify_resume
> > 4) add logic to resume_user_mode_work() to run through current->promo_queue and
> >    either promote the pages accordingly, or do folio_putback_lru on failure.
> 
> Use a task_work?
> 

Probably more correct.  We had a discussion about kernel threads accessing
the file cache and weren't sure whether that situation even exists, so I'm
probably going to try task_work first.

~Gregory



* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-11 14:25               ` Gregory Price
@ 2024-11-12  0:33                 ` Huang, Ying
  0 siblings, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2024-11-12  0:33 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar,
	abhishekd, Johannes Weiner, Feng Tang

Gregory Price <gourry@gourry.net> writes:

> On Mon, Nov 11, 2024 at 09:35:09AM +0800, Huang, Ying wrote:
>> Gregory Price <gourry@gourry.net> writes:
>> 
>> >
>> > Exploring and testing this a little further, I brought this up to current
>> > folio work in 6.9 and found this solution to be unstable as-is.
>> >
>> > After some work to fix lock/reference issues, Johannes pointed out that
>> > __filemap_get_folio can be called from an atomic context - which means it
>> > may not be safe to do migrations in this context.
>> 
>> Sorry, I don't understand this, the above patch changes
>> filemap_get_pages() and grab_cache_page_write_begin() instead of
>> __filemap_get_folio().
>>
>
> on newer kernels, grab_cache_page_write_begin is a compat wrapper for
> __filemap_get_folio and folio_file_page.  This chunk of code has changed
> somewhat significantly, actually.
>  
>> > We're back to looking at something like an LRU-esque system, but now we're
>> > thinking about isolating the folios in folio_mark_accessed into a task-local
>> > list, and then process the list on resume.
>> 
>> If necessary, we can use a similar method for above solution too.  And
>> we can filter accessed once folios with folio_mark_accessed() firstly.
>> That is, only promote a page if,
>> 
>> - record the folio access time in folio_mark_accessed() only
>> - when the folio are accessed again, and "access_time - record_time <
>>   threshold", promote the folio.
>> 
>
> yes this was the thought.
>
>> > Basically we're thinking
>> >
>> > 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
>> >    the page is a promotion candidate.
>> > 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
>> >    already does this elsewhere, and place it onto current->promo_queue
>> > 3) set_notify_resume
>> > 4) add logic to resume_user_mode_work() to run through current->promo_queue and
>> >    either promote the pages accordingly, or do folio_putback_lru on failure.
>> 
>> Use a task_work?
>> 
>
> probably more correct, had a discussion about kernel threads accessing
> file cache and we weren't sure if that situation even existed - so probably

We can ignore kthreads when collecting promotion candidate folios.

> going to try task_work first.

--
Best Regards,
Huang, Ying



end of thread, other threads:[~2024-11-12  0:36 UTC | newest]

Thread overview: 14+ messages
     [not found] <20240803094715.23900-1-gourry@gourry.net>
2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton
2024-08-13 15:04   ` Gregory Price
2024-08-14 16:09     ` Gregory Price
2024-08-19  7:46 ` Huang, Ying
2024-08-19 15:15   ` Gregory Price
2024-09-02  6:53     ` Huang, Ying
2024-09-03 13:36       ` Gregory Price
2024-11-04 18:12       ` Gregory Price
2024-11-05  2:00         ` Huang, Ying
2024-11-05 15:16           ` Gregory Price
2024-11-08 18:00           ` Gregory Price
2024-11-11  1:35             ` Huang, Ying
2024-11-11 14:25               ` Gregory Price
2024-11-12  0:33                 ` Huang, Ying
