* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache [not found] <20240803094715.23900-1-gourry@gourry.net> @ 2024-08-08 23:20 ` Andrew Morton 2024-08-13 15:04 ` Gregory Price 2024-08-19 7:46 ` Huang, Ying 1 sibling, 1 reply; 14+ messages in thread From: Andrew Morton @ 2024-08-08 23:20 UTC (permalink / raw) To: Gregory Price Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar, abhishekd On Sat, 3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote: > Unmapped pagecache pages can be demoted to low-tier memory, but > they can only be promoted if a process maps the pages into the > memory space (so that NUMA hint faults can be caught). This can > cause significant performance degradation as the pagecache ages > and unmapped, cached files are accessed. It would be helpful to share some testing results so the magnitude of this degradation can be understood. What is the potential downside to this change? The local node now gets stuffed full of pagecache and other things get evicted? > This patch series enables the pagecache to request a promotion of > a folio when it is accessed via the pagecache. > > We add a new `numa_hint_page_cache` counter in vmstat to capture > information on when these migrations occur. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton @ 2024-08-13 15:04 ` Gregory Price 2024-08-14 16:09 ` Gregory Price 0 siblings, 1 reply; 14+ messages in thread From: Gregory Price @ 2024-08-13 15:04 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar, abhishekd On Thu, Aug 08, 2024 at 04:20:11PM -0700, Andrew Morton wrote: > On Sat, 3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote: > > > Unmapped pagecache pages can be demoted to low-tier memory, but > > they can only be promoted if a process maps the pages into the > > memory space (so that NUMA hint faults can be caught). This can > > cause significant performance degradation as the pagecache ages > > and unmapped, cached files are accessed. > > It would be helpful to share some testing results so the magnitude of > this degradation can be understood. Apologies, this should have been an RFC - testing results forthcoming. > > What is the potential downside to this change? The local node now gets > stuffed full of pagecache and other things get evicted? > That is one possible degenerate case if there exists a large amount of free memory in the local node. We're testing it now against TPP demotion logic, but the expectation should be that if the local node is already pressured the pagecache would be trapped on CXL until TPP frees up local node pages. > > This patch series enables the pagecache to request a promotion of > > a folio when it is accessed via the pagecache. > > > > We add a new `numa_hint_page_cache` counter in vmstat to capture > > information on when these migrations occur. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-08-13 15:04 ` Gregory Price @ 2024-08-14 16:09 ` Gregory Price 0 siblings, 0 replies; 14+ messages in thread From: Gregory Price @ 2024-08-14 16:09 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, david, ying.huang, nphamcs, nehagholkar, abhishekd On Tue, Aug 13, 2024 at 11:04:59AM -0400, Gregory Price wrote: > On Thu, Aug 08, 2024 at 04:20:11PM -0700, Andrew Morton wrote: > > On Sat, 3 Aug 2024 05:47:12 -0400 Gregory Price <gourry@gourry.net> wrote: > > > > > Unmapped pagecache pages can be demoted to low-tier memory, but > > > they can only be promoted if a process maps the pages into the > > > memory space (so that NUMA hint faults can be caught). This can > > > cause significant performance degradation as the pagecache ages > > > and unmapped, cached files are accessed. > > > > It would be helpful to share some testing results so the magnitude of > > this degradation can be understood. > > Apologies, this should have been an RFC - testing results forthcoming. > > > > > What is the potential downside to this change? The local node now gets > > stuffed full of pagecache and other things get evicted? > > > > That is one possible degenerate case if there exists a large amount of > free memory in the local node. We're testing it now against TPP demotion > logic, but the expectation should be that if the local node is already > pressured the pagecache would be trapped on CXL until TPP frees up local > node pages. > > > > This patch series enables the pagecache to request a promotion of > > > a folio when it is accessed via the pagecache. > > > > > > We add a new `numa_hint_page_cache` counter in vmstat to capture > > > information on when these migrations occur. Worth noting for interested parties: This patch is not stable. After some extended testing, we found some soft lockups. So please disregard until v2+. ~Gregory ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache [not found] <20240803094715.23900-1-gourry@gourry.net> 2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton @ 2024-08-19 7:46 ` Huang, Ying 2024-08-19 15:15 ` Gregory Price 1 sibling, 1 reply; 14+ messages in thread From: Huang, Ying @ 2024-08-19 7:46 UTC (permalink / raw) To: Gregory Price Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner Gregory Price <gourry@gourry.net> writes: > Unmapped pagecache pages can be demoted to low-tier memory, but > they can only be promoted if a process maps the pages into the > memory space (so that NUMA hint faults can be caught). This can > cause significant performance degradation as the pagecache ages > and unmapped, cached files are accessed. > > This patch series enables the pagecache to request a promotion of > a folio when it is accessed via the pagecache. > > We add a new `numa_hint_page_cache` counter in vmstat to capture > information on when these migrations occur. It appears that you will promote page cache page on the second access. Do you have some better way to identify hot pages from the not-so-hot pages? How to balance between unmapped and mapped pages? We have hot page selection for hot pages. [snip] -- Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-08-19 7:46 ` Huang, Ying @ 2024-08-19 15:15 ` Gregory Price 2024-09-02 6:53 ` Huang, Ying 0 siblings, 1 reply; 14+ messages in thread From: Gregory Price @ 2024-08-19 15:15 UTC (permalink / raw) To: Huang, Ying Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote: > Gregory Price <gourry@gourry.net> writes: > > > Unmapped pagecache pages can be demoted to low-tier memory, but > > they can only be promoted if a process maps the pages into the > > memory space (so that NUMA hint faults can be caught). This can > > cause significant performance degradation as the pagecache ages > > and unmapped, cached files are accessed. > > > > This patch series enables the pagecache to request a promotion of > > a folio when it is accessed via the pagecache. > > > > We add a new `numa_hint_page_cache` counter in vmstat to capture > > information on when these migrations occur. > > It appears that you will promote page cache page on the second access. > Do you have some better way to identify hot pages from the not-so-hot > pages? How to balance between unmapped and mapped pages? We have hot > page selection for hot pages. > > [snip] > I've since explored moving this down under a (referenced && active) check. This would be more like promotion on third access within an LRU shrink round (the LRU should, in theory, hack off the active bits on some decent time interval when the system is pressured). Barring adding new counters to folios to track hits, I don't see a clear and obvious way to track hotness. The primary observation here is that pagecache is un-mapped, and so cannot use numa-fault hints. This is more complicated with MGLRU, but I'm saving that for after I figure out the plan for plain old LRU. ~Gregory ^ permalink raw reply [flat|nested] 14+ messages in thread
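For readers following the "(referenced && active) check" and "promotion on third access" remark above, here is a small userspace model of why the check first passes on the third access. It is only a sketch: `struct folio_model`, `model_mark_accessed()`, `model_is_promotion_candidate()` and `accesses_until_candidate()` are hypothetical names, not kernel symbols, and the transitions assume the classic (non-MGLRU) referenced/active state machine documented above folio_mark_accessed() in mm/swap.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two LRU bits of a folio. */
struct folio_model {
	bool referenced;	/* models PG_referenced */
	bool active;		/* models PG_active */
};

/*
 * Assumed classic-LRU transitions on each access:
 *   inactive,unreferenced -> inactive,referenced
 *   inactive,referenced   -> active,unreferenced
 *   active,unreferenced   -> active,referenced
 */
static void model_mark_accessed(struct folio_model *f)
{
	if (!f->referenced) {
		f->referenced = true;
	} else if (!f->active) {
		f->active = true;
		f->referenced = false;
	}
	/* active && referenced: nothing left to record */
}

/* The (referenced && active) check discussed in the thread. */
static bool model_is_promotion_candidate(const struct folio_model *f)
{
	return f->referenced && f->active;
}

/* Count accesses until a fresh folio first passes the check. */
static int accesses_until_candidate(void)
{
	struct folio_model f = { false, false };
	int n = 0;

	while (!model_is_promotion_candidate(&f)) {
		model_mark_accessed(&f);
		n++;
	}
	return n;
}
```

In this model a folio that starts inactive and unreferenced needs three accesses before the check passes. Reclaim clearing the bits between accesses (the "LRU shrink round") is not modelled and would reset the count.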
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-08-19 15:15 ` Gregory Price @ 2024-09-02 6:53 ` Huang, Ying 2024-09-03 13:36 ` Gregory Price 2024-11-04 18:12 ` Gregory Price 0 siblings, 2 replies; 14+ messages in thread From: Huang, Ying @ 2024-09-02 6:53 UTC (permalink / raw) To: Gregory Price Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang Gregory Price <gourry@gourry.net> writes: > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote: >> Gregory Price <gourry@gourry.net> writes: >> >> > Unmapped pagecache pages can be demoted to low-tier memory, but >> > they can only be promoted if a process maps the pages into the >> > memory space (so that NUMA hint faults can be caught). This can >> > cause significant performance degradation as the pagecache ages >> > and unmapped, cached files are accessed. >> > >> > This patch series enables the pagecache to request a promotion of >> > a folio when it is accessed via the pagecache. >> > >> > We add a new `numa_hint_page_cache` counter in vmstat to capture >> > information on when these migrations occur. >> >> It appears that you will promote page cache page on the second access. >> Do you have some better way to identify hot pages from the not-so-hot >> pages? How to balance between unmapped and mapped pages? We have hot >> page selection for hot pages. >> >> [snip] >> > > I've since explored moving this down under a (referenced && active) check. > > This would be more like promotion on third access within an LRU shrink > round (the LRU should, in theory, hack off the active bits on some decent > time interval when the system is pressured). > > Barring adding new counters to folios to track hits, I don't see a clear > and obvious way way to track hotness. The primary observation here is > that pagecache is un-mapped, and so cannot use numa-fault hints. 
> > This is more complicated with MGLRU, but I'm saving that for after I > figure out the plan for plain old LRU. Several years ago, we have tried to use the access time tracking mechanism of NUMA balancing to track the access time latency of unmapped file cache folios. The original implementation is as follows, https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329 What do you think about this? -- Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-09-02 6:53 ` Huang, Ying @ 2024-09-03 13:36 ` Gregory Price 2024-11-04 18:12 ` Gregory Price 1 sibling, 0 replies; 14+ messages in thread From: Gregory Price @ 2024-09-03 13:36 UTC (permalink / raw) To: Huang, Ying Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote: > Gregory Price <gourry@gourry.net> writes: > > > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote: > >> Gregory Price <gourry@gourry.net> writes: > >> > >> > Unmapped pagecache pages can be demoted to low-tier memory, but > >> > they can only be promoted if a process maps the pages into the > >> > memory space (so that NUMA hint faults can be caught). This can > >> > cause significant performance degradation as the pagecache ages > >> > and unmapped, cached files are accessed. > >> > > >> > This patch series enables the pagecache to request a promotion of > >> > a folio when it is accessed via the pagecache. > >> > > >> > We add a new `numa_hint_page_cache` counter in vmstat to capture > >> > information on when these migrations occur. > >> > >> It appears that you will promote page cache page on the second access. > >> Do you have some better way to identify hot pages from the not-so-hot > >> pages? How to balance between unmapped and mapped pages? We have hot > >> page selection for hot pages. > >> > >> [snip] > >> > > > > I've since explored moving this down under a (referenced && active) check. > > > > This would be more like promotion on third access within an LRU shrink > > round (the LRU should, in theory, hack off the active bits on some decent > > time interval when the system is pressured). > > > > Barring adding new counters to folios to track hits, I don't see a clear > > and obvious way way to track hotness. 
The primary observation here is > > that pagecache is un-mapped, and so cannot use numa-fault hints. > > > > This is more complicated with MGLRU, but I'm saving that for after I > > figure out the plan for plain old LRU. > > Several years ago, we have tried to use the access time tracking > mechanism of NUMA balancing to track the access time latency of unmapped > file cache folios. The original implementation is as follows, > > https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329 > > What do you think about this? > Also seems like an interesting option. I've been looking at another old proposal to simply add a new LRU that was implemented by kbusch a few years back. https://git.kernel.org/pub/scm/linux/kernel/git/kbusch/linux.git/commit/?h=lru-promote&id=6616afe9a722f6ebedbb27ade3848cf07b9a3af7 I may spend a little time to add a few different methods in with a switch I can flip to test them side by side / with each other and see what results we can get. > -- > Best Regards, > Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-09-02 6:53 ` Huang, Ying 2024-09-03 13:36 ` Gregory Price @ 2024-11-04 18:12 ` Gregory Price 2024-11-05 2:00 ` Huang, Ying 1 sibling, 1 reply; 14+ messages in thread From: Gregory Price @ 2024-11-04 18:12 UTC (permalink / raw) To: Huang, Ying Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote: > Gregory Price <gourry@gourry.net> writes: > > > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote: > >> Gregory Price <gourry@gourry.net> writes: > >> > >> > Unmapped pagecache pages can be demoted to low-tier memory, but > >> > they can only be promoted if a process maps the pages into the > >> > memory space (so that NUMA hint faults can be caught). This can > >> > cause significant performance degradation as the pagecache ages > >> > and unmapped, cached files are accessed. > >> > > >> > This patch series enables the pagecache to request a promotion of > >> > a folio when it is accessed via the pagecache. > >> > > >> > We add a new `numa_hint_page_cache` counter in vmstat to capture > >> > information on when these migrations occur. > >> > >> It appears that you will promote page cache page on the second access. > >> Do you have some better way to identify hot pages from the not-so-hot > >> pages? How to balance between unmapped and mapped pages? We have hot > >> page selection for hot pages. > >> > >> [snip] > >> > > > > I've since explored moving this down under a (referenced && active) check. > > > > This would be more like promotion on third access within an LRU shrink > > round (the LRU should, in theory, hack off the active bits on some decent > > time interval when the system is pressured). > > > > Barring adding new counters to folios to track hits, I don't see a clear > > and obvious way way to track hotness. 
The primary observation here is > > that pagecache is un-mapped, and so cannot use numa-fault hints. > > > > This is more complicated with MGLRU, but I'm saving that for after I > > figure out the plan for plain old LRU. > > Several years ago, we have tried to use the access time tracking > mechanism of NUMA balancing to track the access time latency of unmapped > file cache folios. The original implementation is as follows, > > https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329 > > What do you think about this? > Coming back around to explore this topic a bit more, I dug into this old patch and the LRU patch by Keith - I'm struggling to find a good option that doesn't over-complicate or propose something contentious. I did a browse through lore and did not see any discussion on this patch or on Keith's LRU patch, so I presume discussion on this happened largely off-list. So if you have any context as to why this wasn't RFC'd officially I would like more information. My observations between these 3 proposals: - The page-lock state is complex while trying to interpose in mark_folio_accessed, meaning inline promotion inside that interface is a non-starter. We found one deadlock during task exit due to the PTL being held. This worries me more generally, but we did find some success changing certain calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than modifying mark_folio_accessed. This ends up changing code in similar places to your hook - but catches more conditions that mark a page accessed. - For Keith's proposal, promotions via LRU require memory pressure on the lower tier to cause a shrink and therefore promotions. I'm not well versed in LRU semantics, but it seems we could try proactive reclaim here. Doing promote-reclaim and demote/swap/evict reclaim on the same triggers seems counter-intuitive. - Doing promotions inline with access creates overhead.
I've seen some research suggesting 60us+ per migration - so aggressiveness could harm performance. Doing it async would alleviate inline access overheads - but it could also make promotion pointless if time-to-promote is too far from liveliness of the pages. - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed by Keith's patch), which will obviously be a very contentious topic. tl;dr: I'm leaning towards a solution like you have here, but we may need to make a sysfs switch similar to demotion_enabled in case of poor performance due to heuristically degenerate access patterns, and we may need to expose some form of adjustable aggressiveness value to make it tunable. Reading more into the code surrounding this and other migration logic, I also think we should explore an optimization to mempolicy that tries to aggressively keep certain classes of memory on the local node (RX memory and stack for example). Other areas of reclaim try to actively prevent demoting this type of memory, so we should try not to allocate it there in the first place. ~Gregory > -- > Best Regards, > Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-11-04 18:12 ` Gregory Price @ 2024-11-05 2:00 ` Huang, Ying 2024-11-05 15:16 ` Gregory Price 2024-11-08 18:00 ` Gregory Price 0 siblings, 2 replies; 14+ messages in thread From: Huang, Ying @ 2024-11-05 2:00 UTC (permalink / raw) To: Gregory Price Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang Hi, Gregory, Gregory Price <gourry@gourry.net> writes: > On Mon, Sep 02, 2024 at 02:53:26PM +0800, Huang, Ying wrote: >> Gregory Price <gourry@gourry.net> writes: >> >> > On Mon, Aug 19, 2024 at 03:46:00PM +0800, Huang, Ying wrote: >> >> Gregory Price <gourry@gourry.net> writes: >> >> >> >> > Unmapped pagecache pages can be demoted to low-tier memory, but >> >> > they can only be promoted if a process maps the pages into the >> >> > memory space (so that NUMA hint faults can be caught). This can >> >> > cause significant performance degradation as the pagecache ages >> >> > and unmapped, cached files are accessed. >> >> > >> >> > This patch series enables the pagecache to request a promotion of >> >> > a folio when it is accessed via the pagecache. >> >> > >> >> > We add a new `numa_hint_page_cache` counter in vmstat to capture >> >> > information on when these migrations occur. >> >> >> >> It appears that you will promote page cache page on the second access. >> >> Do you have some better way to identify hot pages from the not-so-hot >> >> pages? How to balance between unmapped and mapped pages? We have hot >> >> page selection for hot pages. >> >> >> >> [snip] >> >> >> > >> > I've since explored moving this down under a (referenced && active) check. >> > >> > This would be more like promotion on third access within an LRU shrink >> > round (the LRU should, in theory, hack off the active bits on some decent >> > time interval when the system is pressured). 
>> > >> > Barring adding new counters to folios to track hits, I don't see a clear >> > and obvious way way to track hotness. The primary observation here is >> > that pagecache is un-mapped, and so cannot use numa-fault hints. >> > >> > This is more complicated with MGLRU, but I'm saving that for after I >> > figure out the plan for plain old LRU. >> >> Several years ago, we have tried to use the access time tracking >> mechanism of NUMA balancing to track the access time latency of unmapped >> file cache folios. The original implementation is as follows, >> >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329 >> >> What do you think about this? >> > > Coming back around to explore this topic a bit more, dug into this old > patch and the LRU patch by Keith - I'm struggling find a good option > that doesn't over-complicate or propose something contentious. > > > I did a browse through lore and did not see any discussion on this patch > or on Keith's LRU patch, so i presume discussion on this happened largely > off-list. So if you have any context as to why this wasn't RFC'd officially > I would like more information. Thanks for doing this. There's no much discussion offline. We just don't have enough time to work on the solution. > My observations between these 3 proposals: > > - The page-lock state is complex while trying interpose in mark_folio_accessed, > meaning inline promotion inside that interface is a non-starter. > > We found one deadlock during task exit due to the PTL being held. > > This worries me more generally, but we did find some success changing certain > calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than > modifying mark_folio_accessed. This ends up changing code in similar places > to your hook - but catches a more conditions that mark a page accessed. 
> > - For Keith's proposal, promotions via LRU requires memory pressure on the lower > tier to cause a shrink and therefore promotions. I'm not well versed in LRU > LRU sematics, but it seems we could try proactive reclaim here. > > Doing promote-reclaim and demote/swap/evict reclaim on the same triggers > seems counter-intuitive. IIUC, in TPP paper (https://arxiv.org/abs/2206.02878), a similar method is proposed for page promoting. I guess that it works together with proactive reclaiming. > - Doing promotions inline with access creates overhead. I've seen some research > suggesting 60us+ per migration - so aggressiveness could harm performance. > > Doing it async would alleviate inline access overheads - but it could also make > promotion pointless if time-to-promote is to far from liveliness of the pages. Async promotion needs to deal with the resource (CPU/memory) charging too. You do some work for a task, so you need to charge the consumed resource for the task. > - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed > by Keith's patch), which will obviously be a very contentious topic. Some additional data structure can be used to record pages. > tl;dr: I'm learning towards a solution like you have here, but we may need to > make a sysfs switch similar to demotion_enabled in case of poor performance due > to heuristically degenerate access patterns, and we may need to expose some > form of adjustable aggressiveness value to make it tunable. Yes. We may need that, because the performance benefit may be lower than the overhead introduced. > Reading more into the code surrounding this and other migration logic, I also > think we should explore an optimization to mempolicy that tries to aggressively > keep certain classes of memory on the local node (RX memory and stack > for example). > > Other areas of reclaim try to actively prevent demoting this type of memory, so we > should try not to allocate it there in the first place. 
We already use a DRAM-first allocation policy, so we need to measure its effect first. -- Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-11-05 2:00 ` Huang, Ying @ 2024-11-05 15:16 ` Gregory Price 2024-11-08 18:00 ` Gregory Price 1 sibling, 0 replies; 14+ messages in thread From: Gregory Price @ 2024-11-05 15:16 UTC (permalink / raw) To: Huang, Ying Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote: > Hi, Gregory, > > Gregory Price <gourry@gourry.net> writes: > > > My observations between these 3 proposals: > > > > - The page-lock state is complex while trying interpose in mark_folio_accessed, > > meaning inline promotion inside that interface is a non-starter. > > > > We found one deadlock during task exit due to the PTL being held. > > > > This worries me more generally, but we did find some success changing certain > > calls to mark_folio_accessed to mark_folio_accessed_and_promote - rather than > > modifying mark_folio_accessed. This ends up changing code in similar places > > to your hook - but catches a more conditions that mark a page accessed. > > > > - For Keith's proposal, promotions via LRU requires memory pressure on the lower > > tier to cause a shrink and therefore promotions. I'm not well versed in LRU > > LRU sematics, but it seems we could try proactive reclaim here. > > > > Doing promote-reclaim and demote/swap/evict reclaim on the same triggers > > seems counter-intuitive. > > IIUC, in TPP paper (https://arxiv.org/abs/2206.02878), a similar method > is proposed for page promoting. I guess that it works together with > proactive reclaiming. > Each process is responsible for doing page table scanning for numa hint faults and producing a promotion. Since the structure used there is the page tables themselves, there isn't an existing recording mechanism for us to piggy-back on to defer migrations to later. > > - Doing promotions inline with access creates overhead. 
I've seen some research > > suggesting 60us+ per migration - so aggressiveness could harm performance. > > > > Doing it async would alleviate inline access overheads - but it could also make > > promotion pointless if time-to-promote is to far from liveliness of the pages. > > Async promotion needs to deal with the resource (CPU/memory) charging > too. You do some work for a task, so you need to charge the consumed > resource for the task. > This is a good point, and would heavily complicate things. Simple is better, let's avoid that. > > - Doing async-promotion may also require something like PG_PROMOTABLE (as proposed > > by Keith's patch), which will obviously be a very contentious topic. > > Some additional data structure can be used to record pages. > I have an idea inspired by these three sets, i'll bumble my way through a prototype. > > Reading more into the code surrounding this and other migration logic, I also > > think we should explore an optimization to mempolicy that tries to aggressively > > keep certain classes of memory on the local node (RX memory and stack > > for example). > > > > Other areas of reclaim try to actively prevent demoting this type of memory, so we > > should try not to allocate it there in the first place. > > We have already used DRAM first allocation policy. So, we need to > measure its effect firstly. > Yes, but also as the weighted interleave patch set demonstrated, it can be beneficial to change this to distribute allocations from the outset - however, distributing all allocations lead to less reliable performance than just distributing the heap. Another topic for another thread. ~Gregory ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache 2024-11-05 2:00 ` Huang, Ying 2024-11-05 15:16 ` Gregory Price @ 2024-11-08 18:00 ` Gregory Price 2024-11-11 1:35 ` Huang, Ying 1 sibling, 1 reply; 14+ messages in thread From: Gregory Price @ 2024-11-08 18:00 UTC (permalink / raw) To: Huang, Ying Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd, Johannes Weiner, Feng Tang On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote: > Hi, Gregory, > >> > >> Several years ago, we have tried to use the access time tracking > >> mechanism of NUMA balancing to track the access time latency of unmapped > >> file cache folios. The original implementation is as follows, > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329 > >> > >> What do you think about this? > >> > > > > Coming back around to explore this topic a bit more, dug into this old > > patch and the LRU patch by Keith - I'm struggling find a good option > > that doesn't over-complicate or propose something contentious. > > > > > > I did a browse through lore and did not see any discussion on this patch > > or on Keith's LRU patch, so i presume discussion on this happened largely > > off-list. So if you have any context as to why this wasn't RFC'd officially > > I would like more information. > > Thanks for doing this. There's no much discussion offline. We just > don't have enough time to work on the solution. > Exploring and testing this a little further, I brought this up to current folio work in 6.9 and found this solution to be unstable as-is. After some work to fix lock/reference issues, Johannes pointed out that __filemap_get_folio can be called from an atomic context - which means it may not be safe to do migrations in this context. 
We're back to looking at something like an LRU-esque system, but now we're thinking about isolating the folios in folio_mark_accessed into a task-local list, and then process the list on resume. Basically we're thinking 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether the page is a promotion candidate. 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed already does this elsewhere, and place it onto current->promo_queue 3) set_notify_resume 4) add logic to resume_user_mode_work() to run through current->promo_queue and either promote the pages accordingly, or do folio_putback_lru on failure. Going to RFC this up ~Gregory ^ permalink raw reply [flat|nested] 14+ messages in thread
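The four steps above can be sketched as a userspace model. This is not the proposed patch: `struct folio_model`, `struct task_model`, `model_mark_accessed()`, `model_resume_work()` and the demo helper are hypothetical names, the "top tier has N free slots" rule is an assumed stand-in for a migration that can fail, and the candidate test uses the referenced/active bits as a guess at what "PG_ACTIVE/PG_ACCESSED" refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; not a kernel structure. */
struct folio_model {
	bool referenced;		/* models PG_referenced */
	bool active;			/* models PG_active */
	bool on_lru;			/* false while isolated on a promo queue */
	int tier;			/* 0 = top tier, 1 = lower tier */
	struct folio_model *next;	/* link in the task-local queue */
};

/* Hypothetical stand-in for the per-task state (current->promo_queue). */
struct task_model {
	struct folio_model *promo_queue;
	bool notify_resume;
};

/* Steps 1-3: candidate check, isolate from the LRU, queue, flag resume work. */
static void model_mark_accessed(struct task_model *t, struct folio_model *f)
{
	if (f->tier == 0 || !f->on_lru)
		return;		/* already on top tier, or already isolated */
	if (!(f->referenced && f->active))
		return;		/* not hot enough to be a candidate */

	f->on_lru = false;
	f->next = t->promo_queue;
	t->promo_queue = f;
	t->notify_resume = true;
}

/*
 * Step 4: drain the queue on return to user mode. Promotion succeeds only
 * while the top tier has free capacity; every folio goes back on an LRU
 * either way. Returns the number of folios promoted.
 */
static int model_resume_work(struct task_model *t, int *top_tier_free)
{
	struct folio_model *f = t->promo_queue;
	int promoted = 0;

	t->promo_queue = NULL;
	t->notify_resume = false;

	while (f) {
		struct folio_model *next = f->next;

		f->next = NULL;
		if (*top_tier_free > 0) {
			(*top_tier_free)--;
			f->tier = 0;
			promoted++;
		}
		f->on_lru = true;	/* models putback after success or failure */
		f = next;
	}
	return promoted;
}

/* Queue three hot lower-tier folios while the top tier has room for two. */
static int demo_three_candidates_room_for_two(void)
{
	struct folio_model folios[3];
	struct task_model task = { NULL, false };
	int top_tier_free = 2;
	int i, promoted;

	for (i = 0; i < 3; i++) {
		folios[i] = (struct folio_model){ true, true, true, 1, NULL };
		model_mark_accessed(&task, &folios[i]);
	}
	assert(task.notify_resume);

	promoted = model_resume_work(&task, &top_tier_free);

	assert(task.promo_queue == NULL);
	for (i = 0; i < 3; i++)
		assert(folios[i].on_lru);
	return promoted;
}
```

The point of the split is that the access path only does cheap list manipulation, while the potentially sleeping migration work runs later in task context. Locking, reference counting and charging are deliberately left out of the model.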
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-08 18:00 ` Gregory Price
@ 2024-11-11  1:35 ` Huang, Ying
  2024-11-11 14:25 ` Gregory Price
  0 siblings, 1 reply; 14+ messages in thread
From: Huang, Ying @ 2024-11-11 1:35 UTC
To: Gregory Price
Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd,
    Johannes Weiner, Feng Tang

Gregory Price <gourry@gourry.net> writes:

> On Tue, Nov 05, 2024 at 10:00:59AM +0800, Huang, Ying wrote:
>> Hi, Gregory,
>>
>>
>> >> Several years ago, we have tried to use the access time tracking
>> >> mechanism of NUMA balancing to track the access time latency of unmapped
>> >> file cache folios. The original implementation is as follows,
>> >>
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=5f2e64ce75c0322602c2ec8c70b64bb69b1f1329
>> >>
>> >> What do you think about this?
>> >>
>> >
>> > Coming back around to explore this topic a bit more, dug into this old
>> > patch and the LRU patch by Keith - I'm struggling find a good option
>> > that doesn't over-complicate or propose something contentious.
>> >
>> >
>> > I did a browse through lore and did not see any discussion on this patch
>> > or on Keith's LRU patch, so i presume discussion on this happened largely
>> > off-list. So if you have any context as to why this wasn't RFC'd officially
>> > I would like more information.
>>
>> Thanks for doing this. There's no much discussion offline. We just
>> don't have enough time to work on the solution.
>>
>
> Exploring and testing this a little further, I brought this up to current
> folio work in 6.9 and found this solution to be unstable as-is.
>
> After some work to fix lock/reference issues, Johannes pointed out that
> __filemap_get_folio can be called from an atomic context - which means it
> may not be safe to do migrations in this context.

Sorry, I don't understand this, the above patch changes
filemap_get_pages() and grab_cache_page_write_begin() instead of
__filemap_get_folio().

> We're back to looking at something like an LRU-esque system, but now we're
> thinking about isolating the folios in folio_mark_accessed into a task-local
> list, and then process the list on resume.

If necessary, we can use a similar method for above solution too. And
we can filter accessed once folios with folio_mark_accessed() firstly.
That is, only promote a page if,

- record the folio access time in folio_mark_accessed() only
- when the folio are accessed again, and "access_time - record_time <
  threshold", promote the folio.

> Basically we're thinking
>
> 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
>    the page is a promotion candidate.
> 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
>    already does this elsewhere, and place it onto current->promo_queue
> 3) set_notify_resume
> 4) add logic to resume_user_mode_work() to run through current->promo_queue and
>    either promote the pages accordingly, or do folio_putback_lru on failure.

Use a task_work?

> Going to RFC this up

--
Best Regards,
Huang, Ying
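[Editorial sketch, not part of the original message.] The two-access filter
described above ("access_time - record_time < threshold") could look roughly
like the following userspace C model. The names timed_folio and timed_access
are hypothetical, time is an abstract counter rather than a kernel clock, and
refreshing the recorded time on every access is an assumption of this sketch;
the message does not say whether the record is updated after a promotion
decision.

```c
/* Userspace model only: not kernel code, all names are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

struct timed_folio {
	bool     has_record;   /* true once an access time has been recorded */
	uint64_t record_time;  /* time of the most recently recorded access */
};

/*
 * Model of the check that would sit in the access path.
 * Returns true when the folio was accessed twice within `threshold`
 * time units, i.e. when it would be treated as a promotion candidate.
 */
static bool timed_access(struct timed_folio *f, uint64_t now,
			 uint64_t threshold)
{
	bool promote = f->has_record && (now - f->record_time) < threshold;

	/* Assumption: always refresh the record, whatever the decision. */
	f->has_record = true;
	f->record_time = now;
	return promote;
}
```

With a threshold of 50, accesses at times 100 and 120 make the second access
a candidate, while a later access at time 500 does not, since the gap since
the previous recorded access is too large.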
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-11  1:35 ` Huang, Ying
@ 2024-11-11 14:25 ` Gregory Price
  2024-11-12  0:33 ` Huang, Ying
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2024-11-11 14:25 UTC
To: Huang, Ying
Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd,
    Johannes Weiner, Feng Tang

On Mon, Nov 11, 2024 at 09:35:09AM +0800, Huang, Ying wrote:
> Gregory Price <gourry@gourry.net> writes:
>
> >
> > Exploring and testing this a little further, I brought this up to current
> > folio work in 6.9 and found this solution to be unstable as-is.
> >
> > After some work to fix lock/reference issues, Johannes pointed out that
> > __filemap_get_folio can be called from an atomic context - which means it
> > may not be safe to do migrations in this context.
>
> Sorry, I don't understand this, the above patch changes
> filemap_get_pages() and grab_cache_page_write_begin() instead of
> __filemap_get_folio().
>

on newer kernels, grab_cache_page_write_begin is a compat wrapper for
__filemap_get_folio and folio_file_page. This chunk of code has changed
somewhat significantly, actually.

> > We're back to looking at something like an LRU-esque system, but now we're
> > thinking about isolating the folios in folio_mark_accessed into a task-local
> > list, and then process the list on resume.
>
> If necessary, we can use a similar method for above solution too. And
> we can filter accessed once folios with folio_mark_accessed() firstly.
> That is, only promote a page if,
>
> - record the folio access time in folio_mark_accessed() only
> - when the folio are accessed again, and "access_time - record_time <
>   threshold", promote the folio.
>

yes this was the thought.

> > Basically we're thinking
> >
> > 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
> >    the page is a promotion candidate.
> > 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
> >    already does this elsewhere, and place it onto current->promo_queue
> > 3) set_notify_resume
> > 4) add logic to resume_user_mode_work() to run through current->promo_queue and
> >    either promote the pages accordingly, or do folio_putback_lru on failure.
>
> Use a task_work?
>

probably more correct, had a discussion about kernel threads accessing
file cache and we weren't sure if that situation even existed - so probably
going to try task_work first.

~Gregory
* Re: [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache
  2024-11-11 14:25 ` Gregory Price
@ 2024-11-12  0:33 ` Huang, Ying
  0 siblings, 0 replies; 14+ messages in thread
From: Huang, Ying @ 2024-11-12 0:33 UTC
To: Gregory Price
Cc: linux-mm, linux-kernel, akpm, david, nphamcs, nehagholkar, abhishekd,
    Johannes Weiner, Feng Tang

Gregory Price <gourry@gourry.net> writes:

> On Mon, Nov 11, 2024 at 09:35:09AM +0800, Huang, Ying wrote:
>> Gregory Price <gourry@gourry.net> writes:
>>
>> >
>> > Exploring and testing this a little further, I brought this up to current
>> > folio work in 6.9 and found this solution to be unstable as-is.
>> >
>> > After some work to fix lock/reference issues, Johannes pointed out that
>> > __filemap_get_folio can be called from an atomic context - which means it
>> > may not be safe to do migrations in this context.
>>
>> Sorry, I don't understand this, the above patch changes
>> filemap_get_pages() and grab_cache_page_write_begin() instead of
>> __filemap_get_folio().
>>
>
> on newer kernels, grab_cache_page_write_begin is a compat wrapper for
> __filemap_get_folio and folio_file_page. This chunk of code has changed
> somewhat significantly, actually.
>
>> > We're back to looking at something like an LRU-esque system, but now we're
>> > thinking about isolating the folios in folio_mark_accessed into a task-local
>> > list, and then process the list on resume.
>>
>> If necessary, we can use a similar method for above solution too. And
>> we can filter accessed once folios with folio_mark_accessed() firstly.
>> That is, only promote a page if,
>>
>> - record the folio access time in folio_mark_accessed() only
>> - when the folio are accessed again, and "access_time - record_time <
>>   threshold", promote the folio.
>>
>
> yes this was the thought.
>
>> > Basically we're thinking
>> >
>> > 1) hook folio_mark_accessed and use PG_ACTIVE/PG_ACCESSED to determine whether
>> >    the page is a promotion candidate.
>> > 2) if it is, isolate it from the LRU - which is safe because folio_mark_accessed
>> >    already does this elsewhere, and place it onto current->promo_queue
>> > 3) set_notify_resume
>> > 4) add logic to resume_user_mode_work() to run through current->promo_queue and
>> >    either promote the pages accordingly, or do folio_putback_lru on failure.
>>
>> Use a task_work?
>>
>
> probably more correct, had a discussion about kernel threads accessing
> file cache and we weren't sure if that situation even existed - so probably

We can ignore kthreads when collecting promotion candidate folios.

> going to try task_work first.

--
Best Regards,
Huang, Ying
end of thread, other threads:[~2024-11-12 0:36 UTC | newest]
Thread overview: 14+ messages
[not found] <20240803094715.23900-1-gourry@gourry.net>
2024-08-08 23:20 ` [PATCH 0/3] mm,TPP: Enable promotion of unmapped pagecache Andrew Morton
2024-08-13 15:04 ` Gregory Price
2024-08-14 16:09 ` Gregory Price
2024-08-19 7:46 ` Huang, Ying
2024-08-19 15:15 ` Gregory Price
2024-09-02 6:53 ` Huang, Ying
2024-09-03 13:36 ` Gregory Price
2024-11-04 18:12 ` Gregory Price
2024-11-05 2:00 ` Huang, Ying
2024-11-05 15:16 ` Gregory Price
2024-11-08 18:00 ` Gregory Price
2024-11-11 1:35 ` Huang, Ying
2024-11-11 14:25 ` Gregory Price
2024-11-12 0:33 ` Huang, Ying