linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree
@ 2026-02-24 17:10 Liam R. Howlett
  2026-04-17 19:50 ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Liam R. Howlett @ 2026-02-24 17:10 UTC (permalink / raw)
  To: lsf-pc, linux-mm, linux-fsdevel
  Cc: Matthew Wilcox, Johannes Weiner, David Hildenbrand, Jan Kara,
	Ryan Roberts, Christian Brauner

Hi,

The page cache currently uses a radix tree to store folios.  There are several
well-known deficiencies to this approach:
 - Inefficient representation of large folios
 - Inefficient representation of sparsely accessed files
 - Poor cache locality
 - Supports 3 search marks

The Maple Tree solves these problems more efficiently than the XArray.  In this
session, I'd like to discuss what needs to happen to change from using the
XArray to the Maple Tree for the page cache.

The Maple Tree needs some enhancements:
 - New node type to efficiently support entries of length of 1
 - New node type to support marks
 - Searching and iterating over marks
 - Support for purging shadow entries from the page cache

The motivation, beyond potential performance gains and memory footprint,
is to support more features natively in the data structure.

Thanks,
Liam


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree
  2026-02-24 17:10 [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree Liam R. Howlett
@ 2026-04-17 19:50 ` Matthew Wilcox
  2026-04-17 23:14   ` Liam R. Howlett
  2026-04-20 20:18   ` Johannes Weiner
  0 siblings, 2 replies; 4+ messages in thread
From: Matthew Wilcox @ 2026-04-17 19:50 UTC (permalink / raw)
  To: Liam R. Howlett, lsf-pc, linux-mm, linux-fsdevel,
	Johannes Weiner, David Hildenbrand, Jan Kara, Ryan Roberts,
	Christian Brauner

On Tue, Feb 24, 2026 at 12:10:26PM -0500, Liam R. Howlett wrote:
> The Maple Tree needs some enhancements:
>  - Support for purging shadow entries from the page cache

Liam and I hd some preliminary discussions yesterday around this
and we'd like some feedback if anyone has time before LSFMM.

For those who aren't aware, when a folio falls off the end of the
LRU list, we store a shadow entry in the page cache in its place
so that if we access that page again, we know where to put its folio in
the LRU list.

But this creates a problem (documented in mm/workingset.c) where
we can fill up memory with shadow entries.  Currently, we embed a
list_head in xa_node and add nodes which contain only shadow entries
to a list which can be walked by a shrinker when we're low on memory.
Ideally we wouldn't do that with the maple tree.  There are a few
options.

The first question we have is whether it's best to keep nodes around to
wait for a shrinker to kick in.  Was any experimentation done to
see whether eagerly freeing a node that contains only shadow entries
has a bad effect on performance?

The second idea we talked about is that the maple tree is much more
flexible than the radix tree.  Having even a single folio in a node pins
the entire node, so it's "free" to keep the shadow entries in that node
around.  But with the maple tree, we can be much more granular and
delete shadow entries in arbitrary positions.  So we could (for example)
keep track of inodes which contain shadow entries and purge shadow
entries when they reach, say, 10% of the number of pages.  Or 1000
entries, or some other threshold.

The third idea is that instead of having an injected list_head that we
keep a tree pointing to inodes (or even just maple tree nodes which
contain a lot of shadow entries).  That's not how list_lru works today,
so a certain amount of development work would need to be done.

Liam, anything I missed?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree
  2026-04-17 19:50 ` Matthew Wilcox
@ 2026-04-17 23:14   ` Liam R. Howlett
  2026-04-20 20:18   ` Johannes Weiner
  1 sibling, 0 replies; 4+ messages in thread
From: Liam R. Howlett @ 2026-04-17 23:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: lsf-pc, linux-mm, linux-fsdevel, Johannes Weiner,
	David Hildenbrand, Jan Kara, Ryan Roberts, Christian Brauner

On 26/04/17 08:50PM, Matthew Wilcox wrote:
> On Tue, Feb 24, 2026 at 12:10:26PM -0500, Liam R. Howlett wrote:
> > The Maple Tree needs some enhancements:
> >  - Support for purging shadow entries from the page cache
> 
> Liam and I hd some preliminary discussions yesterday around this
> and we'd like some feedback if anyone has time before LSFMM.
> 
> For those who aren't aware, when a folio falls off the end of the
> LRU list, we store a shadow entry in the page cache in its place
> so that if we access that page again, we know where to put its folio in
> the LRU list.
> 
> But this creates a problem (documented in mm/workingset.c) where
> we can fill up memory with shadow entries.  Currently, we embed a
> list_head in xa_node and add nodes which contain only shadow entries
> to a list which can be walked by a shrinker when we're low on memory.
> Ideally we wouldn't do that with the maple tree.  There are a few
> options.
> 
> The first question we have is whether it's best to keep nodes around to
> wait for a shrinker to kick in.  Was any experimentation done to
> see whether eagerly freeing a node that contains only shadow entries
> has a bad effect on performance?
> 
> The second idea we talked about is that the maple tree is much more
> flexible than the radix tree.  Having even a single folio in a node pins
> the entire node, so it's "free" to keep the shadow entries in that node
> around.  But with the maple tree, we can be much more granular and
> delete shadow entries in arbitrary positions.  So we could (for example)
> keep track of inodes which contain shadow entries and purge shadow
> entries when they reach, say, 10% of the number of pages.  Or 1000
> entries, or some other threshold.
> 
> The third idea is that instead of having an injected list_head that we
> keep a tree pointing to inodes (or even just maple tree nodes which
> contain a lot of shadow entries).  That's not how list_lru works today,
> so a certain amount of development work would need to be done.
> 
> Liam, anything I missed?

The only thing is that order 6 trick can be removed and we can support
any order folio.

Besides that, there are some ideas on implementing the different paths.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree
  2026-04-17 19:50 ` Matthew Wilcox
  2026-04-17 23:14   ` Liam R. Howlett
@ 2026-04-20 20:18   ` Johannes Weiner
  1 sibling, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2026-04-20 20:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Liam R. Howlett, lsf-pc, linux-mm, linux-fsdevel,
	David Hildenbrand, Jan Kara, Ryan Roberts, Christian Brauner

On Fri, Apr 17, 2026 at 08:50:01PM +0100, Matthew Wilcox wrote:
> On Tue, Feb 24, 2026 at 12:10:26PM -0500, Liam R. Howlett wrote:
> > The Maple Tree needs some enhancements:
> >  - Support for purging shadow entries from the page cache
> 
> Liam and I hd some preliminary discussions yesterday around this
> and we'd like some feedback if anyone has time before LSFMM.
> 
> For those who aren't aware, when a folio falls off the end of the
> LRU list, we store a shadow entry in the page cache in its place
> so that if we access that page again, we know where to put its folio in
> the LRU list.
> 
> But this creates a problem (documented in mm/workingset.c) where
> we can fill up memory with shadow entries.  Currently, we embed a
> list_head in xa_node and add nodes which contain only shadow entries
> to a list which can be walked by a shrinker when we're low on memory.
> Ideally we wouldn't do that with the maple tree.  There are a few
> options.
> 
> The first question we have is whether it's best to keep nodes around to
> wait for a shrinker to kick in.  Was any experimentation done to
> see whether eagerly freeing a node that contains only shadow entries
> has a bad effect on performance?

Hm, I'm not sure how that could work.

The LRU order created by readahead makes it highly likely that all the
folios in a cache node are reclaimed/made non-resident at once. Going
this route would destroy a large part of the non-resident cache the
moment it is created.

The goal is to garbage-collect the oldest shadow entries whose
distances are too long to be actionable at this point. Specifically,
their distance to lruvec->nonresident_age (per-cgroup, per-node).

In the current scheme, we just go in the order in which nodes became
all-shadow - oldest first. And we only do so lazily when the
non-resident cache is far into that territory (cache set vastly larger
than available memory). That gives us confidence that we're mostly
dropping very old entries without having to look at them one by one.

We don't have to stick with that design, but whatever replaces it
should meet the goal, or approximate it well enough.

> The second idea we talked about is that the maple tree is much more
> flexible than the radix tree.  Having even a single folio in a node pins
> the entire node, so it's "free" to keep the shadow entries in that node
> around.  But with the maple tree, we can be much more granular and
> delete shadow entries in arbitrary positions.  So we could (for example)
> keep track of inodes which contain shadow entries and purge shadow
> entries when they reach, say, 10% of the number of pages.  Or 1000
> entries, or some other threshold.

It's not the volume or the concentration of shadows, it's their age
that makes them good candidates for garbage collection.

Searching the tree for all-shadow nodes should be possible, but you'd
still have to filter for age. Naively, that would be unpack_shadow()
-> workingset_test_recent() for each one, but that's likely too heavy.

What might work is a monotonic ID counter for each node that becomes
all-shadow, then search the trees with quick comparisons for any IDs
below a certain cutoff?

> The third idea is that instead of having an injected list_head that we
> keep a tree pointing to inodes (or even just maple tree nodes which
> contain a lot of shadow entries).  That's not how list_lru works today,
> so a certain amount of development work would need to be done.

Ah, so you don't need storage inside the inode or maple tree node.

That could work well with the monotonic ID counter, you'd just scan
and reclaim through that tree, oldest node first.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-20 20:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-24 17:10 [LSF/MM/BPF TOPIC] Page cache tracking with the Maple Tree Liam R. Howlett
2026-04-17 19:50 ` Matthew Wilcox
2026-04-17 23:14   ` Liam R. Howlett
2026-04-20 20:18   ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox