linux-mm.kvack.org archive mirror
* [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
@ 2026-02-19 19:28 Lorenzo Stoakes
  2026-02-19 20:25 ` Suren Baghdasaryan
  2026-02-20 15:03 ` Liam R. Howlett
  0 siblings, 2 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2026-02-19 19:28 UTC
  To: lsf-pc
  Cc: linux-mm, David Hildenbrand, Liam R. Howlett, Vlastimil Babka,
	Suren Baghdasaryan, Pedro Falcato, Ryan Roberts, Harry Yoo,
	Rik van Riel, Jann Horn, Chris Li, Barry Song

Currently we track the reverse mapping between folios and VMAs at a VMA level,
utilising a complicated and confusing combination of anon_vma objects and the
anon_vma_chain objects (AVCs) linking them, which must be updated when VMAs are
split, merged, remapped or forked.
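For anyone who hasn't looked at this code recently, a heavily simplified userspace model of the linkage may help. This is a sketch under stated assumptions, not the kernel's code: the struct names echo struct anon_vma and struct anon_vma_chain, but the fields and the helpers link_avc(), model_fork() and count_rmap_vmas() are hypothetical, and each anon_vma's interval tree is reduced to a plain list.

```c
#include <assert.h>
#include <stdlib.h>

struct anon_vma;
struct vma;

/* One link joining one VMA to one anon_vma: an "AVC". */
struct avc {
	struct vma *vma;
	struct anon_vma *anon_vma;
	struct avc *next_same_vma;      /* next AVC belonging to the same VMA */
	struct avc *next_same_anon_vma; /* next AVC attached to the same anon_vma */
};

struct anon_vma {
	struct anon_vma *root; /* the kernel locks the root for the whole hierarchy */
	struct avc *avcs;      /* an interval tree in the kernel; a list here */
};

struct vma {
	struct anon_vma *anon_vma; /* where this VMA's new anon folios are attached */
	struct avc *avcs;          /* every anon_vma this VMA may map folios from */
};

/* Link @vma and @av with a new AVC placed on both lists. */
static void link_avc(struct vma *vma, struct anon_vma *av)
{
	struct avc *avc = calloc(1, sizeof(*avc));

	assert(avc); /* sketch only: no real error handling */
	avc->vma = vma;
	avc->anon_vma = av;
	avc->next_same_vma = vma->avcs;
	vma->avcs = avc;
	avc->next_same_anon_vma = av->avcs;
	av->avcs = avc;
}

/*
 * Roughly what fork does (ignoring the kernel's anon_vma reuse
 * optimisations): link the child VMA to every anon_vma the parent VMA is
 * linked to, then give the child a fresh anon_vma under the same root.
 */
static void model_fork(struct vma *parent, struct vma *child)
{
	struct anon_vma *av;

	for (struct avc *avc = parent->avcs; avc; avc = avc->next_same_vma)
		link_avc(child, avc->anon_vma);

	av = calloc(1, sizeof(*av));
	assert(av);
	av->root = parent->anon_vma->root;
	child->anon_vma = av;
	link_avc(child, av);
}

/* An rmap walk for a folio attached to @av visits every AVC on @av. */
static int count_rmap_vmas(const struct anon_vma *av)
{
	int n = 0;

	for (const struct avc *avc = av->avcs; avc; avc = avc->next_same_anon_vma)
		n++;
	return n;
}
```

In this model, after a fork a folio attached to the parent's anon_vma is reachable from both VMAs, while a folio the child faults in afterwards is attached to the child's own anon_vma and is reachable only from the child. Each fork or split adds more AVCs that must be kept in sync.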

It's further complicated by various optimisations intended to avoid scalability
issues in locking and memory allocation.

I have done recent work to improve the situation [0], which has also led to a
reported improvement in lock scalability [1], but fundamentally the situation
remains the same.

The logic is actually, when you think hard enough about it, a fairly
reasonable means of implementing the reverse mapping at a VMA level.

It is, however, a very broken abstraction as it stands. In order to work with
the logic, you have to essentially keep a broad understanding of the entire
implementation in your head at one time - that is, not much is really
abstracted.

This results in confusion, mistakes, and bit rot. It's also very time-consuming
to work with - personally I've gone to the lengths of writing a private set of
slides for myself on the topic as a reminder each time I come back to it.

There are also issues with lock scalability - the use of interval trees to
maintain the connection between an anon_vma and the AVCs linking it to VMAs
requires that a single lock, shared across the entire 'CoW hierarchy' of parent
and child VMAs, be held whenever performing an rmap walk or a merge, split,
remap or fork.

This is because we tear down and re-establish all interval tree mappings each
time VMA geometry might change. Barry Song identified this as a problem in a
real-world use case [2].
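To make the coupling to VMA geometry concrete: each AVC is indexed in its anon_vma's interval tree by the VMA's page-offset range, [vm_pgoff, vm_pgoff + nr_pages - 1], and a walk queries that tree with the folio's index and then computes the address as vm_start + ((index - vm_pgoff) << PAGE_SHIFT). The sketch below uses hypothetical names (struct vma_range, rmap_query()) and replaces the interval tree with a linear scan, but keeps the same key and arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL

struct vma_range {
	unsigned long vm_start, vm_end; /* [vm_start, vm_end) virtual range */
	unsigned long vm_pgoff;         /* page offset corresponding to vm_start */
};

/* Last page offset covered by the VMA: the upper end of its tree key. */
static unsigned long vma_pgoff_last(const struct vma_range *v)
{
	return v->vm_pgoff + ((v->vm_end - v->vm_start) >> PAGE_SHIFT) - 1;
}

/*
 * Stand-in for an interval tree query: for each VMA whose page-offset range
 * covers @index, store the address at which that page is mapped in @addrs.
 * Returns the number of VMAs found.
 */
static size_t rmap_query(const struct vma_range *vmas, size_t n,
			 unsigned long index, unsigned long *addrs)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		const struct vma_range *v = &vmas[i];

		if (index < v->vm_pgoff || index > vma_pgoff_last(v))
			continue;
		addrs[found++] = v->vm_start +
				 ((index - v->vm_pgoff) << PAGE_SHIFT);
	}
	return found;
}
```

Because the key is derived from vm_pgoff and the VMA's size, a split, merge or resize changes it, so the affected AVCs have to be removed from and reinserted into every relevant tree while the shared lock is held.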

So what do we do to improve the situation?

Recently I have been working on an experimental new approach to the anonymous
reverse mapping, in which we instead track anonymous remaps, and then use the
VMA's virtual page offset to locate VMAs from the folio.
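A rough illustration of why remaps need tracking, assuming only what the kernel does today for anonymous memory: a new anonymous VMA gets vm_pgoff == vm_start >> PAGE_SHIFT, so an unremapped folio's index directly yields the address to look up in each mm. mremap() moves a VMA but preserves its vm_pgoff, breaking that identity, so some record of the move is needed. The struct remap and locate_addr() below are hypothetical and are not taken from the experimental code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL

/*
 * Hypothetical record of one anonymous remap: pages whose index lies in
 * [pgoff, pgoff + nr_pages) now live from new_addr onwards.
 */
struct remap {
	unsigned long pgoff;
	unsigned long nr_pages;
	unsigned long new_addr;
};

/*
 * Translate a folio index into the virtual address to look up in an mm.
 * With no matching remap record, an anonymous page's index is simply its
 * original address >> PAGE_SHIFT.
 */
static unsigned long locate_addr(unsigned long index,
				 const struct remap *remaps, size_t nr_remaps)
{
	for (size_t i = 0; i < nr_remaps; i++) {
		const struct remap *r = &remaps[i];

		if (index >= r->pgoff && index < r->pgoff + r->nr_pages)
			return r->new_addr + ((index - r->pgoff) << PAGE_SHIFT);
	}
	return index << PAGE_SHIFT;
}
```

The resulting address would then be looked up in each mm's VMA maple tree. The sketch ignores remaps of already-remapped ranges and how the records would be stored and found efficiently, both of which a real design has to handle.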

I have the implementation working to the point where it tracks exactly the
same VMAs as the anon_vma implementation, and it seems that much of it can be
done under RCU.

It avoids the need to maintain expensive mappings at a VMA level, though it
incurs a cost in tracking remaps, and MAP_PRIVATE file-backed mappings are very
much a TODO (they keep a file offset in vma->vm_pgoff even once their pages are
CoW'd, so the remap tracking is pretty sub-optimal).

I am investigating whether I can change how MAP_PRIVATE file-backed mappings
work to avoid this issue, and will be developing tests to see how lock
scalability, throughput and memory usage compare to the anon_vma approach under
different workloads.

This experiment may or may not work out; either way, it will be interesting to
discuss.

By the time LSF/MM comes around I may even have settled on a different
approach, but that's what makes things interesting :)

[0]:https://lore.kernel.org/all/cover.1767711638.git.lorenzo.stoakes@oracle.com/
[1]:https://lore.kernel.org/all/202602061747.855f053f-lkp@intel.com/
[2]:https://lore.kernel.org/linux-mm/CAGsJ_4x=YsQR=nNcHA-q=0vg0b7ok=81C_qQqKmoJ+BZ+HVduQ@mail.gmail.com/

Cheers, Lorenzo



* Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
  2026-02-19 19:28 [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping Lorenzo Stoakes
@ 2026-02-19 20:25 ` Suren Baghdasaryan
  2026-02-20 11:34   ` Lorenzo Stoakes
  2026-02-20 15:03 ` Liam R. Howlett
  1 sibling, 1 reply; 6+ messages in thread
From: Suren Baghdasaryan @ 2026-02-19 20:25 UTC
  To: Lorenzo Stoakes
  Cc: lsf-pc, linux-mm, David Hildenbrand, Liam R. Howlett,
	Vlastimil Babka, Pedro Falcato, Ryan Roberts, Harry Yoo,
	Rik van Riel, Jann Horn, Chris Li, Barry Song

On Thu, Feb 19, 2026 at 11:28 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> I have got the implementation working to the point where it tracks the exact
> same VMAs as the anon_vma implementation, and it seems a lot of it can be done
> under RCU.

Do you have a link to the code we can look at before the discussion?

> This experiment may or may not work out, either way it will be interesting to
> discuss it.

I'm interested in this discussion. Hopefully this will result in
simpler rmap code and reduced lock contention.
Thanks,
Suren.




* Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
  2026-02-19 20:25 ` Suren Baghdasaryan
@ 2026-02-20 11:34   ` Lorenzo Stoakes
  0 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2026-02-20 11:34 UTC
  To: Suren Baghdasaryan
  Cc: lsf-pc, linux-mm, David Hildenbrand, Liam R. Howlett,
	Vlastimil Babka, Pedro Falcato, Ryan Roberts, Harry Yoo,
	Rik van Riel, Jann Horn, Chris Li, Barry Song

On Thu, Feb 19, 2026 at 12:25:47PM -0800, Suren Baghdasaryan wrote:
> On Thu, Feb 19, 2026 at 11:28 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > I have got the implementation working to the point where it tracks the exact
> > same VMAs as the anon_vma implementation, and it seems a lot of it can be done
> > under RCU.
>
> Do you have a link to the code we can look at before the discussion?

The code is at a really early stage and is constantly changing, so _not yet_,
but I'll put it somewhere public once it has settled down.

For now the implementation simply sits side-by-side with the existing anon_vma
code: rmap_walk_anon() also kicks off a cow context walk, then compares the
counts of matching folios that each walk discovers.
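The side-by-side validation can be sketched generically as below. This assumes each walker invokes a callback once per mapping it discovers; walk_cb, walker_fn, cross_check_walks() and the toy walkers are hypothetical and are not the interface of the kernel's rmap_walk_anon().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical callback invoked once per mapping a walker discovers. */
typedef void (*walk_cb)(unsigned long addr, void *arg);

/* Hypothetical walker: visits every mapping of @folio, calling @cb for each. */
typedef void (*walker_fn)(const void *folio, walk_cb cb, void *arg);

static void count_cb(unsigned long addr, void *arg)
{
	(void)addr;
	(*(unsigned long *)arg)++;
}

/*
 * Run the existing and the experimental walker on the same folio; report a
 * mismatch and return whether both discovered the same number of mappings.
 */
static bool cross_check_walks(const void *folio, walker_fn old_walk,
			      walker_fn new_walk)
{
	unsigned long old_count = 0, new_count = 0;

	old_walk(folio, count_cb, &old_count);
	new_walk(folio, count_cb, &new_count);
	if (old_count != new_count)
		fprintf(stderr, "rmap mismatch: old=%lu new=%lu\n",
			old_count, new_count);
	return old_count == new_count;
}

/* Toy walkers, for illustration only. */
static void walk_two_mappings(const void *folio, walk_cb cb, void *arg)
{
	(void)folio;
	cb(0x1000, arg);
	cb(0x2000, arg);
}

static void walk_one_mapping(const void *folio, walk_cb cb, void *arg)
{
	(void)folio;
	cb(0x1000, arg);
}
```

Comparing counts is cheap but would miss two walks that find different mappings of the same number; recording and comparing the discovered addresses would be a stricter check.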

The code will of course eventually

> I'm interested in this discussion. Hopefully this will result in
> simpler rmap code and reduced lock contention.
> Thanks,
> Suren.

Thanks, I am keen to extract numbers from this and use that to guide the
implementation - I want this to be an evidence-based improvement rather than
simply a rework of some kind :)

If the proposed solution causes meaningful regressions then I will look to an
alternative approach; either way, I think it's important to base as much as
possible on actual observed numbers.

I hope to implement benchmarking/test code as part of this work which should be
useful regardless of the approach taken.

When I have this more developed I may ask people to help test this on different
hardware/clusters/etc. to ensure stability and assess impact, so don't be
surprised if I ask for a favour at this point :)


Cheers, Lorenzo



* Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
  2026-02-19 19:28 [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping Lorenzo Stoakes
  2026-02-19 20:25 ` Suren Baghdasaryan
@ 2026-02-20 15:03 ` Liam R. Howlett
  2026-02-20 15:38   ` Lorenzo Stoakes
  1 sibling, 1 reply; 6+ messages in thread
From: Liam R. Howlett @ 2026-02-20 15:03 UTC
  To: Lorenzo Stoakes
  Cc: lsf-pc, linux-mm, David Hildenbrand, Vlastimil Babka,
	Suren Baghdasaryan, Pedro Falcato, Ryan Roberts, Harry Yoo,
	Rik van Riel, Jann Horn, Chris Li, Barry Song

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260219 14:28]:
> This experiment may or may not work out, either way it will be interesting to
> discuss it.

Discussing alternatives to the anon_vma and anon_vma_chain would be
interesting.

Just to clarify: is this about the complexity of the data structures, the
locking, or both?




* Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
  2026-02-20 15:03 ` Liam R. Howlett
@ 2026-02-20 15:38   ` Lorenzo Stoakes
  2026-02-20 19:22     ` Liam R. Howlett
  0 siblings, 1 reply; 6+ messages in thread
From: Lorenzo Stoakes @ 2026-02-20 15:38 UTC
  To: Liam R. Howlett, lsf-pc, linux-mm, David Hildenbrand,
	Vlastimil Babka, Suren Baghdasaryan, Pedro Falcato, Ryan Roberts,
	Harry Yoo, Rik van Riel, Jann Horn, Chris Li, Barry Song

On Fri, Feb 20, 2026 at 10:03:29AM -0500, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260219 14:28]:
> Discussing alternatives to the anon_vma and anon_vma_chain would be
> interesting.
>
> Just to clarify, this is to look at the complexity of the data
> structures and not the locking, or both?

It's emphatically not about reworking for rework's sake or about simplifying
the algorithms; it's really focused on:

- Memory usage
- Lock scalability
- Performance

And these are the metrics that will determine the way forward.

Talking specifically about my current experiments, I have totally reworked the
entire thing. It's a fundamentally different approach (as briefly described
above), which also completely changes how the locking works.

This approach maintains a per-mm data structure called the cow_context (which
also outlives the mm) that tracks anonymous remaps and the CoW hierarchy
(i.e. the parent/child relationships between mms created by fork).

Since we don't fork that often, RCU makes sense for the connections between
parents and children, and it means we can quickly read through each mm's VMA
maple tree without contending on any locks.
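One possible shape for such a hierarchy, sketched in userspace C11 purely as an assumption: struct cow_context's fields and all of the helpers here are invented, and remap tracking is omitted. A child is fully initialised and then published with a release store, and readers traverse with acquire loads, mirroring the rcu_assign_pointer()/rcu_dereference() publish pattern. Real RCU additionally defers freeing until pre-existing readers have finished, which this sketch does not model (it never frees anything); the refcount stands in for a context outliving its mm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-mm CoW context; may outlive the mm it was created for. */
struct cow_context {
	atomic_int refcount;                       /* held by the mm and each child */
	struct cow_context *parent;                /* NULL for the root; immutable */
	_Atomic(struct cow_context *) first_child; /* newest child first */
	struct cow_context *next_sibling;          /* immutable once published */
	int mm_id;                                 /* stand-in for the mm */
};

static struct cow_context *cow_context_alloc(struct cow_context *parent,
					     int mm_id)
{
	struct cow_context *ctx = calloc(1, sizeof(*ctx));

	assert(ctx); /* sketch only: no real error handling */
	atomic_init(&ctx->refcount, 1);
	atomic_init(&ctx->first_child, NULL);
	ctx->parent = parent;
	ctx->mm_id = mm_id;
	return ctx;
}

/*
 * On fork: fully initialise the child, then publish it with a release store
 * so a concurrent reader never sees a half-initialised node. Writers are
 * assumed to be serialised by some lock, not shown here.
 */
static struct cow_context *cow_context_fork(struct cow_context *parent,
					    int mm_id)
{
	struct cow_context *child = cow_context_alloc(parent, mm_id);

	atomic_fetch_add(&parent->refcount, 1); /* child pins its parent */
	child->next_sibling = atomic_load_explicit(&parent->first_child,
						   memory_order_relaxed);
	atomic_store_explicit(&parent->first_child, child, memory_order_release);
	return child;
}

/*
 * Lock-free reader: visit @ctx and all of its descendants, e.g. to search
 * each mm's VMA tree. Returns the number of contexts visited.
 */
static int cow_context_walk(struct cow_context *ctx)
{
	int visited = 1;
	struct cow_context *child =
		atomic_load_explicit(&ctx->first_child, memory_order_acquire);

	for (; child; child = child->next_sibling)
		visited += cow_context_walk(child);
	return visited;
}
```

Writers (fork) are assumed to be serialised elsewhere, while readers take no lock at all, which is what would let an rmap walk over the hierarchy avoid lock contention when forks are comparatively rare.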

I currently have the code working (as far as I can tell) with RCU alone. I'm
still testing this, but obviously that would be a nice property to maintain and
could lead to quite different characteristics compared to the current
implementation.

But I'm still figuring things out and MAP_PRIVATE file-backed mappings remain a
complete pain (they are effectively 'remapped' from the start).
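A small worked example of why MAP_PRIVATE file-backed mappings look 'remapped' from the start: an anonymous folio created by CoW in such a VMA is indexed with the same arithmetic as the kernel's linear_page_index(), vm_pgoff + ((addr - vm_start) >> PAGE_SHIFT), but vm_pgoff here is the file offset passed to mmap() rather than vm_start >> PAGE_SHIFT. The struct vma_geom and helpers below are hypothetical userspace re-implementations of that arithmetic.

```c
#include <assert.h>

#define PAGE_SHIFT 12UL

struct vma_geom {
	unsigned long vm_start; /* first virtual address of the VMA */
	unsigned long vm_pgoff; /* anon: vm_start >> PAGE_SHIFT; file: file offset in pages */
};

/* Same arithmetic as linear_page_index(): the page index of @addr. */
static unsigned long page_index(const struct vma_geom *vma, unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
}

/*
 * True if a folio's index alone gives back its (page-aligned) address, i.e.
 * the mapping behaves like an unremapped anonymous VMA.
 */
static int index_is_identity(const struct vma_geom *vma, unsigned long addr)
{
	return (page_index(vma, addr) << PAGE_SHIFT) == addr;
}
```

For a private file mapping at file offset 0, a CoW'd page three pages into the VMA gets index 3 regardless of where the VMA sits, so recovering its address needs the same kind of translation as a remapped anonymous range.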

Whether or not this approach works, it should produce some interesting data and
insights that can feed into an alternative approach if necessary.


Cheers, Lorenzo



* Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
  2026-02-20 15:38   ` Lorenzo Stoakes
@ 2026-02-20 19:22     ` Liam R. Howlett
  0 siblings, 0 replies; 6+ messages in thread
From: Liam R. Howlett @ 2026-02-20 19:22 UTC
  To: Lorenzo Stoakes
  Cc: lsf-pc, linux-mm, David Hildenbrand, Vlastimil Babka,
	Suren Baghdasaryan, Pedro Falcato, Ryan Roberts, Harry Yoo,
	Rik van Riel, Jann Horn, Chris Li, Barry Song

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260220 10:39]:
> Talking specifically about my current experiments, I have totally reworked the
> entire thing, it's a fundamentally different approach (as briefly described
> above), which also completely changes how the locking works.
> 
> This maintains a per-mm data structure (which also outlives the mm) called the
> cow_context, that tracks anon remaps and the CoW hierarchy
> (i.e. parent/child/etc relationship between mm's which have forked).
> 
> Since we don't fork that much, RCU makes sense for the connections between
> parents/children and means that we can quickly read through the VMA maple trees
> for each mm without having to contend any locks.
> 
> I currently have the code working (as far as I can tell) with RCU alone, I'm
> still testing this but obviously that'd be quite a nice property to maintain and
> could lead to quite different characteristics compared to the current
> implementation.

The locking changes are very interesting to me as they pertain to the tangle
we get into with the mmap lock, which requires preallocation (and external
locks on the maple tree) in most cases.

Although this can't fix (all of) the tangled locking, it could reduce it
significantly.

Thanks,
Liam


