From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	David Hildenbrand <david@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Suren Baghdasaryan <surenb@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Harry Yoo <harry.yoo@oracle.com>, Rik van Riel <riel@surriel.com>,
	Jann Horn <jannh@google.com>, Chris Li <chriscli@google.com>,
	Barry Song <baohua@kernel.org>
Subject: Re: [LSM/MM/BPF TOPIC] The Future of the Anonymous Reverse Mapping
Date: Fri, 20 Feb 2026 14:22:25 -0500
Message-ID: <iwmfmiswzs7wlubom4unvhed6kxjybdscf5onmgvlfpbf2uxlf@oetdllsyxviw>
In-Reply-To: <76b8c24f-22a5-4af4-baea-087e0a2b0e70@lucifer.local>

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260220 10:39]:
> On Fri, Feb 20, 2026 at 10:03:29AM -0500, Liam R. Howlett wrote:
> > * Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260219 14:28]:
> > > Currently we track the reverse mapping between folios and VMAs at a VMA level,
> > > utilising a complicated and confusing combination of anon_vma objects and
> > > anon_vma_chains linking them, which must be updated when VMAs are split,
> > > merged, remapped or forked.
> > >
> > > It's further complicated by various optimisations intended to avoid scalability
> > > issues in locking and memory allocation.
> > >
> > > I have done recent work to improve the situation [0] which has also led to a
> > > reported improvement in lock scalability [1], but fundamentally the situation
> > > remains the same.
> > >
> > > The logic is actually, when you think hard enough about it, a fairly
> > > reasonable means of implementing the reverse mapping at a VMA level.
> > >
> > > It is, however, a very broken abstraction as it stands. In order to work with
> > > the logic, you have to essentially keep a broad understanding of the entire
> > > implementation in your head at one time - that is, not much is really
> > > abstracted.
> > >
> > > This results in confusion, mistakes, and bit rot. It's also very time-consuming
> > > to work with - personally I've gone to the lengths of writing a private set of
> > > slides for myself on the topic as a reminder each time I come back to it.
> > >
> > > There are also issues with lock scalability - the use of interval trees to
> > > maintain a connection between an anon_vma and AVCs connected to VMAs requires
> > > that a lock must be held across the entire 'CoW hierarchy' of parent and child
> > > VMAs whenever performing an rmap walk or performing a merge, split, remap or
> > > fork.
> > >
> > > This is because we tear down all interval tree mappings and reestablish them
> > > each time we might see changes in VMA geometry. This is an issue Barry Song
> > > identified as problematic in a real world use case [2].
> > >
> > > So what do we do to improve the situation?
> > >
> > > Recently I have been working on an experimental new approach to the anonymous
> > > reverse mapping, in which we instead track anonymous remaps, and then use the
> > > VMA's virtual page offset to locate VMAs from the folio.
> > >
> > > I have got the implementation working to the point where it tracks the exact
> > > same VMAs as the anon_vma implementation, and it seems a lot of it can be done
> > > under RCU.
> > >
> > > It avoids the need to maintain expensive mappings at a VMA level, though it
> > > incurs a cost in tracking remaps, and MAP_PRIVATE files are very much a TODO
> > > (they maintain a file vma->vm_pgoff, even when CoW'd, so the remap tracking is
> > > pretty sub-optimal).
> > >
> > > I am investigating whether I can change how MAP_PRIVATE file-backed mappings
> > > work to avoid this issue, and will be developing tests to see how lock
> > > scalability, throughput and memory usage compare to the anon_vma approach under
> > > different workloads.
> > >
> > > This experiment may or may not work out; either way it will be interesting to
> > > discuss it.
> >
> > Discussing alternatives to the anon_vma and anon_vma_chain would be
> > interesting.
> >
> > Just to clarify, this is to look at the complexity of the data
> > structures and not the locking, or both?
> 
> It's emphatically not about a rework for rework's sake or a de-complexifying of
> the algorithms, it's really focused on:
> 
> - Memory usage
> - Lock scalability
> - Performance
> 
> And these are the metrics that will determine the way forward.
> 
> Talking specifically about my current experiments, I have totally reworked the
> entire thing, it's a fundamentally different approach (as briefly described
> above), which also completely changes how the locking works.
> 
> This maintains a per-mm data structure (which also outlives the mm) called the
> cow_context, that tracks anon remaps and the CoW hierarchy
> (i.e. parent/child/etc relationship between mm's which have forked).
> 
> Since we don't fork that much, RCU makes sense for the connections between
> parents/children and means that we can quickly read through the VMA maple trees
> for each mm without having to contend any locks.
> 
> I currently have the code working (as far as I can tell) with RCU alone, I'm
> still testing this but obviously that'd be quite a nice property to maintain and
> could lead to quite different characteristics compared to the current
> implementation.
> 
> But I'm still figuring things out and MAP_PRIVATE file-backed mappings remain a
> complete pain (they are effectively 'remapped' from the start).
> 
> Whether this approach works or not, it should give some interesting data and
> insights that can feed into an alternative approach if necessary.
> 

The locking changes are very interesting to me as they pertain to the
tangle we get into with the mmap lock, which requires preallocation (and
external locks on the maple tree) in most cases.

Although this can't fix (all of) the tangled locking, it could reduce it
significantly.

Thanks,
Liam
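
For readers following along, here is a minimal sketch of the offset
arithmetic that locating a VMA from a folio's page offset relies on. It is
not taken from the experimental series discussed above: struct sketch_vma
and sketch_pgoff_to_addr() are hypothetical userspace stand-ins, and a
4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct sketch_vma {
	uint64_t vm_start;	/* first byte of the mapping */
	uint64_t vm_end;	/* one past the last byte of the mapping */
	uint64_t vm_pgoff;	/* page offset that corresponds to vm_start */
};

/*
 * Translate a page offset into a virtual address inside @vma.
 * Returns false (leaving *addr untouched) if the offset is not covered
 * by the VMA.
 */
static bool sketch_pgoff_to_addr(const struct sketch_vma *vma,
				 uint64_t pgoff, uint64_t *addr)
{
	uint64_t nr_pages = (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;

	if (pgoff < vma->vm_pgoff || pgoff - vma->vm_pgoff >= nr_pages)
		return false;

	*addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << SKETCH_PAGE_SHIFT);
	return true;
}
```

As far as I understand it, a moved anonymous mapping keeps its vm_pgoff
while vm_start changes, so the same offset resolves to a different address
after a remap. That would be consistent with the need to track anon remaps
separately in the scheme described in the quoted text.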

