linux-mm.kvack.org archive mirror
From: Kalesh Singh <kaleshsingh@google.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>,
	linux-mm@kvack.org,  akpm@linux-foundation.org,
	andreyknvl@gmail.com, arnd@arndb.de, bp@alien8.de,
	 brauner@kernel.org, bsegall@google.com, corbet@lwn.net,
	 dave.hansen@linux.intel.com, david@redhat.com,
	dietmar.eggemann@arm.com,  ebiederm@xmission.com, hpa@zytor.com,
	jakub.wartak@mailbox.org,  jannh@google.com,
	juri.lelli@redhat.com, khalid@kernel.org,
	 liam.howlett@oracle.com, linyongting@bytedance.com,
	 lorenzo.stoakes@oracle.com, luto@kernel.org,
	markhemm@googlemail.com,  maz@kernel.org, mhiramat@kernel.org,
	mgorman@suse.de, mhocko@suse.com,  mingo@redhat.com,
	muchun.song@linux.dev, neilb@suse.de, osalvador@suse.de,
	 pcc@google.com, peterz@infradead.org, rostedt@goodmis.org,
	rppt@kernel.org,  shakeel.butt@linux.dev, surenb@google.com,
	tglx@linutronix.de,  vasily.averin@linux.dev, vbabka@suse.cz,
	vincent.guittot@linaro.org,  viro@zeniv.linux.org.uk,
	vschneid@redhat.com, willy@infradead.org,  x86@kernel.org,
	xhao@linux.alibaba.com, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH v3 00/22] Add support for shared PTEs across processes
Date: Mon, 23 Feb 2026 09:43:03 -0800
Message-ID: <CAC_TJvdRsfzYohiKW=82N8Ofi5V26rX1GS0M8HeaX6CEsgc+PA@mail.gmail.com>
In-Reply-To: <fqabaahjjlmoc2xh4cwh4ykbqyu3rnzvjw5epxi5wwpmgqth7f@d3mqpjozwmo4>

On Sat, Feb 21, 2026 at 4:40 AM Pedro Falcato <pfalcato@suse.de> wrote:
>
> On Fri, Feb 20, 2026 at 01:35:58PM -0800, Kalesh Singh wrote:
> > On Tue, Aug 19, 2025 at 6:57 PM Anthony Yznaga
> > <anthony.yznaga@oracle.com> wrote:
> > >
> > > Memory pages shared between processes require page table entries
> > > (PTEs) for each process. Each of these PTEs consume some of
> > > the memory and as long as the number of mappings being maintained
> > > is small enough, this space consumed by page tables is not
> > > objectionable. When very few memory pages are shared between
> > > processes, the number of PTEs to maintain is mostly constrained by
> > > the number of pages of memory on the system. As the number of shared
> > > pages and the number of times pages are shared go up, the amount of
> > > memory consumed by page tables starts to become significant. This
> > > issue does not apply to threads. Any number of threads can share the
> > > same pages inside a process while sharing the same PTEs. Extending
> > > this same model to sharing pages across processes can eliminate this
> > > issue for sharing across processes as well.
> > >
> > > <snip>
> > Hi Anthony,
> >
> > Thanks for continuing to push this forward, and apologies for joining
> > this discussion late. I am likely missing some context from the
> > various previous iterations of this feature, but I'd like to throw
> > another use case into the mix to be considered around the design of
> > the sharing API.
> >
> > We are exploring a similar optimization for Android to reduce page
> > table overhead. In Android, we preload many ELF mappings in the Zygote
> > process to help application launch times. Since the Zygote model is
> > fork-but-no-exec, all applications inherit these mappings, which can
> > result in upwards of 200 MB of redundant page table overhead per
> > device.
>
> This can be solved by simply not using the Zygote model :p Or perhaps
> MADV_DONTNEED/straight up unmapping libraries you don't need in the child's
> side.

I think that's a separate topic, but that model is used on billions of
client devices :) The common runtime for apps and other core system
code is preloaded to significantly reduce app startup latencies.
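
For a rough sense of scale, here is a back-of-the-envelope sketch of
the per-process PTE cost discussed above. It assumes 4 KiB base pages
and 8-byte PTEs, counts only last-level page tables, and assumes every
page in the range is populated. The mapped size and process count are
made-up illustrative inputs, not measurements from any device, and the
function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed base page size in bytes
PTE_SIZE = 8      # assumed size of one last-level PTE in bytes


def pte_bytes(mapped_bytes: int) -> int:
    """Last-level PTE bytes one process needs to map `mapped_bytes`."""
    pages = -(-mapped_bytes // PAGE_SIZE)  # ceiling division
    return pages * PTE_SIZE


def redundant_pte_bytes(mapped_bytes: int, nr_processes: int) -> int:
    """PTE bytes duplicated across processes that map the same range.

    One copy is always needed; every further process adds another
    full copy unless the page tables themselves are shared.
    """
    if nr_processes < 1:
        return 0
    return pte_bytes(mapped_bytes) * (nr_processes - 1)


if __name__ == "__main__":
    mapped = 1 << 30  # hypothetical: 1 GiB of preloaded mappings
    procs = 100       # hypothetical: number of forked children
    print("per process:", pte_bytes(mapped) // 1024, "KiB")
    print("redundant:  ", redundant_pte_bytes(mapped, procs) >> 20, "MiB")
```

The estimate ignores upper-level tables and any ranges that are never
faulted in, so it only indicates the order of magnitude.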

>
> >
> > I believe that managing a pseudo-filesystem (msharefs) and mapping via
> > ioctl during process creation could introduce overhead that impacts
> > app startup latency. Ideally, child apps shouldn't be aware of this
> > sharing or need to manage the pseudo-filesystem on their end. To
> > achieve this "transparent" sharing, I would prefer Khalid's previous
> > API from his 2022 RFC [1]. By attaching the shared mm directly to the
> > file's address_space and exposing a MAP_SHARED_PT flag, child apps
> > could transparently inherit the shared page tables during fork().
>
> So, we've discussed this before. I initially liked this idea a lot more.
> However, there are a couple of problems here:
>
> 1) mshare (as in the mshare feature) isn't really aiming for transparency here.
> There is e.g. a specific need to set up an mshare region, with a few files/anon
> there, and then later mprotect/munmap parts of the region - and have it apply
> on every process that has it mapped. This is why we're aiming for different
> system calls (not ioctls anymore), doing munmap(mshare_reg, 4096) is ambiguous
> as to whether you want to unmap the mshare VMA, or a VMA inside the mshare mm.

Since we are interested in sharing text here, how does this interact
with things like symbolization for call stacks? I believe this is
another reason why we might want to avoid mapping the pseudo mshare
file wrapper?

>
> 2) Sharing the page table at all (even worse so, Transparently(tm)) is a huge
> pain. TLB shootdown becomes much harder, and rmap as-is isn't suited to deal
> with this case. The way things are going with mshare, the container mm will
> have one single entry in rmap, and then actually doing the shootdown is a
> huuuuge pain (which, fwiw, will probably need a per-mshare TLB workaround),
> because you need to find out and shoot down _every_ mm that has these tables

I agree the TLB shootdowns would be a pain. Perhaps, if there was a
concept of a shared ASID/PCID in the hardware, that would make things
less so ...

> mapped. And then, naturally, since you're sharing page tables, doing A/D bit
> collection on these becomes extremely useless - and that will naturally pose
> problems to the reclaim process if you abuse it.

I think in the use case I described, it would mostly be sharing
MAP_PRIVATE stuff, and the access bit should still apply for global
reclaim. However, I agree it becomes difficult to reason about,
especially if you throw memcgs into the mix.

Thanks,
Kalesh

>
> 3) other misc problems that make it hard to work transparently (VMA alignment,
> levels which you may or may not want to share, you need to revisit most page
> table walkers in the kernel to get a completely transparent feature, etc)
>
> >
> > Regarding David's and Matthew's discussion on VMA-modifying functions,
> > I would lean towards preferring the standard VMA-manipulating APIs
> > over custom ioctls, to preserve transparency for user-space.
> > Perhaps whether or not these modifications persist across all sharing
> > processes needs to be configurable? It seems that for database
> > workloads, having the updates reflected everywhere would be the
> > desired behavior. In the use case described for Android, we don't want
> > apps to be able to modify these shared ELF mappings. To handle this,
> > it's likely we would do something like mseal() the VMAs in the dynamic
> > loader before forking.
>
> mshare_mseal!
>
> >
> > Perhaps we could decouple the core sharing logic from the sharing API
> > itself? Since the sharing interface seems one of the main areas where
> > we don't have a good consensus yet, perhaps we could land the core
> > sharing logic first. Keeping the core infrastructure generic would
>
> I think the core infrastructure is relatively generic (at least the
> small core mm modifications to get this to even work) already, but
> perhaps Anthony can comment on that.
>
> --
> Pedro

