From: Pedro Falcato <pfalcato@suse.de>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Anthony Yznaga <anthony.yznaga@oracle.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	 andreyknvl@gmail.com, arnd@arndb.de, bp@alien8.de,
	brauner@kernel.org,  bsegall@google.com, corbet@lwn.net,
	dave.hansen@linux.intel.com,  dietmar.eggemann@arm.com,
	ebiederm@xmission.com, hpa@zytor.com, jakub.wartak@mailbox.org,
	 jannh@google.com, juri.lelli@redhat.com, khalid@kernel.org,
	 liam.howlett@oracle.com, linyongting@bytedance.com,
	lorenzo.stoakes@oracle.com,  luto@kernel.org,
	markhemm@googlemail.com, maz@kernel.org, mhiramat@kernel.org,
	 mgorman@suse.de, mhocko@suse.com, mingo@redhat.com,
	muchun.song@linux.dev,  neilb@suse.de, osalvador@suse.de,
	pcc@google.com, peterz@infradead.org,  rostedt@goodmis.org,
	rppt@kernel.org, shakeel.butt@linux.dev, surenb@google.com,
	 tglx@linutronix.de, vasily.averin@linux.dev, vbabka@suse.cz,
	 vincent.guittot@linaro.org, viro@zeniv.linux.org.uk,
	vschneid@redhat.com,  willy@infradead.org, x86@kernel.org,
	xhao@linux.alibaba.com,  linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	 Isaac Manjarres <isaacmanjarres@google.com>,
	"T.J. Mercier" <tjmercier@google.com>,
	 android-mm <android-mm@google.com>
Subject: Re: [PATCH v3 00/22] Add support for shared PTEs across processes
Date: Thu, 26 Feb 2026 21:22:06 +0000
Message-ID: <5tdailzxoywzzunbwhtlk4yjfmzunntniqtudkb52q6hib74ql@oq4mi226dedv>
In-Reply-To: <CAC_TJvdgvyjyJsU4v6W+3tHKx_2e8UMJU3RT2HKLSngcC+yH3Q@mail.gmail.com>

On Wed, Feb 25, 2026 at 03:06:10PM -0800, Kalesh Singh wrote:
> On Tue, Feb 24, 2026 at 1:40 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > > I believe that managing a pseudo-filesystem (msharefs) and mapping via
> > > ioctl during process creation could introduce overhead that impacts
> > > app startup latency. Ideally, child apps shouldn't be aware of this
> > > sharing or need to manage the pseudo-filesystem on their end.
> > All processes must be aware of these special semantics.
> >
> > I'd assume that fork() would simply replicate mshare region into the
> > fork'ed child process. So from that point of view, it's "transparent" as
> > in "no special mshare() handling required after fork".
> 
> Hi David,
> 
> That's a good point. If fork() simply replicates the mshare region, it
> does achieve transparency in terms of setup.
> 
> I am still concerned about transparency in terms of observability.
> Applications sometimes inspect their own mappings (from
> /proc/self/maps) to locate specific code or data regions for various
> anti-tamper and obfuscation techniques. [2] If those mappings suddenly
> point to an msharefs pseudo-file instead of the expected shared
> library backing, it may break user-space assumptions and cause
> compatibility issues.

I'm not worried about transparency because this is not supposed to be
transparent. This is not supposed to be used by most core system software.
This is supposed to help replace hugetlb page table sharing.

Transparent page table sharing has other constraints. I like the idea in
theory, but a number of them make it infeasible for now. These are the
problems we would need to solve first:

1) Every spot where we modify PTEs needs to be assessed and switched to
different helpers (that can un-COW page tables). Every pte_offset_map_lock() can
now feasibly fail for OOM reasons (and that also needs to be assessed).

2) Various bits of PTE modification/unmapping now need special care wrt TLB
invalidation. The kernel needs to be aware of how the page tables are shared.
I don't think the current rmap data structures are well suited to this kind
of stuff (perhaps with Lorenzo's WIP anon rmap rework we'll get something
better). Basically every spot that goes "modify PTE, flush TLB for mm" now
needs to go "modify PTE, for every mm that maps this page table, flush $mm"
(if you're thinking that COW will save us, it technically won't, or shouldn't,
because of stuff like try_to_unmap_one() that is used in reclaim).

3) Reclaim loses even more information as now N processes share the same A
bits. I don't know what effects this can cause. It would require
experimentation. Perhaps something like "if page table is shared, value
pte_young more". I don't know if this can work as a bandaid, but it's not
ideal.

4) It's not known whether page table COW fork() is a real win in most cases,
or all cases. Would want measurement.

5) It becomes even harder to estimate RSS and PSS for each process.

For these reasons (and more, certainly), I don't think working mshare() into
a transparent, all-great thing that fits the zygote model can work. It has been
discussed at length how to pull off certain hard bits like TLB invalidation and
locking for mshare, and with mshare we have the advantage of not needing to
support every feature ever (tailoring it more to the big database users of
hugetlb). And we'll still need to adapt certain bits of arch code just to get
it to work efficiently.

This said, if you want to discuss pulling this off, I'm all ears and it could
perhaps be a fun discussion (too late for LSF, I guess), but I don't think
it's workable into the current mshare efforts. And, believe me, I would love
a unified feature here :)

-- 
Pedro

