linux-mm.kvack.org archive mirror
From: Anthony Yznaga <anthony.yznaga@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: willy@infradead.org, markhemm@googlemail.com,
	viro@zeniv.linux.org.uk, david@redhat.com, khalid@kernel.org,
	jthoughton@google.com, corbet@lwn.net, dave.hansen@intel.com,
	kirill@shutemov.name, luto@kernel.org, brauner@kernel.org,
	arnd@arndb.de, ebiederm@xmission.com, catalin.marinas@arm.com,
	mingo@redhat.com, peterz@infradead.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com,
	hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev,
	tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org,
	linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mhiramat@kernel.org, rostedt@goodmis.org,
	vasily.averin@linux.dev, xhao@linux.alibaba.com, pcc@google.com,
	neilb@suse.de, maz@kernel.org
Subject: Re: [PATCH 00/20] Add support for shared PTEs across processes
Date: Mon, 27 Jan 2025 15:59:30 -0800
Message-ID: <4ac061cf-9b98-4831-9058-a3cb0e743dd6@oracle.com>
In-Reply-To: <20250127143339.b1f6b6d5586f319762c5e516@linux-foundation.org>


On 1/27/25 2:33 PM, Andrew Morton wrote:
> On Fri, 24 Jan 2025 15:54:34 -0800 Anthony Yznaga <anthony.yznaga@oracle.com> wrote:
>
>> Memory pages shared between processes require page table entries
>> (PTEs) for each process. Each of these PTEs consumes some
>> the memory and as long as the number of mappings being maintained
>> is small enough, this space consumed by page tables is not
>> objectionable. When very few memory pages are shared between
>> processes, the number of PTEs to maintain is mostly constrained by
>> the number of pages of memory on the system. As the number of shared
>> pages and the number of times pages are shared go up, the amount of
>> memory consumed by page tables starts to become significant. This
>> issue does not apply to threads. Any number of threads can share the
>> same pages inside a process while sharing the same PTEs. Extending
>> this same model to sharing pages across processes can eliminate this
>> issue for sharing across processes as well.
>>
>> ...
>>
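
To put rough numbers on the overhead described above, here is a minimal
sketch of the arithmetic. It assumes one last-level entry per mapped page
and ignores the upper page table levels; pte_bytes() is a hypothetical
helper, and the 4 KiB page / 8-byte entry figures in the note below are
the usual x86-64 values, not something taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of last-level page table entries needed when `nprocs` processes
 * each map `shared_bytes` of memory with their own private PTEs.
 * Assumes one entry of `pte_size` bytes per page of `page_size` bytes;
 * upper page table levels are ignored.
 */
static uint64_t pte_bytes(uint64_t shared_bytes, uint64_t page_size,
			  uint64_t pte_size, uint64_t nprocs)
{
	assert(page_size != 0);
	/* Round up so a partial trailing page still gets an entry. */
	uint64_t pages = (shared_bytes + page_size - 1) / page_size;

	return pages * pte_size * nprocs;
}
```

With 4 KiB pages and 8-byte entries, pte_bytes(1ULL << 30, 4096, 8, 1) is
2 MiB, while the same 1 GiB mapped by 1000 processes costs about 2000 MiB
of PTEs. With shared PTEs the cost would, in principle, stay close to the
single-process figure.
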
>> API
>> ===
>>
>> mshare does not introduce a new API. It instead uses existing APIs
>> to implement page table sharing. The steps to use this feature are:
>>
>> 1. Mount msharefs on /sys/fs/mshare -
>>          mount -t msharefs msharefs /sys/fs/mshare
>>
>> 2. mshare regions have alignment and size requirements. The start
>>     address of a region must be aligned to a fixed boundary, and the
>>     size of the region must be a multiple of that same value. The
>>     value can be obtained by reading the file
>>     /sys/fs/mshare/mshare_info, which returns a number in text
>>     format.
>>
>> 3. For the process creating an mshare region:
>>          a. Create a file on /sys/fs/mshare, for example -
>>                  fd = open("/sys/fs/mshare/shareme",
>>                                  O_RDWR|O_CREAT|O_EXCL, 0600);
>>
>>          b. Establish the starting address and size of the region
>>                  struct mshare_info minfo;
>>
>>                  minfo.start = TB(2);
>>                  minfo.size = BUFFER_SIZE;
>>                  ioctl(fd, MSHAREFS_SET_SIZE, &minfo);
>>
>>          c. Map some memory in the region
>>                  struct mshare_create mcreate;
>>
>>                  mcreate.addr = TB(2);
>>                  mcreate.size = BUFFER_SIZE;
>>                  mcreate.offset = 0;
>>                  mcreate.prot = PROT_READ | PROT_WRITE;
>>                  mcreate.flags = MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED;
>>                  mcreate.fd = -1;
>>
>>                  ioctl(fd, MSHAREFS_CREATE_MAPPING, &mcreate);
> I'm not really understanding why step a exists.  It's basically an
> mmap() so why can't this be done within step d?

One way to think of it is that step d establishes a window to the mshare 
region and the objects mapped within it.

Discussions on earlier iterations of mshare pushed back strongly on
introducing special casing in the mmap path to redirect mmaps that fell
within an mshare region to map into an mshare mm. Even then it gets
messier for munmap: does an unmap of the whole range mean unmapping the
window or unmapping the objects within it?

>
>>          d. Map the mshare region into the process
>>                  mmap((void *)TB(2), BUFFER_SIZE, PROT_READ | PROT_WRITE,
>>                          MAP_SHARED, fd, 0);
>>
>>          e. Read and write the mshare region normally.
>>
>> 4. For processes attaching an mshare region:
>>          a. Open the file on msharefs, for example -
>>                  fd = open("/sys/fs/mshare/shareme", O_RDWR);
>>
>>          b. Get information about mshare'd region from the file:
>>                  struct mshare_info minfo;
>>
>>                  ioctl(fd, MSHAREFS_GET_SIZE, &minfo);
>>
>>          c. Map the mshare'd region into the process
>>                  mmap((void *)minfo.start, minfo.size,
>>                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>
>> 5. To delete the mshare region -
>>                  unlink("/sys/fs/mshare/shareme");
>>
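
The alignment rule in step 2 can be checked in userspace before calling
MSHAREFS_SET_SIZE. The following is only a sketch: parse_mshare_align()
and mshare_range_ok() are hypothetical helper names, the TB() definition
is an assumed one for the macro used in the examples, and the text format
of mshare_info is assumed to be a single integer (decimal or 0x-prefixed
hex). The caller is expected to have read the file contents into a
NUL-terminated buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed definition of the TB() macro used in the examples. */
#define TB(x) ((uint64_t)(x) << 40)

/*
 * Parse the text read from mshare_info into a number.
 * Assumption: the file holds a single integer; base 0 lets strtoull
 * accept either decimal or 0x-prefixed hex. Validation is minimal.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_mshare_align(const char *text, uint64_t *align)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(text, &end, 0);
	if (errno != 0 || end == text || v == 0)
		return -1;
	/* Allow a trailing newline, nothing else. */
	if (*end != '\0' && *end != '\n')
		return -1;
	*align = v;
	return 0;
}

/* True if both the start address and the size are multiples of align. */
static bool mshare_range_ok(uint64_t start, uint64_t size, uint64_t align)
{
	assert(align != 0);
	return size != 0 && start % align == 0 && size % align == 0;
}
```

A creator would call mshare_range_ok(TB(2), BUFFER_SIZE, align) with the
parsed value and pick a different start or size if it returns false.
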
> The userspace interface is the thing we should initially consider.  I'm
> having ancient memories of hugetlbfs.  Over time it was seen that
> hugetlbfs was too standalone and huge pages became more (and more (and
> more (and more))) integrated into regular MM code.  Can we expect a
> similar evolution with pte-shared memory and if so, is this the correct
> interface to be starting out with?

I don't know. This is an approach that has been refined through a number 
of discussions, but I'm certainly open to alternatives.


Anthony




Thread overview: 37+ messages
2025-01-24 23:54 Anthony Yznaga
2025-01-24 23:54 ` [PATCH 01/20] mm: Add msharefs filesystem Anthony Yznaga
2025-01-25  3:13   ` Randy Dunlap
2025-01-25 20:05     ` Anthony Yznaga
2025-01-25 21:10       ` Matthew Wilcox
2025-01-27 17:01         ` Anthony Yznaga
2025-02-04  1:52   ` Bagas Sanjaya
2025-02-04 16:41     ` Anthony Yznaga
2025-01-24 23:54 ` [PATCH 02/20] mm/mshare: pre-populate msharefs with information file Anthony Yznaga
2025-01-24 23:54 ` [PATCH 03/20] mm/mshare: make msharefs writable and support directories Anthony Yznaga
2025-01-24 23:54 ` [PATCH 04/20] mm/mshare: allocate an mm_struct for msharefs files Anthony Yznaga
2025-01-24 23:54 ` [PATCH 05/20] mm/mshare: Add ioctl support Anthony Yznaga
2025-01-24 23:54 ` [PATCH 06/20] mm/mshare: Add a vma flag to indicate an mshare region Anthony Yznaga
2025-01-24 23:54 ` [PATCH 07/20] mm/mshare: Add mmap support Anthony Yznaga
2025-01-24 23:54 ` [PATCH 08/20] mm/mshare: flush all TLBs when updating PTEs in an mshare range Anthony Yznaga
2025-01-24 23:54 ` [PATCH 09/20] sched/numa: do not scan msharefs vmas Anthony Yznaga
2025-01-24 23:54 ` [PATCH 10/20] mm: add mmap_read_lock_killable_nested() Anthony Yznaga
2025-01-24 23:54 ` [PATCH 11/20] mm: add and use unmap_page_range vm_ops hook Anthony Yznaga
2025-01-24 23:54 ` [PATCH 12/20] mm/mshare: prepare for page table sharing support Anthony Yznaga
2025-01-24 23:54 ` [PATCH 13/20] x86/mm: enable page table sharing Anthony Yznaga
2025-01-24 23:54 ` [PATCH 14/20] mm: create __do_mmap() to take an mm_struct * arg Anthony Yznaga
2025-01-24 23:54 ` [PATCH 15/20] mm: pass the mm in vma_munmap_struct Anthony Yznaga
2025-01-24 23:54 ` [PATCH 16/20] mshare: add MSHAREFS_CREATE_MAPPING Anthony Yznaga
2025-01-24 23:54 ` [PATCH 17/20] mshare: add MSHAREFS_UNMAP Anthony Yznaga
2025-01-24 23:54 ` [PATCH 18/20] mm/mshare: provide a way to identify an mm as an mshare host mm Anthony Yznaga
2025-01-24 23:54 ` [PATCH 19/20] mm/mshare: get memcg from current->mm instead of mshare mm Anthony Yznaga
2025-01-24 23:54 ` [PATCH 20/20] mm/mshare: associate a mem cgroup with an mshare file Anthony Yznaga
2025-01-27 22:33 ` [PATCH 00/20] Add support for shared PTEs across processes Andrew Morton
2025-01-27 23:59   ` Anthony Yznaga [this message]
2025-01-28  9:21   ` David Hildenbrand
2025-01-28  7:11 ` Bagas Sanjaya
2025-01-28 19:53   ` Anthony Yznaga
2025-01-28  9:36 ` David Hildenbrand
2025-01-28 19:40   ` Anthony Yznaga
2025-01-29  0:11 ` Andrew Morton
2025-01-29  0:25   ` Anthony Yznaga
2025-01-29  0:59     ` Matthew Wilcox
