From: Khalid Aziz <khalid.aziz@oracle.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Aneesh Kumar <aneesh.kumar@linux.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>, Jonathan Corbet <corbet@lwn.net>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	ebiederm@xmission.com, hagen@jauu.net, jack@suse.cz,
	Kees Cook <keescook@chromium.org>,
	kirill@shutemov.name, kucharsk@gmail.com, linkinjeon@kernel.org,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	longpeng2@huawei.com, Andy Lutomirski <luto@kernel.org>,
	markhemm@googlemail.com, pcc@google.com,
	Mike Rapoport <rppt@kernel.org>,
	sieberf@amazon.com, sjpark@amazon.de,
	Suren Baghdasaryan <surenb@google.com>,
	tst@schoebel-theuer.de, Iurii Zaikin <yzaikin@google.com>
Subject: Re: [PATCH v1 08/14] mm/mshare: Add basic page table sharing using mshare
Date: Tue, 28 Jun 2022 14:11:41 -0600
Message-ID: <33d1edc7-0846-5ff4-7311-06ceed353972@oracle.com>
In-Reply-To: <CAGsJ_4xPFu5FCQtNE6cJxbV7kMQXNtzotBFQKC3OkXUOtweyYQ@mail.gmail.com>

On 5/30/22 05:11, Barry Song wrote:
> On Tue, Apr 12, 2022 at 4:07 AM Khalid Aziz <khalid.aziz@oracle.com> wrote:
>>
>>
>> @@ -193,6 +226,8 @@ SYSCALL_DEFINE5(mshare, const char __user *, name, unsigned long, addr,
>>          if (IS_ERR(fname))
>>                  goto err_out;
>>
>> +       end = addr + len;
>> +
>>          /*
>>           * Does this mshare entry exist already? If it does, calling
>>           * mshare with O_EXCL|O_CREAT is an error
>> @@ -205,49 +240,165 @@ SYSCALL_DEFINE5(mshare, const char __user *, name, unsigned long, addr,
>>          inode_lock(d_inode(msharefs_sb->s_root));
>>          dentry = d_lookup(msharefs_sb->s_root, &namestr);
>>          if (dentry && (oflag & (O_EXCL|O_CREAT))) {
>> +               inode = d_inode(dentry);
>>                  err = -EEXIST;
>>                  dput(dentry);
>>                  goto err_unlock_inode;
>>          }
>>
>>          if (dentry) {
>> +               unsigned long mapaddr, prot = PROT_NONE;
>> +
>>                  inode = d_inode(dentry);
>>                  if (inode == NULL) {
>> +                       mmap_write_unlock(current->mm);
>>                          err = -EINVAL;
>>                          goto err_out;
>>                  }
>>                  info = inode->i_private;
>> -               refcount_inc(&info->refcnt);
>>                  dput(dentry);
>> +
>> +               /*
>> +                * Map in the address range as anonymous mappings
>> +                */
>> +               oflag &= (O_RDONLY | O_WRONLY | O_RDWR);
>> +               if (oflag & O_RDONLY)
>> +                       prot |= PROT_READ;
>> +               else if (oflag & O_WRONLY)
>> +                       prot |= PROT_WRITE;
>> +               else if (oflag & O_RDWR)
>> +                       prot |= (PROT_READ | PROT_WRITE);
>> +               mapaddr = vm_mmap(NULL, addr, len, prot,
>> +                               MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, 0);
> 
>  From the perspective of hardware, do we have to use MAP_FIXED to make
> sure those processes sharing PTE
> use the same virtual address for the shared area? or actually we don't
> necessarily need it? as long as the
> upper level pgtable entries point to the same lower level pgtable?

Hi Barry,

Sorry, I didn't mean to ignore this. I was out of commission for the last few weeks.

All processes sharing an mshare region must use the same virtual address; otherwise the page table entries for those
processes won't be identical and hence cannot be shared. The upper bits of a virtual address provide the index into
each level of the page directories. It may be possible to manipulate the various page directories to allow different
virtual addresses across processes and still get the hardware page table walk to work correctly, but that would be
complex and potentially error-prone.

Thanks,
Khalid
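
[Editorial illustration, not part of the original message.] The point about the upper bits of a virtual address indexing each directory level can be sketched in a few lines of C. This is a minimal sketch that assumes x86-64 4-level paging with 4 KiB pages (a 12-bit page offset and 9 index bits per level); the names pt_index() and same_slots() are hypothetical helpers invented for this example and are not kernel APIs or part of the mshare patch.

```c
#include <stdint.h>

/*
 * Assumption: x86-64 4-level paging with 4 KiB pages.
 * Each table has 512 entries (9 index bits); the low 12 bits
 * of the address are the offset within the page.
 */
#define PAGE_SHIFT     12
#define BITS_PER_LEVEL 9
#define NUM_LEVELS     4   /* 0 = PTE, 1 = PMD, 2 = PUD, 3 = PGD */

/* Hypothetical helper: slot selected in the table at `level` by vaddr. */
static unsigned int pt_index(uint64_t vaddr, int level)
{
    return (unsigned int)((vaddr >> (PAGE_SHIFT + BITS_PER_LEVEL * level))
                          & ((1u << BITS_PER_LEVEL) - 1));
}

/*
 * Hypothetical helper: returns 1 if two virtual addresses select the
 * same slot at every level from the top (PGD) down to `lowest_level`.
 * If they do not, the two walks go through different directory
 * entries, so the tables cannot simply be shared as they are.
 */
static int same_slots(uint64_t a, uint64_t b, int lowest_level)
{
    for (int level = NUM_LEVELS - 1; level >= lowest_level; level--) {
        if (pt_index(a, level) != pt_index(b, level))
            return 0;
    }
    return 1;
}
```

Under these assumptions, 0x7f0000200000 and the same address plus 1 GiB differ in their PUD index, so same_slots() reports a mismatch, while two addresses 4 KiB apart still agree at every level above the PTE level.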


Thread overview: 37+ messages
2022-04-11 16:05 [PATCH v1 00/14] Add support for shared PTEs across processes Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 01/14] mm: Add new system calls mshare, mshare_unlink Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 02/14] mm/mshare: Add msharefs filesystem Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 03/14] mm/mshare: Add read for msharefs Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 04/14] mm/mshare: implement mshare_unlink syscall Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 05/14] mm/mshare: Add locking to msharefs syscalls Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 06/14] mm/mshare: Check for mounted filesystem Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 07/14] mm/mshare: Add vm flag for shared PTE Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 08/14] mm/mshare: Add basic page table sharing using mshare Khalid Aziz
2022-04-11 18:48   ` Dave Hansen
2022-04-11 20:39     ` Khalid Aziz
2022-05-30 11:11   ` Barry Song
2022-06-28 20:11     ` Khalid Aziz [this message]
2022-05-31  3:46   ` Barry Song
2022-06-28 20:16     ` Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 09/14] mm/mshare: Do not free PTEs for mshare'd PTEs Khalid Aziz
2022-05-31  4:24   ` Barry Song
2022-06-29 17:38     ` Khalid Aziz
2022-07-03 20:54       ` Andy Lutomirski
2022-07-06 20:33         ` Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 10/14] mm/mshare: Check for mapped vma when mshare'ing existing mshare'd range Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 11/14] mm/mshare: unmap vmas in mshare_unlink Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 12/14] mm/mshare: Add a proc file with mshare alignment/size information Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 13/14] mm/mshare: Enforce mshare'd region permissions Khalid Aziz
2022-04-11 16:05 ` [PATCH v1 14/14] mm/mshare: Copy PTEs to host mm Khalid Aziz
2022-04-11 17:37 ` [PATCH v1 00/14] Add support for shared PTEs across processes Matthew Wilcox
2022-04-11 18:51   ` Dave Hansen
2022-04-11 19:08     ` Matthew Wilcox
2022-04-11 19:52   ` Khalid Aziz
2022-04-11 18:47 ` Dave Hansen
2022-04-11 20:10 ` Eric W. Biederman
2022-04-11 22:21   ` Khalid Aziz
2022-05-30 10:48 ` Barry Song
2022-05-30 11:18   ` David Hildenbrand
2022-05-30 11:49     ` Barry Song
2022-06-29 17:48     ` Khalid Aziz
2022-06-29 17:40   ` Khalid Aziz
