linux-mm.kvack.org archive mirror
From: "Adalbert Lazăr" <alazar@bitdefender.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Graf <graf@amazon.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mihai Donțu <mdontu@bitdefender.com>,
	Mircea Cirjaliu <mcirjaliu@bitdefender.com>
Subject: Re: [RFC PATCH 0/5] Remote mapping
Date: Thu, 03 Sep 2020 21:08:00 +0300
Message-ID: <15991564800.2De71.9187@host>
In-Reply-To: <20200903174730.2685-1-alazar@bitdefender.com>

CC+= Mihai, Mircea

On Thu,  3 Sep 2020 20:47:25 +0300, Adalbert Lazăr <alazar@bitdefender.com> wrote:
> This patchset adds support for the remote mapping feature.
> Remote mapping, as its name suggests, is a means for transparent and
> zero-copy access of a remote process' address space.
> 
> The feature was designed according to a specification suggested by Paolo Bonzini:
> >> The proposed API is a new pidfd system call, through which the parent
> >> can map portions of its virtual address space into a file descriptor
> >> and then pass that file descriptor to a child.
> >>
> >> This should be:
> >>
> >> - upstreamable, pidfd is the new cool thing and we could sell it as a
> >> better way to do PTRACE_{PEEK,POKE}DATA
> >>
> >> - relatively easy to do based on the bitdefender remote process
> >> mapping patches at.
> >>
> >> - pidfd_mem() takes a pidfd and some flags (which are 0) and returns
> >> two file descriptors for respectively the control plane and the memory access.
> >>
> >> - the control plane accepts three ioctls
> >>
> >> PIDFD_MEM_MAP takes a struct like
> >>
> >>     struct pidfd_mem_map {
> >>          uint64_t address;
> >>          off_t offset;
> >>          off_t size;
> >>          int flags;
> >>          int padding[7];
> >>     }
> >>
> >> After this is done, the memory access fd can be mmap-ed at range
> >> [offset, offset+size), and it will read memory from range
> >> [address, address+size) of the target descriptor.
> >>
> >> PIDFD_MEM_UNMAP takes a struct like
> >>
> >>     struct pidfd_mem_unmap {
> >>          off_t offset;
> >>          off_t size;
> >>     }
> >>
> >> and unmaps the corresponding range of course.
> >>
> >> Finally PIDFD_MEM_LOCK forbids subsequent PIDFD_MEM_MAP or
> >> PIDFD_MEM_UNMAP.  For now I think it should just check that the
> >> argument is zero, bells and whistles can be added later.
> >>
> >> - the memory access fd can be mmap-ed as in the bitdefender patches
> >> but also accessed with read/write/pread/pwrite/...  As in the
> >> BitDefender patches, MMU notifiers can be used to adjust any mmap-ed
> >> regions when the source address space changes.  In this case,
> >> PIDFD_MEM_UNMAP could also cause a pre-existing mmap to "disappear".
> (it currently doesn't support read/write/pread/pwrite/...)
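
To make the control-plane semantics above concrete, here is a small
userspace model of the bookkeeping that PIDFD_MEM_MAP, PIDFD_MEM_UNMAP and
PIDFD_MEM_LOCK imply. It is only a sketch, not the code from the patches:
the pm_* names, the fixed-size table, the rejection of overlapping ranges
and the exact-match unmap are all assumptions made for illustration, and
address arithmetic overflow is not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PM_MAX_RANGES 16 /* arbitrary limit for this sketch */

/* One mapped window: [offset, offset+size) -> [address, address+size). */
struct pm_range {
    uint64_t address; /* start in the target address space */
    uint64_t offset;  /* start in the memory access fd */
    uint64_t size;
};

/* Hypothetical control-plane state; not a kernel structure. */
struct pm_table {
    struct pm_range ranges[PM_MAX_RANGES];
    size_t count;
    bool locked; /* set by the model of PIDFD_MEM_LOCK */
};

static bool pm_overlaps(const struct pm_range *r, uint64_t offset, uint64_t size)
{
    return offset < r->offset + r->size && r->offset < offset + size;
}

/* Model of PIDFD_MEM_MAP. Assumption: overlapping windows are refused. */
int pm_map(struct pm_table *t, uint64_t address, uint64_t offset, uint64_t size)
{
    if (t->locked || size == 0 || t->count == PM_MAX_RANGES)
        return -1;
    for (size_t i = 0; i < t->count; i++)
        if (pm_overlaps(&t->ranges[i], offset, size))
            return -1;
    t->ranges[t->count++] = (struct pm_range){ address, offset, size };
    return 0;
}

/* Model of PIDFD_MEM_UNMAP. Assumption: only an exact window can be removed. */
int pm_unmap(struct pm_table *t, uint64_t offset, uint64_t size)
{
    if (t->locked)
        return -1;
    for (size_t i = 0; i < t->count; i++) {
        if (t->ranges[i].offset == offset && t->ranges[i].size == size) {
            t->ranges[i] = t->ranges[--t->count];
            return 0;
        }
    }
    return -1;
}

/* Model of PIDFD_MEM_LOCK: later map and unmap requests fail. */
void pm_lock(struct pm_table *t)
{
    t->locked = true;
}

/* Resolve an fd offset to a target address; -1 means the offset is a hole. */
int pm_translate(const struct pm_table *t, uint64_t offset, uint64_t *address)
{
    assert(address != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const struct pm_range *r = &t->ranges[i];

        if (offset >= r->offset && offset - r->offset < r->size) {
            *address = r->address + (offset - r->offset);
            return 0;
        }
    }
    return -1;
}
```

In this model a caller starts from a zeroed struct pm_table, adds windows
with pm_map(), and resolves an offset with pm_translate(); an offset that
falls outside every window resolves to nothing, which is the case where a
real access through the memory access fd would be expected to fail.
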
> 
> The main remote mapping patch also contains the legacy implementation which
> creates a region the size of the whole process address space by means of the
> REMOTE_PROC_MAP ioctl. The user is then free to mmap() any region of the
> address space it wishes.
> 
> VMAs obtained by mmap()ing memory access fds mirror the contents of the remote
> process address space within the specified range. Pages are installed in the
> current process page tables at fault time and removed by the mmu_interval_notifier
> invalidate callback. No further memory management is involved.
> On attempts to access a hole, or if a mapping was removed by PIDFD_MEM_UNMAP,
> or if the remote process address space was reaped by OOM, the remote mapping
> fault handler returns VM_FAULT_SIGBUS.
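
Since a VM_FAULT_SIGBUS fault reaches userspace as SIGBUS, a consumer of
such a mapping may want to guard its accesses. The sketch below shows one
common way to do that with sigsetjmp()/siglongjmp(). It does not use the
remote mapping API: as a stand-in it maps an ordinary temporary file and
truncates it, which also makes the next access raise SIGBUS. The guard_*
and demo_* names are hypothetical, and the guard is single-threaded only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf guard_env;
static volatile sig_atomic_t guard_armed;

static void guard_on_sigbus(int sig)
{
    (void)sig;
    if (guard_armed)
        siglongjmp(guard_env, 1); /* unwind to guarded_copy() */
    signal(SIGBUS, SIG_DFL);      /* not ours: the retried access kills us */
}

int guard_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = guard_on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copy len bytes from src; returns 0 on success, -1 if SIGBUS was raised. */
int guarded_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const volatile unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    if (sigsetjmp(guard_env, 1) != 0) { /* 1: save and restore the mask */
        guard_armed = 0;
        return -1;
    }
    guard_armed = 1;
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    guard_armed = 0;
    return 0;
}

/*
 * Stand-in for a mapping whose backing goes away: read a file mapping,
 * truncate the file, read again. Returns 0 if the first read succeeded
 * and the second was stopped by SIGBUS, -1 if not, -2 on setup failure.
 */
int demo_truncated_mapping(void)
{
    char path[] = "/tmp/remote-map-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    unsigned char byte;
    int fd, before, after;
    void *p;

    if (page <= 0 || guard_install() != 0)
        return -2;
    fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -2;
    }
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -2;
    }
    before = guarded_copy(&byte, p, 1); /* backed: expected to succeed */
    if (ftruncate(fd, 0) != 0)
        after = -2;
    else
        after = guarded_copy(&byte, p, 1); /* past EOF: SIGBUS expected */
    munmap(p, (size_t)page);
    close(fd);
    return (before == 0 && after == -1) ? 0 : -1;
}
```

The same guarded_copy() pattern could wrap reads from a remote mapping,
under the assumption that holes, unmapped ranges and a reaped target all
surface as SIGBUS as described above.
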
> 
> At Bitdefender we are using remote mapping for virtual machine introspection:
> - the QEMU running the introspected machine creates the pair of file descriptors,
> passes the access fd to the introspector QEMU, and uses the control fd to allow
> access to the memslots it creates for its machine
> - the QEMU running the introspector machine receives the access fd and mmap()s
> the regions made available, then hotplugs the obtained memory in its machine
> Having this setup creates nested invalidate_range_start/end MMU notifier calls.
> 
> Patch organization:
> - patch 1 allows unmap_page_range() to run without rescheduling
>   Needed for remote mapping to zap current process page tables when OOM calls
>   mmu_notifier_invalidate_range_start_nonblock(&range)
> 
> - patch 2 creates VMA-specific zapping behavior
>   A remote mapping VMA does not own the pages it maps, so all it has to do is
>   clear the PTEs.
> 
> - patch 3 removes the MMU notifier lockdep map
>   It was just incompatible with our use case.
> 
> - patch 4 is the remote mapping implementation
> 
> - patch 5 adds the suggested pidfd_mem system call
> 
> Mircea Cirjaliu (5):
>   mm: add atomic capability to zap_details
>   mm: let the VMA decide how zap_pte_range() acts on mapped pages
>   mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in
>     nested scenarios
>   mm/remote_mapping: use a pidfd to access memory belonging to unrelated
>     process
>   pidfd_mem: implemented remote memory mapping system call
> 
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 +
>  include/linux/mm.h                     |   22 +
>  include/linux/mmu_notifier.h           |    5 +-
>  include/linux/pid.h                    |    1 +
>  include/linux/remote_mapping.h         |   22 +
>  include/linux/syscalls.h               |    1 +
>  include/uapi/asm-generic/unistd.h      |    2 +
>  include/uapi/linux/remote_mapping.h    |   36 +
>  kernel/exit.c                          |    2 +-
>  kernel/pid.c                           |   55 +
>  mm/Kconfig                             |   11 +
>  mm/Makefile                            |    1 +
>  mm/memory.c                            |  193 ++--
>  mm/mmu_notifier.c                      |   19 -
>  mm/remote_mapping.c                    | 1273 ++++++++++++++++++++++++
>  16 files changed, 1535 insertions(+), 110 deletions(-)
>  create mode 100644 include/linux/remote_mapping.h
>  create mode 100644 include/uapi/linux/remote_mapping.h
>  create mode 100644 mm/remote_mapping.c
> 
> 
> CC:Christian Brauner <christian@brauner.io>
> base-commit: ae83d0b416db002fe95601e7f97f64b59514d936


Thread overview: 9+ messages
2020-09-03 17:47 Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 1/5] mm: add atomic capability to zap_details Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 2/5] mm: let the VMA decide how zap_pte_range() acts on mapped pages Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 3/5] mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in nested scenarios Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 4/5] mm/remote_mapping: use a pidfd to access memory belonging to unrelated process Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 5/5] pidfd_mem: implemented remote memory mapping system call Adalbert Lazăr
2020-09-03 18:08 ` Adalbert Lazăr [this message]
2020-09-04  9:54 ` [RFC PATCH 0/5] Remote mapping Christian Brauner
2020-09-04 11:34   ` Adalbert Lazăr
