From: "Adalbert Lazăr" <alazar@bitdefender.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Alexander Graf <graf@amazon.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Jerome Glisse <jglisse@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Mihai Donțu <mdontu@bitdefender.com>,
Mircea Cirjaliu <mcirjaliu@bitdefender.com>
Subject: Re: [RFC PATCH 0/5] Remote mapping
Date: Thu, 03 Sep 2020 21:08:00 +0300
Message-ID: <15991564800.2De71.9187@host>
In-Reply-To: <20200903174730.2685-1-alazar@bitdefender.com>
CC+= Mihai, Mircea
On Thu, 3 Sep 2020 20:47:25 +0300, Adalbert Lazăr <alazar@bitdefender.com> wrote:
> This patchset adds support for the remote mapping feature.
> Remote mapping, as its name suggests, is a means for transparent and
> zero-copy access of a remote process' address space.
>
> The feature was designed according to a specification suggested by Paolo Bonzini:
> >> The proposed API is a new pidfd system call, through which the parent
> >> can map portions of its virtual address space into a file descriptor
> >> and then pass that file descriptor to a child.
> >>
> >> This should be:
> >>
> >> - upstreamable, pidfd is the new cool thing and we could sell it as a
> >> better way to do PTRACE_{PEEK,POKE}DATA
> >>
> >> - relatively easy to do based on the bitdefender remote process
> >> mapping patches at.
> >>
> >> - pidfd_mem() takes a pidfd and some flags (which are 0) and returns
> >> two file descriptors for respectively the control plane and the memory access.
> >>
> >> - the control plane accepts three ioctls
> >>
> >> PIDFD_MEM_MAP takes a struct like
> >>
> >> struct pidfd_mem_map {
> >>         uint64_t address;
> >>         off_t offset;
> >>         off_t size;
> >>         int flags;
> >>         int padding[7];
> >> };
> >>
> >> After this is done, the memory access fd can be mmap-ed at range
> >> [offset, offset+size), and it will read memory from range
> >> [address, address+size) of the target descriptor.
> >>
> >> PIDFD_MEM_UNMAP takes a struct like
> >>
> >> struct pidfd_mem_unmap {
> >>         off_t offset;
> >>         off_t size;
> >> };
> >>
> >> and unmaps the corresponding range of course.
> >>
> >> Finally PIDFD_MEM_LOCK forbids subsequent PIDFD_MEM_MAP or
> >> PIDFD_MEM_UNMAP. For now I think it should just check that the
> >> argument is zero, bells and whistles can be added later.
> >>
> >> - the memory access fd can be mmap-ed as in the bitdefender patches
> >> but also accessed with read/write/pread/pwrite/... As in the
> >> BitDefender patches, MMU notifiers can be used to adjust any mmap-ed
> >> regions when the source address space changes. In this case,
> >> PIDFD_MEM_UNMAP could also cause a pre-existing mmap to "disappear".
> (it currently doesn't support read/write/pread/pwrite/...)
>
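To make the offset/address relationship in the quoted proposal concrete, here is a minimal userspace sketch. It assumes the struct layout exactly as quoted above (this is the proposed layout, not a merged kernel ABI), and pidfd_mem_translate() is a hypothetical helper name used only for illustration. It shows how an offset inside the memory access fd window [offset, offset+size) would correspond to an address in [address, address+size) of the target.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Layout copied from the proposal quoted above; not a merged kernel ABI. */
struct pidfd_mem_map {
	uint64_t address;	/* start of the range in the target address space */
	off_t offset;		/* start of the window in the memory access fd */
	off_t size;		/* length of the window in bytes */
	int flags;
	int padding[7];
};

/*
 * Hypothetical helper: translate an offset in the memory access fd into the
 * target address it would mirror. Returns false when file_off lies outside
 * the window [offset, offset + size).
 */
bool pidfd_mem_translate(const struct pidfd_mem_map *map, off_t file_off,
			 uint64_t *remote_addr)
{
	if (map->size <= 0 || file_off < map->offset)
		return false;
	if (file_off - map->offset >= map->size)
		return false;

	*remote_addr = map->address + (uint64_t)(file_off - map->offset);
	return true;
}
```

With address = 0x7f0000000000, offset = 0x1000 and size = 0x2000, file offset 0x1000 translates to 0x7f0000000000, while 0x3000 is rejected because the window is half-open.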
> The main remote mapping patch also contains the legacy implementation which
> creates a region the size of the whole process address space by means of the
> REMOTE_PROC_MAP ioctl. The user is then free to mmap() any region of the
> address space it wishes.
>
> VMAs obtained by mmap()ing memory access fds mirror the contents of the remote
> process address space within the specified range. Pages are installed in the
> current process page tables at fault time and removed by the mmu_interval_notifier
> invalidate callback. No further memory management is involved.
> On attempts to access a hole, or if a mapping was removed by PIDFD_MEM_UNMAP,
> or if the remote process address space was reaped by OOM, the remote mapping
> fault handler returns VM_FAULT_SIGBUS.
>
> At Bitdefender we are using remote mapping for virtual machine introspection:
> - the QEMU running the introspected machine creates the pair of file descriptors,
> passes the access fd to the introspector QEMU, and uses the control fd to allow
> access to the memslots it creates for its machine
> - the QEMU running the introspector machine receives the access fd and mmap()s
> the regions made available, then hotplugs the obtained memory in its machine
> Having this setup creates nested invalidate_range_start/end MMU notifier calls.
>
> Patch organization:
> - patch 1 allows unmap_page_range() to run without rescheduling
> Needed for remote mapping to zap current process page tables when OOM calls
> mmu_notifier_invalidate_range_start_nonblock(&range)
>
> - patch 2 creates VMA-specific zapping behavior
> A remote mapping VMA does not own the pages it maps, so all it has to do is
> clear the PTEs.
>
> - patch 3 removes the MMU notifier lockdep map
> It is incompatible with our use case, which needs nested MMU notifier calls.
>
> - patch 4 is the remote mapping implementation
>
> - patch 5 adds the suggested pidfd_mem() system call
>
> Mircea Cirjaliu (5):
> mm: add atomic capability to zap_details
> mm: let the VMA decide how zap_pte_range() acts on mapped pages
> mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in
> nested scenarios
> mm/remote_mapping: use a pidfd to access memory belonging to unrelated
> process
> pidfd_mem: implemented remote memory mapping system call
>
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/mm.h | 22 +
> include/linux/mmu_notifier.h | 5 +-
> include/linux/pid.h | 1 +
> include/linux/remote_mapping.h | 22 +
> include/linux/syscalls.h | 1 +
> include/uapi/asm-generic/unistd.h | 2 +
> include/uapi/linux/remote_mapping.h | 36 +
> kernel/exit.c | 2 +-
> kernel/pid.c | 55 +
> mm/Kconfig | 11 +
> mm/Makefile | 1 +
> mm/memory.c | 193 ++--
> mm/mmu_notifier.c | 19 -
> mm/remote_mapping.c | 1273 ++++++++++++++++++++++++
> 16 files changed, 1535 insertions(+), 110 deletions(-)
> create mode 100644 include/linux/remote_mapping.h
> create mode 100644 include/uapi/linux/remote_mapping.h
> create mode 100644 mm/remote_mapping.c
>
>
> CC:Christian Brauner <christian@brauner.io>
> base-commit: ae83d0b416db002fe95601e7f97f64b59514d936
Thread overview: 9+ messages
2020-09-03 17:47 Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 1/5] mm: add atomic capability to zap_details Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 2/5] mm: let the VMA decide how zap_pte_range() acts on mapped pages Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 3/5] mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in nested scenarios Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 4/5] mm/remote_mapping: use a pidfd to access memory belonging to unrelated process Adalbert Lazăr
2020-09-03 17:47 ` [RFC PATCH 5/5] pidfd_mem: implemented remote memory mapping system call Adalbert Lazăr
2020-09-03 18:08 ` Adalbert Lazăr [this message]
2020-09-04 9:54 ` [RFC PATCH 0/5] Remote mapping Christian Brauner
2020-09-04 11:34 ` Adalbert Lazăr