From: Adalbert Lazăr
Subject: Re: [RFC PATCH 0/5] Remote mapping
To: linux-mm@kvack.org
Cc: Andrew Morton, Alexander Graf, Stefan Hajnoczi, Jerome Glisse, Paolo Bonzini, Mihai Donțu, Mircea Cirjaliu
In-Reply-To: <20200903174730.2685-1-alazar@bitdefender.com>
References: <20200903174730.2685-1-alazar@bitdefender.com>
Date: Thu, 03 Sep 2020 21:08:00 +0300
Message-ID: <15991564800.2De71.9187@host>

CC+= Mihai, Mircea

On Thu, 3 Sep 2020 20:47:25 +0300, Adalbert Lazăr wrote:
> This patchset adds support for the remote mapping feature.
> Remote mapping, as its name suggests, is a means for transparent and
> zero-copy access of a remote process' address space.
>
> The feature was designed according to a specification suggested by Paolo Bonzini:
> >> The proposed API is a new pidfd system call, through which the parent
> >> can map portions of its virtual address space into a file descriptor
> >> and then pass that file descriptor to a child.
> >>
> >> This should be:
> >>
> >> - upstreamable, pidfd is the new cool thing and we could sell it as a
> >> better way to do PTRACE_{PEEK,POKE}DATA
> >>
> >> - relatively easy to do based on the bitdefender remote process
> >> mapping patches at.
> >>
> >> - pidfd_mem() takes a pidfd and some flags (which are 0) and returns
> >> two file descriptors for respectively the control plane and the memory access.
> >>
> >> - the control plane accepts three ioctls
> >>
> >> PIDFD_MEM_MAP takes a struct like
> >>
> >> struct pidfd_mem_map {
> >>     uint64_t address;
> >>     off_t offset;
> >>     off_t size;
> >>     int flags;
> >>     int padding[7];
> >> }
> >>
> >> After this is done, the memory access fd can be mmap-ed at range
> >> [offset, offset+size), and it will read memory from range [address,
> >> address+size) of the target descriptor.
> >>
> >> PIDFD_MEM_UNMAP takes a struct like
> >>
> >> struct pidfd_mem_unmap {
> >>     off_t offset;
> >>     off_t size;
> >> }
> >>
> >> and unmaps the corresponding range of course.
> >>
> >> Finally PIDFD_MEM_LOCK forbids subsequent PIDFD_MEM_MAP or
> >> PIDFD_MEM_UNMAP. For now I think it should just check that the
> >> argument is zero, bells and whistles can be added later.
> >>
> >> - the memory access fd can be mmap-ed as in the bitdefender patches
> >> but also accessed with read/write/pread/pwrite/... As in the
> >> BitDefender patches, MMU notifiers can be used to adjust any mmap-ed
> >> regions when the source address space changes. In this case,
> >> PIDFD_MEM_UNMAP could also cause a pre-existing mmap to "disappear".
> (it currently doesn't support read/write/pread/pwrite/...)
>
> The main remote mapping patch also contains the legacy implementation which
> creates a region the size of the whole process address space by means of the
> REMOTE_PROC_MAP ioctl. The user is then free to mmap() any region of the
> address space it wishes.
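
To make the control-plane semantics quoted above concrete, here is a minimal
userspace model in C. It is only a sketch: it makes no system calls, and every
name in it (ctl_model, model_map, model_unmap, model_lock, model_translate,
MAX_RANGES) is hypothetical. Rejecting overlapping ranges and requiring an
exact match on unmap are assumptions of this sketch; the specification does
not say how the kernel treats either case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16 /* arbitrary limit, chosen only for this sketch */

/* One accepted PIDFD_MEM_MAP request:
 * access-fd range [offset, offset+size) mirrors
 * remote range    [address, address+size). */
struct range_model {
    uint64_t address;
    uint64_t offset;
    uint64_t size;
};

/* Hypothetical stand-in for the state behind the control fd. */
struct ctl_model {
    struct range_model ranges[MAX_RANGES];
    size_t count;
    bool locked; /* set by the PIDFD_MEM_LOCK model */
};

/* True if [off, off+size) intersects r (no overflow handling here). */
static bool overlaps(const struct range_model *r, uint64_t off, uint64_t size)
{
    return off < r->offset + r->size && r->offset < off + size;
}

/* Model of PIDFD_MEM_MAP. Assumption: overlapping ranges are rejected. */
int model_map(struct ctl_model *c, uint64_t address, uint64_t offset,
              uint64_t size)
{
    if (c->locked || size == 0 || c->count == MAX_RANGES)
        return -1;
    for (size_t i = 0; i < c->count; i++)
        if (overlaps(&c->ranges[i], offset, size))
            return -1;
    c->ranges[c->count++] = (struct range_model){ address, offset, size };
    return 0;
}

/* Model of PIDFD_MEM_UNMAP. Assumption: only an exact match is removed. */
int model_unmap(struct ctl_model *c, uint64_t offset, uint64_t size)
{
    if (c->locked)
        return -1;
    for (size_t i = 0; i < c->count; i++) {
        if (c->ranges[i].offset == offset && c->ranges[i].size == size) {
            c->ranges[i] = c->ranges[--c->count]; /* swap-remove */
            return 0;
        }
    }
    return -1;
}

/* Model of PIDFD_MEM_LOCK: the argument must be zero, and afterwards
 * map and unmap requests are refused. */
int model_lock(struct ctl_model *c, unsigned long arg)
{
    if (arg != 0)
        return -1;
    c->locked = true;
    return 0;
}

/* Translate an access-fd offset to the remote address it mirrors.
 * Returns -1 when the offset is not covered by any mapped range. */
int model_translate(const struct ctl_model *c, uint64_t offset,
                    uint64_t *address)
{
    for (size_t i = 0; i < c->count; i++) {
        const struct range_model *r = &c->ranges[i];
        if (offset >= r->offset && offset - r->offset < r->size) {
            *address = r->address + (offset - r->offset);
            return 0;
        }
    }
    return -1;
}
```

In this model, mapping remote address A at offset 0x1000 with size 0x2000
makes offset 0x1800 translate to A + 0x800, while offset 0x3000 lies outside
the half-open range and has no translation.
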
>
> VMAs obtained by mmap()ing memory access fds mirror the contents of the remote
> process address space within the specified range. Pages are installed in the
> current process page tables at fault time and removed by the mmu_interval_notifier
> invalidate callback. No further memory management is involved.
> On attempts to access a hole, or if a mapping was removed by PIDFD_MEM_UNMAP,
> or if the remote process address space was reaped by OOM, the remote mapping
> fault handler returns VM_FAULT_SIGBUS.
>
> At Bitdefender we are using remote mapping for virtual machine introspection:
> - the QEMU running the introspected machine creates the pair of file descriptors,
>   passes the access fd to the introspector QEMU, and uses the control fd to allow
>   access to the memslots it creates for its machine
> - the QEMU running the introspector machine receives the access fd and mmap()s
>   the regions made available, then hotplugs the obtained memory in its machine
> Having this setup creates nested invalidate_range_start/end MMU notifier calls.
>
> Patch organization:
> - patch 1 allows unmap_page_range() to run without rescheduling
>   Needed for remote mapping to zap current process page tables when OOM calls
>   mmu_notifier_invalidate_range_start_nonblock(&range)
>
> - patch 2 creates VMA-specific zapping behavior
>   A remote mapping VMA does not own the pages it maps, so all it has to do is
>   clear the PTEs.
>
> - patch 3 removes the MMU notifier lockdep map
>   It is incompatible with our use case, which produces nested notifier calls.
>
> - patch 4 is the remote mapping implementation
>
> - patch 5 adds the suggested pidfd_mem system call
>
> Mircea Cirjaliu (5):
>   mm: add atomic capability to zap_details
>   mm: let the VMA decide how zap_pte_range() acts on mapped pages
>   mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in
>     nested scenarios
>   mm/remote_mapping: use a pidfd to access memory belonging to unrelated
>     process
>   pidfd_mem: implemented remote memory mapping system call
>
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 +
>  include/linux/mm.h                     |   22 +
>  include/linux/mmu_notifier.h           |    5 +-
>  include/linux/pid.h                    |    1 +
>  include/linux/remote_mapping.h         |   22 +
>  include/linux/syscalls.h               |    1 +
>  include/uapi/asm-generic/unistd.h      |    2 +
>  include/uapi/linux/remote_mapping.h    |   36 +
>  kernel/exit.c                          |    2 +-
>  kernel/pid.c                           |   55 +
>  mm/Kconfig                             |   11 +
>  mm/Makefile                            |    1 +
>  mm/memory.c                            |  193 ++--
>  mm/mmu_notifier.c                      |   19 -
>  mm/remote_mapping.c                    | 1273 ++++++++++++++++++++++++
>  16 files changed, 1535 insertions(+), 110 deletions(-)
>  create mode 100644 include/linux/remote_mapping.h
>  create mode 100644 include/uapi/linux/remote_mapping.h
>  create mode 100644 mm/remote_mapping.c
>
>
> CC: Christian Brauner
> base-commit: ae83d0b416db002fe95601e7f97f64b59514d936