From: Adalbert Lazăr
To: linux-mm@kvack.org
Cc: Andrew Morton, Alexander Graf, Stefan Hajnoczi, Jerome Glisse,
    Paolo Bonzini, Adalbert Lazăr
Subject: [RFC PATCH 0/5] Remote mapping
Date: Thu, 3 Sep 2020 20:47:25 +0300
Message-Id: <20200903174730.2685-1-alazar@bitdefender.com>

This patchset adds support for the remote mapping feature.
Remote mapping, as its name suggests, is a means for transparent and
zero-copy access of a remote process' address space.

The feature was designed according to a specification suggested by
Paolo Bonzini:

>> The proposed API is a new pidfd system call, through which the parent
>> can map portions of its virtual address space into a file descriptor
>> and then pass that file descriptor to a child.
>>
>> This should be:
>>
>> - upstreamable, pidfd is the new cool thing and we could sell it as a
>> better way to do PTRACE_{PEEK,POKE}DATA
>>
>> - relatively easy to do based on the bitdefender remote process
>> mapping patches at.
>>
>> - pidfd_mem() takes a pidfd and some flags (which are 0) and returns
>> two file descriptors for respectively the control plane and the memory
>> access.
>>
>> - the control plane accepts three ioctls
>>
>> PIDFD_MEM_MAP takes a struct like
>>
>> struct pidfd_mem_map {
>>     uint64_t address;
>>     off_t offset;
>>     off_t size;
>>     int flags;
>>     int padding[7];
>> }
>>
>> After this is done, the memory access fd can be mmap-ed at range
>> [offset, offset+size), and it will read memory from range [address,
>> address+size) of the target descriptor.
>>
>> PIDFD_MEM_UNMAP takes a struct like
>>
>> struct pidfd_mem_unmap {
>>     off_t offset;
>>     off_t size;
>> }
>>
>> and unmaps the corresponding range of course.
>>
>> Finally PIDFD_MEM_LOCK forbids subsequent PIDFD_MEM_MAP or
>> PIDFD_MEM_UNMAP. For now I think it should just check that the
>> argument is zero, bells and whistles can be added later.
>>
>> - the memory access fd can be mmap-ed as in the bitdefender patches
>> but also accessed with read/write/pread/pwrite/... As in the
>> BitDefender patches, MMU notifiers can be used to adjust any mmap-ed
>> regions when the source address space changes. In this case,
>> PIDFD_MEM_UNMAP could also cause a pre-existing mmap to "disappear".

(The current implementation does not support read/write/pread/pwrite/...)

The main remote mapping patch also contains the legacy implementation,
which creates a region the size of the whole process address space by
means of the REMOTE_PROC_MAP ioctl. The user is then free to mmap() any
region of the address space it wishes.

VMAs obtained by mmap()ing memory access fds mirror the contents of the
remote process address space within the specified range. Pages are
installed in the current process page tables at fault time and removed by
the mmu_interval_notifier invalidate callback. No further memory
management is involved.

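As an illustration only (this is not code from the patchset), the
correspondence that PIDFD_MEM_MAP sets up between the memory access fd and
the remote address space can be modelled in plain user-space C. The names
struct map_entry and translate_offset() are hypothetical and exist only for
this sketch. It assumes each map request makes fd range
[offset, offset+size) mirror remote range [address, address+size), as in
the quoted specification, and it uses uint64_t for all fields for
simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PIDFD_MEM_MAP request: fd range
 * [offset, offset+size) mirrors remote range [address, address+size). */
struct map_entry {
    uint64_t address; /* start in the remote address space */
    uint64_t offset;  /* start within the memory access fd */
    uint64_t size;    /* length in bytes */
};

/* Translate an offset in the memory access fd to a remote address.
 * Returns false when no entry covers the offset, i.e. a hole. */
static bool translate_offset(const struct map_entry *maps, size_t n,
                             uint64_t fd_offset, uint64_t *remote)
{
    for (size_t i = 0; i < n; i++) {
        const struct map_entry *m = &maps[i];

        /* Compare via subtraction so offset + size cannot overflow. */
        if (fd_offset >= m->offset && fd_offset - m->offset < m->size) {
            *remote = m->address + (fd_offset - m->offset);
            return true;
        }
    }
    return false;
}
```

With entries {address 0x400000, offset 0x10000, size 0x1000}, fd offset
0x10000 translates to remote address 0x400000, while fd offset 0x11000
falls outside every entry and is reported as a hole.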
On attempts to access a hole, or if a mapping was removed by
PIDFD_MEM_UNMAP, or if the remote process address space was reaped by OOM,
the remote mapping fault handler returns VM_FAULT_SIGBUS.

At Bitdefender we are using remote mapping for virtual machine
introspection:
- the QEMU running the introspected machine creates the pair of file
  descriptors, passes the access fd to the introspector QEMU, and uses the
  control fd to allow access to the memslots it creates for its machine
- the QEMU running the introspector machine receives the access fd and
  mmap()s the regions made available, then hotplugs the obtained memory in
  its machine

Having this setup creates nested invalidate_range_start/end MMU notifier
calls.

Patch organization:
- patch 1 allows unmap_page_range() to run without rescheduling
  Needed for remote mapping to zap current process page tables when OOM
  calls mmu_notifier_invalidate_range_start_nonblock(&range)

- patch 2 creates VMA-specific zapping behavior
  A remote mapping VMA does not own the pages it maps, so all it has to do
  is clear the PTEs.

- patch 3 removes the MMU notifier lockdep map
  The lockdep map is incompatible with the nested notifier calls that our
  use case creates.

- patch 4 is the remote mapping implementation

- patch 5 adds the suggested pidfd_mem system call

Mircea Cirjaliu (5):
  mm: add atomic capability to zap_details
  mm: let the VMA decide how zap_pte_range() acts on mapped pages
  mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in
    nested scenarios
  mm/remote_mapping: use a pidfd to access memory belonging to unrelated
    process
  pidfd_mem: implemented remote memory mapping system call

 arch/x86/entry/syscalls/syscall_32.tbl |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 include/linux/mm.h                     |   22 +
 include/linux/mmu_notifier.h           |    5 +-
 include/linux/pid.h                    |    1 +
 include/linux/remote_mapping.h         |   22 +
 include/linux/syscalls.h               |    1 +
 include/uapi/asm-generic/unistd.h      |    2 +
 include/uapi/linux/remote_mapping.h    |   36 +
 kernel/exit.c                          |    2 +-
 kernel/pid.c                           |   55 +
 mm/Kconfig                             |   11 +
 mm/Makefile                            |    1 +
 mm/memory.c                            |  193 ++--
 mm/mmu_notifier.c                      |   19 -
 mm/remote_mapping.c                    | 1273 ++++++++++++++++++++++++
 16 files changed, 1535 insertions(+), 110 deletions(-)
 create mode 100644 include/linux/remote_mapping.h
 create mode 100644 include/uapi/linux/remote_mapping.h
 create mode 100644 mm/remote_mapping.c

CC: Christian Brauner

base-commit: ae83d0b416db002fe95601e7f97f64b59514d936