From: Andy Lutomirski
Date: Mon, 7 Sep 2020 13:43:48 -0700
Subject: Re: [RESEND RFC PATCH 0/5] Remote mapping
To: Christian Brauner
Cc: Adalbert Lazăr, Linux-MM, Linux API, Andrew Morton, Alexander Graf,
    Stefan Hajnoczi, Jerome Glisse, Paolo Bonzini, Mihai Donțu,
    Mircea Cirjaliu, Andy Lutomirski, Arnd Bergmann, Sargun Dhillon,
    Aleksa Sarai, Oleg Nesterov, Jann Horn, Kees Cook, Matthew Wilcox
References: <20200904113116.20648-1-alazar@bitdefender.com>
    <20200907150547.hst4luvrpntdb3lr@wittgenstein>
In-Reply-To: <20200907150547.hst4luvrpntdb3lr@wittgenstein>

On Mon, Sep 7, 2020 at 8:05 AM Christian Brauner wrote:
>
> On Fri, Sep 04, 2020 at 02:31:11PM +0300, Adalbert Lazăr wrote:
> > This patchset adds support for the remote mapping feature.
> > Remote mapping, as its name suggests, is a means for transparent and
> > zero-copy access of a remote process' address space.
> >
> > The feature was designed according to a specification suggested by
> > Paolo Bonzini:
> > >> The proposed API is a new pidfd system call, through which the parent
> > >> can map portions of its virtual address space into a file descriptor
> > >> and then pass that file descriptor to a child.
> > >>
> > >> This should be:
> > >>
> > >> - upstreamable, pidfd is the new cool thing and we could sell it as a
> > >> better way to do PTRACE_{PEEK,POKE}DATA
>
> In all honesty, that sentence made me a bit uneasy as it reads like this
> is implemented on top of pidfds because it makes it more likely to go
> upstream not because it is the right design. To be clear, I'm not
> implying any sort of malicious intent on your part but I would suggest
> to phrase this a little better. :)

I thought about this whole thing some more, and here are some thoughts.

First, I was nervous about two things. One was faulting in pages from
the wrong context. (When a normal page fault or KVM faults in a page,
the mm is loaded. (In the KVM case, the mm is sort of not loaded when
the actual fault happens, but the mm is loaded when the fault is
handled, I think.
Maybe there are workqueues involved and I'm wrong.) When a remote
mapping faults in a page, the mm is *not* loaded.) This ought not to
be a problem, though -- get_user_pages_remote() also faults in pages
from a non-current mm, and that's at least supposed to work correctly,
so maybe this is okay.

Second is recursion. I think this is a genuine problem.

And I think that tying this to pidfds is the wrong approach. In fact,
tying it to processes at all seems wrong. There is a lot of demand
for various forms of memory isolation in which memory is mapped only
by its intended user. Using something tied to a process mm gets in
the way of this in the same way that KVM's current mapping model gets
in the way.

All that being said, I think the whole idea of making fancy address
spaces composed from other mappable objects is neat and possibly quite
useful. And, if you squint a bit, this is a lot like what KVM does
today.

So I suggest something that may be more generally useful as an
alternative. This is a sketch and very subject to bikeshedding:

Create an empty address space:

int address_space_create(int flags, etc);

Map an fd into an address space:

int address_space_mmap(int asfd, int fd_to_map, offset, size, prot,
...);  /* might run out of args here */

Unmap from an address space:

int address_space_munmap(int asfd, unsigned long addr, unsigned long len);

Stick an address space into KVM:

ioctl(vmfd, KVM_MAP_ADDRESS_SPACE, asfd);  /* or similar */

Maybe some day allow mapping an address space into a process.

mmap(..., asfd, ...);

And at least for now, there's a rule that an address space that is
address_space_mmapped into an address space is disallowed.

Maybe some day we also allow mremap(), madvise(), etc. And maybe some
day we allow creating a special address_space that represents a real
process's address space.

Under the hood, an address_space could own an mm_struct that is not
used by any tasks.
And we could have special memfds that are bound to a VM such
that all you can do with them is stick them into an address_space and
map that address_space into the VM in question. For this to work, we
would want a special vm_operation for mapping into a VM.

What do you all think? Is this useful? Does it solve your problems?
Is it a good approach going forward?
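
Editorial illustration, not part of the original message: the address-space calls sketched above do not exist in Linux, so the following is only a minimal user-space model of the bookkeeping they imply. The function names as_create(), as_mmap() and as_munmap(), the struct layout, the explicit addr argument, and the choice to reject overlapping ranges are all assumptions made for this sketch. The only rule taken directly from the proposal is that an address space cannot be mapped into another address space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the proposed API.
 * Not kernel code; none of these names exist in Linux. */

#define AS_MAX_MAPS 8

/* What kind of object an fd refers to (assumption of this model). */
enum obj_kind { OBJ_FILE, OBJ_ADDRESS_SPACE };

struct mapping {
    unsigned long addr;   /* start of the range inside the address space */
    unsigned long len;    /* length of the range in bytes */
    int fd;               /* backing object */
    int used;             /* slot occupied? */
};

struct address_space {
    struct mapping maps[AS_MAX_MAPS];
};

/* Models address_space_create(): an address space with no mappings. */
static void as_create(struct address_space *as)
{
    assert(as != NULL);
    for (size_t i = 0; i < AS_MAX_MAPS; i++)
        as->maps[i].used = 0;
}

/* Models address_space_mmap(). Returns 0 on success, -1 on error. */
static int as_mmap(struct address_space *as, int fd, enum obj_kind kind,
                   unsigned long addr, unsigned long len)
{
    struct mapping *slot = NULL;

    assert(as != NULL);
    if (len == 0 || addr + len < addr)
        return -1;                      /* empty or wrapping range */
    if (kind == OBJ_ADDRESS_SPACE)
        return -1;                      /* rule from the proposal: no nesting */

    for (size_t i = 0; i < AS_MAX_MAPS; i++) {
        struct mapping *m = &as->maps[i];
        if (!m->used) {
            if (slot == NULL)
                slot = m;               /* remember the first free slot */
            continue;
        }
        /* Assumption of this model: overlapping ranges are rejected. */
        if (addr < m->addr + m->len && m->addr < addr + len)
            return -1;
    }
    if (slot == NULL)
        return -1;                      /* table full */

    slot->addr = addr;
    slot->len = len;
    slot->fd = fd;
    slot->used = 1;
    return 0;
}

/* Models address_space_munmap(). Simplification: only mappings lying
 * entirely inside [addr, addr + len) are removed; a real munmap() would
 * also split partially covered mappings. Returns the number removed,
 * or -1 for an invalid range. */
static int as_munmap(struct address_space *as,
                     unsigned long addr, unsigned long len)
{
    int removed = 0;

    assert(as != NULL);
    if (len == 0 || addr + len < addr)
        return -1;
    for (size_t i = 0; i < AS_MAX_MAPS; i++) {
        struct mapping *m = &as->maps[i];
        if (m->used && m->addr >= addr && m->addr + m->len <= addr + len) {
            m->used = 0;
            removed++;
        }
    }
    return removed;
}
```

In this model, mapping fd 3 at 0x1000 with length 0x2000 succeeds, a second request that overlaps that range fails, and any request whose kind is OBJ_ADDRESS_SPACE fails regardless of the range, which is one simple way the recursion concern could be closed off by construction.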