Date: Wed, 30 Oct 2019 09:40:06 +0100
From: Mike Rapoport
To: Andy Lutomirski
Cc: LKML, Alexey Dobriyan, Andrew Morton, Arnd Bergmann,
 Borislav Petkov, Dave Hansen, James Bottomley, Peter Zijlstra,
 Steven Rostedt, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
 Linux API, Linux-MM, X86 ML, Mike Rapoport
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
Message-ID: <20191030084005.GC20624@rapoport-lnx>
References: <1572171452-7958-1-git-send-email-rppt@kernel.org>
 <20191029093254.GE18773@rapoport-lnx>

On Tue, Oct 29, 2019 at 10:00:55AM -0700, Andy Lutomirski wrote:
> On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport wrote:
> >
> > On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote:
> > >
> > > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport wrote:
> > > >
> > > > From: Mike Rapoport
> > > >
> > > > Hi,
> > > >
> > > > The patch below aims to allow applications to create mappings that
> > > > have pages visible only to the owning process. Such mappings could
> > > > be used to store secrets so that these secrets are visible neither
> > > > to other processes nor to the kernel.
> > > >
> > > > I've only tested the basic functionality, the changes should be
> > > > verified against THP/migration/compaction. Yet, I'd appreciate
> > > > early feedback.
> > >
> > > I've contemplated the concept a fair amount, and I think you should
> > > consider a change to the API. In particular, rather than having it
> > > be a MAP_ flag, make it a chardev.  You can, at least at first,
> > > allow only MAP_SHARED, and admins can decide who gets to use it.  It
> > > might also play better with the VM overall, and you won't need a
> > > VM_ flag for it; you can just wire up .fault to do the right thing.
> >
> > I think mmap()/mprotect()/madvise() are the natural APIs for such
> > interface.
>
> Then you have a whole bunch of questions to answer.  For example:
>
> What happens if you mprotect() or similar when the mapping is already
> in use in a way that's incompatible with MAP_EXCLUSIVE?

Then we refuse to mprotect()? Like in any other case where vm_flags are
not compatible with the requested madvise()/mprotect() operation.

> Is it actually reasonable to malloc() some memory and then make it
> exclusive?
>
> Are you permitted to map a file MAP_EXCLUSIVE?  What does it mean?

I'd limit MAP_EXCLUSIVE to anonymous memory only.

> What does MAP_PRIVATE | MAP_EXCLUSIVE do?

My preference is to have only mmap(), and then the semantics are clearer:

MAP_PRIVATE | MAP_EXCLUSIVE creates a pre-populated region, marks it
locked and drops the pages in this region from the direct map.
The pages are returned to the direct map on munmap().
Then there is no way to change an existing area to be exclusive or vice
versa.

> How does one pass exclusive memory via SCM_RIGHTS?  (If it's a
> memfd-like or chardev interface, it's trivial.  mmap(), not so much.)

Why would passing such memory via SCM_RIGHTS be useful?

> And finally, there's my personal giant pet peeve: a major use of this
> will be for virtualization.  I suspect that a lot of people would like
> the majority of KVM guest memory to be unmapped from the host
> pagetables.  But people might also like for guest memory to be
> unmapped in *QEMU's* pagetables, and mmap() is a basically worthless
> interface for this.  Getting fd-backed memory into a guest will take
> some possibly major work in the kernel, but getting vma-backed memory
> into a guest without mapping it in the host user address space seems
> much, much worse.

Well, in my view, MAP_EXCLUSIVE is intended for keeping small secrets
rather than for the entire guest memory. I even considered adding a
limit for the mapping size, but then I decided that since RLIMIT_MEMLOCK
is enforced anyway, there is no need for a new one.

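For illustration only, here is a rough userspace sketch of the
mmap()-only usage described above, including the RLIMIT_MEMLOCK check.
It is not code from the patch. MAP_EXCLUSIVE is the flag proposed by
this RFC and is not in mainline headers, so the sketch defines it as 0
as a placeholder (an assumption, not the value the patch uses); on a
kernel without the patch it degrades to an ordinary pre-populated,
locked anonymous mapping. The helper names fits_memlock(),
secret_area_alloc() and secret_area_free() are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS, MAP_POPULATE, MAP_LOCKED */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * ASSUMPTION: MAP_EXCLUSIVE is the flag proposed in this RFC and does not
 * exist in mainline headers. The 0 fallback is only a placeholder so the
 * sketch builds; it is not the value used by the patch.
 */
#ifndef MAP_EXCLUSIVE
#define MAP_EXCLUSIVE 0
#endif

/*
 * Return 1 if a single locked mapping of len bytes fits under the given
 * RLIMIT_MEMLOCK soft limit, 0 otherwise. This ignores memory the process
 * has already locked, so the kernel may still refuse the mapping.
 */
int fits_memlock(size_t len, rlim_t soft_limit)
{
	return soft_limit == RLIM_INFINITY || len <= soft_limit;
}

/*
 * Hypothetical helper: map a small area for secrets, or return NULL.
 * MAP_POPULATE and MAP_LOCKED approximate the "pre-populated" and
 * "locked" parts on a mainline kernel; removing the pages from the
 * direct map would require the kernel support proposed in the RFC.
 */
void *secret_area_alloc(size_t len)
{
	struct rlimit rl;
	void *p;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return NULL;
	if (!fits_memlock(len, rl.rlim_cur))
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_LOCKED |
		 MAP_EXCLUSIVE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Unmap the area; in the proposed semantics the pages go back on munmap(). */
void secret_area_free(void *p, size_t len)
{
	if (p)
		munmap(p, len);
}
```

A caller would do something like p = secret_area_alloc(4096), keep the
secret in p, and call secret_area_free(p, 4096) when done. The explicit
limit check is redundant with what the kernel enforces for MAP_LOCKED;
it is there only to make the RLIMIT_MEMLOCK dependency visible.
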
I agree that getting fd-backed memory into a guest would be less painful
than VMA-backed memory, but KVM can already use memory outside the
control of the kernel via /dev/map [1].

So unless I'm missing something here, there is no need to use
MAP_EXCLUSIVE for the guest memory.

[1] https://lwn.net/Articles/778240/

> > Switching to a chardev doesn't solve the major problem of direct
> > map fragmentation and defeats the ability to use exclusive memory
> > mappings with the existing allocators, while mprotect() and madvise()
> > do not.
> >
>
> Will people really want to do malloc() and then remap it exclusive?
> This sounds dubiously useful at best.

Again, my preference is to have mmap() only, but I see value in this use
case as well. Application developers allocate memory and then sometimes
change its properties rather than go mmap() something. For such usage
mprotect() may be useful.

-- 
Sincerely yours,
Mike.