References: <1572171452-7958-1-git-send-email-rppt@kernel.org> <20191029093254.GE18773@rapoport-lnx>
In-Reply-To: <20191029093254.GE18773@rapoport-lnx>
From: Andy Lutomirski
Date: Tue, 29 Oct 2019 10:00:55 -0700
Subject: Re: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings
To: Mike Rapoport
Cc: LKML, Alexey Dobriyan, Andrew Morton, Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Dave Hansen, James Bottomley, Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Ingo Molnar, "H.
 Peter Anvin", Linux API, Linux-MM, X86 ML, Mike Rapoport

On Tue, Oct 29, 2019 at 2:33 AM Mike Rapoport wrote:
>
> On Mon, Oct 28, 2019 at 02:44:23PM -0600, Andy Lutomirski wrote:
> >
> > > On Oct 27, 2019, at 4:17 AM, Mike Rapoport wrote:
> > >
> > > From: Mike Rapoport
> > >
> > > Hi,
> > >
> > > The patch below aims to allow applications to create mappings that have
> > > pages visible only to the owning process. Such mappings could be used to
> > > store secrets so that these secrets are visible neither to other
> > > processes nor to the kernel.
> > >
> > > I've only tested the basic functionality, the changes should be verified
> > > against THP/migration/compaction. Yet, I'd appreciate early feedback.
> >
> > I've contemplated the concept a fair amount, and I think you should
> > consider a change to the API. In particular, rather than having it be a
> > MAP_ flag, make it a chardev. You can, at least at first, allow only
> > MAP_SHARED, and admins can decide who gets to use it. It might also play
> > better with the VM overall, and you won't need a VM_ flag for it; you
> > can just wire up .fault to do the right thing.
>
> I think mmap()/mprotect()/madvise() are the natural APIs for such an
> interface.

Then you have a whole bunch of questions to answer. For example:

What happens if you mprotect() or similar when the mapping is already
in use in a way that's incompatible with MAP_EXCLUSIVE?

Is it actually reasonable to malloc() some memory and then make it
exclusive?

Are you permitted to map a file MAP_EXCLUSIVE? What does it mean?

What does MAP_PRIVATE | MAP_EXCLUSIVE do?

How does one pass exclusive memory via SCM_RIGHTS?
(If it's a memfd-like or chardev interface, it's trivial. mmap(), not
so much.)

And finally, there's my personal giant pet peeve: a major use of this
will be for virtualization. I suspect that a lot of people would like
the majority of KVM guest memory to be unmapped from the host
pagetables. But people might also like for guest memory to be
unmapped in *QEMU's* pagetables, and mmap() is a basically worthless
interface for this. Getting fd-backed memory into a guest will take
some possibly major work in the kernel, but getting vma-backed memory
into a guest without mapping it in the host user address space seems
much, much worse.

> Switching to a chardev doesn't solve the major problem of direct
> map fragmentation and defeats the ability to use exclusive memory mappings
> with the existing allocators, while mprotect() and madvise() do not.

Will people really want to do malloc() and then remap it exclusive?
This sounds dubiously useful at best.