Date: Tue, 27 Sep 2022 22:47:29 +0000
From: Sean Christopherson
To: Fuad Tabba
Cc: Chao Peng, David Hildenbrand, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
    "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
    Steven Price, "Maciej S . Szmigiero", Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, "Kirill A . Shutemov", luto@kernel.org,
    jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
    aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
    Quentin Perret, Michael Roth, mhocko@suse.com, Muchun Song,
    wei.w.wang@intel.com, Will Deacon, Marc Zyngier
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
References: <20220915142913.2213336-1-chao.p.peng@linux.intel.com>
    <20220915142913.2213336-2-chao.p.peng@linux.intel.com>
    <20220926142330.GC2658254@chaop.bj.intel.com>

On Mon, Sep 26, 2022, Fuad Tabba wrote:
> Hi,
>
> On Mon, Sep 26, 2022 at 3:28 PM Chao Peng wrote:
> >
> > On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:
> > > > Then on the KVM side, its mmap_start() + mmap_end() sequence would:
> > > >
> > > > 1. Not be supported for TDX or SEV-SNP because they don't allow adding non-zero
> > > >    memory into the guest (after pre-boot phase).
> > > >
> > > > 2. Be mutually exclusive with shared<=>private conversions, and is allowed if
> > > >    and only if the entire gfn range of the associated memslot is shared.
> > >
> > > In general I think that this would work with pKVM. However, limiting
> > > private<->shared conversions to the granularity of a whole memslot
> > > might be difficult to handle in pKVM, since the guest doesn't have the
> > > concept of memslots. For example, in pKVM right now, when a guest
> > > shares back its restricted DMA pool with the host it does so at the
> > > page-level.

Y'all are killing me :-)

Isn't the guest enlightened?  E.g. can't you tell the guest "thou shalt share at
granularity X"?  With KVM's newfangled scalable memslots and per-vCPU MRU slot,
X doesn't even have to be that high to get reasonable performance, e.g. assuming
the DMA pool is at most 2GiB, that's "only" 1024 memslots, which is supposed to
work just fine in KVM.

> > > pKVM would also need a way to make an fd accessible again
> > > when shared back, which I think isn't possible with this patch.
> >
> > But does pKVM really want to mmap/munmap a new region at the page-level,
> > that can cause VMA fragmentation if the conversion is frequent as I see.
> > Even with a KVM ioctl for mapping as mentioned below, I think there will
> > be the same issue.
>
> pKVM doesn't really need to unmap the memory. What is really important
> is that the memory is not GUP'able.

Well, not entirely unguppable, just unguppable without a magic FOLL_* flag,
otherwise KVM wouldn't be able to get the PFN to map into guest memory.

The problem is that gup() and "mapped" are tied together.  So yes, pKVM doesn't
strictly need to unmap memory _in the untrusted host_, but since mapped==guppable,
the end result is the same.

Emphasis above because pKVM still needs to unmap the memory _somewhere_.  IIUC, the
current approach is to do that only in the stage-2 page tables, i.e. only in the
context of the hypervisor.  Which is also the source of the gup() problems; the
untrusted kernel is blissfully unaware that the memory is inaccessible.

Any approach that moves some of that information into the untrusted kernel so that
the kernel can protect itself will incur fragmentation in the VMAs.  Well, unless
all of guest memory becomes unguppable, but that's likely not a viable option.
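
As a back-of-the-envelope check of the "only 1024 memslots" figure above: a 2GiB
pool split into 1024 slots implies a sharing granule X of 2MiB.  That granule is
inferred from the arithmetic, it is not stated in the thread.  The sketch below
is plain C with a hypothetical helper, memslots_needed(), and is not a KVM API;
it only assumes one memslot per granule.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/*
 * Hypothetical helper (not part of KVM): number of memslots needed to cover
 * a pool of pool_bytes if every memslot spans exactly one sharing granule.
 * Rounds up so that a partial trailing granule still gets its own slot.
 */
uint64_t memslots_needed(uint64_t pool_bytes, uint64_t granule_bytes)
{
	assert(granule_bytes != 0);
	return (pool_bytes + granule_bytes - 1) / granule_bytes;
}
```

With these assumptions, memslots_needed(2 * GiB, 2 * MiB) is 1024, while the
same pool shared at 4KiB page granularity would need 524288 slots, which is why
asking the guest to share at a coarser granularity matters.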