From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4efdccac-245f-eb1f-5b7f-c1044ff0103d@redhat.com>
Date: Mon, 22 Nov 2021 10:26:12 +0100
Subject: Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST
From: David Hildenbrand
Organization: Red Hat
To: Jason Gunthorpe
Cc: Chao Peng, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
 Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang,
 "Kirill A . Shutemov", luto@kernel.org, john.ji@intel.com,
 susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com,
 ak@linux.intel.com
References: <20211119134739.20218-1-chao.p.peng@linux.intel.com>
 <20211119134739.20218-2-chao.p.peng@linux.intel.com>
 <20211119151943.GH876299@ziepe.ca>
 <20211119160023.GI876299@ziepe.ca>
In-Reply-To: <20211119160023.GI876299@ziepe.ca>

On 19.11.21 17:00, Jason Gunthorpe wrote:
> On Fri, Nov 19, 2021 at 04:39:15PM +0100, David Hildenbrand wrote:
>
>>> If qemu can put all the guest memory in a memfd and not map it, then
>>> I'd also like to see that the IOMMU can use this interface too so we
>>> can have VFIO working in this configuration.
>>
>> In QEMU we usually want to (and must) be able to access guest memory
>> from user space. With the current design we wouldn't even be able to
>> temporarily mmap it -- which makes sense for encrypted memory only. The
>> corner case really is encrypted memory. So I don't think we'll see a
>> broad use of this feature outside of encrypted VMs in QEMU. I might be
>> wrong, most probably I am :)
>
> Interesting..
>
> The non-encrypted case I had in mind is the horrible flow in VFIO to
> support qemu re-execing itself (VFIO_DMA_UNMAP_FLAG_VADDR).

Thanks for sharing!

>
> Here VFIO is connected to a VA in a mm_struct that will become invalid
> during the kexec period, but VFIO needs to continue to access it. For
> IOMMU cases this is OK because the memory is already pinned, but for
> the 'emulated iommu' used by mdevs pages are pinned dynamically. qemu
> needs to ensure that VFIO can continue to access the pages across the
> kexec, even though there is nothing to pin_user_pages() on.
>
> This flow would work a lot better if VFIO was connected to the memfd
> that is storing the guest memory. Then it naturally doesn't get
> disrupted by exec() and we don't need the mess in the kernel..

I do wonder whether we want to support sharing such memfds between
processes in all cases ... we most certainly don't want to be able to
share encrypted memory between VMs (I heard that the kernel has to
forbid that). It would make sense in the use case you describe, though.

>
> I was wondering if we could get here using the direct_io APIs but this
> would do the job too.
>
>> Apart from the special "encrypted memory" semantics, I assume nothing
>> speaks against allowing for mmapping these memfds, for example, for any
>> other VFIO use cases.
>
> We will eventually have VFIO with "encrypted memory". There was a talk
> in LPC about the enabling work for this.

Yes, I heard about that as well. In the foreseeable future, we'll have
shared memory that is visible only to VFIO devices.

>
> So, if the plan is to put fully encrypted memory inside a memfd, then
> we still will eventually need a way to pull the pfns into the
> IOMMU, presumably along with the access control parameters needed to
> pass to the secure monitor to join a PCI device to the secure memory.

Long-term, agreed.

--
Thanks,

David / dhildenb
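
For readers following the thread, the point about memory living in a memfd
rather than behind a VA in an mm_struct can be illustrated from user space.
The following is only a minimal sketch, not anything proposed in this
series: it uses plain memfd_create() without the RFC's F_SEAL_GUEST, and the
helper name memfd_survives_unmap() and the "guest-ram-sketch" label are made
up for illustration. It writes through one mapping, tears that mapping down,
and checks that the data is still reachable through the fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption, not from the thread).
 * Returns 0 if data written through one mapping of a memfd is still
 * visible through a fresh mapping after the first one was unmapped,
 * and -1 on any error or mismatch.
 */
int memfd_survives_unmap(size_t len)
{
	const char msg[] = "guest page";
	char *p;
	int fd;
	int ret = -1;

	assert(len >= sizeof(msg));

	/* The pages belong to the file, not to any particular VA range. */
	fd = memfd_create("guest-ram-sketch", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;
	memcpy(p, msg, sizeof(msg));
	munmap(p, len);		/* the mapping is gone, the pages are not */

	/* A new mapping of the same fd sees the same contents. */
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;
	if (memcmp(p, msg, sizeof(msg)) == 0)
		ret = 0;
	munmap(p, len);

out_close:
	close(fd);
	return ret;
}
```

Calling memfd_survives_unmap(4096) should return 0 on a kernel and libc that
provide memfd_create(). The sketch says nothing about how VFIO or the IOMMU
would consume such an fd; that interface is exactly what is still open in
the discussion above.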