Date: Wed, 7 Apr 2021 16:16:47 +0300
From: "Kirill A. Shutemov"
To: David Hildenbrand
Cc: Dave Hansen, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Sean Christopherson, Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi", "Yamahata, Isaku", x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages
Message-ID: <20210407131647.djajbwhqsmlafsyo@box.shutemov.name>
References: <20210402152645.26680-1-kirill.shutemov@linux.intel.com> <20210402152645.26680-8-kirill.shutemov@linux.intel.com> <52518f09-7350-ebe9-7ddb-29095cd3a4d9@intel.com>

On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
> On 06.04.21 16:33, Dave Hansen wrote:
> > On 4/6/21 12:44 AM, David Hildenbrand wrote:
> > > On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > > > TDX architecture aims to provide resiliency against confidentiality and
> > > > integrity attacks. Towards this goal, the TDX architecture helps enforce
> > > > the enabling of memory integrity for all TD-private memory.
> > > >
> > > > The CPU memory controller computes the integrity check value (MAC) for
> > > > the data (cache line) during writes, and it stores the MAC with the
> > > > memory as meta-data. A 28-bit MAC is stored in the ECC bits.
> > > >
> > > > Checking of memory integrity is performed during memory reads. If
> > > > integrity check fails, CPU poisons cache line.
> > > >
> > > > On a subsequent consumption (read) of the poisoned data by software,
> > > > there are two possible scenarios:
> > > >
> > > >   - Core determines that the execution can continue and it treats
> > > >     poison with exception semantics signaled as a #MCE
> > > >
> > > >   - Core determines execution cannot continue, and it does an unbreakable
> > > >     shutdown
> > > >
> > > > For more details, see Chapter 14 of Intel TDX Module EAS[1]
> > > >
> > > > As some of integrity check failures may lead to system shutdown host
> > > > kernel must not allow any writes to TD-private memory. This requirement
> > > > clashes with KVM design: KVM expects the guest memory to be mapped into
> > > > host userspace (e.g. QEMU).
> > >
> > > So what you are saying is that if QEMU would write to such memory, it
> > > could crash the kernel? What a broken design.
> >
> > IMNHO, the broken design is mapping the memory to userspace in the first
> > place. Why the heck would you actually expose something with the MMU to
> > a context that can't possibly meaningfully access or safely write to it?
>
> I'd say the broken design is being able to crash the machine via a simple
> memory write, instead of only crashing a single process in case you're doing
> something nasty. From the evaluation of the problem it feels like this was a
> CPU design workaround: instead of properly cleaning up when it gets tricky
> within the core, just crash the machine. And that's a CPU "feature", not a
> kernel "feature". Now we have to fix broken HW in the kernel - once again.
>
> However, you raise a valid point: it does not make too much sense to map
> this into user space. Not arguing against that; but crashing the machine is
> just plain ugly.
>
> I wonder: why do we even *want* a VMA/mmap describing that memory? Sounds
> like: for hacking support for that memory type into QEMU/KVM.
>
> This all feels wrong, but I cannot really tell how it could be better. That
> memory can really only be used (right now?) with hardware virtualization
> from some point on. From that point on (right from the start?), there should
> be no VMA/mmap/page tables for user space anymore.
>
> Or am I missing something? Is there still valid user space access?

There is. For IO (e.g. virtio) the guest marks a range of memory as shared
(or unencrypted for AMD SEV). The range is not pre-defined.

> > This started with SEV. QEMU creates normal memory mappings with the SEV
> > C-bit (encryption) disabled. The kernel plumbs those into NPT, but when
> > those are instantiated, they have the C-bit set. So, we have mismatched
> > mappings. Where does that lead? The two mappings not only differ in
> > the encryption bit, causing one side to read gibberish if the other
> > writes: they're not even cache coherent.
> >
> > That's the situation *TODAY*, even ignoring TDX.
> >
> > BTW, I'm pretty sure I know the answer to the "why would you expose this
> > to userspace" question: it's what QEMU/KVM did already for
> > non-encrypted memory, so this was the quickest way to get SEV working.
> >
>
> Yes, I guess so. It was the fastest way to "hack" it into QEMU.
>
> Would we ever even want a VMA/mmap/process page tables for that memory? How
> could user space ever do something *not so nasty* with that memory (in the
> current context of VMs)?

In the future, the memory should still be manageable by host MM: migration,
swapping, etc. But it's a long way there. For now, the guest memory is
effectively pinned on the host.

-- 
Kirill A. Shutemov
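
For anyone following the thread without the EAS at hand, here is a toy
model of the integrity scheme from the quoted patch description: a MAC is
computed on write, checked on read, and a mismatch poisons the line so that
consuming it raises a machine check. This is only a sketch under loud
assumptions, not how the hardware works: HMAC-SHA256 truncated to 28 bits
stands in for the real (unspecified here) MAC, a host write is modelled as
a write under a different key, and ToyMemory, mac28 and MachineCheck are
hypothetical names.

```python
import hashlib
import hmac

MAC_BITS = 28    # MAC width stated in the quoted patch description
LINE_SIZE = 64   # assumed cache-line size in bytes


class MachineCheck(Exception):
    """Stands in for the #MCE signalled when poisoned data is consumed."""


def mac28(key: bytes, data: bytes) -> int:
    """Truncated HMAC-SHA256; a stand-in, NOT the real TDX MAC algorithm."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - MAC_BITS)


class ToyMemory:
    """Maps an address to one cache line stored together with its MAC."""

    def __init__(self):
        self.lines = {}        # addr -> (data, mac)
        self.poisoned = set()  # addresses whose integrity check failed

    def write(self, addr: int, data: bytes, key: bytes) -> None:
        # The MAC is computed on every write and kept next to the data.
        assert len(data) == LINE_SIZE
        self.lines[addr] = (data, mac28(key, data))
        # Toy assumption: a full-line write replaces poisoned contents.
        self.poisoned.discard(addr)

    def read(self, addr: int, key: bytes) -> bytes:
        # The check runs on reads; a mismatch poisons the line, and
        # consuming the poison is reported as a machine check.
        data, mac = self.lines[addr]
        if addr in self.poisoned or mac != mac28(key, data):
            self.poisoned.add(addr)
            raise MachineCheck(f"poison consumed at {addr:#x}")
        return data
```

In this model a line written and read under the same key passes the check,
while a line last written under another key (the "host write" case) raises
MachineCheck on the next read under the original key. The second scenario
from the description, where the core cannot continue and shuts down, is
not modelled.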