Date: Mon, 25 May 2020 08:27:04 +0300
From: "Kirill A. Shutemov"
To: "Kirill A. Shutemov"
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Rientjes, Andrea Arcangeli, Kees Cook,
	Will Drewry, "Edgecombe, Rick P", "Kleen, Andi", x86@kernel.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mike Rapoport, Alexandre Chartre, Marius Hillenbrand
Subject: Re: [RFC 00/16] KVM protected memory extension
Message-ID: <20200525052704.phyk5olkykncj3bj@black.fi.intel.com>
In-Reply-To: <20200522125214.31348-1-kirill.shutemov@linux.intel.com>

On Fri, May 22, 2020 at 03:51:58PM +0300, Kirill A.
Shutemov wrote:
> == Background / Problem ==
>
> There are a number of hardware features (MKTME, SEV) which protect guest
> memory from some unauthorized host access. The patchset proposes a purely
> software feature that mitigates some of the same host-side read-only
> attacks.

CC'ing people who worked on the related patchsets.

> == What does this set mitigate? ==
>
>  - Host kernel "accidental" access to guest data (think speculation)
>
>  - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))
>
>  - Host userspace access to guest data (compromised QEMU)
>
> == What does this set NOT mitigate? ==
>
>  - Full host kernel compromise. The kernel will just map the pages again.
>
>  - Hardware attacks
>
> The patchset is RFC-quality: it works but has known issues that must be
> addressed before it can be considered for merging.
>
> We are looking for high-level feedback on the concept. Some open
> questions:
>
>  - This protects against some kernel and host userspace read-only attacks,
>    but does not place the host kernel outside the trust boundary. Is it
>    still valuable?
>
>  - Can this approach be used to avoid cache-coherency problems with
>    hardware encryption schemes that repurpose physical bits?
>
>  - The guest kernel must be modified for this to work. Is that a deal
>    breaker, especially for public clouds?
>
>  - Are the costs of removing pages from the direct map too high to be
>    feasible?
>
> == Series Overview ==
>
> The hardware features protect guest data by encrypting it and then
> ensuring that only the right guest can decrypt it. This has the
> side-effect of making the kernel direct map and the userspace mapping
> (QEMU et al) useless. But this teaches us something very useful:
> neither the kernel nor the userspace mappings are really necessary for
> normal guest operation.
>
> Instead of using encryption, this series simply unmaps the memory. One
> advantage compared to allowing access to ciphertext is that it allows bad
> accesses to be caught instead of simply reading garbage.
>
> Protection from physical attacks needs to be provided by some other means.
> On Intel platforms, (single-key) Total Memory Encryption (TME) provides
> mitigation against physical attacks, such as DIMM interposers sniffing
> memory bus traffic.
>
> The patchset modifies both the host and guest kernels. The guest OS must
> enable the feature via hypercall and mark any memory range that has to be
> shared with the host: DMA regions, bounce buffers, etc. SEV does this
> marking via a bit in the guest's page table, while this approach uses a
> hypercall.
>
> To remove the userspace mapping, we use a trick similar to what NUMA
> balancing does: convert memory that belongs to KVM memory slots to
> PROT_NONE. All existing entries are converted to PROT_NONE with
> mprotect(), and newly faulted-in pages get PROT_NONE from the updated
> vm_page_prot. The new VMA flag -- VM_KVM_PROTECTED -- indicates that the
> pages in the VMA must be treated specially in the GUP and fault paths.
> The flag allows GUP to return the page even though it is mapped with
> PROT_NONE, but only if the new GUP flag -- FOLL_KVM -- is specified. Any
> userspace access to the memory results in SIGBUS. Any GUP access without
> FOLL_KVM results in -EFAULT.
>
> Any anonymous page faulted into a VM_KVM_PROTECTED VMA gets removed from
> the direct mapping with kernel_map_pages(). Note that kernel_map_pages()
> only flushes the local TLB. I think it's a reasonable compromise between
> security and performance.
>
> Zapping the PTE brings the page back to the direct mapping after clearing
> it. At least for now, we don't remove file-backed pages from the direct
> mapping. File-backed pages could be accessed via read/write syscalls.
> It adds complexity.
>
> Occasionally, the host kernel has to access guest memory that was not
> made shared by the guest. For instance, this happens for instruction
> emulation. Normally it's done via copy_to/from_user(), which would now
> fail with -EFAULT. We introduced a new pair of helpers:
> copy_to/from_guest(). The new helpers acquire the page via GUP, map it
> into the kernel address space with a kmap_atomic()-style mechanism and
> only then copy the data.
>
> For some instruction emulation, copying is not good enough: cmpxchg
> emulation has to have direct access to the guest memory. __kvm_map_gfn()
> is modified to accommodate this case.
>
> The patchset is on top of v5.7-rc6 plus this patch:
>
>   https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com
>
> == Open Issues ==
>
> Unmapping the pages from the direct mapping brings a few issues that
> have not been rectified yet:
>
>  - Touching the direct mapping leads to fragmentation. We need to be
>    able to recover from it. I have a buggy patch that aims at recovering
>    2M/1G pages. It has to be fixed and tested properly.
>
>  - Page migration and KSM are not supported yet.
>
>  - Live migration of a guest would require a new flow. Not sure yet what
>    it would look like.
>
>  - The feature interferes with NUMA balancing. Not sure yet if it's
>    possible to make them work together.
>
>  - Guests have no mechanism to ensure that even a well-behaving host has
>    unmapped its private data. With SEV, for instance, the guest only has
>    to trust the hardware to encrypt a page after the C bit is set in a
>    guest PTE. A mechanism for a guest to query the host mapping state,
>    or to constantly assert the intent for a page to be private, would be
>    valuable.

-- 
 Kirill A. Shutemov