From: Vitaly Kuznetsov
To: "Kirill A. Shutemov"
Cc: David Rientjes, Andrea Arcangeli, Kees Cook, Will Drewry,
    "Edgecombe, Rick P", "Kleen, Andi", Liran Alon, Mike Rapoport,
    x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, "Kirill A. Shutemov", Dave Hansen,
    Andy Lutomirski, Peter Zijlstra, Paolo Bonzini, Sean Christopherson,
    Wanpeng Li, Jim Mattson, Joerg Roedel
Subject: Re: [RFCv2 00/16] KVM protected memory extension
In-Reply-To: <20201020061859.18385-1-kirill.shutemov@linux.intel.com>
References: <20201020061859.18385-1-kirill.shutemov@linux.intel.com>
Date: Tue, 20 Oct 2020 09:46:11 +0200
Message-ID: <87ft6949x8.fsf@vitty.brq.redhat.com>

"Kirill A. Shutemov" writes:

> == Background / Problem ==
>
> There are a number of hardware features (MKTME, SEV) which protect guest
> memory from some unauthorized host access.
> The patchset proposes a purely
> software feature that mitigates some of the same host-side read-only
> attacks.
>
>
> == What does this set mitigate? ==
>
>  - Host kernel "accidental" access to guest data (think speculation)
>
>  - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))
>
>  - Host userspace access to guest data (compromised qemu)
>
>  - Guest privilege escalation via compromised QEMU device emulation
>
> == What does this set NOT mitigate? ==
>
>  - Full host kernel compromise. Kernel will just map the pages again.
>
>  - Hardware attacks
>
>
> The second RFC revision addresses /most/ of the feedback.
>
> I still haven't found a good solution for reboot and kexec. Unprotecting
> all the memory on such operations defeats the goal of the feature.
> Clearing up most of the memory before unprotecting what is required for
> reboot (or kexec) is tedious and error-prone.
> Maybe we should just declare them unsupported?

Making reboot unsupported is a hard sell. Could you please elaborate on
why you think that an "unprotect all" hypercall (or rather a single
hypercall supporting both protecting and unprotecting) defeats the
purpose of the feature?

(Leaving kexec aside for a while) Yes, it is not easy for a guest to
clean up *all* its memory upon reboot, however:
- It may only clean up the most sensitive parts. This should probably be
  done even without this new feature and even on bare metal (think of
  the next boot target being malicious).
- The attack window shrinks significantly. "Speculative" bugs require
  time to exploit, and the window only remains open until the guest
  boots up again (a few seconds).

> == Series Overview ==
>
> The hardware features protect guest data by encrypting it and then
> ensuring that only the right guest can decrypt it. This has the
> side-effect of making the kernel direct map and userspace mapping
> (QEMU et al) useless.
> But, this teaches us something very useful:
> neither the kernel nor userspace mappings are really necessary for
> normal guest operations.
>
> Instead of using encryption, this series simply unmaps the memory. One
> advantage compared to allowing access to ciphertext is that it allows
> bad accesses to be caught instead of simply reading garbage.
>
> Protection from physical attacks needs to be provided by some other
> means. On Intel platforms, (single-key) Total Memory Encryption (TME)
> provides mitigation against physical attacks, such as DIMM interposers
> sniffing memory bus traffic.
>
> The patchset modifies both the host and guest kernel. The guest OS must
> enable the feature via hypercall and mark any memory range that has to
> be shared with the host: DMA regions, bounce buffers, etc. SEV does
> this marking via a bit in the guest's page table, while this approach
> uses a hypercall.
>
> For removing the userspace mapping, use a trick similar to what NUMA
> balancing does: convert memory that belongs to KVM memory slots to
> PROT_NONE: all existing entries are converted to PROT_NONE with
> mprotect() and the newly faulted in pages get PROT_NONE from the
> updated vm_page_prot. The new VMA flag -- VM_KVM_PROTECTED --
> indicates that the pages in the VMA must be treated in a special way
> in the GUP and fault paths. The flag allows GUP to return the page
> even though it is mapped with PROT_NONE, but only if the new GUP flag
> -- FOLL_KVM -- is specified. Any userspace access to the memory would
> result in SIGBUS. Any GUP access without FOLL_KVM would result in
> -EFAULT.
>
> Removing the userspace mapping of the guest memory from the QEMU
> process can help to address some guest privilege escalation attacks.
> Consider the case when an unprivileged guest user exploits a bug in
> QEMU device emulation to gain access to data it should not normally
> have access to within the guest.
>
> Any anonymous page faulted into the VM_KVM_PROTECTED VMA gets removed
> from the direct mapping with kernel_map_pages(). Note that
> kernel_map_pages() only flushes the local TLB. I think it's a
> reasonable compromise between security and performance.
>
> Zapping the PTE would bring the page back to the direct mapping after
> clearing. At least for now, we don't remove file-backed pages from the
> direct mapping. File-backed pages could be accessed via read/write
> syscalls. It adds complexity.
>
> Occasionally, the host kernel has to access guest memory that was not
> made shared by the guest. For instance, it happens for instruction
> emulation. Normally, it's done via copy_to/from_user(), which would
> now fail with -EFAULT. We introduced a new pair of helpers:
> copy_to/from_guest(). The new helpers acquire the page via GUP, map it
> into the kernel address space with a kmap_atomic()-style mechanism and
> only then copy the data.
>
> For some instruction emulation copying is not good enough: cmpxchg
> emulation has to have direct access to the guest memory.
> __kvm_map_gfn() is modified to accommodate the case.
>
> The patchset is on top of v5.9
>
> Kirill A. Shutemov (16):
>   x86/mm: Move force_dma_unencrypted() to common code
>   x86/kvm: Introduce KVM memory protection feature
>   x86/kvm: Make DMA pages shared
>   x86/kvm: Use bounce buffers for KVM memory protection
>   x86/kvm: Make VirtIO use DMA API in KVM guest
>   x86/kvmclock: Share hvclock memory with the host
>   x86/realmode: Share trampoline area if KVM memory protection enabled
>   KVM: Use GUP instead of copy_from/to_user() to access guest memory
>   KVM: mm: Introduce VM_KVM_PROTECTED
>   KVM: x86: Use GUP for page walk instead of __get_user()
>   KVM: Protected memory extension
>   KVM: x86: Enabled protected memory extension
>   KVM: Rework copy_to/from_guest() to avoid direct mapping
>   KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn()
>   KVM: Unmap protected pages from direct mapping
>   mm: Do not use zero page for VM_KVM_PROTECTED VMAs
>
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |   2 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   2 +-
>  arch/s390/include/asm/pgtable.h        |   2 +-
>  arch/x86/Kconfig                       |  11 +-
>  arch/x86/include/asm/cpufeatures.h     |   1 +
>  arch/x86/include/asm/io.h              |   6 +-
>  arch/x86/include/asm/kvm_para.h        |   5 +
>  arch/x86/include/asm/pgtable_types.h   |   1 +
>  arch/x86/include/uapi/asm/kvm_para.h   |   3 +-
>  arch/x86/kernel/kvm.c                  |  20 +++
>  arch/x86/kernel/kvmclock.c             |   2 +-
>  arch/x86/kernel/pci-swiotlb.c          |   3 +-
>  arch/x86/kvm/Kconfig                   |   1 +
>  arch/x86/kvm/cpuid.c                   |   3 +-
>  arch/x86/kvm/mmu/mmu.c                 |   6 +-
>  arch/x86/kvm/mmu/paging_tmpl.h         |  10 +-
>  arch/x86/kvm/x86.c                     |   9 +
>  arch/x86/mm/Makefile                   |   2 +
>  arch/x86/mm/ioremap.c                  |  16 +-
>  arch/x86/mm/mem_encrypt.c              |  51 ------
>  arch/x86/mm/mem_encrypt_common.c       |  62 +++++++
>  arch/x86/mm/pat/set_memory.c           |   7 +
>  arch/x86/realmode/init.c               |   7 +-
>  drivers/virtio/virtio_ring.c           |   4 +
>  include/linux/kvm_host.h               |  11 +-
>  include/linux/kvm_types.h              |   1 +
>  include/linux/mm.h                     |  21 ++-
>  include/uapi/linux/kvm_para.h          |   5 +-
>  mm/gup.c                               |  20 ++-
>  mm/huge_memory.c                       |  31 +++-
>  mm/ksm.c                               |   2 +
>  mm/memory.c                            |  18 +-
>  mm/mmap.c                              |   3 +
>  mm/rmap.c                              |   4 +
>  virt/kvm/Kconfig                       |   3 +
>  virt/kvm/async_pf.c                    |   2 +-
>  virt/kvm/kvm_main.c                    | 238 +++++++++++++++++++++----
>  virt/lib/Makefile                      |   1 +
>  virt/lib/mem_protected.c               | 193 ++++++++++++++++++++
>  39 files changed, 666 insertions(+), 123 deletions(-)
>  create mode 100644 arch/x86/mm/mem_encrypt_common.c
>  create mode 100644 virt/lib/mem_protected.c

-- 
Vitaly