From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexandru Elisei <alexandru.elisei@arm.com>
To: catalin.marinas@arm.com, will@kernel.org, oliver.upton@linux.dev,
	maz@kernel.org, james.morse@arm.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, arnd@arndb.de, akpm@linux-foundation.org,
	mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com, mhiramat@kernel.org,
	rppt@kernel.org, hughd@google.com
Cc: pcc@google.com, steven.price@arm.com, anshuman.khandual@arm.com,
	vincenzo.frascino@arm.com, david@redhat.com, eugenis@google.com,
	kcc@google.com, hyesoo.yu@samsung.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org
Subject: [PATCH RFC v3 33/35] KVM: arm64: mte: Introduce VM_MTE_KVM VMA flag
Date: Thu, 25 Jan 2024 16:42:54 +0000
Message-Id: <20240125164256.4147-34-alexandru.elisei@arm.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240125164256.4147-1-alexandru.elisei@arm.com>
References: <20240125164256.4147-1-alexandru.elisei@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Tag storage pages mapped by the host in a VM with MTE enabled are
migrated when they are first accessed by
the guest. This introduces latency spikes for memory accesses made by
the guest.

Tag storage pages can be mapped in the guest memory when the VM_MTE VMA
flag is not set. Introduce a new VMA flag, VM_MTE_KVM, to stop tag
storage pages from being mapped in a VM with MTE enabled.

The flag is different from VM_MTE, because the pages from the VMA won't
be mapped as tagged in the host, and host's userspace can continue to
access the guest memory as Untagged. The flag's only function is to
instruct the page allocator to treat the allocation as tagged, so tag
storage pages aren't used. The page allocator will also try to reserve
tag storage for the new page, which can speed up stage 2 aborts further
if the VMM has accessed the memory before the guest. For example, qemu
and kvmtool will benefit from this change because the guest image is
copied after the memslot is created.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
Changes since rfc v2:

* New patch.

 arch/arm64/kvm/mmu.c  | 77 ++++++++++++++++++++++++++++++++++++++++++-
 arch/arm64/mm/fault.c |  2 +-
 include/linux/mm.h    |  2 ++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 986a9544228d..45c57c4b9fe2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1420,7 +1420,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
-	struct vm_area_struct *vma;
+	struct vm_area_struct *vma, *old_vma;
 	short vma_shift;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
@@ -1428,6 +1428,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
+	bool vma_has_kvm_mte = false;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1506,6 +1507,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	gfn = fault_ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
+	vma_has_kvm_mte = !!(vma->vm_flags & VM_MTE_KVM);
+	old_vma = vma;
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
@@ -1521,6 +1524,27 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
+	/*
+	 * If the VMA was created after the memslot, it doesn't have the
+	 * VM_MTE_KVM flag set.
+	 */
+	if (unlikely(tag_storage_enabled() && !fault_is_perm &&
+		     kvm_has_mte(kvm) && mte_allowed && !vma_has_kvm_mte)) {
+		mmap_write_lock(current->mm);
+		vma = vma_lookup(current->mm, hva);
+		/* The VMA was changed, replay the fault. */
+		if (vma != old_vma) {
+			mmap_write_unlock(current->mm);
+			return 0;
+		}
+		if (!(vma->vm_flags & VM_MTE_KVM)) {
+			vma_start_write(vma);
+			vm_flags_reset(vma, vma->vm_flags | VM_MTE_KVM);
+		}
+		vma = NULL;
+		mmap_write_unlock(current->mm);
+	}
+
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
 				   write_fault, &writable, NULL);
@@ -1986,6 +2010,40 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
 	return err;
 }
 
+static int kvm_set_clear_kvm_mte_vma(const struct kvm_memory_slot *memslot, bool set)
+{
+	struct vm_area_struct *vma;
+	hva_t hva, memslot_end;
+	int ret = 0;
+
+	hva = memslot->userspace_addr;
+	memslot_end = hva + (memslot->npages << PAGE_SHIFT);
+
+	mmap_write_lock(current->mm);
+
+	do {
+		vma = find_vma_intersection(current->mm, hva, memslot_end);
+		if (!vma)
+			break;
+		if (!kvm_vma_mte_allowed(vma))
+			continue;
+		if (set) {
+			if (!(vma->vm_flags & VM_MTE_KVM)) {
+				vma_start_write(vma);
+				vm_flags_reset(vma, vma->vm_flags | VM_MTE_KVM);
+			}
+		} else if (vma->vm_flags & VM_MTE_KVM) {
+			vma_start_write(vma);
+			vm_flags_reset(vma, vma->vm_flags & ~VM_MTE_KVM);
+		}
+		hva = min(memslot_end, vma->vm_end);
+	} while (hva < memslot_end);
+
+	mmap_write_unlock(current->mm);
+
+	return ret;
+}
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *old,
 				   const struct kvm_memory_slot *new,
@@ -1993,6 +2051,23 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 {
 	bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
 
+	if (kvm_has_mte(kvm) && change != KVM_MR_FLAGS_ONLY) {
+		switch (change) {
+		case KVM_MR_CREATE:
+			kvm_set_clear_kvm_mte_vma(new, true);
+			break;
+		case KVM_MR_DELETE:
+			kvm_set_clear_kvm_mte_vma(old, false);
+			break;
+		case KVM_MR_MOVE:
+			kvm_set_clear_kvm_mte_vma(old, false);
+			kvm_set_clear_kvm_mte_vma(new, true);
+			break;
+		default:
+			WARN(true, "Unknown memslot change");
+		}
+	}
+
 	/*
 	 * At this point memslot has been committed and there is an
 	 * allocated dirty_bitmap[], dirty pages will be tracked while the
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5c12232bdf0b..f4ca3ba8dde7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -947,7 +947,7 @@ NOKPROBE_SYMBOL(do_debug_exception);
  */
 gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
 {
-	if (vma->vm_flags & VM_MTE)
+	if (vma->vm_flags & (VM_MTE | VM_MTE_KVM))
 		return __GFP_TAGGED;
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..924aa7c26ec9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -375,9 +375,11 @@ extern unsigned int kobjsize(const void *objp);
 #if defined(CONFIG_ARM64_MTE)
 # define VM_MTE		VM_HIGH_ARCH_0	/* Use Tagged memory for access control */
 # define VM_MTE_ALLOWED	VM_HIGH_ARCH_1	/* Tagged memory permitted */
+# define VM_MTE_KVM	VM_HIGH_ARCH_2	/* VMA is mapped in a virtual machine with MTE */
 #else
 # define VM_MTE		VM_NONE
 # define VM_MTE_ALLOWED	VM_NONE
+# define VM_MTE_KVM	VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.43.0