From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08AFCC678D7 for ; Tue, 17 Jan 2023 13:20:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8EC8A6B0071; Tue, 17 Jan 2023 08:20:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 89CD56B0073; Tue, 17 Jan 2023 08:20:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 73DE86B0074; Tue, 17 Jan 2023 08:20:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 6436B6B0071 for ; Tue, 17 Jan 2023 08:20:57 -0500 (EST) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 300051A0B6F for ; Tue, 17 Jan 2023 13:20:57 +0000 (UTC) X-FDA: 80364351354.22.04229CB Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf03.hostedemail.com (Postfix) with ESMTP id B5D9320012 for ; Tue, 17 Jan 2023 13:20:54 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=AIdG6ge+; spf=none (imf03.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.151) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673961655; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=AX9igY5coR0NLRh160M/4PyGSmWPI9js6B7eJK5Kiy0=; b=YSqSIXX5w5NIXINskHbLzGU0DVyXM6o6AYysovJH15mAL4MqCBxWWoDusgLpYJCPuS/Flj psRRSQWCDT9MW0jdtqPmaNJqSnO8BNwdbxIiJZokt7qiCoH1nHr+0ANdmvsEGIRFp3dKXY s3dz9P+9VqdIa684Af2syQqH6GssXyE= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=AIdG6ge+; spf=none (imf03.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.151) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673961655; a=rsa-sha256; cv=none; b=bS4ZGkOc1c1Z5wPrFe4mdrJpFVuiATeNXKfRdujXNDiw3zDbvmIFC276kLcMyJNt9FUZtp mwwyg8qIjwLLu+SsiwLNNUFmVbJDX8eRCuRO32buPUMHYFFuZH96ZyAT/DsauVEbGNH53m pOdQeVM1h85eptaPszsWJU2enY76ttk= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673961654; x=1705497654; h=date:from:to:cc:subject:message-id:reply-to:references: mime-version:in-reply-to; bh=ydzl4pu/BnSzLPZ1RlJR0Y9KESs9mzMw8rQlYf4Kzvo=; b=AIdG6ge+uUc9CoQTFA0QRG4ywR+nvy4wo+g7xr9fafsWto1tY/8vHBKw /GGf0q9QZyxMfI7tbT20pHcoauuqgbkQ3IxQAcrWgKl06a2OwzrCnEUwl J2FnA4Wyhps3/SGok2w6UnG+Nn2JtmjwHvjSEf0yviYaSZiDZ0zV6kqWZ 1sqTLYBjWcP0MsfiDXQ4Trwe/cwbf+6KIwlYwicut6TE5mYh+TEvuX8+V tI7jl/Kn+qE0Yywlbmjo7+5Pu8+t369h6bSC+8tv0C6gnzrymO4Zkqksu ENN6SVTmth0GxIvT/S+479DKm9GhF7278xRi9DrayIu1r773FNZ8eBYnV g==; X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="305067247" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="305067247" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jan 2023 05:20:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10592"; a="689797088" X-IronPort-AV: E=Sophos;i="5.97,224,1669104000"; d="scan'208";a="689797088" Received: from chaop.bj.intel.com (HELO localhost) ([10.240.192.105]) by orsmga008.jf.intel.com with ESMTP; 17 Jan 2023 05:20:39 -0800 Date: Tue, 17 Jan 2023 21:12:51 +0800 From: Chao Peng To: Sean Christopherson Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini , Jonathan Corbet , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Arnd Bergmann , Naoya Horiguchi , Miaohe Lin , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Shuah Khan , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , tabba@google.com, Michael Roth , mhocko@suse.com, wei.w.wang@intel.com Subject: Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE Message-ID: <20230117131251.GC273037@chaop.bj.intel.com> Reply-To: Chao Peng References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com> <20221202061347.1070246-10-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: B5D9320012 X-Rspam-User: X-Stat-Signature: 5nfjsk4b9iguz4xaotj7oahys6dwcf8a X-HE-Tag: 1673961654-575539 X-HE-Meta: U2FsdGVkX1/7vH14SkPyWFMX4UyFtXcqJ3GUnltYKEHHI5zdAKkEJYy+cYyf6FDnSvEwBhBRbWttYO+IDgJL6lnwbiYnIU6RD/bb2Ym1Q4VjPj7Kik9AmF+gOjb+WE0zxEbg27PFTzFAK8eJJjS0lAMbcju2Xx/fhK7D/XVrlHaoT3+us7fCpWFYF6nIJItPjbT140DJvwsVmUcj2DajSt4KMDRHlwk6eVPVOFAGueoY2j4vUd6aW7KXWb9cBjyR7Vs85vzdr/qpQqyfVKC2+3/4u1hToEaPRtgspwsNQYy9E77EYVvxZEHH3WesJz8SZbc0JHiJ8bnYJmtPAT6Fr+lYMuIoyMDZrParW2sN0fjCq+J+TlLUCNG+LIH514HhIY0LJh5jIVObIutKsu3Sm8c2J3J5UiRQpstCciUZVyG+ZsBRSSId3pi6b3T+QKaEecHLUngatBUfj6/Zy4zcqPIhgOwpYA5XrsFrroTCi6u+69Wdog0JjIcmsRjtj9lB5Eq92mPYGYRmnjxqZcS9SaA2eK/n0JDE7M8tY1ELcTCK8nq3TYUKCIXwRUTv7U1sa9S4sLt9u1CdmiSwOXjBt76ATfGS/V3A/Q50yE3Di93oMGmRvjz7Qnh/ozBOWhJ8ASSMOaMVksDAUL4RCd+JBcO0CRLSD0cf8Bu+IWxo0mRsTwDmRkq932ZcqXXRqQZ9i+x0lHUe2HBtyQ+/wZvJlUXsV2OBVZNA7o16DLBTVFaHQ9c7tIlm6VGo3B6LOtmEV4smuwhUvgecq+kADV4VcZs8UIFiZf61JPfvmzHDIzbsNqarqoB2si+ady2nas+2h0jJ76F1w2/DXrv/wTfAgdUGMzogH5DM/WQ7xYJEUkaS4dDGJ6eIweet9E6PJuV+TQYbapj0Z/mbDL7BZzGU5aUozZXdQ9ZIU8+m0oWuVOGIlUKhRSC/DYwlBwzPpnQle2g1jnNbB/HeMuAG+gu Povh/otl 5Yeo4Z0C61MWbFu7/GkAfcHyssA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Jan 14, 2023 at 12:01:01AM +0000, Sean Christopherson wrote: > On Fri, Dec 02, 2022, Chao Peng wrote: > > @@ -10357,6 +10364,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > > > > if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) > > static_call(kvm_x86_update_cpu_dirty_logging)(vcpu); > > + > > + if (kvm_check_request(KVM_REQ_MEMORY_MCE, vcpu)) { > > + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; > > Synthesizing triple fault shutdown is not the right approach. Even with TDX's > MCE "architecture" (heavy sarcasm), it's possible that host userspace and the > guest have a paravirt interface for handling memory errors without killing the > host. Agree shutdown is not the correct choice. I see you made below change: send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva, PAGE_SHIFT, current) The MCE may happen in any thread than KVM thread, sending siginal to 'current' thread may not be the expected behavior. Also how userspace can tell is the MCE on the shared page or private page? Do we care? > > > + r = 0; > > + goto out; > > + } > > } > > > > @@ -1982,6 +2112,10 @@ int __kvm_set_memory_region(struct kvm *kvm, > > !access_ok((void __user *)(unsigned long)mem->userspace_addr, > > mem->memory_size)) > > return -EINVAL; > > + if (mem->flags & KVM_MEM_PRIVATE && > > + (mem->restricted_offset & (PAGE_SIZE - 1) || > > Align indentation. > > > + mem->restricted_offset > U64_MAX - mem->memory_size)) > > Strongly prefer to use similar logic to existing code that detects wraps: > > mem->restricted_offset + mem->memory_size < mem->restricted_offset > > This is also where I'd like to add the "gfn is aligned to offset" check, though > my brain is too fried to figure that out right now. > > > + return -EINVAL; > > if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM) > > return -EINVAL; > > if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) > > @@ -2020,6 +2154,9 @@ int __kvm_set_memory_region(struct kvm *kvm, > > if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages) > > return -EINVAL; > > } else { /* Modify an existing slot. */ > > + /* Private memslots are immutable, they can only be deleted. */ > > I'm 99% certain I suggested this, but if we're going to make these memslots > immutable, then we should straight up disallow dirty logging, otherwise we'll > end up with a bizarre uAPI. But in my mind dirty logging will be needed in the very short time, when live migration gets supported? > > > + if (mem->flags & KVM_MEM_PRIVATE) > > + return -EINVAL; > > if ((mem->userspace_addr != old->userspace_addr) || > > (npages != old->npages) || > > ((mem->flags ^ old->flags) & KVM_MEM_READONLY)) > > @@ -2048,10 +2185,28 @@ int __kvm_set_memory_region(struct kvm *kvm, > > new->npages = npages; > > new->flags = mem->flags; > > new->userspace_addr = mem->userspace_addr; > > + if (mem->flags & KVM_MEM_PRIVATE) { > > + new->restricted_file = fget(mem->restricted_fd); > > + if (!new->restricted_file || > > + !file_is_restrictedmem(new->restricted_file)) { > > + r = -EINVAL; > > + goto out; > > + } > > + new->restricted_offset = mem->restricted_offset; I see you changed slot->restricted_offset type from loff_t to gfn_t and used pgoff_t when doing the restrictedmem_bind/unbind(). Using page index is reasonable KVM internally and sounds simpler than loff_t. But we also need initialize it to page index here as well as changes in another two cases. This is needed when restricted_offset != 0. diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 547b92215002..49e375e78f30 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2364,8 +2364,7 @@ static inline int kvm_restricted_mem_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, kvm_pfn_t *pfn, int *order) { - pgoff_t index = gfn - slot->base_gfn + - (slot->restricted_offset >> PAGE_SHIFT); + pgoff_t index = gfn - slot->base_gfn + slot->restricted_offset; struct page *page; int ret; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 01db35ddd5b3..7439bdcb0d04 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -935,7 +935,7 @@ static bool restrictedmem_range_is_valid(struct kvm_memory_slot *slot, pgoff_t start, pgoff_t end, gfn_t *gfn_start, gfn_t *gfn_end) { - unsigned long base_pgoff = slot->restricted_offset >> PAGE_SHIFT; + unsigned long base_pgoff = slot->restricted_offset; if (start > base_pgoff) *gfn_start = slot->base_gfn + start - base_pgoff; @@ -2275,7 +2275,7 @@ int __kvm_set_memory_region(struct kvm *kvm, r = -EINVAL; goto out; } - new->restricted_offset = mem->restricted_offset; + new->restricted_offset = mem->restricted_offset >> PAGE_SHIFT; } r = kvm_set_memslot(kvm, old, new, change); Chao > > + } > > + > > + new->kvm = kvm; > > Set this above, just so that the code flows better.