From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E086AC6FD1C for ; Fri, 24 Mar 2023 02:21:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 69E4E6B0072; Thu, 23 Mar 2023 22:21:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 64E606B0074; Thu, 23 Mar 2023 22:21:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EED06B0075; Thu, 23 Mar 2023 22:21:09 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 3CF8B6B0072 for ; Thu, 23 Mar 2023 22:21:09 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1283E120310 for ; Fri, 24 Mar 2023 02:21:09 +0000 (UTC) X-FDA: 80602189458.12.7005A1F Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf20.hostedemail.com (Postfix) with ESMTP id CFC141C0016 for ; Fri, 24 Mar 2023 02:21:05 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="WcY/I0Ob"; spf=none (imf20.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679624466; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UM6nuJkZqc83L3bN44Cme0CvbQ3eBH9mAUTa33I78Mw=; b=fNJ+cVmShqkG6pdPMmi2hpJ/wAHfVZ3VA6BOOux0+zUQad9Xy7bQyvS57Tg3Ttg0pAiPPj xUI2wEALjQj/Xl+ENDmDAZAb1AkSQb7IXMRSM6mHUcA2CnpCXYt5TjO200JjMsbOGlX7lu mrMnUzrPwAPcerOYL8ihdQTV/nI6QcI= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="WcY/I0Ob"; spf=none (imf20.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=chao.p.peng@linux.intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679624466; a=rsa-sha256; cv=none; b=Dc68nB9s28/24pe6nZRsGYnUQO1ReKsDvFcDlwpaamLw4QQcWiCZidbmKA8fp780vBM65a jhUZ3pNf/EUTNBXHyDSTYkRn9bW9edbHEmtAKsQU61RpCIAIPzLUhhuoUv+SMDLnTQziEK veB/ESNitb2K+b5RaeDR2QHN2TQRTPY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679624465; x=1711160465; h=date:from:to:cc:subject:message-id:reply-to:references: mime-version:in-reply-to; bh=WC/GKCLsgaUV1To9x+OiRyuA8e2gZbfaT5Wbg90dNqM=; b=WcY/I0ObkCpe8ZDLoe044nww/K1JcPSp9t9iPaT8i9sP1YkhrnnDkhTT +/Redidn+ESPSamHmuEa4CergYSCe+tsSbTL/M1tRLFbO5x1hyRcxYu9O irqJwuoCbGPfeIo1JtG/C/rsp0lOEcM2nxUE1Jg8hZOxhNM1VBTC2eHjl RuqnfykveF1GjCp/A81dXw4WySOoEWJsmuxgTuoGp0jlWlcWrRFtwJRoB uNScGsG9fkuDWqsGQMGTlHNDIqvKO/xsJqaL2l7B+2CdQmdsO5LO+Ouyy e6IH+DbMvuU4PAlPHbE3zBUsCYyTdTE3mHjiOGMvyQA339mgWZrH+QYth Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10658"; a="338401272" X-IronPort-AV: E=Sophos;i="5.98,286,1673942400"; d="scan'208";a="338401272" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2023 19:21:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10658"; a="746987117" X-IronPort-AV: E=Sophos;i="5.98,286,1673942400"; d="scan'208";a="746987117" Received: from chaop.bj.intel.com (HELO localhost) ([10.240.192.105]) by fmsmga008.fm.intel.com with ESMTP; 23 Mar 2023 19:20:52 -0700 Date: Fri, 24 Mar 2023 10:13:17 +0800 From: Chao Peng To: Michael Roth Cc: Sean Christopherson , Isaku Yamahata , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini , Jonathan Corbet , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Arnd Bergmann , Naoya Horiguchi , Miaohe Lin , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Shuah Khan , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , tabba@google.com, mhocko@suse.com, wei.w.wang@intel.com Subject: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM Message-ID: <20230324021317.GB2774613@chaop.bj.intel.com> Reply-To: Chao Peng References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com> <20230119111308.GC2976263@ls.amr.corp.intel.com> <20230119223704.GD2976263@ls.amr.corp.intel.com> <20230213130102.two7q3kkcf254uof@amd.com> <20230221121135.GA1595130@chaop.bj.intel.com> <20230323012737.7vn4ynsbfz7c2ch4@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230323012737.7vn4ynsbfz7c2ch4@amd.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: CFC141C0016 X-Rspam-User: X-Stat-Signature: 19qeops5n6k5hnhcyw6fxs5ng9ycjte4 X-HE-Tag: 1679624465-285429 X-HE-Meta: U2FsdGVkX1/XaDMY8E00A8zaQoh4vGgJASrKUTDV+jtO3J9MR7KjT9GM88S9CrRbAJBEhOReBwHf7xTDL1UIy+W1ummO+VUzFPei/J7omMBBF/Oc5353qMA0zOF6I+ru/8eh5OTv6zGEGPgNW7QtCkzpxbPI4FKFfXoSLn6APqwFIhil9ggITgVd4m5VFz+WZWUle0kdpEZP+KIyqWzFJc+qjHBymA1rhT4+EdTUy+DsJhjHm1KKDa/94Q0PJd0vl5jbQKhhUzG35MIFZRX4+D0tICctfJ6BrSS8HUkgf+AiWr0AEUbdW9k8umDnxuI+yP3tCRkk3zjYBaDBEisIPpSe5VbF29/9bKkh0UVE02rmauSWnqXlTPpnP7Kh99hFzFj8Kve71PRF7793A6/jJaQHy9y8Q7FS/SP4L2EECUY95wdMFGAmMdC3pGq4DsHJhKX1V1lJb5n/1+IFLCATZvPoWyATy0k9g0/Et00HKwW5GVvv6RVKmBHOPmr8NUsEe52ACk6xK4HRAYFNSD/6i1GSR2R3PRc8bMxxvbgnusauczH0hIadhffzLwSEy5ADBGl2xQQ9bjzhYelJ5CVGSeH+IBDU13p/+UM1azENojU+weZzS9mSeMft7gnPHgLDmi0lGC2gJJLxbEYntIiR2Rpb+zuk936uxNr2pl58OslIrTCJZNJ/SFbSlycF+SSv1D1oZRvr5N4mE+qI6Rra9ahoMvnDFh/hvCEvM/rQnSfMqPx27RFQYUkJa/+eDabWqNgvXIjReyvsJZO5bKo2/D0Z3orcMuxocNXEOQS/KklbDY7TN2Wy6vZ8++bIRq4nR+SApEj8bCUOzrI0jVnpIwAe4uiujDAxs6oFsboocd76pjXAsZdh2N42wB1Nlu9/lR98E6weBDXmgR4vvfzmYsLV9t133wj9W1VkY+OGvXmROxLYyZe7U6Wz1Zp0ZkIjEJjs/uw/6MJsjPNzVrk 6xxgJaXZ SOLjeoZnqkv0ZAIV2KywueWsKiCT2mt3HwPKYK9Fx/ze5miatPvWrdtIf3CpOH2vK2Iid1akGinBzT1yOSRCylKJbwMoBWTB03E6J33bLeGmMh9MM8d6bYudnCo58E484wy6+QYi2XSUsx+9hRpTqdlDRCrjUuGmDICoW/vRcc1ZhNpdw5c/MTbijz6nSANolrooI71IGSskWh3Y2uVmmh+PL6h9jQAF44RQ6Wx29aqMzFOuP0hQpIvqddm2Qw6QSL2R2MvXHcbmSs0jdj313ckj19jOqp6Z46tp4ZzO7R8OoMZsjmy6AzKeVh5IYnpoAAXYBhi4qgqxnPcK1WjfNtXcfMrrNWiLqdGzZP8LTMRpV597OWOL6k/HbzgXMxcc9iu+M6PsUMimU2Vc3z8qLzW8ohO6ySVwYB1Y1ytuqC8O5vBfXF4F2bNeivg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 22, 2023 at 08:27:37PM -0500, Michael Roth wrote: > On Tue, Feb 21, 2023 at 08:11:35PM +0800, Chao Peng wrote: > > > Hi Sean, > > > > > > We've rebased the SEV+SNP support onto your updated UPM base support > > > tree and things seem to be working okay, but we needed some fixups on > > > top of the base support get things working, along with 1 workaround > > > for an issue that hasn't been root-caused yet: > > > > > > https://github.com/mdroth/linux/commits/upmv10b-host-snp-v8-wip > > > > > > *stash (upm_base_support): mm: restrictedmem: Kirill's pinning implementation > > > *workaround (use_base_support): mm: restrictedmem: loosen exclusivity check > > > > What I'm seeing is Slot#3 gets added first and then deleted. When it's > > gets added, Slot#0 already has the same range bound to restrictedmem so > > trigger the exclusive check. This check is exactly the current code for. > > With the following change in QEMU, we no longer trigger this check: > > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c > index 20da121374..849b5de469 100644 > --- a/hw/pci-host/q35.c > +++ b/hw/pci-host/q35.c > @@ -588,9 +588,9 @@ static void mch_realize(PCIDevice *d, Error **errp) > memory_region_init_alias(&mch->open_high_smram, OBJECT(mch), "smram-open-high", > mch->ram_memory, MCH_HOST_BRIDGE_SMRAM_C_BASE, > MCH_HOST_BRIDGE_SMRAM_C_SIZE); > + memory_region_set_enabled(&mch->open_high_smram, false); > memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000, > &mch->open_high_smram, 1); > - memory_region_set_enabled(&mch->open_high_smram, false); > > I'm not sure if QEMU is actually doing something wrong here though or if > this check is putting tighter restrictions on userspace than what was > expected before. Will look into it more. I don't think above QEMU change is upstream acceptable. It may break functionality for 'normal' VMs. The UPM check does putting tighter restriction, the restriction is that you can't bind the same fd range to more than one memslot. For SMRAM in QEMU however, it violates this restriction. The right 'fix' is disabling SMM in QEMU for UPM usages rather than trying to work around it. There is more discussion in below link: https://lore.kernel.org/all/Y8bOB7VuVIsxoMcn@google.com/ Chao > > > > > > *fixup (upm_base_support): KVM: use inclusive ranges for restrictedmem binding/unbinding > > > *fixup (upm_base_support): mm: restrictedmem: use inclusive ranges for issuing invalidations > > > > As many kernel APIs treat 'end' as exclusive, I would rather keep using > > exclusive 'end' for these APIs(restrictedmem_bind/restrictedmem_unbind > > and notifier callbacks) but fix it internally in the restrictedmem. E.g. > > all the places where xarray API needs a 'last'/'max' we use 'end - 1'. > > See below for the change. > > Yes I did feel like I was fighting the kernel a bit on that; your > suggestion seems like it would be a better fit. > > > > > > *fixup (upm_base_support): KVM: fix restrictedmem GFN range calculations > > > > Subtracting slot->restrictedmem.index for start/end in > > restrictedmem_get_gfn_range() is the correct fix. > > > > > *fixup (upm_base_support): KVM: selftests: CoCo compilation fixes > > > > > > We plan to post an updated RFC for v8 soon, but also wanted to share > > > the staging tree in case you end up looking at the UPM integration aspects > > > before then. > > > > > > -Mike > > > > This is the restrictedmem fix to solve 'end' being stored and checked in xarray: > > Looks good. > > Thanks! > > -Mike > > > > > --- a/mm/restrictedmem.c > > +++ b/mm/restrictedmem.c > > @@ -46,12 +46,12 @@ static long restrictedmem_punch_hole(struct restrictedmem *rm, int mode, > > */ > > down_read(&rm->lock); > > > > - xa_for_each_range(&rm->bindings, index, notifier, start, end) > > + xa_for_each_range(&rm->bindings, index, notifier, start, end - 1) > > notifier->ops->invalidate_start(notifier, start, end); > > > > ret = memfd->f_op->fallocate(memfd, mode, offset, len); > > > > - xa_for_each_range(&rm->bindings, index, notifier, start, end) > > + xa_for_each_range(&rm->bindings, index, notifier, start, end - 1) > > notifier->ops->invalidate_end(notifier, start, end); > > > > up_read(&rm->lock); > > @@ -224,7 +224,7 @@ static int restricted_error_remove_page(struct address_space *mapping, > > } > > spin_unlock(&inode->i_lock); > > > > - xa_for_each_range(&rm->bindings, index, notifier, start, end) > > + xa_for_each_range(&rm->bindings, index, notifier, start, end - 1) > > notifier->ops->error(notifier, start, end); > > break; > > } > > @@ -301,11 +301,12 @@ int restrictedmem_bind(struct file *file, pgoff_t start, pgoff_t end, > > if (exclusive != rm->exclusive) > > goto out_unlock; > > > > - if (exclusive && xa_find(&rm->bindings, &start, end, XA_PRESENT)) > > + if (exclusive && > > + xa_find(&rm->bindings, &start, end - 1, XA_PRESENT)) > > goto out_unlock; > > } > > > > - xa_store_range(&rm->bindings, start, end, notifier, GFP_KERNEL); > > + xa_store_range(&rm->bindings, start, end - 1, notifier, GFP_KERNEL); > > rm->exclusive = exclusive; > > ret = 0; > > out_unlock: > > @@ -320,7 +321,7 @@ void restrictedmem_unbind(struct file *file, pgoff_t start, pgoff_t end, > > struct restrictedmem *rm = file->f_mapping->private_data; > > > > down_write(&rm->lock); > > - xa_store_range(&rm->bindings, start, end, NULL, GFP_KERNEL); > > + xa_store_range(&rm->bindings, start, end - 1, NULL, GFP_KERNEL); > > synchronize_rcu(); > > up_write(&rm->lock); > > }