From: Xiaoyao Li
Date: Fri, 8 Jul 2022 11:29:49 +0800
Subject: Re: [PATCH v6 6/8] KVM: Handle page fault for private memory
To: Sean Christopherson
Cc: Michael Roth, Vishal Annapurve, Chao Peng, "Nikunj A. Dadhania",
 kvm list, LKML, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
 "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
 Andrew Morton, Mike Rapoport, Steven Price, "Maciej S . Szmigiero",
 Vlastimil Babka, Yu Zhang, "Kirill A . Shutemov", Andy Lutomirski,
 Jun Nakajima, Dave Hansen, Andi Kleen, David Hildenbrand,
 aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
 Quentin Perret, mhocko@suse.com
Message-ID: <5d0b9341-78b5-0959-2517-0fb1fe83a205@intel.com>
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
 <20220519153713.819591-7-chao.p.peng@linux.intel.com>
 <20220624090246.GA2181919@chaop.bj.intel.com>
 <20220630222140.of4md7bufd5jv5bh@amd.com>
 <4fe3b47d-e94a-890a-5b87-6dfb7763bc7e@intel.com>

On 7/8/2022 4:08 AM, Sean Christopherson wrote:
> On Fri, Jul 01, 2022, Xiaoyao Li wrote:
>> On 7/1/2022 6:21 AM, Michael Roth wrote:
>>> On Thu, Jun 30, 2022 at 12:14:13PM -0700, Vishal Annapurve wrote:
>>>> With transparent_hugepages=always setting I see issues with the
>>>> current implementation.
>
> ...
>
>>>> Looks like with transparent huge pages enabled kvm tried to handle the
>>>> shared memory fault on 0x84d gfn by coalescing nearby 4K pages
>>>> to form a contiguous 2MB page mapping at gfn 0x800, since level 2 was
>>>> requested in kvm_mmu_spte_requested.
>>>> This caused the private memory contents from regions 0x800-0x84c and
>>>> 0x86e-0xa00 to get unmapped from the guest leading to guest vm
>>>> shutdown.
>>>
>>> Interesting... seems like that wouldn't be an issue for non-UPM SEV, since
>>> the private pages would still be mapped as part of that 2M mapping, and
>>> it's completely up to the guest as to whether it wants to access as
>>> private or shared. But for UPM it makes sense this would cause issues.
>>>
>>>>
>>>> Does getting the mapping level as per the fault access type help
>>>> address the above issue? Any such coalescing should not cross between
>>>> private to shared or shared to private memory regions.
>>>
>>> Doesn't seem like changing the check to fault->is_private would help in
>>> your particular case, since the subsequent host_pfn_mapping_level() call
>>> only seems to limit the mapping level to whatever the mapping level is
>>> for the HVA in the host page table.
>>>
>>> Seems like with UPM we need some additional handling here that also
>>> checks that the entire 2M HVA range is backed by non-private memory.
>>>
>>> Non-UPM SNP hypervisor patches already have a similar hook added to
>>> host_pfn_mapping_level() which implements such a check via RMP table, so
>>> UPM might need something similar:
>>>
>>> https://github.com/AMDESE/linux/commit/ae4475bc740eb0b9d031a76412b0117339794139
>>>
>>> -Mike
>>>
>>
>> For TDX, we try to track the page type (shared, private, mixed) of each gfn
>> at given level. Only when the type is shared/private, can it be mapped at
>> that level. When it's mixed, i.e., it contains both shared pages and private
>> pages at given level, it has to go to next smaller level.
>>
>> https://github.com/intel/tdx/commit/ed97f4042eb69a210d9e972ccca6a84234028cad
>
> Hmm, so a new slot->arch.page_attr array shouldn't be necessary, KVM can instead
> update slot->arch.lpage_info on shared<->private conversions. Detecting whether
> a given range is partially mapped could get nasty if KVM defers tracking to the
> backing store, but if KVM itself does the tracking as was previously suggested[*],
> then updating lpage_info should be relatively straightforward, e.g. use
> xa_for_each_range() to see if a given 2mb/1gb range is completely covered (fully
> shared) or not covered at all (fully private).
>
> [*] https://lore.kernel.org/all/YofeZps9YXgtP3f1@google.com

Yes, slot->arch.page_attr was introduced to help identify whether a page
is completely shared or completely private at a given level. It seems an
XArray can serve the same purpose, though I am not familiar with it.
Looking forward to seeing the patch that uses XArray.

Yes, updating slot->arch.lpage_info is a good way to reuse the existing
logic, and Isaku has already applied this to slot->arch.lpage_info in the
2MB support patches.
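
For illustration only, the shared/private/mixed rule discussed in this thread can be sketched in plain userspace C. This is not KVM code and not what any of the linked patches do: the names range_attr, classify_range(), max_mapping_level() and demo_max_level() are hypothetical, a flat bool array stands in for whatever per-gfn tracking KVM would actually keep (page_attr, lpage_info, or an XArray), and the layout is assumed from the report above, with gfns 0x84d-0x86d taken to be the shared range and the second private region clipped to the 2MB block that starts at gfn 0x800.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGES_PER_2M 512ULL   /* 4K pages covered by one 2MB mapping */
#define DEMO_GFNS    0xc00ULL /* size of the toy guest address space */

/* Hypothetical names; "mixed" means both shared and private pages. */
enum range_attr { RANGE_SHARED, RANGE_PRIVATE, RANGE_MIXED };
enum map_level  { LEVEL_4K = 1, LEVEL_2M = 2 };

/* Classify the gfn range [start, start + npages). */
static enum range_attr classify_range(const bool *is_private,
                                      uint64_t start, uint64_t npages)
{
    assert(npages > 0);
    bool first = is_private[start];

    for (uint64_t i = 1; i < npages; i++) {
        if (is_private[start + i] != first)
            return RANGE_MIXED;
    }
    return first ? RANGE_PRIVATE : RANGE_SHARED;
}

/* A mixed 2MB-aligned block must fall back to the next smaller level. */
static enum map_level max_mapping_level(const bool *is_private, uint64_t gfn)
{
    uint64_t base = gfn & ~(PAGES_PER_2M - 1);

    if (classify_range(is_private, base, PAGES_PER_2M) == RANGE_MIXED)
        return LEVEL_4K;
    return LEVEL_2M;
}

/* Assumed layout: private 0x800-0x84c and 0x86e-0x9ff, all else shared. */
static enum map_level demo_max_level(uint64_t gfn)
{
    static bool is_private[DEMO_GFNS];

    assert(gfn < DEMO_GFNS);
    memset(is_private, 0, sizeof(is_private));
    for (uint64_t g = 0x800; g <= 0x84c; g++)
        is_private[g] = true;
    for (uint64_t g = 0x86e; g <= 0x9ff; g++)
        is_private[g] = true;

    return max_mapping_level(is_private, gfn);
}
```

In this toy layout a fault on gfn 0x84d is limited to a 4K mapping, because the 2MB block starting at gfn 0x800 is mixed, while gfn 0x10 sits in a fully shared block and may use a 2MB mapping. A real implementation would cache the result per level rather than rescan the range on every fault.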