From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 05 May 2025 16:09:58 -0700
In-Reply-To: <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
Mime-Version: 1.0
References: <386c1169-8292-43d1-846b-c50cbdc1bc65@redhat.com>
 <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
From: Ackerley Tng
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, mail@maciej.szmigiero.name, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com
Content-Type: text/plain; charset="UTF-8"
David Hildenbrand writes:

> On 03.05.25 00:00, Ackerley Tng wrote:
>> Sean Christopherson writes:
>>
>>> On Fri, May 02, 2025, David Hildenbrand wrote:
>>>> On 30.04.25 20:58, Ackerley Tng wrote:
>>>>>> -	if (is_private)
>>>>>> +	if (is_gmem)
>>>>>> 		return max_level;
>>>>>
>>>>> I think this renaming isn't quite accurate.
>>>>
>>>> After our discussion yesterday, does that still hold true?
>>>
>>> No.
>>>
>>>>> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>>>>> host_pfn_mapping_level() if the gfn is private because private memory
>>>>> will not be mapped to userspace, so there's no need to query userspace
>>>>> page tables in host_pfn_mapping_level().
>>>>
>>>> I think the reason was that: for private we won't be walking the user
>>>> space page tables.
>>>>
>>>> Once guest_memfd is also responsible for the shared part, why should this
>>>> here still be private-only, and why should we consider querying a user
>>>> space mapping that might not even exist?
>>>
>>> +1, one of the big selling points for guest_memfd beyond CoCo is that it
>>> provides guest-first memory. It is very explicitly an intended feature
>>> that the guest mappings KVM creates can be a superset of the host
>>> userspace mappings. E.g. the guest can use larger page sizes, have RW
>>> while the host has RO, etc.
>>
>> Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
>> the parameter renaming from is_private to is_gmem, do something like
>>
>> 	if (is_gmem)
>> 		return kvm_gmem_get_max_mapping_level(slot, gfn);
>
> I assume you mean, not looking at lpage_info at all?
>

My bad. I actually meant just to take input from guest_memfd and stop
there without checking with host page tables, perhaps something like

	min(kvm_gmem_get_max_mapping_level(slot, gfn), max_level);

> I have limited understanding what lpage_info is or what it does. I
> believe all it adds is a mechanism to *disable* large page mappings.
>

This is my understanding too.

> We want to disable large pages if (using a 2M region as an example)
>
> (a) Mixed memory attributes.
> If a PFN falls into a 2M region, and parts
> of that region are shared vs. private (mixed memory attributes ->
> KVM_LPAGE_MIXED_FLAG)
>
> -> With gmem-shared we could have mixed memory attributes, not a PFN
> fracturing. (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
>

Could you explain more about the page track case?

> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so]. Regarding (a) I am not sure: once
> memory attributes are handled by gmem in the gmem-shared case. IIRC,
> with AMD SEV we might still have to honor it? But gmem itself could
> handle that.
>

For AMD SEV, I believe kvm_max_private_mapping_level() already takes
care of that, at least for the MMU faulting path [1], where guest_memfd
gives input using max_order, then the arch-specific callback contributes
its input.

>
> What we could definitely do here for now is:
>
> 	if (is_gmem)
> 		/* gmem only supports 4k pages for now. */
> 		return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
>

Perhaps this is better explained as an RFC in code. I'll put in a patch
as part of Fuad's series if Fuad doesn't mind.

>>
>> and basically defer to gmem as long as gmem should be used for this gfn?
>>
>> There is another call to __kvm_mmu_max_mapping_level() via
>> kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
>> and IIUC that doesn't go through guest_memfd.
>>
>> Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
>> MMU fault path, guest_memfd didn't get a chance to provide its input in
>> the form of returning max_order from kvm_gmem_get_pfn().
>
> Right, we essentially say that "this is a private fault", likely
> assuming that we already verified earlier that the memory is also private.
>
> [I can see that happening when the function is called through
> direct_page_fault()]
>
> We could simply call kvm_mmu_max_mapping_level() from
> kvm_mmu_hugepage_adjust() I guess. (could possibly be optimized later)
>
> --
> Cheers,
>
> David / dhildenb

[1] https://github.com/torvalds/linux/blob/01f95500a162fca88cefab9ed64ceded5afabc12/arch/x86/kvm/mmu/mmu.c#L4480