From: Sean Christopherson <seanjc@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Oscar Salvador <osalvador@suse.de>,
Dan Williams <dan.j.williams@intel.com>,
James Houghton <jthoughton@google.com>,
Matthew Wilcox <willy@infradead.org>,
Nicholas Piggin <npiggin@gmail.com>,
Rik van Riel <riel@surriel.com>,
Dave Jiang <dave.jiang@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
linuxppc-dev@lists.ozlabs.org,
Mel Gorman <mgorman@techsingularity.net>,
Hugh Dickins <hughd@google.com>, Borislav Petkov <bp@alien8.de>,
David Hildenbrand <david@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>,
Dave Hansen <dave.hansen@linux.intel.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Huang Ying <ying.huang@intel.com>,
kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v4 2/7] mm/mprotect: Push mmu notifier to PUDs
Date: Thu, 8 Aug 2024 14:31:19 -0700
Message-ID: <ZrU5JyjIa1CwZ_KD@google.com>
In-Reply-To: <ZrU20AqADICwwmCy@x1n>

On Thu, Aug 08, 2024, Peter Xu wrote:
> Hi, Sean,
>
> On Thu, Aug 08, 2024 at 08:33:59AM -0700, Sean Christopherson wrote:
> > On Wed, Aug 07, 2024, Peter Xu wrote:
> > > mprotect() does mmu notifiers in PMD levels. It's there since 2014 of
> > > commit a5338093bfb4 ("mm: move mmu notifier call from change_protection to
> > > change_pmd_range").
> > >
> > > At that time, the issue was that NUMA balancing can be applied on a huge
> > > range of VM memory, even if nothing was populated. The notification can be
> > > avoided in this case if no valid pmd detected, which includes either THP or
> > > a PTE pgtable page.
> > >
> > > Now to pave way for PUD handling, this isn't enough. We need to generate
> > > mmu notifications even on PUD entries properly. mprotect() is currently
> > > broken on PUD (e.g., one can easily trigger kernel error with dax 1G
> > > mappings already), this is the start to fix it.
> > >
> > > To fix that, this patch proposes to push such notifications to the PUD
> > > layers.
> > >
> > > There is risk on regressing the problem Rik wanted to resolve before, but I
> > > think it shouldn't really happen, and I still chose this solution because
> > > of a few reasons:
> > >
> > > 1) Consider a large VM that should definitely contain more than GBs of
> > > memory, it's highly likely that PUDs are also none. In this case there
> >
> > I don't follow this. Did you mean to say it's highly likely that PUDs are *NOT*
> > none?
>
> I did mean the original wordings.
>
> Note that in the previous case Rik worked on, it's about a mostly empty VM
> got NUMA hint applied. So I did mean "PUDs are also none" here, with the
> hope that when the numa hint applies on any part of the unpopulated guest
> memory, it'll find nothing in PUDs. Here it's mostly not about a huge PUD
> mapping as long as the guest memory is not backed by DAX (since only DAX
> supports 1G huge pud so far, while hugetlb has its own path here in
> mprotect, so it must be things like anon or shmem), but a PUD entry that
> contains pmd pgtables. For that part, I was trying to justify "no pmd
> pgtable installed" with the fact that "a large VM that should definitely
> contain more than GBs of memory", it means the PUD range should hopefully
> never been accessed, so even the pmd pgtable entry should be missing.

Ah, now I get what you were saying.

Problem is, walking the rmaps for the shadow MMU doesn't benefit (much) from
empty PUDs, because KVM needs to blindly walk the rmaps for every gfn covered by
the PUD to see if there are any SPTEs in any shadow MMUs mapping that gfn. And
that walk is done without ever yielding, which I suspect is the source of the
soft lockups of yore.

And there's no way around that conundrum (walking rmaps), at least not without a
major rewrite in KVM. In a nested TDP scenario, KVM's stage-2 page tables (for
L2) key off of L2 gfns, not L1 gfns, and so the only way to find mappings is
through the rmaps.
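
To make the cost concrete, here is a rough toy model of the per-gfn walk
described above. It is only a sketch, not KVM code: every name in it
(toy_rmap_head, toy_zap_range, the TOY_* constants) is made up for
illustration, and it assumes 4KiB base pages and a 1GiB PUD as on x86-64.
The point is that the loop has to inspect one rmap head per gfn in the
range, whether or not anything is mapped, and it never yields.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only. Every name here is hypothetical and none of this is KVM
 * code; it just counts how many per-gfn rmap heads a range walk must visit.
 */
#define TOY_PAGE_SHIFT   12  /* assumed 4KiB base pages */
#define TOY_PUD_SHIFT    30  /* assumed 1GiB covered by one PUD entry */
#define TOY_GFNS_PER_PUD (1UL << (TOY_PUD_SHIFT - TOY_PAGE_SHIFT))

/* One head per gfn; nr_sptes == 0 means nothing maps that gfn. */
struct toy_rmap_head {
	unsigned int nr_sptes;
};

struct toy_walk_stats {
	unsigned long visited;	/* rmap heads inspected */
	unsigned long zapped;	/* heads that actually had SPTEs */
};

/*
 * Walk [start_gfn, end_gfn) and drop whatever is mapped. There is no way to
 * skip an empty region: each gfn's head must be checked individually, and
 * the loop has no yield point.
 */
static struct toy_walk_stats toy_zap_range(struct toy_rmap_head *rmaps,
					   unsigned long start_gfn,
					   unsigned long end_gfn)
{
	struct toy_walk_stats stats = { 0, 0 };
	unsigned long gfn;

	assert(start_gfn <= end_gfn);

	for (gfn = start_gfn; gfn < end_gfn; gfn++) {
		stats.visited++;
		if (rmaps[gfn].nr_sptes) {
			rmaps[gfn].nr_sptes = 0;
			stats.zapped++;
		}
	}
	return stats;
}
```

In this model, calling toy_zap_range(rmaps, 0, TOY_GFNS_PER_PUD) on an
entirely empty array still visits all 262144 heads, which is why an empty
PUD on the primary MMU side doesn't translate into a cheap notification
for the shadow MMU.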