From: Sean Christopherson <seanjc@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, pbonzini@redhat.com,
mike.kravetz@oracle.com, apopple@nvidia.com, rppt@kernel.org,
akpm@linux-foundation.org, kevin.tian@intel.com
Subject: Re: [RFC PATCH 3/3] KVM: x86/mmu: skip zap maybe-dma-pinned pages for NUMA migration
Date: Tue, 8 Aug 2023 07:26:07 -0700
Message-ID: <ZNJQf1/jzEeyKaIi@google.com>
In-Reply-To: <ZNI14eN4bFV5eO4W@nvidia.com>
On Tue, Aug 08, 2023, Jason Gunthorpe wrote:
> On Tue, Aug 08, 2023 at 03:17:02PM +0800, Yan Zhao wrote:
> > @@ -859,6 +860,21 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> >  		    !is_last_spte(iter.old_spte, iter.level))
> >  			continue;
> >  
> > +		if (skip_pinned) {
> > +			kvm_pfn_t pfn = spte_to_pfn(iter.old_spte);
> > +			struct page *page = kvm_pfn_to_refcounted_page(pfn);
> > +			struct folio *folio;
> > +
> > +			if (!page)
> > +				continue;
> > +
> > +			folio = page_folio(page);
> > +
> > +			if (folio_test_anon(folio) && PageAnonExclusive(&folio->page) &&
> > +			    folio_maybe_dma_pinned(folio))
> > +				continue;
> > +		}
> > +
>
> I don't get it..
>
> The last patch made it so that the NUMA balancing code doesn't change
> page_maybe_dma_pinned() pages to PROT_NONE
>
> So why doesn't KVM just check if the current and new SPTE are the same
> and refrain from invalidating if nothing changed?

Because KVM doesn't have visibility into the current and new PTEs when the zapping
occurs.  The contract for invalidate_range_start() requires that KVM drop all
references before returning, and so the zapping occurs before change_pte_range()
or change_huge_pmd() have done anything.

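The ordering described above can be shown with a small userspace toy model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, and the model collapses the real mmu_notifier machinery into one function call. It shows that the secondary MMU is handed only a range and must zap before the primary MMU has decided which PTEs it will change.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

/* Hypothetical toy state: primary-MMU PTEs plus a KVM-like secondary MMU. */
struct toy_mm {
	bool pte_prot_none[TOY_NR_PAGES];	/* primary MMU: PTE made PROT_NONE */
	bool pinned[TOY_NR_PAGES];		/* page looks DMA-pinned */
	bool spte_present[TOY_NR_PAGES];	/* secondary MMU: mapping present */
};

/*
 * Stand-in for the secondary MMU's invalidate_range_start() handler.
 * It sees only [start, end), not what the PTEs will become, and it
 * must drop every reference in the range before returning.
 */
static void toy_invalidate_range_start(struct toy_mm *mm, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		mm->spte_present[i] = false;
}

/* Stand-in for the NUMA-balancing protection change in the primary MMU. */
static void toy_change_prot_numa(struct toy_mm *mm, size_t start, size_t end)
{
	/* Notification happens first, before any PTE has been examined. */
	toy_invalidate_range_start(mm, start, end);

	/* Only now does the primary MMU decide, per page, what to change. */
	for (size_t i = start; i < end; i++) {
		if (!mm->pinned[i])
			mm->pte_prot_none[i] = true;
	}
}
```

In this model a pinned page keeps its PTE untouched yet still loses its secondary mapping, because the zap ran before the per-page decision was made.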
> Duplicating the checks here seems very frail to me.

Yes, this approach gets a hard NAK from me.  IIUC, folio_maybe_dma_pinned()
can yield different results purely based on refcounts, i.e. KVM could skip pages
that the primary MMU does not, and thus violate the mmu_notifier contract.  And
in general, I am steadfastly against adding any kind of heuristic to KVM's
zapping logic.

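To illustrate the refcount concern, here is a minimal userspace sketch. It assumes the small-folio behavior as I understand it, where a pin adds a bias of 1024 to the refcount and the "maybe pinned" test is just a threshold comparison. All `toy_*` names are hypothetical, and the model is not the kernel's implementation.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's GUP_PIN_COUNTING_BIAS of (1U << 10). */
#define TOY_GUP_PIN_COUNTING_BIAS (1U << 10)

/* Hypothetical toy folio: only the refcount matters for this sketch. */
struct toy_folio {
	unsigned int refcount;
};

/* An ordinary reference adds 1. */
static void toy_get(struct toy_folio *f)
{
	f->refcount += 1;
}

/* A DMA pin adds the whole bias, and an unpin removes it again. */
static void toy_pin(struct toy_folio *f)
{
	f->refcount += TOY_GUP_PIN_COUNTING_BIAS;
}

static void toy_unpin(struct toy_folio *f)
{
	f->refcount -= TOY_GUP_PIN_COUNTING_BIAS;
}

/* Threshold test: cannot tell one pin apart from 1024 ordinary references. */
static bool toy_maybe_dma_pinned(const struct toy_folio *f)
{
	return f->refcount >= TOY_GUP_PIN_COUNTING_BIAS;
}
```

Two things follow from the model. A folio with enough ordinary references reads as pinned, and two callers sampling the same folio at different times (say, one before an unpin and one after) can get different answers, which is how a check in KVM and a check in the primary MMU could disagree.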
This really needs to be fixed in the primary MMU and not require any direct
involvement from secondary MMUs, e.g. the mmu_notifier invalidation itself needs
to be skipped.