linux-mm.kvack.org archive mirror
From: Yan Zhao <yan.y.zhao@intel.com>
To: bibo mao <maobibo@loongson.cn>
Cc: Sean Christopherson <seanjc@google.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<pbonzini@redhat.com>, <mike.kravetz@oracle.com>,
	<apopple@nvidia.com>, <jgg@nvidia.com>, <rppt@kernel.org>,
	<akpm@linux-foundation.org>, <kevin.tian@intel.com>,
	<david@redhat.com>
Subject: Re: [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration
Date: Wed, 16 Aug 2023 13:14:20 +0800
Message-ID: <ZNxbLPG8qbs1FjhM@yzhao56-desk.sh.intel.com>
In-Reply-To: <c8ccc8f1-300a-09be-db6b-df2a1dedd4cf@loongson.cn>

On Wed, Aug 16, 2023 at 11:44:29AM +0800, bibo mao wrote:
> 
> 
> On 2023/8/16 10:43, bibo mao wrote:
> > 
> > 
> > On 2023/8/15 22:50, Sean Christopherson wrote:
> >> On Tue, Aug 15, 2023, Yan Zhao wrote:
> >>> On Mon, Aug 14, 2023 at 09:40:44AM -0700, Sean Christopherson wrote:
> >>>>>> Note, I'm assuming secondary MMUs aren't allowed to map swap entries...
> >>>>>>
> >>>>>> Compile tested only.
> >>>>>
> >>>>> I don't find a matching end to each
> >>>>> mmu_notifier_invalidate_range_start_nonblock().
> >>>>
> >>>> It pairs with existing call to mmu_notifier_invalidate_range_end() in change_pmd_range():
> >>>>
> >>>> 	if (range.start)
> >>>> 		mmu_notifier_invalidate_range_end(&range);
> >>> No, It doesn't work for mmu_notifier_invalidate_range_start() sent in change_pte_range(),
> >>> if we only want the range to include pages successfully set to PROT_NONE.
> >>
> >> Precise invalidation was a non-goal for my hack-a-patch.  The intent was purely
> >> to defer invalidation until it was actually needed, but still perform only a
> >> single notification so as to batch the TLB flushes, e.g. the start() call still
> >> used the original @end.
> >>
> >> The idea was to play nice with the scenario where nothing in a VMA could be migrated.
> >> It was complete untested though, so it may not have actually done anything to reduce
> >> the number of pointless invalidations.
> > For numa-balance scenery, can original page still be used by application even if pte
> > is changed with PROT_NONE?  If it can be used, maybe we can zap shadow mmu and flush tlb
For GUP callers that do not pass FOLL_HONOR_NUMA_FAULT, yes.

See https://lore.kernel.org/all/20230803143208.383663-1-david@redhat.com/

> Since there is kvm_mmu_notifier_change_pte notification when numa page is replaced with
> new page, my meaning that can original page still be used by application even if pte
> is changed with PROT_NONE and before replaced with new page?
It's not the .change_pte() notification; that one is sent on COW.
do_numa_page()/do_huge_pmd_numa_page() will try to unmap the old page
that was protected with PROT_NONE, and if every check passes, a separate
.invalidate_range_start()/end() pair with event type MMU_NOTIFY_CLEAR
will be sent.

So I think KVM (even though it honors FOLL_HONOR_NUMA_FAULT) can safely
keep maybe-dma-pinned pages mapped until MMU_NOTIFY_CLEAR is sent.
(This approach is implemented in RFC v1:
https://lore.kernel.org/all/20230810085636.25914-1-yan.y.zhao@intel.com/)
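
To illustrate the behavior described above, here is a minimal userspace
sketch. It is a toy model, not KVM or mmu_notifier code: every toy_* name
is hypothetical, TOY_NOTIFY_NUMA_PROTECT stands in for the PROT_NONE
change made for NUMA hinting, and TOY_NOTIFY_CLEAR stands in for
MMU_NOTIFY_CLEAR. It assumes the secondary MMU can tell the two events
apart and knows which pages may be DMA-pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_event {
	TOY_NOTIFY_NUMA_PROTECT,	/* PTEs set to PROT_NONE for NUMA hinting */
	TOY_NOTIFY_CLEAR,		/* stands in for MMU_NOTIFY_CLEAR */
};

#define TOY_NR_PAGES 8

struct toy_secondary_mmu {
	bool mapped[TOY_NR_PAGES];	 /* page is mapped in the secondary MMU */
	bool maybe_pinned[TOY_NR_PAGES]; /* page may be DMA-pinned */
};

/* Zap pages in [start, end) and return how many were zapped. */
static size_t toy_invalidate_range_start(struct toy_secondary_mmu *mmu,
					 enum toy_event event,
					 size_t start, size_t end)
{
	size_t zapped = 0;

	assert(start <= end);

	for (size_t i = start; i < end && i < TOY_NR_PAGES; i++) {
		if (!mmu->mapped[i])
			continue;
		/* Keep maybe-pinned pages mapped across NUMA protection. */
		if (event == TOY_NOTIFY_NUMA_PROTECT && mmu->maybe_pinned[i])
			continue;
		mmu->mapped[i] = false;
		zapped++;
	}
	return zapped;
}
```

In this model a NUMA-protect event leaves maybe-pinned pages mapped, and
only the later clear event removes them, which is the ordering argued for
above.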

> 
> And for primary mmu, tlb is flushed after pte is changed with PROT_NONE and 
> after mmu_notifier_invalidate_range_end notification for secondary mmu.
> Regards
> Bibo Mao

> >> in notification mmu_notifier_invalidate_range_end with precised range, the range can
But I don't think flushing the TLB only in .invalidate_range_end() in the
secondary MMU is a good idea.
The flush must be done before kvm->mmu_lock is unlocked; otherwise,
multiple threads trying to update the secondary MMU will see
inconsistent state.
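
The ordering concern can be sketched with a small single-threaded model.
This is an assumption-laden illustration, not KVM code: struct toy_kvm and
the toy_* helpers are hypothetical, and the lock is a plain flag rather
than a real lock. The function reports whether a TLB flush was still
pending at the moment the lock was dropped, which is the window in which
another thread could take the lock and see zapped entries whose stale
translations are still cached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: not the real struct kvm or kvm->mmu_lock. */
struct toy_kvm {
	bool mmu_lock_held;
	bool tlb_flush_pending;	/* entries zapped but TLBs not yet flushed */
	unsigned long flushes;
};

static void toy_lock(struct toy_kvm *kvm)
{
	assert(!kvm->mmu_lock_held);
	kvm->mmu_lock_held = true;
}

static void toy_unlock(struct toy_kvm *kvm)
{
	assert(kvm->mmu_lock_held);
	kvm->mmu_lock_held = false;
}

static void toy_flush(struct toy_kvm *kvm)
{
	kvm->tlb_flush_pending = false;
	kvm->flushes++;
}

/*
 * Zap under the lock. Returns true if no flush was pending when the
 * lock was released, false if the flush was deferred past the unlock.
 */
static bool toy_zap_range(struct toy_kvm *kvm, bool flush_before_unlock)
{
	bool clean_at_unlock;

	toy_lock(kvm);
	kvm->tlb_flush_pending = true;	/* zapped; TLBs may still cache it */
	if (flush_before_unlock)
		toy_flush(kvm);
	clean_at_unlock = !kvm->tlb_flush_pending;
	toy_unlock(kvm);

	if (!flush_before_unlock)
		toy_flush(kvm);		/* deferred, e.g. to a later callback */
	return clean_at_unlock;
}
```

With flush_before_unlock set, the lock is never released while a flush is
outstanding; with it cleared, there is a window after the unlock where
the page tables and the TLBs disagree.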

> >> be cross-range between range mmu_gather and mmu_notifier_range.




