linux-mm.kvack.org archive mirror
From: Matthew Brost <matthew.brost@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: intel-xe@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Simona Vetter" <simona.vetter@ffwll.ch>,
	"Dave Airlie" <airlied@gmail.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] drm/xe/userptr: Convert invalidation to two-pass MMU notifier
Date: Mon, 2 Mar 2026 13:30:17 -0800	[thread overview]
Message-ID: <aaYBaXuvvXk28gNx@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <1722945c1b8a99bd9386b82a109e3308197fe914.camel@linux.intel.com>

On Mon, Mar 02, 2026 at 10:22:13PM +0100, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2026-03-02 at 10:57 -0800, Matthew Brost wrote:
> 
> Thanks for reviewing,
> 
> > On Mon, Mar 02, 2026 at 05:32:46PM +0100, Thomas Hellström wrote:
> > > In multi-GPU scenarios, asynchronous GPU job latency is a
> > > bottleneck if
> > > each notifier waits for its own GPU before returning. The two-pass
> > > mmu_interval_notifier infrastructure allows deferring the wait to a
> > > second pass, so all GPUs can be signalled in the first pass before
> > > any of them are waited on.
> > > 
> > > Convert the userptr invalidation to use the two-pass model:
> > > 
> > > Use invalidate_start as the first pass to mark the VMA for repin
> > > and
> > > enable software signalling on the VM reservation fences to start
> > > any
> > > gpu work needed for signaling. Fall back to completing the work
> > > synchronously if all fences are already signalled, or if a
> > > concurrent
> > > invalidation is already using the embedded finish structure.
> > > 
> > > Use invalidate_finish as the second pass to wait for the
> > > reservation
> > > fences to complete, invalidate the GPU TLB in fault mode, and unmap
> > > the gpusvm pages.
> > > 
> > > Embed a struct mmu_interval_notifier_finish in struct xe_userptr to
> > > avoid dynamic allocation in the notifier callback. Use a
> > > finish_inuse
> > > flag to prevent two concurrent invalidations from using it
> > > simultaneously; fall back to the synchronous path for the second
> > > caller.
> > > 
> > 
> > A couple of nits below. Also, for clarity, I'd probably rework this
> > series:
> > 
> >  - Move patch #3 to be the 2nd patch
> >  - Squash patches #2 and #4 into a single patch, and make this the
> > last patch
> > 
> > Let me know what you think about the reordering. I'm looking at the
> > code with the series applied, and aside from the nits below everything
> > in xe_userptr.c looks good to me.
> 
> We could do that, but unless you insist, I'd like to keep the current
> ordering, since patch #2 is a very simple example of how to convert,
> and also since #4 makes the notifier rather complex, so it'd be good
> to be able to bisect in between those two.
> 

I think it is fine the way you have it - I never have strong opinions
on stuff like this. To review, I just had to apply the series to a
branch to get the full picture / verify that all the flows were correct.

> >  
> > > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_userptr.c | 96 +++++++++++++++++++++++++----
> > > ----
> > >  drivers/gpu/drm/xe/xe_userptr.h | 14 +++++
> > >  2 files changed, 88 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_userptr.c
> > > b/drivers/gpu/drm/xe/xe_userptr.c
> > > index e120323c43bc..440b0a79d16f 100644
> > > --- a/drivers/gpu/drm/xe/xe_userptr.c
> > > +++ b/drivers/gpu/drm/xe/xe_userptr.c
> > > @@ -73,18 +73,42 @@ int xe_vma_userptr_pin_pages(struct
> > > xe_userptr_vma *uvma)
> > >  				    &ctx);
> > >  }
> > >  
> > > -static void __vma_userptr_invalidate(struct xe_vm *vm, struct
> > > xe_userptr_vma *uvma)
> > > +static void xe_vma_userptr_do_inval(struct xe_vm *vm, struct
> > > xe_userptr_vma *uvma,
> > > +				    bool is_deferred)
> > >  {
> > >  	struct xe_userptr *userptr = &uvma->userptr;
> > >  	struct xe_vma *vma = &uvma->vma;
> > > -	struct dma_resv_iter cursor;
> > > -	struct dma_fence *fence;
> > >  	struct drm_gpusvm_ctx ctx = {
> > >  		.in_notifier = true,
> > >  		.read_only = xe_vma_read_only(vma),
> > >  	};
> > >  	long err;
> > >  
> > 
> > xe_svm_assert_in_notifier(vm);
> 
> This actually reveals a pre-existing bug: since this code
> is called with the notifier lock held in read mode, and with
> the vm resv held, in the userptr invalidation injection path,
> that assert would hit.
> 
> 
> Also, drm_gpusvm_unmap_pages() below will assert the same thing (it is
> also affected by the bug). For clarity I agree we might want to have an
> assert here, but then it would need to include the other mode as well,
> and I'd need to update the locking docs for the two-pass state.
> 

Right! I pointed this out to Auld as it popped up in my testing while
implementing the CPU bind / ULLS series with DRM_XE_USERPTR_INVAL_INJECT
enabled. We should fix up drm_gpusvm_unmap_pages() soonish too, and
enable DRM_XE_USERPTR_INVAL_INJECT in some CI runs.

I think some lockdep assert would be good - even if it's just the
non-write-mode version, with maybe a brief explanation that the error
injection path can do this in read mode while the notifier path does
this in write mode.
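
A minimal sketch of what that could look like (illustrative only, and
assuming the gpusvm notifier_lock is the rwsem in question -
lockdep_assert_held() checks that the lock is held in either mode):

```c
/*
 * Held in write mode by the notifier invalidation paths, but the
 * DRM_XE_USERPTR_INVAL_INJECT path gets here with it held in read
 * mode (plus the vm resv), so only assert that the lock is held
 * at all rather than lockdep_assert_held_write().
 */
lockdep_assert_held(&vm->svm.gpusvm.notifier_lock);
```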

Matt

> > 
> > > +	err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > +				    DMA_RESV_USAGE_BOOKKEEP,
> > > +				    false, MAX_SCHEDULE_TIMEOUT);
> > > +	XE_WARN_ON(err <= 0);
> > > +
> > > +	if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > +		err = xe_vm_invalidate_vma(vma);
> > > +		XE_WARN_ON(err);
> > > +	}
> > > +
> > > +	if (is_deferred)
> > > +		userptr->finish_inuse = false;
> > > +	drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma-
> > > >userptr.pages,
> > > +			       xe_vma_size(vma) >> PAGE_SHIFT,
> > > &ctx);
> > > +}
> > > +
> > > +static struct mmu_interval_notifier_finish *
> > > +xe_vma_userptr_invalidate_pass1(struct xe_vm *vm, struct
> > > xe_userptr_vma *uvma)
> > > +{
> > > +	struct xe_userptr *userptr = &uvma->userptr;
> > > +	struct xe_vma *vma = &uvma->vma;
> > > +	struct dma_resv_iter cursor;
> > > +	struct dma_fence *fence;
> > > +	bool signaled = true;
> > > +
> > 
> > xe_svm_assert_in_notifier(vm);
> 
> Same here.
> 
> > 
> > >  	/*
> > >  	 * Tell exec and rebind worker they need to repin and
> > > rebind this
> > >  	 * userptr.
> > > @@ -105,27 +129,32 @@ static void __vma_userptr_invalidate(struct
> > > xe_vm *vm, struct xe_userptr_vma *uv
> > >  	 */
> > >  	dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
> > >  			    DMA_RESV_USAGE_BOOKKEEP);
> > > -	dma_resv_for_each_fence_unlocked(&cursor, fence)
> > > +	dma_resv_for_each_fence_unlocked(&cursor, fence) {
> > >  		dma_fence_enable_sw_signaling(fence);
> > > +		if (signaled && !dma_fence_is_signaled(fence))
> > > +			signaled = false;
> > > +	}
> > >  	dma_resv_iter_end(&cursor);
> > >  
> > > -	err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > -				    DMA_RESV_USAGE_BOOKKEEP,
> > > -				    false, MAX_SCHEDULE_TIMEOUT);
> > > -	XE_WARN_ON(err <= 0);
> > > -
> > > -	if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > -		err = xe_vm_invalidate_vma(vma);
> > > -		XE_WARN_ON(err);
> > > +	/*
> > > +	 * Only one caller at a time can use the multi-pass state.
> > > +	 * If it's already in use, or all fences are already
> > > signaled,
> > > +	 * proceed directly to invalidation without deferring.
> > > +	 */
> > > +	if (signaled || userptr->finish_inuse) {
> > > +		xe_vma_userptr_do_inval(vm, uvma, false);
> > > +		return NULL;
> > >  	}
> > >  
> > > -	drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma-
> > > >userptr.pages,
> > > -			       xe_vma_size(vma) >> PAGE_SHIFT,
> > > &ctx);
> > > +	userptr->finish_inuse = true;
> > > +
> > > +	return &userptr->finish;
> > >  }
> > >  
> > > -static bool vma_userptr_invalidate(struct mmu_interval_notifier
> > > *mni,
> > > -				   const struct mmu_notifier_range
> > > *range,
> > > -				   unsigned long cur_seq)
> > > +static bool xe_vma_userptr_invalidate_start(struct
> > > mmu_interval_notifier *mni,
> > > +					    const struct
> > > mmu_notifier_range *range,
> > > +					    unsigned long cur_seq,
> > > +					    struct
> > > mmu_interval_notifier_finish **p_finish)
> > >  {
> > >  	struct xe_userptr_vma *uvma = container_of(mni,
> > > typeof(*uvma), userptr.notifier);
> > >  	struct xe_vma *vma = &uvma->vma;
> > > @@ -138,21 +167,40 @@ static bool vma_userptr_invalidate(struct
> > > mmu_interval_notifier *mni,
> > >  		return false;
> > >  
> > >  	vm_dbg(&xe_vma_vm(vma)->xe->drm,
> > > -	       "NOTIFIER: addr=0x%016llx, range=0x%016llx",
> > > +	       "NOTIFIER PASS1: addr=0x%016llx, range=0x%016llx",
> > >  		xe_vma_start(vma), xe_vma_size(vma));
> > >  
> > >  	down_write(&vm->svm.gpusvm.notifier_lock);
> > >  	mmu_interval_set_seq(mni, cur_seq);
> > >  
> > > -	__vma_userptr_invalidate(vm, uvma);
> > > +	*p_finish = xe_vma_userptr_invalidate_pass1(vm, uvma);
> > > +
> > >  	up_write(&vm->svm.gpusvm.notifier_lock);
> > > -	trace_xe_vma_userptr_invalidate_complete(vma);
> > > +	if (!*p_finish)
> > > +		trace_xe_vma_userptr_invalidate_complete(vma);
> > >  
> > >  	return true;
> > >  }
> > >  
> > > +static void xe_vma_userptr_invalidate_finish(struct
> > > mmu_interval_notifier_finish *finish)
> > > +{
> > > +	struct xe_userptr_vma *uvma = container_of(finish,
> > > typeof(*uvma), userptr.finish);
> > > +	struct xe_vma *vma = &uvma->vma;
> > > +	struct xe_vm *vm = xe_vma_vm(vma);
> > > +
> > > +	vm_dbg(&xe_vma_vm(vma)->xe->drm,
> > > +	       "NOTIFIER PASS2: addr=0x%016llx, range=0x%016llx",
> > > +		xe_vma_start(vma), xe_vma_size(vma));
> > > +
> > > +	down_write(&vm->svm.gpusvm.notifier_lock);
> > > +	xe_vma_userptr_do_inval(vm, uvma, true);
> > > +	up_write(&vm->svm.gpusvm.notifier_lock);
> > > +	trace_xe_vma_userptr_invalidate_complete(vma);
> > > +}
> > > +
> > >  static const struct mmu_interval_notifier_ops
> > > vma_userptr_notifier_ops = {
> > > -	.invalidate = vma_userptr_invalidate,
> > > +	.invalidate_start = xe_vma_userptr_invalidate_start,
> > > +	.invalidate_finish = xe_vma_userptr_invalidate_finish,
> > >  };
> > >  
> > >  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > > @@ -164,6 +212,7 @@ static const struct mmu_interval_notifier_ops
> > > vma_userptr_notifier_ops = {
> > >   */
> > >  void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma)
> > >  {
> > > +	static struct mmu_interval_notifier_finish *finish;
> > >  	struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > >  
> > >  	/* Protect against concurrent userptr pinning */
> > > @@ -179,7 +228,10 @@ void xe_vma_userptr_force_invalidate(struct
> > > xe_userptr_vma *uvma)
> > >  	if (!mmu_interval_read_retry(&uvma->userptr.notifier,
> > >  				     uvma-
> > > >userptr.pages.notifier_seq))
> > >  		uvma->userptr.pages.notifier_seq -= 2;
> > > -	__vma_userptr_invalidate(vm, uvma);
> > > +
> > > +	finish = xe_vma_userptr_invalidate_pass1(vm, uvma);
> > > +	if (finish)
> > > +		xe_vma_userptr_do_inval(vm, uvma, true);
> > >  }
> > >  #endif
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_userptr.h
> > > b/drivers/gpu/drm/xe/xe_userptr.h
> > > index ef801234991e..4f42db61fd62 100644
> > > --- a/drivers/gpu/drm/xe/xe_userptr.h
> > > +++ b/drivers/gpu/drm/xe/xe_userptr.h
> > > @@ -57,12 +57,26 @@ struct xe_userptr {
> > >  	 */
> > >  	struct mmu_interval_notifier notifier;
> > >  
> > > +	/**
> > > +	 * @finish: MMU notifier finish structure for two-pass
> > > invalidation.
> > > +	 * Embedded here to avoid allocation in the notifier
> > > callback.
> > > +	 * Protected by @vm::svm.gpusvm.notifier_lock.
> > > +	 */
> > > +	struct mmu_interval_notifier_finish finish;
> > > +	/**
> > > +	 * @finish_inuse: Whether @finish is currently in use by
> > > an in-progress
> > > +	 * two-pass invalidation.
> > > +	 * Protected by @vm::svm.gpusvm.notifier_lock.
> > > +	 */
> > > +	bool finish_inuse;
> > > +
> > >  	/**
> > >  	 * @initial_bind: user pointer has been bound at least
> > > once.
> > >  	 * write: vm->svm.gpusvm.notifier_lock in read mode and
> > > vm->resv held.
> > >  	 * read: vm->svm.gpusvm.notifier_lock in write mode or vm-
> > > >resv held.
> > >  	 */
> > >  	bool initial_bind;
> > > +
> > 
> > Unrelated.
> 
> Sure. Will fix.
> 
> Thanks,
> Thomas
> 
> 
> 
> > 
> > Matt
> > 
> > >  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > >  	u32 divisor;
> > >  #endif
> > > -- 
> > > 2.53.0
> > > 



Thread overview: 15+ messages
2026-03-02 16:32 [PATCH v2 0/4] Two-pass MMU interval notifiers Thomas Hellström
2026-03-02 16:32 ` [PATCH v2 1/4] mm/mmu_notifier: Allow two-pass struct mmu_interval_notifiers Thomas Hellström
2026-03-02 19:48   ` Matthew Brost
2026-03-02 21:12     ` Thomas Hellström
2026-03-02 16:32 ` [PATCH v2 2/4] drm/xe/userptr: Convert invalidation to two-pass MMU notifier Thomas Hellström
2026-03-02 18:57   ` Matthew Brost
2026-03-02 21:22     ` Thomas Hellström
2026-03-02 21:30       ` Matthew Brost [this message]
2026-03-02 16:32 ` [PATCH v2 3/4] drm/xe: Split TLB invalidation into submit and wait steps Thomas Hellström
2026-03-02 19:06   ` Matthew Brost
2026-03-02 21:29     ` Thomas Hellström
2026-03-02 21:31       ` Matthew Brost
2026-03-02 16:32 ` [PATCH v2 4/4] drm/xe/userptr: Defer Waiting for TLB invalidation to the second pass if possible Thomas Hellström
2026-03-02 19:14   ` Matthew Brost
2026-03-02 21:33     ` Thomas Hellström
