Re: [Question] Are "device exclusive non-swap entries" / "SVM atomics in Nouveau" still getting used in practice?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	 John Hubbard <jhubbard@nvidia.com>,
	nouveau@lists.freedesktop.org, Jason Gunthorpe <jgg@nvidia.com>,
	 DRI Development <dri-devel@lists.freedesktop.org>,
	Karol Herbst <kherbst@redhat.com>, Lyude Paul <lyude@redhat.com>,
	 Danilo Krummrich <dakr@kernel.org>
Subject: Re: [Question] Are "device exclusive non-swap entries" / "SVM atomics in Nouveau" still getting used in practice?
Date: Tue, 28 Jan 2025 11:09:24 +1100	[thread overview]
Message-ID: <fbwjse2zexcsxuro5w3a5vs2rq4eabpccfkbd3buc4qmkgoo7z@xpdtyukllzvo> (raw)
In-Reply-To: <f2f059a3-0c95-44cf-b79a-8c01e9334919@redhat.com>

On Fri, Jan 24, 2025 at 06:54:02PM +0100, David Hildenbrand wrote:
> > > > On integrated the gpu is tied into the coherency
> > > > fabric, so there it's not needed.
> > > > 
> > > > I think the more fundamental question with both this function here and
> > > > with forced migration to device memory is that there's no guarantee it
> > > > will work out.
> > > 
> > > Yes, in particular with device-exclusive, it doesn't really work with THP
> > > and is only limited to anonymous memory. I have patches to at least make it
> > > work reliably with THP.
> > 
> > I should have crawled through the implementation first before replying.
> > Since it only looks at folio_mapcount() make_device_exclusive() should at
> > least in theory work reliably on anon memory, and not be impacted by
> > elevated refcounts due to migration/ksm/thp/whatever.
> 
> Yes, there is -- in theory -- nothing blocking the conversion except the
> folio lock. That's different than page migration.

Indeed - this was the entire motivation for make_device_exclusive() - that we
needed a way to reliably exclude CPU access that couldn't be blocked in the same
way page migration can (otherwise we could have just migrated to a device page,
even if that may have added unwanted overhead).
> [...]
> 
> > 
> > > Then, we seem to give up too easily if we cannot lock the folio when wanting
> > > to convert to device-exclusive, which also looks rather odd. But well, maybe
> > > it just works good enough in the common case, or there is some other retry
> > > logic that makes it fly.
> > 
> > I've crawled through the path to migrate pages from device memory back to
> > system memory a few months ago, and found some livelock issues in there.
> > Wouldn't be surprised if m_d_e has some of the same, but I didn't dig
> > through it (least because intel can't use it because not-so-great hw
> > design).
> 
> Yes, that might be possible. Maybe something keeps spinning while the folio
> is locked instead of properly waiting for the lock.
> 
> ... or someone is trying to convert a tail page of a THP, in which case we
> will also keep failing the conversion right now.
>
> > > There are other concerns I have (what if the page is pinned and access
> > > outside of the user space page tables?). Maybe there was not need to handle
> > > these cases so far.
> > 
> > I think that's also ok, but might be good to document this clearly that
> > concurrent direct i/o or rdma registered buffer or whatever will mess with
> > this. The promise is only between the gpu and the cpu, not anything else,
> > in current apis. At least to my knowledge.

That is correct - we never came up with a good solution for direct i/o or
rdma so the promise is only between cpu and gpu. That said direct i/o probably
implies a shared filebacked mapping. We weren't convinced there was a useful use
case there because anything could potentially map a file and use non-atomic CPU
instructions to access shared the shared mapping anyway.

> Well, the issue is that e.g., iouring fixed buffers can be accessed from the
> CPU using the direct map from the CPU, not necessarily using DMA from a
> device.

Right. This was all targetted at private anonymous memory and it's assumed that
both cpu and gpu are using atomic operations. As soon as you're talking shared
non-anonymous mappings it seems impossible for userspace to guarantee nothing
is accessing that memory with non-atomic ops anyway, even with a coherent
interconnect.

> In any case, I'm planning on adding some code-level documentation for that
> and look into extending the high-level doc while at it.

Thanks! Please CC me in.
 
> > 
> > > So best I can do is make anonymous memory more reliable with
> > > device-exclusive and fixup some of the problematic parts that I see (e.g.,
> > > broken page reclaim, page migration, ...).
> > > 
> > > But before starting to cleanup+improve the existing handling of anonymous
> > > memory, I was wondering if this whole thing is getting used at all.
> > 
> > Yeah if this can be made reliable (under the limitation of only anon
> > memory and only excluding userspace access) then I expect we'll need this
> > for a very long time. I just had no idea whether even that is possible.
> > 
> > What isn't good is if it's only mostly reliable, like the current
> > pgmap->ops->migrate_to_ram() path in do_swap_page() still is.

Right. This is _supposed_ to be 100% reliable under those limitations although
I will admit I didn't consider THP deeply as that is not supported for
DEVICE_PRIVATE anyway.

> I'll cc you on patches once I figure out some details on how to fix some
> page table walkers that really don't expect these non-swap entries.
> 
> Fortunately, the hmm test device is in place to trigger some shaky
> scenarios.
> 
> -- 
> Cheers,
> 
> David / dhildenb
>

next prev parent reply	other threads:[~2025-01-28  0:09 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-23 10:20 David Hildenbrand
2025-01-23 15:08 ` Simona Vetter
2025-01-24 10:44   ` David Hildenbrand
2025-01-24 14:11     ` Jason Gunthorpe
2025-01-24 14:39       ` David Hildenbrand
2025-01-24 15:28     ` Simona Vetter
2025-01-24 17:54       ` David Hildenbrand
2025-01-28  0:09         ` Alistair Popple [this message]
2025-01-28 20:14           ` Simona Vetter
2025-01-28 20:24             ` David Hildenbrand
2025-01-29 10:48               ` Simona Vetter
2025-01-29 11:28                 ` Simona Vetter
2025-01-29 11:31                   ` David Hildenbrand
2025-01-29 14:05                     ` Simona Vetter
2025-01-29 16:13                       ` David Hildenbrand
2025-01-30  8:55                         ` Simona Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fbwjse2zexcsxuro5w3a5vs2rq4eabpccfkbd3buc4qmkgoo7z@xpdtyukllzvo \
    --to=apopple@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=david@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kherbst@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox