From: Simona Vetter <simona.vetter@ffwll.ch>
To: David Hildenbrand <david@redhat.com>,
Alistair Popple <apopple@nvidia.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
John Hubbard <jhubbard@nvidia.com>,
nouveau@lists.freedesktop.org, Jason Gunthorpe <jgg@nvidia.com>,
DRI Development <dri-devel@lists.freedesktop.org>,
Karol Herbst <kherbst@redhat.com>, Lyude Paul <lyude@redhat.com>,
Danilo Krummrich <dakr@kernel.org>
Subject: Re: [Question] Are "device exclusive non-swap entries" / "SVM atomics in Nouveau" still getting used in practice?
Date: Wed, 29 Jan 2025 12:28:25 +0100
Message-ID: <Z5oQ2YV1cRUc0KnD@phenom.ffwll.local>
In-Reply-To: <Z5oHY1pjjwBfRN1g@phenom.ffwll.local>
On Wed, Jan 29, 2025 at 11:48:03AM +0100, Simona Vetter wrote:
> On Tue, Jan 28, 2025 at 09:24:33PM +0100, David Hildenbrand wrote:
> > On 28.01.25 21:14, Simona Vetter wrote:
> > > On Tue, Jan 28, 2025 at 11:09:24AM +1100, Alistair Popple wrote:
> > > > On Fri, Jan 24, 2025 at 06:54:02PM +0100, David Hildenbrand wrote:
> > > > > > > > On integrated graphics the gpu is tied into the coherency
> > > > > > > > fabric, so it's not needed there.
> > > > > > > >
> > > > > > > > I think the more fundamental question with both this function here and
> > > > > > > > with forced migration to device memory is that there's no guarantee it
> > > > > > > > will work out.
> > > > > > >
> > > > > > > Yes, in particular with device-exclusive, it doesn't really work with THP
> > > > > > > and is limited to anonymous memory. I have patches to at least make it
> > > > > > > work reliably with THP.
> > > > > >
> > > > > > I should have crawled through the implementation before replying.
> > > > > > Since it only looks at folio_mapcount(), make_device_exclusive() should at
> > > > > > least in theory work reliably on anon memory, and not be impacted by
> > > > > > elevated refcounts due to migration/ksm/thp/whatever.
> > > > >
> > > > > Yes, there is -- in theory -- nothing blocking the conversion except the
> > > > > folio lock. That's different than page migration.
> > > >
> > > > Indeed - this was the entire motivation for make_device_exclusive() - that we
> > > > needed a way to reliably exclude CPU access that couldn't be blocked in the same
> > > > way page migration can (otherwise we could have just migrated to a device page,
> > > > even if that may have added unwanted overhead).
> > >
> > > The folio_trylock worries me a bit. I guess this is to avoid deadlocks
> > > when locking multiple folios, but I think at least on the first one we
> > > need an unconditional folio_lock to guarantee forward progress.
> >
> > At least on the hmm path I was able to trigger the EBUSY a couple of times
> > due to concurrent swapout. But the hmm-tests selftest fails immediately
> > instead of retrying.
>
> My worry with just retrying is that it's very hard to assess whether
> there's a livelock or whether the retry has a good chance of success. As
> an example, the ->migrate_to_ram path has some trylocks, and the window
> where all the other threads got halfway and then fail the trylock is big
> enough that once enough threads pile up and spin through there, you're
> stuck forever. Which isn't great.
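
To make that concrete, the driver-side retry pattern I'm worried about looks
roughly like this (pseudo-code only; try_to_make_range_exclusive() is a
made-up stand-in for whatever conversion helper the driver ends up calling):

	int ret;

	for (;;) {
		ret = try_to_make_range_exclusive(range);
		if (ret != -EBUSY)
			break;
		/*
		 * Every retry can lose the trylock race again, and with
		 * enough threads spinning in here there is no bound on
		 * how long this loop keeps going.
		 */
	}
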
>
> So if we could convert at least the first folio_trylock into a plain lock
> then forward progress is obviously assured and there's no need to crawl
> through large chunks of mm/ code to hunt for corner cases where we could
> be too unlucky to ever win the race.
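
Roughly what I have in mind, as an untested sketch (the "first" flag and the
surrounding flow are just illustration, not the actual make_device_exclusive()
code):

	if (first) {
		/* sleep here, so at least one folio is always converted */
		folio_lock(folio);
	} else if (!folio_trylock(folio)) {
		/* later folios can still bail out to avoid deadlocks */
		ret = -EBUSY;
		goto out;
	}
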
>
> > > Since
> > > atomics can't cross 4k boundaries (or the hw is just really broken) this
> > > should be enough to avoid being stuck in a livelock. I'm also not seeing
> > > any other reason why a folio_lock shouldn't work here, but then my
> > > understanding of mm/ stuff is really just scratching the surface.
> > >
> > > I did crawl through all the other code and it looks like everything else
> > > uses unconditional locks. So that all looks good and I didn't spot anything
> > > else that seemed problematic.
> > >
> > > Somewhat aside, I do wonder whether we really want to require callers to
> > > hold the mmap lock, or whether, with all the work towards lockless
> > > fastpaths, that shouldn't instead just be an implementation detail.
> >
> > We might be able to use the VMA lock in the future, but that will require
> > GUP support and a bunch more. Until then, the mmap_lock in read mode is
> > required.
>
> Yup. I also don't think we should try to improve this before benchmarks
> show an actual need. What I'm worried about is more the future proofing,
> and making sure mmap_lock doesn't leak into driver data structures. Because
> I've seen some hmm/gpu rfc patches that heavily relied on mmap_lock to
> keep everything correct on the driver side, which is not a clean design.
>
> > I was not able to convince myself that we'll really need the folio lock, but
> > that's also a separate discussion.
>
> This is way above my pay grade and my understanding of mm/, unfortunately.
I pondered this some more, and I think it's to make sure we get a stable
reading of folio_mapcount() and are not racing with new rmaps being
established. But I also got lost a few times in the maze ...
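
If that guess is right, the intent would be roughly the following (pseudo-C
just to spell out my reading, not the actual code, and I haven't verified
that the folio lock really excludes every way a new rmap can show up):

	folio_lock(folio);
	/*
	 * Guess: with the folio lock held, folio_mapcount() can't change
	 * under us(?), so deciding based on it and then replacing the PTE
	 * with a device-exclusive entry doesn't race with a new mapping.
	 */
	if (folio_mapcount(folio) == 1) {
		/* ... replace the PTE with a device-exclusive entry ... */
	}
	folio_unlock(folio);
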
-Sima
>
> > > At least for the
> > > gpu hmm code I've seen, I've tried to push hard towards a world where the
> > > gpu side does not rely on mmap_read_lock being held at all, to future
> > > proof this all. And currently we only have one caller of
> > > make_device_exclusive_range(), so it would be simple to do.
> >
> > We could likely move the mmap_lock into that function, but avoiding it is
> > more effort.
>
> I didn't mean more than just that, which would make sure drivers at least
> do not rely on mmap_lock being held. That then allows us to switch over to
> vma lock or anything else entirely within mm/ code.
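
I.e. just the difference between these two call sites (argument names are
illustrative only, going from the current nouveau usage as I remember it):

	/* today, in the driver: */
	mmap_read_lock(mm);
	ret = make_device_exclusive_range(mm, start, start + PAGE_SIZE,
					  pages, owner);
	mmap_read_unlock(mm);

	/* what I'd like: the locking becomes an mm/ implementation detail */
	ret = make_device_exclusive_range(mm, start, start + PAGE_SIZE,
					  pages, owner);
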
>
> If we leave it as-is then more drivers will accidentally or intentionally
> rely on this, like I think is already the case for ->migrate_to_ram for
> hmm. And then it's more pain to untangle.
>
> > In any case, I'll send something out probably tomorrow to fix page
> > migration/swapout of pages with device-exclusive entries and a bunch of
> > other things (THP, interaction with hugetlb, ...).
>
> Thanks a lot!
>
> Cheers, Sima
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch