From: Simona Vetter <simona.vetter@ffwll.ch>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Yonatan Maman" <ymaman@nvidia.com>,
kherbst@redhat.com, lyude@redhat.com, dakr@redhat.com,
airlied@gmail.com, simona@ffwll.ch, leon@kernel.org,
jglisse@redhat.com, akpm@linux-foundation.org,
GalShalom@nvidia.com, dri-devel@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, linux-mm@kvack.org,
linux-tegra@vger.kernel.org
Subject: Re: [RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
Date: Fri, 31 Jan 2025 17:59:26 +0100
Message-ID: <Z50BbuUQWIaDPRzK@phenom.ffwll.local>
In-Reply-To: <20250130174217.GA2296753@ziepe.ca>
On Thu, Jan 30, 2025 at 01:42:17PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:
> > > > An optional callback is a lot less scary to me here (or redoing
> > > > hmm_range_fault or whacking the migration helpers a few times) looks a lot
> > > > less scary than making pgmap->owner mutable in some fashion.
> > >
> > > It's extra for every single 4k page on every user :\
> > >
> > > And what are you going to do better inside this callback?
> >
> > Having more comfy illusions :-P
>
> Exactly!
>
> > Slightly more seriously, I can grab some locks and make life easier,
>
> Yes, but then see my concern about performance again. Now you are
> locking/unlocking every 4k? And then it still races since it can
> change after hmm_range_fault returns. That's not small, and not really
> better.
Hm yeah, I think that's the killer argument against the callback. Consider
me convinced that it's a bad idea.
> > whereas sprinkling locking or even barriers over pgmap->owner in core mm
> > is not going to fly. Plus more flexibility, e.g. when the interconnect
> > doesn't work for atomics, or for some other funny reason only works for
> > some of the traffic, and you need to decide more dynamically where memory
> > is ok to sit.
>
> Sure, an asymmetric mess could be problematic, and we might need more
> later, but let's get to that first..
>
> > Or checking p2pdma connectivity and all that stuff.
>
> Should be done in the core code, don't want drivers open coding this
> stuff.
Yeah, so after amdkfd and nouveau I agree that letting drivers mess this up
isn't great. But see below: I'm not sure whether going all the way to core
code is the right approach, at least for gpu-internal needs.
> > Also note that fundamentally you can't protect against the hot-unplug or
> > driver unload case for hardware access. So some accesses will go nowhere
> > when that happens, until we've torn down all the mappings and migrated
> > memory out.
>
> I think a literal (safe!) hot unplug must always use the page map
> revoke, and that should be safely locked between hmm_range_fault and
> the notifier.
>
> If the underlying fabric is losing operations during an unplanned hot
> unplug I expect things will need resets to recover..
So one aspect I don't like much about the pgmap->owner approach is that
it's a big thing to get right, and it feels to me like we don't yet know
the right questions to ask.
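
For concreteness, the owner matching we're talking about is the
dev_private_owner filter that hmm_range_fault already implements. A minimal
sketch of how a driver wires that up (the mmu_interval_notifier retry loop
is elided and my_pgmap_owner is a made-up driver-global token, the rest is
the real hmm API):

static int my_fault_range(struct mmu_interval_notifier *notifier,
                          unsigned long start, unsigned long npages,
                          unsigned long *pfns)
{
        struct hmm_range range = {
                .notifier = notifier,
                .notifier_seq = mmu_interval_read_begin(notifier),
                .start = start,
                .end = start + npages * PAGE_SIZE,
                .hmm_pfns = pfns,
                .default_flags = HMM_PFN_REQ_FAULT,
                /* only device-private pages we own are handed back as-is,
                 * everything else gets migrated to system memory first */
                .dev_private_owner = &my_pgmap_owner,
        };
        int ret;

        mmap_read_lock(notifier->mm);
        ret = hmm_range_fault(&range);  /* -EBUSY means retry */
        mmap_read_unlock(notifier->mm);
        return ret;
}
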
Somewhat related: we'll have to do some driver-specific migration after
hmm_range_fault anyway for allocation policies. With a coherent
interconnect that'd be up to numactl, but for driver-private memory it's
all up to the driver. And once we have that, we can also migrate memory
that's misplaced for functional reasons, not just for performance.
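
Something like the below is what I have in mind, i.e. after hmm_range_fault
the driver checks placement itself and kicks its own migration if needed.
HMM_PFN_VALID and hmm_pfn_to_page() are the real hmm API, my_placement_ok()
and the policy behind it are made up:

static bool my_range_needs_migration(struct my_gpu *gpu,
                                     struct hmm_range *range)
{
        unsigned long i, npages = (range->end - range->start) >> PAGE_SHIFT;

        for (i = 0; i < npages; i++) {
                struct page *page;

                if (!(range->hmm_pfns[i] & HMM_PFN_VALID))
                        continue;

                page = hmm_pfn_to_page(range->hmm_pfns[i]);
                /* covers functional limits as well as perf policy */
                if (!my_placement_ok(gpu, page))
                        return true;
        }
        return false;
}

If that returns true the driver runs its migrate helper and redoes the
fault/notifier-retry dance before mapping anything.
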
The plan I discussed with Thomas a while back, at least for gpus, was to
have that as a drm_devpagemap library with a common owner (or maybe one
owner per driver, as Thomas suggested). Then it would still not be
open-coded in drivers, but it would be a bit easier to mess around with
for experiments. And once it's clear which parts hmm_range_fault should do
for everyone, we can lift them up there.
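
Purely hypothetical, but the rough shape I have in mind is that the owner
token and the migrate hooks live in shared drm code and drivers only fill
in the ops:

struct drm_devpagemap;

struct drm_devpagemap_ops {
        /* driver-side migration hooks, called by the shared code */
        int (*migrate_to_devmem)(struct drm_devpagemap *dpagemap,
                                 struct vm_area_struct *vma,
                                 unsigned long start, unsigned long end);
        vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
};

struct drm_devpagemap {
        struct dev_pagemap pagemap;             /* MEMORY_DEVICE_PRIVATE */
        struct drm_device *drm;
        const struct drm_devpagemap_ops *ops;
};

/* the address of this is the common pgmap->owner token */
extern const int drm_devpagemap_owner;
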
Doing this at the pagemap level should also be much more efficient, since
I think we can assume that access limitations are uniform for a given
dev_pagemap (and if they're not, e.g. because not the entire vram is
bus-visible, drivers can handle that by splitting things up into multiple
pagemaps).
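
I.e. something along these lines, where the bus-visible and the non-visible
part of vram become two dev_pagemaps sharing the same owner and ops.
struct dev_pagemap and memremap_pages() are the real API, all the my_*
names and the range setup are made up/elided:

static int my_register_vram(struct my_gpu *gpu)
{
        struct dev_pagemap *vis = &gpu->vram_visible_pagemap;
        struct dev_pagemap *hid = &gpu->vram_hidden_pagemap;

        vis->type = MEMORY_DEVICE_PRIVATE;
        vis->owner = &my_pgmap_owner;
        vis->ops = &my_pagemap_ops;
        vis->nr_range = 1;
        /* vis->range.start/end = the bus-visible chunk ... */

        *hid = *vis;                    /* same owner, same ops */
        /* hid->range.start/end = the non-visible chunk ... */

        if (IS_ERR(memremap_pages(vis, gpu->numa_node)))
                return -ENOMEM;
        if (IS_ERR(memremap_pages(hid, gpu->numa_node)))
                return -ENOMEM;
        return 0;
}
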
But speccing all this out upfront doesn't seem like a good idea to me,
because I honestly don't know yet what we all need.
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch