From: David Hildenbrand <david@redhat.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
nouveau@lists.freedesktop.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>, "Alex Shi" <alexs@kernel.org>,
"Yanteng Si" <si.yanteng@linux.dev>,
"Karol Herbst" <kherbst@redhat.com>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Jann Horn" <jannh@google.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Peter Xu" <peterx@redhat.com>,
"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
Date: Thu, 30 Jan 2025 10:37:06 +0100
Message-ID: <cfc4f8ac-80c4-472f-85fc-36ffcd212441@redhat.com>
In-Reply-To: <7vejbjs7btkof4iguvn3nqvozxqpnzbymxbumd7pant4zi4ac4@3ozuzfzsm5tp>

On 30.01.25 01:27, Alistair Popple wrote:
> On Wed, Jan 29, 2025 at 12:58:02PM +0100, David Hildenbrand wrote:
>> Let's document how this function is to be used, and why the requirement
>> for the folio lock might maybe be dropped in the future.
>
> Sorry, only just catching up on your other thread. The folio lock was to ensure
> the GPU got a chance to make forward progress by mapping the page. Without it
> the CPU could immediately invalidate the entry before the GPU had a chance to
> retry the fault.
>
> Obviously performance wise having such thrashing is terrible, so should
> really be avoided by userspace, but the lock at least allowed such programs
> to complete.

Thanks for the clarification. So it's relevant that the MMU notifier in
remove_device_exclusive_entry() is sent after taking the folio lock.

However, as soon as we drop the folio lock,
remove_device_exclusive_entry() will become active, lock the folio, and
trigger the MMU notifier.

So the time it is actually mapped into the device is rather

>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> mm/memory.c | 25 +++++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 46956994aaff..caaae8df11a9 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -718,6 +718,31 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
>> }
>> #endif
>>
>> +/**
>> + * restore_exclusive_pte - Restore a device-exclusive entry
>> + * @vma: VMA covering @address
>> + * @folio: the mapped folio
>> + * @page: the mapped folio page
>> + * @address: the virtual address
>> + * @ptep: PTE pointer into the locked page table mapping the folio page
>> + * @orig_pte: PTE value at @ptep
>> + *
>> + * Restore a device-exclusive non-swap entry to an ordinary present PTE.
>> + *
>> + * The folio and the page table must be locked, and MMU notifiers must have
>> + * been called to invalidate any (exclusive) device mappings. In case of
>> + * fork(), MMU_NOTIFY_PROTECTION_PAGE is triggered, and in case of a page
>> + * fault MMU_NOTIFY_EXCLUSIVE is triggered.
>> + *
>> + * Locking the folio makes sure that anybody who just converted the PTE to
>> + * a device-private entry can map it into the device, before unlocking it; so
>> + * the folio lock prevents concurrent conversion to device-exclusive.
>
> I don't quite follow this - a concurrent conversion would already fail
> because the GUP in make_device_exclusive_range() would most likely cause
> an unexpected reference during the migration. And if a migration entry
> has already been installed for the device private PTE conversion then
> make_device_exclusive_range() will skip it as a non-present entry anyway.

Sorry, I meant "device-exclusive", so migration is not a concern.

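To make the forward-progress argument concrete, here is a minimal, deterministic userspace model. It is not kernel code, and every name in it is hypothetical. It assumes the folio lock fully serializes the device path (convert to device-exclusive, then map into the device) against the CPU-fault path (MMU notifier, then restore_exclusive_pte()), and it injects a single CPU fault at every possible point between the device's folio-lock critical sections:

```c
#include <stdbool.h>

/* Hypothetical model state; these are not kernel structures. */
struct model {
	bool pte_exclusive;   /* CPU PTE holds a device-exclusive entry */
	bool device_mapped;   /* device currently has the page mapped */
	bool device_progress; /* device mapping was installed at least once */
};

/* CPU fault path, run with the folio lock held: notify the device
 * (drop its mapping), then restore an ordinary present PTE. */
static void cpu_fault(struct model *m)
{
	if (!m->pte_exclusive)
		return;               /* already an ordinary PTE */
	m->device_mapped = false;     /* MMU notifier invalidates the device */
	m->pte_exclusive = false;     /* restore_exclusive_pte() equivalent */
}

/*
 * One device fault. Each block between injection points is one folio-lock
 * critical section; a single CPU fault is injected after
 * `cpu_fault_after` sections. Returns whether the device got to map the page.
 */
static bool device_fault(bool hold_lock_while_mapping, int cpu_fault_after)
{
	struct model m = { 0 };
	int section = 0;

	if (cpu_fault_after == section++)
		cpu_fault(&m);

	if (hold_lock_while_mapping) {
		/* Convert and map before the folio lock is dropped. */
		m.pte_exclusive = true;
		m.device_mapped = true;
		m.device_progress = true;
	} else {
		/* Convert, then drop the folio lock before mapping. */
		m.pte_exclusive = true;
		if (cpu_fault_after == section++)
			cpu_fault(&m);
		/* Relock: map only if the entry is still device-exclusive;
		 * otherwise the device has to fault again. */
		if (m.pte_exclusive) {
			m.device_mapped = true;
			m.device_progress = true;
		}
	}

	if (cpu_fault_after == section)
		cpu_fault(&m);
	return m.device_progress;
}

/* Count injection points at which the device never gets its mapping. */
static int count_stalls(bool hold_lock_while_mapping)
{
	int sections = hold_lock_while_mapping ? 1 : 2;
	int stalls = 0;

	for (int p = 0; p <= sections; p++)
		if (!device_fault(hold_lock_while_mapping, p))
			stalls++;
	return stalls;
}
```

When the device maps the page before the folio lock is dropped, count_stalls(true) finds no injection point that leaves the device without a mapping. If the lock is dropped between conversion and mapping, a CPU fault in that window undoes the conversion, and a program that keeps faulting there can livelock.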
>
> However s/device-private/device-exclusive/ makes sense - the intent was to allow
> the device to map it before a call to restore_exclusive_pte() (ie. a CPU fault)
> could convert it back to a normal PTE.
>
>> + * TODO: the folio lock does not protect against all cases of concurrent
>> + * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers
>> + * must already use MMU notifiers to sync against any concurrent changes
>
> Right. It's expected drivers are using MMU notifiers to keep page tables in
> sync, same as for hmm_range_fault().

Let me try to rephrase it, given that the folio lock is purely there to
guarantee forward progress, not correctness; correctness is what MMU
notifiers must be used for.

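For the correctness side, I'd expect drivers to follow the usual mmu_interval_read_begin()/mmu_interval_read_retry() loop documented for hmm_range_fault(). Below is a simplified, single-threaded userspace model of that loop. It assumes a single sequence counter that every invalidation bumps, and the model_* names are hypothetical stand-ins. It also omits the driver lock that real code takes around the retry check and the device page-table update:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct mmu_interval_notifier: only the
 * invalidation sequence number is modeled. */
struct model_notifier {
	unsigned long invalidate_seq;
};

/* Hypothetical single device page-table entry. */
struct model_device_pte {
	bool valid;
	unsigned long pfn;
};

/* Plays the role of mmu_interval_read_begin(): sample the sequence. */
static unsigned long model_read_begin(const struct model_notifier *n)
{
	return n->invalidate_seq;
}

/* Plays the role of mmu_interval_read_retry(): did an invalidation run? */
static bool model_read_retry(const struct model_notifier *n, unsigned long seq)
{
	return n->invalidate_seq != seq;
}

/* Invalidation callback (e.g. for MADV_DONTNEED or mprotect()): bump the
 * sequence and tear down the device mapping. */
static void model_invalidate(struct model_notifier *n,
			     struct model_device_pte *dpte)
{
	n->invalidate_seq++;
	dpte->valid = false;
}

/*
 * Driver fault path. `races` invalidations are injected, one per attempt,
 * between sampling the sequence and the retry check. Returns the number
 * of attempts needed before the device PTE could be committed.
 */
static int model_device_fault(struct model_notifier *n,
			      struct model_device_pte *dpte,
			      unsigned long pfn, int races)
{
	int attempts = 0;
	unsigned long seq;

	do {
		attempts++;
		seq = model_read_begin(n);
		/* Look up (or make exclusive) the CPU page here. */
		if (races > 0) {
			races--;
			model_invalidate(n, dpte);
		}
		/* Real code takes the driver lock before this check and
		 * holds it across the device page-table update below. */
	} while (model_read_retry(n, seq));

	dpte->valid = true;
	dpte->pfn = pfn;
	return attempts;
}
```

Every invalidation that races with the fault forces another attempt. The device PTE is only committed once no invalidation has run since the sequence was sampled, independent of whether the folio lock was held.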
--
Cheers,
David / dhildenb