linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: "David Hildenbrand" <david@redhat.com>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	nouveau@lists.freedesktop.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>, "Alex Shi" <alexs@kernel.org>,
	"Yanteng Si" <si.yanteng@linux.dev>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Jann Horn" <jannh@google.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [PATCH v1 4/4] mm/memory: document restore_exclusive_pte()
Date: Fri, 31 Jan 2025 11:20:33 +1100	[thread overview]
Message-ID: <6moaipt4rmc62ijy2rtjbmzb5phgjpygxgqeic3bljydlwhxls@qqzuqbvs5gnh> (raw)
In-Reply-To: <Z5tXzV0vcKJg_wuM@phenom.ffwll.local>

On Thu, Jan 30, 2025 at 11:43:25AM +0100, Simona Vetter wrote:
> On Thu, Jan 30, 2025 at 11:27:37AM +1100, Alistair Popple wrote:
> > On Wed, Jan 29, 2025 at 12:58:02PM +0100, David Hildenbrand wrote:
> > > Let's document how this function is to be used, and why the requirement
> > > for the folio lock might maybe be dropped in the future.
> > 
> > Sorry, only just catching up on your other thread. The folio lock was to ensure
> > the GPU got a chance to make forward progress by mapping the page. Without it
> > the CPU could immediately invalidate the entry before the GPU had a chance to
> > retry the fault.
> > 
> > Obviously performance wise having such thrashing is terrible, so should
> > really be avoided by userspace, but the lock at least allowed such programs
> > to complete.
> 
> Imo this is not a legit use-case. If userspace concurrently (instead of
> clearly alternating) uses the same 4k page for gpu atomics and on the cpu,
> it just gets to keep the fallout.
> 
> Plus there's no guarantee that we hold the folio_lock long enough for the
> gpu to actually complete the atomic, so this isn't even really helping
> with forward progress even if this somehow would be a legit usecase.

Yes, agree it's not a legit real world use case. In practice though it was
useful for testing this and other driver code by thrashing to generate a lot
device/cpu faults and invalidations. Obviously "just for testing" is not a great
justification though, so if it's causing problems we could get rid of it.

> But this is also why thp is potentially an issue, because if thp
> constantly creates pmd entries that potentially causes false sharing and
> we do have thrashing that shouldn't happen.

Yeah, I don't we should extend this to thps.

 - Alistair

> -Sima
> 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > >  mm/memory.c | 25 +++++++++++++++++++++++++
> > >  1 file changed, 25 insertions(+)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 46956994aaff..caaae8df11a9 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -718,6 +718,31 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
> > >  }
> > >  #endif
> > >  
> > > +/**
> > > + * restore_exclusive_pte - Restore a device-exclusive entry
> > > + * @vma: VMA covering @address
> > > + * @folio: the mapped folio
> > > + * @page: the mapped folio page
> > > + * @address: the virtual address
> > > + * @ptep: PTE pointer into the locked page table mapping the folio page
> > > + * @orig_pte: PTE value at @ptep
> > > + *
> > > + * Restore a device-exclusive non-swap entry to an ordinary present PTE.
> > > + *
> > > + * The folio and the page table must be locked, and MMU notifiers must have
> > > + * been called to invalidate any (exclusive) device mappings. In case of
> > > + * fork(), MMU_NOTIFY_PROTECTION_PAGE is triggered, and in case of a page
> > > + * fault MMU_NOTIFY_EXCLUSIVE is triggered.
> > > + *
> > > + * Locking the folio makes sure that anybody who just converted the PTE to
> > > + * a device-private entry can map it into the device, before unlocking it; so
> > > + * the folio lock prevents concurrent conversion to device-exclusive.
> > 
> > I don't quite follow this - a concurrent conversion would already fail
> > because the GUP in make_device_exclusive_range() would most likely cause
> > an unexpected reference during the migration. And if a migration entry
> > has already been installed for the device private PTE conversion then
> > make_device_exclusive_range() will skip it as a non-present entry anyway.
> > 
> > However s/device-private/device-exclusive/ makes sense - the intent was to allow
> > the device to map it before a call to restore_exclusive_pte() (ie. a CPU fault)
> > could convert it back to a normal PTE.
> > 
> > > + * TODO: the folio lock does not protect against all cases of concurrent
> > > + * page table modifications (e.g., MADV_DONTNEED, mprotect), so device drivers
> > > + * must already use MMU notifiers to sync against any concurrent changes
> > 
> > Right. It's expected drivers are using MMU notifiers to keep page tables in
> > sync, same as for hmm_range_fault().
> > 
> > > + * Maybe the requirement for the folio lock can be dropped in the future.
> > > + */
> > >  static void restore_exclusive_pte(struct vm_area_struct *vma,
> > >  		struct folio *folio, struct page *page, unsigned long address,
> > >  		pte_t *ptep, pte_t orig_pte)
> > > -- 
> > > 2.48.1
> > > 
> 
> -- 
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


  reply	other threads:[~2025-01-31  0:20 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-29 11:57 [PATCH v1 0/4] mm: cleanups for device-exclusive entries (hmm) David Hildenbrand
2025-01-29 11:57 ` [PATCH v1 1/4] lib/test_hmm: make dmirror_atomic_map() consume a single page David Hildenbrand
2025-01-30  0:29   ` Alistair Popple
2025-01-29 11:58 ` [PATCH v1 2/4] mm/mmu_notifier: drop owner from MMU_NOTIFY_EXCLUSIVE David Hildenbrand
2025-01-30  5:34   ` Alistair Popple
2025-01-30  9:28     ` David Hildenbrand
2025-01-30 13:29       ` Simona Vetter
2025-01-30 15:26         ` David Hildenbrand
2025-01-29 11:58 ` [PATCH v1 3/4] mm/memory: pass folio and pte to restore_exclusive_pte() David Hildenbrand
2025-01-30  5:37   ` Alistair Popple
2025-01-29 11:58 ` [PATCH v1 4/4] mm/memory: document restore_exclusive_pte() David Hildenbrand
2025-01-30  0:27   ` Alistair Popple
2025-01-30  9:37     ` David Hildenbrand
2025-01-30 13:31       ` Simona Vetter
2025-01-30 15:29         ` David Hildenbrand
2025-01-31  0:14           ` Alistair Popple
2025-01-31 17:20             ` Simona Vetter
2025-01-30 10:43     ` Simona Vetter
2025-01-31  0:20       ` Alistair Popple [this message]
2025-01-31  9:15         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6moaipt4rmc62ijy2rtjbmzb5phgjpygxgqeic3bljydlwhxls@qqzuqbvs5gnh \
    --to=apopple@nvidia.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexs@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dakr@kernel.org \
    --cc=david@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=kherbst@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=si.yanteng@linux.dev \
    --cc=simona@ffwll.ch \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox