linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Simona Vetter <simona.vetter@ffwll.ch>
To: David Hildenbrand <david@redhat.com>
Cc: "Alistair Popple" <apopple@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	nouveau@lists.freedesktop.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>, "Alex Shi" <alexs@kernel.org>,
	"Yanteng Si" <si.yanteng@linux.dev>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Jann Horn" <jannh@google.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>
Subject: Re: [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk
Date: Thu, 30 Jan 2025 14:00:19 +0100	[thread overview]
Message-ID: <Z5t34-0K9FJKVQe6@phenom.ffwll.local> (raw)
In-Reply-To: <54a55ff7-38c8-42c2-886f-d6d1985072a9@redhat.com>

On Thu, Jan 30, 2025 at 10:47:29AM +0100, David Hildenbrand wrote:
> On 30.01.25 10:40, Simona Vetter wrote:
> > On Thu, Jan 30, 2025 at 05:11:49PM +1100, Alistair Popple wrote:
> > > On Wed, Jan 29, 2025 at 12:54:02PM +0100, David Hildenbrand wrote:
> > > > We require a writable PTE and only support anonymous folio: we can only
> > > > have exactly one PTE pointing at that page, which we can just lookup
> > > > using a folio walk, avoiding the rmap walk and the anon VMA lock.
> > > > 
> > > > So let's stop doing an rmap walk and perform a folio walk instead, so we
> > > > can easily just modify a single PTE and avoid relying on rmap/mapcounts.
> > > > 
> > > > We now effectively work on a single PTE instead of multiple PTEs of
> > > > a large folio, allowing for conversion of individual PTEs from
> > > > non-exclusive to device-exclusive -- note that the other way always
> > > > worked on single PTEs.
> > > > 
> > > > We can drop the MMU_NOTIFY_EXCLUSIVE MMU notifier call and document why
> > > > that is not required: GUP will already take care of the
> > > > MMU_NOTIFY_EXCLUSIVE call if required (there is already a device-exclusive
> > > > entry) when not finding a present PTE and having to trigger a fault and
> > > > ending up in remove_device_exclusive_entry().
> > > 
> > > I will have to look at this a bit more closely tomorrow but this doesn't seem
> > > right to me. We may be transitioning from a present PTE (ie. a writable
> > > anonymous mapping) to a non-present PTE (ie. a device-exclusive entry) and
> > > therefore any secondary processors (eg. other GPUs, iommus, etc.) will need to
> > > update their copies of the PTE. So I think the notifier call is needed.
> > 
> > I guess this is a question of semantics we want, for multiple gpus do we
> > require that device-exclusive also excludes other gpus or not. I'm leaning
> > towards agreeing with you here.
> 
> See my reply, it's also relevant for non-device, such as KVM. So it's the
> right thing to do.

Yeah sounds good.

> > > > Note that the PTE is
> > > > always writable, and we can always create a writable-device-exclusive
> > > > entry.
> > > > 
> > > > With this change, device-exclusive is fully compatible with THPs /
> > > > large folios. We still require PMD-sized THPs to get PTE-mapped, and
> > > > supporting PMD-mapped THP (without the PTE-remapping) is a different
> > > > endeavour that might not be worth it at this point.
> > 
> > I'm not sure we actually want hugepages for device exclusive, since it has
> > an impact on what's allowed and what not. If we only ever do 4k entries
> > then userspace can assume that as long atomics are separated by a 4k page
> > there's no issue when both the gpu and cpu hammer on them. If we try to
> > keep thp entries then suddenly a workload that worked before will result
> > in endless ping-pong between gpu and cpu because the separate atomic
> > counters (or whatever) now all sit in the same 2m page.
> 
> Agreed. And the conversion + mapping into the device gets trickier.
> 
> > 
> > So going with thp might result in userspace having to spread out atomics
> > even more, which is just wasting memory and not saving any tlb entries
> > since often you don't need that many.
> > 
> > tldr; I think not supporting thp entries for device exclusive is a
> > feature, not a bug.
> 
> So, you agree with my "different endeavour that might not be worth it"
> statement?

Yes.

Well I think we should go further and clearly document that we
intentionally return split pages. Because it's part of the uapi contract
with users of all this.

And if someone needs pmd entries for performance or whatever, we need two
things:

a) userspace must mmap that memory as hugepage memory, to clearly signal
the promise that atomics are split up on hugepage sizes and not just page
size

b) we need to extend make_device_exclusive and drivers to handle the
hugetlb folio case

I think thp is simply not going to work here, it's impossible (without
potentially causing fault storms) to figure out what userspace might want.

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


  reply	other threads:[~2025-01-30 13:00 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-29 11:53 [PATCH v1 00/12] mm: fixes for device-exclusive entries (hmm) David Hildenbrand
2025-01-29 11:53 ` [PATCH v1 01/12] mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs David Hildenbrand
2025-01-29 21:42   ` John Hubbard
2025-01-30  8:56     ` David Hildenbrand
2025-01-30  5:46   ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 02/12] mm/rmap: reject hugetlb folios in folio_make_device_exclusive() David Hildenbrand
2025-01-30  5:47   ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 03/12] mm/rmap: convert make_device_exclusive_range() to make_device_exclusive() David Hildenbrand
2025-01-30  5:57   ` Alistair Popple
2025-01-30  9:04     ` David Hildenbrand
2025-01-31  0:28     ` Alistair Popple
2025-01-31  9:29       ` David Hildenbrand
2025-01-30 13:46   ` Simona Vetter
2025-01-30 15:56     ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 04/12] mm/rmap: implement make_device_exclusive() using folio_walk instead of rmap walk David Hildenbrand
2025-01-30  6:11   ` Alistair Popple
2025-01-30  9:01     ` David Hildenbrand
2025-01-30  9:12       ` David Hildenbrand
2025-01-30  9:24       ` David Hildenbrand
2025-01-30 22:31         ` Alistair Popple
2025-02-04 10:56           ` David Hildenbrand
2025-01-30  9:40     ` Simona Vetter
2025-01-30  9:47       ` David Hildenbrand
2025-01-30 13:00         ` Simona Vetter [this message]
2025-01-30 15:59           ` David Hildenbrand
2025-01-31 17:00             ` Simona Vetter
2025-01-29 11:54 ` [PATCH v1 05/12] mm/memory: detect writability in restore_exclusive_pte() through can_change_pte_writable() David Hildenbrand
2025-01-30  9:51   ` Simona Vetter
2025-01-30  9:58     ` David Hildenbrand
2025-01-30 13:03       ` Simona Vetter
2025-01-30 23:06         ` Alistair Popple
2025-01-31 10:55           ` David Hildenbrand
2025-01-31 17:05             ` Simona Vetter
2025-02-04 10:58               ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 06/12] mm: use single SWP_DEVICE_EXCLUSIVE entry type David Hildenbrand
2025-01-30 13:43   ` Simona Vetter
2025-01-30 23:28   ` Alistair Popple
2025-01-29 11:54 ` [PATCH v1 07/12] mm/page_vma_mapped: device-private entries are not migration entries David Hildenbrand
2025-01-30 23:36   ` Alistair Popple
2025-01-31 11:06     ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 08/12] mm/rmap: handle device-exclusive entries correctly in try_to_unmap_one() David Hildenbrand
2025-01-30 10:10   ` Simona Vetter
2025-01-30 11:08     ` David Hildenbrand
2025-01-30 13:06       ` Simona Vetter
2025-01-30 14:08         ` Jason Gunthorpe
2025-01-30 16:10           ` Simona Vetter
2025-01-30 15:52         ` David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 09/12] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 10/12] mm/rmap: handle device-exclusive entries correctly in folio_referenced_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 11/12] mm/rmap: handle device-exclusive entries correctly in page_vma_mkclean_one() David Hildenbrand
2025-01-29 11:54 ` [PATCH v1 12/12] mm/rmap: keep mapcount untouched for device-exclusive entries David Hildenbrand
2025-01-30 10:37   ` Simona Vetter
2025-01-30 11:42     ` David Hildenbrand
2025-01-30 13:19       ` Simona Vetter
2025-01-30 15:43         ` David Hildenbrand
2025-01-31 17:13           ` Simona Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5t34-0K9FJKVQe6@phenom.ffwll.local \
    --to=simona.vetter@ffwll.ch \
    --cc=Liam.Howlett@oracle.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexs@kernel.org \
    --cc=apopple@nvidia.com \
    --cc=corbet@lwn.net \
    --cc=dakr@kernel.org \
    --cc=david@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=kherbst@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=si.yanteng@linux.dev \
    --cc=simona@ffwll.ch \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox