* [PATCH v5 0/5] Enable THP support in drm_pagemap
@ 2026-01-14 19:19 Francois Dugast
2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast
` (4 more replies)
0 siblings, 5 replies; 33+ messages in thread
From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, Francois Dugast, Zi Yan, Madhavan Srinivasan,
Alistair Popple, Lorenzo Stoakes, Liam R . Howlett,
Suren Baghdasaryan, Michal Hocko, Mike Rapoport, Vlastimil Babka,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Felix Kuehling, Alex Deucher, Christian König, David Airlie,
Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Lyude Paul, Danilo Krummrich, Bjorn Helgaas,
Logan Gunthorpe, David Hildenbrand, Oscar Salvador,
Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Balbir Singh,
Dan Williams, Matthew Wilcox, Jan Kara, Alexander Viro,
Christian Brauner, linuxppc-dev, kvm, linux-kernel, amd-gfx,
nouveau, linux-pci, linux-mm, linux-cxl, nvdimm, linux-fsdevel
Use Balbir Singh's series for device-private THP support [1] and previous
preparation work in drm_pagemap [2] to add 2MB/THP support in xe. This leads
to significant performance improvements when using SVM with 2MB pages.
[1] https://lore.kernel.org/linux-mm/20251001065707.920170-1-balbirs@nvidia.com/
[2] https://patchwork.freedesktop.org/series/151754/
v2:
- rebase on top of multi-device SVM
- add drm_pagemap_cpages() with temporary patch
- address other feedback from Matt Brost on v1
v3:
The major change is to remove the dependency on the mm/huge_memory
helper migrate_device_split_page(), which was called explicitly when
a 2M buddy allocation backed by a large folio would later be reused
for a smaller allocation (4K or 64K). Instead, the first 3 patches
provided by Matthew Brost ensure large folios are split at the time
of freeing.
v4:
- add order argument to folio_free callback
- send the complete series to linux-mm and MM folks as requested (Zi Yan
and Andrew Morton) and the cover letter to anyone receiving at least
one of the patches (Liam R. Howlett)
v5:
- update zone_device_page_init() in patch #1 to reinitialize large
zone device private folios
Cc: Zi Yan <ziy@nvidia.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-pci@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-cxl@vger.kernel.org
Cc: nvdimm@lists.linux.dev
Cc: linux-fsdevel@vger.kernel.org
Francois Dugast (3):
drm/pagemap: Unlock and put folios when possible
drm/pagemap: Add helper to access zone_device_data
drm/pagemap: Enable THP support for GPU memory migration
Matthew Brost (2):
mm/zone_device: Reinitialize large zone device private folios
drm/pagemap: Correct cpages calculation for migrate_vma_setup
arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
drivers/gpu/drm/drm_gpusvm.c | 7 +-
drivers/gpu/drm/drm_pagemap.c | 158 ++++++++++++++++++-----
drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
include/drm/drm_pagemap.h | 15 +++
include/linux/memremap.h | 9 +-
lib/test_hmm.c | 4 +-
mm/memremap.c | 20 ++-
9 files changed, 180 insertions(+), 39 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 33+ messages in thread* [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 [PATCH v5 0/5] Enable THP support in drm_pagemap Francois Dugast @ 2026-01-14 19:19 ` Francois Dugast 2026-01-14 21:48 ` Andrew Morton ` (4 more replies) 2026-01-14 19:19 ` [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible Francois Dugast ` (3 subsequent siblings) 4 siblings, 5 replies; 33+ messages in thread From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw) To: intel-xe Cc: dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl, Francois Dugast From: Matthew Brost <matthew.brost@intel.com> Reinitialize metadata for large zone device private folios in zone_device_page_init prior to creating a higher-order zone device private folio. This step is necessary when the folio’s order changes dynamically between zone_device_page_init calls to avoid building a corrupt folio. As part of the metadata reinitialization, the dev_pagemap must be passed in from the caller because the pgmap stored in the folio page may have been overwritten with a compound head. Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: adhavan Srinivasan <maddy@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: linux-mm@kvack.org Cc: linux-cxl@vger.kernel.org Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> --- arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- drivers/gpu/drm/drm_pagemap.c | 2 +- drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- include/linux/memremap.h | 9 ++++++--- lib/test_hmm.c | 4 +++- mm/memremap.c | 20 +++++++++++++++++++- 7 files changed, 32 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index e5000bef90f2..7cf9310de0ec 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) dpage = pfn_to_page(uvmem_pfn); dpage->zone_device_data = pvt; - zone_device_page_init(dpage, 0); + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); return dpage; out_clear: spin_lock(&kvmppc_uvmem_bitmap_lock); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index af53e796ea1b..6ada7b4af7c6 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) page = pfn_to_page(pfn); svm_range_bo_ref(prange->svm_bo); page->zone_device_data = prange->svm_bo; - zone_device_page_init(page, 0); + zone_device_page_init(page, page_pgmap(page), 0); } static void diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c index 03ee39a761a4..c497726b0147 100644 --- a/drivers/gpu/drm/drm_pagemap.c +++ b/drivers/gpu/drm/drm_pagemap.c @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, struct drm_pagemap_zdd *zdd) { page->zone_device_data = drm_pagemap_zdd_get(zdd); - zone_device_page_init(page, 0); + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); } /** diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 58071652679d..3d8031296eed 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) order = ilog2(DMEM_CHUNK_NPAGES); } - zone_device_folio_init(folio, order); + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); return page; } diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 713ec0435b48..e3c2ccf872a8 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) } #ifdef CONFIG_ZONE_DEVICE -void zone_device_page_init(struct page *page, unsigned int order); +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, + unsigned int order); void *memremap_pages(struct dev_pagemap *pgmap, int nid); void memunmap_pages(struct dev_pagemap *pgmap); void *devm_memremap_pages(struct device *dev, 
struct dev_pagemap *pgmap); @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); unsigned long memremap_compat_align(void); -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) +static inline void zone_device_folio_init(struct folio *folio, + struct dev_pagemap *pgmap, + unsigned int order) { - zone_device_page_init(&folio->page, order); + zone_device_page_init(&folio->page, pgmap, order); if (order) folio_set_large_rmappable(folio); } diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 8af169d3873a..455a6862ae50 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, goto error; } - zone_device_folio_init(page_folio(dpage), order); + zone_device_folio_init(page_folio(dpage), + page_pgmap(folio_page(page_folio(dpage), 0)), + order); dpage->zone_device_data = rpage; return dpage; diff --git a/mm/memremap.c b/mm/memremap.c index 63c6ab4fdf08..6f46ab14662b 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) } } -void zone_device_page_init(struct page *page, unsigned int order) +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, + unsigned int order) { + struct page *new_page = page; + unsigned int i; + VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); + for (i = 0; i < (1UL << order); ++i, ++new_page) { + struct folio *new_folio = (struct folio *)new_page; + + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ +#ifdef NR_PAGES_IN_LARGE_FOLIO + ((struct folio *)(new_page - 1))->_nr_pages = 0; +#endif + new_folio->mapping = NULL; + new_folio->pgmap = pgmap; /* Also clear compound head */ + new_folio->share = 0; /* fsdax only, unused for device private */ + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); + } + /* * Drivers shouldn't be allocating pages after calling * memunmap_pages(). -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
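For reference, a minimal caller-side sketch of the interface after this patch: a driver allocating a large device-private folio and passing its own dev_pagemap explicitly. my_alloc_device_pfn() and my_devmem_pgmap are hypothetical driver-side names; only the zone_device_folio_init() signature is taken from the diff above.

#include <linux/memremap.h>
#include <linux/mm.h>

/* Hypothetical driver helper: returns the first PFN of a free device run. */
unsigned long my_alloc_device_pfn(struct dev_pagemap *my_devmem_pgmap,
				  unsigned int order);

static struct folio *my_alloc_device_folio(struct dev_pagemap *my_devmem_pgmap,
					   unsigned int order)
{
	unsigned long pfn = my_alloc_device_pfn(my_devmem_pgmap, order);
	/*
	 * Treat the first page of the run as the new head directly rather
	 * than going through page_folio(): a former tail page may still
	 * carry a stale compound_head from an earlier, larger folio (see
	 * the discussion later in this thread).
	 */
	struct folio *folio = (struct folio *)pfn_to_page(pfn);

	/*
	 * Pass the pgmap explicitly: the copy stored in the page may have
	 * been overwritten by that stale compound_head.
	 */
	zone_device_folio_init(folio, my_devmem_pgmap, order);
	return folio;
}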
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast @ 2026-01-14 21:48 ` Andrew Morton 2026-01-14 23:34 ` Matthew Brost 2026-01-15 2:36 ` Balbir Singh ` (3 subsequent siblings) 4 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2026-01-14 21:48 UTC (permalink / raw) To: Francois Dugast Cc: intel-xe, dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head. Thanks. What are the worst-case userspace-visible effects of the bug? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 21:48 ` Andrew Morton @ 2026-01-14 23:34 ` Matthew Brost 2026-01-14 23:51 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-14 23:34 UTC (permalink / raw) To: Andrew Morton Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 01:48:25PM -0800, Andrew Morton wrote: > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > Reinitialize metadata for large zone device private folios in > > zone_device_page_init prior to creating a higher-order zone device > > private folio. This step is necessary when the folio’s order changes > > dynamically between zone_device_page_init calls to avoid building a > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > must be passed in from the caller because the pgmap stored in the folio > > page may have been overwritten with a compound head. > > Thanks. What are the worst-case userspace-visible effects of the bug? If you reallocate a subset of pages from what was originally a large device folio, the pgmap mapping becomes invalid because it was overwritten by the compound head, and this can crash the kernel. Alternatively, consider the case where the original folio had an order of 9 and _nr_pages was set. If you then reallocate the folio plus one as an individual page, the flags would still have PG_locked set, causing a hang the next time you try to lock the page. This is pretty bad if drivers implement a buddy allocator for device pages (Xe does; Nouveau doesn’t, which is why they haven’t hit this issue). Only Nouveau enables large device pages in 6.19 but probably best to have kernel flying around with known issues. Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 23:34 ` Matthew Brost @ 2026-01-14 23:51 ` Matthew Brost 2026-01-15 2:40 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-14 23:51 UTC (permalink / raw) To: Andrew Morton Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 03:34:21PM -0800, Matthew Brost wrote: > On Wed, Jan 14, 2026 at 01:48:25PM -0800, Andrew Morton wrote: > > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > > > Reinitialize metadata for large zone device private folios in > > > zone_device_page_init prior to creating a higher-order zone device > > > private folio. This step is necessary when the folio’s order changes > > > dynamically between zone_device_page_init calls to avoid building a > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > must be passed in from the caller because the pgmap stored in the folio > > > page may have been overwritten with a compound head. > > > > Thanks. What are the worst-case userspace-visible effects of the bug? > > If you reallocate a subset of pages from what was originally a large > device folio, the pgmap mapping becomes invalid because it was > overwritten by the compound head, and this can crash the kernel. > > Alternatively, consider the case where the original folio had an order > of 9 and _nr_pages was set. If you then reallocate the folio plus one as s/_nr_pages/the order was encoded the page flags. Not clearing _nr_pages is probably bad too, not sure what the side affect of that is, but it can't be good. > an individual page, the flags would still have PG_locked set, causing a > hang the next time you try to lock the page. > > This is pretty bad if drivers implement a buddy allocator for device > pages (Xe does; Nouveau doesn’t, which is why they haven’t hit this > issue). Only Nouveau enables large device pages in 6.19 but probably > best to have kernel flying around with known issues. s/best to have kernel/best to not have kernels Matt > > Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
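A sketch of the reuse sequence described above, written against the pre-patch interface (no pgmap argument). my_buddy_alloc() and my_buddy_free() stand in for a driver-private buddy allocator such as the one Xe uses; they are illustrative only, not part of the patch.

#include <linux/memremap.h>
#include <linux/mm.h>

/* Hypothetical driver-side buddy allocator. */
struct page *my_buddy_alloc(struct dev_pagemap *pgmap, unsigned int order);
void my_buddy_free(struct page *page, unsigned int order);

static void example_reuse_after_large_folio(struct dev_pagemap *pgmap)
{
	/* 1) A 2MB device range is handed out and built as an order-9 folio. */
	struct page *head = my_buddy_alloc(pgmap, 9);

	zone_device_folio_init(page_folio(head), 9);
	/*
	 * Tail pages now hold a compound_head pointer in the word that
	 * otherwise stores ->pgmap, and the folio order is encoded in the
	 * page flags.
	 */

	/*
	 * 2) The folio's last reference is dropped and the driver returns
	 * the 2MB block to its buddy allocator, which may split it.
	 */
	my_buddy_free(head, 9);

	/* 3) A later 4K allocation reuses a former tail page as a new head. */
	struct page *page = my_buddy_alloc(pgmap, 0);	/* e.g. head + 17 */

	zone_device_page_init(page, 0);	/* pre-patch: stale metadata survives */

	/*
	 * page_pgmap(page) now chases the stale compound_head instead of a
	 * dev_pagemap (possible kernel crash), and leftover flag bits such
	 * as PG_locked can hang the next attempt to lock the page.
	 */
}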
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 23:51 ` Matthew Brost @ 2026-01-15 2:40 ` Andrew Morton 2026-01-15 2:50 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2026-01-15 2:40 UTC (permalink / raw) To: Matthew Brost Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, 14 Jan 2026 15:51:16 -0800 Matthew Brost <matthew.brost@intel.com> wrote: > On Wed, Jan 14, 2026 at 03:34:21PM -0800, Matthew Brost wrote: > > On Wed, Jan 14, 2026 at 01:48:25PM -0800, Andrew Morton wrote: > > > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > > > > > Reinitialize metadata for large zone device private folios in > > > > zone_device_page_init prior to creating a higher-order zone device > > > > private folio. This step is necessary when the folio’s order changes > > > > dynamically between zone_device_page_init calls to avoid building a > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > must be passed in from the caller because the pgmap stored in the folio > > > > page may have been overwritten with a compound head. > > > > > > Thanks. What are the worst-case userspace-visible effects of the bug? > > > > If you reallocate a subset of pages from what was originally a large > > device folio, the pgmap mapping becomes invalid because it was > > overwritten by the compound head, and this can crash the kernel. > > > > Alternatively, consider the case where the original folio had an order > > of 9 and _nr_pages was set. If you then reallocate the folio plus one as > > s/_nr_pages/the order was encoded the page flags. > > ... > > s/best to have kernel/best to not have kernels > Great, thanks. I pasted all the above into the changelog to help explain our reasons. I'll retain the patch in mm-hotfixes, targeting 6.19-rcX. The remainder of the series is DRM stuff, NotMyProblem. I assume that getting this into 6.19-rcX is helpful to DRM - if not, and if taking this via the DRM tree is preferable then let's discuss. Can reviewers please take a look at this reasonably promptly? btw, this patch uses + struct folio *new_folio = (struct folio *)new_page; Was page_folio() unsuitable? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 2:40 ` Andrew Morton @ 2026-01-15 2:50 ` Matthew Brost 0 siblings, 0 replies; 33+ messages in thread From: Matthew Brost @ 2026-01-15 2:50 UTC (permalink / raw) To: Andrew Morton Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 06:40:42PM -0800, Andrew Morton wrote: > On Wed, 14 Jan 2026 15:51:16 -0800 Matthew Brost <matthew.brost@intel.com> wrote: > > > On Wed, Jan 14, 2026 at 03:34:21PM -0800, Matthew Brost wrote: > > > On Wed, Jan 14, 2026 at 01:48:25PM -0800, Andrew Morton wrote: > > > > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > > zone_device_page_init prior to creating a higher-order zone device > > > > > private folio. This step is necessary when the folio’s order changes > > > > > dynamically between zone_device_page_init calls to avoid building a > > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > > must be passed in from the caller because the pgmap stored in the folio > > > > > page may have been overwritten with a compound head. > > > > > > > > Thanks. What are the worst-case userspace-visible effects of the bug? > > > > > > If you reallocate a subset of pages from what was originally a large > > > device folio, the pgmap mapping becomes invalid because it was > > > overwritten by the compound head, and this can crash the kernel. > > > > > > Alternatively, consider the case where the original folio had an order > > > of 9 and _nr_pages was set. If you then reallocate the folio plus one as > > > > s/_nr_pages/the order was encoded the page flags. > > > > ... > > > > s/best to have kernel/best to not have kernels > > > > Great, thanks. I pasted all the above into the changelog to help > explain our reasons. I'll retain the patch in mm-hotfixes, targeting > 6.19-rcX. The remainder of the series is DRM stuff, NotMyProblem. I > assume that getting this into 6.19-rcX is helpful to DRM - if not, and > if taking this via the DRM tree is preferable then let's discuss. > I would prefer to take this through DRM since our window for 7.0 closes earlier than the rest of Linux (typically this Friday), which makes it easier for me to merge the other four patches and include them in the next PR. If we can't take it through DRM, I'm sure we can figure something out - new as a maintainer here, so still figuring out all DRM flows. > Can reviewers please take a look at this reasonably promptly? > > > btw, this patch uses > > + struct folio *new_folio = (struct folio *)new_page; > > Was page_folio() unsuitable? > The compound head might be pointing somewhere else here, and we are trying to clear the metadata from new_page up to order << 1. So we explictly do not want to use page_folio here. Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast 2026-01-14 21:48 ` Andrew Morton @ 2026-01-15 2:36 ` Balbir Singh 2026-01-15 2:41 ` Matthew Brost 2026-01-15 3:01 ` Andrew Morton ` (2 subsequent siblings) 4 siblings, 1 reply; 33+ messages in thread From: Balbir Singh @ 2026-01-15 2:36 UTC (permalink / raw) To: Francois Dugast, intel-xe Cc: dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 1/15/26 06:19, Francois Dugast wrote: > From: Matthew Brost <matthew.brost@intel.com> > > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head. > > Cc: Zi Yan <ziy@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: "Christian König" <christian.koenig@amd.com> > Cc: David Airlie <airlied@gmail.com> > Cc: Simona Vetter <simona@ffwll.ch> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Cc: Maxime Ripard <mripard@kernel.org> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Lyude Paul <lyude@redhat.com> > Cc: Danilo Krummrich <dakr@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Leon Romanovsky <leon@kernel.org> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: linuxppc-dev@lists.ozlabs.org > Cc: kvm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: amd-gfx@lists.freedesktop.org > Cc: dri-devel@lists.freedesktop.org > Cc: nouveau@lists.freedesktop.org > Cc: linux-mm@kvack.org > Cc: linux-cxl@vger.kernel.org > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > drivers/gpu/drm/drm_pagemap.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > include/linux/memremap.h | 9 ++++++--- > lib/test_hmm.c | 4 +++- > mm/memremap.c | 20 +++++++++++++++++++- > 7 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index e5000bef90f2..7cf9310de0ec 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > dpage = pfn_to_page(uvmem_pfn); > dpage->zone_device_data = pvt; > - zone_device_page_init(dpage, 0); > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > return dpage; > out_clear: > spin_lock(&kvmppc_uvmem_bitmap_lock); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index af53e796ea1b..6ada7b4af7c6 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > page = pfn_to_page(pfn); > svm_range_bo_ref(prange->svm_bo); > page->zone_device_data = prange->svm_bo; > - zone_device_page_init(page, 0); > + zone_device_page_init(page, page_pgmap(page), 0); > } > > static void > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index 03ee39a761a4..c497726b0147 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > struct drm_pagemap_zdd *zdd) > { > page->zone_device_data = drm_pagemap_zdd_get(zdd); > - zone_device_page_init(page, 0); > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > } > > /** > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 58071652679d..3d8031296eed 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > order = ilog2(DMEM_CHUNK_NPAGES); > } > > - zone_device_folio_init(folio, order); > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > return page; > } > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 713ec0435b48..e3c2ccf872a8 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > } > > #ifdef CONFIG_ZONE_DEVICE > -void zone_device_page_init(struct page *page, unsigned int order); > +void zone_device_page_init(struct page *page, struct dev_pagemap 
*pgmap, > + unsigned int order); > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > void memunmap_pages(struct dev_pagemap *pgmap); > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > unsigned long memremap_compat_align(void); > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > +static inline void zone_device_folio_init(struct folio *folio, > + struct dev_pagemap *pgmap, > + unsigned int order) > { > - zone_device_page_init(&folio->page, order); > + zone_device_page_init(&folio->page, pgmap, order); > if (order) > folio_set_large_rmappable(folio); > } > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > index 8af169d3873a..455a6862ae50 100644 > --- a/lib/test_hmm.c > +++ b/lib/test_hmm.c > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > goto error; > } > > - zone_device_folio_init(page_folio(dpage), order); > + zone_device_folio_init(page_folio(dpage), > + page_pgmap(folio_page(page_folio(dpage), 0)), > + order); > dpage->zone_device_data = rpage; > return dpage; > > diff --git a/mm/memremap.c b/mm/memremap.c > index 63c6ab4fdf08..6f46ab14662b 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > } > } > > -void zone_device_page_init(struct page *page, unsigned int order) > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order) > { > + struct page *new_page = page; > + unsigned int i; > + > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > + struct folio *new_folio = (struct folio *)new_page; > + > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > +#ifdef NR_PAGES_IN_LARGE_FOLIO > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > +#endif Not sure I follow the new_page - 1? What happens when order is 0? > + new_folio->mapping = NULL; > + new_folio->pgmap = pgmap; /* Also clear compound head */ > + new_folio->share = 0; /* fsdax only, unused for device private */ > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > + } > + > /* > * Drivers shouldn't be allocating pages after calling > * memunmap_pages(). I wish we did not have to pass in the pgmap, but I can see why we can't rely on the existing pgmap Balbir ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 2:36 ` Balbir Singh @ 2026-01-15 2:41 ` Matthew Brost 2026-01-15 7:13 ` Alistair Popple 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-15 2:41 UTC (permalink / raw) To: Balbir Singh Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 01:36:11PM +1100, Balbir Singh wrote: > On 1/15/26 06:19, Francois Dugast wrote: > > From: Matthew Brost <matthew.brost@intel.com> > > > > Reinitialize metadata for large zone device private folios in > > zone_device_page_init prior to creating a higher-order zone device > > private folio. This step is necessary when the folio’s order changes > > dynamically between zone_device_page_init calls to avoid building a > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > must be passed in from the caller because the pgmap stored in the folio > > page may have been overwritten with a compound head. > > > > Cc: Zi Yan <ziy@nvidia.com> > > Cc: Alistair Popple <apopple@nvidia.com> > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > Cc: Nicholas Piggin <npiggin@gmail.com> > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Cc: "Christian König" <christian.koenig@amd.com> > > Cc: David Airlie <airlied@gmail.com> > > Cc: Simona Vetter <simona@ffwll.ch> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > Cc: Maxime Ripard <mripard@kernel.org> > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > Cc: Lyude Paul <lyude@redhat.com> > > Cc: Danilo Krummrich <dakr@kernel.org> > > Cc: David Hildenbrand <david@kernel.org> > > Cc: Oscar Salvador <osalvador@suse.de> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Leon Romanovsky <leon@kernel.org> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Mike Rapoport <rppt@kernel.org> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Balbir Singh <balbirs@nvidia.com> > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: kvm@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Cc: amd-gfx@lists.freedesktop.org > > Cc: dri-devel@lists.freedesktop.org > > Cc: nouveau@lists.freedesktop.org > > Cc: linux-mm@kvack.org > > Cc: linux-cxl@vger.kernel.org > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > --- > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > include/linux/memremap.h | 9 ++++++--- > > lib/test_hmm.c | 4 +++- > > mm/memremap.c | 20 +++++++++++++++++++- > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > index e5000bef90f2..7cf9310de0ec 100644 > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > dpage = pfn_to_page(uvmem_pfn); > > dpage->zone_device_data = pvt; > > - zone_device_page_init(dpage, 0); > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > return dpage; > > out_clear: > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > index af53e796ea1b..6ada7b4af7c6 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > page = pfn_to_page(pfn); > > svm_range_bo_ref(prange->svm_bo); > > page->zone_device_data = prange->svm_bo; > > - zone_device_page_init(page, 0); > > + zone_device_page_init(page, page_pgmap(page), 0); > > } > > > > static void > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > index 03ee39a761a4..c497726b0147 100644 > > --- a/drivers/gpu/drm/drm_pagemap.c > > +++ b/drivers/gpu/drm/drm_pagemap.c > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > struct drm_pagemap_zdd *zdd) > > { > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > - zone_device_page_init(page, 0); > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > } > > > > /** > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > index 58071652679d..3d8031296eed 100644 > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > order = ilog2(DMEM_CHUNK_NPAGES); > > } > > > > - zone_device_folio_init(folio, order); > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > return page; > > } > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > index 713ec0435b48..e3c2ccf872a8 100644 > > --- a/include/linux/memremap.h > > +++ b/include/linux/memremap.h > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > } > > > 
> #ifdef CONFIG_ZONE_DEVICE > > -void zone_device_page_init(struct page *page, unsigned int order); > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > + unsigned int order); > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > void memunmap_pages(struct dev_pagemap *pgmap); > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > unsigned long memremap_compat_align(void); > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > +static inline void zone_device_folio_init(struct folio *folio, > > + struct dev_pagemap *pgmap, > > + unsigned int order) > > { > > - zone_device_page_init(&folio->page, order); > > + zone_device_page_init(&folio->page, pgmap, order); > > if (order) > > folio_set_large_rmappable(folio); > > } > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > index 8af169d3873a..455a6862ae50 100644 > > --- a/lib/test_hmm.c > > +++ b/lib/test_hmm.c > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > goto error; > > } > > > > - zone_device_folio_init(page_folio(dpage), order); > > + zone_device_folio_init(page_folio(dpage), > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > + order); > > dpage->zone_device_data = rpage; > > return dpage; > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > index 63c6ab4fdf08..6f46ab14662b 100644 > > --- a/mm/memremap.c > > +++ b/mm/memremap.c > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > } > > } > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > + unsigned int order) > > { > > + struct page *new_page = page; > > + unsigned int i; > > + > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > + struct folio *new_folio = (struct folio *)new_page; > > + > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > +#endif > > Not sure I follow the new_page - 1? What happens when order is 0? > This is just to get _nr_pages in the new_page as folio->_nr_pages is in the folio's second page. So it just modifying itself. I agree this is a bit goofy but couldn't think of a better way to do this. In the page structure this is the memcg_data field on most builds. Matt > > + new_folio->mapping = NULL; > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > + } > > + > > /* > > * Drivers shouldn't be allocating pages after calling > > * memunmap_pages(). > > I wish we did not have to pass in the pgmap, but I can see why > we can't rely on the existing pgmap > > Balbir > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 2:41 ` Matthew Brost @ 2026-01-15 7:13 ` Alistair Popple 2026-01-15 7:57 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Alistair Popple @ 2026-01-15 7:13 UTC (permalink / raw) To: Matthew Brost Cc: Balbir Singh, Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 2026-01-15 at 13:41 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > On Thu, Jan 15, 2026 at 01:36:11PM +1100, Balbir Singh wrote: > > On 1/15/26 06:19, Francois Dugast wrote: > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > Reinitialize metadata for large zone device private folios in > > > zone_device_page_init prior to creating a higher-order zone device > > > private folio. This step is necessary when the folio’s order changes > > > dynamically between zone_device_page_init calls to avoid building a > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > must be passed in from the caller because the pgmap stored in the folio > > > page may have been overwritten with a compound head. > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > Cc: Alistair Popple <apopple@nvidia.com> > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > Cc: "Christian König" <christian.koenig@amd.com> > > > Cc: David Airlie <airlied@gmail.com> > > > Cc: Simona Vetter <simona@ffwll.ch> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > Cc: Maxime Ripard <mripard@kernel.org> > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > Cc: Lyude Paul <lyude@redhat.com> > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > Cc: David Hildenbrand <david@kernel.org> > > > Cc: Oscar Salvador <osalvador@suse.de> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > Cc: Leon Romanovsky <leon@kernel.org> > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > Cc: Mike Rapoport <rppt@kernel.org> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Cc: Michal Hocko <mhocko@suse.com> > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > Cc: linuxppc-dev@lists.ozlabs.org > > > Cc: kvm@vger.kernel.org > > > Cc: linux-kernel@vger.kernel.org > > > Cc: amd-gfx@lists.freedesktop.org > > > Cc: dri-devel@lists.freedesktop.org > > > Cc: nouveau@lists.freedesktop.org > > > Cc: linux-mm@kvack.org > > > Cc: linux-cxl@vger.kernel.org > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > --- > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > include/linux/memremap.h | 9 ++++++--- > > > lib/test_hmm.c | 4 +++- > > > mm/memremap.c | 20 +++++++++++++++++++- > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > index e5000bef90f2..7cf9310de0ec 100644 > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > dpage->zone_device_data = pvt; > > > - zone_device_page_init(dpage, 0); > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > return dpage; > > > out_clear: > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > page = pfn_to_page(pfn); > > > svm_range_bo_ref(prange->svm_bo); > > > page->zone_device_data = prange->svm_bo; > > > - zone_device_page_init(page, 0); > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > } > > > > > > static void > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > index 03ee39a761a4..c497726b0147 100644 > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > struct drm_pagemap_zdd *zdd) > > > { > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > - zone_device_page_init(page, 0); > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > } > > > > > > /** > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > index 58071652679d..3d8031296eed 100644 > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > } > > > > > > - zone_device_folio_init(folio, order); > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > return page; > > > } > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > index 713ec0435b48..e3c2ccf872a8 100644 > > 
> --- a/include/linux/memremap.h > > > +++ b/include/linux/memremap.h > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > } > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > + unsigned int order); > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > unsigned long memremap_compat_align(void); > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > +static inline void zone_device_folio_init(struct folio *folio, > > > + struct dev_pagemap *pgmap, > > > + unsigned int order) > > > { > > > - zone_device_page_init(&folio->page, order); > > > + zone_device_page_init(&folio->page, pgmap, order); > > > if (order) > > > folio_set_large_rmappable(folio); > > > } > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > index 8af169d3873a..455a6862ae50 100644 > > > --- a/lib/test_hmm.c > > > +++ b/lib/test_hmm.c > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > goto error; > > > } > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > + zone_device_folio_init(page_folio(dpage), > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > + order); > > > dpage->zone_device_data = rpage; > > > return dpage; > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > --- a/mm/memremap.c > > > +++ b/mm/memremap.c > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > } > > > } > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > + unsigned int order) > > > { > > > + struct page *new_page = page; > > > + unsigned int i; > > > + > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > + struct folio *new_folio = (struct folio *)new_page; > > > + > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > +#endif > > > > Not sure I follow the new_page - 1? What happens when order is 0? > > > > This is just to get _nr_pages in the new_page as folio->_nr_pages is in > the folio's second page. So it just modifying itself. I agree this is a > bit goofy but couldn't think of a better way to do this. In the page > structure this is the memcg_data field on most builds. I still don't follow - page == new_page == new_folio so isn't &new_page->_nr_pages the same as &new_folio->_nr_pages? I don't understand why we would care about the a second page here. - Alistair > > Matt > > > > + new_folio->mapping = NULL; > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > + } > > > + > > > /* > > > * Drivers shouldn't be allocating pages after calling > > > * memunmap_pages(). 
> > > > I wish we did not have to pass in the pgmap, but I can see why > > we can't rely on the existing pgmap > > > > Balbir > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 7:13 ` Alistair Popple @ 2026-01-15 7:57 ` Matthew Brost 0 siblings, 0 replies; 33+ messages in thread From: Matthew Brost @ 2026-01-15 7:57 UTC (permalink / raw) To: Alistair Popple Cc: Balbir Singh, Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 06:13:15PM +1100, Alistair Popple wrote: > On 2026-01-15 at 13:41 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > On Thu, Jan 15, 2026 at 01:36:11PM +1100, Balbir Singh wrote: > > > On 1/15/26 06:19, Francois Dugast wrote: > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > zone_device_page_init prior to creating a higher-order zone device > > > > private folio. This step is necessary when the folio’s order changes > > > > dynamically between zone_device_page_init calls to avoid building a > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > must be passed in from the caller because the pgmap stored in the folio > > > > page may have been overwritten with a compound head. > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > Cc: David Airlie <airlied@gmail.com> > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > Cc: David Hildenbrand <david@kernel.org> > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > Cc: kvm@vger.kernel.org > > > > Cc: linux-kernel@vger.kernel.org > > > > Cc: amd-gfx@lists.freedesktop.org > > > > Cc: dri-devel@lists.freedesktop.org > > > > Cc: nouveau@lists.freedesktop.org > > > > Cc: linux-mm@kvack.org > > > > Cc: linux-cxl@vger.kernel.org > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > --- > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > include/linux/memremap.h | 9 ++++++--- > > > > lib/test_hmm.c | 4 +++- > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > dpage->zone_device_data = pvt; > > > > - zone_device_page_init(dpage, 0); > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > return dpage; > > > > out_clear: > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > page = pfn_to_page(pfn); > > > > svm_range_bo_ref(prange->svm_bo); > > > > page->zone_device_data = prange->svm_bo; > > > > - zone_device_page_init(page, 0); > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > } > > > > > > > > static void > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > index 03ee39a761a4..c497726b0147 100644 > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > struct drm_pagemap_zdd *zdd) > > > > { > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > - zone_device_page_init(page, 0); > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > } > > > > > > > > /** > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > index 58071652679d..3d8031296eed 100644 > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > } > > > > > > > > - zone_device_folio_init(folio, order); > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > 
> > return page; > > > > } > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > --- a/include/linux/memremap.h > > > > +++ b/include/linux/memremap.h > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > } > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > + unsigned int order); > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > + struct dev_pagemap *pgmap, > > > > + unsigned int order) > > > > { > > > > - zone_device_page_init(&folio->page, order); > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > if (order) > > > > folio_set_large_rmappable(folio); > > > > } > > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > index 8af169d3873a..455a6862ae50 100644 > > > > --- a/lib/test_hmm.c > > > > +++ b/lib/test_hmm.c > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > goto error; > > > > } > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > + zone_device_folio_init(page_folio(dpage), > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > + order); > > > > dpage->zone_device_data = rpage; > > > > return dpage; > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > --- a/mm/memremap.c > > > > +++ b/mm/memremap.c > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > } > > > > } > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > + unsigned int order) > > > > { > > > > + struct page *new_page = page; > > > > + unsigned int i; > > > > + > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > + > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > +#endif > > > > > > Not sure I follow the new_page - 1? What happens when order is 0? > > > > > > > This is just to get _nr_pages in the new_page as folio->_nr_pages is in > > the folio's second page. So it just modifying itself. I agree this is a > > bit goofy but couldn't think of a better way to do this. In the page > > structure this is the memcg_data field on most builds. > > I still don't follow - page == new_page == new_folio so isn't > &new_page->_nr_pages the same as &new_folio->_nr_pages? I don't understand why > we would care about the a second page here. > I just replied to another email—this is quite confusing, but let me try here... Memory layout of a folio: page0 page1 <-- this is where _nr_pages is ... 
So ((struct folio *)(new_page - 1))->_nr_pages is pointing to memory at new_page but using casting to determine the _nr_pages location. At this point, we have no idea if _nr_pages in new_page was set by a prior larger folio, so we just blindly clear it, which is safe. This is no different than what folio_reset_order() does; we just do it for every single page’s memory within the order passed in. Matt > - Alistair > > > > Matt > > > > > > + new_folio->mapping = NULL; > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > + } > > > > + > > > > /* > > > > * Drivers shouldn't be allocating pages after calling > > > > * memunmap_pages(). > > > > > > I wish we did not have to pass in the pgmap, but I can see why > > > we can't rely on the existing pgmap > > > > > > Balbir > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
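For readers puzzling over the ((struct folio *)(new_page - 1))->_nr_pages cast, here is a minimal userspace sketch of the layout Matt describes. The structures below (fake_page, fake_folio) are invented stand-ins, not the kernel's struct page / struct folio; the only property they preserve is that _nr_pages overlays the folio's *second* page, so the store through the cast lands inside new_page's own memory rather than the preceding page's.

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Invented stand-ins, sized only to keep _nr_pages just past one page's
 * worth of metadata; field names do not match the real definitions. */
struct fake_page {
	unsigned long flags;
	unsigned long pad[7];		/* 64 bytes total on LP64 */
};

struct fake_folio {
	struct fake_page page;		/* overlays the folio's first page */
	/* fields below overlay the folio's second page */
	unsigned long _flags_1;
	unsigned long _nr_pages;
};

int main(void)
{
	struct fake_page pages[4];
	struct fake_page *new_page = &pages[2];	/* any page in the range */
	unsigned long *target;

	memset(pages, 0xab, sizeof(pages));

	/* The write being discussed: cast one page back, pick _nr_pages. */
	((struct fake_folio *)(new_page - 1))->_nr_pages = 0;

	/* The store landed inside new_page's own memory, not pages[1]'s. */
	target = &((struct fake_folio *)(new_page - 1))->_nr_pages;
	assert((char *)target >= (char *)new_page &&
	       (char *)target <  (char *)(new_page + 1));

	printf("offset into new_page: %td bytes\n",
	       (char *)target - (char *)new_page);
	return 0;
}

Under these (simplified) assumptions the assertion holds for any page in the range, including order-0, which is the point Matt makes above: the cast is only a way of naming the _nr_pages slot within the page being reinitialized.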
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast 2026-01-14 21:48 ` Andrew Morton 2026-01-15 2:36 ` Balbir Singh @ 2026-01-15 3:01 ` Andrew Morton 2026-01-15 3:07 ` Matthew Brost 2026-01-15 5:27 ` Alistair Popple 2026-01-16 16:43 ` Rodrigo Vivi 4 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2026-01-15 3:01 UTC (permalink / raw) To: Francois Dugast Cc: intel-xe, dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > From: Matthew Brost <matthew.brost@intel.com> > > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head. > > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > struct drm_pagemap_zdd *zdd) > { > page->zone_device_data = drm_pagemap_zdd_get(zdd); > - zone_device_page_init(page, 0); > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > } drivers/gpu/drm/drm_pagemap.c:200:40: error: 'struct drm_pagemap_zdd' has no member named 'dpagemap' I guess this was accidentally fixed in a later patch? Please let's decide whether to fast-track the [1/N] fix into mainline and if so, prepare something which compiles! ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 3:01 ` Andrew Morton @ 2026-01-15 3:07 ` Matthew Brost 2026-01-15 4:05 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-15 3:07 UTC (permalink / raw) To: Andrew Morton Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 07:01:54PM -0800, Andrew Morton wrote: > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > From: Matthew Brost <matthew.brost@intel.com> > > > > Reinitialize metadata for large zone device private folios in > > zone_device_page_init prior to creating a higher-order zone device > > private folio. This step is necessary when the folio’s order changes > > dynamically between zone_device_page_init calls to avoid building a > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > must be passed in from the caller because the pgmap stored in the folio > > page may have been overwritten with a compound head. > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > +++ b/drivers/gpu/drm/drm_pagemap.c > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > struct drm_pagemap_zdd *zdd) > > { > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > - zone_device_page_init(page, 0); > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > } > > drivers/gpu/drm/drm_pagemap.c:200:40: error: 'struct drm_pagemap_zdd' has no member named 'dpagemap' > > I guess this was accidentally fixed in a later patch? > Ah, no. This is because we merged some in drm-tip which is not 6.19, this is based on the drm-tip branch. > Please let's decide whether to fast-track the [1/N] fix into mainline > and if so, prepare something which compiles! Maybe we just take this through the MM repo then? I suppose I should send out patch which applies to the MM repo? I just cloned that repo. Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 3:07 ` Matthew Brost @ 2026-01-15 4:05 ` Matthew Brost 0 siblings, 0 replies; 33+ messages in thread From: Matthew Brost @ 2026-01-15 4:05 UTC (permalink / raw) To: Andrew Morton Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 07:07:37PM -0800, Matthew Brost wrote: > On Wed, Jan 14, 2026 at 07:01:54PM -0800, Andrew Morton wrote: > > On Wed, 14 Jan 2026 20:19:52 +0100 Francois Dugast <francois.dugast@intel.com> wrote: > > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > Reinitialize metadata for large zone device private folios in > > > zone_device_page_init prior to creating a higher-order zone device > > > private folio. This step is necessary when the folio’s order changes > > > dynamically between zone_device_page_init calls to avoid building a > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > must be passed in from the caller because the pgmap stored in the folio > > > page may have been overwritten with a compound head. > > > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > struct drm_pagemap_zdd *zdd) > > > { > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > - zone_device_page_init(page, 0); > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > } > > > > drivers/gpu/drm/drm_pagemap.c:200:40: error: 'struct drm_pagemap_zdd' has no member named 'dpagemap' > > > > I guess this was accidentally fixed in a later patch? > > > > Ah, no. This is because we merged some in drm-tip which is not 6.19, > this is based on the drm-tip branch. > > > Please let's decide whether to fast-track the [1/N] fix into mainline > > and if so, prepare something which compiles! > > Maybe we just take this through the MM repo then? I suppose I should > send out patch which applies to the MM repo? I just cloned that repo. > Sorry, typing to fast. I believe have a patch structure that applies to 6.19, MM branches, and drm-tip. Just need to run CI on 3 branches :). Matt > Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast ` (2 preceding siblings ...) 2026-01-15 3:01 ` Andrew Morton @ 2026-01-15 5:27 ` Alistair Popple 2026-01-15 5:57 ` Matthew Brost 2026-01-16 16:43 ` Rodrigo Vivi 4 siblings, 1 reply; 33+ messages in thread From: Alistair Popple @ 2026-01-15 5:27 UTC (permalink / raw) To: Francois Dugast Cc: intel-xe, dri-devel, Matthew Brost, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > From: Matthew Brost <matthew.brost@intel.com> > > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head. Thanks for fixing, a couple of minor comments below. > Cc: Zi Yan <ziy@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: "Christian König" <christian.koenig@amd.com> > Cc: David Airlie <airlied@gmail.com> > Cc: Simona Vetter <simona@ffwll.ch> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Cc: Maxime Ripard <mripard@kernel.org> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Lyude Paul <lyude@redhat.com> > Cc: Danilo Krummrich <dakr@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Leon Romanovsky <leon@kernel.org> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: linuxppc-dev@lists.ozlabs.org > Cc: kvm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: amd-gfx@lists.freedesktop.org > Cc: dri-devel@lists.freedesktop.org > Cc: nouveau@lists.freedesktop.org > Cc: linux-mm@kvack.org > Cc: linux-cxl@vger.kernel.org > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > drivers/gpu/drm/drm_pagemap.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > include/linux/memremap.h | 9 ++++++--- > lib/test_hmm.c | 4 +++- > mm/memremap.c | 20 +++++++++++++++++++- > 7 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index e5000bef90f2..7cf9310de0ec 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > dpage = pfn_to_page(uvmem_pfn); > dpage->zone_device_data = pvt; > - zone_device_page_init(dpage, 0); > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > return dpage; > out_clear: > spin_lock(&kvmppc_uvmem_bitmap_lock); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index af53e796ea1b..6ada7b4af7c6 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > page = pfn_to_page(pfn); > svm_range_bo_ref(prange->svm_bo); > page->zone_device_data = prange->svm_bo; > - zone_device_page_init(page, 0); > + zone_device_page_init(page, page_pgmap(page), 0); > } > > static void > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index 03ee39a761a4..c497726b0147 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > struct drm_pagemap_zdd *zdd) > { > page->zone_device_data = drm_pagemap_zdd_get(zdd); > - zone_device_page_init(page, 0); > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > } > > /** > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 58071652679d..3d8031296eed 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > order = ilog2(DMEM_CHUNK_NPAGES); > } > > - zone_device_folio_init(folio, order); > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > return page; > } > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 713ec0435b48..e3c2ccf872a8 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > } > > #ifdef CONFIG_ZONE_DEVICE > -void zone_device_page_init(struct page *page, unsigned int order); > +void zone_device_page_init(struct page *page, struct dev_pagemap 
*pgmap, > + unsigned int order); > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > void memunmap_pages(struct dev_pagemap *pgmap); > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > unsigned long memremap_compat_align(void); > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > +static inline void zone_device_folio_init(struct folio *folio, > + struct dev_pagemap *pgmap, > + unsigned int order) > { > - zone_device_page_init(&folio->page, order); > + zone_device_page_init(&folio->page, pgmap, order); > if (order) > folio_set_large_rmappable(folio); > } > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > index 8af169d3873a..455a6862ae50 100644 > --- a/lib/test_hmm.c > +++ b/lib/test_hmm.c > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > goto error; > } > > - zone_device_folio_init(page_folio(dpage), order); > + zone_device_folio_init(page_folio(dpage), > + page_pgmap(folio_page(page_folio(dpage), 0)), > + order); > dpage->zone_device_data = rpage; > return dpage; > > diff --git a/mm/memremap.c b/mm/memremap.c > index 63c6ab4fdf08..6f46ab14662b 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > } > } > > -void zone_device_page_init(struct page *page, unsigned int order) > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order) > { > + struct page *new_page = page; > + unsigned int i; > + > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > + struct folio *new_folio = (struct folio *)new_page; > + > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ This seems odd to me, mainly due to the "magic" number. Why not just clear the flags entirely? Or at least explicitly just clear the flags you care about which would remove the need for the comment and should let you use the proper PageFlag functions. > +#ifdef NR_PAGES_IN_LARGE_FOLIO > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > +#endif > + new_folio->mapping = NULL; > + new_folio->pgmap = pgmap; /* Also clear compound head */ > + new_folio->share = 0; /* fsdax only, unused for device private */ It would be nice if the FS DAX code actually used this as well. Is there a reason that change was dropped from the series? > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > + } > + > /* > * Drivers shouldn't be allocating pages after calling > * memunmap_pages(). > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 5:27 ` Alistair Popple @ 2026-01-15 5:57 ` Matthew Brost 2026-01-15 6:18 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-15 5:57 UTC (permalink / raw) To: Alistair Popple Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > From: Matthew Brost <matthew.brost@intel.com> > > > > Reinitialize metadata for large zone device private folios in > > zone_device_page_init prior to creating a higher-order zone device > > private folio. This step is necessary when the folio’s order changes > > dynamically between zone_device_page_init calls to avoid building a > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > must be passed in from the caller because the pgmap stored in the folio > > page may have been overwritten with a compound head. > > Thanks for fixing, a couple of minor comments below. > > > Cc: Zi Yan <ziy@nvidia.com> > > Cc: Alistair Popple <apopple@nvidia.com> > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > Cc: Nicholas Piggin <npiggin@gmail.com> > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Cc: "Christian König" <christian.koenig@amd.com> > > Cc: David Airlie <airlied@gmail.com> > > Cc: Simona Vetter <simona@ffwll.ch> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > Cc: Maxime Ripard <mripard@kernel.org> > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > Cc: Lyude Paul <lyude@redhat.com> > > Cc: Danilo Krummrich <dakr@kernel.org> > > Cc: David Hildenbrand <david@kernel.org> > > Cc: Oscar Salvador <osalvador@suse.de> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Leon Romanovsky <leon@kernel.org> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Mike Rapoport <rppt@kernel.org> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Balbir Singh <balbirs@nvidia.com> > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: kvm@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Cc: amd-gfx@lists.freedesktop.org > > Cc: dri-devel@lists.freedesktop.org > > Cc: nouveau@lists.freedesktop.org > > Cc: linux-mm@kvack.org > > Cc: linux-cxl@vger.kernel.org > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > --- > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > include/linux/memremap.h | 9 ++++++--- > > lib/test_hmm.c | 4 +++- > > mm/memremap.c | 20 +++++++++++++++++++- > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > index e5000bef90f2..7cf9310de0ec 100644 > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > dpage = pfn_to_page(uvmem_pfn); > > dpage->zone_device_data = pvt; > > - zone_device_page_init(dpage, 0); > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > return dpage; > > out_clear: > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > index af53e796ea1b..6ada7b4af7c6 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > page = pfn_to_page(pfn); > > svm_range_bo_ref(prange->svm_bo); > > page->zone_device_data = prange->svm_bo; > > - zone_device_page_init(page, 0); > > + zone_device_page_init(page, page_pgmap(page), 0); > > } > > > > static void > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > index 03ee39a761a4..c497726b0147 100644 > > --- a/drivers/gpu/drm/drm_pagemap.c > > +++ b/drivers/gpu/drm/drm_pagemap.c > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > struct drm_pagemap_zdd *zdd) > > { > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > - zone_device_page_init(page, 0); > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > } > > > > /** > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > index 58071652679d..3d8031296eed 100644 > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > order = ilog2(DMEM_CHUNK_NPAGES); > > } > > > > - zone_device_folio_init(folio, order); > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > return page; > > } > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > index 713ec0435b48..e3c2ccf872a8 100644 > > --- a/include/linux/memremap.h > > +++ b/include/linux/memremap.h > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > } > > > 
> #ifdef CONFIG_ZONE_DEVICE > > -void zone_device_page_init(struct page *page, unsigned int order); > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > + unsigned int order); > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > void memunmap_pages(struct dev_pagemap *pgmap); > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > unsigned long memremap_compat_align(void); > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > +static inline void zone_device_folio_init(struct folio *folio, > > + struct dev_pagemap *pgmap, > > + unsigned int order) > > { > > - zone_device_page_init(&folio->page, order); > > + zone_device_page_init(&folio->page, pgmap, order); > > if (order) > > folio_set_large_rmappable(folio); > > } > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > index 8af169d3873a..455a6862ae50 100644 > > --- a/lib/test_hmm.c > > +++ b/lib/test_hmm.c > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > goto error; > > } > > > > - zone_device_folio_init(page_folio(dpage), order); > > + zone_device_folio_init(page_folio(dpage), > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > + order); > > dpage->zone_device_data = rpage; > > return dpage; > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > index 63c6ab4fdf08..6f46ab14662b 100644 > > --- a/mm/memremap.c > > +++ b/mm/memremap.c > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > } > > } > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > + unsigned int order) > > { > > + struct page *new_page = page; > > + unsigned int i; > > + > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > + struct folio *new_folio = (struct folio *)new_page; > > + > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > This seems odd to me, mainly due to the "magic" number. Why not just clear > the flags entirely? Or at least explicitly just clear the flags you care about > which would remove the need for the comment and should let you use the proper > PageFlag functions. > I'm copying this from folio_reset_order [1]. My paranoia about touching anything related to struct page is high, so I did the same thing folio_reset_order does here. [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > +#endif > > + new_folio->mapping = NULL; > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > It would be nice if the FS DAX code actually used this as well. Is there a > reason that change was dropped from the series? > I don't have a test platform for FS DAX. In prior revisions, I was just moving existing FS DAX code to a helper, which I felt confident about. This revision is slightly different, and I don't feel comfortable modifying FS DAX code without a test platform. I agree we should update FS DAX, but that should be done in a follow-up with coordinated testing. 
Matt > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > + } > > + > > /* > > * Drivers shouldn't be allocating pages after calling > > * memunmap_pages(). > > -- > > 2.43.0 > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 5:57 ` Matthew Brost @ 2026-01-15 6:18 ` Matthew Brost 2026-01-15 7:07 ` Alistair Popple 2026-01-16 16:13 ` Vlastimil Babka 0 siblings, 2 replies; 33+ messages in thread From: Matthew Brost @ 2026-01-15 6:18 UTC (permalink / raw) To: Alistair Popple Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > Reinitialize metadata for large zone device private folios in > > > zone_device_page_init prior to creating a higher-order zone device > > > private folio. This step is necessary when the folio’s order changes > > > dynamically between zone_device_page_init calls to avoid building a > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > must be passed in from the caller because the pgmap stored in the folio > > > page may have been overwritten with a compound head. > > > > Thanks for fixing, a couple of minor comments below. > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > Cc: Alistair Popple <apopple@nvidia.com> > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > Cc: "Christian König" <christian.koenig@amd.com> > > > Cc: David Airlie <airlied@gmail.com> > > > Cc: Simona Vetter <simona@ffwll.ch> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > Cc: Maxime Ripard <mripard@kernel.org> > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > Cc: Lyude Paul <lyude@redhat.com> > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > Cc: David Hildenbrand <david@kernel.org> > > > Cc: Oscar Salvador <osalvador@suse.de> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > Cc: Leon Romanovsky <leon@kernel.org> > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > Cc: Mike Rapoport <rppt@kernel.org> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Cc: Michal Hocko <mhocko@suse.com> > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > Cc: linuxppc-dev@lists.ozlabs.org > > > Cc: kvm@vger.kernel.org > > > Cc: linux-kernel@vger.kernel.org > > > Cc: amd-gfx@lists.freedesktop.org > > > Cc: dri-devel@lists.freedesktop.org > > > Cc: nouveau@lists.freedesktop.org > > > Cc: linux-mm@kvack.org > > > Cc: linux-cxl@vger.kernel.org > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > --- > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > include/linux/memremap.h | 9 ++++++--- > > > lib/test_hmm.c | 4 +++- > > > mm/memremap.c | 20 +++++++++++++++++++- > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > index e5000bef90f2..7cf9310de0ec 100644 > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > dpage->zone_device_data = pvt; > > > - zone_device_page_init(dpage, 0); > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > return dpage; > > > out_clear: > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > page = pfn_to_page(pfn); > > > svm_range_bo_ref(prange->svm_bo); > > > page->zone_device_data = prange->svm_bo; > > > - zone_device_page_init(page, 0); > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > } > > > > > > static void > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > index 03ee39a761a4..c497726b0147 100644 > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > struct drm_pagemap_zdd *zdd) > > > { > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > - zone_device_page_init(page, 0); > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > } > > > > > > /** > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > index 58071652679d..3d8031296eed 100644 > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > } > > > > > > - zone_device_folio_init(folio, order); > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > return page; > > > } > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > index 713ec0435b48..e3c2ccf872a8 100644 > > 
> --- a/include/linux/memremap.h > > > +++ b/include/linux/memremap.h > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > } > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > + unsigned int order); > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > unsigned long memremap_compat_align(void); > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > +static inline void zone_device_folio_init(struct folio *folio, > > > + struct dev_pagemap *pgmap, > > > + unsigned int order) > > > { > > > - zone_device_page_init(&folio->page, order); > > > + zone_device_page_init(&folio->page, pgmap, order); > > > if (order) > > > folio_set_large_rmappable(folio); > > > } > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > index 8af169d3873a..455a6862ae50 100644 > > > --- a/lib/test_hmm.c > > > +++ b/lib/test_hmm.c > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > goto error; > > > } > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > + zone_device_folio_init(page_folio(dpage), > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > + order); > > > dpage->zone_device_data = rpage; > > > return dpage; > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > --- a/mm/memremap.c > > > +++ b/mm/memremap.c > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > } > > > } > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > + unsigned int order) > > > { > > > + struct page *new_page = page; > > > + unsigned int i; > > > + > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > + struct folio *new_folio = (struct folio *)new_page; > > > + > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > the flags entirely? Or at least explicitly just clear the flags you care about > > which would remove the need for the comment and should let you use the proper > > PageFlag functions. > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > anything related to struct page is high, so I did the same thing > folio_reset_order does here. > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > This immediately hangs my first SVM test... 
diff --git a/mm/memremap.c b/mm/memremap.c index 6f46ab14662b..ef8c56876cf5 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, for (i = 0; i < (1UL << order); ++i, ++new_page) { struct folio *new_folio = (struct folio *)new_page; - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ + new_page->flags.f = 0; #ifdef NR_PAGES_IN_LARGE_FOLIO ((struct folio *)(new_page - 1))->_nr_pages = 0; #endif I can walk through exactly which flags need to be cleared, but my feeling is that likely any flag that the order field overloads and can possibly encode should be cleared—so bits 0–7 based on the existing code. How about in a follow-up we normalize setting / clearing the order flag field with a #define and an inline helper? Matt > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > +#endif > > > + new_folio->mapping = NULL; > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > reason that change was dropped from the series? > > > > I don't have a test platform for FS DAX. In prior revisions, I was just > moving existing FS DAX code to a helper, which I felt confident about. > > This revision is slightly different, and I don't feel comfortable > modifying FS DAX code without a test platform. I agree we should update > FS DAX, but that should be done in a follow-up with coordinated testing. > > Matt > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > + } > > > + > > > /* > > > * Drivers shouldn't be allocating pages after calling > > > * memunmap_pages(). > > > -- > > > 2.43.0 > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
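To make the suggested follow-up concrete, here is a rough sketch of the kind of helper Matt has in mind. The names (PAGE_ORDER_FLAGS_MASK, page_clear_order_flags) are invented for illustration and do not exist in the tree; the only idea is to give the "order and head encoding live in the low byte of flags" convention a single definition instead of open-coded ~0xffUL masks, mirroring the new_page->flags.f line in the patch.

/* Hypothetical sketch only; nothing like this exists today. */
#define PAGE_ORDER_FLAGS_MASK	0xffUL

static inline void page_clear_order_flags(struct page *page)
{
	/* Mirrors new_page->flags.f &= ~0xffUL from the patch above. */
	page->flags.f &= ~PAGE_ORDER_FLAGS_MASK;
}

A follow-up along these lines could then let folio_reset_order() and the reinit loop in zone_device_page_init() share the same mask definition, which is roughly the normalization Matt proposes.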
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 6:18 ` Matthew Brost @ 2026-01-15 7:07 ` Alistair Popple 2026-01-15 7:39 ` Balbir Singh 2026-01-15 7:43 ` Matthew Brost 2026-01-16 16:13 ` Vlastimil Babka 1 sibling, 2 replies; 33+ messages in thread From: Alistair Popple @ 2026-01-15 7:07 UTC (permalink / raw) To: Matthew Brost Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > zone_device_page_init prior to creating a higher-order zone device > > > > private folio. This step is necessary when the folio’s order changes > > > > dynamically between zone_device_page_init calls to avoid building a > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > must be passed in from the caller because the pgmap stored in the folio > > > > page may have been overwritten with a compound head. > > > > > > Thanks for fixing, a couple of minor comments below. > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > Cc: David Airlie <airlied@gmail.com> > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > Cc: David Hildenbrand <david@kernel.org> > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > Cc: kvm@vger.kernel.org > > > > Cc: linux-kernel@vger.kernel.org > > > > Cc: amd-gfx@lists.freedesktop.org > > > > Cc: dri-devel@lists.freedesktop.org > > > > Cc: nouveau@lists.freedesktop.org > > > > Cc: linux-mm@kvack.org > > > > Cc: linux-cxl@vger.kernel.org > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > --- > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > include/linux/memremap.h | 9 ++++++--- > > > > lib/test_hmm.c | 4 +++- > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > dpage->zone_device_data = pvt; > > > > - zone_device_page_init(dpage, 0); > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > return dpage; > > > > out_clear: > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > page = pfn_to_page(pfn); > > > > svm_range_bo_ref(prange->svm_bo); > > > > page->zone_device_data = prange->svm_bo; > > > > - zone_device_page_init(page, 0); > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > } > > > > > > > > static void > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > index 03ee39a761a4..c497726b0147 100644 > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > struct drm_pagemap_zdd *zdd) > > > > { > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > - zone_device_page_init(page, 0); > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > } > > > > > > > > /** > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > index 58071652679d..3d8031296eed 100644 > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > } > > > > > > > > - zone_device_folio_init(folio, order); > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > 
> > return page; > > > > } > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > --- a/include/linux/memremap.h > > > > +++ b/include/linux/memremap.h > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > } > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > + unsigned int order); > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > + struct dev_pagemap *pgmap, > > > > + unsigned int order) > > > > { > > > > - zone_device_page_init(&folio->page, order); > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > if (order) > > > > folio_set_large_rmappable(folio); > > > > } > > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > index 8af169d3873a..455a6862ae50 100644 > > > > --- a/lib/test_hmm.c > > > > +++ b/lib/test_hmm.c > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > goto error; > > > > } > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > + zone_device_folio_init(page_folio(dpage), > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > + order); > > > > dpage->zone_device_data = rpage; > > > > return dpage; > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > --- a/mm/memremap.c > > > > +++ b/mm/memremap.c > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > } > > > > } > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > + unsigned int order) > > > > { > > > > + struct page *new_page = page; > > > > + unsigned int i; > > > > + > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > + > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > > the flags entirely? Or at least explicitly just clear the flags you care about > > > which would remove the need for the comment and should let you use the proper > > > PageFlag functions. > > > > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > > anything related to struct page is high, so I did the same thing > > folio_reset_order does here. So why not just use folio_reset_order() below? > > > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > > > This immediately hangs my first SVM test... 
> > diff --git a/mm/memremap.c b/mm/memremap.c > index 6f46ab14662b..ef8c56876cf5 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > for (i = 0; i < (1UL << order); ++i, ++new_page) { > struct folio *new_folio = (struct folio *)new_page; > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > + new_page->flags.f = 0; > #ifdef NR_PAGES_IN_LARGE_FOLIO > ((struct folio *)(new_page - 1))->_nr_pages = 0; This seems wrong to me - I saw your reply to Balbir but for an order-0 page isn't this going to access a completely different, possibly already allocated, page? > #endif > > I can walk through exactly which flags need to be cleared, but my > feeling is that likely any flag that the order field overloads and can > possibly encode should be cleared—so bits 0–7 based on the existing > code. > > How about in a follow-up we normalize setting / clearing the order flag > field with a #define and an inline helper? Ie: Would something like the following work: ClearPageHead(new_page); clear_compound_head(new_page); folio_reset_order(new_folio); Which would also deal with setting _nr_pages. > Matt > > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > +#endif > > > > + new_folio->mapping = NULL; > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > > reason that change was dropped from the series? > > > > > > > I don't have a test platform for FS DAX. In prior revisions, I was just > > moving existing FS DAX code to a helper, which I felt confident about. > > > > This revision is slightly different, and I don't feel comfortable > > modifying FS DAX code without a test platform. I agree we should update > > FS DAX, but that should be done in a follow-up with coordinated testing. Fair enough, I figured something like this might be your answer :-) You could update it and ask people with access to such a system to test it though (unfortunately my setup has bit-rotted beyond repair). But I'm ok leaving to for a future change. > > > > Matt > > > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > + } > > > > + > > > > /* > > > > * Drivers shouldn't be allocating pages after calling > > > > * memunmap_pages(). > > > > -- > > > > 2.43.0 > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 7:07 ` Alistair Popple @ 2026-01-15 7:39 ` Balbir Singh 2026-01-15 7:43 ` Matthew Brost 1 sibling, 0 replies; 33+ messages in thread From: Balbir Singh @ 2026-01-15 7:39 UTC (permalink / raw) To: Alistair Popple, Matthew Brost Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 1/15/26 18:07, Alistair Popple wrote: > On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... >> On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: >>> On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: >>>> On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... >>>>> From: Matthew Brost <matthew.brost@intel.com> >>>>> >>>>> Reinitialize metadata for large zone device private folios in >>>>> zone_device_page_init prior to creating a higher-order zone device >>>>> private folio. This step is necessary when the folio’s order changes >>>>> dynamically between zone_device_page_init calls to avoid building a >>>>> corrupt folio. As part of the metadata reinitialization, the dev_pagemap >>>>> must be passed in from the caller because the pgmap stored in the folio >>>>> page may have been overwritten with a compound head. >>>> >>>> Thanks for fixing, a couple of minor comments below. >>>> >>>>> Cc: Zi Yan <ziy@nvidia.com> >>>>> Cc: Alistair Popple <apopple@nvidia.com> >>>>> Cc: adhavan Srinivasan <maddy@linux.ibm.com> >>>>> Cc: Nicholas Piggin <npiggin@gmail.com> >>>>> Cc: Michael Ellerman <mpe@ellerman.id.au> >>>>> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> >>>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com> >>>>> Cc: Alex Deucher <alexander.deucher@amd.com> >>>>> Cc: "Christian König" <christian.koenig@amd.com> >>>>> Cc: David Airlie <airlied@gmail.com> >>>>> Cc: Simona Vetter <simona@ffwll.ch> >>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >>>>> Cc: Maxime Ripard <mripard@kernel.org> >>>>> Cc: Thomas Zimmermann <tzimmermann@suse.de> >>>>> Cc: Lyude Paul <lyude@redhat.com> >>>>> Cc: Danilo Krummrich <dakr@kernel.org> >>>>> Cc: David Hildenbrand <david@kernel.org> >>>>> Cc: Oscar Salvador <osalvador@suse.de> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca> >>>>> Cc: Leon Romanovsky <leon@kernel.org> >>>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>>>> Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> >>>>> Cc: Vlastimil Babka <vbabka@suse.cz> >>>>> Cc: Mike Rapoport <rppt@kernel.org> >>>>> Cc: Suren Baghdasaryan <surenb@google.com> >>>>> Cc: Michal Hocko <mhocko@suse.com> >>>>> Cc: Balbir Singh <balbirs@nvidia.com> >>>>> Cc: linuxppc-dev@lists.ozlabs.org >>>>> Cc: kvm@vger.kernel.org >>>>> Cc: linux-kernel@vger.kernel.org >>>>> Cc: amd-gfx@lists.freedesktop.org >>>>> Cc: dri-devel@lists.freedesktop.org >>>>> Cc: nouveau@lists.freedesktop.org >>>>> Cc: linux-mm@kvack.org >>>>> Cc: linux-cxl@vger.kernel.org >>>>> Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") >>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com> >>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com> >>>>> --- >>>>> arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- >>>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- >>>>> drivers/gpu/drm/drm_pagemap.c | 2 +- >>>>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- >>>>> include/linux/memremap.h | 9 ++++++--- >>>>> lib/test_hmm.c | 4 +++- >>>>> mm/memremap.c | 20 +++++++++++++++++++- >>>>> 7 files changed, 32 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> index e5000bef90f2..7cf9310de0ec 100644 >>>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) >>>>> >>>>> dpage = pfn_to_page(uvmem_pfn); >>>>> dpage->zone_device_data = pvt; >>>>> - zone_device_page_init(dpage, 0); >>>>> + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); >>>>> return dpage; >>>>> out_clear: >>>>> spin_lock(&kvmppc_uvmem_bitmap_lock); >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> index af53e796ea1b..6ada7b4af7c6 100644 >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) >>>>> page = pfn_to_page(pfn); >>>>> svm_range_bo_ref(prange->svm_bo); >>>>> page->zone_device_data = prange->svm_bo; >>>>> - zone_device_page_init(page, 0); >>>>> + zone_device_page_init(page, page_pgmap(page), 0); >>>>> } >>>>> >>>>> static void >>>>> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c >>>>> index 03ee39a761a4..c497726b0147 100644 >>>>> --- a/drivers/gpu/drm/drm_pagemap.c >>>>> +++ b/drivers/gpu/drm/drm_pagemap.c >>>>> @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, >>>>> struct drm_pagemap_zdd *zdd) >>>>> { >>>>> page->zone_device_data = drm_pagemap_zdd_get(zdd); >>>>> - zone_device_page_init(page, 0); >>>>> + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); >>>>> } >>>>> >>>>> /** >>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> index 58071652679d..3d8031296eed 100644 >>>>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) >>>>> order = ilog2(DMEM_CHUNK_NPAGES); >>>>> } >>>>> >>>>> - zone_device_folio_init(folio, order); >>>>> + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); >>>>> return page; >>>>> } >>>>> >>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>>>> index 713ec0435b48..e3c2ccf872a8 100644 
>>>>> --- a/include/linux/memremap.h >>>>> +++ b/include/linux/memremap.h >>>>> @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) >>>>> } >>>>> >>>>> #ifdef CONFIG_ZONE_DEVICE >>>>> -void zone_device_page_init(struct page *page, unsigned int order); >>>>> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >>>>> + unsigned int order); >>>>> void *memremap_pages(struct dev_pagemap *pgmap, int nid); >>>>> void memunmap_pages(struct dev_pagemap *pgmap); >>>>> void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); >>>>> @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); >>>>> >>>>> unsigned long memremap_compat_align(void); >>>>> >>>>> -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) >>>>> +static inline void zone_device_folio_init(struct folio *folio, >>>>> + struct dev_pagemap *pgmap, >>>>> + unsigned int order) >>>>> { >>>>> - zone_device_page_init(&folio->page, order); >>>>> + zone_device_page_init(&folio->page, pgmap, order); >>>>> if (order) >>>>> folio_set_large_rmappable(folio); >>>>> } >>>>> diff --git a/lib/test_hmm.c b/lib/test_hmm.c >>>>> index 8af169d3873a..455a6862ae50 100644 >>>>> --- a/lib/test_hmm.c >>>>> +++ b/lib/test_hmm.c >>>>> @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, >>>>> goto error; >>>>> } >>>>> >>>>> - zone_device_folio_init(page_folio(dpage), order); >>>>> + zone_device_folio_init(page_folio(dpage), >>>>> + page_pgmap(folio_page(page_folio(dpage), 0)), >>>>> + order); >>>>> dpage->zone_device_data = rpage; >>>>> return dpage; >>>>> >>>>> diff --git a/mm/memremap.c b/mm/memremap.c >>>>> index 63c6ab4fdf08..6f46ab14662b 100644 >>>>> --- a/mm/memremap.c >>>>> +++ b/mm/memremap.c >>>>> @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) >>>>> } >>>>> } >>>>> >>>>> -void zone_device_page_init(struct page *page, unsigned int order) >>>>> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >>>>> + unsigned int order) >>>>> { >>>>> + struct page *new_page = page; >>>>> + unsigned int i; >>>>> + >>>>> VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); >>>>> >>>>> + for (i = 0; i < (1UL << order); ++i, ++new_page) { >>>>> + struct folio *new_folio = (struct folio *)new_page; >>>>> + >>>>> + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ >>>> >>>> This seems odd to me, mainly due to the "magic" number. Why not just clear >>>> the flags entirely? Or at least explicitly just clear the flags you care about >>>> which would remove the need for the comment and should let you use the proper >>>> PageFlag functions. >>>> >>> >>> I'm copying this from folio_reset_order [1]. My paranoia about touching >>> anything related to struct page is high, so I did the same thing >>> folio_reset_order does here. > > So why not just use folio_reset_order() below? > >>> >>> [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 >>> >> >> This immediately hangs my first SVM test... 
>> >> diff --git a/mm/memremap.c b/mm/memremap.c >> index 6f46ab14662b..ef8c56876cf5 100644 >> --- a/mm/memremap.c >> +++ b/mm/memremap.c >> @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >> for (i = 0; i < (1UL << order); ++i, ++new_page) { >> struct folio *new_folio = (struct folio *)new_page; >> >> - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ >> + new_page->flags.f = 0; >> #ifdef NR_PAGES_IN_LARGE_FOLIO >> ((struct folio *)(new_page - 1))->_nr_pages = 0; > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > isn't this going to access a completely different, possibly already allocated, > page? > >> #endif >> >> I can walk through exactly which flags need to be cleared, but my >> feeling is that likely any flag that the order field overloads and can >> possibly encode should be cleared—so bits 0–7 based on the existing >> code. >> >> How about in a follow-up we normalize setting / clearing the order flag >> field with a #define and an inline helper? > > Ie: Would something like the following work: > > ClearPageHead(new_page); > clear_compound_head(new_page); > folio_reset_order(new_folio); > > Which would also deal with setting _nr_pages. > I thought about this, but folio_reset_order works only for larger folios otherwise there is a VM_WARN_ON. >> Matt >> >>>>> +#ifdef NR_PAGES_IN_LARGE_FOLIO >>>>> + ((struct folio *)(new_page - 1))->_nr_pages = 0; >>>>> +#endif >>>>> + new_folio->mapping = NULL; >>>>> + new_folio->pgmap = pgmap; /* Also clear compound head */ >>>>> + new_folio->share = 0; /* fsdax only, unused for device private */ >>>> >>>> It would be nice if the FS DAX code actually used this as well. Is there a >>>> reason that change was dropped from the series? >>>> >>> >>> I don't have a test platform for FS DAX. In prior revisions, I was just >>> moving existing FS DAX code to a helper, which I felt confident about. >>> >>> This revision is slightly different, and I don't feel comfortable >>> modifying FS DAX code without a test platform. I agree we should update >>> FS DAX, but that should be done in a follow-up with coordinated testing. > > Fair enough, I figured something like this might be your answer :-) You > could update it and ask people with access to such a system to test it though > (unfortunately my setup has bit-rotted beyond repair). > > But I'm ok leaving to for a future change. > >>> >>> Matt >>> >>>>> + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); >>>>> + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); >>>>> + } >>>>> + >>>>> /* >>>>> * Drivers shouldn't be allocating pages after calling >>>>> * memunmap_pages(). >>>>> -- >>>>> 2.43.0 >>>>> ^ permalink raw reply [flat|nested] 33+ messages in thread
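To make Balbir's point about order-0 pages concrete: a folio handle is just a cast over the first struct page, but the fields folio_reset_order() writes (_flags_1 and, with NR_PAGES_IN_LARGE_FOLIO, _nr_pages) live in the next struct page, which for an order-0 allocation is not part of the allocation at all. A rough picture of the overlay, following the field correspondences worked out later in this thread:

	/*
	 * folio cast over page[N]               backing struct pages
	 * -----------------------               --------------------
	 * folio->flags                      ==  page[N].flags
	 * folio->_flags_1 (low byte: order) ==  page[N + 1].flags
	 * folio->_nr_pages                  ==  page[N + 1], starting where
	 *                                       memcg_data sits when CONFIG_MEMCG
	 *                                       is enabled (4 of its 8 bytes)
	 *
	 * So folio_reset_order() on an order-0 "folio" would write into
	 * page[N + 1], which may be a live, unrelated allocation -- hence the
	 * warning Balbir mentions when the folio is not large, and the
	 * open-coded per-page clearing in this patch instead.
	 */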
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 7:07 ` Alistair Popple 2026-01-15 7:39 ` Balbir Singh @ 2026-01-15 7:43 ` Matthew Brost 2026-01-15 11:05 ` Alistair Popple 1 sibling, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-15 7:43 UTC (permalink / raw) To: Alistair Popple Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 06:07:08PM +1100, Alistair Popple wrote: > On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > > > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > > zone_device_page_init prior to creating a higher-order zone device > > > > > private folio. This step is necessary when the folio’s order changes > > > > > dynamically between zone_device_page_init calls to avoid building a > > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > > must be passed in from the caller because the pgmap stored in the folio > > > > > page may have been overwritten with a compound head. > > > > > > > > Thanks for fixing, a couple of minor comments below. > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > > Cc: David Airlie <airlied@gmail.com> > > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > > Cc: David Hildenbrand <david@kernel.org> > > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > > Cc: kvm@vger.kernel.org > > > > > Cc: linux-kernel@vger.kernel.org > > > > > Cc: amd-gfx@lists.freedesktop.org > > > > > Cc: dri-devel@lists.freedesktop.org > > > > > Cc: nouveau@lists.freedesktop.org > > > > > Cc: linux-mm@kvack.org > > > > > Cc: linux-cxl@vger.kernel.org > > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > > --- > > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > > include/linux/memremap.h | 9 ++++++--- > > > > > lib/test_hmm.c | 4 +++- > > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > > dpage->zone_device_data = pvt; > > > > > - zone_device_page_init(dpage, 0); > > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > > return dpage; > > > > > out_clear: > > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > > page = pfn_to_page(pfn); > > > > > svm_range_bo_ref(prange->svm_bo); > > > > > page->zone_device_data = prange->svm_bo; > > > > > - zone_device_page_init(page, 0); > > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > > } > > > > > > > > > > static void > > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > > index 03ee39a761a4..c497726b0147 100644 > > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > > struct drm_pagemap_zdd *zdd) > > > > > { > > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > > - zone_device_page_init(page, 0); > > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > > } > > > > > > > > > > /** > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > index 58071652679d..3d8031296eed 100644 > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > > 
} > > > > > > > > > > - zone_device_folio_init(folio, order); > > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > > > return page; > > > > > } > > > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > > --- a/include/linux/memremap.h > > > > > +++ b/include/linux/memremap.h > > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > > } > > > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > + unsigned int order); > > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > > + struct dev_pagemap *pgmap, > > > > > + unsigned int order) > > > > > { > > > > > - zone_device_page_init(&folio->page, order); > > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > > if (order) > > > > > folio_set_large_rmappable(folio); > > > > > } > > > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > > index 8af169d3873a..455a6862ae50 100644 > > > > > --- a/lib/test_hmm.c > > > > > +++ b/lib/test_hmm.c > > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > > goto error; > > > > > } > > > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > > + zone_device_folio_init(page_folio(dpage), > > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > > + order); > > > > > dpage->zone_device_data = rpage; > > > > > return dpage; > > > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > > --- a/mm/memremap.c > > > > > +++ b/mm/memremap.c > > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > > } > > > > > } > > > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > + unsigned int order) > > > > > { > > > > > + struct page *new_page = page; > > > > > + unsigned int i; > > > > > + > > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > > + > > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > > > the flags entirely? Or at least explicitly just clear the flags you care about > > > > which would remove the need for the comment and should let you use the proper > > > > PageFlag functions. > > > > > > > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > > > anything related to struct page is high, so I did the same thing > > > folio_reset_order does here. > > So why not just use folio_reset_order() below? 
> > > > > > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > > > > > > This immediately hangs my first SVM test... > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > index 6f46ab14662b..ef8c56876cf5 100644 > > --- a/mm/memremap.c > > +++ b/mm/memremap.c > > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > struct folio *new_folio = (struct folio *)new_page; > > > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > + new_page->flags.f = 0; > > #ifdef NR_PAGES_IN_LARGE_FOLIO > > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > isn't this going to access a completely different, possibly already allocated, > page? > No — it accesses itself (new_page). It just uses some odd memory tricks for this, which I agree isn’t the best thing I’ve ever written, but it was the least-worst idea I had. I didn’t design the folio/page field aliasing; I understand why it exists, but it still makes my head hurt. folio->_nr_pages is page + 1 for reference (new_page after this math). Again, if I touched this memory directly in new_page, it’s most likely memcg_data, but that is hidden behind a Kconfig. This just blindly implementing part of folio_reset_order which clears _nr_pages. > > #endif > > > > I can walk through exactly which flags need to be cleared, but my > > feeling is that likely any flag that the order field overloads and can > > possibly encode should be cleared—so bits 0–7 based on the existing > > code. > > > > How about in a follow-up we normalize setting / clearing the order flag > > field with a #define and an inline helper? > > Ie: Would something like the following work: > > ClearPageHead(new_page); Any of these bit could possibly be set the order field in a folio, which modifies page + 1 flags field. PG_locked, /* Page is locked. Don't touch. */ PG_writeback, /* Page is under writeback */ PG_referenced, PG_uptodate, PG_dirty, PG_lru, PG_head, /* Must be in bit 6 */ PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */ So a common order-9 (2MB) folio would have PG_locked | PG_uptodate set. Now we get stuck on the next page lock because PG_locked is set. Offhand, I don’t know if different orders—which set different bits—cause any nasty issues either. So I figured the safest thing was clear any bits which folio order can set within subsequent page's memory flags like folio_reset_order does. > clear_compound_head(new_page); > folio_reset_order(new_folio); > > Which would also deal with setting _nr_pages. > folio_reset_order(new_folio) would set _nr_pages in the memory that is new_page + 1. So let's say that page has a ref count + memcg_data, now that memory is corrupted and will crash the kernel. All of the above is why is took me multiple hours to write 6 lines of code :). > > Matt > > > > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > +#endif > > > > > + new_folio->mapping = NULL; > > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > > > reason that change was dropped from the series? > > > > > > > > > > I don't have a test platform for FS DAX. 
In prior revisions, I was just > > > moving existing FS DAX code to a helper, which I felt confident about. > > > > > > This revision is slightly different, and I don't feel comfortable > > > modifying FS DAX code without a test platform. I agree we should update > > > FS DAX, but that should be done in a follow-up with coordinated testing. > > Fair enough, I figured something like this might be your answer :-) You > could update it and ask people with access to such a system to test it though > (unfortunately my setup has bit-rotted beyond repair). > > But I'm ok leaving to for a future change. > I did a quick grep in fs/dax.c and don’t see zone_device_page_init called there. It probably could be used if it’s creating compound pages and drop the open-coded reinit when shared == 0, but yeah, that’s not something I can blindly code without testing. I can try to put something together for people to test soonish. Matt > > > > > > Matt > > > > > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > > + } > > > > > + > > > > > /* > > > > > * Drivers shouldn't be allocating pages after calling > > > > > * memunmap_pages(). > > > > > -- > > > > > 2.43.0 > > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
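A worked example of the failure modes Matt lists above, using the bit positions he quotes (PG_locked == bit 0, PG_uptodate == bit 3). The note on why "flags.f = 0" misbehaves is an inference from how page flags are laid out, not something confirmed in this thread:

	/*
	 * A prior order-9 (2MB) folio leaves 9 == 0b0000_1001 in the low
	 * byte of what was its _flags_1, i.e. in page[1].flags:
	 *
	 *     bit 0 -> PG_locked, bit 3 -> PG_uptodate
	 *
	 * Reused as an order-0 device page without clearing that byte, the
	 * page looks permanently locked -- the "stuck on the next page lock"
	 * described above.  Going the other way and writing flags.f = 0 also
	 * clears the high bits of page->flags, which encode the page's
	 * node/zone/section; losing the ZONE_DEVICE encoding there would,
	 * among other things, trip the folio_is_zone_device() warning in the
	 * patch, and is a plausible (unconfirmed) cause of the SVM test hang.
	 */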
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 7:43 ` Matthew Brost @ 2026-01-15 11:05 ` Alistair Popple 2026-01-16 6:35 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Alistair Popple @ 2026-01-15 11:05 UTC (permalink / raw) To: Matthew Brost Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 2026-01-15 at 18:43 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > On Thu, Jan 15, 2026 at 06:07:08PM +1100, Alistair Popple wrote: > > On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > > > > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > > > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > > > zone_device_page_init prior to creating a higher-order zone device > > > > > > private folio. This step is necessary when the folio’s order changes > > > > > > dynamically between zone_device_page_init calls to avoid building a > > > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > > > must be passed in from the caller because the pgmap stored in the folio > > > > > > page may have been overwritten with a compound head. > > > > > > > > > > Thanks for fixing, a couple of minor comments below. > > > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > > > Cc: David Airlie <airlied@gmail.com> > > > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > > > Cc: David Hildenbrand <david@kernel.org> > > > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > > > Cc: kvm@vger.kernel.org > > > > > > Cc: linux-kernel@vger.kernel.org > > > > > > Cc: amd-gfx@lists.freedesktop.org > > > > > > Cc: dri-devel@lists.freedesktop.org > > > > > > Cc: nouveau@lists.freedesktop.org > > > > > > Cc: linux-mm@kvack.org > > > > > > Cc: linux-cxl@vger.kernel.org > > > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > > > --- > > > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > > > include/linux/memremap.h | 9 ++++++--- > > > > > > lib/test_hmm.c | 4 +++- > > > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > > > dpage->zone_device_data = pvt; > > > > > > - zone_device_page_init(dpage, 0); > > > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > > > return dpage; > > > > > > out_clear: > > > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > > > page = pfn_to_page(pfn); > > > > > > svm_range_bo_ref(prange->svm_bo); > > > > > > page->zone_device_data = prange->svm_bo; > > > > > > - zone_device_page_init(page, 0); > > > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > > > } > > > > > > > > > > > > static void > > > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > > > index 03ee39a761a4..c497726b0147 100644 > > > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > > > struct drm_pagemap_zdd *zdd) > > > > > > { > > > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > > > - zone_device_page_init(page, 0); > > > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > > > } > > > > > > > > > > > > /** > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > index 58071652679d..3d8031296eed 100644 > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > @@ 
-425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > > > } > > > > > > > > > > > > - zone_device_folio_init(folio, order); > > > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > > > > return page; > > > > > > } > > > > > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > > > --- a/include/linux/memremap.h > > > > > > +++ b/include/linux/memremap.h > > > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > > > } > > > > > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > + unsigned int order); > > > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > > > + struct dev_pagemap *pgmap, > > > > > > + unsigned int order) > > > > > > { > > > > > > - zone_device_page_init(&folio->page, order); > > > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > > > if (order) > > > > > > folio_set_large_rmappable(folio); > > > > > > } > > > > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > > > index 8af169d3873a..455a6862ae50 100644 > > > > > > --- a/lib/test_hmm.c > > > > > > +++ b/lib/test_hmm.c > > > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > > > goto error; > > > > > > } > > > > > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > > > + zone_device_folio_init(page_folio(dpage), > > > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > > > + order); > > > > > > dpage->zone_device_data = rpage; > > > > > > return dpage; > > > > > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > > > --- a/mm/memremap.c > > > > > > +++ b/mm/memremap.c > > > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > > > } > > > > > > } > > > > > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > + unsigned int order) > > > > > > { > > > > > > + struct page *new_page = page; > > > > > > + unsigned int i; > > > > > > + > > > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > > > + > > > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > > > > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > > > > the flags entirely? 
Or at least explicitly just clear the flags you care about > > > > > which would remove the need for the comment and should let you use the proper > > > > > PageFlag functions. > > > > > > > > > > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > > > > anything related to struct page is high, so I did the same thing > > > > folio_reset_order does here. > > > > So why not just use folio_reset_order() below? > > > > > > > > > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > > > > > > > > > This immediately hangs my first SVM test... > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > index 6f46ab14662b..ef8c56876cf5 100644 > > > --- a/mm/memremap.c > > > +++ b/mm/memremap.c > > > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > struct folio *new_folio = (struct folio *)new_page; > > > > > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > + new_page->flags.f = 0; > > > #ifdef NR_PAGES_IN_LARGE_FOLIO > > > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > > isn't this going to access a completely different, possibly already allocated, > > page? > > > > No — it accesses itself (new_page). It just uses some odd memory tricks > for this, which I agree isn’t the best thing I’ve ever written, but it > was the least-worst idea I had. I didn’t design the folio/page field > aliasing; I understand why it exists, but it still makes my head hurt. And obviously mine, because I (was) still not getting it and had typed up a whole response and code walk through to show what was wrong, in the hope it would help settle the misunderstanding. Which it did, because I discovered where I was getting things wrong. But I've left the analysis below because it's probably useful for others following along: Walking through the code we have: void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, unsigned int order) { The first argument, page, is the first in a set of 1 << order contiguous struct page. In the simplest case order == 0, meaning this function should only initialise (ie. touch) a single struct page pointer which is passed as the first argument to the function. struct page *new_page = page; So now *new_page points to the single struct page we should touch. unsigned int i; VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); for (i = 0; i < (1UL << order); ++i, ++new_page) { order == 0, so this loop will only execute once. struct folio *new_folio = (struct folio *)new_page; new_page still points to the single page we're initialising, and new_folio points to the same page. Ie: &new_folio->page == new_page. There is a hazard here because new_folio->__page_1, __page_2, etc. all point to pages we shouldn't touch. new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ Clears the flags, makes sense. #ifdef NR_PAGES_IN_LARGE_FOLIO ((struct folio *)(new_page - 1))->_nr_pages = 0; If we break this down we have: struct page *tmp_new_page = new_page - 1; Which is the page before the one we're initialising and shouldn't be touched. 
Then we cast to a folio: struct folio *tmp_new_folio = (struct folio *) tmp_new_page; And reset _nr_pages: tmp_new_folio->_nr_pages = 0 And now I can see where I was confused - &tmp_new_folio->_nr_pages == &tmp_new_folio->__page_1->memcg_data == &new_page->memcg_data So after both Balbir, probably yourself, and definitely myself scratching our heads for way too long over this change I think we can conclude that the code as is is way too confusing to merge without a lot more comments :-) However why go through all this magic in the first place? Why not just treat everything here as a page and just do new_page->memcg_data = 0 directly? That seems like the more straight forward approach. In fact given all the confusion I wonder if it wouldn't be better to just do memset(new_page, 0, sizeof(*new_page)) and reinitialise everything from scratch. > folio->_nr_pages is page + 1 for reference (new_page after this math). > Again, if I touched this memory directly in new_page, it’s most likely > memcg_data, but that is hidden behind a Kconfig. > > This just blindly implementing part of folio_reset_order which clears > _nr_pages. Yeah, I get it now. But I think just clearing memcg_data would be the easiest to understand approach, especially if it had a comment explaining that it may have previously been used for _nr_pages. > > > #endif > > > > > > I can walk through exactly which flags need to be cleared, but my > > > feeling is that likely any flag that the order field overloads and can > > > possibly encode should be cleared—so bits 0–7 based on the existing > > > code. > > > > > > How about in a follow-up we normalize setting / clearing the order flag > > > field with a #define and an inline helper? > > > > Ie: Would something like the following work: > > > > ClearPageHead(new_page); > > Any of these bit could possibly be set the order field in a folio, which > modifies page + 1 flags field. > > PG_locked, /* Page is locked. Don't touch. */ > PG_writeback, /* Page is under writeback */ > PG_referenced, > PG_uptodate, > PG_dirty, > PG_lru, > PG_head, /* Must be in bit 6 */ > PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */ > > So a common order-9 (2MB) folio would have PG_locked | PG_uptodate set. > Now we get stuck on the next page lock because PG_locked is set. > Offhand, I don’t know if different orders—which set different bits—cause > any nasty issues either. So I figured the safest thing was clear any > bits which folio order can set within subsequent page's memory flags > like folio_reset_order does. Oh, I get the above. I was thinking folio_reset_order() below would clear the flags, but I see the folly there - that resets the flags for the next page. > > > clear_compound_head(new_page); > > folio_reset_order(new_folio); > > > > Which would also deal with setting _nr_pages. > > > > folio_reset_order(new_folio) would set _nr_pages in the memory that is > new_page + 1. So let's say that page has a ref count + memcg_data, now > that memory is corrupted and will crash the kernel. Yep, I just noticed that. Thanks for pointing that out. > All of the above is why is took me multiple hours to write 6 lines of > code :). And to review :) Good thing we don't get paid per SLOC of code right? 
- Alistair > > > Matt > > > > > > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > > +#endif > > > > > > + new_folio->mapping = NULL; > > > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > > > > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > > > > reason that change was dropped from the series? > > > > > > > > > > > > > I don't have a test platform for FS DAX. In prior revisions, I was just > > > > moving existing FS DAX code to a helper, which I felt confident about. > > > > > > > > This revision is slightly different, and I don't feel comfortable > > > > modifying FS DAX code without a test platform. I agree we should update > > > > FS DAX, but that should be done in a follow-up with coordinated testing. > > > > Fair enough, I figured something like this might be your answer :-) You > > could update it and ask people with access to such a system to test it though > > (unfortunately my setup has bit-rotted beyond repair). > > > > But I'm ok leaving to for a future change. > > > > I did a quick grep in fs/dax.c and don’t see zone_device_page_init > called there. It probably could be used if it’s creating compound pages > and drop the open-coded reinit when shared == 0, but yeah, that’s not > something I can blindly code without testing. > > I can try to put something together for people to test soonish. > > Matt > > > > > > > > > Matt > > > > > > > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > > > + } > > > > > > + > > > > > > /* > > > > > > * Drivers shouldn't be allocating pages after calling > > > > > > * memunmap_pages(). > > > > > > -- > > > > > > 2.43.0 > > > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
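Putting Alistair's walkthrough and Matt's answers together, a commented version of the v5 reinit loop might read as follows. This is only an annotated restatement of the code already quoted above -- the comments reflect the explanations given in this thread, not new behaviour:

	for (i = 0; i < (1UL << order); ++i, ++new_page) {
		struct folio *new_folio = (struct folio *)new_page;

		/*
		 * If this range previously backed a larger folio, this
		 * page's flags may still carry that folio's order (the low
		 * byte of what was then _flags_1) and/or PG_head.  Clear
		 * only that byte, as folio_reset_order() does for the page
		 * following a large folio's head.
		 */
		new_page->flags.f &= ~0xffUL;
#ifdef NR_PAGES_IN_LARGE_FOLIO
		/*
		 * _nr_pages lives in the second page of a folio, so casting
		 * (new_page - 1) to a folio makes this store land inside
		 * new_page itself (at the address memcg_data would occupy),
		 * not in the neighbouring struct page.
		 */
		((struct folio *)(new_page - 1))->_nr_pages = 0;
#endif
		new_folio->mapping = NULL;
		/*
		 * pgmap shares storage with the compound_head pointer, so
		 * this also wipes any stale tail-page marker left behind by
		 * the old folio -- the reason pgmap must now be passed in
		 * by the caller.
		 */
		new_folio->pgmap = pgmap;
		new_folio->share = 0;	/* fsdax only, unused for device private */
		VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio);
		VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio);
	}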
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 11:05 ` Alistair Popple @ 2026-01-16 6:35 ` Matthew Brost 2026-01-16 16:39 ` Rodrigo Vivi 0 siblings, 1 reply; 33+ messages in thread From: Matthew Brost @ 2026-01-16 6:35 UTC (permalink / raw) To: Alistair Popple Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 10:05:00PM +1100, Alistair Popple wrote: > On 2026-01-15 at 18:43 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > On Thu, Jan 15, 2026 at 06:07:08PM +1100, Alistair Popple wrote: > > > On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > > > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > > > > > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > > > > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > > > > zone_device_page_init prior to creating a higher-order zone device > > > > > > > private folio. This step is necessary when the folio’s order changes > > > > > > > dynamically between zone_device_page_init calls to avoid building a > > > > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > > > > must be passed in from the caller because the pgmap stored in the folio > > > > > > > page may have been overwritten with a compound head. > > > > > > > > > > > > Thanks for fixing, a couple of minor comments below. > > > > > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > > > > Cc: David Airlie <airlied@gmail.com> > > > > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > > > > Cc: David Hildenbrand <david@kernel.org> > > > > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > > > > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > > > > Cc: kvm@vger.kernel.org > > > > > > > Cc: linux-kernel@vger.kernel.org > > > > > > > Cc: amd-gfx@lists.freedesktop.org > > > > > > > Cc: dri-devel@lists.freedesktop.org > > > > > > > Cc: nouveau@lists.freedesktop.org > > > > > > > Cc: linux-mm@kvack.org > > > > > > > Cc: linux-cxl@vger.kernel.org > > > > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > > > > --- > > > > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > > > > include/linux/memremap.h | 9 ++++++--- > > > > > > > lib/test_hmm.c | 4 +++- > > > > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > > > > dpage->zone_device_data = pvt; > > > > > > > - zone_device_page_init(dpage, 0); > > > > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > > > > return dpage; > > > > > > > out_clear: > > > > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > > > > page = pfn_to_page(pfn); > > > > > > > svm_range_bo_ref(prange->svm_bo); > > > > > > > page->zone_device_data = prange->svm_bo; > > > > > > > - zone_device_page_init(page, 0); > > > > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > > > > } > > > > > > > > > > > > > > static void > > > > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > > > > index 03ee39a761a4..c497726b0147 100644 > > > > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > > > > struct drm_pagemap_zdd *zdd) > > > > > > > { > > > > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > > > > - zone_device_page_init(page, 0); > > > > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > > > > } > > > > > > > > > > > > > > /** > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > index 58071652679d..3d8031296eed 
100644 > > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > > > > } > > > > > > > > > > > > > > - zone_device_folio_init(folio, order); > > > > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > > > > > return page; > > > > > > > } > > > > > > > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > > > > --- a/include/linux/memremap.h > > > > > > > +++ b/include/linux/memremap.h > > > > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > > > > } > > > > > > > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > > + unsigned int order); > > > > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > > > > + struct dev_pagemap *pgmap, > > > > > > > + unsigned int order) > > > > > > > { > > > > > > > - zone_device_page_init(&folio->page, order); > > > > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > > > > if (order) > > > > > > > folio_set_large_rmappable(folio); > > > > > > > } > > > > > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > > > > index 8af169d3873a..455a6862ae50 100644 > > > > > > > --- a/lib/test_hmm.c > > > > > > > +++ b/lib/test_hmm.c > > > > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > > > > goto error; > > > > > > > } > > > > > > > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > > > > + zone_device_folio_init(page_folio(dpage), > > > > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > > > > + order); > > > > > > > dpage->zone_device_data = rpage; > > > > > > > return dpage; > > > > > > > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > > > > --- a/mm/memremap.c > > > > > > > +++ b/mm/memremap.c > > > > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > > + unsigned int order) > > > > > > > { > > > > > > > + struct page *new_page = page; > > > > > > > + unsigned int i; > > > > > > > + > > > > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > > > > + > > > > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible 
order, page head */ > > > > > > > > > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > > > > > the flags entirely? Or at least explicitly just clear the flags you care about > > > > > > which would remove the need for the comment and should let you use the proper > > > > > > PageFlag functions. > > > > > > > > > > > > > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > > > > > anything related to struct page is high, so I did the same thing > > > > > folio_reset_order does here. > > > > > > So why not just use folio_reset_order() below? > > > > > > > > > > > > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > > > > > > > > > > > > This immediately hangs my first SVM test... > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > index 6f46ab14662b..ef8c56876cf5 100644 > > > > --- a/mm/memremap.c > > > > +++ b/mm/memremap.c > > > > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > struct folio *new_folio = (struct folio *)new_page; > > > > > > > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > + new_page->flags.f = 0; > > > > #ifdef NR_PAGES_IN_LARGE_FOLIO > > > > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > > > isn't this going to access a completely different, possibly already allocated, > > > page? > > > > > > > No — it accesses itself (new_page). It just uses some odd memory tricks > > for this, which I agree isn’t the best thing I’ve ever written, but it > > was the least-worst idea I had. I didn’t design the folio/page field > > aliasing; I understand why it exists, but it still makes my head hurt. > > And obviously mine, because I (was) still not getting it and had typed up a > whole response and code walk through to show what was wrong, in the hope it > would help settle the misunderstanding. Which it did, because I discovered > where I was getting things wrong. But I've left the analysis below because it's > probably useful for others following along: > > Walking through the code we have: > > void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > unsigned int order) > { > > The first argument, page, is the first in a set of 1 << order contiguous > struct page. In the simplest case order == 0, meaning this function should only > initialise (ie. touch) a single struct page pointer which is passed as the first > argument to the function. Yes. > > struct page *new_page = page; > > So now *new_page points to the single struct page we should touch. > > unsigned int i; > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > order == 0, so this loop will only execute once. > Yes. > struct folio *new_folio = (struct folio *)new_page; > > new_page still points to the single page we're initialising, and new_folio > points to the same page. Ie: &new_folio->page == new_page. There is a hazard > here because new_folio->__page_1, __page_2, etc. all point to pages we shouldn't > touch. > Yes. > new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > Clears the flags, makes sense. 
> +1 > #ifdef NR_PAGES_IN_LARGE_FOLIO > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > If we break this down we have: > > struct page *tmp_new_page = new_page - 1; > > Which is the page before the one we're initialising and shouldn't be touched. > Then we cast to a folio: > > struct folio *tmp_new_folio = (struct folio *) tmp_new_page; > > And reset _nr_pages: > > tmp_new_folio->_nr_pages = 0 > > And now I can see where I was confused - &tmp_new_folio->_nr_pages == &tmp_new_folio->__page_1->memcg_data == &new_page->memcg_data > Not 100% right, as _nr_pages is 4 bytes and memcg_data is 8, but the pointer base address is the same. > So after both Balbir, probably yourself, and definitely myself scratching our > heads for way too long over this change I think we can conclude that the code as > is is way too confusing to merge without a lot more comments :-) > I think more comments is the way to go. More below. > However why go through all this magic in the first place? Why not just treat > everything here as a page and just do > > new_page->memcg_data = 0 > Well, memcg_data is 8 bytes and _nr_pages is 4. They also have different #ifdef conditions around each field, etc. I’ve also seen failures in our testing, and so has François, with the memcg_data change. I wish I had a stack trace to share or explain, but the times I hit the error I didn’t capture the dmesg, and I’ve been having issues with my dev machine today. If I catch the error again, I’ll reply with a stack trace and analysis. > directly? That seems like the more straight forward approach. In fact given > all the confusion I wonder if it wouldn't be better to just do > memset(new_page, 0, sizeof(*new_page)) and reinitialise everything from > scratch. I had considered this option too, but I’d be a little concerned about the performance. Reinitializing a zone page/folio is a hot path, as this is typically done in a GPU fault handler. I think adding verbose comments explaining why this works, plus some follow-up helpers, might be the better option. > > > folio->_nr_pages is page + 1 for reference (new_page after this math). > > Again, if I touched this memory directly in new_page, it’s most likely > > memcg_data, but that is hidden behind a Kconfig. > > > > This just blindly implementing part of folio_reset_order which clears > > _nr_pages. > > Yeah, I get it now. But I think just clearing memcg_data would be the easiest to > understand approach, especially if it had a comment explaining that it may have > previously been used for _nr_pages. > See above — the different sizes, the failure I’m seeing, and the conflicting #ifdefs are why this is not my preferred option. > > > > #endif > > > > > > > > I can walk through exactly which flags need to be cleared, but my > > > > feeling is that likely any flag that the order field overloads and can > > > > possibly encode should be cleared—so bits 0–7 based on the existing > > > > code. > > > > > > > > How about in a follow-up we normalize setting / clearing the order flag > > > > field with a #define and an inline helper? > > > > > > Ie: Would something like the following work: > > > > > > ClearPageHead(new_page); > > > > Any of these bit could possibly be set the order field in a folio, which > > modifies page + 1 flags field. > > > > PG_locked, /* Page is locked. Don't touch. */ > > PG_writeback, /* Page is under writeback */ > > PG_referenced, > > PG_uptodate, > > PG_dirty, > > PG_lru, > > PG_head, /* Must be in bit 6 */ > > PG_waiters, /* Page has waiters, check its waitqueue. 
Must be bit #7 and in the same byte as "PG_locked" */ > > > > So a common order-9 (2MB) folio would have PG_locked | PG_uptodate set. > > Now we get stuck on the next page lock because PG_locked is set. > > Offhand, I don’t know if different orders—which set different bits—cause > > any nasty issues either. So I figured the safest thing was clear any > > bits which folio order can set within subsequent page's memory flags > > like folio_reset_order does. > > Oh, I get the above. I was thinking folio_reset_order() below would clear the > flags, but I see the folly there - that resets the flags for the next page. > Correct. > > > > > clear_compound_head(new_page); > > > folio_reset_order(new_folio); > > > > > > Which would also deal with setting _nr_pages. > > > > > > > folio_reset_order(new_folio) would set _nr_pages in the memory that is > > new_page + 1. So let's say that page has a ref count + memcg_data, now > > that memory is corrupted and will crash the kernel. > > Yep, I just noticed that. Thanks for pointing that out. > > > All of the above is why is took me multiple hours to write 6 lines of > > code :). > > And to review :) Good thing we don't get paid per SLOC of code right? > I don’t think anyone would touch core MM if pay were based on SLOC — it would be a terrible career choice. :) All joking aside, I think the next revision should use this version, plus more comments and helpers/defines in a follow-up—which I’ll commit to—along with fixing the branch mismatch Andrew pointed out between drm-tip (which this series is based on) and 6.19 (where this patch needs to apply). Matt > - Alistair > > > > > Matt > > > > > > > > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > > > +#endif > > > > > > > + new_folio->mapping = NULL; > > > > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > > > > > > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > > > > > reason that change was dropped from the series? > > > > > > > > > > > > > > > > I don't have a test platform for FS DAX. In prior revisions, I was just > > > > > moving existing FS DAX code to a helper, which I felt confident about. > > > > > > > > > > This revision is slightly different, and I don't feel comfortable > > > > > modifying FS DAX code without a test platform. I agree we should update > > > > > FS DAX, but that should be done in a follow-up with coordinated testing. > > > > > > Fair enough, I figured something like this might be your answer :-) You > > > could update it and ask people with access to such a system to test it though > > > (unfortunately my setup has bit-rotted beyond repair). > > > > > > But I'm ok leaving to for a future change. > > > > > > > I did a quick grep in fs/dax.c and don’t see zone_device_page_init > > called there. It probably could be used if it’s creating compound pages > > and drop the open-coded reinit when shared == 0, but yeah, that’s not > > something I can blindly code without testing. > > > > I can try to put something together for people to test soonish. 
> > > > Matt > > > > > > > > > > > > Matt > > > > > > > > > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > > > > + } > > > > > > > + > > > > > > > /* > > > > > > > * Drivers shouldn't be allocating pages after calling > > > > > > > * memunmap_pages(). > > > > > > > -- > > > > > > > 2.43.0 > > > > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
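A minimal sketch of the field aliasing discussed in the exchange above, assuming the upstream struct folio layout in which the NR_PAGES_IN_LARGE_FOLIO fields (including _nr_pages) overlay the struct page that follows the folio's head page. The function name is invented for illustration and is not part of the patch:

#include <linux/mm_types.h>

#ifdef NR_PAGES_IN_LARGE_FOLIO
/* Illustration only: why the store below stays inside new_page. */
static inline void sketch_nr_pages_aliasing(struct page *new_page)
{
	/* Treat the previous struct page as if it were a folio head page. */
	struct folio *prev_as_folio = (struct folio *)(new_page - 1);

	/*
	 * _nr_pages lives in the part of struct folio that overlays the
	 * folio's second page, i.e. (new_page - 1) + 1 == new_page, so this
	 * write lands in new_page's own struct page memory (roughly where
	 * memcg_data sits, depending on Kconfig) and never dereferences the
	 * neighbouring page's state.
	 */
	prev_as_folio->_nr_pages = 0;
}
#endif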
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-16 6:35 ` Matthew Brost @ 2026-01-16 16:39 ` Rodrigo Vivi 0 siblings, 0 replies; 33+ messages in thread From: Rodrigo Vivi @ 2026-01-16 16:39 UTC (permalink / raw) To: Matthew Brost, Madhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), linuxppc-dev, kvm, linux-kernel, David Hildenbrand, Oscar Salvador, linux-mm, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Maxime Ripard, Thomas Zimmermann, Maarten Lankhorst, Dave Airlie, Simona Vetter Cc: Alistair Popple, Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Thu, Jan 15, 2026 at 10:35:56PM -0800, Matthew Brost wrote: > On Thu, Jan 15, 2026 at 10:05:00PM +1100, Alistair Popple wrote: > > On 2026-01-15 at 18:43 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > > On Thu, Jan 15, 2026 at 06:07:08PM +1100, Alistair Popple wrote: > > > > On 2026-01-15 at 17:18 +1100, Matthew Brost <matthew.brost@intel.com> wrote... > > > > > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: > > > > > > On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: > > > > > > > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... > > > > > > > > From: Matthew Brost <matthew.brost@intel.com> > > > > > > > > > > > > > > > > Reinitialize metadata for large zone device private folios in > > > > > > > > zone_device_page_init prior to creating a higher-order zone device > > > > > > > > private folio. This step is necessary when the folio’s order changes > > > > > > > > dynamically between zone_device_page_init calls to avoid building a > > > > > > > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > > > > > > > > must be passed in from the caller because the pgmap stored in the folio > > > > > > > > page may have been overwritten with a compound head. > > > > > > > > > > > > > > Thanks for fixing, a couple of minor comments below. 
> > > > > > > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > > > > > Cc: Alistair Popple <apopple@nvidia.com> > > > > > > > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > > > > > > > > Cc: Nicholas Piggin <npiggin@gmail.com> > > > > > > > > Cc: Michael Ellerman <mpe@ellerman.id.au> > > > > > > > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > > > > > > > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > > > > > > Cc: "Christian König" <christian.koenig@amd.com> > > > > > > > > Cc: David Airlie <airlied@gmail.com> > > > > > > > > Cc: Simona Vetter <simona@ffwll.ch> > > > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > > > > > > > > Cc: Maxime Ripard <mripard@kernel.org> > > > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > > > > > > > Cc: Lyude Paul <lyude@redhat.com> > > > > > > > > Cc: Danilo Krummrich <dakr@kernel.org> > > > > > > > > Cc: David Hildenbrand <david@kernel.org> > > > > > > > > Cc: Oscar Salvador <osalvador@suse.de> > > > > > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > > > > > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > > > > > > > Cc: Leon Romanovsky <leon@kernel.org> > > > > > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > > > > > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > > > > > > Cc: Mike Rapoport <rppt@kernel.org> > > > > > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > > > > > Cc: Michal Hocko <mhocko@suse.com> > > > > > > > > Cc: Balbir Singh <balbirs@nvidia.com> > > > > > > > > Cc: linuxppc-dev@lists.ozlabs.org > > > > > > > > Cc: kvm@vger.kernel.org > > > > > > > > Cc: linux-kernel@vger.kernel.org > > > > > > > > Cc: amd-gfx@lists.freedesktop.org > > > > > > > > Cc: dri-devel@lists.freedesktop.org > > > > > > > > Cc: nouveau@lists.freedesktop.org > > > > > > > > Cc: linux-mm@kvack.org > > > > > > > > Cc: linux-cxl@vger.kernel.org > > > > > > > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > > > > > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > > > > > > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > > > > > > > --- > > > > > > > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > > > > > > > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > > > > > > > > drivers/gpu/drm/drm_pagemap.c | 2 +- > > > > > > > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > > > > > > > > include/linux/memremap.h | 9 ++++++--- > > > > > > > > lib/test_hmm.c | 4 +++- > > > > > > > > mm/memremap.c | 20 +++++++++++++++++++- > > > > > > > > 7 files changed, 32 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > > index e5000bef90f2..7cf9310de0ec 100644 > > > > > > > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > > > > > > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > > > > > > > > > > > > > > > dpage = pfn_to_page(uvmem_pfn); > > > > > > > > dpage->zone_device_data = pvt; > > > > > > > > - zone_device_page_init(dpage, 0); > > > > > > > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > > > > > > > > return dpage; > > > > > > > > out_clear: > > > > > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > > index af53e796ea1b..6ada7b4af7c6 100644 > > > > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > > > > > > > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > > > > > > > > page = pfn_to_page(pfn); > > > > > > > > svm_range_bo_ref(prange->svm_bo); > > > > > > > > page->zone_device_data = prange->svm_bo; > > > > > > > > - zone_device_page_init(page, 0); > > > > > > > > + zone_device_page_init(page, page_pgmap(page), 0); > > > > > > > > } > > > > > > > > > > > > > > > > static void > > > > > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > > > > > > > index 03ee39a761a4..c497726b0147 100644 > > > > > > > > --- a/drivers/gpu/drm/drm_pagemap.c > > > > > > > > +++ b/drivers/gpu/drm/drm_pagemap.c > > > > > > > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > > > > > > > > struct drm_pagemap_zdd *zdd) > > > > > > > > { > > > > > > > > page->zone_device_data = drm_pagemap_zdd_get(zdd); > > > > > > > > - zone_device_page_init(page, 0); > > > > > > > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > > > > > > > > } > > > > > > > > > > > > > > > > /** > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > > index 58071652679d..3d8031296eed 100644 > > > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > > > > > > > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > > > > > > > > order = ilog2(DMEM_CHUNK_NPAGES); > > > > > > > > } > > > > > > > > > > > > > > > > - zone_device_folio_init(folio, order); > > > > > > > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > > > > > > > > return page; > > > > > > > > } > > > > > > > > > > > > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > > > > > > > > index 713ec0435b48..e3c2ccf872a8 100644 > > > > > > > > --- a/include/linux/memremap.h > > > > > > > > +++ b/include/linux/memremap.h > > > > > > > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > > > > > > > > } > > > > > > > > > > > > > > > > #ifdef CONFIG_ZONE_DEVICE > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order); > > > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > > > + unsigned int order); > > > > > > > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > > > > > > > > void memunmap_pages(struct dev_pagemap *pgmap); > > > > > > > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > > > > > > > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > > > > > > > > > > > > > > > unsigned long memremap_compat_align(void); > > > > > > > > > > > > > > > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > > > > > > > > +static inline void zone_device_folio_init(struct folio *folio, > > > > > > > > + struct dev_pagemap *pgmap, > > > > > > > > + unsigned int order) > > > > > > > > { > > > > > > > > - zone_device_page_init(&folio->page, order); > > > > > > > > + zone_device_page_init(&folio->page, pgmap, order); > > > > > > > > if (order) > > > > > > > > folio_set_large_rmappable(folio); > > > > > > > > } > > > > 
> > > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > > > > > > > > index 8af169d3873a..455a6862ae50 100644 > > > > > > > > --- a/lib/test_hmm.c > > > > > > > > +++ b/lib/test_hmm.c > > > > > > > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > > > > > > > > goto error; > > > > > > > > } > > > > > > > > > > > > > > > > - zone_device_folio_init(page_folio(dpage), order); > > > > > > > > + zone_device_folio_init(page_folio(dpage), > > > > > > > > + page_pgmap(folio_page(page_folio(dpage), 0)), > > > > > > > > + order); > > > > > > > > dpage->zone_device_data = rpage; > > > > > > > > return dpage; > > > > > > > > > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > > > > > index 63c6ab4fdf08..6f46ab14662b 100644 > > > > > > > > --- a/mm/memremap.c > > > > > > > > +++ b/mm/memremap.c > > > > > > > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > -void zone_device_page_init(struct page *page, unsigned int order) > > > > > > > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > > > > + unsigned int order) > > > > > > > > { > > > > > > > > + struct page *new_page = page; > > > > > > > > + unsigned int i; > > > > > > > > + > > > > > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > > > > > > > > > > > > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > > > > > + struct folio *new_folio = (struct folio *)new_page; > > > > > > > > + > > > > > > > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > > > > > > > > > > > This seems odd to me, mainly due to the "magic" number. Why not just clear > > > > > > > the flags entirely? Or at least explicitly just clear the flags you care about > > > > > > > which would remove the need for the comment and should let you use the proper > > > > > > > PageFlag functions. > > > > > > > > > > > > > > > > > > > I'm copying this from folio_reset_order [1]. My paranoia about touching > > > > > > anything related to struct page is high, so I did the same thing > > > > > > folio_reset_order does here. > > > > > > > > So why not just use folio_reset_order() below? > > > > > > > > > > > > > > > > [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 > > > > > > > > > > > > > > > > This immediately hangs my first SVM test... > > > > > > > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > > > > index 6f46ab14662b..ef8c56876cf5 100644 > > > > > --- a/mm/memremap.c > > > > > +++ b/mm/memremap.c > > > > > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > > > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > > struct folio *new_folio = (struct folio *)new_page; > > > > > > > > > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > > + new_page->flags.f = 0; > > > > > #ifdef NR_PAGES_IN_LARGE_FOLIO > > > > > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > > > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > > > > isn't this going to access a completely different, possibly already allocated, > > > > page? > > > > > > > > > > No — it accesses itself (new_page). It just uses some odd memory tricks > > > for this, which I agree isn’t the best thing I’ve ever written, but it > > > was the least-worst idea I had. 
I didn’t design the folio/page field > > > aliasing; I understand why it exists, but it still makes my head hurt. > > > > And obviously mine, because I (was) still not getting it and had typed up a > > whole response and code walk through to show what was wrong, in the hope it > > would help settle the misunderstanding. Which it did, because I discovered > > where I was getting things wrong. But I've left the analysis below because it's > > probably useful for others following along: > > > > Walking through the code we have: > > > > void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > > unsigned int order) > > { > > > > The first argument, page, is the first in a set of 1 << order contiguous > > struct page. In the simplest case order == 0, meaning this function should only > > initialise (ie. touch) a single struct page pointer which is passed as the first > > argument to the function. > > Yes. > > > > > struct page *new_page = page; > > > > So now *new_page points to the single struct page we should touch. > > > > unsigned int i; > > > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > > > for (i = 0; i < (1UL << order); ++i, ++new_page) { > > > > order == 0, so this loop will only execute once. > > > > Yes. > > > struct folio *new_folio = (struct folio *)new_page; > > > > new_page still points to the single page we're initialising, and new_folio > > points to the same page. Ie: &new_folio->page == new_page. There is a hazard > > here because new_folio->__page_1, __page_2, etc. all point to pages we shouldn't > > touch. > > > > Yes. > > > new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > > > > Clears the flags, makes sense. > > > > +1 > > > #ifdef NR_PAGES_IN_LARGE_FOLIO > > ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > If we break this down we have: > > > > struct page *tmp_new_page = new_page - 1; > > > > Which is the page before the one we're initialising and shouldn't be touched. > > Then we cast to a folio: > > > > struct folio *tmp_new_folio = (struct folio *) tmp_new_page; > > > > And reset _nr_pages: > > > > tmp_new_folio->_nr_pages = 0 > > > > And now I can see where I was confused - &tmp_new_folio->_nr_pages == &tmp_new_folio->__page_1->memcg_data == &new_page->memcg_data > > > > Not 100% right, as _nr_pages is 4 bytes and memcg_data is 8, but the > pointer base address is the same. > > > So after both Balbir, probably yourself, and definitely myself scratching our > > heads for way too long over this change I think we can conclude that the code as > > is is way too confusing to merge without a lot more comments :-) > > > > I think more comments is the way to go. More below. > > > However why go through all this magic in the first place? Why not just treat > > everything here as a page and just do > > > > new_page->memcg_data = 0 > > > > Well, memcg_data is 8 bytes and _nr_pages is 4. They also have different > #ifdef conditions around each field, etc. > > I’ve also seen failures in our testing, and so has François, with the > memcg_data change. I wish I had a stack trace to share or explain, but > the times I hit the error I didn’t capture the dmesg, and I’ve been > having issues with my dev machine today. If I catch the error again, > I’ll reply with a stack trace and analysis. > > > directly? That seems like the more straight forward approach. In fact given > > all the confusion I wonder if it wouldn't be better to just do > > memset(new_page, 0, sizeof(*new_page)) and reinitialise everything from > > scratch. 
> > I had considered this option too, but I’d be a little concerned about > the performance. Reinitializing a zone page/folio is a hot path, as this > is typically done in a GPU fault handler. I think adding verbose > comments explaining why this works, plus some follow-up helpers, might > be the better option. > > > > > > folio->_nr_pages is page + 1 for reference (new_page after this math). > > > Again, if I touched this memory directly in new_page, it’s most likely > > > memcg_data, but that is hidden behind a Kconfig. > > > > > > This just blindly implementing part of folio_reset_order which clears > > > _nr_pages. > > > > Yeah, I get it now. But I think just clearing memcg_data would be the easiest to > > understand approach, especially if it had a comment explaining that it may have > > previously been used for _nr_pages. > > > > See above — the different sizes, the failure I’m seeing, and the > conflicting #ifdefs are why this is not my preferred option. > > > > > > #endif > > > > > > > > > > I can walk through exactly which flags need to be cleared, but my > > > > > feeling is that likely any flag that the order field overloads and can > > > > > possibly encode should be cleared—so bits 0–7 based on the existing > > > > > code. > > > > > > > > > > How about in a follow-up we normalize setting / clearing the order flag > > > > > field with a #define and an inline helper? > > > > > > > > Ie: Would something like the following work: > > > > > > > > ClearPageHead(new_page); > > > > > > Any of these bit could possibly be set the order field in a folio, which > > > modifies page + 1 flags field. > > > > > > PG_locked, /* Page is locked. Don't touch. */ > > > PG_writeback, /* Page is under writeback */ > > > PG_referenced, > > > PG_uptodate, > > > PG_dirty, > > > PG_lru, > > > PG_head, /* Must be in bit 6 */ > > > PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */ > > > > > > So a common order-9 (2MB) folio would have PG_locked | PG_uptodate set. > > > Now we get stuck on the next page lock because PG_locked is set. > > > Offhand, I don’t know if different orders—which set different bits—cause > > > any nasty issues either. So I figured the safest thing was clear any > > > bits which folio order can set within subsequent page's memory flags > > > like folio_reset_order does. > > > > Oh, I get the above. I was thinking folio_reset_order() below would clear the > > flags, but I see the folly there - that resets the flags for the next page. > > > > Correct. > > > > > > > > clear_compound_head(new_page); > > > > folio_reset_order(new_folio); > > > > > > > > Which would also deal with setting _nr_pages. > > > > > > > > > > folio_reset_order(new_folio) would set _nr_pages in the memory that is > > > new_page + 1. So let's say that page has a ref count + memcg_data, now > > > that memory is corrupted and will crash the kernel. > > > > Yep, I just noticed that. Thanks for pointing that out. > > > > > All of the above is why is took me multiple hours to write 6 lines of > > > code :). > > > > And to review :) Good thing we don't get paid per SLOC of code right? > > > > I don’t think anyone would touch core MM if pay were based on SLOC — it > would be a terrible career choice. 
:) > > All joking aside, I think the next revision should use this version, > plus more comments and helpers/defines in a follow-up—which I’ll commit > to—along with fixing the branch mismatch Andrew pointed out between > drm-tip (which this series is based on) and 6.19 (where this patch needs > to apply). I believe the best branch for this series would be drm-misc-next indeed. But this patch in particular needs multiple acks to get through drm trees. At least one from each block: ## arch/powerpc/kvm/book3s_hv_uvmem.c Madhavan Srinivasan <maddy@linux.ibm.com> (maintainer:KERNEL VIRTUAL MACHINE FOR POWERPC (KVM/powerpc)) Nicholas Piggin <npiggin@gmail.com> (reviewer:KERNEL VIRTUAL MACHINE FOR POWERPC (KVM/powerpc)) Michael Ellerman <mpe@ellerman.id.au> (maintainer:LINUX FOR POWERPC (32-BIT AND 64-BIT)) "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> (reviewer:LINUX FOR POWERPC (32-BIT AND 64-BIT)) ## include/linux/memremap.h David Hildenbrand <david@kernel.org> (maintainer:MEMORY HOT(UN)PLUG) Oscar Salvador <osalvador@suse.de> (maintainer:MEMORY HOT(UN)PLUG) ## lib/test_hmm.c Andrew Morton <akpm@linux-foundation.org> (maintainer:LIBRARY CODE) Jason Gunthorpe <jgg@ziepe.ca> (maintainer:HMM - Heterogeneous Memory Management) Leon Romanovsky <leon@kernel.org> (maintainer:HMM - Heterogeneous Memory Management) ## mm/memremap.c David Hildenbrand <david@kernel.org> (maintainer:MEMORY HOT(UN)PLUG) Oscar Salvador <osalvador@suse.de> (maintainer:MEMORY HOT(UN)PLUG) Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MANAGEMENT) On the other hand we would also need Max to do one extra last pull-request towards 7.0 after we get this merged. Because our window in drm closed earlier. Or this patch goes to any regular mm tree, and we hold the drm ones after we backmerge 7.0-rc1 > > Matt > > > - Alistair > > > > > > > Matt > > > > > > > > > > > > > +#ifdef NR_PAGES_IN_LARGE_FOLIO > > > > > > > > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > > > > > > > > +#endif > > > > > > > > + new_folio->mapping = NULL; > > > > > > > > + new_folio->pgmap = pgmap; /* Also clear compound head */ > > > > > > > > + new_folio->share = 0; /* fsdax only, unused for device private */ > > > > > > > > > > > > > > It would be nice if the FS DAX code actually used this as well. Is there a > > > > > > > reason that change was dropped from the series? > > > > > > > > > > > > > > > > > > > I don't have a test platform for FS DAX. In prior revisions, I was just > > > > > > moving existing FS DAX code to a helper, which I felt confident about. > > > > > > > > > > > > This revision is slightly different, and I don't feel comfortable > > > > > > modifying FS DAX code without a test platform. I agree we should update > > > > > > FS DAX, but that should be done in a follow-up with coordinated testing. > > > > > > > > Fair enough, I figured something like this might be your answer :-) You > > > > could update it and ask people with access to such a system to test it though > > > > (unfortunately my setup has bit-rotted beyond repair). > > > > > > > > But I'm ok leaving to for a future change. > > > > > > > > > > I did a quick grep in fs/dax.c and don’t see zone_device_page_init > > > called there. It probably could be used if it’s creating compound pages > > > and drop the open-coded reinit when shared == 0, but yeah, that’s not > > > something I can blindly code without testing. > > > > > > I can try to put something together for people to test soonish. 
> > > > > > Matt > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > > > > > > > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > > > > > > > > + } > > > > > > > > + > > > > > > > > /* > > > > > > > > * Drivers shouldn't be allocating pages after calling > > > > > > > > * memunmap_pages(). > > > > > > > > -- > > > > > > > > 2.43.0 > > > > > > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
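For reference, the follow-up helper Matt offers to write could look roughly like the sketch below. The macro and function names are invented here, and the 0xff mask simply mirrors what folio_reset_order() and the patch already open-code:

/* Hypothetical follow-up, not part of this series. */
#define FOLIO_ORDER_FLAGS_MASK	0xffUL	/* low bits a folio order field may leave in a tail page */

static inline void page_clear_stale_folio_order(struct page *page)
{
	/* Drop any leftover order/PG_head overlay, keep the zone/node/section bits. */
	page->flags.f &= ~FOLIO_ORDER_FLAGS_MASK;
}

zone_device_page_init() and folio_reset_order() could then share one definition instead of open-coding ~0xffUL in both places.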
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-15 6:18 ` Matthew Brost 2026-01-15 7:07 ` Alistair Popple @ 2026-01-16 16:13 ` Vlastimil Babka 1 sibling, 0 replies; 33+ messages in thread From: Vlastimil Babka @ 2026-01-16 16:13 UTC (permalink / raw) To: Matthew Brost, Alistair Popple Cc: Francois Dugast, intel-xe, dri-devel, Zi Yan, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On 1/15/26 07:18, Matthew Brost wrote: > On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: >> On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: >> > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote... >> > > From: Matthew Brost <matthew.brost@intel.com> >> > > >> > > Reinitialize metadata for large zone device private folios in >> > > zone_device_page_init prior to creating a higher-order zone device >> > > private folio. This step is necessary when the folio’s order changes >> > > dynamically between zone_device_page_init calls to avoid building a >> > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap >> > > must be passed in from the caller because the pgmap stored in the folio >> > > page may have been overwritten with a compound head. >> > >> > Thanks for fixing, a couple of minor comments below. >> > >> > > Cc: Zi Yan <ziy@nvidia.com> >> > > Cc: Alistair Popple <apopple@nvidia.com> >> > > Cc: adhavan Srinivasan <maddy@linux.ibm.com> >> > > Cc: Nicholas Piggin <npiggin@gmail.com> >> > > Cc: Michael Ellerman <mpe@ellerman.id.au> >> > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> >> > > Cc: Felix Kuehling <Felix.Kuehling@amd.com> >> > > Cc: Alex Deucher <alexander.deucher@amd.com> >> > > Cc: "Christian König" <christian.koenig@amd.com> >> > > Cc: David Airlie <airlied@gmail.com> >> > > Cc: Simona Vetter <simona@ffwll.ch> >> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> > > Cc: Maxime Ripard <mripard@kernel.org> >> > > Cc: Thomas Zimmermann <tzimmermann@suse.de> >> > > Cc: Lyude Paul <lyude@redhat.com> >> > > Cc: Danilo Krummrich <dakr@kernel.org> >> > > Cc: David Hildenbrand <david@kernel.org> >> > > Cc: Oscar Salvador <osalvador@suse.de> >> > > Cc: Andrew Morton <akpm@linux-foundation.org> >> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> >> > > Cc: Leon Romanovsky <leon@kernel.org> >> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >> > > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> >> > > Cc: Vlastimil Babka <vbabka@suse.cz> >> > > Cc: Mike Rapoport <rppt@kernel.org> >> > > Cc: Suren Baghdasaryan <surenb@google.com> >> > > Cc: Michal Hocko <mhocko@suse.com> >> > > Cc: Balbir Singh <balbirs@nvidia.com> >> > > Cc: linuxppc-dev@lists.ozlabs.org >> > > Cc: kvm@vger.kernel.org >> > > Cc: linux-kernel@vger.kernel.org >> > > Cc: amd-gfx@lists.freedesktop.org >> > > Cc: dri-devel@lists.freedesktop.org >> > > Cc: nouveau@lists.freedesktop.org >> > > Cc: linux-mm@kvack.org >> > > Cc: linux-cxl@vger.kernel.org >> > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") >> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> >> > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> >> > > --- >> > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- >> > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- >> > > drivers/gpu/drm/drm_pagemap.c | 2 +- >> > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- >> > > include/linux/memremap.h | 9 ++++++--- >> > > lib/test_hmm.c | 4 +++- >> > > mm/memremap.c | 20 +++++++++++++++++++- >> > > 7 files changed, 32 insertions(+), 9 deletions(-) >> > > >> > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c >> > > index e5000bef90f2..7cf9310de0ec 100644 >> > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >> > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >> > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) >> > > >> > > dpage = pfn_to_page(uvmem_pfn); >> > > dpage->zone_device_data = pvt; >> > > - zone_device_page_init(dpage, 0); >> > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); >> > > return dpage; >> > > out_clear: >> > > spin_lock(&kvmppc_uvmem_bitmap_lock); >> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> > > index af53e796ea1b..6ada7b4af7c6 100644 >> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) >> > > page = pfn_to_page(pfn); >> > > svm_range_bo_ref(prange->svm_bo); >> > > page->zone_device_data = prange->svm_bo; >> > > - zone_device_page_init(page, 0); >> > > + zone_device_page_init(page, page_pgmap(page), 0); >> > > } >> > > >> > > static void >> > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c >> > > index 03ee39a761a4..c497726b0147 100644 >> > > --- a/drivers/gpu/drm/drm_pagemap.c >> > > +++ b/drivers/gpu/drm/drm_pagemap.c >> > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, >> > > struct drm_pagemap_zdd *zdd) >> > > { >> > > page->zone_device_data = drm_pagemap_zdd_get(zdd); >> > > - zone_device_page_init(page, 0); >> > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); >> > > } >> > > >> > > /** >> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> > > index 58071652679d..3d8031296eed 100644 >> > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >> > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) >> > > order = ilog2(DMEM_CHUNK_NPAGES); >> > > } >> > > >> > > - zone_device_folio_init(folio, order); >> > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); >> > > return page; >> > > } >> > > >> > > diff --git 
a/include/linux/memremap.h b/include/linux/memremap.h >> > > index 713ec0435b48..e3c2ccf872a8 100644 >> > > --- a/include/linux/memremap.h >> > > +++ b/include/linux/memremap.h >> > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) >> > > } >> > > >> > > #ifdef CONFIG_ZONE_DEVICE >> > > -void zone_device_page_init(struct page *page, unsigned int order); >> > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >> > > + unsigned int order); >> > > void *memremap_pages(struct dev_pagemap *pgmap, int nid); >> > > void memunmap_pages(struct dev_pagemap *pgmap); >> > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); >> > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); >> > > >> > > unsigned long memremap_compat_align(void); >> > > >> > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) >> > > +static inline void zone_device_folio_init(struct folio *folio, >> > > + struct dev_pagemap *pgmap, >> > > + unsigned int order) >> > > { >> > > - zone_device_page_init(&folio->page, order); >> > > + zone_device_page_init(&folio->page, pgmap, order); >> > > if (order) >> > > folio_set_large_rmappable(folio); >> > > } >> > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c >> > > index 8af169d3873a..455a6862ae50 100644 >> > > --- a/lib/test_hmm.c >> > > +++ b/lib/test_hmm.c >> > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, >> > > goto error; >> > > } >> > > >> > > - zone_device_folio_init(page_folio(dpage), order); >> > > + zone_device_folio_init(page_folio(dpage), >> > > + page_pgmap(folio_page(page_folio(dpage), 0)), >> > > + order); >> > > dpage->zone_device_data = rpage; >> > > return dpage; >> > > >> > > diff --git a/mm/memremap.c b/mm/memremap.c >> > > index 63c6ab4fdf08..6f46ab14662b 100644 >> > > --- a/mm/memremap.c >> > > +++ b/mm/memremap.c >> > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) >> > > } >> > > } >> > > >> > > -void zone_device_page_init(struct page *page, unsigned int order) >> > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >> > > + unsigned int order) >> > > { >> > > + struct page *new_page = page; >> > > + unsigned int i; >> > > + >> > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); >> > > >> > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { >> > > + struct folio *new_folio = (struct folio *)new_page; >> > > + >> > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ >> > >> > This seems odd to me, mainly due to the "magic" number. Why not just clear >> > the flags entirely? Or at least explicitly just clear the flags you care about >> > which would remove the need for the comment and should let you use the proper >> > PageFlag functions. >> > >> >> I'm copying this from folio_reset_order [1]. My paranoia about touching >> anything related to struct page is high, so I did the same thing >> folio_reset_order does here. >> >> [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 >> > > This immediately hangs my first SVM test... 
> > diff --git a/mm/memremap.c > index 6f46ab14662b..ef8c56876cf5 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > for (i = 0; i < (1UL << order); ++i, ++new_page) { > struct folio *new_folio = (struct folio *)new_page; > > - new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > + new_page->flags.f = 0; > #ifdef NR_PAGES_IN_LARGE_FOLIO > ((struct folio *)(new_page - 1))->_nr_pages = 0; > #endif The flags field also includes the zone and node id, so clearing the whole word will break the page's node+zone association; that's probably why... ^ permalink raw reply [flat|nested] 33+ messages in thread
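Vlastimil's point can be made concrete with a small illustrative function (not from the series); the helpers below read the placement information back out of the same word:

#include <linux/mm.h>

/* Illustration only: why clearing the whole flags word is harmful. */
static inline void sketch_flags_clearing(struct page *page)
{
	/*
	 * The upper bits of page->flags.f encode the zone and node (and,
	 * depending on the config, the section) the page belongs to;
	 * page_zonenum() and page_to_nid() decode them from this word.
	 */
	int nid = page_to_nid(page);
	enum zone_type zone = page_zonenum(page);

	/* Clearing only the low flag byte keeps that placement intact. */
	page->flags.f &= ~0xffUL;
	WARN_ON(page_to_nid(page) != nid || page_zonenum(page) != zone);

	/* Whereas "page->flags.f = 0" would make both lookups return garbage. */
}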
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast ` (3 preceding siblings ...) 2026-01-15 5:27 ` Alistair Popple @ 2026-01-16 16:43 ` Rodrigo Vivi 2026-01-16 18:07 ` Kuehling, Felix 4 siblings, 1 reply; 33+ messages in thread From: Rodrigo Vivi @ 2026-01-16 16:43 UTC (permalink / raw) To: Francois Dugast, Felix Kuehling, Alex Deucher, Christian König, Lyude Paul, Danilo Krummrich, nouveau, amd-gfx Cc: intel-xe, dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), Felix Kuehling, Alex Deucher, Christian König, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Lyude Paul, Danilo Krummrich, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, amd-gfx, nouveau, linux-mm, linux-cxl On Wed, Jan 14, 2026 at 08:19:52PM +0100, Francois Dugast wrote: > From: Matthew Brost <matthew.brost@intel.com> > > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head. > > Cc: Zi Yan <ziy@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: adhavan Srinivasan <maddy@linux.ibm.com> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> > Cc: Felix Kuehling <Felix.Kuehling@amd.com> > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: "Christian König" <christian.koenig@amd.com> > Cc: David Airlie <airlied@gmail.com> > Cc: Simona Vetter <simona@ffwll.ch> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Cc: Maxime Ripard <mripard@kernel.org> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Lyude Paul <lyude@redhat.com> > Cc: Danilo Krummrich <dakr@kernel.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Leon Romanovsky <leon@kernel.org> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: linuxppc-dev@lists.ozlabs.org > Cc: kvm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: amd-gfx@lists.freedesktop.org > Cc: dri-devel@lists.freedesktop.org > Cc: nouveau@lists.freedesktop.org > Cc: linux-mm@kvack.org > Cc: linux-cxl@vger.kernel.org > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- AMD folks, ack to get this patch through drm-misc-next or even perhaps some mm related tree? > drivers/gpu/drm/drm_pagemap.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- Nouveau folks, ack as well? > include/linux/memremap.h | 9 ++++++--- > lib/test_hmm.c | 4 +++- > mm/memremap.c | 20 +++++++++++++++++++- > 7 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > index e5000bef90f2..7cf9310de0ec 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) > > dpage = pfn_to_page(uvmem_pfn); > dpage->zone_device_data = pvt; > - zone_device_page_init(dpage, 0); > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > return dpage; > out_clear: > spin_lock(&kvmppc_uvmem_bitmap_lock); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index af53e796ea1b..6ada7b4af7c6 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) > page = pfn_to_page(pfn); > svm_range_bo_ref(prange->svm_bo); > page->zone_device_data = prange->svm_bo; > - zone_device_page_init(page, 0); > + zone_device_page_init(page, page_pgmap(page), 0); > } > > static void > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index 03ee39a761a4..c497726b0147 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > struct drm_pagemap_zdd *zdd) > { > page->zone_device_data = drm_pagemap_zdd_get(zdd); > - zone_device_page_init(page, 0); > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > } > > /** > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 58071652679d..3d8031296eed 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) > order = ilog2(DMEM_CHUNK_NPAGES); > } > > - zone_device_folio_init(folio, order); > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > return page; > } > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 713ec0435b48..e3c2ccf872a8 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > } > > #ifdef CONFIG_ZONE_DEVICE > -void 
zone_device_page_init(struct page *page, unsigned int order); > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order); > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > void memunmap_pages(struct dev_pagemap *pgmap); > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); > > unsigned long memremap_compat_align(void); > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) > +static inline void zone_device_folio_init(struct folio *folio, > + struct dev_pagemap *pgmap, > + unsigned int order) > { > - zone_device_page_init(&folio->page, order); > + zone_device_page_init(&folio->page, pgmap, order); > if (order) > folio_set_large_rmappable(folio); > } > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > index 8af169d3873a..455a6862ae50 100644 > --- a/lib/test_hmm.c > +++ b/lib/test_hmm.c > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, > goto error; > } > > - zone_device_folio_init(page_folio(dpage), order); > + zone_device_folio_init(page_folio(dpage), > + page_pgmap(folio_page(page_folio(dpage), 0)), > + order); > dpage->zone_device_data = rpage; > return dpage; > > diff --git a/mm/memremap.c b/mm/memremap.c > index 63c6ab4fdf08..6f46ab14662b 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > } > } > > -void zone_device_page_init(struct page *page, unsigned int order) > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order) > { > + struct page *new_page = page; > + unsigned int i; > + > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > + struct folio *new_folio = (struct folio *)new_page; > + > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ > +#ifdef NR_PAGES_IN_LARGE_FOLIO > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > +#endif > + new_folio->mapping = NULL; > + new_folio->pgmap = pgmap; /* Also clear compound head */ > + new_folio->share = 0; /* fsdax only, unused for device private */ > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > + } > + > /* > * Drivers shouldn't be allocating pages after calling > * memunmap_pages(). > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 33+ messages in thread
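The reason the dev_pagemap now has to come from the caller is visible in the hunk's "/* Also clear compound head */" comment: in struct page, the ZONE_DEVICE pgmap pointer and a tail page's compound_head word occupy the same union slot, so once a page has served as a tail page of a large folio its stored pgmap is gone. A rough sketch, with an invented helper name:

#include <linux/mm_types.h>

/* Illustration only: pgmap and compound_head alias the same word. */
static inline struct dev_pagemap *sketch_stored_pgmap(struct page *page)
{
	struct folio *folio = (struct folio *)page;

	/*
	 * While the page was a tail page this word held compound_head
	 * (with bit 0 set), not a valid pgmap pointer, which is why
	 * zone_device_page_init() now rewrites it from the pgmap passed
	 * in by the caller instead of trusting what is stored here.
	 */
	return folio->pgmap;
}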
* Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios 2026-01-16 16:43 ` Rodrigo Vivi @ 2026-01-16 18:07 ` Kuehling, Felix 0 siblings, 0 replies; 33+ messages in thread From: Kuehling, Felix @ 2026-01-16 18:07 UTC (permalink / raw) To: Rodrigo Vivi, Francois Dugast, Alex Deucher, Christian König, Lyude Paul, Danilo Krummrich, nouveau, amd-gfx Cc: intel-xe, dri-devel, Matthew Brost, Zi Yan, Alistair Popple, adhavan Srinivasan, Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP), David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Hildenbrand, Oscar Salvador, Andrew Morton, Jason Gunthorpe, Leon Romanovsky, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Balbir Singh, linuxppc-dev, kvm, linux-kernel, linux-mm, linux-cxl On 2026-01-16 11:43, Rodrigo Vivi wrote: > On Wed, Jan 14, 2026 at 08:19:52PM +0100, Francois Dugast wrote: >> From: Matthew Brost <matthew.brost@intel.com> >> >> Reinitialize metadata for large zone device private folios in >> zone_device_page_init prior to creating a higher-order zone device >> private folio. This step is necessary when the folio’s order changes >> dynamically between zone_device_page_init calls to avoid building a >> corrupt folio. As part of the metadata reinitialization, the dev_pagemap >> must be passed in from the caller because the pgmap stored in the folio >> page may have been overwritten with a compound head. >> >> Cc: Zi Yan <ziy@nvidia.com> >> Cc: Alistair Popple <apopple@nvidia.com> >> Cc: adhavan Srinivasan <maddy@linux.ibm.com> >> Cc: Nicholas Piggin <npiggin@gmail.com> >> Cc: Michael Ellerman <mpe@ellerman.id.au> >> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> >> Cc: Felix Kuehling <Felix.Kuehling@amd.com> >> Cc: Alex Deucher <alexander.deucher@amd.com> >> Cc: "Christian König" <christian.koenig@amd.com> >> Cc: David Airlie <airlied@gmail.com> >> Cc: Simona Vetter <simona@ffwll.ch> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Cc: Maxime Ripard <mripard@kernel.org> >> Cc: Thomas Zimmermann <tzimmermann@suse.de> >> Cc: Lyude Paul <lyude@redhat.com> >> Cc: Danilo Krummrich <dakr@kernel.org> >> Cc: David Hildenbrand <david@kernel.org> >> Cc: Oscar Salvador <osalvador@suse.de> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Jason Gunthorpe <jgg@ziepe.ca> >> Cc: Leon Romanovsky <leon@kernel.org> >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> >> Cc: Vlastimil Babka <vbabka@suse.cz> >> Cc: Mike Rapoport <rppt@kernel.org> >> Cc: Suren Baghdasaryan <surenb@google.com> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Balbir Singh <balbirs@nvidia.com> >> Cc: linuxppc-dev@lists.ozlabs.org >> Cc: kvm@vger.kernel.org >> Cc: linux-kernel@vger.kernel.org >> Cc: amd-gfx@lists.freedesktop.org >> Cc: dri-devel@lists.freedesktop.org >> Cc: nouveau@lists.freedesktop.org >> Cc: linux-mm@kvack.org >> Cc: linux-cxl@vger.kernel.org >> Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios") >> Signed-off-by: Matthew Brost <matthew.brost@intel.com> >> Signed-off-by: Francois Dugast <francois.dugast@intel.com> >> --- >> arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- >> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > AMD folks, ack to get this patch through drm-misc-next or even perhaps some mm > related tree? The kfd_migrate change looks straight-forward enough for me. 
We don't use large device pages yet, so I guess the changes in zone_device_page_init should be safe for us. Feel free to add my Acked-by: Felix Kuehling <felix.kuehling@amd.com> I have no objections to merging this through drm-misc-next. @Alex, how will this flow back into our amd-staging-drm-next branch? I guess we'll get it on the next branch rebase. There should be no rush as I don't think we're affected by the bug being fixed here. Thanks, Felix > >> drivers/gpu/drm/drm_pagemap.c | 2 +- >> drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > Nouveau folks, ack as well? > >> include/linux/memremap.h | 9 ++++++--- >> lib/test_hmm.c | 4 +++- >> mm/memremap.c | 20 +++++++++++++++++++- >> 7 files changed, 32 insertions(+), 9 deletions(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c >> index e5000bef90f2..7cf9310de0ec 100644 >> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >> @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) >> >> dpage = pfn_to_page(uvmem_pfn); >> dpage->zone_device_data = pvt; >> - zone_device_page_init(dpage, 0); >> + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); >> return dpage; >> out_clear: >> spin_lock(&kvmppc_uvmem_bitmap_lock); >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> index af53e796ea1b..6ada7b4af7c6 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >> @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn) >> page = pfn_to_page(pfn); >> svm_range_bo_ref(prange->svm_bo); >> page->zone_device_data = prange->svm_bo; >> - zone_device_page_init(page, 0); >> + zone_device_page_init(page, page_pgmap(page), 0); >> } >> >> static void >> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c >> index 03ee39a761a4..c497726b0147 100644 >> --- a/drivers/gpu/drm/drm_pagemap.c >> +++ b/drivers/gpu/drm/drm_pagemap.c >> @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, >> struct drm_pagemap_zdd *zdd) >> { >> page->zone_device_data = drm_pagemap_zdd_get(zdd); >> - zone_device_page_init(page, 0); >> + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); >> } >> >> /** >> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> index 58071652679d..3d8031296eed 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >> @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large) >> order = ilog2(DMEM_CHUNK_NPAGES); >> } >> >> - zone_device_folio_init(folio, order); >> + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); >> return page; >> } >> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >> index 713ec0435b48..e3c2ccf872a8 100644 >> --- a/include/linux/memremap.h >> +++ b/include/linux/memremap.h >> @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) >> } >> >> #ifdef CONFIG_ZONE_DEVICE >> -void zone_device_page_init(struct page *page, unsigned int order); >> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >> + unsigned int order); >> void *memremap_pages(struct dev_pagemap *pgmap, int nid); >> void memunmap_pages(struct dev_pagemap *pgmap); >> void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); >> @@ -234,9 +235,11 @@ 
bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn); >> >> unsigned long memremap_compat_align(void); >> >> -static inline void zone_device_folio_init(struct folio *folio, unsigned int order) >> +static inline void zone_device_folio_init(struct folio *folio, >> + struct dev_pagemap *pgmap, >> + unsigned int order) >> { >> - zone_device_page_init(&folio->page, order); >> + zone_device_page_init(&folio->page, pgmap, order); >> if (order) >> folio_set_large_rmappable(folio); >> } >> diff --git a/lib/test_hmm.c b/lib/test_hmm.c >> index 8af169d3873a..455a6862ae50 100644 >> --- a/lib/test_hmm.c >> +++ b/lib/test_hmm.c >> @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror, >> goto error; >> } >> >> - zone_device_folio_init(page_folio(dpage), order); >> + zone_device_folio_init(page_folio(dpage), >> + page_pgmap(folio_page(page_folio(dpage), 0)), >> + order); >> dpage->zone_device_data = rpage; >> return dpage; >> >> diff --git a/mm/memremap.c b/mm/memremap.c >> index 63c6ab4fdf08..6f46ab14662b 100644 >> --- a/mm/memremap.c >> +++ b/mm/memremap.c >> @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) >> } >> } >> >> -void zone_device_page_init(struct page *page, unsigned int order) >> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >> + unsigned int order) >> { >> + struct page *new_page = page; >> + unsigned int i; >> + >> VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); >> >> + for (i = 0; i < (1UL << order); ++i, ++new_page) { >> + struct folio *new_folio = (struct folio *)new_page; >> + >> + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ >> +#ifdef NR_PAGES_IN_LARGE_FOLIO >> + ((struct folio *)(new_page - 1))->_nr_pages = 0; >> +#endif >> + new_folio->mapping = NULL; >> + new_folio->pgmap = pgmap; /* Also clear compound head */ >> + new_folio->share = 0; /* fsdax only, unused for device private */ >> + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); >> + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); >> + } >> + >> /* >> * Drivers shouldn't be allocating pages after calling >> * memunmap_pages(). >> -- >> 2.43.0 >> ^ permalink raw reply [flat|nested] 33+ messages in thread
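For readers following the signature change above, here is a minimal, hypothetical driver allocation path written against the new prototypes in the diff; the function name and the way the folio is obtained are invented for illustration and are not part of the series.

        /*
         * Hypothetical driver path: "folio" is the start of a block the
         * driver just carved out of its own device memory allocator, and
         * "order" is 0 for a 4K page or HPAGE_PMD_ORDER for a 2M THP.
         */
        static struct page *example_devmem_alloc(struct dev_pagemap *pgmap,
                                                 struct folio *folio,
                                                 unsigned int order)
        {
                /*
                 * The owning pgmap is now passed explicitly; the per-page
                 * loop added to zone_device_page_init() above also clears
                 * compound metadata left over from a previous use of the
                 * same pages before the folio is (re)initialized.
                 */
                zone_device_folio_init(folio, pgmap, order);
                return folio_page(folio, 0);
        }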
* [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible 2026-01-14 19:19 [PATCH v5 0/5] Enable THP support in drm_pagemap Francois Dugast 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast @ 2026-01-14 19:19 ` Francois Dugast 2026-01-15 2:41 ` Balbir Singh 2026-01-14 19:19 ` [PATCH v5 3/5] drm/pagemap: Add helper to access zone_device_data Francois Dugast ` (2 subsequent siblings) 4 siblings, 1 reply; 33+ messages in thread From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw) To: intel-xe Cc: dri-devel, Francois Dugast, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, Balbir Singh, linux-mm, Matthew Brost If the page is part of a folio, unlock and put the whole folio at once instead of individual pages one after the other. This will reduce the amount of operations once device THP are in use. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> --- drivers/gpu/drm/drm_pagemap.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c index c497726b0147..b31090b8e97c 100644 --- a/drivers/gpu/drm/drm_pagemap.c +++ b/drivers/gpu/drm/drm_pagemap.c @@ -154,15 +154,15 @@ static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd) } /** - * drm_pagemap_migration_unlock_put_page() - Put a migration page - * @page: Pointer to the page to put + * drm_pagemap_migration_unlock_put_folio() - Put a migration folio + * @folio: Pointer to the folio to put * - * This function unlocks and puts a page. + * This function unlocks and puts a folio. */ -static void drm_pagemap_migration_unlock_put_page(struct page *page) +static void drm_pagemap_migration_unlock_put_folio(struct folio *folio) { - unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio); } /** @@ -177,15 +177,23 @@ static void drm_pagemap_migration_unlock_put_pages(unsigned long npages, { unsigned long i; - for (i = 0; i < npages; ++i) { + for (i = 0; i < npages;) { struct page *page; + struct folio *folio; + unsigned int order = 0; if (!migrate_pfn[i]) - continue; + goto next; page = migrate_pfn_to_page(migrate_pfn[i]); - drm_pagemap_migration_unlock_put_page(page); + folio = page_folio(page); + order = folio_order(folio); + + drm_pagemap_migration_unlock_put_folio(folio); migrate_pfn[i] = 0; + +next: + i += NR_PAGES(order); } } -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
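As a concrete illustration of the loop change, assuming a 4K base page size: for an entry backed by a single 2M folio, folio_order() returns 9, so the loop performs one folio_unlock()/folio_put() pair and then advances by NR_PAGES(9) == 512 entries, where the old code issued 512 separate unlock_page()/put_page() calls. Entries whose migrate PFN is zero keep order == 0 and still advance by one, as before.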
* Re: [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible 2026-01-14 19:19 ` [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible Francois Dugast @ 2026-01-15 2:41 ` Balbir Singh 2026-01-15 2:54 ` Matthew Brost 0 siblings, 1 reply; 33+ messages in thread From: Balbir Singh @ 2026-01-15 2:41 UTC (permalink / raw) To: Francois Dugast, intel-xe Cc: dri-devel, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, linux-mm, Matthew Brost On 1/15/26 06:19, Francois Dugast wrote: > If the page is part of a folio, unlock and put the whole folio at once > instead of individual pages one after the other. This will reduce the > amount of operations once device THP are in use. > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: linux-mm@kvack.org > Suggested-by: Matthew Brost <matthew.brost@intel.com> > Reviewed-by: Matthew Brost <matthew.brost@intel.com> > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > --- > drivers/gpu/drm/drm_pagemap.c | 26 +++++++++++++++++--------- > 1 file changed, 17 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index c497726b0147..b31090b8e97c 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -154,15 +154,15 @@ static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd) > } > > /** > - * drm_pagemap_migration_unlock_put_page() - Put a migration page > - * @page: Pointer to the page to put > + * drm_pagemap_migration_unlock_put_folio() - Put a migration folio > + * @folio: Pointer to the folio to put > * > - * This function unlocks and puts a page. > + * This function unlocks and puts a folio. > */ > -static void drm_pagemap_migration_unlock_put_page(struct page *page) > +static void drm_pagemap_migration_unlock_put_folio(struct folio *folio) > { > - unlock_page(page); > - put_page(page); > + folio_unlock(folio); > + folio_put(folio); > } > > /** > @@ -177,15 +177,23 @@ static void drm_pagemap_migration_unlock_put_pages(unsigned long npages, > { > unsigned long i; > > - for (i = 0; i < npages; ++i) { > + for (i = 0; i < npages;) { > struct page *page; > + struct folio *folio; > + unsigned int order = 0; > > if (!migrate_pfn[i]) > - continue; > + goto next; > > page = migrate_pfn_to_page(migrate_pfn[i]); > - drm_pagemap_migration_unlock_put_page(page); > + folio = page_folio(page); > + order = folio_order(folio); > + > + drm_pagemap_migration_unlock_put_folio(folio); > migrate_pfn[i] = 0; > + > +next: > + i += NR_PAGES(order); Is this just a wrapper on top of folio_nr_pages()? > } > } > Reviewed-by: Balbir Singh <balbirs@nvidia.com> ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible 2026-01-15 2:41 ` Balbir Singh @ 2026-01-15 2:54 ` Matthew Brost 0 siblings, 0 replies; 33+ messages in thread From: Matthew Brost @ 2026-01-15 2:54 UTC (permalink / raw) To: Balbir Singh Cc: Francois Dugast, intel-xe, dri-devel, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, linux-mm On Thu, Jan 15, 2026 at 01:41:11PM +1100, Balbir Singh wrote: > On 1/15/26 06:19, Francois Dugast wrote: > > If the page is part of a folio, unlock and put the whole folio at once > > instead of individual pages one after the other. This will reduce the > > amount of operations once device THP are in use. > > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: David Hildenbrand <david@kernel.org> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Mike Rapoport <rppt@kernel.org> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Zi Yan <ziy@nvidia.com> > > Cc: Alistair Popple <apopple@nvidia.com> > > Cc: Balbir Singh <balbirs@nvidia.com> > > Cc: linux-mm@kvack.org > > Suggested-by: Matthew Brost <matthew.brost@intel.com> > > Reviewed-by: Matthew Brost <matthew.brost@intel.com> > > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > > --- > > drivers/gpu/drm/drm_pagemap.c | 26 +++++++++++++++++--------- > > 1 file changed, 17 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > > index c497726b0147..b31090b8e97c 100644 > > --- a/drivers/gpu/drm/drm_pagemap.c > > +++ b/drivers/gpu/drm/drm_pagemap.c > > @@ -154,15 +154,15 @@ static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd) > > } > > > > /** > > - * drm_pagemap_migration_unlock_put_page() - Put a migration page > > - * @page: Pointer to the page to put > > + * drm_pagemap_migration_unlock_put_folio() - Put a migration folio > > + * @folio: Pointer to the folio to put > > * > > - * This function unlocks and puts a page. > > + * This function unlocks and puts a folio. > > */ > > -static void drm_pagemap_migration_unlock_put_page(struct page *page) > > +static void drm_pagemap_migration_unlock_put_folio(struct folio *folio) > > { > > - unlock_page(page); > > - put_page(page); > > + folio_unlock(folio); > > + folio_put(folio); > > } > > > > /** > > @@ -177,15 +177,23 @@ static void drm_pagemap_migration_unlock_put_pages(unsigned long npages, > > { > > unsigned long i; > > > > - for (i = 0; i < npages; ++i) { > > + for (i = 0; i < npages;) { > > struct page *page; > > + struct folio *folio; > > + unsigned int order = 0; > > > > if (!migrate_pfn[i]) > > - continue; > > + goto next; > > > > page = migrate_pfn_to_page(migrate_pfn[i]); > > - drm_pagemap_migration_unlock_put_page(page); > > + folio = page_folio(page); > > + order = folio_order(folio); > > + > > + drm_pagemap_migration_unlock_put_folio(folio); > > migrate_pfn[i] = 0; > > + > > +next: > > + i += NR_PAGES(order); > > Is this just a wrapper on top of folio_nr_pages()? > We might not have folio order. This is a macro defined here [1]. There probably is a similar macro elsewhere in kernel that does the same thing, I can look for that and clean this up in a follow up if I can find one. 
[1] https://elixir.bootlin.com/linux/v6.19-rc5/source/include/drm/drm_pagemap.h#L9 > > } > > } > > > > Reviewed-by: Balbir Singh <balbirs@nvidia.com> Thanks! Matt ^ permalink raw reply [flat|nested] 33+ messages in thread
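For reference, the macro Balbir asked about, as it appears in include/drm/drm_pagemap.h and in the context lines of patch 3 below:

        #define NR_PAGES(order) (1U << (order))

folio_nr_pages() returns the same count when a folio is at hand, but, as noted above, several loops in this series advance by NR_PAGES(order) even when migrate_pfn_to_page() returned NULL and only a default order of 0 is known, so an order-based macro is the form that works in both cases.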
* [PATCH v5 3/5] drm/pagemap: Add helper to access zone_device_data 2026-01-14 19:19 [PATCH v5 0/5] Enable THP support in drm_pagemap Francois Dugast 2026-01-14 19:19 ` [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast 2026-01-14 19:19 ` [PATCH v5 2/5] drm/pagemap: Unlock and put folios when possible Francois Dugast @ 2026-01-14 19:19 ` Francois Dugast 2026-01-14 19:19 ` [PATCH v5 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup Francois Dugast 2026-01-14 19:19 ` [PATCH v5 5/5] drm/pagemap: Enable THP support for GPU memory migration Francois Dugast 4 siblings, 0 replies; 33+ messages in thread From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw) To: intel-xe Cc: dri-devel, Francois Dugast, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, Balbir Singh, linux-mm, Matthew Brost This new helper helps ensure all accesses to zone_device_data use the correct API whether the page is part of a folio or not. v2: - Move to drm_pagemap.h, stick to folio_zone_device_data (Matthew Brost) - Return struct drm_pagemap_zdd * (Matthew Brost) Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> --- drivers/gpu/drm/drm_gpusvm.c | 7 +++++-- drivers/gpu/drm/drm_pagemap.c | 21 ++++++++++++--------- include/drm/drm_pagemap.h | 15 +++++++++++++++ 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c index aa9a0b60e727..585d913d3d19 100644 --- a/drivers/gpu/drm/drm_gpusvm.c +++ b/drivers/gpu/drm/drm_gpusvm.c @@ -1488,12 +1488,15 @@ int drm_gpusvm_get_pages(struct drm_gpusvm *gpusvm, order = drm_gpusvm_hmm_pfn_to_order(pfns[i], i, npages); if (is_device_private_page(page) || is_device_coherent_page(page)) { + struct drm_pagemap_zdd *__zdd = + drm_pagemap_page_zone_device_data(page); + if (!ctx->allow_mixed && - zdd != page->zone_device_data && i > 0) { + zdd != __zdd && i > 0) { err = -EOPNOTSUPP; goto err_unmap; } - zdd = page->zone_device_data; + zdd = __zdd; if (pagemap != page_pgmap(page)) { if (i > 0) { err = -EOPNOTSUPP; diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c index b31090b8e97c..f613b4d48499 100644 --- a/drivers/gpu/drm/drm_pagemap.c +++ b/drivers/gpu/drm/drm_pagemap.c @@ -252,7 +252,7 @@ static int drm_pagemap_migrate_map_pages(struct device *dev, order = folio_order(folio); if (is_device_private_page(page)) { - struct drm_pagemap_zdd *zdd = page->zone_device_data; + struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(page); struct drm_pagemap *dpagemap = zdd->dpagemap; struct drm_pagemap_addr addr; @@ -323,7 +323,7 @@ static void drm_pagemap_migrate_unmap_pages(struct device *dev, goto next; if (is_zone_device_page(page)) { - struct drm_pagemap_zdd *zdd = page->zone_device_data; + struct drm_pagemap_zdd *zdd = 
drm_pagemap_page_zone_device_data(page); struct drm_pagemap *dpagemap = zdd->dpagemap; dpagemap->ops->device_unmap(dpagemap, dev, pagemap_addr[i]); @@ -611,7 +611,8 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, pages[i] = NULL; if (src_page && is_device_private_page(src_page)) { - struct drm_pagemap_zdd *src_zdd = src_page->zone_device_data; + struct drm_pagemap_zdd *src_zdd = + drm_pagemap_page_zone_device_data(src_page); if (page_pgmap(src_page) == pagemap && !mdetails->can_migrate_same_pagemap) { @@ -733,8 +734,8 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas, goto next; if (fault_page) { - if (src_page->zone_device_data != - fault_page->zone_device_data) + if (drm_pagemap_page_zone_device_data(src_page) != + drm_pagemap_page_zone_device_data(fault_page)) goto next; } @@ -1075,7 +1076,7 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas, void *buf; int i, err = 0; - zdd = page->zone_device_data; + zdd = drm_pagemap_page_zone_device_data(page); if (time_before64(get_jiffies_64(), zdd->devmem_allocation->timeslice_expiration)) return 0; @@ -1158,7 +1159,9 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas, */ static void drm_pagemap_folio_free(struct folio *folio) { - drm_pagemap_zdd_put(folio->page.zone_device_data); + struct page *page = folio_page(folio, 0); + + drm_pagemap_zdd_put(drm_pagemap_page_zone_device_data(page)); } /** @@ -1174,7 +1177,7 @@ static void drm_pagemap_folio_free(struct folio *folio) */ static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf) { - struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data; + struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(vmf->page); int err; err = __drm_pagemap_migrate_to_ram(vmf->vma, @@ -1240,7 +1243,7 @@ EXPORT_SYMBOL_GPL(drm_pagemap_devmem_init); */ struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page) { - struct drm_pagemap_zdd *zdd = page->zone_device_data; + struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(page); return zdd->devmem_allocation->dpagemap; } diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h index 46e9c58f09e0..736fb6cb7b33 100644 --- a/include/drm/drm_pagemap.h +++ b/include/drm/drm_pagemap.h @@ -4,6 +4,7 @@ #include <linux/dma-direction.h> #include <linux/hmm.h> +#include <linux/memremap.h> #include <linux/types.h> #define NR_PAGES(order) (1U << (order)) @@ -359,4 +360,18 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap, void drm_pagemap_destroy(struct drm_pagemap *dpagemap, bool is_atomic_or_reclaim); int drm_pagemap_reinit(struct drm_pagemap *dpagemap); + +/** + * drm_pagemap_page_zone_device_data() - Page to zone_device_data + * @page: Pointer to the page + * + * Return: Page's zone_device_data + */ +static inline struct drm_pagemap_zdd *drm_pagemap_page_zone_device_data(struct page *page) +{ + struct folio *folio = page_folio(page); + + return folio_zone_device_data(folio); +} + #endif -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
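One motivation worth spelling out, inferred from how the helper is used in patch 5 below: once 2M device-private folios are in play, the zdd pointer is stored with folio_set_zone_device_data() on the folio only, so a raw page->zone_device_data read on an arbitrary constituent page is no longer guaranteed to see it. Resolving the page to its folio with page_folio() before reading keeps every call site correct for both small and large folios.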
* [PATCH v5 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup 2026-01-14 19:19 [PATCH v5 0/5] Enable THP support in drm_pagemap Francois Dugast ` (2 preceding siblings ...) 2026-01-14 19:19 ` [PATCH v5 3/5] drm/pagemap: Add helper to access zone_device_data Francois Dugast @ 2026-01-14 19:19 ` Francois Dugast 2026-01-15 7:48 ` Balbir Singh 2026-01-14 19:19 ` [PATCH v5 5/5] drm/pagemap: Enable THP support for GPU memory migration Francois Dugast 4 siblings, 1 reply; 33+ messages in thread From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw) To: intel-xe Cc: dri-devel, Matthew Brost, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, Balbir Singh, linux-mm, Francois Dugast From: Matthew Brost <matthew.brost@intel.com> cpages returned from migrate_vma_setup represents the total number of individual pages found, not the number of 4K pages. The math in drm_pagemap_migrate_to_devmem for npages is based on the number of 4K pages, so cpages != npages can fail even if the entire memory range is found in migrate_vma_setup (e.g., when a single 2M page is found). Add drm_pagemap_cpages, which converts cpages to the number of 4K pages found. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> --- drivers/gpu/drm/drm_pagemap.c | 38 ++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c index f613b4d48499..3fc466f04b13 100644 --- a/drivers/gpu/drm/drm_pagemap.c +++ b/drivers/gpu/drm/drm_pagemap.c @@ -452,6 +452,41 @@ static int drm_pagemap_migrate_range(struct drm_pagemap_devmem *devmem, return ret; } +/** + * drm_pagemap_cpages() - Count collected pages + * @migrate_pfn: Array of migrate_pfn entries to account + * @npages: Number of entries in @migrate_pfn + * + * Compute the total number of minimum-sized pages represented by the + * collected entries in @migrate_pfn. The total is derived from the + * order encoded in each entry. + * + * Return: Total number of minimum-sized pages. + */ +static int drm_pagemap_cpages(unsigned long *migrate_pfn, unsigned long npages) +{ + unsigned long i, cpages = 0; + + for (i = 0; i < npages;) { + struct page *page = migrate_pfn_to_page(migrate_pfn[i]); + struct folio *folio; + unsigned int order = 0; + + if (page) { + folio = page_folio(page); + order = folio_order(folio); + cpages += NR_PAGES(order); + } else if (migrate_pfn[i] & MIGRATE_PFN_COMPOUND) { + order = HPAGE_PMD_ORDER; + cpages += NR_PAGES(order); + } + + i += NR_PAGES(order); + } + + return cpages; +} + /** * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory * @devmem_allocation: The device memory allocation to migrate to. 
@@ -564,7 +599,8 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, goto err_free; } - if (migrate.cpages != npages) { + if (migrate.cpages != npages && + drm_pagemap_cpages(migrate.src, npages) != npages) { /* * Some pages to migrate. But we want to migrate all or * nothing. Raced or unknown device pages. -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
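A worked example of the failure mode described in the commit message, assuming a 4K base page size: for a 2MB range, npages is 512. If migrate_vma_setup() collects the whole range as a single 2M THP, migrate.cpages counts it as one collected page, so the old "migrate.cpages != npages" test aborts a migration that actually found everything. drm_pagemap_cpages() re-derives the count from the order encoded in each entry (NR_PAGES(HPAGE_PMD_ORDER) == 512 here), so the combined check only rejects ranges that were genuinely not fully collected.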
* Re: [PATCH v5 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup 2026-01-14 19:19 ` [PATCH v5 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup Francois Dugast @ 2026-01-15 7:48 ` Balbir Singh 0 siblings, 0 replies; 33+ messages in thread From: Balbir Singh @ 2026-01-15 7:48 UTC (permalink / raw) To: Francois Dugast, intel-xe Cc: dri-devel, Matthew Brost, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, linux-mm On 1/15/26 06:19, Francois Dugast wrote: > From: Matthew Brost <matthew.brost@intel.com> > > cpages returned from migrate_vma_setup represents the total number of > individual pages found, not the number of 4K pages. The math in > drm_pagemap_migrate_to_devmem for npages is based on the number of 4K > pages, so cpages != npages can fail even if the entire memory range is > found in migrate_vma_setup (e.g., when a single 2M page is found). > Add drm_pagemap_cpages, which converts cpages to the number of 4K pages > found. > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: David Hildenbrand <david@kernel.org> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Balbir Singh <balbirs@nvidia.com> > Cc: linux-mm@kvack.org > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Reviewed-by: Francois Dugast <francois.dugast@intel.com> > Signed-off-by: Francois Dugast <francois.dugast@intel.com> > --- > drivers/gpu/drm/drm_pagemap.c | 38 ++++++++++++++++++++++++++++++++++- > 1 file changed, 37 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index f613b4d48499..3fc466f04b13 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -452,6 +452,41 @@ static int drm_pagemap_migrate_range(struct drm_pagemap_devmem *devmem, > return ret; > } > > +/** > + * drm_pagemap_cpages() - Count collected pages > + * @migrate_pfn: Array of migrate_pfn entries to account > + * @npages: Number of entries in @migrate_pfn > + * > + * Compute the total number of minimum-sized pages represented by the > + * collected entries in @migrate_pfn. The total is derived from the > + * order encoded in each entry. > + * > + * Return: Total number of minimum-sized pages. > + */ > +static int drm_pagemap_cpages(unsigned long *migrate_pfn, unsigned long npages) > +{ > + unsigned long i, cpages = 0; > + > + for (i = 0; i < npages;) { > + struct page *page = migrate_pfn_to_page(migrate_pfn[i]); > + struct folio *folio; > + unsigned int order = 0; > + > + if (page) { > + folio = page_folio(page); > + order = folio_order(folio); > + cpages += NR_PAGES(order); > + } else if (migrate_pfn[i] & MIGRATE_PFN_COMPOUND) { > + order = HPAGE_PMD_ORDER; > + cpages += NR_PAGES(order); > + } > + > + i += NR_PAGES(order); > + } > + > + return cpages; > +} > + > /** > * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory > * @devmem_allocation: The device memory allocation to migrate to. 
> @@ -564,7 +599,8 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, > goto err_free; > } > > - if (migrate.cpages != npages) { > + if (migrate.cpages != npages && > + drm_pagemap_cpages(migrate.src, npages) != npages) { > /* > * Some pages to migrate. But we want to migrate all or > * nothing. Raced or unknown device pages. Reviewed-by: Balbir Singh <balbirs@nvidia.com> ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v5 5/5] drm/pagemap: Enable THP support for GPU memory migration 2026-01-14 19:19 [PATCH v5 0/5] Enable THP support in drm_pagemap Francois Dugast ` (3 preceding siblings ...) 2026-01-14 19:19 ` [PATCH v5 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup Francois Dugast @ 2026-01-14 19:19 ` Francois Dugast 4 siblings, 0 replies; 33+ messages in thread From: Francois Dugast @ 2026-01-14 19:19 UTC (permalink / raw) To: intel-xe Cc: dri-devel, Francois Dugast, Matthew Brost, Thomas Hellström, Michal Mrozek, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Alistair Popple, Balbir Singh, linux-mm This enables support for Transparent Huge Pages (THP) for device pages by using MIGRATE_VMA_SELECT_COMPOUND during migration. It removes the need to split folios and loop multiple times over all pages to perform required operations at page level. Instead, we rely on newly introduced support for higher orders in drm_pagemap and folio-level API. In Xe, this drastically improves performance when using SVM. The GT stats below collected after a 2MB page fault show overall servicing is more than 7 times faster, and thanks to reduced CPU overhead the time spent on the actual copy goes from 23% without THP to 80% with THP: Without THP: svm_2M_pagefault_us: 966 svm_2M_migrate_us: 942 svm_2M_device_copy_us: 223 svm_2M_get_pages_us: 9 svm_2M_bind_us: 10 With THP: svm_2M_pagefault_us: 132 svm_2M_migrate_us: 128 svm_2M_device_copy_us: 106 svm_2M_get_pages_us: 1 svm_2M_bind_us: 2 v2: - Fix one occurrence of drm_pagemap_get_devmem_page() (Matthew Brost) v3: - Remove migrate_device_split_page() and folio_split_lock, instead rely on free_zone_device_folio() to split folios before freeing (Matthew Brost) - Assert folio order is HPAGE_PMD_ORDER (Matthew Brost) - Always use folio_set_zone_device_data() in split (Matthew Brost) v4: - Warn on compound device page, s/continue/goto next/ (Matthew Brost) v5: - Revert warn on compound device page - s/zone_device_page_init()/zone_device_folio_init() (Matthew Brost) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. 
Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: linux-mm@kvack.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> --- drivers/gpu/drm/drm_pagemap.c | 73 ++++++++++++++++++++++++++++++----- 1 file changed, 63 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c index 3fc466f04b13..e2aecd519f14 100644 --- a/drivers/gpu/drm/drm_pagemap.c +++ b/drivers/gpu/drm/drm_pagemap.c @@ -200,16 +200,19 @@ static void drm_pagemap_migration_unlock_put_pages(unsigned long npages, /** * drm_pagemap_get_devmem_page() - Get a reference to a device memory page * @page: Pointer to the page + * @order: Order * @zdd: Pointer to the GPU SVM zone device data * * This function associates the given page with the specified GPU SVM zone * device data and initializes it for zone device usage. */ static void drm_pagemap_get_devmem_page(struct page *page, + unsigned int order, struct drm_pagemap_zdd *zdd) { - page->zone_device_data = drm_pagemap_zdd_get(zdd); - zone_device_page_init(page, zdd->dpagemap->pagemap, 0); + zone_device_folio_init((struct folio *)page, zdd->dpagemap->pagemap, + order); + folio_set_zone_device_data(page_folio(page), drm_pagemap_zdd_get(zdd)); } /** @@ -534,7 +537,8 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, * rare and only occur when the madvise attributes of memory are * changed or atomics are being used. */ - .flags = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT, + .flags = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT | + MIGRATE_VMA_SELECT_COMPOUND, }; unsigned long i, npages = npages_in_range(start, end); unsigned long own_pages = 0, migrated_pages = 0; @@ -640,11 +644,13 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, own_pages = 0; - for (i = 0; i < npages; ++i) { + for (i = 0; i < npages;) { + unsigned long j; struct page *page = pfn_to_page(migrate.dst[i]); struct page *src_page = migrate_pfn_to_page(migrate.src[i]); - cur.start = i; + unsigned int order = 0; + cur.start = i; pages[i] = NULL; if (src_page && is_device_private_page(src_page)) { struct drm_pagemap_zdd *src_zdd = @@ -654,7 +660,7 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, !mdetails->can_migrate_same_pagemap) { migrate.dst[i] = 0; own_pages++; - continue; + goto next; } if (mdetails->source_peer_migrates) { cur.dpagemap = src_zdd->dpagemap; @@ -670,7 +676,20 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, pages[i] = page; } migrate.dst[i] = migrate_pfn(migrate.dst[i]); - drm_pagemap_get_devmem_page(page, zdd); + + if (migrate.src[i] & MIGRATE_PFN_COMPOUND) { + drm_WARN_ONCE(dpagemap->drm, src_page && + folio_order(page_folio(src_page)) != HPAGE_PMD_ORDER, + "Unexpected folio order\n"); + + order = HPAGE_PMD_ORDER; + migrate.dst[i] |= MIGRATE_PFN_COMPOUND; + + for (j = 1; j < NR_PAGES(order) && i + j < npages; j++) + migrate.dst[i + j] = 0; + } + + drm_pagemap_get_devmem_page(page, order, zdd); /* If we switched the migrating drm_pagemap, migrate previous pages now */ err = drm_pagemap_migrate_range(devmem_allocation, migrate.src, migrate.dst, @@ -680,7 +699,11 @@ int 
drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation, npages = i + 1; goto err_finalize; } + +next: + i += NR_PAGES(order); } + cur.start = npages; cur.ops = NULL; /* Force migration */ err = drm_pagemap_migrate_range(devmem_allocation, migrate.src, migrate.dst, @@ -789,6 +812,8 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas, page = folio_page(folio, 0); mpfn[i] = migrate_pfn(page_to_pfn(page)); + if (order) + mpfn[i] |= MIGRATE_PFN_COMPOUND; next: if (page) addr += page_size(page); @@ -1044,8 +1069,15 @@ int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation) if (err) goto err_finalize; - for (i = 0; i < npages; ++i) + for (i = 0; i < npages;) { + unsigned int order = 0; + pages[i] = migrate_pfn_to_page(src[i]); + if (pages[i]) + order = folio_order(page_folio(pages[i])); + + i += NR_PAGES(order); + } err = ops->copy_to_ram(pages, pagemap_addr, npages, NULL); if (err) @@ -1098,7 +1130,8 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas, .vma = vas, .pgmap_owner = page_pgmap(page)->owner, .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | - MIGRATE_VMA_SELECT_DEVICE_COHERENT, + MIGRATE_VMA_SELECT_DEVICE_COHERENT | + MIGRATE_VMA_SELECT_COMPOUND, .fault_page = page, }; struct drm_pagemap_migrate_details mdetails = {}; @@ -1164,8 +1197,15 @@ static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas, if (err) goto err_finalize; - for (i = 0; i < npages; ++i) + for (i = 0; i < npages;) { + unsigned int order = 0; + pages[i] = migrate_pfn_to_page(migrate.src[i]); + if (pages[i]) + order = folio_order(page_folio(pages[i])); + + i += NR_PAGES(order); + } err = ops->copy_to_ram(pages, pagemap_addr, npages, NULL); if (err) @@ -1223,9 +1263,22 @@ static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf) return err ? VM_FAULT_SIGBUS : 0; } +static void drm_pagemap_folio_split(struct folio *orig_folio, struct folio *new_folio) +{ + struct drm_pagemap_zdd *zdd; + + if (!new_folio) + return; + + new_folio->pgmap = orig_folio->pgmap; + zdd = folio_zone_device_data(orig_folio); + folio_set_zone_device_data(new_folio, drm_pagemap_zdd_get(zdd)); +} + static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = { .folio_free = drm_pagemap_folio_free, .migrate_to_ram = drm_pagemap_migrate_to_ram, + .folio_split = drm_pagemap_folio_split, }; /** -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
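For reference, the arithmetic behind the summary in the commit message: 966 us / 132 us is roughly 7.3, hence "more than 7 times faster" for servicing the fault, and the share of time spent in the actual copy rises from 223 / 966, about 23%, without THP to 106 / 132, about 80%, with THP.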