From: Matthew Brost <matthew.brost@intel.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: "Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Zi Yan" <ziy@nvidia.com>,
"Madhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Balbir Singh" <balbirs@nvidia.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v5 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Wed, 14 Jan 2026 22:18:30 -0800
Message-ID: <aWiGtlKI3LOtjUl6@lstrano-desk.jf.intel.com>
In-Reply-To: <aWiBy3nZ4FrPYURF@lstrano-desk.jf.intel.com>

On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote:
> On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote:
> > On 2026-01-15 at 06:19 +1100, Francois Dugast <francois.dugast@intel.com> wrote...
> > > From: Matthew Brost <matthew.brost@intel.com>
> > >
> > > Reinitialize metadata for large zone device private folios in
> > > zone_device_page_init prior to creating a higher-order zone device
> > > private folio. This step is necessary when the folio’s order changes
> > > dynamically between zone_device_page_init calls to avoid building a
> > > corrupt folio. As part of the metadata reinitialization, the dev_pagemap
> > > must be passed in from the caller because the pgmap stored in the folio
> > > page may have been overwritten with a compound head.
> >
> > Thanks for fixing, a couple of minor comments below.
> >
> > > Cc: Zi Yan <ziy@nvidia.com>
> > > Cc: Alistair Popple <apopple@nvidia.com>
> > > Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> > > Cc: Nicholas Piggin <npiggin@gmail.com>
> > > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > > Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
> > > Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > Cc: "Christian König" <christian.koenig@amd.com>
> > > Cc: David Airlie <airlied@gmail.com>
> > > Cc: Simona Vetter <simona@ffwll.ch>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Maxime Ripard <mripard@kernel.org>
> > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > Cc: Lyude Paul <lyude@redhat.com>
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: David Hildenbrand <david@kernel.org>
> > > Cc: Oscar Salvador <osalvador@suse.de>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Leon Romanovsky <leon@kernel.org>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Mike Rapoport <rppt@kernel.org>
> > > Cc: Suren Baghdasaryan <surenb@google.com>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Balbir Singh <balbirs@nvidia.com>
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: nouveau@lists.freedesktop.org
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-cxl@vger.kernel.org
> > > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> > > ---
> > > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +-
> > > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
> > > drivers/gpu/drm/drm_pagemap.c | 2 +-
> > > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
> > > include/linux/memremap.h | 9 ++++++---
> > > lib/test_hmm.c | 4 +++-
> > > mm/memremap.c | 20 +++++++++++++++++++-
> > > 7 files changed, 32 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > index e5000bef90f2..7cf9310de0ec 100644
> > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
> > >
> > > dpage = pfn_to_page(uvmem_pfn);
> > > dpage->zone_device_data = pvt;
> > > - zone_device_page_init(dpage, 0);
> > > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0);
> > > return dpage;
> > > out_clear:
> > > spin_lock(&kvmppc_uvmem_bitmap_lock);
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > index af53e796ea1b..6ada7b4af7c6 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn)
> > > page = pfn_to_page(pfn);
> > > svm_range_bo_ref(prange->svm_bo);
> > > page->zone_device_data = prange->svm_bo;
> > > - zone_device_page_init(page, 0);
> > > + zone_device_page_init(page, page_pgmap(page), 0);
> > > }
> > >
> > > static void
> > > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> > > index 03ee39a761a4..c497726b0147 100644
> > > --- a/drivers/gpu/drm/drm_pagemap.c
> > > +++ b/drivers/gpu/drm/drm_pagemap.c
> > > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page,
> > > struct drm_pagemap_zdd *zdd)
> > > {
> > > page->zone_device_data = drm_pagemap_zdd_get(zdd);
> > > - zone_device_page_init(page, 0);
> > > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0);
> > > }
> > >
> > > /**
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > > index 58071652679d..3d8031296eed 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, bool is_large)
> > > order = ilog2(DMEM_CHUNK_NPAGES);
> > > }
> > >
> > > - zone_device_folio_init(folio, order);
> > > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order);
> > > return page;
> > > }
> > >
> > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > > index 713ec0435b48..e3c2ccf872a8 100644
> > > --- a/include/linux/memremap.h
> > > +++ b/include/linux/memremap.h
> > > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page)
> > > }
> > >
> > > #ifdef CONFIG_ZONE_DEVICE
> > > -void zone_device_page_init(struct page *page, unsigned int order);
> > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap,
> > > + unsigned int order);
> > > void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> > > void memunmap_pages(struct dev_pagemap *pgmap);
> > > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
> > > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);
> > >
> > > unsigned long memremap_compat_align(void);
> > >
> > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int order)
> > > +static inline void zone_device_folio_init(struct folio *folio,
> > > + struct dev_pagemap *pgmap,
> > > + unsigned int order)
> > > {
> > > - zone_device_page_init(&folio->page, order);
> > > + zone_device_page_init(&folio->page, pgmap, order);
> > > if (order)
> > > folio_set_large_rmappable(folio);
> > > }
> > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> > > index 8af169d3873a..455a6862ae50 100644
> > > --- a/lib/test_hmm.c
> > > +++ b/lib/test_hmm.c
> > > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror *dmirror,
> > > goto error;
> > > }
> > >
> > > - zone_device_folio_init(page_folio(dpage), order);
> > > + zone_device_folio_init(page_folio(dpage),
> > > + page_pgmap(folio_page(page_folio(dpage), 0)),
> > > + order);
> > > dpage->zone_device_data = rpage;
> > > return dpage;
> > >
> > > diff --git a/mm/memremap.c b/mm/memremap.c
> > > index 63c6ab4fdf08..6f46ab14662b 100644
> > > --- a/mm/memremap.c
> > > +++ b/mm/memremap.c
> > > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio)
> > > }
> > > }
> > >
> > > -void zone_device_page_init(struct page *page, unsigned int order)
> > > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap,
> > > + unsigned int order)
> > > {
> > > + struct page *new_page = page;
> > > + unsigned int i;
> > > +
> > > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES);
> > >
> > > + for (i = 0; i < (1UL << order); ++i, ++new_page) {
> > > + struct folio *new_folio = (struct folio *)new_page;
> > > +
> > > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */
> >
> > This seems odd to me, mainly due to the "magic" number. Why not just clear
> > the flags entirely? Or at least explicitly just clear the flags you care about
> > which would remove the need for the comment and should let you use the proper
> > PageFlag functions.
> >
>
> I'm copying this from folio_reset_order [1]. My paranoia about touching
> anything related to struct page is high, so I did the same thing
> folio_reset_order does here.
>
> [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075
>
Clearing the flags entirely, as in the diff below, immediately hangs my
first SVM test:

diff --git a/mm/memremap.c b/mm/memremap.c
index 6f46ab14662b..ef8c56876cf5 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap,
 	for (i = 0; i < (1UL << order); ++i, ++new_page) {
 		struct folio *new_folio = (struct folio *)new_page;
 
-		new_page->flags.f &= ~0xffUL;	/* Clear possible order, page head */
+		new_page->flags.f = 0;
 #ifdef NR_PAGES_IN_LARGE_FOLIO
 		((struct folio *)(new_page - 1))->_nr_pages = 0;
 #endif

I can walk through exactly which flags need to be cleared, but my
feeling is that any flag the order field overloads, and could therefore
encode, should be cleared. Based on the existing code, that means bits
0-7.

How about in a follow-up we normalize setting / clearing the order flag
field with a #define and an inline helper?
Matt
> > > +#ifdef NR_PAGES_IN_LARGE_FOLIO
> > > + ((struct folio *)(new_page - 1))->_nr_pages = 0;
> > > +#endif
> > > + new_folio->mapping = NULL;
> > > + new_folio->pgmap = pgmap; /* Also clear compound head */
> > > + new_folio->share = 0; /* fsdax only, unused for device private */
> >
> > It would be nice if the FS DAX code actually used this as well. Is there a
> > reason that change was dropped from the series?
> >
>
> I don't have a test platform for FS DAX. In prior revisions, I was just
> moving existing FS DAX code to a helper, which I felt confident about.
>
> This revision is slightly different, and I don't feel comfortable
> modifying FS DAX code without a test platform. I agree we should update
> FS DAX, but that should be done in a follow-up with coordinated testing.
>
> Matt
>
> > > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio);
> > > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio);
> > > + }
> > > +
> > > /*
> > > * Drivers shouldn't be allocating pages after calling
> > > * memunmap_pages().
> > > --
> > > 2.43.0
> > >