From: Matthew Brost <matthew.brost@intel.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Francois Dugast <francois.dugast@intel.com>,
<intel-xe@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
Balbir Singh <balbirs@nvidia.com>, <linux-mm@kvack.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 1/4] mm/migrate: Add migrate_device_split_page
Date: Wed, 7 Jan 2026 13:15:48 -0800 [thread overview]
Message-ID: <aV7NBE3NS1wdsXBo@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <95BD5D5B-C8EB-4EFA-A895-CFD660504485@nvidia.com>
On Wed, Jan 07, 2026 at 03:38:35PM -0500, Zi Yan wrote:
> On 7 Jan 2026, at 15:20, Zi Yan wrote:
>
> > +THP folks
>
> +willy, since he commented in another thread.
>
> >
> > On 16 Dec 2025, at 15:10, Francois Dugast wrote:
> >
> >> From: Matthew Brost <matthew.brost@intel.com>
> >>
> >> Introduce migrate_device_split_page() to split a device page into
> >> lower-order pages. Used when a folio allocated as higher-order is freed
> >> and later reallocated at a smaller order by the driver memory manager.
> >>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Balbir Singh <balbirs@nvidia.com>
> >> Cc: dri-devel@lists.freedesktop.org
> >> Cc: linux-mm@kvack.org
> >> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >> ---
> >> include/linux/huge_mm.h | 3 +++
> >> include/linux/migrate.h | 1 +
> >> mm/huge_memory.c | 6 ++---
> >> mm/migrate_device.c | 49 +++++++++++++++++++++++++++++++++++++++++
> >> 4 files changed, 56 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index a4d9f964dfde..6ad8f359bc0d 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -374,6 +374,9 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
> >> int folio_split_unmapped(struct folio *folio, unsigned int new_order);
> >> unsigned int min_order_for_split(struct folio *folio);
> >> int split_folio_to_list(struct folio *folio, struct list_head *list);
> >> +int __split_unmapped_folio(struct folio *folio, int new_order,
> >> + struct page *split_at, struct xa_state *xas,
> >> + struct address_space *mapping, enum split_type split_type);
> >> int folio_check_splittable(struct folio *folio, unsigned int new_order,
> >> enum split_type split_type);
> >> int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
> >> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> >> index 26ca00c325d9..ec65e4fd5f88 100644
> >> --- a/include/linux/migrate.h
> >> +++ b/include/linux/migrate.h
> >> @@ -192,6 +192,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
> >> unsigned long npages);
> >> void migrate_device_finalize(unsigned long *src_pfns,
> >> unsigned long *dst_pfns, unsigned long npages);
> >> +int migrate_device_split_page(struct page *page);
> >>
> >> #endif /* CONFIG_MIGRATION */
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 40cf59301c21..7ded35a3ecec 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -3621,9 +3621,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
> >> * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
> >> * split but not to @new_order, the caller needs to check)
> >> */
> >> -static int __split_unmapped_folio(struct folio *folio, int new_order,
> >> - struct page *split_at, struct xa_state *xas,
> >> - struct address_space *mapping, enum split_type split_type)
> >> +int __split_unmapped_folio(struct folio *folio, int new_order,
> >> + struct page *split_at, struct xa_state *xas,
> >> + struct address_space *mapping, enum split_type split_type)
> >> {
> >> const bool is_anon = folio_test_anon(folio);
> >> int old_order = folio_order(folio);
> >> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >> index 23379663b1e1..eb0f0e938947 100644
> >> --- a/mm/migrate_device.c
> >> +++ b/mm/migrate_device.c
> >> @@ -775,6 +775,49 @@ int migrate_vma_setup(struct migrate_vma *args)
> >> EXPORT_SYMBOL(migrate_vma_setup);
> >>
> >> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> >> +/**
> >> + * migrate_device_split_page() - Split device page
> >> + * @page: Device page to split
> >> + *
> >> + * Splits a device page into smaller pages. Typically called when reallocating a
> >> + * folio to a smaller size. Inherently racy: only safe if the caller ensures
> >> + * mutual exclusion within the page's folio (i.e., no other threads are using
> >> + * pages within the folio). Expected to be called on a free device page;
> >> + * restores all split-out pages to a free state.
> >> + */
>
> Do you mind explaining why __split_unmapped_folio() is needed for a free device
> page? A free page is not supposed to be a large folio, at least from a core
> MM point of view. __split_unmapped_folio() is intended to work on large folios
> (or compound pages), even if the input folio has refcount == 0 (because it is
> frozen).
>
Well, then maybe this is a bug in core MM where the freed page is still
a THP. Let me explain the scenario and why this is needed from my POV.

Our VRAM allocator in Xe (and several other DRM drivers) is DRM buddy.
This is a shared pool between traditional DRM GEMs (buffer objects) and
SVM allocations (pages). It doesn't have any view of the page backing;
it basically just hands back a pointer to VRAM space that we allocate
from. From that, if it's an SVM allocation, we can derive the device
pages.

What I see happening is: a 2M buddy allocation occurs, we make the
backing device pages a large folio, and sometime later the folio
refcount goes to zero and we free the buddy allocation. Later, the
buddy allocation is reused for a smaller allocation (e.g., 4K or 64K),
but the backing pages are still a large folio. This is where we need to
split the folio into 4K pages so we can properly migrate the pages via
the migrate_vma_* calls. Also note: if you call zone_device_page_init
with an order of zero on a large device folio, that also blows up.

Open to other ideas here for how to handle this scenario.
> >> +int migrate_device_split_page(struct page *page)
> >> +{
> >> + struct folio *folio = page_folio(page);
> >> + struct dev_pagemap *pgmap = folio->pgmap;
> >> + struct page *unlock_page = folio_page(folio, 0);
> >> + unsigned int order = folio_order(folio), i;
> >> + int ret = 0;
> >> +
> >> + VM_BUG_ON_FOLIO(!order, folio);
> >> + VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
> >> + VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>
> Please use VM_WARN_ON_FOLIO() instead to catch errors. There is no need to crash
> the kernel
>
Sure.
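i.e., the straight substitution you're asking for, something like
(untested):

```diff
-	VM_BUG_ON_FOLIO(!order, folio);
-	VM_BUG_ON_FOLIO(!folio_is_device_private(folio), folio);
-	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
+	VM_WARN_ON_FOLIO(!order, folio);
+	VM_WARN_ON_FOLIO(!folio_is_device_private(folio), folio);
+	VM_WARN_ON_FOLIO(folio_ref_count(folio), folio);
```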
> >> +
> >> + folio_lock(folio);
> >> +
> >> + ret = __split_unmapped_folio(folio, 0, page, NULL, NULL, SPLIT_TYPE_UNIFORM);
> >> + if (ret) {
> >> + /*
> >> + * We can't fail here unless the caller doesn't know what they
> >> + * are doing.
> >> + */
> >> + VM_BUG_ON_FOLIO(ret, folio);
>
> Same here.
>
Will do.
Matt
> >> +
> >> + return ret;
> >> + }
> >> +
> >> + for (i = 0; i < 0x1 << order; ++i, ++unlock_page) {
> >> + page_folio(unlock_page)->pgmap = pgmap;
> >> + folio_unlock(page_folio(unlock_page));
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> /**
> >> * migrate_vma_insert_huge_pmd_page: Insert a huge folio into @migrate->vma->vm_mm
> >> * at @addr. folio is already allocated as a part of the migration process with
> >> @@ -927,6 +970,11 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> >> return ret;
> >> }
> >> #else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
> >> +int migrate_device_split_page(struct page *page)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
> >> unsigned long addr,
> >> struct page *page,
> >> @@ -943,6 +991,7 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> >> return 0;
> >> }
> >> #endif
> >> +EXPORT_SYMBOL(migrate_device_split_page);
> >>
> >> static unsigned long migrate_vma_nr_pages(unsigned long *src)
> >> {
> >> --
> >> 2.43.0
> >
> >
> > Best Regards,
> > Yan, Zi
>
>
> Best Regards,
> Yan, Zi
Thread overview: 15+ messages
[not found] <20251216201206.1660899-1-francois.dugast@intel.com>
2025-12-16 20:10 ` Francois Dugast
2025-12-16 20:34 ` Matthew Wilcox
2025-12-16 21:39 ` Matthew Brost
2026-01-06 2:39 ` Matthew Brost
2026-01-07 20:15 ` Zi Yan
2026-01-07 20:20 ` Zi Yan
2026-01-07 20:38 ` Zi Yan
2026-01-07 21:15 ` Matthew Brost [this message]
2026-01-07 22:03 ` Zi Yan
2026-01-08 0:56 ` Balbir Singh
2026-01-08 2:17 ` Matthew Brost
2026-01-08 2:53 ` Zi Yan
2026-01-08 3:14 ` Alistair Popple
2026-01-08 3:42 ` Matthew Brost
2026-01-08 4:47 ` Balbir Singh