[PATCH v3 1/7] mm: Add folio_split

linux-mm.kvack.org archive mirror

* [PATCH v3 1/7] mm: Add folio_split_unref helper
       [not found] <20260109085605.443316-1-francois.dugast@intel.com>
@ 2026-01-09  8:54 ` Francois Dugast
  2026-01-09 13:19   ` David Hildenbrand (Red Hat)
  2026-01-09  8:54 ` [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing Francois Dugast
  1 sibling, 1 reply; 27+ messages in thread
From: Francois Dugast @ 2026-01-09  8:54 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Matthew Brost, Balbir Singh, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel, Alistair Popple,
	Francois Dugast

From: Matthew Brost <matthew.brost@intel.com>

Add folio_split_unref helper which splits an unreferenced folio
(refcount == 0) into individual pages. Intended to be called on special
pages (e.g., device-private, DAX, etc.) when returning the folio to the
free page pool.

Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
---
 include/linux/huge_mm.h |  1 +
 mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfde..18cb9728d8f1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -369,6 +369,7 @@ enum split_type {
 	SPLIT_TYPE_NON_UNIFORM,
 };
 
+void folio_split_unref(struct folio *folio);
 int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
 int folio_split_unmapped(struct folio *folio, unsigned int new_order);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..0eb9e6ad8639 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3580,6 +3580,45 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
 		ClearPageCompound(&folio->page);
 }
 
+/**
+ * folio_split_unref() - split an unreferenced folio (refcount == 0)
+ * @folio: the to-be-split folio
+ *
+ * Split an unreferenced folio (refcount == 0) into individual pages.
+ * Intended to be called on special pages (e.g., device-private, DAX, etc.)
+ * when returning the folio to the free page pool.
+ */
+void folio_split_unref(struct folio *folio)
+{
+	struct dev_pagemap *pgmap = page_pgmap(&folio->page);
+	int order, i;
+
+	folio->mapping = NULL;
+	order = folio_order(folio);
+	if (!order)
+		return;
+
+	folio_reset_order(folio);
+
+	for (i = 0; i < (1UL << order); i++) {
+		struct page *page = folio_page(folio, i);
+		struct folio *new_folio = (struct folio *)page;
+
+		ClearPageHead(page);
+		clear_compound_head(page);
+
+		new_folio->mapping = NULL;
+		/*
+		 * Reset pgmap which was over-written by
+		 * prep_compound_page().
+		 */
+		new_folio->pgmap = pgmap;
+		new_folio->share = 0;	/* fsdax only, unused for device private */
+		VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio);
+	}
+}
+EXPORT_SYMBOL_GPL(folio_split_unref);
+
 /**
  * __split_unmapped_folio() - splits an unmapped @folio to lower order folios in
  * two ways: uniform split or non-uniform split.
-- 
2.43.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
       [not found] <20260109085605.443316-1-francois.dugast@intel.com>
  2026-01-09  8:54 ` [PATCH v3 1/7] mm: Add folio_split_unref helper Francois Dugast
@ 2026-01-09  8:54 ` Francois Dugast
  2026-01-09 11:09   ` Mika Penttilä
  1 sibling, 1 reply; 27+ messages in thread
From: Francois Dugast @ 2026-01-09  8:54 UTC (permalink / raw)
  To: intel-xe
  Cc: dri-devel, Matthew Brost, Balbir Singh, Alistair Popple, Zi Yan,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel, Francois Dugast

From: Matthew Brost <matthew.brost@intel.com>

Split device-private and coherent folios into individual pages before
freeing so that any order folio can be formed upon the next use of the
pages.

Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-cxl@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
---
 mm/memremap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 63c6ab4fdf08..7289cdd6862f 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
 	case MEMORY_DEVICE_COHERENT:
 		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
 			break;
+
+		folio_split_unref(folio);
 		pgmap->ops->folio_free(folio);
 		percpu_ref_put_many(&folio->pgmap->ref, nr);
 		break;
-- 
2.43.0




* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09  8:54 ` [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing Francois Dugast
@ 2026-01-09 11:09   ` Mika Penttilä
  2026-01-09 17:28     ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Mika Penttilä @ 2026-01-09 11:09 UTC (permalink / raw)
  To: Francois Dugast, intel-xe
  Cc: dri-devel, Matthew Brost, Balbir Singh, Alistair Popple, Zi Yan,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

Hi,

On 1/9/26 10:54, Francois Dugast wrote:

> From: Matthew Brost <matthew.brost@intel.com>
>
> Split device-private and coherent folios into individual pages before
> freeing so that any order folio can be formed upon the next use of the
> pages.
>
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Cc: linux-cxl@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
>  mm/memremap.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 63c6ab4fdf08..7289cdd6862f 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>  	case MEMORY_DEVICE_COHERENT:
>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>  			break;
> +
> +		folio_split_unref(folio);
>  		pgmap->ops->folio_free(folio);
>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>  		break;

This breaks folio_free implementations such as nouveau_dmem_folio_free,
which check the folio order and act on it.
Maybe add an order parameter to folio_free, or let the driver handle the split?

Thanks,
Mika
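
The concern above can be illustrated with a minimal userspace sketch. It is
not kernel code: struct mock_folio and every mock_* function are hypothetical
stand-ins invented for this illustration, and only the folio order is
modelled. It shows why a callback that reads the order after the split sees
order 0, and how passing the original order explicitly would avoid that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio; only the order is modelled. */
struct mock_folio {
	unsigned int order;
};

/* Stand-in for a split helper: afterwards the folio reports order 0. */
static void mock_split(struct mock_folio *folio)
{
	folio->order = 0;
}

/* Callback shape as in the patch: the driver derives the size itself. */
static unsigned long mock_free_by_folio(const struct mock_folio *folio)
{
	return 1UL << folio->order;
}

/* Alternative raised in the thread: the core passes the original order. */
static unsigned long mock_free_with_order(const struct mock_folio *folio,
					  unsigned int order)
{
	(void)folio;
	return 1UL << order;
}

/*
 * Core-side free path: record the order, split, then call the driver.
 * Returns the number of pages the driver believes it was handed.
 */
static unsigned long mock_core_free(struct mock_folio *folio, int pass_order)
{
	unsigned int order = folio->order;

	mock_split(folio);
	return pass_order ? mock_free_with_order(folio, order)
			  : mock_free_by_folio(folio);
}
```

In this model, a driver that keeps separate lists for large folios and single
pages would file an order-9 folio as a single page unless the order is passed
in or the split is left to the driver.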




* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09  8:54 ` [PATCH v3 1/7] mm: Add folio_split_unref helper Francois Dugast
@ 2026-01-09 13:19   ` David Hildenbrand (Red Hat)
  2026-01-09 13:26     ` David Hildenbrand (Red Hat)
  2026-01-09 18:37     ` Andrew Morton
  0 siblings, 2 replies; 27+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 13:19 UTC (permalink / raw)
  To: Francois Dugast, intel-xe
  Cc: dri-devel, Matthew Brost, Balbir Singh, Andrew Morton,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel, Alistair Popple

On 1/9/26 09:54, Francois Dugast wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> Add folio_split_unref helper which splits an unreferenced folio

split_unref reads like "split and unref".

You probably want to call this something like "folio_split_frozen" ?

The very definition of "frozen" is "refcount = 0 ", so you can simplify 
the documentation.

Are the folios you want to pass in there completely unused (-> free) or 
might they still be in use (e.g., migration entries point at them during 
folio split)

So I am not sure yet if this should be "folio_split_frozen()" or 
"folio_split_freed()" or sth like that.

I'm not CCed on the other patches in the series or the cover letter, so 
I don't see the context.

You should describe in this patch here in which context the function is 
supposed to be used in later commits.


> (refcount == 0) into individual pages. Intended to be called on special
> pages (e.g., device-private, DAX, etc.) when returning the folio to the
> free page pool.
> 
> Cc: Balbir Singh <balbirs@nvidia.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Barry Song <baohua@kernel.org>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> ---
>   include/linux/huge_mm.h |  1 +
>   mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 40 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a4d9f964dfde..18cb9728d8f1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -369,6 +369,7 @@ enum split_type {
>   	SPLIT_TYPE_NON_UNIFORM,
>   };
>   
> +void folio_split_unref(struct folio *folio);
>   int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>   		unsigned int new_order);
>   int folio_split_unmapped(struct folio *folio, unsigned int new_order);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 40cf59301c21..0eb9e6ad8639 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3580,6 +3580,45 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>   		ClearPageCompound(&folio->page);
>   }
>   
> +/**
> + * folio_split_unref() - split an unreferenced folio (refcount == 0)
> + * @folio: the to-be-split folio
> + *
> + * Split an unreferenced folio (refcount == 0) into individual pages.
> + * Intended to be called on special pages (e.g., device-private, DAX, etc.)
> + * when returning the folio to the free page pool.
> + */
> +void folio_split_unref(struct folio *folio)
> +{
> +	struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> +	int order, i;
> +
> +	folio->mapping = NULL;

It's unclear why you mess with the mapping. Usually, throughout a folio 
split, we populate the folio->mapping to all split folios.


-- 
Cheers

David



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 13:19   ` David Hildenbrand (Red Hat)
@ 2026-01-09 13:26     ` David Hildenbrand (Red Hat)
  2026-01-09 14:30       ` Zi Yan
  2026-01-09 18:37     ` Andrew Morton
  1 sibling, 1 reply; 27+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 13:26 UTC (permalink / raw)
  To: Francois Dugast, intel-xe
  Cc: dri-devel, Matthew Brost, Balbir Singh, Andrew Morton,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel, Alistair Popple

On 1/9/26 14:19, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 09:54, Francois Dugast wrote:
>> From: Matthew Brost <matthew.brost@intel.com>
>>
>> Add folio_split_unref helper which splits an unreferenced folio
> 
> split_unref reads like "split and unref".
> 
> You probably want to call this something like "folio_split_frozen" ?
> 
> The very definition of "frozen" is "refcount = 0 ", so you can simplify
> the documentation.
> 
> Are the folios you want to pass in there completely unused (-> free) or
> might they still be in use (e.g., migration entries point at them during
> folio split)
> 
> So I am not sure yet if this should be "folio_split_frozen()" or
> "folio_split_freed()" or sth like that.
> 
> I'm not CCed on the other patches in the series or the cover letter, so
> I don't see the context.
> 

Ah, I was CCed on #3 where we call this function on folios that are 
getting freed.

In that case it would be acceptable to initialize folio->mapping (and 
folio->index?) of the split folios. Do we also have to initialize 
folio->flags, folio->private etc?

See __split_huge_page_tail().

folio_split_freed() would likely be best, because then it is clearer 
that there is absolutely no state to copy from the large folio.

> You should describe in this patch here in which context the function is
> supposed to be used in later commits.

-- 
Cheers

David



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 13:26     ` David Hildenbrand (Red Hat)
@ 2026-01-09 14:30       ` Zi Yan
  2026-01-09 15:11         ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 14:30 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Francois Dugast, intel-xe, dri-devel, Matthew Brost,
	Balbir Singh, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel, Alistair Popple

On 9 Jan 2026, at 8:26, David Hildenbrand (Red Hat) wrote:

> On 1/9/26 14:19, David Hildenbrand (Red Hat) wrote:
>> On 1/9/26 09:54, Francois Dugast wrote:
>>> From: Matthew Brost <matthew.brost@intel.com>
>>>
>>> Add folio_split_unref helper which splits an unreferenced folio
>>
>> split_unref reads like "split and unref".
>>
>> You probably want to call this something like "folio_split_frozen" ?
>>
>> The very definition of "frozen" is "refcount = 0 ", so you can simplify
>> the documentation.
>>
>> Are the folios you want to pass in there completely unused (-> free) or
>> might they still be in use (e.g., migration entries point at them during
>> folio split)
>>
>> So I am not sure yet if this should be "folio_split_frozen()" or
>> "folio_split_freed()" or sth like that.
>>
>> I'm not CCed on the other patches in the series or the cover letter, so
>> I don't see the context.
>>
>
> Ah, I was CCed on #3 where we call this function on folios that are getting freed.
>
> In that case it would be acceptable to initialize folio->mapping (and folio->index?) of the split folios. Do we also have to initialize folio->flags, folio->private etc?
>
> See __split_huge_page_tail().
>
> folio_split_freed() would likely be best, because then it is clearer that there is absolutely no state to copy from the large folio.

Yes, basically, we do not have a reverse function of prep_compound_page() and
open-code the reverse process in free_pages_prepare(). For zone devices,
zone_device_page_init() calls prep_compound_page() to form a folio but
free_zone_device_folio() never does the reverse. FS DAX has its own
dax_folio_put() to do it. Alistair suggested coming up with a helper
function for both FS DAX and free_zone_device_folio().

Maybe free_zone_device_folio_prepare() is better. And put it in mm/memremap.c.

>
>> You should describe in this patch here in which context the function is
>> supposed to be used in later commits.
>
> -- 
> Cheers
>
> David


Best Regards,
Yan, Zi
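
The prepare/reverse asymmetry described above can be sketched in userspace.
This is a heavily simplified model, not the kernel's data layout: struct
mock_page and both mock_* functions are hypothetical names, and the only
behaviour borrowed from the patch is its comment that forming the compound
unit overwrites the per-page pgmap, which the reverse step must restore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page descriptor for illustration. */
struct mock_page {
	struct mock_page *head;	/* NULL unless this is a tail page */
	unsigned int order;	/* meaningful on the head page only */
	void *pgmap;		/* per-page owner pointer */
};

/* Form one compound unit of 2^order pages starting at pages[0]. */
static void mock_prep_compound(struct mock_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	pages[0].head = NULL;
	pages[0].order = order;
	for (i = 1; i < nr; i++) {
		pages[i].head = &pages[0];
		pages[i].order = 0;
		/* Model the patch comment: tail pages lose their pgmap. */
		pages[i].pgmap = NULL;
	}
}

/* Reverse step: return every page to a standalone order-0 state. */
static void mock_unprep_compound(struct mock_page *pages, void *pgmap)
{
	/* Read the page count before the head's order is reset below. */
	size_t i, nr = (size_t)1 << pages[0].order;

	for (i = 0; i < nr; i++) {
		pages[i].head = NULL;
		pages[i].order = 0;
		pages[i].pgmap = pgmap;	/* restore the owner on each page */
	}
}
```

Without the reverse step, the tail pages keep pointing at the old head, so a
later allocation of a different order would start from stale state.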



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 14:30       ` Zi Yan
@ 2026-01-09 15:11         ` David Hildenbrand (Red Hat)
  2026-01-09 18:38           ` Matthew Brost
  0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 15:11 UTC (permalink / raw)
  To: Zi Yan
  Cc: Francois Dugast, intel-xe, dri-devel, Matthew Brost,
	Balbir Singh, Andrew Morton, Lorenzo Stoakes, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel, Alistair Popple

On 1/9/26 15:30, Zi Yan wrote:
> On 9 Jan 2026, at 8:26, David Hildenbrand (Red Hat) wrote:
> 
>> On 1/9/26 14:19, David Hildenbrand (Red Hat) wrote:
>>> On 1/9/26 09:54, Francois Dugast wrote:
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> Add folio_split_unref helper which splits an unreferenced folio
>>>
>>> split_unref reads like "split and unref".
>>>
>>> You probably want to call this something like "folio_split_frozen" ?
>>>
>>> The very definition of "frozen" is "refcount = 0 ", so you can simplify
>>> the documentation.
>>>
>>> Are the folios you want to pass in there completely unused (-> free) or
>>> might they still be in use (e.g., migration entries point at them during
>>> folio split)
>>>
>>> So I am not sure yet if this should be "folio_split_frozen()" or
>>> "folio_split_freed()" or sth like that.
>>>
>>> I'm not CCed on the other patches in the series or the cover letter, so
>>> I don't see the context.
>>>
>>
>> Ah, I was CCed on #3 where we call this function on folios that are getting freed.
>>
>> In that case it would be acceptable to initialize folio->mapping (and folio->index?) of the split folios. Do we also have to initialize folio->flags, folio->private etc?
>>
>> See __split_huge_page_tail().
>>
>> folio_split_freed() would likely be best, because then it is clearer that there is absolutely no state to copy from the large folio.
> 
> Yes, basically, we do not have a reverse function of prep_compound_page() and
> open-code the reverse process in free_pages_prepare(). For zone devices,
> zone_device_page_init() calls prep_compound_page() to form a folio but
> free_zone_device_folio() never does the reverse. FS DAX has its own
> dax_folio_put() to do it. Alistair suggested coming up with a helper
> function for both FS DAX and free_zone_device_folio().
> 
> Maybe free_zone_device_folio_prepare() is better. And put it in mm/memremap.c.

That would be even better, if we can limit this completely to zone_device.
-- 
Cheers

David



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 11:09   ` Mika Penttilä
@ 2026-01-09 17:28     ` Zi Yan
  2026-01-09 18:26       ` Matthew Brost
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 17:28 UTC (permalink / raw)
  To: Mika Penttilä
  Cc: Francois Dugast, intel-xe, dri-devel, Matthew Brost,
	Balbir Singh, Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On 9 Jan 2026, at 6:09, Mika Penttilä wrote:

> Hi,
>
> On 1/9/26 10:54, Francois Dugast wrote:
>
>> From: Matthew Brost <matthew.brost@intel.com>
>>
>> Split device-private and coherent folios into individual pages before
>> freeing so that any order folio can be formed upon the next use of the
>> pages.
>>
>> Cc: Balbir Singh <balbirs@nvidia.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-mm@kvack.org
>> Cc: linux-cxl@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>> ---
>>  mm/memremap.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 63c6ab4fdf08..7289cdd6862f 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>  	case MEMORY_DEVICE_COHERENT:
>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>  			break;
>> +
>> +		folio_split_unref(folio);
>>  		pgmap->ops->folio_free(folio);
>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>  		break;
>
> This breaks folio_free implementations such as nouveau_dmem_folio_free,
> which check the folio order and act on it.
> Maybe add an order parameter to folio_free, or let the driver handle the split?

Passing an order parameter might be better to avoid exposing core MM internals
by asking drivers to undo compound pages.

Best Regards,
Yan, Zi



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 17:28     ` Zi Yan
@ 2026-01-09 18:26       ` Matthew Brost
  2026-01-09 18:53         ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 18:26 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
> 
> > Hi,
> >
> > On 1/9/26 10:54, Francois Dugast wrote:
> >
> >> From: Matthew Brost <matthew.brost@intel.com>
> >>
> >> Split device-private and coherent folios into individual pages before
> >> freeing so that any order folio can be formed upon the next use of the
> >> pages.
> >>
> >> Cc: Balbir Singh <balbirs@nvidia.com>
> >> Cc: Alistair Popple <apopple@nvidia.com>
> >> Cc: Zi Yan <ziy@nvidia.com>
> >> Cc: David Hildenbrand <david@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: linux-mm@kvack.org
> >> Cc: linux-cxl@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >> ---
> >>  mm/memremap.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/memremap.c b/mm/memremap.c
> >> index 63c6ab4fdf08..7289cdd6862f 100644
> >> --- a/mm/memremap.c
> >> +++ b/mm/memremap.c
> >> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
> >>  	case MEMORY_DEVICE_COHERENT:
> >>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >>  			break;
> >> +
> >> +		folio_split_unref(folio);
> >>  		pgmap->ops->folio_free(folio);
> >>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
> >>  		break;
> >
> > This breaks folio_free implementations such as nouveau_dmem_folio_free,
> > which check the folio order and act on it.
> > Maybe add an order parameter to folio_free, or let the driver handle the split?

'let the driver handle the split?' - I had considered this as an option.

> 
> Passing an order parameter might be better to avoid exposing core MM internals
> by asking drivers to undo compound pages.
> 

It looks like Nouveau tracks free folios and free pages—something Xe’s
device memory allocator (DRM Buddy) cannot do. I guess this answers my
earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
SVM with respect to reusing folios. It appears Nouveau prefers not to
split the folio, so I’m leaning toward moving this call into the
driver’s folio_free function.

Matt

> Best Regards,
> Yan, Zi



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 13:19   ` David Hildenbrand (Red Hat)
  2026-01-09 13:26     ` David Hildenbrand (Red Hat)
@ 2026-01-09 18:37     ` Andrew Morton
  2026-01-09 18:41       ` Zi Yan
  2026-01-09 18:43       ` Matthew Brost
  1 sibling, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2026-01-09 18:37 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Francois Dugast, intel-xe, dri-devel, Matthew Brost,
	Balbir Singh, Lorenzo Stoakes, Zi Yan, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel, Alistair Popple

On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:

> I'm not CCed on the other patches in the series or the cover letter, so 
> I don't see the context.

Both linux-mm and I received a random subset of this series.  Something
went wrong.



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 15:11         ` David Hildenbrand (Red Hat)
@ 2026-01-09 18:38           ` Matthew Brost
  0 siblings, 0 replies; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 18:38 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Zi Yan, Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Andrew Morton, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel, Alistair Popple

On Fri, Jan 09, 2026 at 04:11:23PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 15:30, Zi Yan wrote:
> > On 9 Jan 2026, at 8:26, David Hildenbrand (Red Hat) wrote:
> > 
> > > On 1/9/26 14:19, David Hildenbrand (Red Hat) wrote:
> > > > On 1/9/26 09:54, Francois Dugast wrote:
> > > > > From: Matthew Brost <matthew.brost@intel.com>
> > > > > 
> > > > > Add folio_split_unref helper which splits an unreferenced folio
> > > > 
> > > > split_unref reads like "split and unref".
> > > > 
> > > > You probably want to call this something like "folio_split_frozen" ?
> > > > 
> > > > The very definition of "frozen" is "refcount = 0 ", so you can simplify
> > > > the documentation.
> > > > 
> > > > Are the folios you want to pass in there completely unused (-> free) or
> > > > might they still be in use (e.g., migration entries point at them during
> > > > folio split)
> > > > 
> > > > So I am not sure yet if this should be "folio_split_frozen()" or
> > > > "folio_split_freed()" or sth like that.
> > > > 
> > > > I'm not CCed on the other patches in the series or the cover letter, so
> > > > I don't see the context.

Here is a patchwork link to the entire series:
https://patchwork.freedesktop.org/series/159119/

> > > > 
> > > 
> > > Ah, I was CCed on #3 where we call this function on folios that are getting freed.
> > > 
> > > In that case it would be acceptable to initialize folio->mapping (and folio->index?) of the split folios. Do we also have to initialize folio->flags, folio->private etc?
> > >

I lifted this code from FSDAX here:
https://elixir.bootlin.com/linux/v6.18.4/source/fs/dax.c#L394

It seemingly does everything we need for a zone_device split.

> > > See __split_huge_page_tail().
> > > 

I'm not seeing this function defined anywhere.

> > > folio_split_freed() would likely be best, because then it is clearer that there is absolutely no state to copy from the large folio.
> > 
> > Yes, basically, we do not have a reverse function of prep_compound_page() and
> > open-code the reverse process in free_pages_prepare(). For zone devices,
> > zone_device_page_init() calls prep_compound_page() to form a folio but
> > free_zone_device_folio() never does the reverse. FS DAX has its own
> > dax_folio_put() to do it. Alistair suggested coming up with a helper
> > function for both FS DAX and free_zone_device_folio().
> > 
> > Maybe free_zone_device_folio_prepare() is better. And put it in mm/memremap.c.

+1 can rename and move.

> 
> That would be even better, if we can limit this completely to zone_device.

Yes, I believe this function should only be used by device-private,
device-coherent, and fsdax folios, which are all zone_device. I can also add
a warning in this function if it is called on a non-zone_device folio.

Matt 
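
The guard mentioned above could look roughly like the following userspace
sketch. All names here (struct mock_folio, mock_warn_count,
mock_free_prepare) are hypothetical; the counter merely stands in for a
kernel-style warning, and the real check and helper name are not settled in
this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio with just the fields needed. */
struct mock_folio {
	bool zone_device;	/* true for device-private/coherent/fsdax */
	unsigned int order;
};

/* Counts how often the guard fired; stands in for a kernel warning. */
static unsigned int mock_warn_count;

/*
 * Prepare a folio for freeing. Refuses, and records a warning, when the
 * folio is not a zone-device folio; otherwise resets it to order 0.
 */
static bool mock_free_prepare(struct mock_folio *folio)
{
	if (!folio->zone_device) {
		mock_warn_count++;
		return false;	/* leave foreign folios untouched */
	}

	folio->order = 0;
	return true;
}
```

Returning early keeps the helper from touching a folio it does not own, so a
misuse shows up as a warning rather than as a silently split folio.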

> -- 
> Cheers
> 
> David



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 18:37     ` Andrew Morton
@ 2026-01-09 18:41       ` Zi Yan
  2026-01-09 18:54         ` Francois Dugast
  2026-01-09 18:43       ` Matthew Brost
  1 sibling, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 18:41 UTC (permalink / raw)
  To: Andrew Morton, Francois Dugast
  Cc: David Hildenbrand (Red Hat),
	intel-xe, dri-devel, Matthew Brost, Balbir Singh,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, linux-mm,
	linux-kernel, Alistair Popple

On 9 Jan 2026, at 13:37, Andrew Morton wrote:

> On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>
>> I'm not CCed on the other patches in the series or the cover letter, so
>> I don't see the context.
>
> Both linux-mm and I received a random subset of this series.  Something
> went wrong.

Apparently, the whole series[1] was sent to the intel-xe and dri-devel lists,
and only the mm part was sent to linux-mm and related people.

Hi Francois,

Do you mind CCing linux-mm and MM people in the whole series next time?

Thanks.

[1] https://lore.kernel.org/all/20260109085605.443316-1-francois.dugast@intel.com/

Best Regards,
Yan, Zi



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 18:37     ` Andrew Morton
  2026-01-09 18:41       ` Zi Yan
@ 2026-01-09 18:43       ` Matthew Brost
  2026-01-09 19:22         ` Andrew Morton
  1 sibling, 1 reply; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 18:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand (Red Hat),
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel, Alistair Popple

On Fri, Jan 09, 2026 at 10:37:41AM -0800, Andrew Morton wrote:
> On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
> 
> > I'm not CCed on the other patches in the series or the cover letter, so 
> > I don't see the context.
> 
> Both linux-mm and I received a random subset of this series.  Something
> went wrong.

Apologies for the list workflow issues. Here is the link to the entire
series [1].

For future reference, when we submit core MM patches in a series, should
we CC linux-mm plus MM maintainers on all patches in the series, even
those that do not touch core MM?

Matt

[1] https://patchwork.freedesktop.org/series/159119/



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 18:26       ` Matthew Brost
@ 2026-01-09 18:53         ` Zi Yan
  2026-01-09 19:08           ` Matthew Brost
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 18:53 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On 9 Jan 2026, at 13:26, Matthew Brost wrote:

> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>
>>> Hi,
>>>
>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>
>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>> Split device-private and coherent folios into individual pages before
>>>> freeing so that any order folio can be formed upon the next use of the
>>>> pages.
>>>>
>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: David Hildenbrand <david@kernel.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: linux-cxl@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>> ---
>>>>  mm/memremap.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>> --- a/mm/memremap.c
>>>> +++ b/mm/memremap.c
>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>  			break;
>>>> +
>>>> +		folio_split_unref(folio);
>>>>  		pgmap->ops->folio_free(folio);
>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>  		break;
>>>
>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>> which checks the folio order and act upon that.
>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>
> 'let the driver handle the split?' - I had consisder this as an option.
>
>>
>> Passing an order parameter might be better to avoid exposing core MM internals
>> by asking drivers to undo compound pages.
>>
>
> It looks like Nouveau tracks free folios and free pages—something Xe’s
> device memory allocator (DRM Buddy) cannot do. I guess this answers my
> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
> SVM with respect to reusing folios. It appears Nouveau prefers not to
> split the folio, so I’m leaning toward moving this call into the
> driver’s folio_free function.

No, that creates asymmetric page handling and is error prone.

In addition, looking at nouveau’s implementation in
nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
which is never split, and passes it to zone_device_folio_init(). This
is wrong, since if the folio is large, it will go through prep_compound_page()
again. The bug has not manifested because there are only order-9 large folios.
Once mTHP support is added, how is nouveau going to allocate an order-4 folio
from a free order-9 folio? Maintain a per-order free folio list and
reimplement a buddy allocator? Regardless, nouveau’s implementation
is wrong to call prep_compound_page() on a folio (already a compound page).

Best Regards,
Yan, Zi



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 18:41       ` Zi Yan
@ 2026-01-09 18:54         ` Francois Dugast
  0 siblings, 0 replies; 27+ messages in thread
From: Francois Dugast @ 2026-01-09 18:54 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, David Hildenbrand (Red Hat),
	intel-xe, dri-devel, Matthew Brost, Balbir Singh,
	Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, linux-mm,
	linux-kernel, Alistair Popple

On Fri, Jan 09, 2026 at 01:41:04PM -0500, Zi Yan wrote:
> On 9 Jan 2026, at 13:37, Andrew Morton wrote:
> 
> > On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
> >
> >> I'm not CCed on the other patches in the series or the cover letter, so
> >> I don't see the context.
> >
> > Both linux-mm and I received a random subset of this series.  Something
> > went wrong.
> 
> Apparently, the whole series[1] was sent to intel-xe and dri-devel lists
> and only mm part was sent to linux-mm and related people.
> 
> Hi Francois,
> 
> Do you mind CCing linux-mm and MM people in the whole series next time?

Sure, will do to ensure context is provided. Sorry for the confusion.

Francois

> 
> Thanks.
> 
> [1] https://lore.kernel.org/all/20260109085605.443316-1-francois.dugast@intel.com/
> 
> Best Regards,
> Yan, Zi



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 18:53         ` Zi Yan
@ 2026-01-09 19:08           ` Matthew Brost
  2026-01-09 19:23             ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 19:08 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
> 
> > On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
> >> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
> >>
> >>> Hi,
> >>>
> >>> On 1/9/26 10:54, Francois Dugast wrote:
> >>>
> >>>> From: Matthew Brost <matthew.brost@intel.com>
> >>>>
> >>>> Split device-private and coherent folios into individual pages before
> >>>> freeing so that any order folio can be formed upon the next use of the
> >>>> pages.
> >>>>
> >>>> Cc: Balbir Singh <balbirs@nvidia.com>
> >>>> Cc: Alistair Popple <apopple@nvidia.com>
> >>>> Cc: Zi Yan <ziy@nvidia.com>
> >>>> Cc: David Hildenbrand <david@kernel.org>
> >>>> Cc: Oscar Salvador <osalvador@suse.de>
> >>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>> Cc: linux-mm@kvack.org
> >>>> Cc: linux-cxl@vger.kernel.org
> >>>> Cc: linux-kernel@vger.kernel.org
> >>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >>>> ---
> >>>>  mm/memremap.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/mm/memremap.c b/mm/memremap.c
> >>>> index 63c6ab4fdf08..7289cdd6862f 100644
> >>>> --- a/mm/memremap.c
> >>>> +++ b/mm/memremap.c
> >>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
> >>>>  	case MEMORY_DEVICE_COHERENT:
> >>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >>>>  			break;
> >>>> +
> >>>> +		folio_split_unref(folio);
> >>>>  		pgmap->ops->folio_free(folio);
> >>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
> >>>>  		break;
> >>>
> >>> This breaks folio_free implementations like nouveau_dmem_folio_free
> >>> which checks the folio order and act upon that.
> >>> Maybe add an order parameter to folio_free or let the driver handle the split?
> >
> > 'let the driver handle the split?' - I had consisder this as an option.
> >
> >>
> >> Passing an order parameter might be better to avoid exposing core MM internals
> >> by asking drivers to undo compound pages.
> >>
> >
> > It looks like Nouveau tracks free folios and free pages—something Xe’s
> > device memory allocator (DRM Buddy) cannot do. I guess this answers my
> > earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
> > SVM with respect to reusing folios. It appears Nouveau prefers not to
> > split the folio, so I’m leaning toward moving this call into the
> > driver’s folio_free function.
> 
> No, that creates asymmetric page handling and is error prone.
> 

I agree it is asymmetric and symmetric is likely better.

> In addition, looking at nouveau’s implementation in
> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
> which is never split, and passes it to zone_device_folio_init(). This
> is wrong, since if the folio is large, it will go through prep_compound_page()
> again. The bug has not manifested because there is only order-9 large folios.
> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
> from a free order-9 folio? Maintain a per-order free folio list and
> reimplement a buddy allocator? Nevertheless, nouveau’s implementation

The way Nouveau handles memory allocations here looks wrong to me. It
should probably use DRM Buddy and convert a buddy block to pages rather
than tracking a free folio list and a free page list. But this is not my
driver.

> is wrong by calling prep_compound_page() on a folio (already compound page).
>

I don’t disagree that this implementation is questionable.

So what’s the suggestion here—add folio order to folio_free just to
accommodate Nouveau’s rather odd memory allocation algorithm? That
doesn’t seem right to me either.

Matt
 
> Best Regards,
> Yan, Zi



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 18:43       ` Matthew Brost
@ 2026-01-09 19:22         ` Andrew Morton
  2026-01-09 19:26           ` Liam R. Howlett
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2026-01-09 19:22 UTC (permalink / raw)
  To: Matthew Brost
  Cc: David Hildenbrand (Red Hat),
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel, Alistair Popple

On Fri, 9 Jan 2026 10:43:06 -0800 Matthew Brost <matthew.brost@intel.com> wrote:

> On Fri, Jan 09, 2026 at 10:37:41AM -0800, Andrew Morton wrote:
> > On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
> > 
> > > I'm not CCed on the other patches in the series or the cover letter, so 
> > > I don't see the context.
> > 
> > Both linux-mm and I received a random subset of this series.  Something
> > went wrong.
> 
> Apologies for the list workflow issues. Here is the link to the entire
> series [1].

Cool.  It might be best to spray it all out again, after any IT issues
are fixed.

> For future reference, when we submit core MM patches in a series, should
> we CC linux-mm plus MM maintainers on all patches in the series, even
> those that do not touch core MM?

I think that's best.  I personally don't like seeing just a subset,
although it's trivial to go find the rest on the list.  I've heard
others state that preference, but I don't know where the consensus lies.




* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 19:08           ` Matthew Brost
@ 2026-01-09 19:23             ` Zi Yan
  2026-01-09 20:03               ` Matthew Brost
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 19:23 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On 9 Jan 2026, at 14:08, Matthew Brost wrote:

> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>
>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>
>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>
>>>>>> Split device-private and coherent folios into individual pages before
>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>> pages.
>>>>>>
>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>> Cc: linux-mm@kvack.org
>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>> ---
>>>>>>  mm/memremap.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>> --- a/mm/memremap.c
>>>>>> +++ b/mm/memremap.c
>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>  			break;
>>>>>> +
>>>>>> +		folio_split_unref(folio);
>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>  		break;
>>>>>
>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>> which checks the folio order and act upon that.
>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>
>>> 'let the driver handle the split?' - I had consisder this as an option.
>>>
>>>>
>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>> by asking drivers to undo compound pages.
>>>>
>>>
>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>> split the folio, so I’m leaning toward moving this call into the
>>> driver’s folio_free function.
>>
>> No, that creates asymmetric page handling and is error prone.
>>
>
> I agree it is asymmetric and symmetric is likely better.
>
>> In addition, looking at nouveau’s implementation in
>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>> which is never split, and passes it to zone_device_folio_init(). This
>> is wrong, since if the folio is large, it will go through prep_compound_page()
>> again. The bug has not manifested because there is only order-9 large folios.
>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>> from a free order-9 folio? Maintain a per-order free folio list and
>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>
> The way Nouveau handles memory allocations here looks wrong to me—it
> should probably use DRM Buddy and convert a block buddy to pages rather
> than tracking a free folio list and free page list. But this is not my
> driver.
>
>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>
>
> I don’t disagree that this implementation is questionable.
>
> So what’s the suggestion here—add folio order to folio_free just to
> accommodate Nouveau’s rather odd memory allocation algorithm? That
> doesn’t seem right to me either.

Splitting the folio in free_zone_device_folio() and passing the folio order
to folio_free() makes sense to me, since after the split, the folio passed
to folio_free() carries no order information: it is just the former
head page, and the remaining 511 pages are free. How does the Intel Xe
driver handle it without knowing the folio order?

Do we really need the order info in ->folio_free() if the folio is split
in free_zone_device_folio()? free_zone_device_folio() should just call
->folio_free() 2^order times to free the individual pages.


Best Regards,
Yan, Zi



* Re: [PATCH v3 1/7] mm: Add folio_split_unref helper
  2026-01-09 19:22         ` Andrew Morton
@ 2026-01-09 19:26           ` Liam R. Howlett
  0 siblings, 0 replies; 27+ messages in thread
From: Liam R. Howlett @ 2026-01-09 19:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Brost, David Hildenbrand (Red Hat),
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Lorenzo Stoakes, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, linux-mm, linux-kernel,
	Alistair Popple

* Andrew Morton <akpm@linux-foundation.org> [260109 14:23]:
> On Fri, 9 Jan 2026 10:43:06 -0800 Matthew Brost <matthew.brost@intel.com> wrote:
> 
> > On Fri, Jan 09, 2026 at 10:37:41AM -0800, Andrew Morton wrote:
> > > On Fri, 9 Jan 2026 14:19:16 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
> > > 
> > > > I'm not CCed on the other patches in the series or the cover letter, so 
> > > > I don't see the context.
> > > 
> > > Both linux-mm and I received a random subset of this series.  Something
> > > went wrong.
> > 
> > Apologies for the list workflow issues. Here is the link to the entire
> > series [1].
> 
> Cool.  It might be best to spray it all out again, after any IT issues
> are fixed.
> 
> > For future reference, when we submit core MM patches in a series, should
> > we CC linux-mm plus MM maintainers on all patches in the series, even
> > those that do not touch core MM?
> 
> I think that's best.  I personally don't like seeing just a subset,
> although it's trivial to go find the rest on the list.  I've heard
> others state that preference, I don't know where the consensus lies.

If you include the cover letter to everyone, then b4 can get the full
set with minimal effort.  It's probably worth telling people the overall
goal, if any of the patches are going to them.



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 19:23             ` Zi Yan
@ 2026-01-09 20:03               ` Matthew Brost
  2026-01-09 20:15                 ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 20:03 UTC (permalink / raw)
  To: Zi Yan
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
> 
> > On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
> >> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
> >>
> >>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
> >>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> On 1/9/26 10:54, Francois Dugast wrote:
> >>>>>
> >>>>>> From: Matthew Brost <matthew.brost@intel.com>
> >>>>>>
> >>>>>> Split device-private and coherent folios into individual pages before
> >>>>>> freeing so that any order folio can be formed upon the next use of the
> >>>>>> pages.
> >>>>>>
> >>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
> >>>>>> Cc: Alistair Popple <apopple@nvidia.com>
> >>>>>> Cc: Zi Yan <ziy@nvidia.com>
> >>>>>> Cc: David Hildenbrand <david@kernel.org>
> >>>>>> Cc: Oscar Salvador <osalvador@suse.de>
> >>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>>>> Cc: linux-mm@kvack.org
> >>>>>> Cc: linux-cxl@vger.kernel.org
> >>>>>> Cc: linux-kernel@vger.kernel.org
> >>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >>>>>> ---
> >>>>>>  mm/memremap.c | 2 ++
> >>>>>>  1 file changed, 2 insertions(+)
> >>>>>>
> >>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
> >>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
> >>>>>> --- a/mm/memremap.c
> >>>>>> +++ b/mm/memremap.c
> >>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
> >>>>>>  	case MEMORY_DEVICE_COHERENT:
> >>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >>>>>>  			break;
> >>>>>> +
> >>>>>> +		folio_split_unref(folio);
> >>>>>>  		pgmap->ops->folio_free(folio);
> >>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
> >>>>>>  		break;
> >>>>>
> >>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
> >>>>> which checks the folio order and act upon that.
> >>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
> >>>
> >>> 'let the driver handle the split?' - I had consisder this as an option.
> >>>
> >>>>
> >>>> Passing an order parameter might be better to avoid exposing core MM internals
> >>>> by asking drivers to undo compound pages.
> >>>>
> >>>
> >>> It looks like Nouveau tracks free folios and free pages—something Xe’s
> >>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
> >>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
> >>> SVM with respect to reusing folios. It appears Nouveau prefers not to
> >>> split the folio, so I’m leaning toward moving this call into the
> >>> driver’s folio_free function.
> >>
> >> No, that creates asymmetric page handling and is error prone.
> >>
> >
> > I agree it is asymmetric and symmetric is likely better.
> >
> >> In addition, looking at nouveau’s implementation in
> >> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
> >> which is never split, and passes it to zone_device_folio_init(). This
> >> is wrong, since if the folio is large, it will go through prep_compound_page()
> >> again. The bug has not manifested because there is only order-9 large folios.
> >> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
> >> from a free order-9 folio? Maintain a per-order free folio list and
> >> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
> >
> > The way Nouveau handles memory allocations here looks wrong to me—it
> > should probably use DRM Buddy and convert a block buddy to pages rather
> > than tracking a free folio list and free page list. But this is not my
> > driver.
> >
> >> is wrong by calling prep_compound_page() on a folio (already compound page).
> >>
> >
> > I don’t disagree that this implementation is questionable.
> >
> > So what’s the suggestion here—add folio order to folio_free just to
> > accommodate Nouveau’s rather odd memory allocation algorithm? That
> > doesn’t seem right to me either.
> 
> Splitting the folio in free_zone_device_folio() and passing folio order
> to folio_free() make sense to me, since after the split, the folio passed

If this is the consensus / direction, I can do this, but it is a
tree-wide change.

I do have another question for everyone here: do we think this splitting
implementation should be considered a fix so that it can go into 6.19?

> to folio_free() contains no order information, but just the used-to-be
> head page and the remaining 511 pages are free. How does Intel Xe driver
> handle it without knowing folio order?
> 

It’s a bit convoluted, but folio/page->zone_device_data points to a
reference-counted object in GPU SVM. When the object’s reference count
drops to zero, we call back into the driver layer to release the memory.
In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
is then released. In other words, our original allocation size
determines the granularity at which we free the backing store.

> Do we really need the order info in ->folio_free() if the folio is split
> in free_zone_device_folio()? free_zone_device_folio() should just call
> ->folio_free() 2^order times to free individual page.
> 

No. If it’s a higher-order folio, say a 2MB folio, we have one
reference to our GPU SVM object, so we can free the backing in a single
->folio_free() call.

Now, if that folio gets split at some point into 4KB pages, then we’d
have 512 references to this object, set up in the ->folio_split() calls.
We’d then expect 512 ->folio_free() calls. The same applies if, for
whatever reason, we can’t create a 2MB device page during a 2MB
migration and need to create 512 4KB pages instead: we’d again have 512
references to our GPU SVM object.
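
To illustrate the counting, here is a toy userspace model. It is not the
GPU SVM code: every toy_ name is made up for illustration, and the only
assumption is that each device page (or unsplit folio) holds exactly one
reference on the backing object through its zone_device_data pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the refcounted GPU SVM backing object. */
struct toy_backing {
	unsigned int refcount;
	bool released;		/* set once the backing store is given back */
};

/* Hypothetical stand-in for a device page or an unsplit folio. */
struct toy_dev_page {
	struct toy_backing *zone_device_data;
};

static void toy_backing_get(struct toy_backing *b)
{
	b->refcount++;
}

static void toy_backing_put(struct toy_backing *b)
{
	assert(b->refcount > 0);
	/* The last reference releases the whole allocation in one go. */
	if (--b->refcount == 0)
		b->released = true;
}

/* Each page (or unsplit folio) takes one reference when it is set up. */
static void toy_page_init(struct toy_dev_page *p, struct toy_backing *b)
{
	toy_backing_get(b);
	p->zone_device_data = b;
}

/* Models one ->folio_free() call: drop the reference this page holds. */
static void toy_folio_free(struct toy_dev_page *p)
{
	toy_backing_put(p->zone_device_data);
	p->zone_device_data = NULL;
}
```

With one unsplit 2MB folio, a single toy_folio_free() call releases the
backing. With 512 4KB pages, the backing is only released on the 512th
call, which is why calling ->folio_free() 2^order times on a folio that
only ever held one reference would over-release in this model.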

Matt

> 
> Best Regards,
> Yan, Zi



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 20:03               ` Matthew Brost
@ 2026-01-09 20:15                 ` Zi Yan
  2026-01-09 21:34                   ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 20:15 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Balbir Singh,
	Alistair Popple, David Hildenbrand, Oscar Salvador,
	Andrew Morton, linux-mm, linux-cxl, linux-kernel

On 9 Jan 2026, at 15:03, Matthew Brost wrote:

> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>
>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>
>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>
>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>
>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>> pages.
>>>>>>>>
>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>> ---
>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>> --- a/mm/memremap.c
>>>>>>>> +++ b/mm/memremap.c
>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>  			break;
>>>>>>>> +
>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>  		break;
>>>>>>>
>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>> which checks the folio order and act upon that.
>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>
>>>>> 'let the driver handle the split?' - I had consisder this as an option.
>>>>>
>>>>>>
>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>> by asking drivers to undo compound pages.
>>>>>>
>>>>>
>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>> driver’s folio_free function.
>>>>
>>>> No, that creates asymmetric page handling and is error prone.
>>>>
>>>
>>> I agree it is asymmetric and symmetric is likely better.
>>>
>>>> In addition, looking at nouveau’s implementation in
>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>
>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>> than tracking a free folio list and free page list. But this is not my
>>> driver.
>>>
>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>
>>>
>>> I don’t disagree that this implementation is questionable.
>>>
>>> So what’s the suggestion here—add folio order to folio_free just to
>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>> doesn’t seem right to me either.
>>
>> Splitting the folio in free_zone_device_folio() and passing folio order
>> to folio_free() make sense to me, since after the split, the folio passed
>
> If this is concensous / direction - I can do this but a tree wide
> change.
>
> I do have another question for everyone here - do we think this spliting
> implementation should be considered a Fixes so this can go into 6.19?

IMHO, this should be a fix, since it is wrong to call prep_compound_page()
on a large folio. IIUC, this only seems to affect nouveau for now, so I
will let them decide.

>
>> to folio_free() contains no order information, but just the used-to-be
>> head page and the remaining 511 pages are free. How does Intel Xe driver
>> handle it without knowing folio order?
>>
>
> It’s a bit convoluted, but folio/page->zone_device_data points to a
> reference-counted object in GPU SVM. When the object’s reference count
> drops to zero, we callback into the driver layer to release the memory.
> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
> is then released. If it’s not clear, our original allocation size
> determines the granularity at which we free the backing store.
>
>> Do we really need the order info in ->folio_free() if the folio is split
>> in free_zone_device_folio()? free_zone_device_folio() should just call
>> ->folio_free() 2^order times to free individual page.
>>
>
> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
> reference to our GPU SVM object, so we can free the backing in a single
> ->folio_free call.
>
> Now, if that folio gets split at some point into 4KB pages, then we’d
> have 512 references to this object set up in the ->folio_split calls.
> We’d then expect 512 ->folio_free() calls. Same case here: if, for
> whatever reason, we can’t create a 2MB device page during a 2MB
> migration and need to create 512 4KB pages so we'd have 512 references
> to our GPU SVM object.

Thank you for the explanation. Adding folio order to ->folio_free() makes
sense to me now.
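
For reference, a minimal userspace sketch of that direction, assuming the
callback simply gains an order argument. Everything prefixed toy_ is
hypothetical and only models the ordering (sample the order, split, then
call the driver); it is neither the kernel API nor the eventual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct folio; NOT the kernel structure. */
struct toy_folio {
	unsigned int order;	/* 0 once the folio has been split */
	unsigned long pfn;
};

struct toy_pgmap_ops {
	/* Hypothetical signature: the original order is passed explicitly. */
	void (*folio_free)(struct toy_folio *folio, unsigned int order);
};

static unsigned long toy_freed_pages;
static unsigned int toy_last_order;

/* Toy driver callback: frees 2^order pages based on the argument alone. */
static void toy_driver_folio_free(struct toy_folio *folio, unsigned int order)
{
	(void)folio;
	toy_last_order = order;
	toy_freed_pages += 1UL << order;
}

/* Models the split: afterwards the folio no longer records its order. */
static void toy_folio_split_unref(struct toy_folio *folio)
{
	folio->order = 0;
}

static void toy_free_zone_device_folio(const struct toy_pgmap_ops *ops,
				       struct toy_folio *folio)
{
	/* The order must be read before the split destroys it. */
	unsigned int order = folio->order;

	toy_folio_split_unref(folio);
	ops->folio_free(folio, order);
}
```

The point of the sketch is only that the driver can still act on the
original order (as nouveau_dmem_folio_free reportedly does) even though
the folio it receives has already been split by core MM.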

Best Regards,
Yan, Zi



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 20:15                 ` Zi Yan
@ 2026-01-09 21:34                   ` Balbir Singh
  2026-01-09 21:43                     ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2026-01-09 21:34 UTC (permalink / raw)
  To: Zi Yan, Matthew Brost
  Cc: Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On 1/10/26 06:15, Zi Yan wrote:
> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
> 
>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>
>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>
>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>
>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>> pages.
>>>>>>>>>
>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>> ---
>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>  			break;
>>>>>>>>> +
>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>  		break;
>>>>>>>>
>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>> which checks the folio order and act upon that.
>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>
>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>
>>>>>>>
>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>> by asking drivers to undo compound pages.
>>>>>>>
>>>>>>
>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>> driver’s folio_free function.
>>>>>
>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>
>>>>
>>>> I agree it is asymmetric and symmetric is likely better.
>>>>
>>>>> In addition, looking at nouveau’s implementation in
>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>
>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>> than tracking a free folio list and free page list. But this is not my
>>>> driver.
>>>>
>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>
>>>>
>>>> I don’t disagree that this implementation is questionable.
>>>>
>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>> doesn’t seem right to me either.
>>>
>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>> to folio_free() make sense to me, since after the split, the folio passed
>>
>> If this is consensus / direction - I can do this but a tree wide
>> change.
>>
>> I do have another question for everyone here - do we think this splitting
>> implementation should be considered a Fixes so this can go into 6.19?
> 
> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
> on a large folio. IIUC this seems to only affect nouveau now, I will let
> them to decide.
> 

Agreed, free_zone_device_folio() needs to split the folio on put.


>>
>>> to folio_free() contains no order information, but just the used-to-be
>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>> handle it without knowing folio order?
>>>
>>
>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>> reference-counted object in GPU SVM. When the object’s reference count
>> drops to zero, we callback into the driver layer to release the memory.
>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>> is then released. If it’s not clear, our original allocation size
>> determines the granularity at which we free the backing store.
>>
>>> Do we really need the order info in ->folio_free() if the folio is split
>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>> ->folio_free() 2^order times to free individual page.
>>>
>>
>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>> reference to our GPU SVM object, so we can free the backing in a single
>> ->folio_free call.
>>
>> Now, if that folio gets split at some point into 4KB pages, then we’d
>> have 512 references to this object set up in the ->folio_split calls.
>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>> whatever reason, we can’t create a 2MB device page during a 2MB
>> migration and need to create 512 4KB pages so we'd have 512 references
>> to our GPU SVM object.
> 

I still don't follow why folio_order() does not capture the order of the folio.
If the folio is split, we should now have 512 split folios for a THP.

> Thank you for the explanation. Adding folio order to ->folio_free() makes
> sense to me now.
> 


Balbir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 21:34                   ` Balbir Singh
@ 2026-01-09 21:43                     ` Zi Yan
  2026-01-09 22:11                       ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 21:43 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Matthew Brost, Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On 9 Jan 2026, at 16:34, Balbir Singh wrote:

> On 1/10/26 06:15, Zi Yan wrote:
>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>
>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>
>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>
>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>
>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>
>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>> pages.
>>>>>>>>>>
>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>  			break;
>>>>>>>>>> +
>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>  		break;
>>>>>>>>>
>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>> which checks the folio order and act upon that.
>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>>
>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>
>>>>>>>>
>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>
>>>>>>>
>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>> driver’s folio_free function.
>>>>>>
>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>
>>>>>
>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>
>>>>>> In addition, looking at nouveau’s implementation in
>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>
>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>> than tracking a free folio list and free page list. But this is not my
>>>>> driver.
>>>>>
>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>>
>>>>>
>>>>> I don’t disagree that this implementation is questionable.
>>>>>
>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>> doesn’t seem right to me either.
>>>>
>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>> to folio_free() make sense to me, since after the split, the folio passed
>>>
>>> If this is consensus / direction - I can do this but a tree wide
>>> change.
>>>
>>> I do have another question for everyone here - do we think this splitting
>>> implementation should be considered a Fixes so this can go into 6.19?
>>
>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>> on a large folio. IIUC this seems to only affect nouveau now, I will let
>> them to decide.
>>
>
> Agreed, free_zone_device_folio() needs to split the folio on put.
>
>
>>>
>>>> to folio_free() contains no order information, but just the used-to-be
>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>> handle it without knowing folio order?
>>>>
>>>
>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>> reference-counted object in GPU SVM. When the object’s reference count
>>> drops to zero, we callback into the driver layer to release the memory.
>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>> is then released. If it’s not clear, our original allocation size
>>> determines the granularity at which we free the backing store.
>>>
>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>> ->folio_free() 2^order times to free individual page.
>>>>
>>>
>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>> reference to our GPU SVM object, so we can free the backing in a single
>>> ->folio_free call.
>>>
>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>> have 512 references to this object set up in the ->folio_split calls.
>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>> migration and need to create 512 4KB pages so we'd have 512 references
>>> to our GPU SVM object.
>>
>
> I still don't follow why the folio_order does not capture the order of the folio.
> If the folio is split, we should now have 512 split folios for THP

folio_order() should return 0 after the folio is split.

As for the number of after-split folios, it is 512 in the current code base,
since THP is only 2MB in zone devices, but that is not future-proof if mTHP
support is added. Assuming 512 also causes confusion in core MM, where a folio
can have any order.
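
To make the point concrete, here is a minimal userspace sketch, not kernel
code. Every model_* name is hypothetical, and the extra order argument on the
free callback is only the interface being discussed in this thread, not an
existing one. It shows why the order has to be read before the split if the
driver is supposed to see it:

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct folio; not the kernel type. */
struct model_folio {
	unsigned int order;	/* log2 of the number of pages in the folio */
	unsigned int refcount;	/* must be 0 before the split in this model */
};

/* Records what the hypothetical driver callback was told. */
struct model_free_log {
	unsigned int order_seen;
	unsigned long nr_pages_seen;
};

static unsigned int model_folio_order(const struct model_folio *folio)
{
	return folio->order;
}

/* Models splitting an unreferenced folio: the order information is lost. */
static void model_folio_split_unref(struct model_folio *folio)
{
	assert(folio->refcount == 0);
	folio->order = 0;
}

/* Hypothetical driver callback that receives the order explicitly. */
static void model_driver_folio_free(struct model_folio *folio,
				    unsigned int order,
				    struct model_free_log *log)
{
	(void)folio;	/* after the split, the folio itself carries no order */
	log->order_seen = order;
	log->nr_pages_seen = 1UL << order;
}

/* Models the free path: capture the order, split, then call the driver. */
static void model_free_zone_device_folio(struct model_folio *folio,
					 struct model_free_log *log)
{
	unsigned int order = model_folio_order(folio);	/* read before split */

	model_folio_split_unref(folio);
	model_driver_folio_free(folio, order, log);
}
```

In this model, a driver that called model_folio_order() inside its callback
would always get 0, which is the situation Mika pointed out for
nouveau_dmem_folio_free.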


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 21:43                     ` Zi Yan
@ 2026-01-09 22:11                       ` Balbir Singh
  2026-01-09 22:14                         ` Zi Yan
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2026-01-09 22:11 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Brost, Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On 1/10/26 07:43, Zi Yan wrote:
> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
> 
>> On 1/10/26 06:15, Zi Yan wrote:
>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>
>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>
>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>
>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>> pages.
>>>>>>>>>>>
>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>  			break;
>>>>>>>>>>> +
>>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>  		break;
>>>>>>>>>>
>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>>> which checks the folio order and act upon that.
>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>>>
>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>> driver’s folio_free function.
>>>>>>>
>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>
>>>>>>
>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>
>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>
>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>> driver.
>>>>>>
>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>>>
>>>>>>
>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>
>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>> doesn’t seem right to me either.
>>>>>
>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>> to folio_free() make sense to me, since after the split, the folio passed
>>>>
>>>> If this is consensus / direction - I can do this but a tree wide
>>>> change.
>>>>
>>>> I do have another question for everyone here - do we think this splitting
>>>> implementation should be considered a Fixes so this can go into 6.19?
>>>
>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>> on a large folio. IIUC this seems to only affect nouveau now, I will let
>>> them to decide.
>>>
>>
>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>
>>
>>>>
>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>> handle it without knowing folio order?
>>>>>
>>>>
>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>> drops to zero, we callback into the driver layer to release the memory.
>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>> is then released. If it’s not clear, our original allocation size
>>>> determines the granularity at which we free the backing store.
>>>>
>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>> ->folio_free() 2^order times to free individual page.
>>>>>
>>>>
>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>> ->folio_free call.
>>>>
>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>> have 512 references to this object set up in the ->folio_split calls.
>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>> migration and need to create 512 4KB pages so we'd have 512 references
>>>> to our GPU SVM object.
>>>
>>
>> I still don't follow why the folio_order does not capture the order of the folio.
>> If the folio is split, we should now have 512 split folios for THP
> 
> folio_order() should return 0 after the folio is split.
> 
> In terms of the number of after-split folios, it is 512 for current code base
> since THP is only 2MB in zone devices, but not future proof if mTHP support
> is added. It also causes confusion in core MM, where folio can have
> all kinds of orders.
> 
> 

I looked at folio_split_unref() and see that there is no driver
callback during the split. Patch 3 controls the order of the two calls:

+		folio_split_unref(folio);
 		pgmap->ops->folio_free(folio);

@Matthew, is there a reason to do the split prior to free? pgmap->ops->folio_free(folio)
shouldn't impact the folio itself, so could the backing memory be freed first and
the folio split afterwards?


Balbir


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 22:11                       ` Balbir Singh
@ 2026-01-09 22:14                         ` Zi Yan
  2026-01-09 22:36                           ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: Zi Yan @ 2026-01-09 22:14 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Matthew Brost, Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On 9 Jan 2026, at 17:11, Balbir Singh wrote:

> On 1/10/26 07:43, Zi Yan wrote:
>> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
>>
>>> On 1/10/26 06:15, Zi Yan wrote:
>>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>>
>>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>>
>>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>>
>>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>>> pages.
>>>>>>>>>>>>
>>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>>  			break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>>  		break;
>>>>>>>>>>>
>>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>>>> which checks the folio order and act upon that.
>>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>>>>
>>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>>> driver’s folio_free function.
>>>>>>>>
>>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>>
>>>>>>>
>>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>>
>>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>>
>>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>>> driver.
>>>>>>>
>>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>>>>
>>>>>>>
>>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>>
>>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>>> doesn’t seem right to me either.
>>>>>>
>>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>>> to folio_free() make sense to me, since after the split, the folio passed
>>>>>
>>>>> If this is consensus / direction - I can do this but a tree wide
>>>>> change.
>>>>>
>>>>> I do have another question for everyone here - do we think this splitting
>>>>> implementation should be considered a Fixes so this can go into 6.19?
>>>>
>>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>>> on a large folio. IIUC this seems to only affect nouveau now, I will let
>>>> them to decide.
>>>>
>>>
>>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>>
>>>
>>>>>
>>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>>> handle it without knowing folio order?
>>>>>>
>>>>>
>>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>>> drops to zero, we callback into the driver layer to release the memory.
>>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>>> is then released. If it’s not clear, our original allocation size
>>>>> determines the granularity at which we free the backing store.
>>>>>
>>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>>> ->folio_free() 2^order times to free individual page.
>>>>>>
>>>>>
>>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>>> ->folio_free call.
>>>>>
>>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>>> have 512 references to this object set up in the ->folio_split calls.
>>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>>> migration and need to create 512 4KB pages so we'd have 512 references
>>>>> to our GPU SVM object.
>>>>
>>>
>>> I still don't follow why the folio_order does not capture the order of the folio.
>>> If the folio is split, we should now have 512 split folios for THP
>>
>> folio_order() should return 0 after the folio is split.
>>
>> In terms of the number of after-split folios, it is 512 for current code base
>> since THP is only 2MB in zone devices, but not future proof if mTHP support
>> is added. It also causes confusion in core MM, where folio can have
>> all kinds of orders.
>>
>>
>
> I see that folio_split_unref() to see that there is no driver
> callback during the split. Patch 3 controls the order of
>
> +		folio_split_unref(folio);
>  		pgmap->ops->folio_free(folio);
>
> @Matthew, is there a reason to do the split prior to free? pgmap->ops->folio_free(folio)
> shouldn't impact the folio itself, the backing memory can be freed and then the
> folio split?

Quoting Matthew from [1]:

... this step must be done before calling folio_free and include a barrier,
as the page can be immediately reallocated.

[1] https://lore.kernel.org/all/aV8TuK5255NXd2PS@lstrano-desk.jf.intel.com/
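
A rough single-threaded sketch of the ordering concern, in userspace C with
hypothetical model_* names. It assumes a driver whose free callback hands the
page straight back to something that reuses it immediately. It does not model
the barrier Matthew mentions, only the order of the two steps:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio; not the kernel type. */
struct model_folio {
	unsigned int order;
};

/* Order observed by a hypothetical allocator that reuses the page at once. */
static unsigned int model_order_seen_on_reuse;

/* Models the split: the folio becomes order-0. */
static void model_split(struct model_folio *folio)
{
	folio->order = 0;
}

/*
 * Hypothetical driver free callback. The page is assumed to be picked up
 * again as soon as it is freed, so whatever state the folio has right now
 * is what the next user sees.
 */
static void model_driver_free(struct model_folio *folio)
{
	model_order_seen_on_reuse = folio->order;
}

/* Runs the two steps in either order and reports what the reuser saw. */
static unsigned int model_release(struct model_folio *folio, bool split_first)
{
	if (split_first) {
		model_split(folio);
		model_driver_free(folio);
	} else {
		model_driver_free(folio);
		model_split(folio);	/* too late: the page is already reused */
	}
	return model_order_seen_on_reuse;
}
```

With split_first the reuser sees an order-0 page. With the steps swapped it
sees the stale order-9 state, and the late split would then modify a page that
already belongs to someone else.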

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 22:14                         ` Zi Yan
@ 2026-01-09 22:36                           ` Balbir Singh
  2026-01-09 23:15                             ` Matthew Brost
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2026-01-09 22:36 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Brost, Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On 1/10/26 08:14, Zi Yan wrote:
> On 9 Jan 2026, at 17:11, Balbir Singh wrote:
> 
>> On 1/10/26 07:43, Zi Yan wrote:
>>> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
>>>
>>>> On 1/10/26 06:15, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>>>
>>>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>>>> pages.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>>>  			break;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>>>  		break;
>>>>>>>>>>>>
>>>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>>>>> which checks the folio order and act upon that.
>>>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>>>>>
>>>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>>>> driver’s folio_free function.
>>>>>>>>>
>>>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>>>
>>>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>>>>> again. The bug has not manifested because there is only order-9 large folios.
>>>>>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
>>>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>>>
>>>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>>>> driver.
>>>>>>>>
>>>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>>>
>>>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>>>> doesn’t seem right to me either.
>>>>>>>
>>>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>>>> to folio_free() make sense to me, since after the split, the folio passed
>>>>>>
>>>>>> If this is the consensus / direction - I can do this, but it is a tree-wide
>>>>>> change.
>>>>>>
>>>>>> I do have another question for everyone here - do we think this splitting
>>>>>> implementation should be considered a Fixes so this can go into 6.19?
>>>>>
>>>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>>>> on a large folio. IIUC this seems to only affect nouveau now, I will let
>>>>> them to decide.
>>>>>
>>>>
>>>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>>>
>>>>
>>>>>>
>>>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>>>> handle it without knowing folio order?
>>>>>>>
>>>>>>
>>>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>>>> drops to zero, we callback into the driver layer to release the memory.
>>>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>>>> is then released. If it’s not clear, our original allocation size
>>>>>> determines the granularity at which we free the backing store.
>>>>>>
>>>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>>>> ->folio_free() 2^order times to free individual page.
>>>>>>>
>>>>>>
>>>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>>>> ->folio_free call.
>>>>>>
>>>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>>>> have 512 references to this object set up in the ->folio_split calls.
>>>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>>>> migration and need to create 512 4KB pages so we'd have 512 references
>>>>>> to our GPU SVM object.
>>>>>
>>>>
>>>> I still don't follow why the folio_order does not capture the order of the folio.
>>>> If the folio is split, we should now have 512 split folios for THP
>>>
>>> folio_order() should return 0 after the folio is split.
>>>
>>> In terms of the number of after-split folios, it is 512 for current code base
>>> since THP is only 2MB in zone devices, but not future proof if mTHP support
>>> is added. It also causes confusion in core MM, where folio can have
>>> all kinds of orders.
>>>
>>>
>>
>> I looked at folio_split_unref() and see that there is no driver
>> callback during the split. Patch 3 controls the order of
>>
>> +		folio_split_unref(folio);
>>  		pgmap->ops->folio_free(folio);
>>
>> @Matthew, is there a reason to do the split prior to free? pgmap->ops->folio_free(folio)
>> shouldn't impact the folio itself, the backing memory can be freed and then the
>> folio split?
> 
> Quote Matthew from [1]:
> 
> ... this step must be done before calling folio_free and include a barrier,
> as the page can be immediately reallocated.
> 
> [1] https://lore.kernel.org/all/aV8TuK5255NXd2PS@lstrano-desk.jf.intel.com/
> 

Thanks, I am not a TTM/BO expert

So that leaves us with

1. Pass the order to folio_free()
2. Consider calling folio_free() callback for each split folio during folio_split_unref(),
   but that means the driver needs to consolidate all the relevant information

#1 works, but the information there is stale, in the sense that we are passing in the
old order information. The order is still useful for the driver to know the size of its
backing allocation.
#2 should work too, but it means 1 << PMD_ORDER frees (512 for a 2MB THP) as opposed to 1
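
For illustration only, the two options above can be compared in a small
user-space model. This is a sketch, not kernel code: every type and function
name below (model_page, model_stats, model_free_option1, model_free_option2,
model_run) is hypothetical, and the "split" is reduced to resetting a per-page
order field. It only shows how many driver callbacks each option implies for
an order-9 folio.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: these names are hypothetical, not kernel API. */
#define MODEL_PMD_ORDER 9	/* 2MB THP with 4KB base pages */

struct model_page {
	unsigned int order;	/* order kept in the head page; 0 once split */
};

/* How often the modelled driver callback ran and how many pages it covered. */
struct model_stats {
	unsigned long calls;
	unsigned long pages_freed;
};

/* Stand-in for a driver ->folio_free() that is told the order explicitly. */
static void model_driver_free(struct model_stats *st, struct model_page *page,
			      unsigned int order)
{
	(void)page;
	st->calls++;
	st->pages_freed += 1UL << order;
}

/* Stand-in for the split: every page becomes an independent order-0 page. */
static void model_split(struct model_page *pages, unsigned int order)
{
	for (unsigned long i = 0; i < (1UL << order); i++)
		pages[i].order = 0;
}

/* Option #1: save the order, split, then make one callback with the old order. */
static void model_free_option1(struct model_stats *st, struct model_page *pages)
{
	unsigned int order = pages[0].order;	/* must be read before the split */

	model_split(pages, order);
	model_driver_free(st, &pages[0], order);
}

/* Option #2: split, then make one order-0 callback per resulting page. */
static void model_free_option2(struct model_stats *st, struct model_page *pages)
{
	unsigned int order = pages[0].order;

	model_split(pages, order);
	for (unsigned long i = 0; i < (1UL << order); i++)
		model_driver_free(st, &pages[i], 0);
}

/* Runs one option on a fresh order-9 folio and reports the callback statistics. */
struct model_stats model_run(int option)
{
	static struct model_page pages[1UL << MODEL_PMD_ORDER];
	struct model_stats st = { 0, 0 };

	pages[0].order = MODEL_PMD_ORDER;
	if (option == 1)
		model_free_option1(&st, pages);
	else
		model_free_option2(&st, pages);
	return st;
}
```

Calling model_run(1) and model_run(2) covers the same number of pages in both
cases; the difference is one callback carrying the saved order versus
1 << MODEL_PMD_ORDER order-0 callbacks that the driver would have to
consolidate itself.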

Balbir



* Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
  2026-01-09 22:36                           ` Balbir Singh
@ 2026-01-09 23:15                             ` Matthew Brost
  0 siblings, 0 replies; 27+ messages in thread
From: Matthew Brost @ 2026-01-09 23:15 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Zi Yan, Mika Penttilä,
	Francois Dugast, intel-xe, dri-devel, Alistair Popple,
	David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-cxl, linux-kernel

On Sat, Jan 10, 2026 at 09:36:04AM +1100, Balbir Singh wrote:
> On 1/10/26 08:14, Zi Yan wrote:
> > On 9 Jan 2026, at 17:11, Balbir Singh wrote:
> > 
> >> On 1/10/26 07:43, Zi Yan wrote:
> >>> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
> >>>
> >>>> On 1/10/26 06:15, Zi Yan wrote:
> >>>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
> >>>>>
> >>>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
> >>>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
> >>>>>>>
> >>>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
> >>>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
> >>>>>>>>>
> >>>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
> >>>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Split device-private and coherent folios into individual pages before
> >>>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
> >>>>>>>>>>>>> pages.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
> >>>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
> >>>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
> >>>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
> >>>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
> >>>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>>>>>>>>>>> Cc: linux-mm@kvack.org
> >>>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
> >>>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
> >>>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>>  mm/memremap.c | 2 ++
> >>>>>>>>>>>>>  1 file changed, 2 insertions(+)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
> >>>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
> >>>>>>>>>>>>> --- a/mm/memremap.c
> >>>>>>>>>>>>> +++ b/mm/memremap.c
> >>>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
> >>>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
> >>>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >>>>>>>>>>>>>  			break;
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +		folio_split_unref(folio);
> >>>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
> >>>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
> >>>>>>>>>>>>>  		break;
> >>>>>>>>>>>>
> >>>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
> >>>>>>>>>>>> which checks the folio order and act upon that.
> >>>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
> >>>>>>>>>>
> >>>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
> >>>>>>>>>>> by asking drivers to undo compound pages.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
> >>>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
> >>>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
> >>>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
> >>>>>>>>>> split the folio, so I’m leaning toward moving this call into the
> >>>>>>>>>> driver’s folio_free function.
> >>>>>>>>>
> >>>>>>>>> No, that creates asymmetric page handling and is error prone.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I agree it is asymmetric and symmetric is likely better.
> >>>>>>>>
> >>>>>>>>> In addition, looking at nouveau’s implementation in
> >>>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
> >>>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
> >>>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
> >>>>>>>>> again. The bug has not manifested because there is only order-9 large folios.
> >>>>>>>>> Once mTHP support is added, how is nouveau going to allocate a order-4 folio
> >>>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
> >>>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
> >>>>>>>>
> >>>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
> >>>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
> >>>>>>>> than tracking a free folio list and free page list. But this is not my
> >>>>>>>> driver.
> >>>>>>>>
> >>>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I don’t disagree that this implementation is questionable.
> >>>>>>>>
> >>>>>>>> So what’s the suggestion here—add folio order to folio_free just to
> >>>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
> >>>>>>>> doesn’t seem right to me either.
> >>>>>>>
> >>>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
> >>>>>>> to folio_free() make sense to me, since after the split, the folio passed
> >>>>>>
> >>>>>> If this is the consensus / direction - I can do this, but it is a tree-wide
> >>>>>> change.
> >>>>>>
> >>>>>> I do have another question for everyone here - do we think this splitting
> >>>>>> implementation should be considered a Fixes so this can go into 6.19?
> >>>>>
> >>>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
> >>>>> on a large folio. IIUC this seems to only affect nouveau now, I will let
> >>>>> them to decide.
> >>>>>
> >>>>
> >>>> Agreed, free_zone_device_folio() needs to split the folio on put.
> >>>>
> >>>>
> >>>>>>
> >>>>>>> to folio_free() contains no order information, but just the used-to-be
> >>>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
> >>>>>>> handle it without knowing folio order?
> >>>>>>>
> >>>>>>
> >>>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
> >>>>>> reference-counted object in GPU SVM. When the object’s reference count
> >>>>>> drops to zero, we callback into the driver layer to release the memory.
> >>>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
> >>>>>> is then released. If it’s not clear, our original allocation size
> >>>>>> determines the granularity at which we free the backing store.
> >>>>>>
> >>>>>>> Do we really need the order info in ->folio_free() if the folio is split
> >>>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
> >>>>>>> ->folio_free() 2^order times to free individual page.
> >>>>>>>
> >>>>>>
> >>>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
> >>>>>> reference to our GPU SVM object, so we can free the backing in a single
> >>>>>> ->folio_free call.
> >>>>>>
> >>>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
> >>>>>> have 512 references to this object set up in the ->folio_split calls.
> >>>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
> >>>>>> whatever reason, we can’t create a 2MB device page during a 2MB
> >>>>>> migration and need to create 512 4KB pages so we'd have 512 references
> >>>>>> to our GPU SVM object.
> >>>>>
> >>>>
> >>>> I still don't follow why the folio_order does not capture the order of the folio.
> >>>> If the folio is split, we should now have 512 split folios for THP
> >>>
> >>> folio_order() should return 0 after the folio is split.
> >>>
> >>> In terms of the number of after-split folios, it is 512 for current code base
> >>> since THP is only 2MB in zone devices, but not future proof if mTHP support
> >>> is added. It also causes confusion in core MM, where folio can have
> >>> all kinds of orders.
> >>>
> >>>
> >>
> >> I looked at folio_split_unref() and see that there is no driver
> >> callback during the split. Patch 3 controls the order of
> >>
> >> +		folio_split_unref(folio);
> >>  		pgmap->ops->folio_free(folio);
> >>
> >> @Matthew, is there a reason to do the split prior to free? pgmap->ops->folio_free(folio)
> >> shouldn't impact the folio itself, the backing memory can be freed and then the
> >> folio split?
> > 
> > Quote Matthew from [1]:
> > 
> > ... this step must be done before calling folio_free and include a barrier,

Actually, I think it’s fine without a barrier—I confused myself a bit
there. But yes, it must be split before releasing the memory back to the
pool from which it can be reallocated.

> > as the page can be immediately reallocated.
> > 
> > [1] https://lore.kernel.org/all/aV8TuK5255NXd2PS@lstrano-desk.jf.intel.com/
> > 
> 
> Thanks, I am not a TTM/BO expert
> 
> So that leaves us with
> 
> 1. Pass the order to folio_free()
> 2. Consider calling folio_free() callback for each split folio during folio_split_unref(),
>    but that means the driver needs to consolidate all the relevant information
> 
> #1 works, but the information there is stale, in the sense that we are passing in the
> old order information. The order is still useful for the driver to know the size of its
> backing allocation.

#1 is my preference here. We don’t need this information in GPU SVM for
Xe, but Nouveau does, and I see a straightforward change in Nouveau.

In this case, “order” means the folio plus the number of pages being
released, with each individual page in an initialized state (i.e., not
compound and with a proper pgmap value; they look like the pages output
from memremap_pages(), etc.).
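
As a sketch of the contract described above (head page plus order, with every
page already back in a standalone state), here is a user-space model. It is
not the patch and not kernel API: all names (model_page, model_pgmap,
model_prepare_free, model_driver_free, model_free_zone_device_folio,
model_demo) are hypothetical, and a page's "initialized state" is reduced to
three fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model; every name here is hypothetical. */
#define MODEL_MAX_ORDER 9
#define MODEL_MAX_PAGES (1UL << MODEL_MAX_ORDER)

struct model_pgmap { int id; };

struct model_page {
	bool compound;			/* true while part of a large folio */
	unsigned int order;		/* meaningful on the head page only */
	struct model_pgmap *pgmap;	/* owner, must still be set after the split */
};

/* Reset every page of the folio to a standalone, initialized order-0 page. */
static void model_prepare_free(struct model_page *head, unsigned int order,
			       struct model_pgmap *pgmap)
{
	for (unsigned long i = 0; i < (1UL << order); i++) {
		head[i].compound = false;
		head[i].order = 0;
		head[i].pgmap = pgmap;
	}
}

/*
 * Stand-in for a driver callback taking (head page, order). It checks the
 * contract and returns how many pages it may hand back to its allocator,
 * or 0 if any page is still compound or has lost its pgmap.
 */
static unsigned long model_driver_free(struct model_page *head, unsigned int order)
{
	unsigned long nr = 1UL << order;

	for (unsigned long i = 0; i < nr; i++) {
		if (head[i].compound || head[i].order != 0 || !head[i].pgmap)
			return 0;
	}
	return nr;
}

/* Core-side sequence: save the order, split, then pass the saved order on. */
unsigned long model_free_zone_device_folio(struct model_page *head)
{
	unsigned int order = head->order;	/* gone after the split, so save it */
	struct model_pgmap *pgmap = head->pgmap;

	model_prepare_free(head, order, pgmap);
	return model_driver_free(head, order);
}

/* Builds a folio of the given order, frees it, and returns the page count seen. */
unsigned long model_demo(unsigned int order)
{
	static struct model_pgmap pgmap = { 1 };
	static struct model_page pages[MODEL_MAX_PAGES];

	if (order > MODEL_MAX_ORDER)
		return 0;
	for (unsigned long i = 0; i < (1UL << order); i++) {
		pages[i].compound = order > 0;
		pages[i].order = 0;
		pages[i].pgmap = &pgmap;
	}
	pages[0].order = order;
	return model_free_zone_device_folio(pages);
}
```

In this model the order handed to the callback is only a page count for the
range starting at the head page; the pages themselves no longer carry it,
which is why it has to be read before the split.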

I think this interface actually makes sense now that I’ve written it
down. My next revision will implement this along with the rename
(s/folio_split_unref/free_zone_device_folio_prepare/) agreed upon in
patch #1. I’ll also likely mark the relevant core MM patches with Fixes
tags in my next revision so 6.19 has the correct folio-splitting
behavior; it would be a bit odd to have kernels floating around with
different behavior here.

Let me know if anyone has objections before I move forward with this.

Matt

> #2 should work too, but it means 1 << PMD_ORDER frees (512 for a 2MB THP) as opposed to 1
> 
> Balbir



end of thread, other threads:[~2026-01-09 23:16 UTC | newest]

Thread overview: 27+ messages
     [not found] <20260109085605.443316-1-francois.dugast@intel.com>
2026-01-09  8:54 ` [PATCH v3 1/7] mm: Add folio_split_unref helper Francois Dugast
2026-01-09 13:19   ` David Hildenbrand (Red Hat)
2026-01-09 13:26     ` David Hildenbrand (Red Hat)
2026-01-09 14:30       ` Zi Yan
2026-01-09 15:11         ` David Hildenbrand (Red Hat)
2026-01-09 18:38           ` Matthew Brost
2026-01-09 18:37     ` Andrew Morton
2026-01-09 18:41       ` Zi Yan
2026-01-09 18:54         ` Francois Dugast
2026-01-09 18:43       ` Matthew Brost
2026-01-09 19:22         ` Andrew Morton
2026-01-09 19:26           ` Liam R. Howlett
2026-01-09  8:54 ` [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing Francois Dugast
2026-01-09 11:09   ` Mika Penttilä
2026-01-09 17:28     ` Zi Yan
2026-01-09 18:26       ` Matthew Brost
2026-01-09 18:53         ` Zi Yan
2026-01-09 19:08           ` Matthew Brost
2026-01-09 19:23             ` Zi Yan
2026-01-09 20:03               ` Matthew Brost
2026-01-09 20:15                 ` Zi Yan
2026-01-09 21:34                   ` Balbir Singh
2026-01-09 21:43                     ` Zi Yan
2026-01-09 22:11                       ` Balbir Singh
2026-01-09 22:14                         ` Zi Yan
2026-01-09 22:36                           ` Balbir Singh
2026-01-09 23:15                             ` Matthew Brost
