linux-mm.kvack.org archive mirror
* [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
@ 2025-10-02  1:38 Wei Yang
  2025-10-02  1:46 ` Wei Yang
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Wei Yang @ 2025-10-02  1:38 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang,
	wangkefeng.wang
  Cc: linux-mm, Wei Yang, stable

We add a pmd folio to the ds_queue on the first page fault in
__do_huge_pmd_anonymous_page(), so that it can be split in case of
memory pressure. The same should be done for a pmd folio installed
during a wp page fault.

Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") missed
adding it to the ds_queue, which means the system may not reclaim enough
memory under memory pressure even when the pmd folio is underused.

Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
folio installation consistent.

Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>

---
v2:
  * add Fixes tag, cc stable, and describe the flow of the current code
  * move deferred_split_folio() into map_anon_folio_pmd()
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225..f13de93637bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
 	count_vm_event(THP_FAULT_ALLOC);
 	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
 	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+	deferred_split_folio(folio, false);
 }
 
 static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
@@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
 		mm_inc_nr_ptes(vma->vm_mm);
-		deferred_split_folio(folio, false);
 		spin_unlock(vmf->ptl);
 	}
 
-- 
2.34.1
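
As an illustration of the path this patch touches (not part of the patch
itself), below is a minimal userspace sketch: it read-faults a MADV_HUGEPAGE
region so that, with THP and the huge zero page enabled, the PMD maps the
huge zero page, then writes one byte to trigger the wp fault handled by
do_huge_zero_wp_pmd(). The 2 MiB PMD_SIZE constant and the sysfs knobs
mentioned in the comments (transparent_hugepage/enabled, use_zero_page) are
assumptions about a typical x86_64 configuration.

```
/*
 * Illustrative sketch only (not from the patch): trigger a wp fault on a
 * huge-zero-page PMD mapping.  Assumes a 2 MiB PMD size and that THP and
 * the huge zero page are enabled (transparent_hugepage/enabled=always or
 * madvise, use_zero_page=1); otherwise the kernel may take other paths.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)	/* assumed huge page size (x86_64 default) */

int main(void)
{
	/* Over-allocate so a PMD-aligned start address can be carved out. */
	char *map = mmap(NULL, 2 * PMD_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	char *buf = (char *)(((uintptr_t)map + PMD_SIZE - 1) & ~(PMD_SIZE - 1));
	if (madvise(buf, PMD_SIZE, MADV_HUGEPAGE)) {
		perror("madvise(MADV_HUGEPAGE)");
		return 1;
	}

	/* Read fault: with use_zero_page this can map the huge zero page. */
	volatile char c = buf[0];
	(void)c;

	/* Write fault on the zero-page pmd: the wp path allocates a THP. */
	buf[0] = 1;

	printf("touched 1 byte of a %lu MiB MADV_HUGEPAGE region\n",
	       PMD_SIZE >> 20);
	pause();	/* keep the mapping alive for inspection via smaps */
	return 0;
}
```

The resulting THP is all zero except one subpage, exactly the kind of
underused folio that the deferred split shrinker is meant to split under
memory pressure; without this patch, the wp-fault path leaves such a folio
off the ds_queue.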




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
@ 2025-10-02  1:46 ` Wei Yang
  2025-10-02  2:31   ` Lance Yang
  2025-10-02  7:14 ` David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Wei Yang @ 2025-10-02  1:46 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang,
	wangkefeng.wang, linux-mm, stable

On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
>We add pmd folio into ds_queue on the first page fault in
>__do_huge_pmd_anonymous_page(), so that we can split it in case of
>memory pressure. This should be the same for a pmd folio during wp
>page fault.
>
>Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>to add it to ds_queue, which means system may not reclaim enough memory
>in case of memory pressure even the pmd folio is under used.
>
>Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>folio installation consistent.
>

Since we move deferred_split_folio() into map_anon_folio_pmd(), I am thinking
about whether we can consolidate the same process in collapse_huge_page().

Use map_anon_folio_pmd() in collapse_huge_page(), but skip those fault
statistics adjustments.

>Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>Cc: David Hildenbrand <david@redhat.com>
>Cc: Lance Yang <lance.yang@linux.dev>
>Cc: Dev Jain <dev.jain@arm.com>
>Cc: <stable@vger.kernel.org>
>

-- 
Wei Yang
Help you, Help me



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:46 ` Wei Yang
@ 2025-10-02  2:31   ` Lance Yang
  2025-10-02  3:17     ` Wei Yang
  0 siblings, 1 reply; 18+ messages in thread
From: Lance Yang @ 2025-10-02  2:31 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, wangkefeng.wang,
	linux-mm, stable



On 2025/10/2 09:46, Wei Yang wrote:
> On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
>> We add pmd folio into ds_queue on the first page fault in
>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>> memory pressure. This should be the same for a pmd folio during wp
>> page fault.
>>
>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>> to add it to ds_queue, which means system may not reclaim enough memory
>> in case of memory pressure even the pmd folio is under used.
>>
>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>> folio installation consistent.
>>
> 
> Since we move deferred_split_folio() into map_anon_folio_pmd(), I am thinking
> about whether we can consolidate the process in collapse_huge_page().
> 
> Use map_anon_folio_pmd() in collapse_huge_page(), but skip those statistic
> adjustment.

Yeah, that's a good idea :)

We could add a simple bool is_fault parameter to map_anon_folio_pmd()
to control the statistics.

The fault paths would call it with true, and the collapse paths could
then call it with false.

Something like this:

```
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225..9924180a4a56 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1218,7 +1218,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
  }

  static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
-		struct vm_area_struct *vma, unsigned long haddr)
+		struct vm_area_struct *vma, unsigned long haddr, bool is_fault)
  {
  	pmd_t entry;

@@ -1228,10 +1228,15 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
  	folio_add_lru_vma(folio, vma);
  	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
  	update_mmu_cache_pmd(vma, haddr, pmd);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-	count_vm_event(THP_FAULT_ALLOC);
-	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
-	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+
+	if (is_fault) {
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+		count_vm_event(THP_FAULT_ALLOC);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
+		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+	}
+
+	deferred_split_folio(folio, false);
  }

  static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d0957648db19..2eddd5a60e48 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1227,17 +1227,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
  	__folio_mark_uptodate(folio);
  	pgtable = pmd_pgtable(_pmd);

-	_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
-	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
-
  	spin_lock(pmd_ptl);
  	BUG_ON(!pmd_none(*pmd));
-	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
  	pgtable_trans_huge_deposit(mm, pmd, pgtable);
-	set_pmd_at(mm, address, pmd, _pmd);
-	update_mmu_cache_pmd(vma, address, pmd);
-	deferred_split_folio(folio, false);
+	map_anon_folio_pmd(folio, pmd, vma, address, false);
  	spin_unlock(pmd_ptl);

  	folio = NULL;
```

Untested, though.

> 
>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Lance Yang <lance.yang@linux.dev>
>> Cc: Dev Jain <dev.jain@arm.com>
>> Cc: <stable@vger.kernel.org>
>>
> 




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  2:31   ` Lance Yang
@ 2025-10-02  3:17     ` Wei Yang
  2025-10-02  7:16       ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Wei Yang @ 2025-10-02  3:17 UTC (permalink / raw)
  To: Lance Yang
  Cc: Wei Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua,
	wangkefeng.wang, linux-mm, stable

On Thu, Oct 02, 2025 at 10:31:53AM +0800, Lance Yang wrote:
>
>
>On 2025/10/2 09:46, Wei Yang wrote:
>> On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
>> > We add pmd folio into ds_queue on the first page fault in
>> > __do_huge_pmd_anonymous_page(), so that we can split it in case of
>> > memory pressure. This should be the same for a pmd folio during wp
>> > page fault.
>> > 
>> > Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>> > to add it to ds_queue, which means system may not reclaim enough memory
>> > in case of memory pressure even the pmd folio is under used.
>> > 
>> > Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>> > folio installation consistent.
>> > 
>> 
>> Since we move deferred_split_folio() into map_anon_folio_pmd(), I am thinking
>> about whether we can consolidate the process in collapse_huge_page().
>> 
>> Use map_anon_folio_pmd() in collapse_huge_page(), but skip those statistic
>> adjustment.
>
>Yeah, that's a good idea :)
>
>We could add a simple bool is_fault parameter to map_anon_folio_pmd()
>to control the statistics.
>
>The fault paths would call it with true, and the collapse paths could
>then call it with false.
>
>Something like this:
>
>```
>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index 1b81680b4225..9924180a4a56 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -1218,7 +1218,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct
>vm_area_struct *vma,
> }
>
> static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>-		struct vm_area_struct *vma, unsigned long haddr)
>+		struct vm_area_struct *vma, unsigned long haddr, bool is_fault)
> {
> 	pmd_t entry;
>
>@@ -1228,10 +1228,15 @@ static void map_anon_folio_pmd(struct folio *folio,
>pmd_t *pmd,
> 	folio_add_lru_vma(folio, vma);
> 	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
> 	update_mmu_cache_pmd(vma, haddr, pmd);
>-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>-	count_vm_event(THP_FAULT_ALLOC);
>-	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>-	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>+
>+	if (is_fault) {
>+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>+		count_vm_event(THP_FAULT_ALLOC);
>+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>+		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>+	}
>+
>+	deferred_split_folio(folio, false);
> }
>
> static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index d0957648db19..2eddd5a60e48 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1227,17 +1227,10 @@ static int collapse_huge_page(struct mm_struct *mm,
>unsigned long address,
> 	__folio_mark_uptodate(folio);
> 	pgtable = pmd_pgtable(_pmd);
>
>-	_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
>-	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>-
> 	spin_lock(pmd_ptl);
> 	BUG_ON(!pmd_none(*pmd));
>-	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>-	folio_add_lru_vma(folio, vma);
> 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
>-	set_pmd_at(mm, address, pmd, _pmd);
>-	update_mmu_cache_pmd(vma, address, pmd);
>-	deferred_split_folio(folio, false);
>+	map_anon_folio_pmd(folio, pmd, vma, address, false);
> 	spin_unlock(pmd_ptl);
>
> 	folio = NULL;
>```
>
>Untested, though.
>

This is the same as what I thought.

Will prepare a patch for it.


-- 
Wei Yang
Help you, Help me



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
  2025-10-02  1:46 ` Wei Yang
@ 2025-10-02  7:14 ` David Hildenbrand
  2025-10-02  7:26 ` Lance Yang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2025-10-02  7:14 UTC (permalink / raw)
  To: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang,
	wangkefeng.wang
  Cc: linux-mm, stable

On 02.10.25 03:38, Wei Yang wrote:
> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
> 
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory
> in case of memory pressure even the pmd folio is under used.
> 
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
> 
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  3:17     ` Wei Yang
@ 2025-10-02  7:16       ` David Hildenbrand
  2025-10-02  7:27         ` Lance Yang
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2025-10-02  7:16 UTC (permalink / raw)
  To: Wei Yang, Lance Yang
  Cc: akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, wangkefeng.wang, linux-mm,
	stable

On 02.10.25 05:17, Wei Yang wrote:
> On Thu, Oct 02, 2025 at 10:31:53AM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/2 09:46, Wei Yang wrote:
>>> On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
>>>> We add pmd folio into ds_queue on the first page fault in
>>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>>> memory pressure. This should be the same for a pmd folio during wp
>>>> page fault.
>>>>
>>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>>> to add it to ds_queue, which means system may not reclaim enough memory
>>>> in case of memory pressure even the pmd folio is under used.
>>>>
>>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>>> folio installation consistent.
>>>>
>>>
>>> Since we move deferred_split_folio() into map_anon_folio_pmd(), I am thinking
>>> about whether we can consolidate the process in collapse_huge_page().
>>>
>>> Use map_anon_folio_pmd() in collapse_huge_page(), but skip those statistic
>>> adjustment.
>>
>> Yeah, that's a good idea :)
>>
>> We could add a simple bool is_fault parameter to map_anon_folio_pmd()
>> to control the statistics.
>>
>> The fault paths would call it with true, and the collapse paths could
>> then call it with false.
>>
>> Something like this:
>>
>> ```
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 1b81680b4225..9924180a4a56 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1218,7 +1218,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct
>> vm_area_struct *vma,
>> }
>>
>> static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>> -		struct vm_area_struct *vma, unsigned long haddr)
>> +		struct vm_area_struct *vma, unsigned long haddr, bool is_fault)
>> {
>> 	pmd_t entry;
>>
>> @@ -1228,10 +1228,15 @@ static void map_anon_folio_pmd(struct folio *folio,
>> pmd_t *pmd,
>> 	folio_add_lru_vma(folio, vma);
>> 	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
>> 	update_mmu_cache_pmd(vma, haddr, pmd);
>> -	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> -	count_vm_event(THP_FAULT_ALLOC);
>> -	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>> -	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +
>> +	if (is_fault) {
>> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> +		count_vm_event(THP_FAULT_ALLOC);
>> +		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>> +		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +	}
>> +
>> +	deferred_split_folio(folio, false);
>> }
>>
>> static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d0957648db19..2eddd5a60e48 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1227,17 +1227,10 @@ static int collapse_huge_page(struct mm_struct *mm,
>> unsigned long address,
>> 	__folio_mark_uptodate(folio);
>> 	pgtable = pmd_pgtable(_pmd);
>>
>> -	_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
>> -	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>> -
>> 	spin_lock(pmd_ptl);
>> 	BUG_ON(!pmd_none(*pmd));
>> -	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>> -	folio_add_lru_vma(folio, vma);
>> 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> -	set_pmd_at(mm, address, pmd, _pmd);
>> -	update_mmu_cache_pmd(vma, address, pmd);
>> -	deferred_split_folio(folio, false);
>> +	map_anon_folio_pmd(folio, pmd, vma, address, false);
>> 	spin_unlock(pmd_ptl);
>>
>> 	folio = NULL;
>> ```
>>
>> Untested, though.
>>
> 
> This is the same as I thought.
> 
> Will prepare a patch for it.

Let's do that as an add-on patch, though.

-- 
Cheers

David / dhildenb




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
  2025-10-02  1:46 ` Wei Yang
  2025-10-02  7:14 ` David Hildenbrand
@ 2025-10-02  7:26 ` Lance Yang
  2025-10-03  7:54 ` Dev Jain
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-02  7:26 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, stable, npache, lorenzo.stoakes, baohua, ziy, dev.jain,
	wangkefeng.wang, baolin.wang, david, ryan.roberts, Liam.Howlett,
	akpm



On 2025/10/2 09:38, Wei Yang wrote:
> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
> 
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory
> in case of memory pressure even the pmd folio is under used.
> 
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
> 
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>

Cool. LGTM.

Reviewed-by: Lance Yang <lance.yang@linux.dev>

> 
> ---
> v2:
>    * add fix, cc stable and put description about the flow of current
>      code
>    * move deferred_split_folio() into map_anon_folio_pmd()
> ---
>   mm/huge_memory.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1b81680b4225..f13de93637bf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>   	count_vm_event(THP_FAULT_ALLOC);
>   	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>   	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
> +	deferred_split_folio(folio, false);
>   }
>   
>   static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
> @@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>   		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>   		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
>   		mm_inc_nr_ptes(vma->vm_mm);
> -		deferred_split_folio(folio, false);
>   		spin_unlock(vmf->ptl);
>   	}
>   




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  7:16       ` David Hildenbrand
@ 2025-10-02  7:27         ` Lance Yang
  0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-02  7:27 UTC (permalink / raw)
  To: Wei Yang, David Hildenbrand
  Cc: akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, wangkefeng.wang, linux-mm,
	stable



On 2025/10/2 15:16, David Hildenbrand wrote:
> On 02.10.25 05:17, Wei Yang wrote:
>> On Thu, Oct 02, 2025 at 10:31:53AM +0800, Lance Yang wrote:
>>>
>>>
>>> On 2025/10/2 09:46, Wei Yang wrote:
>>>> On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
>>>>> We add pmd folio into ds_queue on the first page fault in
>>>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>>>> memory pressure. This should be the same for a pmd folio during wp
>>>>> page fault.
>>>>>
>>>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>>>> to add it to ds_queue, which means system may not reclaim enough 
>>>>> memory
>>>>> in case of memory pressure even the pmd folio is under used.
>>>>>
>>>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>>>> folio installation consistent.
>>>>>
>>>>
>>>> Since we move deferred_split_folio() into map_anon_folio_pmd(), I am 
>>>> thinking
>>>> about whether we can consolidate the process in collapse_huge_page().
>>>>
>>>> Use map_anon_folio_pmd() in collapse_huge_page(), but skip those 
>>>> statistic
>>>> adjustment.
>>>
>>> Yeah, that's a good idea :)
>>>
>>> We could add a simple bool is_fault parameter to map_anon_folio_pmd()
>>> to control the statistics.
>>>
>>> The fault paths would call it with true, and the collapse paths could
>>> then call it with false.
>>>
>>> Something like this:
>>>
>>> ```
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1b81680b4225..9924180a4a56 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1218,7 +1218,7 @@ static struct folio 
>>> *vma_alloc_anon_folio_pmd(struct
>>> vm_area_struct *vma,
>>> }
>>>
>>> static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>>> -        struct vm_area_struct *vma, unsigned long haddr)
>>> +        struct vm_area_struct *vma, unsigned long haddr, bool is_fault)
>>> {
>>>     pmd_t entry;
>>>
>>> @@ -1228,10 +1228,15 @@ static void map_anon_folio_pmd(struct folio 
>>> *folio,
>>> pmd_t *pmd,
>>>     folio_add_lru_vma(folio, vma);
>>>     set_pmd_at(vma->vm_mm, haddr, pmd, entry);
>>>     update_mmu_cache_pmd(vma, haddr, pmd);
>>> -    add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>> -    count_vm_event(THP_FAULT_ALLOC);
>>> -    count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>>> -    count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>>> +
>>> +    if (is_fault) {
>>> +        add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>> +        count_vm_event(THP_FAULT_ALLOC);
>>> +        count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>>> +        count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>>> +    }
>>> +
>>> +    deferred_split_folio(folio, false);
>>> }
>>>
>>> static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d0957648db19..2eddd5a60e48 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1227,17 +1227,10 @@ static int collapse_huge_page(struct 
>>> mm_struct *mm,
>>> unsigned long address,
>>>     __folio_mark_uptodate(folio);
>>>     pgtable = pmd_pgtable(_pmd);
>>>
>>> -    _pmd = folio_mk_pmd(folio, vma->vm_page_prot);
>>> -    _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>>> -
>>>     spin_lock(pmd_ptl);
>>>     BUG_ON(!pmd_none(*pmd));
>>> -    folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
>>> -    folio_add_lru_vma(folio, vma);
>>>     pgtable_trans_huge_deposit(mm, pmd, pgtable);
>>> -    set_pmd_at(mm, address, pmd, _pmd);
>>> -    update_mmu_cache_pmd(vma, address, pmd);
>>> -    deferred_split_folio(folio, false);
>>> +    map_anon_folio_pmd(folio, pmd, vma, address, false);
>>>     spin_unlock(pmd_ptl);
>>>
>>>     folio = NULL;
>>> ```
>>>
>>> Untested, though.
>>>
>>
>> This is the same as I thought.
>>
>> Will prepare a patch for it.
> 
> Let's do that as an add-on patch, though.

Yeah, let’s do that separately ;)



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
                   ` (2 preceding siblings ...)
  2025-10-02  7:26 ` Lance Yang
@ 2025-10-03  7:54 ` Dev Jain
  2025-10-03 13:49 ` Lance Yang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Dev Jain @ 2025-10-03  7:54 UTC (permalink / raw)
  To: Wei Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, baohua, lance.yang,
	wangkefeng.wang
  Cc: linux-mm, stable


On 02/10/25 7:08 am, Wei Yang wrote:
> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
>
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory
> in case of memory pressure even the pmd folio is under used.
>
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
>
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>
>
> ---

Thanks!

Reviewed-by: Dev Jain <dev.jain@arm.com>




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
                   ` (3 preceding siblings ...)
  2025-10-03  7:54 ` Dev Jain
@ 2025-10-03 13:49 ` Lance Yang
  2025-10-03 14:08   ` Zi Yan
  2025-10-04  2:04   ` Wei Yang
  2025-10-03 13:53 ` Zi Yan
  2025-10-14  3:49 ` Baolin Wang
  6 siblings, 2 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-03 13:49 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	wangkefeng.wang, stable, ziy, ryan.roberts, dev.jain, npache,
	baohua, akpm, david

Hey Wei,

On 2025/10/2 09:38, Wei Yang wrote:
> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
> 
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory

IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
started unconditionally adding all new anon THPs to _deferred_list :)

> in case of memory pressure even the pmd folio is under used.
> 
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
> 
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")

Shouldn't this rather be the following?

Fixes: dafff3f4c850 ("mm: split underused THPs")

Thanks,
Lance

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>
> 
> ---
> v2:
>    * add fix, cc stable and put description about the flow of current
>      code
>    * move deferred_split_folio() into map_anon_folio_pmd()
> ---
>   mm/huge_memory.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1b81680b4225..f13de93637bf 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>   	count_vm_event(THP_FAULT_ALLOC);
>   	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>   	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
> +	deferred_split_folio(folio, false);
>   }
>   
>   static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
> @@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>   		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>   		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
>   		mm_inc_nr_ptes(vma->vm_mm);
> -		deferred_split_folio(folio, false);
>   		spin_unlock(vmf->ptl);
>   	}
>   




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
                   ` (4 preceding siblings ...)
  2025-10-03 13:49 ` Lance Yang
@ 2025-10-03 13:53 ` Zi Yan
  2025-10-14  3:49 ` Baolin Wang
  6 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2025-10-03 13:53 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, wangkefeng.wang,
	linux-mm, stable

On 1 Oct 2025, at 21:38, Wei Yang wrote:

> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
>
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory
> in case of memory pressure even the pmd folio is under used.
>
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
>
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>
>
> ---
> v2:
>   * add fix, cc stable and put description about the flow of current
>     code
>   * move deferred_split_folio() into map_anon_folio_pmd()
> ---
>  mm/huge_memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-03 13:49 ` Lance Yang
@ 2025-10-03 14:08   ` Zi Yan
  2025-10-03 15:30     ` Usama Arif
  2025-10-04  2:13     ` Wei Yang
  2025-10-04  2:04   ` Wei Yang
  1 sibling, 2 replies; 18+ messages in thread
From: Zi Yan @ 2025-10-03 14:08 UTC (permalink / raw)
  To: Usama Arif, Lance Yang
  Cc: Wei Yang, linux-mm, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	wangkefeng.wang, stable, ryan.roberts, dev.jain, npache, baohua,
	akpm, david

On 3 Oct 2025, at 9:49, Lance Yang wrote:

> Hey Wei,
>
> On 2025/10/2 09:38, Wei Yang wrote:
>> We add pmd folio into ds_queue on the first page fault in
>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>> memory pressure. This should be the same for a pmd folio during wp
>> page fault.
>>
>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>> to add it to ds_queue, which means system may not reclaim enough memory
>
> IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
> started unconditionally adding all new anon THPs to _deferred_list :)
>
>> in case of memory pressure even the pmd folio is under used.
>>
>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>> folio installation consistent.
>>
>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>
> Shouldn't this rather be the following?
>
> Fixes: dafff3f4c850 ("mm: split underused THPs")

Yes, I agree. In this case, this patch looks more like an optimization
for splitting underused THPs.

One observation on this change: right after the zero-pmd wp fault, the
deferred split queue could be scanned and the newly added pmd folio would be
split, since it is all zero except for one subpage. This means we should
probably allocate a base folio for the zero-pmd wp fault and map the rest to
the zero page from the beginning when splitting underused THPs is enabled,
to avoid this long round trip. The downside is that the user app cannot get
a pmd folio even when it intends to write data into the entire folio.

Usama might be able to give some insight here.


>
> Thanks,
> Lance
>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Lance Yang <lance.yang@linux.dev>
>> Cc: Dev Jain <dev.jain@arm.com>
>> Cc: <stable@vger.kernel.org>
>>
>> ---
>> v2:
>>    * add fix, cc stable and put description about the flow of current
>>      code
>>    * move deferred_split_folio() into map_anon_folio_pmd()
>> ---
>>   mm/huge_memory.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 1b81680b4225..f13de93637bf 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>>   	count_vm_event(THP_FAULT_ALLOC);
>>   	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>>   	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>> +	deferred_split_folio(folio, false);
>>   }
>>    static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>> @@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>   		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>>   		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
>>   		mm_inc_nr_ptes(vma->vm_mm);
>> -		deferred_split_folio(folio, false);
>>   		spin_unlock(vmf->ptl);
>>   	}
>>


Best Regards,
Yan, Zi



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-03 14:08   ` Zi Yan
@ 2025-10-03 15:30     ` Usama Arif
  2025-10-03 17:11       ` Zi Yan
  2025-10-04  2:13     ` Wei Yang
  1 sibling, 1 reply; 18+ messages in thread
From: Usama Arif @ 2025-10-03 15:30 UTC (permalink / raw)
  To: Zi Yan, Lance Yang
  Cc: Wei Yang, linux-mm, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	wangkefeng.wang, stable, ryan.roberts, dev.jain, npache, baohua,
	akpm, david



On 03/10/2025 15:08, Zi Yan wrote:
> On 3 Oct 2025, at 9:49, Lance Yang wrote:
> 
>> Hey Wei,
>>
>> On 2025/10/2 09:38, Wei Yang wrote:
>>> We add pmd folio into ds_queue on the first page fault in
>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>> memory pressure. This should be the same for a pmd folio during wp
>>> page fault.
>>>
>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>> to add it to ds_queue, which means system may not reclaim enough memory
>>
>> IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
>> started unconditionally adding all new anon THPs to _deferred_list :)
>>
>>> in case of memory pressure even the pmd folio is under used.
>>>
>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>> folio installation consistent.
>>>
>>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>>
>> Shouldn't this rather be the following?
>>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
> 
> Yes, I agree. In this case, this patch looks more like an optimization
> for split underused THPs.
> 
> One observation on this change is that right after zero pmd wp, the
> deferred split queue could be scanned, the newly added pmd folio will
> split since it is all zero except one subpage. This means we probably
> should allocate a base folio for zero pmd wp and map the rest to zero
> page at the beginning if split underused THP is enabled to avoid
> this long trip. The downside is that user app cannot get a pmd folio
> if it is intended to write data into the entire folio.
> 
> Usama might be able to give some insight here.
> 

Thanks for CCing me Zi!

Hmm, I think the downside of not having a PMD folio probably outweighs the
cost of splitting a zero-filled page?
Of course I don't have any numbers to back that up, but that would be my
initial guess.

Also:

Acked-by: Usama Arif <usamaarif642@gmail.com>


> 
>>
>> Thanks,
>> Lance
>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Lance Yang <lance.yang@linux.dev>
>>> Cc: Dev Jain <dev.jain@arm.com>
>>> Cc: <stable@vger.kernel.org>
>>>
>>> ---
>>> v2:
>>>    * add fix, cc stable and put description about the flow of current
>>>      code
>>>    * move deferred_split_folio() into map_anon_folio_pmd()
>>> ---
>>>   mm/huge_memory.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1b81680b4225..f13de93637bf 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>>>   	count_vm_event(THP_FAULT_ALLOC);
>>>   	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>>>   	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>>> +	deferred_split_folio(folio, false);
>>>   }
>>>    static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>> @@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>>   		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>>>   		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
>>>   		mm_inc_nr_ptes(vma->vm_mm);
>>> -		deferred_split_folio(folio, false);
>>>   		spin_unlock(vmf->ptl);
>>>   	}
>>>
> 
> 
> Best Regards,
> Yan, Zi




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-03 15:30     ` Usama Arif
@ 2025-10-03 17:11       ` Zi Yan
  0 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2025-10-03 17:11 UTC (permalink / raw)
  To: Usama Arif
  Cc: Lance Yang, Wei Yang, linux-mm, baolin.wang, lorenzo.stoakes,
	Liam.Howlett, wangkefeng.wang, stable, ryan.roberts, dev.jain,
	npache, baohua, akpm, david

On 3 Oct 2025, at 11:30, Usama Arif wrote:

> On 03/10/2025 15:08, Zi Yan wrote:
>> On 3 Oct 2025, at 9:49, Lance Yang wrote:
>>
>>> Hey Wei,
>>>
>>> On 2025/10/2 09:38, Wei Yang wrote:
>>>> We add pmd folio into ds_queue on the first page fault in
>>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>>> memory pressure. This should be the same for a pmd folio during wp
>>>> page fault.
>>>>
>>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>>> to add it to ds_queue, which means system may not reclaim enough memory
>>>
>>> IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
>>> started unconditionally adding all new anon THPs to _deferred_list :)
>>>
>>>> in case of memory pressure even the pmd folio is under used.
>>>>
>>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>>> folio installation consistent.
>>>>
>>>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>>>
>>> Shouldn't this rather be the following?
>>>
>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>
>> Yes, I agree. In this case, this patch looks more like an optimization
>> for split underused THPs.
>>
>> One observation on this change is that right after zero pmd wp, the
>> deferred split queue could be scanned, the newly added pmd folio will
>> split since it is all zero except one subpage. This means we probably
>> should allocate a base folio for zero pmd wp and map the rest to zero
>> page at the beginning if split underused THP is enabled to avoid
>> this long trip. The downside is that user app cannot get a pmd folio
>> if it is intended to write data into the entire folio.
>>
>> Usama might be able to give some insight here.
>>
>
> Thanks for CCing me Zi!
>
> hmm I think the downside of not having PMD folio probably outweights the cost of splitting
> a zer-filled page?

Yeah, I agree.

> ofcourse I dont have any numbers to back that up, but that would be my initial guess.
>
> Also:
>
> Acked-by: Usama Arif <usamaarif642@gmail.com>
>
>
>>
>>>
>>> Thanks,
>>> Lance
>>>
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Lance Yang <lance.yang@linux.dev>
>>>> Cc: Dev Jain <dev.jain@arm.com>
>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> ---
>>>> v2:
>>>>    * add fix, cc stable and put description about the flow of current
>>>>      code
>>>>    * move deferred_split_folio() into map_anon_folio_pmd()
>>>> ---
>>>>   mm/huge_memory.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 1b81680b4225..f13de93637bf 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -1232,6 +1232,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
>>>>   	count_vm_event(THP_FAULT_ALLOC);
>>>>   	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>>>>   	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>>>> +	deferred_split_folio(folio, false);
>>>>   }
>>>>    static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>>> @@ -1272,7 +1273,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>>>   		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>>>>   		map_anon_folio_pmd(folio, vmf->pmd, vma, haddr);
>>>>   		mm_inc_nr_ptes(vma->vm_mm);
>>>> -		deferred_split_folio(folio, false);
>>>>   		spin_unlock(vmf->ptl);
>>>>   	}
>>>>
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-03 13:49 ` Lance Yang
  2025-10-03 14:08   ` Zi Yan
@ 2025-10-04  2:04   ` Wei Yang
  2025-10-04  2:37     ` Lance Yang
  1 sibling, 1 reply; 18+ messages in thread
From: Wei Yang @ 2025-10-04  2:04 UTC (permalink / raw)
  To: Lance Yang
  Cc: Wei Yang, linux-mm, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	wangkefeng.wang, stable, ziy, ryan.roberts, dev.jain, npache,
	baohua, akpm, david

On Fri, Oct 03, 2025 at 09:49:28PM +0800, Lance Yang wrote:
>Hey Wei,
>
>On 2025/10/2 09:38, Wei Yang wrote:
>> We add pmd folio into ds_queue on the first page fault in
>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>> memory pressure. This should be the same for a pmd folio during wp
>> page fault.
>> 
>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>> to add it to ds_queue, which means system may not reclaim enough memory
>
>IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
>started unconditionally adding all new anon THPs to _deferred_list :)
>

Thanks for taking a look.

But at that time do_huge_zero_wp_pmd() was not introduced yet, so how could
it fix a case that did not exist? And how could it be backported? I am
confused here.

>> in case of memory pressure even the pmd folio is under used.
>> 
>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>> folio installation consistent.
>> 
>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>
>Shouldn't this rather be the following?
>
>Fixes: dafff3f4c850 ("mm: split underused THPs")
>
>Thanks,
>Lance

-- 
Wei Yang
Help you, Help me



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-03 14:08   ` Zi Yan
  2025-10-03 15:30     ` Usama Arif
@ 2025-10-04  2:13     ` Wei Yang
  1 sibling, 0 replies; 18+ messages in thread
From: Wei Yang @ 2025-10-04  2:13 UTC (permalink / raw)
  To: Zi Yan
  Cc: Usama Arif, Lance Yang, Wei Yang, linux-mm, baolin.wang,
	lorenzo.stoakes, Liam.Howlett, wangkefeng.wang, stable,
	ryan.roberts, dev.jain, npache, baohua, akpm, david

On Fri, Oct 03, 2025 at 10:08:37AM -0400, Zi Yan wrote:
>On 3 Oct 2025, at 9:49, Lance Yang wrote:
>
>> Hey Wei,
>>
>> On 2025/10/2 09:38, Wei Yang wrote:
>>> We add pmd folio into ds_queue on the first page fault in
>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>> memory pressure. This should be the same for a pmd folio during wp
>>> page fault.
>>>
>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>> to add it to ds_queue, which means system may not reclaim enough memory
>>
>> IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
>> started unconditionally adding all new anon THPs to _deferred_list :)
>>
>>> in case of memory pressure even the pmd folio is under used.
>>>
>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>> folio installation consistent.
>>>
>>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>>
>> Shouldn't this rather be the following?
>>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>
>Yes, I agree. In this case, this patch looks more like an optimization
>for split underused THPs.
>
>One observation on this change is that right after zero pmd wp, the
>deferred split queue could be scanned, the newly added pmd folio will
>split since it is all zero except one subpage. This means we probably
>should allocate a base folio for zero pmd wp and map the rest to zero
>page at the beginning if split underused THP is enabled to avoid
>this long trip. The downside is that user app cannot get a pmd folio
>if it is intended to write data into the entire folio.

Thanks for raising this.

IMHO, we could face a similar situation in __do_huge_pmd_anonymous_page().
If my understanding is correct, the allocated folio is zeroed and we have no
idea how the user will write data to it.

Since the shrinker is only active when memory is low, maybe
vma_alloc_anon_folio_pmd() already tells us the current status of memory. If
it does get a pmd folio, we probably have enough memory in the system.

>
>Usama might be able to give some insight here.
>

-- 
Wei Yang
Help you, Help me



* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-04  2:04   ` Wei Yang
@ 2025-10-04  2:37     ` Lance Yang
  0 siblings, 0 replies; 18+ messages in thread
From: Lance Yang @ 2025-10-04  2:37 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, baolin.wang, lorenzo.stoakes, Liam.Howlett,
	wangkefeng.wang, stable, ziy, ryan.roberts, dev.jain, npache,
	baohua, akpm, david



On 2025/10/4 10:04, Wei Yang wrote:
> On Fri, Oct 03, 2025 at 09:49:28PM +0800, Lance Yang wrote:
>> Hey Wei,
>>
>> On 2025/10/2 09:38, Wei Yang wrote:
>>> We add pmd folio into ds_queue on the first page fault in
>>> __do_huge_pmd_anonymous_page(), so that we can split it in case of
>>> memory pressure. This should be the same for a pmd folio during wp
>>> page fault.
>>>
>>> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
>>> to add it to ds_queue, which means system may not reclaim enough memory
>>
>> IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that
>> started unconditionally adding all new anon THPs to _deferred_list :)
>>
> 
> Thanks for taking a look.
> 
> While at this time do_huge_zero_wp_pmd() is not introduced, how it fix a

Ah, I see. I was focused on the policy change ...

> non-exist case? And how could it be backported? I am confused here.

And, yes, 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") was
merged later and it introduced the new do_huge_zero_wp_pmd() path without
aligning with the policy ...

Thanks for clarifying!
Lance

> 
>>> in case of memory pressure even the pmd folio is under used.
>>>
>>> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
>>> folio installation consistent.
>>>
>>> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
>>
>> Shouldn't this rather be the following?
>>
>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>
>> Thanks,
>> Lance
> 




* Re: [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd()
  2025-10-02  1:38 [Patch v2] mm/huge_memory: add pmd folio to ds_queue in do_huge_zero_wp_pmd() Wei Yang
                   ` (5 preceding siblings ...)
  2025-10-03 13:53 ` Zi Yan
@ 2025-10-14  3:49 ` Baolin Wang
  6 siblings, 0 replies; 18+ messages in thread
From: Baolin Wang @ 2025-10-14  3:49 UTC (permalink / raw)
  To: Wei Yang, akpm, david, lorenzo.stoakes, ziy, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang,
	wangkefeng.wang
  Cc: linux-mm, stable



On 2025/10/2 09:38, Wei Yang wrote:
> We add pmd folio into ds_queue on the first page fault in
> __do_huge_pmd_anonymous_page(), so that we can split it in case of
> memory pressure. This should be the same for a pmd folio during wp
> page fault.
> 
> Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") miss
> to add it to ds_queue, which means system may not reclaim enough memory
> in case of memory pressure even the pmd folio is under used.
> 
> Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd
> folio installation consistent.
> 
> Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: <stable@vger.kernel.org>
> 
> ---

Nice catch. LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


