linux-mm.kvack.org archive mirror
* [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
@ 2025-05-22  9:34 Shivank Garg
  2025-05-22 10:01 ` Baolin Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Shivank Garg @ 2025-05-22  9:34 UTC (permalink / raw)
  To: akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, fengwei.yin, shivankg, bharata,
	syzbot+2b99589e33edbe9475ca

folio_mapcount() checks folio_test_large() before proceeding to
folio_large_mapcount(), but there is a race window in which the folio
can be split between these two checks, triggering the
VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
folio_large_mapcount().

Take a temporary folio reference in hpage_collapse_scan_file() to prevent
races with concurrent folio splitting/freeing. This prevents potentially
incorrect large folio detection.
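
The "take a temporary reference, then re-check the slot" pattern described above can be sketched in userspace C. This is only an illustrative model using C11 atomics: struct obj, struct slot, obj_try_get(), obj_put() and slot_get_stable() are hypothetical names standing in for the folio, the page-cache slot, folio_try_get(), folio_put() and the lookup loop. It is not the kernel implementation, and it ignores the RCU protection the kernel relies on to make touching a possibly freed object safe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the reference count is modeled. */
struct obj {
	atomic_int refcount;
};

/* Hypothetical stand-in for a page-cache slot that can change concurrently. */
struct slot {
	_Atomic(struct obj *) entry;
};

/* Take a reference unless the count has already dropped to zero. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with obj_try_get(). */
static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcount, 1);
}

/*
 * Read the slot, pin the object, then re-read the slot.
 * Returns the pinned object, or NULL if the caller should retry
 * (empty slot, object being freed, or slot changed underneath us).
 */
static struct obj *slot_get_stable(struct slot *s)
{
	struct obj *o = atomic_load(&s->entry);

	if (!o || !obj_try_get(o))
		return NULL;

	if (o != atomic_load(&s->entry)) {
		obj_put(o);
		return NULL;
	}
	return o;
}
```

In this model, a caller that gets a non-NULL result holds a reference until it calls obj_put(), which is the property the patch wants while it inspects the folio.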

Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cc945c6ab3bd..6e8902f9d88c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 			continue;
 		}
 
+		if (!folio_try_get(folio)) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (unlikely(folio != xas_reload(&xas))) {
+			folio_put(folio);
+			xas_reset(&xas);
+			continue;
+		}
+
 		if (folio_order(folio) == HPAGE_PMD_ORDER &&
 		    folio->index == start) {
 			/* Maybe PMD-mapped */
@@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 			 * it's safe to skip LRU and refcount checks before
 			 * returning.
 			 */
+			folio_put(folio);
 			break;
 		}
 
 		node = folio_nid(folio);
 		if (hpage_collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
+			folio_put(folio);
 			break;
 		}
 		cc->node_load[node]++;
 
 		if (!folio_test_lru(folio)) {
 			result = SCAN_PAGE_LRU;
+			folio_put(folio);
 			break;
 		}
 
 		if (!is_refcount_suitable(folio)) {
 			result = SCAN_PAGE_COUNT;
+			folio_put(folio);
 			break;
 		}
 
@@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		 */
 
 		present += folio_nr_pages(folio);
+		folio_put(folio);
 
 		if (need_resched()) {
 			xas_pause(&xas);
-- 
2.34.1




* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22  9:34 [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file() Shivank Garg
@ 2025-05-22 10:01 ` Baolin Wang
  2025-05-22 10:04   ` Dev Jain
  2025-05-22 11:59   ` David Hildenbrand
  2025-05-22 10:02 ` Dev Jain
  2025-05-22 10:07 ` Dev Jain
  2 siblings, 2 replies; 8+ messages in thread
From: Baolin Wang @ 2025-05-22 10:01 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca



On 2025/5/22 17:34, Shivank Garg wrote:
> folio_mapcount() checks folio_test_large() before proceeding to
> folio_large_mapcount(), but there exists a race window where a folio
> could be split between these checks which triggered the
> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
> folio_large_mapcount().
> 
> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
> races with concurrent folio splitting/freeing. This prevent potential
> incorrect large folio detection.
> 
> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>   mm/khugepaged.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cc945c6ab3bd..6e8902f9d88c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			continue;
>   		}
>   
> +		if (!folio_try_get(folio)) {
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
> +		if (unlikely(folio != xas_reload(&xas))) {
> +			folio_put(folio);
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
>   		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>   		    folio->index == start) {
>   			/* Maybe PMD-mapped */
> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			 * it's safe to skip LRU and refcount checks before
>   			 * returning.
>   			 */
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		node = folio_nid(folio);
>   		if (hpage_collapse_scan_abort(node, cc)) {
>   			result = SCAN_SCAN_ABORT;
> +			folio_put(folio);
>   			break;
>   		}
>   		cc->node_load[node]++;
>   
>   		if (!folio_test_lru(folio)) {
>   			result = SCAN_PAGE_LRU;
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		if (!is_refcount_suitable(folio)) {

You add a temporary refcnt for the folio, then the 
is_refcount_suitable() will always fail, right?

>   			result = SCAN_PAGE_COUNT;
> +			folio_put(folio);
>   			break;
>   		}
>   
> @@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   		 */
>   
>   		present += folio_nr_pages(folio);
> +		folio_put(folio);
>   
>   		if (need_resched()) {
>   			xas_pause(&xas);



* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22  9:34 [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file() Shivank Garg
  2025-05-22 10:01 ` Baolin Wang
@ 2025-05-22 10:02 ` Dev Jain
  2025-05-22 10:07 ` Dev Jain
  2 siblings, 0 replies; 8+ messages in thread
From: Dev Jain @ 2025-05-22 10:02 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 22/05/25 3:04 pm, Shivank Garg wrote:
> folio_mapcount() checks folio_test_large() before proceeding to
> folio_large_mapcount(), but there exists a race window where a folio
> could be split between these checks which triggered the
> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
> folio_large_mapcount().
>
> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
> races with concurrent folio splitting/freeing. This prevent potential
> incorrect large folio detection.
>
> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>   mm/khugepaged.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cc945c6ab3bd..6e8902f9d88c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			continue;
>   		}
>   
> +		if (!folio_try_get(folio)) {
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
> +		if (unlikely(folio != xas_reload(&xas))) {
> +			folio_put(folio);
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
>   		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>   		    folio->index == start) {
>   			/* Maybe PMD-mapped */
> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			 * it's safe to skip LRU and refcount checks before
>   			 * returning.
>   			 */
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		node = folio_nid(folio);
>   		if (hpage_collapse_scan_abort(node, cc)) {
>   			result = SCAN_SCAN_ABORT;
> +			folio_put(folio);
>   			break;
>   		}
>   		cc->node_load[node]++;
>   
>   		if (!folio_test_lru(folio)) {
>   			result = SCAN_PAGE_LRU;
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		if (!is_refcount_suitable(folio)) {


Do we need to change is_refcount_suitable()?


>   			result = SCAN_PAGE_COUNT;
> +			folio_put(folio);
>   			break;
>   		}
>   
> @@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   		 */
>   
>   		present += folio_nr_pages(folio);
> +		folio_put(folio);
>   
>   		if (need_resched()) {
>   			xas_pause(&xas);



* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22 10:01 ` Baolin Wang
@ 2025-05-22 10:04   ` Dev Jain
  2025-05-22 11:59   ` David Hildenbrand
  1 sibling, 0 replies; 8+ messages in thread
From: Dev Jain @ 2025-05-22 10:04 UTC (permalink / raw)
  To: Baolin Wang, Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 22/05/25 3:31 pm, Baolin Wang wrote:
>
>
> On 2025/5/22 17:34, Shivank Garg wrote:
>> folio_mapcount() checks folio_test_large() before proceeding to
>> folio_large_mapcount(), but there exists a race window where a folio
>> could be split between these checks which triggered the
>> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
>> folio_large_mapcount().
>>
>> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
>> races with concurrent folio splitting/freeing. This prevent potential
>> incorrect large folio detection.
>>
>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
>>   mm/khugepaged.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index cc945c6ab3bd..6e8902f9d88c 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>  			continue;
>>  		}
>>  
>> +		if (!folio_try_get(folio)) {
>> +			xas_reset(&xas);
>> +			continue;
>> +		}
>> +
>> +		if (unlikely(folio != xas_reload(&xas))) {
>> +			folio_put(folio);
>> +			xas_reset(&xas);
>> +			continue;
>> +		}
>> +
>>  		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>>  		    folio->index == start) {
>>  			/* Maybe PMD-mapped */
>> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>  			 * it's safe to skip LRU and refcount checks before
>>  			 * returning.
>>  			 */
>> +			folio_put(folio);
>>  			break;
>>  		}
>>  
>>  		node = folio_nid(folio);
>>  		if (hpage_collapse_scan_abort(node, cc)) {
>>  			result = SCAN_SCAN_ABORT;
>> +			folio_put(folio);
>>  			break;
>>  		}
>>  		cc->node_load[node]++;
>>  
>>  		if (!folio_test_lru(folio)) {
>>  			result = SCAN_PAGE_LRU;
>> +			folio_put(folio);
>>  			break;
>>  		}
>>  
>>  		if (!is_refcount_suitable(folio)) {
>
> You add a temporary refcnt for the folio, then the 
> is_refcount_suitable() will always fail, right?


Oops, you are one minute faster :wink


>
>>  			result = SCAN_PAGE_COUNT;
>> +			folio_put(folio);
>>  			break;
>>  		}
>>  
>> @@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>  		 */
>>  
>>  		present += folio_nr_pages(folio);
>> +		folio_put(folio);
>>  
>>  		if (need_resched()) {
>>  			xas_pause(&xas);
>



* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22  9:34 [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file() Shivank Garg
  2025-05-22 10:01 ` Baolin Wang
  2025-05-22 10:02 ` Dev Jain
@ 2025-05-22 10:07 ` Dev Jain
  2025-05-23  7:50   ` Shivank Garg
  2 siblings, 1 reply; 8+ messages in thread
From: Dev Jain @ 2025-05-22 10:07 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 22/05/25 3:04 pm, Shivank Garg wrote:
> folio_mapcount() checks folio_test_large() before proceeding to


The description does not make it clear where this folio_mapcount() call
is made.

Are you talking about is_refcount_suitable()?


> folio_large_mapcount(), but there exists a race window where a folio
> could be split between these checks which triggered the
> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
> folio_large_mapcount().
>
> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
> races with concurrent folio splitting/freeing. This prevent potential
> incorrect large folio detection.
>
> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---
>   mm/khugepaged.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cc945c6ab3bd..6e8902f9d88c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			continue;
>   		}
>   
> +		if (!folio_try_get(folio)) {
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
> +		if (unlikely(folio != xas_reload(&xas))) {
> +			folio_put(folio);
> +			xas_reset(&xas);
> +			continue;
> +		}
> +
>   		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>   		    folio->index == start) {
>   			/* Maybe PMD-mapped */
> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   			 * it's safe to skip LRU and refcount checks before
>   			 * returning.
>   			 */
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		node = folio_nid(folio);
>   		if (hpage_collapse_scan_abort(node, cc)) {
>   			result = SCAN_SCAN_ABORT;
> +			folio_put(folio);
>   			break;
>   		}
>   		cc->node_load[node]++;
>   
>   		if (!folio_test_lru(folio)) {
>   			result = SCAN_PAGE_LRU;
> +			folio_put(folio);
>   			break;
>   		}
>   
>   		if (!is_refcount_suitable(folio)) {
>   			result = SCAN_PAGE_COUNT;
> +			folio_put(folio);
>   			break;
>   		}
>   
> @@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>   		 */
>   
>   		present += folio_nr_pages(folio);
> +		folio_put(folio);
>   
>   		if (need_resched()) {
>   			xas_pause(&xas);



* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22 10:01 ` Baolin Wang
  2025-05-22 10:04   ` Dev Jain
@ 2025-05-22 11:59   ` David Hildenbrand
  2025-05-23  7:50     ` Shivank Garg
  1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2025-05-22 11:59 UTC (permalink / raw)
  To: Baolin Wang, Shivank Garg, akpm, linux-mm, linux-kernel
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca

On 22.05.25 12:01, Baolin Wang wrote:
> 
> 
> On 2025/5/22 17:34, Shivank Garg wrote:
>> folio_mapcount() checks folio_test_large() before proceeding to
>> folio_large_mapcount(), but there exists a race window where a folio
>> could be split between these checks which triggered the
>> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
>> folio_large_mapcount().
>>
>> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
>> races with concurrent folio splitting/freeing. This prevent potential
>> incorrect large folio detection.
>>
>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
>>    mm/khugepaged.c | 16 ++++++++++++++++
>>    1 file changed, 16 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index cc945c6ab3bd..6e8902f9d88c 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>    			continue;
>>    		}
>>    
>> +		if (!folio_try_get(folio)) {
>> +			xas_reset(&xas);
>> +			continue;
>> +		}
>> +
>> +		if (unlikely(folio != xas_reload(&xas))) {
>> +			folio_put(folio);
>> +			xas_reset(&xas);
>> +			continue;
>> +		}
>> +
>>    		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>>    		    folio->index == start) {
>>    			/* Maybe PMD-mapped */
>> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>    			 * it's safe to skip LRU and refcount checks before
>>    			 * returning.
>>    			 */
>> +			folio_put(folio);
>>    			break;
>>    		}
>>    
>>    		node = folio_nid(folio);
>>    		if (hpage_collapse_scan_abort(node, cc)) {
>>    			result = SCAN_SCAN_ABORT;
>> +			folio_put(folio);
>>    			break;
>>    		}
>>    		cc->node_load[node]++;
>>    
>>    		if (!folio_test_lru(folio)) {
>>    			result = SCAN_PAGE_LRU;
>> +			folio_put(folio);
>>    			break;
>>    		}
>>    
>>    		if (!is_refcount_suitable(folio)) {
> 
> You add a temporary refcnt for the folio, then the
> is_refcount_suitable() will always fail, right?

Indeed. Would one of our MADV_COLLAPSE selftests catch that?

We should also be converting that code to use folio_expected_ref_count() 
-- either directly or wrapped in is_refcount_suitable().

Likely just here through

if (folio_expected_ref_count(folio) + 1 != folio_ref_count(folio))
	...
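
To make the effect of the temporary reference concrete, here is a small userspace model of the two checks. It is a sketch under stated assumptions, not kernel code: struct fake_folio and all three helpers are hypothetical, and the accounting rule in expected_refs() (one reference per mapping, one per page held by the page cache, one for private data) is an assumed simplification rather than the actual implementation of folio_expected_ref_count() or is_refcount_suitable().

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the fields the refcount check looks at. */
struct fake_folio {
	int refcount;       /* references currently held */
	int mapcount;       /* page-table mappings */
	int nr_pages;       /* pages in the folio */
	bool in_pagecache;  /* page cache holds one reference per page */
	bool has_private;   /* private data holds one extra reference */
};

/* Assumed accounting: the references we can explain without any extra pins. */
static int expected_refs(const struct fake_folio *f)
{
	int expected = f->mapcount;

	if (f->in_pagecache)
		expected += f->nr_pages;
	if (f->has_private)
		expected++;
	return expected;
}

/* Check as written before the patch: the caller holds no reference. */
static bool refcount_suitable(const struct fake_folio *f)
{
	return f->refcount == expected_refs(f);
}

/* Adjusted check: the caller itself holds one temporary reference. */
static bool refcount_suitable_pinned(const struct fake_folio *f)
{
	return f->refcount == expected_refs(f) + 1;
}
```

With a 4-page page-cache folio mapped once (refcount 5), refcount_suitable() holds; after the scanner takes its own reference (refcount 6) only refcount_suitable_pinned() holds, which is the "+ 1" in the suggested condition. Any further unexpected reference makes both checks fail.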

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22 11:59   ` David Hildenbrand
@ 2025-05-23  7:50     ` Shivank Garg
  0 siblings, 0 replies; 8+ messages in thread
From: Shivank Garg @ 2025-05-23  7:50 UTC (permalink / raw)
  To: David Hildenbrand, Baolin Wang, akpm, linux-mm, linux-kernel, dev.jain
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca



On 5/22/2025 5:29 PM, David Hildenbrand wrote:
> On 22.05.25 12:01, Baolin Wang wrote:
>>
>>
>> On 2025/5/22 17:34, Shivank Garg wrote:
>>> folio_mapcount() checks folio_test_large() before proceeding to
>>> folio_large_mapcount(), but there exists a race window where a folio
>>> could be split between these checks which triggered the
>>> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
>>> folio_large_mapcount().
>>>
>>> Take a temporary folio reference in hpage_collapse_scan_file() to prevent
>>> races with concurrent folio splitting/freeing. This prevent potential
>>> incorrect large folio detection.
>>>
>>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>>> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
>>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>> ---
>>>    mm/khugepaged.c | 16 ++++++++++++++++
>>>    1 file changed, 16 insertions(+)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index cc945c6ab3bd..6e8902f9d88c 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>>  			continue;
>>>  		}
>>>  
>>> +		if (!folio_try_get(folio)) {
>>> +			xas_reset(&xas);
>>> +			continue;
>>> +		}
>>> +
>>> +		if (unlikely(folio != xas_reload(&xas))) {
>>> +			folio_put(folio);
>>> +			xas_reset(&xas);
>>> +			continue;
>>> +		}
>>> +
>>>  		if (folio_order(folio) == HPAGE_PMD_ORDER &&
>>>  		    folio->index == start) {
>>>  			/* Maybe PMD-mapped */
>>> @@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>>>  			 * it's safe to skip LRU and refcount checks before
>>>  			 * returning.
>>>  			 */
>>> +			folio_put(folio);
>>>  			break;
>>>  		}
>>>  
>>>  		node = folio_nid(folio);
>>>  		if (hpage_collapse_scan_abort(node, cc)) {
>>>  			result = SCAN_SCAN_ABORT;
>>> +			folio_put(folio);
>>>  			break;
>>>  		}
>>>  		cc->node_load[node]++;
>>>  
>>>  		if (!folio_test_lru(folio)) {
>>>  			result = SCAN_PAGE_LRU;
>>> +			folio_put(folio);
>>>  			break;
>>>  		}
>>>  
>>>  		if (!is_refcount_suitable(folio)) {
>>
>> You add a temporary refcnt for the folio, then the
>> is_refcount_suitable() will always fail, right?

You're right. Good catch!

> Indeed. Would one of our MADV_COLLAPSE selftests catch that?

The status of this one test case changed from PASS to SKIP
with the patch:

./tools/testing/selftests/mm/uffd-unit-tests
Testing minor-collapse on shmem... skipped [reason: MADV_COLLAPSE failed]
Userfaults unit tests: pass=65, skip=1, fail=0 (total=66)

All test cases in khugepaged.c PASS.

In cow.c, the test case status is the same for me with and without the patch.

# [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
ok 758 # SKIP MADV_COLLAPSE failed: Invalid argument
# 1 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:769 fail:0 xfail:8 xpass:0 skip:1 error:0

> 
> We should also be converting that code to use folio_expected_ref_count() -- either directly or wrapped in is_refcount_suitable().
> 
> Likely just here through
> 
> if (folio_expected_ref_count(folio) + 1 != folio_ref_count(folio))
>     ...
> 

This makes sense. Then is_refcount_suitable() will not be needed.

Thanks,
Shivank







* Re: [PATCH] mm/khugepaged: Fix race with folio splitting in hpage_collapse_scan_file()
  2025-05-22 10:07 ` Dev Jain
@ 2025-05-23  7:50   ` Shivank Garg
  0 siblings, 0 replies; 8+ messages in thread
From: Shivank Garg @ 2025-05-23  7:50 UTC (permalink / raw)
  To: Dev Jain, akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca



On 5/22/2025 3:37 PM, Dev Jain wrote:
> 
> On 22/05/25 3:04 pm, Shivank Garg wrote:
>> folio_mapcount() checks folio_test_large() before proceeding to
> 
> 
> It is not very clear in the description, where is this folio_mapcount() call present?
> 
> Are you talking about is_refcount_suitable()?

Yes. The issue originates from is_refcount_suitable(), which internally calls folio_mapcount(), which contains the race.

I'll update the description to make it clearer.
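
For illustration, the check-then-use window can be modeled deterministically in userspace C. This is a single-threaded sketch, not the kernel code: struct fake_folio, the race_hook callback and all function names are hypothetical, the hook stands in for a concurrent split that lands between the two checks, and a return value of -1 stands in for the point where the kernel would hit the VM_WARN_ON_FOLIO().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified folio: a "large" flag and two mapcount fields. */
struct fake_folio {
	bool large;
	int small_mapcount;
	int large_mapcount;
};

/* Callback used to simulate a concurrent actor between check and use. */
typedef void (*race_hook)(struct fake_folio *);

/* Mirrors the assertion: only valid for large folios; -1 models the warning. */
static int fake_folio_large_mapcount(const struct fake_folio *f)
{
	if (!f->large)
		return -1;
	return f->large_mapcount;
}

/* Check-then-use: tests 'large' first, then calls the large-only helper. */
static int fake_folio_mapcount(struct fake_folio *f, race_hook between)
{
	if (!f->large)
		return f->small_mapcount;

	/* Race window: the folio may be split right here. */
	if (between)
		between(f);

	return fake_folio_large_mapcount(f);
}

/* Simulated concurrent split: the folio stops being large. */
static void simulate_split(struct fake_folio *f)
{
	f->large = false;
}
```

Calling fake_folio_mapcount(&f, NULL) on a large folio returns its large mapcount, while passing simulate_split as the hook makes the second check fail. Holding a reference across the call is meant to rule out that interleaving, since a folio with an unexpected extra reference cannot be split.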

Thanks,
Shivank

