linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
@ 2024-02-21  8:50 Barry Song
  2024-02-21  9:15 ` David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Barry Song @ 2024-02-21  8:50 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Ryan Roberts,
	David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
	Vishal Moola, Yang Shi

From: Barry Song <v-songbaohua@oppo.com>

The purpose is stopping splitting large folios whose mapcount are 2 or
above. Folios whose estimated_shares = 0 should be still perfect and
even better candidates than estimated_shares = 1.

Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
the current code will split folios and reclaim them while madvise goes
on this folio; but if we unmap subpage 0, we will keep this folio and
break. This is weird.

For pmd-mapped large folios, we can still use "= 1" as the condition
as anyway we have the entire map for it. So this patch doesn't change
the condition for pmd-mapped large folios.
This also explains why we had been using "= 1" for both pmd-mapped and
pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
madvise_cold_or_pageout_pte_range() to use folios"), because in the
past, we used the mapcount of the specific subpage, since the subpage
had pte present, its mapcount wouldn't be 0.

The problem can be quite easily reproduced by writing a small program,
unmapping the first subpage of a pte-mapped large folio vs. unmapping
anyone other than the first subpage.

Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index cfa5e7288261..abde3edb04f0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (folio_test_large(folio)) {
 			int err;
 
-			if (folio_estimated_sharers(folio) != 1)
+			if (folio_estimated_sharers(folio) > 1)
 				break;
 			if (pageout_anon_only_filter && !folio_test_anon(folio))
 				break;
-- 
2.34.1



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-21  8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
@ 2024-02-21  9:15 ` David Hildenbrand
  2024-02-21  9:21   ` David Hildenbrand
  2024-02-21 17:35 ` Vishal Moola
  2024-02-26 13:46 ` Ryan Roberts
  2 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-02-21  9:15 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Ryan Roberts,
	Kefeng Wang, Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi

On 21.02.24 09:50, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
> 
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
> 
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
> 
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
> 
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>   mm/madvise.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>   		if (folio_test_large(folio)) {
>   			int err;
>   
> -			if (folio_estimated_sharers(folio) != 1)
> +			if (folio_estimated_sharers(folio) > 1)
>   				break;
>   			if (pageout_anon_only_filter && !folio_test_anon(folio))
>   				break;

That's also what I do in

https://lkml.kernel.org/r/20231124132626.235350-4-david@redhat.com

I'll revive that soon.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-21  9:15 ` David Hildenbrand
@ 2024-02-21  9:21   ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-02-21  9:21 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Ryan Roberts,
	Kefeng Wang, Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi

On 21.02.24 10:15, David Hildenbrand wrote:
> On 21.02.24 09:50, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> The purpose is stopping splitting large folios whose mapcount are 2 or
>> above. Folios whose estimated_shares = 0 should be still perfect and
>> even better candidates than estimated_shares = 1.
>>
>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>> the current code will split folios and reclaim them while madvise goes
>> on this folio; but if we unmap subpage 0, we will keep this folio and
>> break. This is weird.
>>
>> For pmd-mapped large folios, we can still use "= 1" as the condition
>> as anyway we have the entire map for it. So this patch doesn't change
>> the condition for pmd-mapped large folios.
>> This also explains why we had been using "= 1" for both pmd-mapped and
>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>> past, we used the mapcount of the specific subpage, since the subpage
>> had pte present, its mapcount wouldn't be 0.
>>
>> The problem can be quite easily reproduced by writing a small program,
>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>> anyone other than the first subpage.
>>
>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> ---
>>    mm/madvise.c | 2 +-
>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index cfa5e7288261..abde3edb04f0 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>    		if (folio_test_large(folio)) {
>>    			int err;
>>    
>> -			if (folio_estimated_sharers(folio) != 1)
>> +			if (folio_estimated_sharers(folio) > 1)
>>    				break;
>>    			if (pageout_anon_only_filter && !folio_test_anon(folio))
>>    				break;
> 
> That's also what I do in
> 
> https://lkml.kernel.org/r/20231124132626.235350-4-david@redhat.com
> 
> I'll revive that soon.

Forgot to add: we can pull this in early.

Reviewed-by: David Hildenbrand <david@redhat.com>

(I'll do the simple folio_estimated_sharers() to folio_mapped_shared() 
conversion first and optimize with total mapcount separately)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-21  8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
  2024-02-21  9:15 ` David Hildenbrand
@ 2024-02-21 17:35 ` Vishal Moola
  2024-02-26 13:46 ` Ryan Roberts
  2 siblings, 0 replies; 8+ messages in thread
From: Vishal Moola @ 2024-02-21 17:35 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
	Ryan Roberts, David Hildenbrand, Kefeng Wang, Matthew Wilcox,
	Minchan Kim, Yang Shi

On Wed, Feb 21, 2024 at 12:50 AM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Barry Song <v-songbaohua@oppo.com>
>
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
>
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
>
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>                 if (folio_test_large(folio)) {
>                         int err;
>
> -                       if (folio_estimated_sharers(folio) != 1)
> +                       if (folio_estimated_sharers(folio) > 1)
>                                 break;
>                         if (pageout_anon_only_filter && !folio_test_anon(folio))
>                                 break;
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-21  8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
  2024-02-21  9:15 ` David Hildenbrand
  2024-02-21 17:35 ` Vishal Moola
@ 2024-02-26 13:46 ` Ryan Roberts
  2024-02-26 13:50   ` David Hildenbrand
  2024-02-26 21:17   ` Barry Song
  2 siblings, 2 replies; 8+ messages in thread
From: Ryan Roberts @ 2024-02-26 13:46 UTC (permalink / raw)
  To: Barry Song, akpm, linux-mm
  Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
	David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
	Vishal Moola, Yang Shi

On 21/02/2024 08:50, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
> 
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
> 
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
> 
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
> 
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (folio_test_large(folio)) {
>  			int err;
>  
> -			if (folio_estimated_sharers(folio) != 1)
> +			if (folio_estimated_sharers(folio) > 1)
>  				break;
>  			if (pageout_anon_only_filter && !folio_test_anon(folio))
>  				break;

I wonder if we should change all the instances:

folio_estimated_sharers() != 1   ->   folio_estimated_sharers() > 1
folio_estimated_sharers() == 1   ->   folio_estimated_sharers() <= 1

It shouldn't cause a problem for the pmd case, and there are definitely other
cases where it will help. e.g. madvise_free_pte_range().

Regardless:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-26 13:46 ` Ryan Roberts
@ 2024-02-26 13:50   ` David Hildenbrand
  2024-02-26 21:17   ` Barry Song
  1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-02-26 13:50 UTC (permalink / raw)
  To: Ryan Roberts, Barry Song, akpm, linux-mm
  Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Kefeng Wang,
	Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi

On 26.02.24 14:46, Ryan Roberts wrote:
> On 21/02/2024 08:50, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> The purpose is stopping splitting large folios whose mapcount are 2 or
>> above. Folios whose estimated_shares = 0 should be still perfect and
>> even better candidates than estimated_shares = 1.
>>
>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>> the current code will split folios and reclaim them while madvise goes
>> on this folio; but if we unmap subpage 0, we will keep this folio and
>> break. This is weird.
>>
>> For pmd-mapped large folios, we can still use "= 1" as the condition
>> as anyway we have the entire map for it. So this patch doesn't change
>> the condition for pmd-mapped large folios.
>> This also explains why we had been using "= 1" for both pmd-mapped and
>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>> past, we used the mapcount of the specific subpage, since the subpage
>> had pte present, its mapcount wouldn't be 0.
>>
>> The problem can be quite easily reproduced by writing a small program,
>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>> anyone other than the first subpage.
>>
>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> ---
>>   mm/madvise.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index cfa5e7288261..abde3edb04f0 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>   		if (folio_test_large(folio)) {
>>   			int err;
>>   
>> -			if (folio_estimated_sharers(folio) != 1)
>> +			if (folio_estimated_sharers(folio) > 1)
>>   				break;
>>   			if (pageout_anon_only_filter && !folio_test_anon(folio))
>>   				break;
> 
> I wonder if we should change all the instances:
> 
> folio_estimated_sharers() != 1   ->   folio_estimated_sharers() > 1
> folio_estimated_sharers() == 1   ->   folio_estimated_sharers() <= 1

I'll send out something that wraps that in folio_mapped_shared() later 
today or tomorrow.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-26 13:46 ` Ryan Roberts
  2024-02-26 13:50   ` David Hildenbrand
@ 2024-02-26 21:17   ` Barry Song
  2024-02-27  9:10     ` Ryan Roberts
  1 sibling, 1 reply; 8+ messages in thread
From: Barry Song @ 2024-02-26 21:17 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
	David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
	Vishal Moola, Yang Shi

On Tue, Feb 27, 2024 at 2:46 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 21/02/2024 08:50, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The purpose is stopping splitting large folios whose mapcount are 2 or
> > above. Folios whose estimated_shares = 0 should be still perfect and
> > even better candidates than estimated_shares = 1.
> >
> > Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> > the current code will split folios and reclaim them while madvise goes
> > on this folio; but if we unmap subpage 0, we will keep this folio and
> > break. This is weird.
> >
> > For pmd-mapped large folios, we can still use "= 1" as the condition
> > as anyway we have the entire map for it. So this patch doesn't change
> > the condition for pmd-mapped large folios.
> > This also explains why we had been using "= 1" for both pmd-mapped and
> > pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> > madvise_cold_or_pageout_pte_range() to use folios"), because in the
> > past, we used the mapcount of the specific subpage, since the subpage
> > had pte present, its mapcount wouldn't be 0.
> >
> > The problem can be quite easily reproduced by writing a small program,
> > unmapping the first subpage of a pte-mapped large folio vs. unmapping
> > anyone other than the first subpage.
> >
> > Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> > Cc: Yin Fengwei <fengwei.yin@intel.com>
> > Cc: Yu Zhao <yuzhao@google.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  mm/madvise.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index cfa5e7288261..abde3edb04f0 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >               if (folio_test_large(folio)) {
> >                       int err;
> >
> > -                     if (folio_estimated_sharers(folio) != 1)
> > +                     if (folio_estimated_sharers(folio) > 1)
> >                               break;
> >                       if (pageout_anon_only_filter && !folio_test_anon(folio))
> >                               break;
>
> I wonder if we should change all the instances:
>
> folio_estimated_sharers() != 1   ->   folio_estimated_sharers() > 1
> folio_estimated_sharers() == 1   ->   folio_estimated_sharers() <= 1
>
> It shouldn't cause a problem for the pmd case, and there are definitely other
> cases where it will help. e.g. madvise_free_pte_range().

right. My test case covered PAGEOUT only and I agree madvise_free and
others have
exactly the same issue. for pmd case, it doesn't matter whether we
change the condition
or not because we have already pmd-mapped in the page table.

And good to know David will have a wrapper in folio_mapped_shared()  to more
widely address this issue.

>
> Regardless:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>

Thanks though we might have missed your tag as this one has been
in mm-stable.

Best regards,
Barry


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
  2024-02-26 21:17   ` Barry Song
@ 2024-02-27  9:10     ` Ryan Roberts
  0 siblings, 0 replies; 8+ messages in thread
From: Ryan Roberts @ 2024-02-27  9:10 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
	David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
	Vishal Moola, Yang Shi

On 26/02/2024 21:17, Barry Song wrote:
> On Tue, Feb 27, 2024 at 2:46 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 21/02/2024 08:50, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The purpose is stopping splitting large folios whose mapcount are 2 or
>>> above. Folios whose estimated_shares = 0 should be still perfect and
>>> even better candidates than estimated_shares = 1.
>>>
>>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>>> the current code will split folios and reclaim them while madvise goes
>>> on this folio; but if we unmap subpage 0, we will keep this folio and
>>> break. This is weird.
>>>
>>> For pmd-mapped large folios, we can still use "= 1" as the condition
>>> as anyway we have the entire map for it. So this patch doesn't change
>>> the condition for pmd-mapped large folios.
>>> This also explains why we had been using "= 1" for both pmd-mapped and
>>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>>> past, we used the mapcount of the specific subpage, since the subpage
>>> had pte present, its mapcount wouldn't be 0.
>>>
>>> The problem can be quite easily reproduced by writing a small program,
>>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>>> anyone other than the first subpage.
>>>
>>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>>> Cc: Yu Zhao <yuzhao@google.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>  mm/madvise.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>> index cfa5e7288261..abde3edb04f0 100644
>>> --- a/mm/madvise.c
>>> +++ b/mm/madvise.c
>>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>>               if (folio_test_large(folio)) {
>>>                       int err;
>>>
>>> -                     if (folio_estimated_sharers(folio) != 1)
>>> +                     if (folio_estimated_sharers(folio) > 1)
>>>                               break;
>>>                       if (pageout_anon_only_filter && !folio_test_anon(folio))
>>>                               break;
>>
>> I wonder if we should change all the instances:
>>
>> folio_estimated_sharers() != 1   ->   folio_estimated_sharers() > 1
>> folio_estimated_sharers() == 1   ->   folio_estimated_sharers() <= 1
>>
>> It shouldn't cause a problem for the pmd case, and there are definitely other
>> cases where it will help. e.g. madvise_free_pte_range().
> 
> right. My test case covered PAGEOUT only and I agree madvise_free and
> others have
> exactly the same issue. for pmd case, it doesn't matter whether we
> change the condition
> or not because we have already pmd-mapped in the page table.
> 
> And good to know David will have a wrapper in folio_mapped_shared()  to more
> widely address this issue.
> 
>>
>> Regardless:
>>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>
> 
> Thanks though we might have missed your tag as this one has been
> in mm-stable.

No problem! I've been out on holiday so a bit behind on where everything is.

> 
> Best regards,
> Barry



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-02-27  9:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-21  8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
2024-02-21  9:15 ` David Hildenbrand
2024-02-21  9:21   ` David Hildenbrand
2024-02-21 17:35 ` Vishal Moola
2024-02-26 13:46 ` Ryan Roberts
2024-02-26 13:50   ` David Hildenbrand
2024-02-26 21:17   ` Barry Song
2024-02-27  9:10     ` Ryan Roberts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox