* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-21 8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
@ 2024-02-21 9:15 ` David Hildenbrand
2024-02-21 9:21 ` David Hildenbrand
2024-02-21 17:35 ` Vishal Moola
2024-02-26 13:46 ` Ryan Roberts
2 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-02-21 9:15 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Ryan Roberts,
Kefeng Wang, Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi
On 21.02.24 09:50, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
>
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
>
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
>
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
>
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
> mm/madvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> if (folio_test_large(folio)) {
> int err;
>
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_estimated_sharers(folio) > 1)
> break;
> if (pageout_anon_only_filter && !folio_test_anon(folio))
> break;
That's also what I do in
https://lkml.kernel.org/r/20231124132626.235350-4-david@redhat.com
I'll revive that soon.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-21 9:15 ` David Hildenbrand
@ 2024-02-21 9:21 ` David Hildenbrand
0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-02-21 9:21 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Ryan Roberts,
Kefeng Wang, Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi
On 21.02.24 10:15, David Hildenbrand wrote:
> On 21.02.24 09:50, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> The purpose is stopping splitting large folios whose mapcount are 2 or
>> above. Folios whose estimated_shares = 0 should be still perfect and
>> even better candidates than estimated_shares = 1.
>>
>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>> the current code will split folios and reclaim them while madvise goes
>> on this folio; but if we unmap subpage 0, we will keep this folio and
>> break. This is weird.
>>
>> For pmd-mapped large folios, we can still use "= 1" as the condition
>> as anyway we have the entire map for it. So this patch doesn't change
>> the condition for pmd-mapped large folios.
>> This also explains why we had been using "= 1" for both pmd-mapped and
>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>> past, we used the mapcount of the specific subpage, since the subpage
>> had pte present, its mapcount wouldn't be 0.
>>
>> The problem can be quite easily reproduced by writing a small program,
>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>> anyone other than the first subpage.
>>
>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> ---
>> mm/madvise.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index cfa5e7288261..abde3edb04f0 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>> if (folio_test_large(folio)) {
>> int err;
>>
>> - if (folio_estimated_sharers(folio) != 1)
>> + if (folio_estimated_sharers(folio) > 1)
>> break;
>> if (pageout_anon_only_filter && !folio_test_anon(folio))
>> break;
>
> That's also what I do in
>
> https://lkml.kernel.org/r/20231124132626.235350-4-david@redhat.com
>
> I'll revive that soon.
Forgot to add: we can pull this in early.
Reviewed-by: David Hildenbrand <david@redhat.com>
(I'll do the simple folio_estimated_sharers() to folio_mapped_shared()
conversion first and optimize with total mapcount separately)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-21 8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
2024-02-21 9:15 ` David Hildenbrand
@ 2024-02-21 17:35 ` Vishal Moola
2024-02-26 13:46 ` Ryan Roberts
2 siblings, 0 replies; 8+ messages in thread
From: Vishal Moola @ 2024-02-21 17:35 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
Ryan Roberts, David Hildenbrand, Kefeng Wang, Matthew Wilcox,
Minchan Kim, Yang Shi
On Wed, Feb 21, 2024 at 12:50 AM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Barry Song <v-songbaohua@oppo.com>
>
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
>
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
>
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
> mm/madvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> if (folio_test_large(folio)) {
> int err;
>
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_estimated_sharers(folio) > 1)
> break;
> if (pageout_anon_only_filter && !folio_test_anon(folio))
> break;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-21 8:50 [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0 Barry Song
2024-02-21 9:15 ` David Hildenbrand
2024-02-21 17:35 ` Vishal Moola
@ 2024-02-26 13:46 ` Ryan Roberts
2024-02-26 13:50 ` David Hildenbrand
2024-02-26 21:17 ` Barry Song
2 siblings, 2 replies; 8+ messages in thread
From: Ryan Roberts @ 2024-02-26 13:46 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
Vishal Moola, Yang Shi
On 21/02/2024 08:50, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The purpose is stopping splitting large folios whose mapcount are 2 or
> above. Folios whose estimated_shares = 0 should be still perfect and
> even better candidates than estimated_shares = 1.
>
> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> the current code will split folios and reclaim them while madvise goes
> on this folio; but if we unmap subpage 0, we will keep this folio and
> break. This is weird.
>
> For pmd-mapped large folios, we can still use "= 1" as the condition
> as anyway we have the entire map for it. So this patch doesn't change
> the condition for pmd-mapped large folios.
> This also explains why we had been using "= 1" for both pmd-mapped and
> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> madvise_cold_or_pageout_pte_range() to use folios"), because in the
> past, we used the mapcount of the specific subpage, since the subpage
> had pte present, its mapcount wouldn't be 0.
>
> The problem can be quite easily reproduced by writing a small program,
> unmapping the first subpage of a pte-mapped large folio vs. unmapping
> anyone other than the first subpage.
>
> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
> mm/madvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cfa5e7288261..abde3edb04f0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> if (folio_test_large(folio)) {
> int err;
>
> - if (folio_estimated_sharers(folio) != 1)
> + if (folio_estimated_sharers(folio) > 1)
> break;
> if (pageout_anon_only_filter && !folio_test_anon(folio))
> break;
I wonder if we should change all the instances:
folio_estimated_sharers() != 1 -> folio_estimated_sharers() > 1
folio_estimated_sharers() == 1 -> folio_estimated_sharers() <= 1
It shouldn't cause a problem for the pmd case, and there are definitely other
cases where it will help. e.g. madvise_free_pte_range().
Regardless:
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-26 13:46 ` Ryan Roberts
@ 2024-02-26 13:50 ` David Hildenbrand
2024-02-26 21:17 ` Barry Song
1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-02-26 13:50 UTC (permalink / raw)
To: Ryan Roberts, Barry Song, akpm, linux-mm
Cc: linux-kernel, Barry Song, Yin Fengwei, Yu Zhao, Kefeng Wang,
Matthew Wilcox, Minchan Kim, Vishal Moola, Yang Shi
On 26.02.24 14:46, Ryan Roberts wrote:
> On 21/02/2024 08:50, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> The purpose is stopping splitting large folios whose mapcount are 2 or
>> above. Folios whose estimated_shares = 0 should be still perfect and
>> even better candidates than estimated_shares = 1.
>>
>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>> the current code will split folios and reclaim them while madvise goes
>> on this folio; but if we unmap subpage 0, we will keep this folio and
>> break. This is weird.
>>
>> For pmd-mapped large folios, we can still use "= 1" as the condition
>> as anyway we have the entire map for it. So this patch doesn't change
>> the condition for pmd-mapped large folios.
>> This also explains why we had been using "= 1" for both pmd-mapped and
>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>> past, we used the mapcount of the specific subpage, since the subpage
>> had pte present, its mapcount wouldn't be 0.
>>
>> The problem can be quite easily reproduced by writing a small program,
>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>> anyone other than the first subpage.
>>
>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> ---
>> mm/madvise.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index cfa5e7288261..abde3edb04f0 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>> if (folio_test_large(folio)) {
>> int err;
>>
>> - if (folio_estimated_sharers(folio) != 1)
>> + if (folio_estimated_sharers(folio) > 1)
>> break;
>> if (pageout_anon_only_filter && !folio_test_anon(folio))
>> break;
>
> I wonder if we should change all the instances:
>
> folio_estimated_sharers() != 1 -> folio_estimated_sharers() > 1
> folio_estimated_sharers() == 1 -> folio_estimated_sharers() <= 1
I'll send out something that wraps that in folio_mapped_shared() later
today or tomorrow.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-26 13:46 ` Ryan Roberts
2024-02-26 13:50 ` David Hildenbrand
@ 2024-02-26 21:17 ` Barry Song
2024-02-27 9:10 ` Ryan Roberts
1 sibling, 1 reply; 8+ messages in thread
From: Barry Song @ 2024-02-26 21:17 UTC (permalink / raw)
To: Ryan Roberts
Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
Vishal Moola, Yang Shi
On Tue, Feb 27, 2024 at 2:46 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 21/02/2024 08:50, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The purpose is stopping splitting large folios whose mapcount are 2 or
> > above. Folios whose estimated_shares = 0 should be still perfect and
> > even better candidates than estimated_shares = 1.
> >
> > Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
> > the current code will split folios and reclaim them while madvise goes
> > on this folio; but if we unmap subpage 0, we will keep this folio and
> > break. This is weird.
> >
> > For pmd-mapped large folios, we can still use "= 1" as the condition
> > as anyway we have the entire map for it. So this patch doesn't change
> > the condition for pmd-mapped large folios.
> > This also explains why we had been using "= 1" for both pmd-mapped and
> > pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
> > madvise_cold_or_pageout_pte_range() to use folios"), because in the
> > past, we used the mapcount of the specific subpage, since the subpage
> > had pte present, its mapcount wouldn't be 0.
> >
> > The problem can be quite easily reproduced by writing a small program,
> > unmapping the first subpage of a pte-mapped large folio vs. unmapping
> > anyone other than the first subpage.
> >
> > Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
> > Cc: Yin Fengwei <fengwei.yin@intel.com>
> > Cc: Yu Zhao <yuzhao@google.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> > mm/madvise.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index cfa5e7288261..abde3edb04f0 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> > if (folio_test_large(folio)) {
> > int err;
> >
> > - if (folio_estimated_sharers(folio) != 1)
> > + if (folio_estimated_sharers(folio) > 1)
> > break;
> > if (pageout_anon_only_filter && !folio_test_anon(folio))
> > break;
>
> I wonder if we should change all the instances:
>
> folio_estimated_sharers() != 1 -> folio_estimated_sharers() > 1
> folio_estimated_sharers() == 1 -> folio_estimated_sharers() <= 1
>
> It shouldn't cause a problem for the pmd case, and there are definitely other
> cases where it will help. e.g. madvise_free_pte_range().
right. My test case covered PAGEOUT only and I agree madvise_free and
others have
exactly the same issue. for pmd case, it doesn't matter whether we
change the condition
or not because we have already pmd-mapped in the page table.
And good to know David will have a wrapper in folio_mapped_shared() to more
widely address this issue.
>
> Regardless:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>
Thanks though we might have missed your tag as this one has been
in mm-stable.
Best regards,
Barry
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): allow split while folio_estimated_sharers = 0
2024-02-26 21:17 ` Barry Song
@ 2024-02-27 9:10 ` Ryan Roberts
0 siblings, 0 replies; 8+ messages in thread
From: Ryan Roberts @ 2024-02-27 9:10 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, linux-kernel, Barry Song, Yin Fengwei, Yu Zhao,
David Hildenbrand, Kefeng Wang, Matthew Wilcox, Minchan Kim,
Vishal Moola, Yang Shi
On 26/02/2024 21:17, Barry Song wrote:
> On Tue, Feb 27, 2024 at 2:46 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 21/02/2024 08:50, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The purpose is stopping splitting large folios whose mapcount are 2 or
>>> above. Folios whose estimated_shares = 0 should be still perfect and
>>> even better candidates than estimated_shares = 1.
>>>
>>> Consider a pte-mapped large folio with 16 subpages, if we unmap 1-15,
>>> the current code will split folios and reclaim them while madvise goes
>>> on this folio; but if we unmap subpage 0, we will keep this folio and
>>> break. This is weird.
>>>
>>> For pmd-mapped large folios, we can still use "= 1" as the condition
>>> as anyway we have the entire map for it. So this patch doesn't change
>>> the condition for pmd-mapped large folios.
>>> This also explains why we had been using "= 1" for both pmd-mapped and
>>> pte-mapped large folios before commit 07e8c82b5eff ("madvise: convert
>>> madvise_cold_or_pageout_pte_range() to use folios"), because in the
>>> past, we used the mapcount of the specific subpage, since the subpage
>>> had pte present, its mapcount wouldn't be 0.
>>>
>>> The problem can be quite easily reproduced by writing a small program,
>>> unmapping the first subpage of a pte-mapped large folio vs. unmapping
>>> anyone other than the first subpage.
>>>
>>> Fixes: 2f406263e3e9 ("madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check")
>>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>>> Cc: Yu Zhao <yuzhao@google.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>> mm/madvise.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>> index cfa5e7288261..abde3edb04f0 100644
>>> --- a/mm/madvise.c
>>> +++ b/mm/madvise.c
>>> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>>> if (folio_test_large(folio)) {
>>> int err;
>>>
>>> - if (folio_estimated_sharers(folio) != 1)
>>> + if (folio_estimated_sharers(folio) > 1)
>>> break;
>>> if (pageout_anon_only_filter && !folio_test_anon(folio))
>>> break;
>>
>> I wonder if we should change all the instances:
>>
>> folio_estimated_sharers() != 1 -> folio_estimated_sharers() > 1
>> folio_estimated_sharers() == 1 -> folio_estimated_sharers() <= 1
>>
>> It shouldn't cause a problem for the pmd case, and there are definitely other
>> cases where it will help. e.g. madvise_free_pte_range().
>
> right. My test case covered PAGEOUT only and I agree madvise_free and
> others have
> exactly the same issue. for pmd case, it doesn't matter whether we
> change the condition
> or not because we have already pmd-mapped in the page table.
>
> And good to know David will have a wrapper in folio_mapped_shared() to more
> widely address this issue.
>
>>
>> Regardless:
>>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>
>
> Thanks though we might have missed your tag as this one has been
> in mm-stable.
No problem! I've been out on holiday so a bit behind on where everything is.
>
> Best regards,
> Barry
^ permalink raw reply [flat|nested] 8+ messages in thread