[PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
@ 2024-06-06  9:42 xu.xin16
  2024-06-06  9:49 ` Barry Song
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: xu.xin16 @ 2024-06-06  9:42 UTC (permalink / raw)
  To: david, ziy, v-songbaohua, akpm
  Cc: linux-kernel, linux-mm, mhocko, xu.xin16, yang.yang29, ran.xiaokai

From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

When I did a large folios split test, a WARNING
"[ 5059.122759][  T166] Cannot split file folio to non-0 order"
was triggered. But the test cases are only for anonmous folios.
while mapping_large_folio_support() is only reasonable for page
cache folios.

In split_huge_page_to_list_to_order(), the folio passed to
mapping_large_folio_support() maybe anonmous folio. The
folio_test_anon() check is missing. So the split of the anonmous THP
is failed. This is also the same for shmem_mapping(). We'd better add
a check for both. But the shmem_mapping() in __split_huge_page() is
not involved, as for anonmous folios, the end parameter is set to -1, so
(head[i].index >= end) is always false. shmem_mapping() is not called.

Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
for anon mapping, So we can detect the wrong use more easily.

THP folios maybe exist in the pagecache even the file system doesn't
support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
is enabled, khugepaged will try to collapse read-only file-backed pages
to THP. But the mapping does not actually support multi order
large folios properly.

Using /sys/kernel/debug/split_huge_pages to verify this, with this
patch, large anon THP is successfully split and the warning is ceased.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 include/linux/pagemap.h |  4 ++++
 mm/huge_memory.c        | 27 ++++++++++++++++-----------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ee633712bba0..59f1df0cde5a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
  */
 static inline bool mapping_large_folio_support(struct address_space *mapping)
 {
+	/* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
+	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
+			"Anonymous mapping always supports large folio");
+
 	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
 		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 317de2afd371..62d57270b08e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	if (new_order >= folio_order(folio))
 		return -EINVAL;

-	/* Cannot split anonymous THP to order-1 */
-	if (new_order == 1 && folio_test_anon(folio)) {
-		VM_WARN_ONCE(1, "Cannot split to order-1 folio");
-		return -EINVAL;
-	}
-
-	if (new_order) {
-		/* Only swapping a whole PMD-mapped folio is supported */
-		if (folio_test_swapcache(folio))
+	if (folio_test_anon(folio)) {
+		/* Cannot split anonymous THP to order-1 */
+		if (new_order == 1) {
+			VM_WARN_ONCE(1, "Cannot split to order-1 folio");
 			return -EINVAL;
+		}
+	} else if (new_order) {
 		/* Split shmem folio to non-zero order not supported */
 		if (shmem_mapping(folio->mapping)) {
 			VM_WARN_ONCE(1,
 				"Cannot split shmem folio to non-0 order");
 			return -EINVAL;
 		}
-		/* No split if the file system does not support large folio */
-		if (!mapping_large_folio_support(folio->mapping)) {
+		/* No split if the file system does not support large folio.
+		 * Note that we might still have THPs in such mappings due to
+		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
+		 * does not actually support large folios properly.
+		 */
+		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+			!mapping_large_folio_support(folio->mapping)) {
 			VM_WARN_ONCE(1,
 				"Cannot split file folio to non-0 order");
 			return -EINVAL;
 		}
 	}

+	/* Only swapping a whole PMD-mapped folio is supported */
+	if (folio_test_swapcache(folio) && new_order)
+		return -EINVAL;

 	is_hzp = is_huge_zero_folio(folio);
 	if (is_hzp) {
-- 
2.15.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06  9:42 [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios xu.xin16
@ 2024-06-06  9:49 ` Barry Song
  2024-06-07  1:36   ` ran xiaokai
  2024-06-06 14:28 ` Zi Yan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2024-06-06  9:49 UTC (permalink / raw)
  To: xu.xin16
  Cc: david, ziy, v-songbaohua, akpm, linux-kernel, linux-mm, mhocko,
	yang.yang29, ran.xiaokai

On Thu, Jun 6, 2024 at 9:42 PM <xu.xin16@zte.com.cn> wrote:
>
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> When I did a large folios split test, a WARNING
> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> was triggered. But the test cases are only for anonmous folios.
> while mapping_large_folio_support() is only reasonable for page
> cache folios.
>
> In split_huge_page_to_list_to_order(), the folio passed to
> mapping_large_folio_support() maybe anonmous folio. The
> folio_test_anon() check is missing. So the split of the anonmous THP
> is failed. This is also the same for shmem_mapping(). We'd better add
> a check for both. But the shmem_mapping() in __split_huge_page() is
> not involved, as for anonmous folios, the end parameter is set to -1, so
> (head[i].index >= end) is always false. shmem_mapping() is not called.
>
> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> for anon mapping, So we can detect the wrong use more easily.
>
> THP folios maybe exist in the pagecache even the file system doesn't
> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> is enabled, khugepaged will try to collapse read-only file-backed pages
> to THP. But the mapping does not actually support multi order
> large folios properly.
>
> Using /sys/kernel/debug/split_huge_pages to verify this, with this
> patch, large anon THP is successfully split and the warning is ceased.
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>  include/linux/pagemap.h |  4 ++++
>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
>  2 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index ee633712bba0..59f1df0cde5a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
>   */
>  static inline bool mapping_large_folio_support(struct address_space *mapping)
>  {
> +       /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> +       VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> +                       "Anonymous mapping always supports large folio");
> +
>         return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>                 test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 317de2afd371..62d57270b08e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>         if (new_order >= folio_order(folio))
>                 return -EINVAL;
>
> -       /* Cannot split anonymous THP to order-1 */
> -       if (new_order == 1 && folio_test_anon(folio)) {
> -               VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> -               return -EINVAL;
> -       }
> -
> -       if (new_order) {
> -               /* Only swapping a whole PMD-mapped folio is supported */
> -               if (folio_test_swapcache(folio))
> +       if (folio_test_anon(folio)) {
> +               /* Cannot split anonymous THP to order-1 */

This is simply what the code is indicating. Shouldn't we phrase
it differently to explain "why" but not "how"? for example, anon
order-1 mTHP is not supported?

Otherwise, it looks good to me.

Reviewed-by: Barry Song <baohua@kernel.org>

> +               if (new_order == 1) {
> +                       VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>                         return -EINVAL;
> +               }
> +       } else if (new_order) {
>                 /* Split shmem folio to non-zero order not supported */
>                 if (shmem_mapping(folio->mapping)) {
>                         VM_WARN_ONCE(1,
>                                 "Cannot split shmem folio to non-0 order");
>                         return -EINVAL;
>                 }
> -               /* No split if the file system does not support large folio */
> -               if (!mapping_large_folio_support(folio->mapping)) {
> +               /* No split if the file system does not support large folio.
> +                * Note that we might still have THPs in such mappings due to
> +                * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
> +                * does not actually support large folios properly.
> +                */
> +               if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> +                       !mapping_large_folio_support(folio->mapping)) {
>                         VM_WARN_ONCE(1,
>                                 "Cannot split file folio to non-0 order");
>                         return -EINVAL;
>                 }
>         }
>
> +       /* Only swapping a whole PMD-mapped folio is supported */
> +       if (folio_test_swapcache(folio) && new_order)
> +               return -EINVAL;
>
>         is_hzp = is_huge_zero_folio(folio);
>         if (is_hzp) {
> --
> 2.15.2
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06  9:42 [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios xu.xin16
  2024-06-06  9:49 ` Barry Song
@ 2024-06-06 14:28 ` Zi Yan
  2024-06-06 21:00   ` Barry Song
  2024-06-06 21:44 ` Zi Yan
  2024-06-07  7:56 ` David Hildenbrand
  3 siblings, 1 reply; 11+ messages in thread
From: Zi Yan @ 2024-06-06 14:28 UTC (permalink / raw)
  To: xu.xin16
  Cc: david, v-songbaohua, akpm, linux-kernel, linux-mm, mhocko,
	yang.yang29, ran.xiaokai, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 4613 bytes --]

+Matthew

For mapping_large_folio_support() changes.

On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:

> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> When I did a large folios split test, a WARNING
> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> was triggered. But the test cases are only for anonmous folios.
> while mapping_large_folio_support() is only reasonable for page
> cache folios.
>
> In split_huge_page_to_list_to_order(), the folio passed to
> mapping_large_folio_support() maybe anonmous folio. The
> folio_test_anon() check is missing. So the split of the anonmous THP
> is failed. This is also the same for shmem_mapping(). We'd better add
> a check for both. But the shmem_mapping() in __split_huge_page() is
> not involved, as for anonmous folios, the end parameter is set to -1, so
> (head[i].index >= end) is always false. shmem_mapping() is not called.
>
> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> for anon mapping, So we can detect the wrong use more easily.
>
> THP folios maybe exist in the pagecache even the file system doesn't
> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> is enabled, khugepaged will try to collapse read-only file-backed pages
> to THP. But the mapping does not actually support multi order
> large folios properly.
>
> Using /sys/kernel/debug/split_huge_pages to verify this, with this
> patch, large anon THP is successfully split and the warning is ceased.
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>  include/linux/pagemap.h |  4 ++++
>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
>  2 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index ee633712bba0..59f1df0cde5a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
>   */
>  static inline bool mapping_large_folio_support(struct address_space *mapping)
>  {
> +	/* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> +	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> +			"Anonymous mapping always supports large folio");
> +
>  	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>  		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 317de2afd371..62d57270b08e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  	if (new_order >= folio_order(folio))
>  		return -EINVAL;
>
> -	/* Cannot split anonymous THP to order-1 */
> -	if (new_order == 1 && folio_test_anon(folio)) {
> -		VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> -		return -EINVAL;
> -	}
> -
> -	if (new_order) {
> -		/* Only swapping a whole PMD-mapped folio is supported */
> -		if (folio_test_swapcache(folio))
> +	if (folio_test_anon(folio)) {
> +		/* Cannot split anonymous THP to order-1 */
> +		if (new_order == 1) {
> +			VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>  			return -EINVAL;
> +		}
> +	} else if (new_order) {
>  		/* Split shmem folio to non-zero order not supported */
>  		if (shmem_mapping(folio->mapping)) {
>  			VM_WARN_ONCE(1,
>  				"Cannot split shmem folio to non-0 order");
>  			return -EINVAL;
>  		}
> -		/* No split if the file system does not support large folio */
> -		if (!mapping_large_folio_support(folio->mapping)) {
> +		/* No split if the file system does not support large folio.
> +		 * Note that we might still have THPs in such mappings due to
> +		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
> +		 * does not actually support large folios properly.
> +		 */
> +		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> +			!mapping_large_folio_support(folio->mapping)) {

Shouldn’t this be

if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
	!mapping_large_folio_support(folio->mapping)) {

?

When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check
mapping_large_folio_support(), otherwise we do not.

>  			VM_WARN_ONCE(1,
>  				"Cannot split file folio to non-0 order");
>  			return -EINVAL;
>  		}
>  	}
>
> +	/* Only swapping a whole PMD-mapped folio is supported */
> +	if (folio_test_swapcache(folio) && new_order)
> +		return -EINVAL;
>
>  	is_hzp = is_huge_zero_folio(folio);
>  	if (is_hzp) {
> -- 
> 2.15.2


Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06 14:28 ` Zi Yan
@ 2024-06-06 21:00   ` Barry Song
  2024-06-06 21:24     ` Zi Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2024-06-06 21:00 UTC (permalink / raw)
  To: Zi Yan
  Cc: xu.xin16, david, v-songbaohua, akpm, linux-kernel, linux-mm,
	mhocko, yang.yang29, ran.xiaokai, Matthew Wilcox

On Fri, Jun 7, 2024 at 2:35 AM Zi Yan <ziy@nvidia.com> wrote:
>
> +Matthew
>
> For mapping_large_folio_support() changes.
>
> On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:
>
> > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> > When I did a large folios split test, a WARNING
> > "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> > was triggered. But the test cases are only for anonmous folios.
> > while mapping_large_folio_support() is only reasonable for page
> > cache folios.
> >
> > In split_huge_page_to_list_to_order(), the folio passed to
> > mapping_large_folio_support() maybe anonmous folio. The
> > folio_test_anon() check is missing. So the split of the anonmous THP
> > is failed. This is also the same for shmem_mapping(). We'd better add
> > a check for both. But the shmem_mapping() in __split_huge_page() is
> > not involved, as for anonmous folios, the end parameter is set to -1, so
> > (head[i].index >= end) is always false. shmem_mapping() is not called.
> >
> > Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> > for anon mapping, So we can detect the wrong use more easily.
> >
> > THP folios maybe exist in the pagecache even the file system doesn't
> > support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> > is enabled, khugepaged will try to collapse read-only file-backed pages
> > to THP. But the mapping does not actually support multi order
> > large folios properly.
> >
> > Using /sys/kernel/debug/split_huge_pages to verify this, with this
> > patch, large anon THP is successfully split and the warning is ceased.
> >
> > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > ---
> >  include/linux/pagemap.h |  4 ++++
> >  mm/huge_memory.c        | 27 ++++++++++++++++-----------
> >  2 files changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index ee633712bba0..59f1df0cde5a 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
> >   */
> >  static inline bool mapping_large_folio_support(struct address_space *mapping)
> >  {
> > +     /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> > +     VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> > +                     "Anonymous mapping always supports large folio");
> > +
> >       return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> >               test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
> >  }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 317de2afd371..62d57270b08e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >       if (new_order >= folio_order(folio))
> >               return -EINVAL;
> >
> > -     /* Cannot split anonymous THP to order-1 */
> > -     if (new_order == 1 && folio_test_anon(folio)) {
> > -             VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> > -             return -EINVAL;
> > -     }
> > -
> > -     if (new_order) {
> > -             /* Only swapping a whole PMD-mapped folio is supported */
> > -             if (folio_test_swapcache(folio))
> > +     if (folio_test_anon(folio)) {
> > +             /* Cannot split anonymous THP to order-1 */
> > +             if (new_order == 1) {
> > +                     VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> >                       return -EINVAL;
> > +             }
> > +     } else if (new_order) {
> >               /* Split shmem folio to non-zero order not supported */
> >               if (shmem_mapping(folio->mapping)) {
> >                       VM_WARN_ONCE(1,
> >                               "Cannot split shmem folio to non-0 order");
> >                       return -EINVAL;
> >               }
> > -             /* No split if the file system does not support large folio */
> > -             if (!mapping_large_folio_support(folio->mapping)) {
> > +             /* No split if the file system does not support large folio.
> > +              * Note that we might still have THPs in such mappings due to
> > +              * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
> > +              * does not actually support large folios properly.
> > +              */
> > +             if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> > +                     !mapping_large_folio_support(folio->mapping)) {
>
> Shouldn’t this be
>
> if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>         !mapping_large_folio_support(folio->mapping)) {
>
> ?
>
> When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check
> mapping_large_folio_support(), otherwise we do not.

while CONFIG_READ_ONLY_THP_FOR_FS is not set, that is no way
a large folio can be mapped to a filesystem which doesn't support
large folio mapping. i think
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) is correct.

The below means a BUG which has never a chance to happen if it
is true.

!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
        !mapping_large_folio_support(folio->mapping));

>
> >                       VM_WARN_ONCE(1,
> >                               "Cannot split file folio to non-0 order");
> >                       return -EINVAL;
> >               }
> >       }
> >
> > +     /* Only swapping a whole PMD-mapped folio is supported */
> > +     if (folio_test_swapcache(folio) && new_order)
> > +             return -EINVAL;
> >
> >       is_hzp = is_huge_zero_folio(folio);
> >       if (is_hzp) {
> > --
> > 2.15.2
>
>
> Best Regards,
> Yan, Zi

Thanks
Barry


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06 21:00   ` Barry Song
@ 2024-06-06 21:24     ` Zi Yan
  2024-06-06 21:33       ` Barry Song
  0 siblings, 1 reply; 11+ messages in thread
From: Zi Yan @ 2024-06-06 21:24 UTC (permalink / raw)
  To: Barry Song
  Cc: xu.xin16, david, v-songbaohua, akpm, linux-kernel, linux-mm,
	mhocko, yang.yang29, ran.xiaokai, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 6256 bytes --]

On 6 Jun 2024, at 14:00, Barry Song wrote:

> On Fri, Jun 7, 2024 at 2:35 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> +Matthew
>>
>> For mapping_large_folio_support() changes.
>>
>> On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:
>>
>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>
>>> When I did a large folios split test, a WARNING
>>> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
>>> was triggered. But the test cases are only for anonmous folios.
>>> while mapping_large_folio_support() is only reasonable for page
>>> cache folios.
>>>
>>> In split_huge_page_to_list_to_order(), the folio passed to
>>> mapping_large_folio_support() maybe anonmous folio. The
>>> folio_test_anon() check is missing. So the split of the anonmous THP
>>> is failed. This is also the same for shmem_mapping(). We'd better add
>>> a check for both. But the shmem_mapping() in __split_huge_page() is
>>> not involved, as for anonmous folios, the end parameter is set to -1, so
>>> (head[i].index >= end) is always false. shmem_mapping() is not called.
>>>
>>> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
>>> for anon mapping, So we can detect the wrong use more easily.
>>>
>>> THP folios maybe exist in the pagecache even the file system doesn't
>>> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
>>> is enabled, khugepaged will try to collapse read-only file-backed pages
>>> to THP. But the mapping does not actually support multi order
>>> large folios properly.
>>>
>>> Using /sys/kernel/debug/split_huge_pages to verify this, with this
>>> patch, large anon THP is successfully split and the warning is ceased.
>>>
>>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>> ---
>>>  include/linux/pagemap.h |  4 ++++
>>>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
>>>  2 files changed, 20 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>> index ee633712bba0..59f1df0cde5a 100644
>>> --- a/include/linux/pagemap.h
>>> +++ b/include/linux/pagemap.h
>>> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
>>>   */
>>>  static inline bool mapping_large_folio_support(struct address_space *mapping)
>>>  {
>>> +     /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
>>> +     VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
>>> +                     "Anonymous mapping always supports large folio");
>>> +
>>>       return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>>>               test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
>>>  }
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 317de2afd371..62d57270b08e 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>       if (new_order >= folio_order(folio))
>>>               return -EINVAL;
>>>
>>> -     /* Cannot split anonymous THP to order-1 */
>>> -     if (new_order == 1 && folio_test_anon(folio)) {
>>> -             VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>>> -             return -EINVAL;
>>> -     }
>>> -
>>> -     if (new_order) {
>>> -             /* Only swapping a whole PMD-mapped folio is supported */
>>> -             if (folio_test_swapcache(folio))
>>> +     if (folio_test_anon(folio)) {
>>> +             /* Cannot split anonymous THP to order-1 */
>>> +             if (new_order == 1) {
>>> +                     VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>>>                       return -EINVAL;
>>> +             }
>>> +     } else if (new_order) {
>>>               /* Split shmem folio to non-zero order not supported */
>>>               if (shmem_mapping(folio->mapping)) {
>>>                       VM_WARN_ONCE(1,
>>>                               "Cannot split shmem folio to non-0 order");
>>>                       return -EINVAL;
>>>               }
>>> -             /* No split if the file system does not support large folio */
>>> -             if (!mapping_large_folio_support(folio->mapping)) {
>>> +             /* No split if the file system does not support large folio.
>>> +              * Note that we might still have THPs in such mappings due to
>>> +              * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
>>> +              * does not actually support large folios properly.
>>> +              */
>>> +             if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>> +                     !mapping_large_folio_support(folio->mapping)) {
>>
>> Shouldn’t this be
>>
>> if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>         !mapping_large_folio_support(folio->mapping)) {
>>
>> ?
>>
>> When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check
>> mapping_large_folio_support(), otherwise we do not.
>
> while CONFIG_READ_ONLY_THP_FOR_FS is not set, that is no way
> a large folio can be mapped to a filesystem which doesn't support
> large folio mapping. i think

That is why we have the warning below to catch this undesired
case.

> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) is correct.

When it is set, khugepaged can create a large pagecache folio
on a filesystem without large folio support and the warning
will be triggered once the created large pagecache folio
is split. That is not what we want.

>
> The below means a BUG which has never a chance to happen if it
> is true.
>
> !IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>         !mapping_large_folio_support(folio->mapping));
>
>>
>>>                       VM_WARN_ONCE(1,
>>>                               "Cannot split file folio to non-0 order");
>>>                       return -EINVAL;
>>>               }
>>>       }
>>>
>>> +     /* Only swapping a whole PMD-mapped folio is supported */
>>> +     if (folio_test_swapcache(folio) && new_order)
>>> +             return -EINVAL;
>>>
>>>       is_hzp = is_huge_zero_folio(folio);
>>>       if (is_hzp) {
>>> --
>>> 2.15.2
>>
>>
>> Best Regards,
>> Yan, Zi
>
> Thanks
> Barry


Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06 21:24     ` Zi Yan
@ 2024-06-06 21:33       ` Barry Song
  2024-06-06 21:42         ` Zi Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2024-06-06 21:33 UTC (permalink / raw)
  To: Zi Yan
  Cc: xu.xin16, david, v-songbaohua, akpm, linux-kernel, linux-mm,
	mhocko, yang.yang29, ran.xiaokai, Matthew Wilcox

On Fri, Jun 7, 2024 at 9:24 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 6 Jun 2024, at 14:00, Barry Song wrote:
>
> > On Fri, Jun 7, 2024 at 2:35 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> +Matthew
> >>
> >> For mapping_large_folio_support() changes.
> >>
> >> On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:
> >>
> >>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>>
> >>> When I did a large folios split test, a WARNING
> >>> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> >>> was triggered. But the test cases are only for anonmous folios.
> >>> while mapping_large_folio_support() is only reasonable for page
> >>> cache folios.
> >>>
> >>> In split_huge_page_to_list_to_order(), the folio passed to
> >>> mapping_large_folio_support() maybe anonmous folio. The
> >>> folio_test_anon() check is missing. So the split of the anonmous THP
> >>> is failed. This is also the same for shmem_mapping(). We'd better add
> >>> a check for both. But the shmem_mapping() in __split_huge_page() is
> >>> not involved, as for anonmous folios, the end parameter is set to -1, so
> >>> (head[i].index >= end) is always false. shmem_mapping() is not called.
> >>>
> >>> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> >>> for anon mapping, So we can detect the wrong use more easily.
> >>>
> >>> THP folios maybe exist in the pagecache even the file system doesn't
> >>> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> >>> is enabled, khugepaged will try to collapse read-only file-backed pages
> >>> to THP. But the mapping does not actually support multi order
> >>> large folios properly.
> >>>
> >>> Using /sys/kernel/debug/split_huge_pages to verify this, with this
> >>> patch, large anon THP is successfully split and the warning is ceased.
> >>>
> >>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>> ---
> >>>  include/linux/pagemap.h |  4 ++++
> >>>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
> >>>  2 files changed, 20 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>> index ee633712bba0..59f1df0cde5a 100644
> >>> --- a/include/linux/pagemap.h
> >>> +++ b/include/linux/pagemap.h
> >>> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
> >>>   */
> >>>  static inline bool mapping_large_folio_support(struct address_space *mapping)
> >>>  {
> >>> +     /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> >>> +     VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> >>> +                     "Anonymous mapping always supports large folio");
> >>> +
> >>>       return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> >>>               test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
> >>>  }
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 317de2afd371..62d57270b08e 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>>       if (new_order >= folio_order(folio))
> >>>               return -EINVAL;
> >>>
> >>> -     /* Cannot split anonymous THP to order-1 */
> >>> -     if (new_order == 1 && folio_test_anon(folio)) {
> >>> -             VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> >>> -             return -EINVAL;
> >>> -     }
> >>> -
> >>> -     if (new_order) {
> >>> -             /* Only swapping a whole PMD-mapped folio is supported */
> >>> -             if (folio_test_swapcache(folio))
> >>> +     if (folio_test_anon(folio)) {
> >>> +             /* Cannot split anonymous THP to order-1 */
> >>> +             if (new_order == 1) {
> >>> +                     VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> >>>                       return -EINVAL;
> >>> +             }
> >>> +     } else if (new_order) {
> >>>               /* Split shmem folio to non-zero order not supported */
> >>>               if (shmem_mapping(folio->mapping)) {
> >>>                       VM_WARN_ONCE(1,
> >>>                               "Cannot split shmem folio to non-0 order");
> >>>                       return -EINVAL;
> >>>               }
> >>> -             /* No split if the file system does not support large folio */
> >>> -             if (!mapping_large_folio_support(folio->mapping)) {
> >>> +             /* No split if the file system does not support large folio.
> >>> +              * Note that we might still have THPs in such mappings due to
> >>> +              * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
> >>> +              * does not actually support large folios properly.
> >>> +              */
> >>> +             if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> >>> +                     !mapping_large_folio_support(folio->mapping)) {
> >>
> >> Shouldn’t this be
> >>
> >> if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> >>         !mapping_large_folio_support(folio->mapping)) {
> >>
> >> ?
> >>
> >> When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check
> >> mapping_large_folio_support(), otherwise we do not.
> >
> > while CONFIG_READ_ONLY_THP_FOR_FS is not set, that is no way
> > a large folio can be mapped to a filesystem which doesn't support
> > large folio mapping. i think
>
> That is why we have the warning below to catch this undesired
> case.
>
> > if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) is correct.
>
> When it is set, khugepaged can create a large pagecache folio
> on a filesystem without large folio support and the warning
> will be triggered once the created large pagecache folio
> is split. That is not what we want.

yes. This is exactly why we need if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
but not if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) .

because if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)), folio is definitely
pointing to a file system supporting large folio. otherwise, it is a bug.

>
> >
> > The below means a BUG which has never a chance to happen if it
> > is true.
> >
> > !IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> >         !mapping_large_folio_support(folio->mapping));
> >
> >>
> >>>                       VM_WARN_ONCE(1,
> >>>                               "Cannot split file folio to non-0 order");
> >>>                       return -EINVAL;
> >>>               }
> >>>       }
> >>>
> >>> +     /* Only swapping a whole PMD-mapped folio is supported */
> >>> +     if (folio_test_swapcache(folio) && new_order)
> >>> +             return -EINVAL;
> >>>
> >>>       is_hzp = is_huge_zero_folio(folio);
> >>>       if (is_hzp) {
> >>> --
> >>> 2.15.2
> >>
> >>
> >> Best Regards,
> >> Yan, Zi
> >
> > Thanks
> > Barry
>
>
> Best Regards,
> Yan, Zi


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06 21:33       ` Barry Song
@ 2024-06-06 21:42         ` Zi Yan
  0 siblings, 0 replies; 11+ messages in thread
From: Zi Yan @ 2024-06-06 21:42 UTC (permalink / raw)
  To: Barry Song
  Cc: xu.xin16, david, v-songbaohua, akpm, linux-kernel, linux-mm,
	mhocko, yang.yang29, ran.xiaokai, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 7071 bytes --]

On 6 Jun 2024, at 14:33, Barry Song wrote:

> On Fri, Jun 7, 2024 at 9:24 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 6 Jun 2024, at 14:00, Barry Song wrote:
>>
>>> On Fri, Jun 7, 2024 at 2:35 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> +Matthew
>>>>
>>>> For mapping_large_folio_support() changes.
>>>>
>>>> On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:
>>>>
>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>>
>>>>> When I did a large folios split test, a WARNING
>>>>> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
>>>>> was triggered. But the test cases are only for anonmous folios.
>>>>> while mapping_large_folio_support() is only reasonable for page
>>>>> cache folios.
>>>>>
>>>>> In split_huge_page_to_list_to_order(), the folio passed to
>>>>> mapping_large_folio_support() maybe anonmous folio. The
>>>>> folio_test_anon() check is missing. So the split of the anonmous THP
>>>>> is failed. This is also the same for shmem_mapping(). We'd better add
>>>>> a check for both. But the shmem_mapping() in __split_huge_page() is
>>>>> not involved, as for anonmous folios, the end parameter is set to -1, so
>>>>> (head[i].index >= end) is always false. shmem_mapping() is not called.
>>>>>
>>>>> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
>>>>> for anon mapping, So we can detect the wrong use more easily.
>>>>>
>>>>> THP folios maybe exist in the pagecache even the file system doesn't
>>>>> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
>>>>> is enabled, khugepaged will try to collapse read-only file-backed pages
>>>>> to THP. But the mapping does not actually support multi order
>>>>> large folios properly.
>>>>>
>>>>> Using /sys/kernel/debug/split_huge_pages to verify this, with this
>>>>> patch, large anon THP is successfully split and the warning is ceased.
>>>>>
>>>>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>> ---
>>>>>  include/linux/pagemap.h |  4 ++++
>>>>>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
>>>>>  2 files changed, 20 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>>> index ee633712bba0..59f1df0cde5a 100644
>>>>> --- a/include/linux/pagemap.h
>>>>> +++ b/include/linux/pagemap.h
>>>>> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
>>>>>   */
>>>>>  static inline bool mapping_large_folio_support(struct address_space *mapping)
>>>>>  {
>>>>> +     /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
>>>>> +     VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
>>>>> +                     "Anonymous mapping always supports large folio");
>>>>> +
>>>>>       return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>>>>>               test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
>>>>>  }
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 317de2afd371..62d57270b08e 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>>>>       if (new_order >= folio_order(folio))
>>>>>               return -EINVAL;
>>>>>
>>>>> -     /* Cannot split anonymous THP to order-1 */
>>>>> -     if (new_order == 1 && folio_test_anon(folio)) {
>>>>> -             VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>>>>> -             return -EINVAL;
>>>>> -     }
>>>>> -
>>>>> -     if (new_order) {
>>>>> -             /* Only swapping a whole PMD-mapped folio is supported */
>>>>> -             if (folio_test_swapcache(folio))
>>>>> +     if (folio_test_anon(folio)) {
>>>>> +             /* Cannot split anonymous THP to order-1 */
>>>>> +             if (new_order == 1) {
>>>>> +                     VM_WARN_ONCE(1, "Cannot split to order-1 folio");
>>>>>                       return -EINVAL;
>>>>> +             }
>>>>> +     } else if (new_order) {
>>>>>               /* Split shmem folio to non-zero order not supported */
>>>>>               if (shmem_mapping(folio->mapping)) {
>>>>>                       VM_WARN_ONCE(1,
>>>>>                               "Cannot split shmem folio to non-0 order");
>>>>>                       return -EINVAL;
>>>>>               }
>>>>> -             /* No split if the file system does not support large folio */
>>>>> -             if (!mapping_large_folio_support(folio->mapping)) {
>>>>> +             /* No split if the file system does not support large folio.
>>>>> +              * Note that we might still have THPs in such mappings due to
>>>>> +              * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
>>>>> +              * does not actually support large folios properly.
>>>>> +              */
>>>>> +             if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>>>> +                     !mapping_large_folio_support(folio->mapping)) {
>>>>
>>>> Shouldn’t this be
>>>>
>>>> if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>>>         !mapping_large_folio_support(folio->mapping)) {
>>>>
>>>> ?
>>>>
>>>> When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check
>>>> mapping_large_folio_support(), otherwise we do not.
>>>
>>> while CONFIG_READ_ONLY_THP_FOR_FS is not set, that is no way
>>> a large folio can be mapped to a filesystem which doesn't support
>>> large folio mapping. i think
>>
>> That is why we have the warning below to catch this undesired
>> case.
>>
>>> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) is correct.
>>
>> When it is set, khugepaged can create a large pagecache folio
>> on a filesystem without large folio support and the warning
>> will be triggered once the created large pagecache folio
>> is split. That is not what we want.
>
> yes. This is exactly why we need if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> but not if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) .
>
> because if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)), folio is definitely
> pointing to a file system supporting large folio. otherwise, it is a bug.

Oh, got it. Thanks for the explanation. :)

>>
>>>
>>> The below means a BUG which has never a chance to happen if it
>>> is true.
>>>
>>> !IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
>>>         !mapping_large_folio_support(folio->mapping));
>>>
>>>>
>>>>>                       VM_WARN_ONCE(1,
>>>>>                               "Cannot split file folio to non-0 order");
>>>>>                       return -EINVAL;
>>>>>               }
>>>>>       }
>>>>>
>>>>> +     /* Only swapping a whole PMD-mapped folio is supported */
>>>>> +     if (folio_test_swapcache(folio) && new_order)
>>>>> +             return -EINVAL;
>>>>>
>>>>>       is_hzp = is_huge_zero_folio(folio);
>>>>>       if (is_hzp) {
>>>>> --
>>>>> 2.15.2
>>>>
>>>>
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>> Thanks
>>> Barry
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06  9:42 [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios xu.xin16
  2024-06-06  9:49 ` Barry Song
  2024-06-06 14:28 ` Zi Yan
@ 2024-06-06 21:44 ` Zi Yan
  2024-06-07  7:56 ` David Hildenbrand
  3 siblings, 0 replies; 11+ messages in thread
From: Zi Yan @ 2024-06-06 21:44 UTC (permalink / raw)
  To: xu.xin16
  Cc: david, v-songbaohua, akpm, linux-kernel, linux-mm, mhocko,
	yang.yang29, ran.xiaokai

[-- Attachment #1: Type: text/plain, Size: 1708 bytes --]

On 6 Jun 2024, at 2:42, xu.xin16@zte.com.cn wrote:

> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> When I did a large folios split test, a WARNING
> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> was triggered. But the test cases are only for anonmous folios.
> while mapping_large_folio_support() is only reasonable for page
> cache folios.
>
> In split_huge_page_to_list_to_order(), the folio passed to
> mapping_large_folio_support() maybe anonmous folio. The
> folio_test_anon() check is missing. So the split of the anonmous THP
> is failed. This is also the same for shmem_mapping(). We'd better add
> a check for both. But the shmem_mapping() in __split_huge_page() is
> not involved, as for anonmous folios, the end parameter is set to -1, so
> (head[i].index >= end) is always false. shmem_mapping() is not called.
>
> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> for anon mapping, So we can detect the wrong use more easily.
>
> THP folios maybe exist in the pagecache even the file system doesn't
> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> is enabled, khugepaged will try to collapse read-only file-backed pages
> to THP. But the mapping does not actually support multi order
> large folios properly.
>
> Using /sys/kernel/debug/split_huge_pages to verify this, with this
> patch, large anon THP is successfully split and the warning is ceased.
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>  include/linux/pagemap.h |  4 ++++
>  mm/huge_memory.c        | 27 ++++++++++++++++-----------
>  2 files changed, 20 insertions(+), 11 deletions(-)

Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks.

Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06  9:49 ` Barry Song
@ 2024-06-07  1:36   ` ran xiaokai
  2024-06-07  2:00     ` Barry Song
  0 siblings, 1 reply; 11+ messages in thread
From: ran xiaokai @ 2024-06-07  1:36 UTC (permalink / raw)
  To: 21cnbao
  Cc: akpm, david, linux-kernel, linux-mm, mhocko, ran.xiaokai,
	v-songbaohua, si.hao, xu.xin16, yang.yang29, ziy

> > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> > When I did a large folios split test, a WARNING
> > "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> > was triggered. But the test cases are only for anonmous folios.
> > while mapping_large_folio_support() is only reasonable for page
> > cache folios.
> >
> > In split_huge_page_to_list_to_order(), the folio passed to
> > mapping_large_folio_support() maybe anonmous folio. The
> > folio_test_anon() check is missing. So the split of the anonmous THP
> > is failed. This is also the same for shmem_mapping(). We'd better add
> > a check for both. But the shmem_mapping() in __split_huge_page() is
> > not involved, as for anonmous folios, the end parameter is set to -1, so
> > (head[i].index >= end) is always false. shmem_mapping() is not called.
> >
> > Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> > for anon mapping, So we can detect the wrong use more easily.
> >
> > THP folios maybe exist in the pagecache even the file system doesn't
> > support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> > is enabled, khugepaged will try to collapse read-only file-backed pages
> > to THP. But the mapping does not actually support multi order
> > large folios properly.
> >
> > Using /sys/kernel/debug/split_huge_pages to verify this, with this
> > patch, large anon THP is successfully split and the warning is ceased.
> >
> > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > ---
> >  include/linux/pagemap.h |  4 ++++
> >  mm/huge_memory.c        | 27 ++++++++++++++++-----------
> >  2 files changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index ee633712bba0..59f1df0cde5a 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
> >   */
> >  static inline bool mapping_large_folio_support(struct address_space *mapping)
> >  {
> > +       /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> > +       VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> > +                       "Anonymous mapping always supports large folio");
> > +
> >         return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> >                 test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
> >  }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 317de2afd371..62d57270b08e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >         if (new_order >= folio_order(folio))
> >                 return -EINVAL;
> >
> > -       /* Cannot split anonymous THP to order-1 */
> > -       if (new_order == 1 && folio_test_anon(folio)) {
> > -               VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> > -               return -EINVAL;
> > -       }
> > -
> > -       if (new_order) {
> > -               /* Only swapping a whole PMD-mapped folio is supported */
> > -               if (folio_test_swapcache(folio))
> > +       if (folio_test_anon(folio)) {
> > +               /* Cannot split anonymous THP to order-1 */
> 
> This is simply what the code is indicating. Shouldn't we phrase
> it differently to explain "why" but not "how"? for example, anon
> order-1 mTHP is not supported?

Hi, Barry,
Good comments, thanks.
Is "order-1 is not a anonymouns mTHP suitable order." better?
 
> Otherwise, it looks good to me.
> 
> Reviewed-by: Barry Song <baohua@kernel.org>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-07  1:36   ` ran xiaokai
@ 2024-06-07  2:00     ` Barry Song
  0 siblings, 0 replies; 11+ messages in thread
From: Barry Song @ 2024-06-07  2:00 UTC (permalink / raw)
  To: ran xiaokai
  Cc: akpm, david, linux-kernel, linux-mm, mhocko, ran.xiaokai,
	v-songbaohua, si.hao, xu.xin16, yang.yang29, ziy

On Fri, Jun 7, 2024 at 1:37 PM ran xiaokai <ranxiaokai627@163.com> wrote:
>
> > > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > >
> > > When I did a large folios split test, a WARNING
> > > "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> > > was triggered. But the test cases are only for anonmous folios.
> > > while mapping_large_folio_support() is only reasonable for page
> > > cache folios.
> > >
> > > In split_huge_page_to_list_to_order(), the folio passed to
> > > mapping_large_folio_support() maybe anonmous folio. The
> > > folio_test_anon() check is missing. So the split of the anonmous THP
> > > is failed. This is also the same for shmem_mapping(). We'd better add
> > > a check for both. But the shmem_mapping() in __split_huge_page() is
> > > not involved, as for anonmous folios, the end parameter is set to -1, so
> > > (head[i].index >= end) is always false. shmem_mapping() is not called.
> > >
> > > Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> > > for anon mapping, So we can detect the wrong use more easily.
> > >
> > > THP folios maybe exist in the pagecache even the file system doesn't
> > > support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> > > is enabled, khugepaged will try to collapse read-only file-backed pages
> > > to THP. But the mapping does not actually support multi order
> > > large folios properly.
> > >
> > > Using /sys/kernel/debug/split_huge_pages to verify this, with this
> > > patch, large anon THP is successfully split and the warning is ceased.
> > >
> > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > > ---
> > >  include/linux/pagemap.h |  4 ++++
> > >  mm/huge_memory.c        | 27 ++++++++++++++++-----------
> > >  2 files changed, 20 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > > index ee633712bba0..59f1df0cde5a 100644
> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
> > >   */
> > >  static inline bool mapping_large_folio_support(struct address_space *mapping)
> > >  {
> > > +       /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
> > > +       VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
> > > +                       "Anonymous mapping always supports large folio");
> > > +
> > >         return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> > >                 test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
> > >  }
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 317de2afd371..62d57270b08e 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >         if (new_order >= folio_order(folio))
> > >                 return -EINVAL;
> > >
> > > -       /* Cannot split anonymous THP to order-1 */
> > > -       if (new_order == 1 && folio_test_anon(folio)) {
> > > -               VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> > > -               return -EINVAL;
> > > -       }
> > > -
> > > -       if (new_order) {
> > > -               /* Only swapping a whole PMD-mapped folio is supported */
> > > -               if (folio_test_swapcache(folio))
> > > +       if (folio_test_anon(folio)) {
> > > +               /* Cannot split anonymous THP to order-1 */
> >
> > This is simply what the code is indicating. Shouldn't we phrase
> > it differently to explain "why" but not "how"? for example, anon
> > order-1 mTHP is not supported?
>
> Hi, Barry,
> Good comments, thanks.
> Is "order-1 is not a anonymouns mTHP suitable order." better?

could pick up some words from include/linux/huge_mm.h, particularly
those words regarding "a limitation of the THP implementation".

/*
 * Mask of all large folio orders supported for anonymous THP; all orders up to
 * and including PMD_ORDER, except order-0 (which is not "huge") and order-1
 * (which is a limitation of the THP implementation).
 */
#define THP_ORDERS_ALL_ANON     ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))

perhaps, you can even do

if (order > 0 && !(bit(order) & THP_ORDERS_ALL_ANON))
      return -EINVAL;

This is self-commented. Either way is fine.

>
> > Otherwise, it looks good to me.
> >
> > Reviewed-by: Barry Song <baohua@kernel.org>
>

Thanks
Barry


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
  2024-06-06  9:42 [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios xu.xin16
                   ` (2 preceding siblings ...)
  2024-06-06 21:44 ` Zi Yan
@ 2024-06-07  7:56 ` David Hildenbrand
  3 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-06-07  7:56 UTC (permalink / raw)
  To: xu.xin16, ziy, v-songbaohua, akpm
  Cc: linux-kernel, linux-mm, mhocko, yang.yang29, ran.xiaokai

On 06.06.24 11:42, xu.xin16@zte.com.cn wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> 
> When I did a large folios split test, a WARNING
> "[ 5059.122759][  T166] Cannot split file folio to non-0 order"
> was triggered. But the test cases are only for anonmous folios.
> while mapping_large_folio_support() is only reasonable for page
> cache folios.
> 
> In split_huge_page_to_list_to_order(), the folio passed to
> mapping_large_folio_support() maybe anonmous folio. The
> folio_test_anon() check is missing. So the split of the anonmous THP
> is failed. This is also the same for shmem_mapping(). We'd better add
> a check for both. But the shmem_mapping() in __split_huge_page() is
> not involved, as for anonmous folios, the end parameter is set to -1, so
> (head[i].index >= end) is always false. shmem_mapping() is not called.
> 
> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support()
> for anon mapping, So we can detect the wrong use more easily.
> 
> THP folios maybe exist in the pagecache even the file system doesn't
> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE
> is enabled, khugepaged will try to collapse read-only file-backed pages
> to THP. But the mapping does not actually support multi order
> large folios properly.
> 
> Using /sys/kernel/debug/split_huge_pages to verify this, with this
> patch, large anon THP is successfully split and the warning is ceased.
> 

Smaller nits:

> +	} else if (new_order) {
>   		/* Split shmem folio to non-zero order not supported */
>   		if (shmem_mapping(folio->mapping)) {
>   			VM_WARN_ONCE(1,
>   				"Cannot split shmem folio to non-0 order");
>   			return -EINVAL;
>   		}
> -		/* No split if the file system does not support large folio */
> -		if (!mapping_large_folio_support(folio->mapping)) {
> +		/* No split if the file system does not support large folio.

/*
  * No ...


> +		 * Note that we might still have THPs in such mappings due to
> +		 * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
> +		 * does not actually support large folios properly.
> +		 */
> +		if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
> +			!mapping_large_folio_support(folio->mapping)) {

if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
     !mapping_large_folio_support(folio->mapping)) {

>   			VM_WARN_ONCE(1,
>   				"Cannot split file folio to non-0 order");
>   			return -EINVAL;
>   		}
>   	}
> 
> +	/* Only swapping a whole PMD-mapped folio is supported */
> +	if (folio_test_swapcache(folio) && new_order)
> +		return -EINVAL;
> 
>   	is_hzp = is_huge_zero_folio(folio);
>   	if (is_hzp) {

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-06-07  7:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-06  9:42 [PATCH linux-next v2] mm: huge_memory: fix misused mapping_large_folio_support() for anon folios xu.xin16
2024-06-06  9:49 ` Barry Song
2024-06-07  1:36   ` ran xiaokai
2024-06-07  2:00     ` Barry Song
2024-06-06 14:28 ` Zi Yan
2024-06-06 21:00   ` Barry Song
2024-06-06 21:24     ` Zi Yan
2024-06-06 21:33       ` Barry Song
2024-06-06 21:42         ` Zi Yan
2024-06-06 21:44 ` Zi Yan
2024-06-07  7:56 ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox