* [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL
@ 2025-11-05  7:41 libaokun
  2025-11-05  8:05 ` Michal Hocko
  2025-11-05  8:09 ` Vlastimil Babka
  0 siblings, 2 replies; 4+ messages in thread
From: libaokun @ 2025-11-05  7:41 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy, willy,
	shakeel.butt, jack, yi.zhang, yangerkun, libaokun1, libaokun

From: Baokun Li <libaokun1@huawei.com>

Filesystems use __GFP_NOFAIL to allocate block-sized folios for metadata
reads at critical points, since they cannot afford to go read-only,
shut down, or enter an inconsistent state due to memory pressure.

Currently, attempting an allocation larger than order-1 with the
__GFP_NOFAIL flag triggers a WARN_ON_ONCE() in __alloc_pages_slowpath().
However, filesystems supporting large block sizes (blocksize > PAGE_SIZE)
can easily require allocations larger than order-1.

As Matthew Wilcox noted, if we have a filesystem with 64KiB sectors, there
will be many clean folios in the page cache that are 64KiB or larger.

With gfp flags and order already included in the OOM report, both
Vlastimil Babka and Michal Hocko suggested that we can take the risk of
removing this warning first and then observe whether a large number of
related OOM reports appear.

If that happens, we can consider adding special handling in other places.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/all/aQPX1-XWQjKaMTZB@casper.infradead.org
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/all/188a95ba-6384-4319-bb74-c0d9ec6c4079@suse.cz
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/all/aQotQBjnDDeL_wHx@tiehlicka
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---

RFC: https://lore.kernel.org/all/20251031061350.2052509-1-libaokun@huaweicloud.com

 mm/page_alloc.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fb91c566327c..e4efda1158b2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4683,11 +4683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	int reserve_flags;

 	if (unlikely(nofail)) {
-		/*
-		 * We most definitely don't want callers attempting to
-		 * allocate greater than order-1 page units with __GFP_NOFAIL.
-		 */
-		WARN_ON_ONCE(order > 1);
 		/*
 		 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
 		 * otherwise, we may result in lockup.
-- 
2.46.1
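
To make the order arithmetic in the commit message concrete, here is a
minimal user-space sketch, not kernel code. The helper names block_order()
and old_nofail_check_warns() are hypothetical and exist only for this
illustration; the second one simply mirrors the condition of the removed
WARN_ON_ONCE(order > 1). Both sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): return the smallest order such
 * that (page_size << order) >= block_size. Both arguments are assumed to
 * be non-zero powers of two.
 */
unsigned int block_order(size_t block_size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

	/* Double the allocation size until it covers one block. */
	while ((page_size << order) < block_size)
		order++;

	return order;
}

/*
 * Hypothetical model of the check this patch removes: before the patch,
 * a __GFP_NOFAIL request in the slowpath warned once when order > 1.
 */
bool old_nofail_check_warns(unsigned int order)
{
	return order > 1;
}
```

Under these assumptions, a 64KiB block on a system with 4KiB pages gives
block_order(65536, 4096) == 4, so the old check would have warned. An 8KiB
block gives order 1 and would not have. With 64KiB pages the same 64KiB
block is order 0.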


* Re: [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL
  2025-11-05  7:41 [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL libaokun
@ 2025-11-05  8:05 ` Michal Hocko
  2025-11-05  8:11   ` Baokun Li
  2025-11-05  8:09 ` Vlastimil Babka
  1 sibling, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2025-11-05  8:05 UTC (permalink / raw)
  To: libaokun
  Cc: linux-mm, akpm, vbabka, surenb, jackmanb, hannes, ziy, willy,
	shakeel.butt, jack, yi.zhang, yangerkun, libaokun1

On Wed 05-11-25 15:41:06, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Filesystems use __GFP_NOFAIL to allocate block-sized folios for metadata
> reads at critical points, since they cannot afford to go read-only,
> shut down, or enter an inconsistent state due to memory pressure.
> 
> Currently, attempting an allocation larger than order-1 with the
> __GFP_NOFAIL flag triggers a WARN_ON_ONCE() in __alloc_pages_slowpath().
> However, filesystems supporting large block sizes (blocksize > PAGE_SIZE)
> can easily require allocations larger than order-1.
> 
> As Matthew Wilcox noted, if we have a filesystem with 64KiB sectors, there
> will be many clean folios in the page cache that are 64KiB or larger.
> 
> With gfp flags and order already included in the OOM report, both
> Vlastimil Babka and Michal Hocko suggested that we can take the risk of
> removing this warning first and then observe whether a large number of
> related OOM reports appear.
> 
> If that happens, we can consider adding special handling in other places.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/all/aQPX1-XWQjKaMTZB@casper.infradead.org
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Link: https://lore.kernel.org/all/188a95ba-6384-4319-bb74-c0d9ec6c4079@suse.cz
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Link: https://lore.kernel.org/all/aQotQBjnDDeL_wHx@tiehlicka
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Thanks for referencing the links above, which provide useful insight
for future reference. I would just add a link to Matthew's explanation
of why kvmalloc is not an option:
Link: https://lore.kernel.org/all/aQTHMI3t5mNXp0M1@casper.infradead.org/T/#u
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> 
> RFC: https://lore.kernel.org/all/20251031061350.2052509-1-libaokun@huaweicloud.com
> 
>  mm/page_alloc.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb91c566327c..e4efda1158b2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4683,11 +4683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	int reserve_flags;
>  
>  	if (unlikely(nofail)) {
> -		/*
> -		 * We most definitely don't want callers attempting to
> -		 * allocate greater than order-1 page units with __GFP_NOFAIL.
> -		 */
> -		WARN_ON_ONCE(order > 1);
>  		/*
>  		 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
>  		 * otherwise, we may result in lockup.
> -- 
> 2.46.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL
  2025-11-05  7:41 [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL libaokun
  2025-11-05  8:05 ` Michal Hocko
@ 2025-11-05  8:09 ` Vlastimil Babka
  1 sibling, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2025-11-05  8:09 UTC (permalink / raw)
  To: libaokun, linux-mm
  Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, willy, shakeel.butt,
	jack, yi.zhang, yangerkun, libaokun1

On 11/5/25 08:41, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Filesystems use __GFP_NOFAIL to allocate block-sized folios for metadata
> reads at critical points, since they cannot afford to go read-only,
> shut down, or enter an inconsistent state due to memory pressure.
> 
> Currently, attempting an allocation larger than order-1 with the
> __GFP_NOFAIL flag triggers a WARN_ON_ONCE() in __alloc_pages_slowpath().
> However, filesystems supporting large block sizes (blocksize > PAGE_SIZE)
> can easily require allocations larger than order-1.
> 
> As Matthew Wilcox noted, if we have a filesystem with 64KiB sectors, there
> will be many clean folios in the page cache that are 64KiB or larger.
> 
> With gfp flags and order already included in the OOM report, both
> Vlastimil Babka and Michal Hocko suggested that we can take the risk of
> removing this warning first and then observe whether a large number of
> related OOM reports appear.
> 
> If that happens, we can consider adding special handling in other places.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/all/aQPX1-XWQjKaMTZB@casper.infradead.org
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Link: https://lore.kernel.org/all/188a95ba-6384-4319-bb74-c0d9ec6c4079@suse.cz
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Link: https://lore.kernel.org/all/aQotQBjnDDeL_wHx@tiehlicka
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
> 
> RFC: https://lore.kernel.org/all/20251031061350.2052509-1-libaokun@huaweicloud.com
> 
>  mm/page_alloc.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fb91c566327c..e4efda1158b2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4683,11 +4683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	int reserve_flags;
>  
>  	if (unlikely(nofail)) {
> -		/*
> -		 * We most definitely don't want callers attempting to
> -		 * allocate greater than order-1 page units with __GFP_NOFAIL.
> -		 */
> -		WARN_ON_ONCE(order > 1);
>  		/*
>  		 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
>  		 * otherwise, we may result in lockup.




* Re: [PATCH] mm/page_alloc: don't warn about large allocations with __GFP_NOFAIL
  2025-11-05  8:05 ` Michal Hocko
@ 2025-11-05  8:11   ` Baokun Li
  0 siblings, 0 replies; 4+ messages in thread
From: Baokun Li @ 2025-11-05  8:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, vbabka, surenb, jackmanb, hannes, ziy, willy,
	shakeel.butt, jack, yi.zhang, yangerkun, libaokun1

On 2025-11-05 16:05, Michal Hocko wrote:
> On Wed 05-11-25 15:41:06, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Filesystems use __GFP_NOFAIL to allocate block-sized folios for metadata
>> reads at critical points, since they cannot afford to go read-only,
>> shut down, or enter an inconsistent state due to memory pressure.
>>
>> Currently, attempting an allocation larger than order-1 with the
>> __GFP_NOFAIL flag triggers a WARN_ON_ONCE() in __alloc_pages_slowpath().
>> However, filesystems supporting large block sizes (blocksize > PAGE_SIZE)
>> can easily require allocations larger than order-1.
>>
>> As Matthew Wilcox noted, if we have a filesystem with 64KiB sectors, there
>> will be many clean folios in the page cache that are 64KiB or larger.
>>
>> With gfp flags and order already included in the OOM report, both
>> Vlastimil Babka and Michal Hocko suggested that we can take the risk of
>> removing this warning first and then observe whether a large number of
>> related OOM reports appear.
>>
>> If that happens, we can consider adding special handling in other places.
>>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> Link: https://lore.kernel.org/all/aQPX1-XWQjKaMTZB@casper.infradead.org
>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>> Link: https://lore.kernel.org/all/188a95ba-6384-4319-bb74-c0d9ec6c4079@suse.cz
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Link: https://lore.kernel.org/all/aQotQBjnDDeL_wHx@tiehlicka
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>
> Thanks for referencing the links above, which provide useful insight
> for future reference. I would just add a link to Matthew's explanation
> of why kvmalloc is not an option:
> Link: https://lore.kernel.org/all/aQTHMI3t5mNXp0M1@casper.infradead.org/T/#u
> Acked-by: Michal Hocko <mhocko@suse.com>

Okay, I will add the link in the next version.

Thanks for your review!


Cheers,
Baokun

>> ---
>>
>> RFC: https://lore.kernel.org/all/20251031061350.2052509-1-libaokun@huaweicloud.com
>>
>>  mm/page_alloc.c | 5 -----
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fb91c566327c..e4efda1158b2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4683,11 +4683,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  	int reserve_flags;
>>  
>>  	if (unlikely(nofail)) {
>> -		/*
>> -		 * We most definitely don't want callers attempting to
>> -		 * allocate greater than order-1 page units with __GFP_NOFAIL.
>> -		 */
>> -		WARN_ON_ONCE(order > 1);
>>  		/*
>>  		 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
>>  		 * otherwise, we may result in lockup.
>> -- 
>> 2.46.1

-- 
With Best Regards,
Baokun Li


