From: Vlastimil Babka <vbabka@suse.com>
To: Dave Chinner <dgc@kernel.org>, Salvatore Dipietro <dipiets@amazon.it>
Cc: linux-kernel@vger.kernel.org, alisaidi@amazon.com,
blakgeof@amazon.com, abuehaze@amazon.de,
dipietro.salvatore@gmail.com, willy@infradead.org,
stable@vger.kernel.org, Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Michal Hocko <mhocko@suse.com>,
"David Hildenbrand (Red Hat)" <david@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation
Date: Tue, 21 Apr 2026 11:02:45 +0200
Message-ID: <1f50ce04-20e6-46a0-9d8a-00a5f7a74967@suse.com>
In-Reply-To: <adLlrSZ5oRAa_Hfd@dread>

On 4/6/26 00:43, Dave Chinner wrote:
> On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
>> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
>> introduced high-order folio allocations in the buffered write
>> path. When memory is fragmented, each failed allocation triggers
>> compaction and drain_all_pages() via __alloc_pages_slowpath(),
>> causing a 0.75x throughput drop on pgbench (simple-update) with
>> 1024 clients on a 96-vCPU arm64 system.
>>
>> Strip __GFP_DIRECT_RECLAIM from folio allocations in
>> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
>> making them purely opportunistic.
>>
>> Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Salvatore Dipietro <dipiets@amazon.it>

BTW, backporting perf regression fixes to 6.6, when they are only reported
at the time 7.0 is released, might be too risky. There will likely be a
different workload that will regress as a result, no matter what we do.

>> ---
>> fs/iomap/buffered-io.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 92a831cf4bf1..cb843d54b4d9 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -715,6 +715,7 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
>>  struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
>>  {
>>  	fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
>> +	gfp_t gfp;
>>
>>  	if (iter->flags & IOMAP_NOWAIT)
>>  		fgp |= FGP_NOWAIT;
>> @@ -722,8 +723,20 @@ struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
>>  		fgp |= FGP_DONTCACHE;
>>  	fgp |= fgf_set_order(len);
>>
>> +	gfp = mapping_gfp_mask(iter->inode->i_mapping);
>> +
>> +	/*
>> +	 * If the folio order hint exceeds PAGE_ALLOC_COSTLY_ORDER,
>> +	 * strip __GFP_DIRECT_RECLAIM to make the allocation purely
>> +	 * opportunistic. This avoids compaction + drain_all_pages()
>> +	 * in __alloc_pages_slowpath() that devastate throughput
>> +	 * on large systems during buffered writes.
>> +	 */
>> +	if (FGF_GET_ORDER(fgp) > PAGE_ALLOC_COSTLY_ORDER)
>> +		gfp &= ~__GFP_DIRECT_RECLAIM;
>
> Adding these "gfp &= ~__GFP_DIRECT_RECLAIM" hacks everywhere
> we need to do high order folio allocation is getting out of hand.
>
> Compaction improves long term system performance, so we don't really
> just want to turn it off whenever we have demand for high order
> folios.
>
> What we should be doing is getting compaction out of the direct
> reclaim path - it is -clearly- way too costly for hot paths that use
> large allocations, especially those with fallbacks to smaller
> allocations or vmalloc.
>
> Instead, memory reclaim should kick background compaction and let it
> do the work. If the allocation path really, really needs high order
> allocation to succeed, then it can direct the allocation to retry
> until it succeeds and the allocator itself can wait for background
> compaction to make progress.
>
> For code that has fallbacks to smaller allocations, there is no
> need to wait for compaction - we can attempt fast smaller allocations
> and continue that way until an allocation succeeds....

So, should we do an LSF/MM session?

But I think in any case, the page allocator needs to know which allocations
do have a fallback. __GFP_NORETRY exists for this. Here it wasn't tried at
all; in v2 [1] it was, but not on its own. I'd start from __GFP_NORETRY
alone, and then we can look at tweaking what it does if it's currently
insufficient.
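
To make the "caller has a fallback" case concrete, here is a rough userspace
sketch, not kernel code. Everything prefixed MOCK_ or mock_ is invented for
illustration, including the flag values, the cost numbers and the pretend
allocator. The only point it tries to show is that every attempt above
order 0 is marked with a stand-in for __GFP_NORETRY, so a miss is cheap and
the caller simply moves on to the next smaller order.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock only: none of these values match the kernel's. */
typedef unsigned int gfp_t;
#define MOCK_GFP_KERNEL   0x1u
#define MOCK_GFP_NORETRY  0x2u	/* stand-in for __GFP_NORETRY */
#define MOCK_MAX_ORDER    10u

/* Invented cost units for a failed attempt. */
#define MOCK_LIGHT_COST   1u	/* miss with NORETRY: give up early */
#define MOCK_HEAVY_COST   8u	/* miss without it: keep trying hard */

/* Largest order the pretend allocator can satisfy right now. */
static unsigned int mock_max_cheap_order = 2;
/* Accumulated pretend effort spent on failed attempts. */
static unsigned int mock_effort;

/* Pretend allocator: returns true on success, charges effort on a miss. */
static bool mock_alloc(gfp_t gfp, unsigned int order)
{
	if (order <= mock_max_cheap_order)
		return true;

	if (gfp & MOCK_GFP_NORETRY)
		mock_effort += MOCK_LIGHT_COST;
	else
		mock_effort += MOCK_HEAVY_COST;
	return false;
}

/*
 * Try the preferred order first and walk down to order 0. Attempts above
 * order 0 are marked as fallbackable; order 0 is the last resort and is
 * left alone. Returns the order obtained, or -1 if even order 0 failed.
 */
static int alloc_with_fallback(gfp_t gfp, unsigned int preferred_order)
{
	assert(preferred_order <= MOCK_MAX_ORDER);

	for (int order = (int)preferred_order; order >= 0; order--) {
		gfp_t try_gfp = order > 0 ? (gfp | MOCK_GFP_NORETRY) : gfp;

		if (mock_alloc(try_gfp, (unsigned int)order))
			return order;
	}
	return -1;
}
```

With the mock state above, asking for order 5 ends up with order 2 after
three light misses. How much work a real __GFP_NORETRY attempt still does in
the slowpath is exactly the part that may need tweaking, and the mock does
not model it.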

We could have a helper to encapsulate this "turn this allocation into a
lightweight fallbackable one", which would add __GFP_NORETRY. It probably
already exists somewhere, but not in gfp.h. But I'm not sure we can simply
change GFP_KERNEL to start failing more for non-costly orders. We've
discussed that a lot in the past :)
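
A minimal sketch of what such a helper could look like, again as a
userspace mock rather than a proposal for gfp.h. The helper name
gfp_fallbackable() is hypothetical, and the MOCK_ flag values are
placeholders that do not match the kernel's encodings. Only
MOCK_COSTLY_ORDER mirrors a real value (PAGE_ALLOC_COSTLY_ORDER is 3). The
second function restates the approach from the quoted patch for contrast.

```c
#include <assert.h>

/* Userspace mock only: placeholder bit values, not the kernel's. */
typedef unsigned int gfp_t;
#define MOCK_GFP_DIRECT_RECLAIM  0x1u	/* stand-in for __GFP_DIRECT_RECLAIM */
#define MOCK_GFP_IO              0x2u
#define MOCK_GFP_NORETRY         0x4u	/* stand-in for __GFP_NORETRY */
#define MOCK_COSTLY_ORDER        3u	/* mirrors PAGE_ALLOC_COSTLY_ORDER */

static_assert((MOCK_GFP_DIRECT_RECLAIM & MOCK_GFP_NORETRY) == 0,
	      "mock flag bits must not overlap");

/*
 * Hypothetical helper: mark an allocation as having a fallback.
 * It only adds the NORETRY stand-in and leaves every other bit,
 * including direct reclaim, untouched.
 */
static inline gfp_t gfp_fallbackable(gfp_t gfp)
{
	return gfp | MOCK_GFP_NORETRY;
}

/*
 * For contrast, the approach in the quoted patch: drop direct reclaim
 * entirely once the order exceeds the costly threshold.
 */
static inline gfp_t gfp_strip_direct_reclaim(gfp_t gfp, unsigned int order)
{
	if (order > MOCK_COSTLY_ORDER)
		gfp &= ~MOCK_GFP_DIRECT_RECLAIM;
	return gfp;
}
```

The difference is in who decides: with the helper, the caller only states
that it has a fallback and the allocator remains free to choose how much
effort that deserves, whereas stripping the reclaim bit hardcodes the policy
at each call site.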

[1] https://lore.kernel.org/all/20260420161404.642-1-dipiets@amazon.it/

> -Dave.