From: Harry Yoo <harry.yoo@oracle.com>
To: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: vbabka@suse.cz, surenb@google.com, hao.li@linux.dev,
leitao@debian.org, Liam.Howlett@oracle.com, zhao1.liu@intel.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-btrfs@vger.kernel.org, regressions@lists.linux.dev
Subject: Re: [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
Date: Mon, 23 Feb 2026 20:59:30 +0900
Message-ID: <aZxBIpE8R8DxO4eJ@hyeyoo>
In-Reply-To: <aZw2LyOjxMc-c3dl@debian.local>

On Mon, Feb 23, 2026 at 11:12:47AM +0000, Chris Bainbridge wrote:
> On Mon, Feb 23, 2026 at 05:41:17PM +0900, Harry Yoo wrote:
> > On Sun, Feb 22, 2026 at 09:36:58PM +0000, Chris Bainbridge wrote:
> > > Hi,
> > >
> > > The latest mainline kernel (v6.19-11831-ga95f71ad3e2e) has page
> > > allocation failures when doing things like compiling a kernel. I can
> > > also reproduce this with a stress test like
> > > `stress-ng --vm 2 --vm-bytes 110% --verify -v`
> >
> > Hi, thanks for the report!
> >
> > > [ 104.032925] kswapd0: page allocation failure: order:0, mode:0xc0c40(GFP_NOFS|__GFP_COMP|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
> > > [ 104.033307] CPU: 4 UID: 0 PID: 156 Comm: kswapd0 Not tainted 6.19.0-rc5-00027-g40fd0acc45d0 #435 PREEMPT(voluntary)
> > > [ 104.033312] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
> > > [ 104.033314] Call Trace:
> > > [ 104.033316] <TASK>
> > > [ 104.033319] dump_stack_lvl+0x6a/0x90
> > > [ 104.033328] warn_alloc.cold+0x95/0x1af
> > > [ 104.033334] ? zone_watermark_ok+0x80/0x80
> > > [ 104.033350] __alloc_frozen_pages_noprof+0xec3/0x2470
> > > [ 104.033353] ? __lock_acquire+0x489/0x2600
> > > [ 104.033359] ? stack_access_ok+0x1c0/0x1c0
> > > [ 104.033367] ? warn_alloc+0x1d0/0x1d0
> > > [ 104.033371] ? __lock_acquire+0x489/0x2600
> > > [ 104.033375] ? _raw_spin_unlock_irqrestore+0x48/0x60
> > > [ 104.033379] ? _raw_spin_unlock_irqrestore+0x48/0x60
> > > [ 104.033382] ? lockdep_hardirqs_on+0x78/0x100
> > > [ 104.033394] allocate_slab+0x2b7/0x510
> > > [ 104.033399] refill_objects+0x25d/0x380
> > > [ 104.033407] __pcs_replace_empty_main+0x193/0x5f0
> > > [ 104.033412] kmem_cache_alloc_noprof+0x5b6/0x6f0
> > > [ 104.033415] ? alloc_extent_state+0x1b/0x210 [btrfs]
> > > [ 104.033479] alloc_extent_state+0x1b/0x210 [btrfs]
> > > [ 104.033527] btrfs_clear_extent_bit_changeset+0x2be/0x9c0 [btrfs]
> >
> > Hmm, while the bisect points to commit e47c897a2949 ("slab: add
> > sheaves to most caches") as the first bad commit,
> >
> > I think the caller is supposed to specify __GFP_NOWARN if it doesn't
> > care about allocation failures?
> >
> > btrfs_clear_extent_bit_changeset() says:
> > > if (!prealloc) {
> > > /*
> > > * Don't care for allocation failure here because we might end
> > > * up not needing the pre-allocated extent state at all, which
> > > * is the case if we only have in the tree extent states that
> > > cover our input range and don't cover any other range.
> > > * If we end up needing a new extent state we allocate it later.
> > > */
> > > prealloc = alloc_extent_state(mask);
> > > }
> >
> > Oh wait, I see what's going on. The bisection pointed to that commit
> > because slab tries to refill sheaves with __GFP_NOMEMALLOC (and then
> > falls back to the slowpath if that fails).
> >
> > Since failing to refill sheaves doesn't mean the allocation will fail,
> > slab should specify __GFP_NOWARN along with __GFP_NOMEMALLOC as long
> > as there's a fallback method.
> >
> > But for __prefill_sheaf_pfmemalloc(), it should specify __GFP_NOWARN on
> > the first attempt only when gfp_pfmemalloc_allowed() returns true.
>
> Is this fix sufficient to do the right thing? I tested it, and it does
> appear to prevent logging of the allocation failures for my test case.

I think we should do both: 1) setting __GFP_NOWARN from the btrfs side,
and 2) making slab try to refill sheaves with __GFP_NOWARN when
there's a fallback path.

I'm writing a fix for 2) and I'll send it soon.
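
Roughly, the direction for 2) is the sketch below (untested and
illustrative only: the helper names are made up, the real change will
be against the refill paths like refill_objects(), and only
gfp_pfmemalloc_allowed() is an existing kernel function):

static gfp_t sheaf_refill_gfp(gfp_t gfp)
{
	/*
	 * Failing to refill a sheaf is not fatal: we just fall back
	 * to the regular slab slowpath. So don't dip into memory
	 * reserves for the refill, and don't warn when it fails.
	 */
	return gfp | __GFP_NOMEMALLOC | __GFP_NOWARN;
}

static gfp_t prefill_first_attempt_gfp(gfp_t gfp)
{
	/*
	 * For __prefill_sheaf_pfmemalloc(), the first attempt has a
	 * fallback (the second, reserve-using attempt) only when
	 * gfp_pfmemalloc_allowed() returns true, so keep the failure
	 * silent only in that case.
	 */
	if (gfp_pfmemalloc_allowed(gfp))
		return gfp | __GFP_NOMEMALLOC | __GFP_NOWARN;
	return gfp;
}
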
> diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> index d0dd50f7d279..d2e1083848e8 100644
> --- a/fs/btrfs/extent-io-tree.c
> +++ b/fs/btrfs/extent-io-tree.c
> @@ -641,7 +641,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
> * cover our input range and don't cover too any other range.
> * If we end up needing a new extent state we allocate it later.
> */
> - prealloc = alloc_extent_state(mask);
> + prealloc = alloc_extent_state(mask | __GFP_NOWARN);

This seems like the right thing to do to me, but as I'm not familiar
with btrfs, I'll let the btrfs folks comment on it :)

> }
>
> spin_lock(&tree->lock);
--
Cheers,
Harry / Hyeonggon