[REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")

* [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
@ 2026-02-22 21:36 Chris Bainbridge
  2026-02-23  8:41 ` Harry Yoo
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Bainbridge @ 2026-02-22 21:36 UTC
  To: vbabka
  Cc: surenb, harry.yoo, hao.li, leitao, Liam.Howlett, zhao1.liu,
	linux-kernel, linux-mm, linux-btrfs, regressions

Hi,

The latest mainline kernel (v6.19-11831-ga95f71ad3e2e) has page
allocation failures when doing things like compiling a kernel. I can
also reproduce this with a stress test like
`stress-ng --vm 2 --vm-bytes 110% --verify -v`


[  104.032925] kswapd0: page allocation failure: order:0, mode:0xc0c40(GFP_NOFS|__GFP_COMP|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
[  104.033307] CPU: 4 UID: 0 PID: 156 Comm: kswapd0 Not tainted 6.19.0-rc5-00027-g40fd0acc45d0 #435 PREEMPT(voluntary) 
[  104.033312] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
[  104.033314] Call Trace:
[  104.033316]  <TASK>
[  104.033319]  dump_stack_lvl+0x6a/0x90
[  104.033328]  warn_alloc.cold+0x95/0x1af
[  104.033334]  ? zone_watermark_ok+0x80/0x80
[  104.033350]  __alloc_frozen_pages_noprof+0xec3/0x2470
[  104.033353]  ? __lock_acquire+0x489/0x2600
[  104.033359]  ? stack_access_ok+0x1c0/0x1c0
[  104.033367]  ? warn_alloc+0x1d0/0x1d0
[  104.033371]  ? __lock_acquire+0x489/0x2600
[  104.033375]  ? _raw_spin_unlock_irqrestore+0x48/0x60
[  104.033379]  ? _raw_spin_unlock_irqrestore+0x48/0x60
[  104.033382]  ? lockdep_hardirqs_on+0x78/0x100
[  104.033394]  allocate_slab+0x2b7/0x510
[  104.033399]  refill_objects+0x25d/0x380
[  104.033407]  __pcs_replace_empty_main+0x193/0x5f0
[  104.033412]  kmem_cache_alloc_noprof+0x5b6/0x6f0
[  104.033415]  ? alloc_extent_state+0x1b/0x210 [btrfs]
[  104.033479]  alloc_extent_state+0x1b/0x210 [btrfs]
[  104.033527]  btrfs_clear_extent_bit_changeset+0x2be/0x9c0 [btrfs]
[  104.033575]  btrfs_clear_record_extent_bits+0x10/0x20 [btrfs]
[  104.033615]  btrfs_qgroup_check_reserved_leak+0xbd/0x2b0 [btrfs]
[  104.033659]  ? lock_release+0x17b/0x2a0
[  104.033663]  ? btrfs_qgroup_convert_reserved_meta+0xe90/0xe90 [btrfs]
[  104.033703]  ? do_raw_spin_unlock+0x54/0x1e0
[  104.033707]  ? _raw_spin_unlock+0x29/0x40
[  104.033710]  ? btrfs_lookup_first_ordered_extent+0x1d4/0x370 [btrfs]
[  104.033762]  ? preempt_count_add+0x73/0x140
[  104.033768]  btrfs_destroy_inode+0x301/0x6a0 [btrfs]
[  104.033820]  ? __destroy_inode+0x194/0x570
[  104.033826]  destroy_inode+0xb9/0x190
[  104.033830]  evict+0x4d8/0x900
[  104.033832]  ? lock_release+0x17b/0x2a0
[  104.033835]  ? find_held_lock+0x2b/0x80
[  104.033839]  ? destroy_inode+0x190/0x190
[  104.033842]  ? __list_lru_walk_one+0x30d/0x440
[  104.033849]  ? _raw_spin_unlock+0x29/0x40
[  104.033851]  ? __list_lru_walk_one+0x30d/0x440
[  104.033854]  ? __wait_on_freeing_inode+0x2a0/0x2a0
[  104.033860]  dispose_list+0xf0/0x1b0
[  104.033866]  prune_icache_sb+0xde/0x150
[  104.033869]  ? list_lru_count_one+0x13f/0x270
[  104.033873]  ? dump_mapping+0x250/0x250
[  104.033875]  ? lock_release+0x17b/0x2a0
[  104.033882]  super_cache_scan+0x302/0x4d0
[  104.033889]  do_shrink_slab+0x32e/0xd30
[  104.033898]  shrink_slab+0x7b6/0xda0
[  104.033902]  ? shrink_slab+0x4b1/0xda0
[  104.033908]  ? reparent_shrinker_deferred+0x330/0x330
[  104.033914]  ? trace_event_raw_event_sched_switch+0x410/0x410
[  104.033921]  shrink_node+0xac4/0x36e0
[  104.033933]  ? lru_gen_release_memcg+0x3c0/0x3c0
[  104.033940]  ? pgdat_balanced+0x15f/0x4b0
[  104.033943]  ? __cond_resched+0x23/0x30
[  104.033950]  ? balance_pgdat+0x739/0x1530
[  104.033952]  balance_pgdat+0x739/0x1530
[  104.033960]  ? shrink_node+0x36e0/0x36e0
[  104.033962]  ? __timer_delete_sync+0x177/0x240
[  104.033966]  ? __timer_delete_sync+0x177/0x240
[  104.033970]  ? _raw_spin_unlock_irqrestore+0x48/0x60
[  104.033975]  ? __lock_acquire+0x489/0x2600
[  104.033979]  ? call_timer_fn+0x3b0/0x3b0
[  104.033981]  ? schedule+0x2ba/0x390
[  104.033990]  ? lock_is_held_type+0xd5/0x130
[  104.033997]  ? kswapd+0x364/0x7f0
[  104.034004]  kswapd+0x445/0x7f0
[  104.034010]  ? balance_pgdat+0x1530/0x1530
[  104.034013]  ? _raw_spin_unlock_irqrestore+0x48/0x60
[  104.034016]  ? finish_wait+0x280/0x280
[  104.034022]  ? __kthread_parkme+0xb4/0x200
[  104.034027]  ? balance_pgdat+0x1530/0x1530
[  104.034029]  kthread+0x3ad/0x760
[  104.034033]  ? kthread_is_per_cpu+0xb0/0xb0
[  104.034035]  ? ret_from_fork+0x70/0x850
[  104.034039]  ? ret_from_fork+0x70/0x850
[  104.034042]  ? _raw_spin_unlock_irq+0x24/0x50
[  104.034045]  ? kthread_is_per_cpu+0xb0/0xb0
[  104.034049]  ret_from_fork+0x6dc/0x850
[  104.034053]  ? exit_thread+0x70/0x70
[  104.034057]  ? __switch_to+0x36f/0xd80
[  104.034061]  ? kthread_is_per_cpu+0xb0/0xb0
[  104.034065]  ret_from_fork_asm+0x11/0x20
[  104.034077]  </TASK>
[  104.034078] Mem-Info:
[  104.034111] active_anon:511 inactive_anon:2355672 isolated_anon:0
                active_file:77595 inactive_file:204731 isolated_file:0
                unevictable:7150 dirty:925 writeback:57
                slab_reclaimable:20227 slab_unreclaimable:201840
                mapped:121227 shmem:10197 pagetables:9634
                sec_pagetables:733 bounce:0
                kernel_misc_reclaimable:0
                free:36223 free_pcp:529 free_cma:0
[  104.034119] Node 0 active_anon:2044kB inactive_anon:9422688kB active_file:310380kB inactive_file:818924kB unevictable:28600kB isolated(anon):0kB isolated(file):0kB mapped:484908kB dirty:3700kB writeback:228kB shmem:40788kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:8534016kB kernel_stack:31616kB pagetables:38536kB sec_pagetables:2932kB all_unreclaimable? no Balloon:0kB
[  104.034126] Node 0 DMA free:13316kB boost:0kB min:84kB low:104kB high:124kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:15996kB managed:15364kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  104.034135] lowmem_reserve[]: 0 2862 11990 11990 11990
[  104.034147] Node 0 DMA32 free:52184kB boost:0kB min:15860kB low:19824kB high:23788kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:2871780kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB zspages:0kB present:2997084kB managed:2931416kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  104.034155] lowmem_reserve[]: 0 0 9127 9127 9127
[  104.034166] Node 0 Normal free:79392kB boost:28672kB min:80308kB low:93216kB high:106124kB reserved_highatomic:2048KB free_highatomic:32KB active_anon:2044kB inactive_anon:6550896kB active_file:310380kB inactive_file:818924kB unevictable:28600kB writepending:4252kB zspages:0kB present:13077504kB managed:9346788kB mlocked:28600kB bounce:0kB free_pcp:2116kB local_pcp:0kB free_cma:0kB
[  104.034174] lowmem_reserve[]: 0 0 0 0 0
[  104.034185] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13316kB
[  104.034308] Node 0 DMA32: 3*4kB (U) 5*8kB (UM) 5*16kB (UM) 8*32kB (UM) 11*64kB (UM) 15*128kB (UM) 11*256kB (UM) 9*512kB (UM) 9*1024kB (UM) 0*2048kB 8*4096kB (UM) = 52420kB
[  104.034348] Node 0 Normal: 1024*4kB (UMEH) 534*8kB (UEH) 409*16kB (UME) 1301*32kB (UME) 154*64kB (UME) 39*128kB (UME) 14*256kB (UM) 1*512kB (U) 2*1024kB (UM) 1*2048kB (U) 0*4096kB = 79584kB
[  104.034390] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[  104.034393] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  104.034396] 299766 total pagecache pages
[  104.034398] 0 pages in swap cache
[  104.034401] Free swap  = 8387580kB
[  104.034403] Total swap = 8387580kB
[  104.034405] 4022646 pages RAM
[  104.034407] 0 pages HighMem/MovableOnly
[  104.034410] 949254 pages reserved
[  104.034412] 0 pages hwpoisoned
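
As a rough cross-check of the Mem-Info numbers above, the per-zone "free", "min" and lowmem_reserve[] values can be compared by hand. The following is only a simplified sketch of an order-0 min-watermark test, not the kernel's actual code: the struct, the function name, the use of free_highatomic as the unusable part, and the choice of the lowmem_reserve[] column for a Normal-zone request are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_KB 4L /* 4kB pages, as in the buddy lists of the dump */

/* Per-zone values copied from the Mem-Info dump (hypothetical struct). */
struct zone_report {
	const char *name;
	long free_kb;              /* "free:" */
	long min_kb;               /* "min:" watermark */
	long free_highatomic_kb;   /* "free_highatomic:", assumed unusable here */
	long lowmem_reserve_pages; /* lowmem_reserve[] entry assumed for a Normal request */
};

static const struct zone_report zones[] = {
	{ "DMA",    13316,    84,  0, 11990 },
	{ "DMA32",  52184, 15860,  0,  9127 },
	{ "Normal", 79392, 80308, 32,     0 },
};

/* Simplified order-0 check: usable free memory must exceed min + reserve. */
static bool zone_passes_min_watermark(const struct zone_report *z)
{
	long usable_kb = z->free_kb - z->free_highatomic_kb;
	long needed_kb = z->min_kb + z->lowmem_reserve_pages * PAGE_KB;

	return usable_kb > needed_kb;
}
```

With these inputs none of the three zones passes (for example DMA32 has 52184kB free against 15860kB + 9127 * 4kB = 52368kB), which would be consistent with an order-0 request failing once access to reserves is ruled out by __GFP_NOMEMALLOC.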


The page allocation failures bisect to:

e47c897a29491ade20b27612fdd3107c39a07357 slab: add sheaves to most caches

#regzbot introduced: e47c897a29491ade20b27612fdd3107c39a07357
#regzbot title: kswapd0: page allocation failure


* Re: [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
  2026-02-22 21:36 [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches") Chris Bainbridge
@ 2026-02-23  8:41 ` Harry Yoo
  2026-02-23 11:12   ` Chris Bainbridge
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo @ 2026-02-23  8:41 UTC
  To: Chris Bainbridge
  Cc: vbabka, surenb, hao.li, leitao, Liam.Howlett, zhao1.liu,
	linux-kernel, linux-mm, linux-btrfs, regressions

On Sun, Feb 22, 2026 at 09:36:58PM +0000, Chris Bainbridge wrote:
> Hi,
> 
> The latest mainline kernel (v6.19-11831-ga95f71ad3e2e) has page
> allocation failures when doing things like compiling a kernel. I can
> also reproduce this with a stress test like
> `stress-ng --vm 2 --vm-bytes 110% --verify -v`

Hi, thanks for the report!

> [  104.032925] kswapd0: page allocation failure: order:0, mode:0xc0c40(GFP_NOFS|__GFP_COMP|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
> [  104.033307] CPU: 4 UID: 0 PID: 156 Comm: kswapd0 Not tainted 6.19.0-rc5-00027-g40fd0acc45d0 #435 PREEMPT(voluntary) 
> [  104.033312] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
> [  104.033314] Call Trace:
> [  104.033316]  <TASK>
> [  104.033319]  dump_stack_lvl+0x6a/0x90
> [  104.033328]  warn_alloc.cold+0x95/0x1af
> [  104.033334]  ? zone_watermark_ok+0x80/0x80
> [  104.033350]  __alloc_frozen_pages_noprof+0xec3/0x2470
> [  104.033353]  ? __lock_acquire+0x489/0x2600
> [  104.033359]  ? stack_access_ok+0x1c0/0x1c0
> [  104.033367]  ? warn_alloc+0x1d0/0x1d0
> [  104.033371]  ? __lock_acquire+0x489/0x2600
> [  104.033375]  ? _raw_spin_unlock_irqrestore+0x48/0x60
> [  104.033379]  ? _raw_spin_unlock_irqrestore+0x48/0x60
> [  104.033382]  ? lockdep_hardirqs_on+0x78/0x100
> [  104.033394]  allocate_slab+0x2b7/0x510
> [  104.033399]  refill_objects+0x25d/0x380
> [  104.033407]  __pcs_replace_empty_main+0x193/0x5f0
> [  104.033412]  kmem_cache_alloc_noprof+0x5b6/0x6f0
> [  104.033415]  ? alloc_extent_state+0x1b/0x210 [btrfs]
> [  104.033479]  alloc_extent_state+0x1b/0x210 [btrfs]
> [  104.033527]  btrfs_clear_extent_bit_changeset+0x2be/0x9c0 [btrfs]

Hmm, while the bisect points to
commit e47c897a2949 ("slab: add sheaves to most caches") as the first
bad commit,

I think the caller is supposed to specify __GFP_NOWARN if it doesn't
care about allocation failure?

btrfs_clear_extent_bit_changeset() says:
>         if (!prealloc) {
>                 /*
>                  * Don't care for allocation failure here because we might end
>                  * up not needing the pre-allocated extent state at all, which
>                  * is the case if we only have in the tree extent states that 
>                  * cover our input range and don't cover too any other range.
>                  * If we end up needing a new extent state we allocate it later.
>                  */
>                 prealloc = alloc_extent_state(mask);
>         }

Oh wait, I see what's going on. The bisection pointed to that commit
because slab tries to refill sheaves with __GFP_NOMEMALLOC (and then
falls back to the slowpath if that fails).

Since failing to refill sheaves doesn't mean the allocation will fail,
slab should specify __GFP_NOWARN along with __GFP_NOMEMALLOC as long as
there's a fallback method.

But for __prefill_sheaf_pfmemalloc(), it should specify __GFP_NOWARN on
the first attempt only when gfp_pfmemalloc_allowed() returns true.
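
To illustrate the idea, here is a minimal userspace model, not the actual slab code. All demo_* names and flag values are made up for illustration: the opportunistic refill is tried with "no reserves" and, when a fallback exists, with "no warning" as well, so only a failure of the final attempt is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp flags; the values are not the kernel's. */
#define DEMO_GFP_NOWARN     0x1u
#define DEMO_GFP_NOMEMALLOC 0x2u

static int demo_warnings;           /* number of failure reports emitted */
static bool demo_normal_pages_left; /* allocation possible without reserves? */
static bool demo_reserves_left;     /* allocation possible from reserves? */

/* Model page allocator: on failure, report unless NOWARN is set. */
static void *demo_alloc_pages(unsigned int gfp)
{
	static char page[4096];

	if (demo_normal_pages_left)
		return page;
	if (!(gfp & DEMO_GFP_NOMEMALLOC) && demo_reserves_left)
		return page;
	if (!(gfp & DEMO_GFP_NOWARN))
		demo_warnings++;
	return NULL;
}

/*
 * Model cache allocation: try a reserve-free refill first, then fall
 * back to the caller's own flags. quiet_refill says whether the refill
 * attempt suppresses its warning because a fallback exists.
 */
static void *demo_cache_alloc(unsigned int gfp, bool quiet_refill)
{
	unsigned int refill_gfp = gfp | DEMO_GFP_NOMEMALLOC;
	void *obj;

	if (quiet_refill)
		refill_gfp |= DEMO_GFP_NOWARN;

	obj = demo_alloc_pages(refill_gfp);
	if (obj)
		return obj;

	return demo_alloc_pages(gfp); /* fallback, warns per caller flags */
}
```

In this model, with only reserves left, demo_cache_alloc(0, false) succeeds but still bumps demo_warnings, while demo_cache_alloc(0, true) succeeds silently. If the fallback fails too, the warning is still emitted by the final attempt.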

-- 
Cheers,
Harry / Hyeonggon


* Re: [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
  2026-02-23  8:41 ` Harry Yoo
@ 2026-02-23 11:12   ` Chris Bainbridge
  2026-02-23 11:59     ` Harry Yoo
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Bainbridge @ 2026-02-23 11:12 UTC
  To: Harry Yoo
  Cc: vbabka, surenb, hao.li, leitao, Liam.Howlett, zhao1.liu,
	linux-kernel, linux-mm, linux-btrfs, regressions

On Mon, Feb 23, 2026 at 05:41:17PM +0900, Harry Yoo wrote:
> Oh wait, I see what's going on. The bisection pointed to that commit
> because slab tries to refill sheaves with __GFP_NOMEMALLOC (and then
> falls back to the slowpath if that fails).
> 
> Since failing to refill sheaves doesn't mean the allocation will fail,
> slab should specify __GFP_NOWARN along with __GFP_NOMEMALLOC as long as
> there's a fallback method.
> 
> But for __prefill_sheaf_pfmemalloc(), it should specify __GFP_NOWARN on
> the first attempt only when gfp_pfmemalloc_allowed() returns true.

Is this fix sufficient to do the right thing? I tested it, and it does
appear to prevent logging of the allocation failures for my test case.

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index d0dd50f7d279..d2e1083848e8 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -641,7 +641,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
 		 * cover our input range and don't cover too any other range.
 		 * If we end up needing a new extent state we allocate it later.
 		 */
-		prealloc = alloc_extent_state(mask);
+		prealloc = alloc_extent_state(mask | __GFP_NOWARN);
 	}
 
 	spin_lock(&tree->lock);


* Re: [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
  2026-02-23 11:12   ` Chris Bainbridge
@ 2026-02-23 11:59     ` Harry Yoo
  2026-02-23 20:30       ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo @ 2026-02-23 11:59 UTC
  To: Chris Bainbridge
  Cc: vbabka, surenb, hao.li, leitao, Liam.Howlett, zhao1.liu,
	linux-kernel, linux-mm, linux-btrfs, regressions

On Mon, Feb 23, 2026 at 11:12:47AM +0000, Chris Bainbridge wrote:
> Is this fix sufficient to do the right thing? I tested it, and it does
> appear to prevent logging of the allocation failures for my test case.

I think we should do both: 1) set __GFP_NOWARN on the btrfs side, and
2) make slab try to refill sheaves with __GFP_NOWARN when there's a
fallback path.

I'm writing a fix for 2) and I'll send it soon.

> diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> index d0dd50f7d279..d2e1083848e8 100644
> --- a/fs/btrfs/extent-io-tree.c
> +++ b/fs/btrfs/extent-io-tree.c
> @@ -641,7 +641,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
>  		 * cover our input range and don't cover too any other range.
>  		 * If we end up needing a new extent state we allocate it later.
>  		 */
> -		prealloc = alloc_extent_state(mask);
> +		prealloc = alloc_extent_state(mask | __GFP_NOWARN);

This seems like the right thing to do to me, but as I'm not familiar
with btrfs, I'll let the btrfs folks comment on it :)

>  	}
>  
>  	spin_lock(&tree->lock);

-- 
Cheers,
Harry / Hyeonggon


* Re: [REGRESSION] kswapd0: page allocation failure (bisected to "slab: add sheaves to most caches")
  2026-02-23 11:59     ` Harry Yoo
@ 2026-02-23 20:30       ` David Sterba
  0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2026-02-23 20:30 UTC
  To: Harry Yoo
  Cc: Chris Bainbridge, vbabka, surenb, hao.li, leitao, Liam.Howlett,
	zhao1.liu, linux-kernel, linux-mm, linux-btrfs, regressions

On Mon, Feb 23, 2026 at 08:59:30PM +0900, Harry Yoo wrote:
> I think we should do both: 1) set __GFP_NOWARN on the btrfs side, and
> 2) make slab try to refill sheaves with __GFP_NOWARN when there's a
> fallback path.
> 
> I'm writing a fix for 2) and I'll send it soon.
> 
> > diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
> > index d0dd50f7d279..d2e1083848e8 100644
> > --- a/fs/btrfs/extent-io-tree.c
> > +++ b/fs/btrfs/extent-io-tree.c
> > @@ -641,7 +641,7 @@ int btrfs_clear_extent_bit_changeset(struct extent_io_tree *tree, u64 start, u64
> >  		 * cover our input range and don't cover too any other range.
> >  		 * If we end up needing a new extent state we allocate it later.
> >  		 */
> > -		prealloc = alloc_extent_state(mask);
> > +		prealloc = alloc_extent_state(mask | __GFP_NOWARN);
> 
> This seems like the right thing to do to me, but as I'm not familiar
> with btrfs, I'll let the btrfs folks comment on it :)

I agree the flag should be added; as the comment explains, allocation
failures are not fatal at this point. There's another call to
alloc_extent_state() with GFP_ATOMIC, so we cannot simply sink NOWARN
into the function itself.
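
A small userspace sketch of that distinction, using made-up demo_* names and flag values rather than the real btrfs code: the flag is OR'ed in at the one call site where failure is harmless, so the other caller's mask stays untouched. It assumes the point of not sinking the flag is to avoid silencing the GFP_ATOMIC caller as well.

```c
#include <assert.h>

/* Hypothetical stand-ins for gfp flags; the values are not the kernel's. */
#define DEMO_GFP_NOFS   0x1u
#define DEMO_GFP_ATOMIC 0x2u
#define DEMO_GFP_NOWARN 0x4u

static unsigned int demo_last_mask; /* mask seen by the shared allocator */

/* Shared helper: uses the mask as given and does not add NOWARN itself. */
static void demo_alloc_extent_state(unsigned int mask)
{
	demo_last_mask = mask;
}

/* Optional preallocation: failure is tolerated, so silence it here. */
static void demo_optional_prealloc(unsigned int mask)
{
	demo_alloc_extent_state(mask | DEMO_GFP_NOWARN);
}

/* The other caller keeps its own mask, warnings included. */
static void demo_atomic_alloc(void)
{
	demo_alloc_extent_state(DEMO_GFP_ATOMIC);
}
```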

