linux-mm.kvack.org archive mirror
* order-0 page alloc failures during interrupt context on v6.6.43
@ 2024-08-22 20:02 Matt Fleming
  2024-08-22 22:07 ` Matt Fleming
  2024-08-23 16:09 ` Christoph Lameter (Ampere)
  0 siblings, 2 replies; 7+ messages in thread
From: Matt Fleming @ 2024-08-22 20:02 UTC (permalink / raw)
  To: linux-mm; +Cc: willy, kernel-team

Hey there,

I'm seeing page allocation failures across the Cloudflare fleet,
typically during the network RX path, when trying to allocate order-0
pages in interrupt context. The machines appear to be under memory
pressure because the code that gets interrupted is
shrink_folio_list(). Below is an example stacktrace.

Does anyone have any pointers on how to dig into this some more? It
appears as though the machines are not able to reclaim memory fast
enough when under pressure. Happy to provide more metrics or stats on
request.

Thanks,
Matt

----8<----

kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0-7
CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G           O
   6.6.43-CUSTOM #1
Hardware name: MACHINE
Call Trace:
 <IRQ>
 dump_stack_lvl+0x3c/0x50
 warn_alloc+0x13a/0x1c0
 __alloc_pages_slowpath.constprop.0+0xc9d/0xd10
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __alloc_pages_bulk+0x3a0/0x630
 __alloc_pages+0x327/0x340
 __napi_alloc_skb+0x16d/0x1f0
 bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
 bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
 ? skb_release_data+0x14f/0x1b0
 __bnxt_poll_work+0x156/0x2b0 [bnxt_en]
 bnxt_poll+0xd9/0x1c0 [bnxt_en]
 ? srso_alias_return_thunk+0x5/0xfbef5
 __napi_poll+0x2b/0x1b0
 bpf_trampoline_6442524138+0x7d/0x1000
 __napi_poll+0x5/0x1b0
 net_rx_action+0x342/0x740
 ? srso_alias_return_thunk+0x5/0xfbef5
 handle_softirqs+0xcf/0x2b0
 irq_exit_rcu+0x6c/0x90
 sysvec_apic_timer_interrupt+0x72/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:queued_spin_lock_slowpath+0x260/0x2b0
Code: 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 c0 30 03 00 48 03
04 d5 a0 d7 10 9c 48 89 28 8b 45 08 85 c0 75 09 f3 90 8b 45 08 <85> c0
74 f7 48 8b 55 00 48 85 d2 74 83 0f 0d 0a e9 7b ff ff ff 65
RSP: 0018:ffffc9000f9cb768 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88905a3a9880 RCX: 0000000000000001
RDX: 000000000000001b RSI: 0000000000700000 RDI: ffff88905a3a9880
RBP: ffff88902f5330c0 R08: ffffc9000f9cb750 R09: 0000000000000000
R10: 0000000000000000 R11: 0000603fce623320 R12: 00000000002c0000
R13: 0000000000000001 R14: 00000000002c0000 R15: ffff889062f84a00
 zs_malloc+0x9d/0x520 [zsmalloc]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __zstd_compress+0x60/0xa0 [zstd]
 zram_submit_bio+0x8d1/0x9f0 [zram]
 ? srso_alias_return_thunk+0x5/0xfbef5
 __submit_bio+0xaa/0x160
 submit_bio_noacct_nocheck+0x145/0x380
 ? submit_bio_noacct+0x24/0x4c0
 submit_bio_wait+0x5b/0xc0
 swap_writepage_bdev_sync+0xf8/0x170
 ? __pfx_submit_bio_wait_endio+0x10/0x10
 swap_writepage+0x36/0x80
 pageout+0xc8/0x240
 shrink_folio_list+0x489/0xd60
 shrink_lruvec+0x5a8/0xc40
 shrink_node+0x2c5/0x7a0
 balance_pgdat+0x32d/0x740
 kswapd+0x205/0x400
 ? __pfx_autoremove_wake_function+0x10/0x10
 ? __pfx_kswapd+0x10/0x10
 kthread+0xe8/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
Mem-Info:
active_anon:14289951 inactive_anon:25056935 isolated_anon:1577
 active_file:3254095 inactive_file:3963476 isolated_file:1
 unevictable:4 dirty:305545 writeback:132
 slab_reclaimable:2916775 slab_unreclaimable:1689088
 mapped:2592762 shmem:1980658 pagetables:530605
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:618653 free_pcp:129763 free_cma:0
Node 0 active_anon:6461468kB inactive_anon:11667080kB
active_file:1971908kB inactive_file:2302944kB unevictable:0kB
isolated(anon):960kB isolated(file):0kB mapped:1070000kB
dirty:110140kB writeback:64kB shmem:842272kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
kernel_stack:37624kB pagetables:235212kB sec_pagetables:0kB
all_unreclaimable? no
Node 1 active_anon:7027824kB inactive_anon:12544448kB
active_file:1695500kB inactive_file:2093056kB unevictable:0kB
isolated(anon):308kB isolated(file):0kB mapped:1694880kB
dirty:163436kB writeback:24kB shmem:1090692kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
kernel_stack:31860kB pagetables:231608kB sec_pagetables:0kB
all_unreclaimable? no
Node 2 active_anon:7168612kB inactive_anon:11850084kB
active_file:1669812kB inactive_file:1870596kB unevictable:0kB
isolated(anon):144kB isolated(file):0kB mapped:1420628kB
dirty:105912kB writeback:24kB shmem:1092068kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
kernel_stack:40220kB pagetables:263428kB sec_pagetables:0kB
all_unreclaimable? no
Node 3 active_anon:7160892kB inactive_anon:12851880kB
active_file:1453156kB inactive_file:1884092kB unevictable:0kB
isolated(anon):452kB isolated(file):0kB mapped:1199768kB
dirty:124548kB writeback:72kB shmem:965128kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
kernel_stack:27124kB pagetables:284676kB sec_pagetables:0kB
all_unreclaimable? no
Node 4 active_anon:7505196kB inactive_anon:12764280kB
active_file:1466756kB inactive_file:1878740kB unevictable:16kB
isolated(anon):640kB isolated(file):0kB mapped:1170484kB
dirty:136668kB writeback:44kB shmem:986212kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
kernel_stack:32380kB pagetables:312216kB sec_pagetables:0kB
all_unreclaimable? no
Node 5 active_anon:7169752kB inactive_anon:12867040kB
active_file:1769832kB inactive_file:1809448kB unevictable:0kB
isolated(anon):1008kB isolated(file):0kB mapped:1589272kB
dirty:128616kB writeback:112kB shmem:1108816kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
kernel_stack:32784kB pagetables:278392kB sec_pagetables:0kB
all_unreclaimable? no
Node 6 active_anon:7333288kB inactive_anon:12854340kB
active_file:1504536kB inactive_file:2096488kB unevictable:0kB
isolated(anon):1336kB isolated(file):4kB mapped:1117792kB
dirty:228512kB writeback:92kB shmem:958680kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
kernel_stack:43852kB pagetables:254060kB sec_pagetables:0kB
all_unreclaimable? no
Node 7 active_anon:7332772kB inactive_anon:12828588kB
active_file:1484880kB inactive_file:1918540kB unevictable:0kB
isolated(anon):1460kB isolated(file):0kB mapped:1108224kB
dirty:224348kB writeback:96kB shmem:878764kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
kernel_stack:35580kB pagetables:262828kB sec_pagetables:0kB
all_unreclaimable? no
Node 0 DMA free:11264kB boost:0kB min:48kB low:60kB high:72kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 2095 31529 31529
Node 0 DMA32 free:118988kB boost:0kB min:6832kB low:8976kB
high:11120kB reserved_highatomic:0KB active_anon:445316kB
inactive_anon:780792kB active_file:122148kB inactive_file:151592kB
unevictable:0kB writepending:1464kB present:2735864kB
managed:2145496kB mlocked:0kB bounce:0kB free_pcp:20468kB
local_pcp:48kB free_cma:0kB
lowmem_reserve[]: 0 0 29434 29434
Node 0 Normal free:266252kB boost:0kB min:95988kB low:126128kB
high:156268kB reserved_highatomic:305152KB active_anon:6016024kB
inactive_anon:10884436kB active_file:1849108kB inactive_file:2149856kB
unevictable:0kB writepending:108740kB present:30670848kB
managed:30141044kB mlocked:0kB bounce:0kB free_pcp:37432kB
local_pcp:84kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 1 Normal free:290496kB boost:0kB min:105164kB low:138184kB
high:171204kB reserved_highatomic:333824KB active_anon:7028084kB
inactive_anon:12543028kB active_file:1694884kB inactive_file:2092728kB
unevictable:0kB writepending:163200kB present:33552384kB
managed:33022704kB mlocked:0kB bounce:0kB free_pcp:53668kB
local_pcp:892kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 2 Normal free:295000kB boost:0kB min:105172kB low:138196kB
high:171220kB reserved_highatomic:333824KB active_anon:7168872kB
inactive_anon:11848752kB active_file:1668876kB inactive_file:1871016kB
unevictable:0kB writepending:106604kB present:33554432kB
managed:33024756kB mlocked:0kB bounce:0kB free_pcp:48468kB
local_pcp:752kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 3 Normal free:308228kB boost:0kB min:105012kB low:137984kB
high:170956kB reserved_highatomic:333824KB active_anon:7164068kB
inactive_anon:12847600kB active_file:1453016kB inactive_file:1885952kB
unevictable:0kB writepending:126480kB present:33553408kB
managed:32974232kB mlocked:0kB bounce:0kB free_pcp:64400kB
local_pcp:732kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 4 Normal free:271672kB boost:0kB min:105172kB low:138196kB
high:171220kB reserved_highatomic:333824KB active_anon:7505196kB
inactive_anon:12763688kB active_file:1465932kB inactive_file:1880212kB
unevictable:16kB writepending:137892kB present:33554432kB
managed:33024756kB mlocked:16kB bounce:0kB free_pcp:60204kB
local_pcp:632kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 5 Normal free:291824kB boost:0kB min:105168kB low:138188kB
high:171208kB reserved_highatomic:333824KB active_anon:7169428kB
inactive_anon:12866872kB active_file:1769184kB inactive_file:1811512kB
unevictable:0kB writepending:131024kB present:33553408kB
managed:33023728kB mlocked:0kB bounce:0kB free_pcp:78708kB
local_pcp:568kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 6 Normal free:310936kB boost:0kB min:105172kB low:138196kB
high:171220kB reserved_highatomic:333824KB active_anon:7333792kB
inactive_anon:12852816kB active_file:1503264kB inactive_file:2097500kB
unevictable:0kB writepending:229284kB present:33554432kB
managed:33024756kB mlocked:0kB bounce:0kB free_pcp:74936kB
local_pcp:796kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 7 Normal free:309668kB boost:0kB min:105112kB low:138116kB
high:171120kB reserved_highatomic:333824KB active_anon:7331892kB
inactive_anon:12827964kB active_file:1484024kB inactive_file:1920356kB
unevictable:0kB writepending:226576kB present:33541120kB
managed:33005940kB mlocked:0kB bounce:0kB free_pcp:80748kB
local_pcp:704kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Node 0 DMA32: 2225*4kB (UME) 338*8kB (UME) 178*16kB (UME) 459*32kB
(UME) 215*64kB (UME) 115*128kB (ME) 86*256kB (UME) 35*512kB (UME)
4*1024kB (UM) 6*2048kB (M) 1*4096kB (U) = 118036kB
Node 0 Normal: 797*4kB (H) 871*8kB (H) 802*16kB (H) 804*32kB (H)
601*64kB (H) 310*128kB (H) 164*256kB (H) 67*512kB (H) 25*1024kB (H)
14*2048kB (H) 2*4096kB (H) = 265612kB
Node 1 Normal: 507*4kB (H) 680*8kB (H) 682*16kB (H) 699*32kB (H)
589*64kB (H) 363*128kB (H) 211*256kB (H) 93*512kB (H) 37*1024kB (H)
13*2048kB (H) 0*4096kB = 291052kB
Node 2 Normal: 598*4kB (H) 843*8kB (H) 740*16kB (H) 735*32kB (H)
507*64kB (H) 298*128kB (H) 175*256kB (H) 102*512kB (H) 37*1024kB (H)
21*2048kB (H) 1*4096kB (H) = 297104kB
Node 3 Normal: 440*4kB (H) 509*8kB (H) 493*16kB (H) 559*32kB (H)
438*64kB (H) 304*128kB (H) 197*256kB (H) 126*512kB (H) 50*1024kB (H)
21*2048kB (H) 0*4096kB = 307704kB
Node 4 Normal: 604*4kB (H) 716*8kB (H) 674*16kB (H) 819*32kB (H)
544*64kB (H) 303*128kB (H) 182*256kB (H) 74*512kB (H) 24*1024kB (H)
20*2048kB (H) 0*4096kB = 268752kB
Node 5 Normal: 809*4kB (H) 873*8kB (H) 775*16kB (H) 749*32kB (H)
414*64kB (H) 254*128kB (H) 154*256kB (H) 90*512kB (H) 37*1024kB (H)
31*2048kB (H) 0*4096kB = 292476kB
Node 6 Normal: 659*4kB (H) 689*8kB (H) 708*16kB (H) 851*32kB (H)
592*64kB (H) 386*128kB (H) 226*256kB (H) 91*512kB (H) 40*1024kB (H)
13*2048kB (H) 1*4096kB (H) = 310132kB
Node 7 Normal: 898*4kB (H) 907*8kB (H) 893*16kB (H) 897*32kB (H)
597*64kB (H) 375*128kB (H) 203*256kB (H) 86*512kB (H) 29*1024kB (H)
20*2048kB (H) 0*4096kB = 306704kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 5 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 6 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 7 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
9214746 total pagecache pages
17797 pages in swap cache
Free swap  = 208645424kB
Total swap = 263402492kB
67071581 pages RAM
0 pages HighMem/MovableOnly
1220888 pages reserved


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-22 20:02 order-0 page alloc failures during interrupt context on v6.6.43 Matt Fleming
@ 2024-08-22 22:07 ` Matt Fleming
  2024-08-29 15:48   ` Vlastimil Babka
  2024-08-23 16:09 ` Christoph Lameter (Ampere)
  1 sibling, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2024-08-22 22:07 UTC (permalink / raw)
  To: linux-mm; +Cc: willy, kernel-team, Mel Gorman

(Adding Mel to Cc list)

On Thu, Aug 22, 2024 at 9:02 PM Matt Fleming <mfleming@cloudflare.com> wrote:
>
> Hey there,
>
> I'm seeing page allocation failures across the Cloudflare fleet,
> typically during the network RX path, when trying to allocate order-0
> pages in interrupt context. The machines appear to be under memory
> pressure because the code that gets interrupted is
> shrink_folio_list(). Below is an example stacktrace.
>
> Does anyone have any pointers on how to dig into this some more? It
> appears as though the machines are not able to reclaim memory fast
> enough when under pressure. Happy to provide more metrics or stats on
> request.
>
> Thanks,
> Matt
>
> [ trimmed: stack trace and Mem-Info identical to the original message above ]



* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-22 20:02 order-0 page alloc failures during interrupt context on v6.6.43 Matt Fleming
  2024-08-22 22:07 ` Matt Fleming
@ 2024-08-23 16:09 ` Christoph Lameter (Ampere)
  2024-08-23 20:25   ` Matt Fleming
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-08-23 16:09 UTC (permalink / raw)
  To: Matt Fleming; +Cc: linux-mm, willy, kernel-team

On Thu, 22 Aug 2024, Matt Fleming wrote:

> I'm seeing page allocation failures across the Cloudflare fleet,
> typically during the network RX path, when trying to allocate order-0
> pages in interrupt context. The machines appear to be under memory
> pressure because the code that gets interrupted is
> shrink_folio_list(). Below is an example stacktrace.
>
> Does anyone have any pointers on how to dig into this some more? It
> appears as though the machines are not able to reclaim memory fast
> enough when under pressure. Happy to provide more metrics or stats on
> request.

Could you look at the full kernel log output up until the time of the
allocation failure? It looks like there is enough memory in every zone to
satisfy the allocation request.

The stack trace suggests memory is being pushed out via kswapd to
zram-backed swap, and that we then get interrupted by incoming network
traffic.





* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-23 16:09 ` Christoph Lameter (Ampere)
@ 2024-08-23 20:25   ` Matt Fleming
  0 siblings, 0 replies; 7+ messages in thread
From: Matt Fleming @ 2024-08-23 20:25 UTC (permalink / raw)
  To: Christoph Lameter (Ampere); +Cc: linux-mm, willy, kernel-team

On Fri, Aug 23, 2024 at 5:09 PM Christoph Lameter (Ampere)
<cl@gentwo.org> wrote:
>
> Could you look at the full kernel log output up until the time of the
> allocation failure? It looks like there is enough memory in every zone to
> satisfy the allocation request.
>
> The stack trace suggests memory is being pushed out via kswapd to
> zram-backed swap, and that we then get interrupted by incoming network
> traffic.

Sure. My original email was a bit terse because it was a continuation of a
discussion on IRC, though I didn't explicitly say so, and I had already
read through the stack trace. As you say, given that there are plenty of
free pages in every zone, even after factoring in the watermarks, it's not
clear to me why the allocation failed at all.



* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-22 22:07 ` Matt Fleming
@ 2024-08-29 15:48   ` Vlastimil Babka
  2024-08-30 15:43     ` Matt Fleming
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2024-08-29 15:48 UTC (permalink / raw)
  To: Matt Fleming, linux-mm, Christoph Lameter (Ampere)
  Cc: willy, kernel-team, Mel Gorman, Johannes Weiner, Charan Teja Kalla

On 8/23/24 00:07, Matt Fleming wrote:
> (Adding Mel to Cc list)
> 
> On Thu, Aug 22, 2024 at 9:02 PM Matt Fleming <mfleming@cloudflare.com> wrote:
>>
>> Hey there,
>>
>> I'm seeing page allocation failures across the Cloudflare fleet,
>> typically during the network RX path, when trying to allocate order-0
>> pages in interrupt context. The machines appear to be under memory
>> pressure because the code that gets interrupted is
>> shrink_folio_list(). Below is an example stacktrace.
>>
>> Does anyone have any pointers on how to dig into this some more? It
>> appears as though the machines are not able to reclaim memory fast
>> enough when under pressure. Happy to provide more metrics or stats on
>> request.
>>
>> Thanks,
>> Matt
>>
>> ----8<----
>>
>> kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
>> nodemask=(null),cpuset=/,mems_allowed=0-7
>> CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G           O
>>    6.6.43-CUSTOM #1
>> Hardware name: MACHINE
>> Call Trace:
>>  <IRQ>
>>  dump_stack_lvl+0x3c/0x50
>>  warn_alloc+0x13a/0x1c0
>>  __alloc_pages_slowpath.constprop.0+0xc9d/0xd10
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  ? __alloc_pages_bulk+0x3a0/0x630
>>  __alloc_pages+0x327/0x340
>>  __napi_alloc_skb+0x16d/0x1f0
>>  bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
>>  bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
>>  ? skb_release_data+0x14f/0x1b0
>>  __bnxt_poll_work+0x156/0x2b0 [bnxt_en]
>>  bnxt_poll+0xd9/0x1c0 [bnxt_en]
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  __napi_poll+0x2b/0x1b0
>>  bpf_trampoline_6442524138+0x7d/0x1000
>>  __napi_poll+0x5/0x1b0
>>  net_rx_action+0x342/0x740
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  handle_softirqs+0xcf/0x2b0
>>  irq_exit_rcu+0x6c/0x90
>>  sysvec_apic_timer_interrupt+0x72/0x90
>>  </IRQ>
>>  <TASK>
>>  asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> RIP: 0010:queued_spin_lock_slowpath+0x260/0x2b0
>> Code: 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 c0 30 03 00 48 03
>> 04 d5 a0 d7 10 9c 48 89 28 8b 45 08 85 c0 75 09 f3 90 8b 45 08 <85> c0
>> 74 f7 48 8b 55 00 48 85 d2 74 83 0f 0d 0a e9 7b ff ff ff 65
>> RSP: 0018:ffffc9000f9cb768 EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: ffff88905a3a9880 RCX: 0000000000000001
>> RDX: 000000000000001b RSI: 0000000000700000 RDI: ffff88905a3a9880
>> RBP: ffff88902f5330c0 R08: ffffc9000f9cb750 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000603fce623320 R12: 00000000002c0000
>> R13: 0000000000000001 R14: 00000000002c0000 R15: ffff889062f84a00
>>  zs_malloc+0x9d/0x520 [zsmalloc]
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  ? __zstd_compress+0x60/0xa0 [zstd]
>>  zram_submit_bio+0x8d1/0x9f0 [zram]
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  __submit_bio+0xaa/0x160
>>  submit_bio_noacct_nocheck+0x145/0x380
>>  ? submit_bio_noacct+0x24/0x4c0
>>  submit_bio_wait+0x5b/0xc0
>>  swap_writepage_bdev_sync+0xf8/0x170
>>  ? __pfx_submit_bio_wait_endio+0x10/0x10
>>  swap_writepage+0x36/0x80
>>  pageout+0xc8/0x240
>>  shrink_folio_list+0x489/0xd60
>>  shrink_lruvec+0x5a8/0xc40
>>  shrink_node+0x2c5/0x7a0
>>  balance_pgdat+0x32d/0x740
>>  kswapd+0x205/0x400
>>  ? __pfx_autoremove_wake_function+0x10/0x10
>>  ? __pfx_kswapd+0x10/0x10
>>  kthread+0xe8/0x120
>>  ? __pfx_kthread+0x10/0x10
>>  ret_from_fork+0x34/0x50
>>  ? __pfx_kthread+0x10/0x10
>>  ret_from_fork_asm+0x1b/0x30
>>  </TASK>
>> Mem-Info:
>> active_anon:14289951 inactive_anon:25056935 isolated_anon:1577
>>  active_file:3254095 inactive_file:3963476 isolated_file:1
>>  unevictable:4 dirty:305545 writeback:132
>>  slab_reclaimable:2916775 slab_unreclaimable:1689088
>>  mapped:2592762 shmem:1980658 pagetables:530605
>>  sec_pagetables:0 bounce:0
>>  kernel_misc_reclaimable:0
>>  free:618653 free_pcp:129763 free_cma:0
>> Node 0 active_anon:6461468kB inactive_anon:11667080kB
>> active_file:1971908kB inactive_file:2302944kB unevictable:0kB
>> isolated(anon):960kB isolated(file):0kB mapped:1070000kB
>> dirty:110140kB writeback:64kB shmem:842272kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
>> kernel_stack:37624kB pagetables:235212kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 1 active_anon:7027824kB inactive_anon:12544448kB
>> active_file:1695500kB inactive_file:2093056kB unevictable:0kB
>> isolated(anon):308kB isolated(file):0kB mapped:1694880kB
>> dirty:163436kB writeback:24kB shmem:1090692kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
>> kernel_stack:31860kB pagetables:231608kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 2 active_anon:7168612kB inactive_anon:11850084kB
>> active_file:1669812kB inactive_file:1870596kB unevictable:0kB
>> isolated(anon):144kB isolated(file):0kB mapped:1420628kB
>> dirty:105912kB writeback:24kB shmem:1092068kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
>> kernel_stack:40220kB pagetables:263428kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 3 active_anon:7160892kB inactive_anon:12851880kB
>> active_file:1453156kB inactive_file:1884092kB unevictable:0kB
>> isolated(anon):452kB isolated(file):0kB mapped:1199768kB
>> dirty:124548kB writeback:72kB shmem:965128kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
>> kernel_stack:27124kB pagetables:284676kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 4 active_anon:7505196kB inactive_anon:12764280kB
>> active_file:1466756kB inactive_file:1878740kB unevictable:16kB
>> isolated(anon):640kB isolated(file):0kB mapped:1170484kB
>> dirty:136668kB writeback:44kB shmem:986212kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
>> kernel_stack:32380kB pagetables:312216kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 5 active_anon:7169752kB inactive_anon:12867040kB
>> active_file:1769832kB inactive_file:1809448kB unevictable:0kB
>> isolated(anon):1008kB isolated(file):0kB mapped:1589272kB
>> dirty:128616kB writeback:112kB shmem:1108816kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
>> kernel_stack:32784kB pagetables:278392kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 6 active_anon:7333288kB inactive_anon:12854340kB
>> active_file:1504536kB inactive_file:2096488kB unevictable:0kB
>> isolated(anon):1336kB isolated(file):4kB mapped:1117792kB
>> dirty:228512kB writeback:92kB shmem:958680kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB
>> kernel_stack:43852kB pagetables:254060kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 7 active_anon:7332772kB inactive_anon:12828588kB
>> active_file:1484880kB inactive_file:1918540kB unevictable:0kB
>> isolated(anon):1460kB isolated(file):0kB mapped:1108224kB
>> dirty:224348kB writeback:96kB shmem:878764kB shmem_thp:0kB
>> shmem_pmdmapped:0kB anon_thp:2048kB writeback_tmp:0kB
>> kernel_stack:35580kB pagetables:262828kB sec_pagetables:0kB
>> all_unreclaimable? no
>> Node 0 DMA free:11264kB boost:0kB min:48kB low:60kB high:72kB
>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>> present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB
>> local_pcp:0kB free_cma:0kB
>> lowmem_reserve[]: 0 2095 31529 31529
>> Node 0 DMA32 free:118988kB boost:0kB min:6832kB low:8976kB
>> high:11120kB reserved_highatomic:0KB active_anon:445316kB
>> inactive_anon:780792kB active_file:122148kB inactive_file:151592kB
>> unevictable:0kB writepending:1464kB present:2735864kB
>> managed:2145496kB mlocked:0kB bounce:0kB free_pcp:20468kB
>> local_pcp:48kB free_cma:0kB
>> lowmem_reserve[]: 0 0 29434 29434
>> Node 0 Normal free:266252kB boost:0kB min:95988kB low:126128kB

We're nominally above the min watermark (free > min).

>> high:156268kB reserved_highatomic:305152KB active_anon:6016024kB

But subtracting reserved_highatomic from free takes us below zero?

>> inactive_anon:10884436kB active_file:1849108kB inactive_file:2149856kB
>> unevictable:0kB writepending:108740kB present:30670848kB
>> managed:30141044kB mlocked:0kB bounce:0kB free_pcp:37432kB
>> local_pcp:84kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 1 Normal free:290496kB boost:0kB min:105164kB low:138184kB
>> high:171204kB reserved_highatomic:333824KB active_anon:7028084kB
>> inactive_anon:12543028kB active_file:1694884kB inactive_file:2092728kB
>> unevictable:0kB writepending:163200kB present:33552384kB
>> managed:33022704kB mlocked:0kB bounce:0kB free_pcp:53668kB
>> local_pcp:892kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 2 Normal free:295000kB boost:0kB min:105172kB low:138196kB
>> high:171220kB reserved_highatomic:333824KB active_anon:7168872kB
>> inactive_anon:11848752kB active_file:1668876kB inactive_file:1871016kB
>> unevictable:0kB writepending:106604kB present:33554432kB
>> managed:33024756kB mlocked:0kB bounce:0kB free_pcp:48468kB
>> local_pcp:752kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 3 Normal free:308228kB boost:0kB min:105012kB low:137984kB
>> high:170956kB reserved_highatomic:333824KB active_anon:7164068kB
>> inactive_anon:12847600kB active_file:1453016kB inactive_file:1885952kB
>> unevictable:0kB writepending:126480kB present:33553408kB
>> managed:32974232kB mlocked:0kB bounce:0kB free_pcp:64400kB
>> local_pcp:732kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 4 Normal free:271672kB boost:0kB min:105172kB low:138196kB
>> high:171220kB reserved_highatomic:333824KB active_anon:7505196kB
>> inactive_anon:12763688kB active_file:1465932kB inactive_file:1880212kB
>> unevictable:16kB writepending:137892kB present:33554432kB
>> managed:33024756kB mlocked:16kB bounce:0kB free_pcp:60204kB
>> local_pcp:632kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 5 Normal free:291824kB boost:0kB min:105168kB low:138188kB
>> high:171208kB reserved_highatomic:333824KB active_anon:7169428kB
>> inactive_anon:12866872kB active_file:1769184kB inactive_file:1811512kB
>> unevictable:0kB writepending:131024kB present:33553408kB
>> managed:33023728kB mlocked:0kB bounce:0kB free_pcp:78708kB
>> local_pcp:568kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 6 Normal free:310936kB boost:0kB min:105172kB low:138196kB
>> high:171220kB reserved_highatomic:333824KB active_anon:7333792kB
>> inactive_anon:12852816kB active_file:1503264kB inactive_file:2097500kB
>> unevictable:0kB writepending:229284kB present:33554432kB
>> managed:33024756kB mlocked:0kB bounce:0kB free_pcp:74936kB
>> local_pcp:796kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 7 Normal free:309668kB boost:0kB min:105112kB low:138116kB
>> high:171120kB reserved_highatomic:333824KB active_anon:7331892kB

All of the nodes have the same amount of reserved_highatomic.

>> inactive_anon:12827964kB active_file:1484024kB inactive_file:1920356kB
>> unevictable:0kB writepending:226576kB present:33541120kB
>> managed:33005940kB mlocked:0kB bounce:0kB free_pcp:80748kB
>> local_pcp:704kB free_cma:0kB
>> lowmem_reserve[]: 0 0 0 0
>> Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
>> 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
>> Node 0 DMA32: 2225*4kB (UME) 338*8kB (UME) 178*16kB (UME) 459*32kB
>> (UME) 215*64kB (UME) 115*128kB (ME) 86*256kB (UME) 35*512kB (UME)
>> 4*1024kB (UM) 6*2048kB (M) 1*4096kB (U) = 118036kB
>> Node 0 Normal: 797*4kB (H) 871*8kB (H) 802*16kB (H) 804*32kB (H)
>> 601*64kB (H) 310*128kB (H) 164*256kB (H) 67*512kB (H) 25*1024kB (H)
>> 14*2048kB (H) 2*4096kB (H) = 265612kB
>> Node 1 Normal: 507*4kB (H) 680*8kB (H) 682*16kB (H) 699*32kB (H)
>> 589*64kB (H) 363*128kB (H) 211*256kB (H) 93*512kB (H) 37*1024kB (H)
>> 13*2048kB (H) 0*4096kB = 291052kB
>> Node 2 Normal: 598*4kB (H) 843*8kB (H) 740*16kB (H) 735*32kB (H)
>> 507*64kB (H) 298*128kB (H) 175*256kB (H) 102*512kB (H) 37*1024kB (H)
>> 21*2048kB (H) 1*4096kB (H) = 297104kB
>> Node 3 Normal: 440*4kB (H) 509*8kB (H) 493*16kB (H) 559*32kB (H)
>> 438*64kB (H) 304*128kB (H) 197*256kB (H) 126*512kB (H) 50*1024kB (H)
>> 21*2048kB (H) 0*4096kB = 307704kB
>> Node 4 Normal: 604*4kB (H) 716*8kB (H) 674*16kB (H) 819*32kB (H)
>> 544*64kB (H) 303*128kB (H) 182*256kB (H) 74*512kB (H) 24*1024kB (H)
>> 20*2048kB (H) 0*4096kB = 268752kB
>> Node 5 Normal: 809*4kB (H) 873*8kB (H) 775*16kB (H) 749*32kB (H)
>> 414*64kB (H) 254*128kB (H) 154*256kB (H) 90*512kB (H) 37*1024kB (H)
>> 31*2048kB (H) 0*4096kB = 292476kB
>> Node 6 Normal: 659*4kB (H) 689*8kB (H) 708*16kB (H) 851*32kB (H)
>> 592*64kB (H) 386*128kB (H) 226*256kB (H) 91*512kB (H) 40*1024kB (H)
>> 13*2048kB (H) 1*4096kB (H) = 310132kB
>> Node 7 Normal: 898*4kB (H) 907*8kB (H) 893*16kB (H) 897*32kB (H)
>> 597*64kB (H) 375*128kB (H) 203*256kB (H) 86*512kB (H) 29*1024kB (H)
>> 20*2048kB (H) 0*4096kB = 306704kB

And (H) everywhere confirms all the free memory in Normal zones is reserved
highatomic.

We have several paths where the highatomic reserve shrinks itself in
response to different allocations struggling, and I recall some recent-ish
fixes in this area. But from a glance it seems none of them would be
relevant, or they are just missing from 6.6 LTS. The "order:0 GFP_ATOMIC"
case seems to have no way to dip into the highatomic reserves, and perhaps
it should?

AFAICS:

- __zone_watermark_unusable_free() for ALLOC_RESERVES (which includes
ALLOC_NON_BLOCK which GFP_ATOMIC allocations have) does not subtract the
reserve_highatomic, so the allocations pass the watermarks
- but in rmqueue_buddy() only ALLOC_OOM is able to fallback into highatomic
- unreserve_highatomic_pageblock() is only called from reclaim and there's
no reclaim for GFP_ATOMIC

(also worth checking if kswapd even does anything if free > high, but it's
all highatomic, maybe not? so it can't help us here)

>> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 4 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 5 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 6 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> Node 7 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> 9214746 total pagecache pages
>> 17797 pages in swap cache
>> Free swap  = 208645424kB
>> Total swap = 263402492kB
>> 67071581 pages RAM
>> 0 pages HighMem/MovableOnly
>> 1220888 pages reserved
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-29 15:48   ` Vlastimil Babka
@ 2024-08-30 15:43     ` Matt Fleming
  2024-09-01 19:24       ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2024-08-30 15:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Christoph Lameter (Ampere),
	willy, kernel-team, Mel Gorman, Johannes Weiner,
	Charan Teja Kalla

On Thu, Aug 29, 2024 at 4:48 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> AFAICS:
>
> - __zone_watermark_unusable_free() for ALLOC_RESERVES (which includes
> ALLOC_NON_BLOCK which GFP_ATOMIC allocations have) does not subtract the
> reserve_highatomic, so the allocations pass the watermarks
> - but in rmqueue_buddy() only ALLOC_OOM is able to fallback into highatomic
> - unreserve_highatomic_pageblock() is only called from reclaim and there's
> no reclaim for GFP_ATOMIC
>
> (also worth checking if kswapd even does anything if free > high, but it's
> all highatomic, maybe not? so it can't help us here)

As far as I can tell we'll wake kswapd but like you said because free >
high it thinks the pgdat is balanced.

How is the system supposed to recover in this situation? Wait for a
non-atomic alloc to fail and enter direct reclaim?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: order-0 page alloc failures during interrupt context on v6.6.43
  2024-08-30 15:43     ` Matt Fleming
@ 2024-09-01 19:24       ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2024-09-01 19:24 UTC (permalink / raw)
  To: Matt Fleming
  Cc: linux-mm, Christoph Lameter (Ampere),
	willy, kernel-team, Mel Gorman, Johannes Weiner,
	Charan Teja Kalla

On 8/30/24 17:43, Matt Fleming wrote:
> On Thu, Aug 29, 2024 at 4:48 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>> AFAICS:
>>
>> - __zone_watermark_unusable_free() for ALLOC_RESERVES (which includes
>> ALLOC_NON_BLOCK which GFP_ATOMIC allocations have) does not subtract the
>> reserve_highatomic, so the allocations pass the watermarks
>> - but in rmqueue_buddy() only ALLOC_OOM is able to fallback into highatomic
>> - unreserve_highatomic_pageblock() is only called from reclaim and there's
>> no reclaim for GFP_ATOMIC
>>
>> (also worth checking if kswapd even does anything if free > high, but it's
>> all highatomic, maybe not? so it can't help us here)
> 
> As far as I can tell we'll wake kswapd but like you said because free > high
> it thinks the pgdat is balanced.

Hm wonder if we should change that and kswapd should count free without
highatomic reserve.

> How is the system supposed to recover in this situation? Wait for a
> non-atomic alloc to fail and enter direct reclaim?

That's probably how it recovers, but it's not how it's supposed to recover :)
Aside from changing the kswapd behavior, GFP_ATOMIC should likely be allowed
to fall back into the highatomic reserve in rmqueue_buddy(), the same as
ALLOC_OOM can?


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-09-01 19:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-22 20:02 order-0 page alloc failures during interrupt context on v6.6.43 Matt Fleming
2024-08-22 22:07 ` Matt Fleming
2024-08-29 15:48   ` Vlastimil Babka
2024-08-30 15:43     ` Matt Fleming
2024-09-01 19:24       ` Vlastimil Babka
2024-08-23 16:09 ` Christoph Lameter (Ampere)
2024-08-23 20:25   ` Matt Fleming
