Hi Matthew, Barry,

> So, we try to do an order-3 allocation.  kswapd runs and ...
> succeeds in creating order-3 pages?  Or fails to?

From our reproducer runs, both happen. We observe intermittent order-3
successes, but also frequent high-order failures followed by order-0
fallback.

> If it fails, that's something we need to sort out.

Agreed. In this workload the bottleneck appears to be contiguity, not a
shortage of reclaimable memory: order-0 pages remain available while
suitable order-3 blocks are often not.

> If it succeeds, now we have several order-3 pages, great.  But where do
> they all go that we need to run kswapd again?

In our runs, order-3 pockets do show up, but they do not last long: they
are consumed quickly by ongoing skb demand, and the pressure returns.

To investigate this, we built a reproducer that keeps creating memory
fragments while the network stack continuously requests order-3
allocations. [1][2]

Raw sample output (trimmed):

---------------------------------------------------------------------------------------------------
TIME     | BUDDYINFO (Normal Zone)  | MEMINFO                 | KSWAPD CPU & VMSTAT
---------------------------------------------------------------------------------------------------
11:08:11 | ord0:11622   ord3:0     | Free:96MB   Avail:1309MB | CPU: 10.0%  scan:83107932
[*] PHASE 3: Triggering Order-3 Pressure (UDP Storm).
11:08:15 | ord0:52079   ord3:0     | Free:273MB  Avail:1300MB | CPU: 90.9%  scan:85328881
11:08:16 | ord0:102895  ord3:0     | Free:477MB  Avail:1309MB | CPU: 60.0%  scan:85873777
11:08:17 | ord0:115459  ord3:5     | Free:517MB  Avail:1284MB | CPU: 54.5%  scan:86584389
11:08:18 | ord0:115164  ord3:0     | Free:509MB  Avail:1107MB | CPU: 36.4%  scan:87083561
---------------------------------------------------------------------------------------------------

The phenomenon we observe is: free memory is plentiful, order-0 pages
are abundant, and the network allocation has already fallen back to the
order-0 path successfully.
Everything seems normal on the surface, yet kswapd remains trapped in a
futile loop.

It appears that kswapd is stuck in the following path:

  wakeup_kswapd -> pgdat_balanced -> __zone_watermark_ok

Specifically, in __zone_watermark_ok():

	/* For a high-order request, check at least one suitable page is free */
	for (o = order; o < NR_PAGE_ORDERS; o++) {
		struct free_area *area = &z->free_area[o];
		int mt;

		if (!area->nr_free)
			continue;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}
	}

Because our reproducer keeps creating fragmentation while the network
stack requests order-3, this loop keeps returning false for the
high-order requirement, even though the system is functionally fine
with order-0.

To be clear, we are not creating "artificial" fragments just for the
sake of it. Rather, the reproducer is designed to stress and expose an
existing feedback gap in the reclaim/compaction logic: kswapd keeps
burning CPU cycles to satisfy a watermark that the allocator has
already abandoned in favor of order-0 fallback.

A related discussion in [3] helps reduce vmpressure noise in this area.
That is useful, but it does not close the contiguity gap by itself:
high-order wakeups and reclaim can still repeat when contiguous blocks
cannot be formed.

This situation suggests we should take a much closer look at how kswapd
behaves in these scenarios. After reviewing everyone's input, we
believe it is time to do some targeted work on handling these
high-order cases. We already have some rough ideas and plan to run
further experiments, and we would appreciate a broader discussion to
help address this gap that we may have collectively missed.
Links:
[1] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd_spin_repro.c
[2] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd.sh
[3] https://lore.kernel.org/all/20260406195014.112521-1-jp.kobryn@linux.dev/#r

This was reproduced and cross-checked independently by our team
(Wang Lian and Kunwu Chan).

--
Best Regards,
wang lian