From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Yosry Ahmed <yosry.ahmed@linux.dev>, Zi Yan <ziy@nvidia.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Usama Arif <usama.arif@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>,
Dave Chinner <david@fromorbit.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org,
Alexander Egorenkov <egorenar@linux.ibm.com>
Subject: Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru - [s390] panic in __memcg_list_lru_alloc
Date: Mon, 30 Mar 2026 18:37:01 +0200
Message-ID: <4d3f8d79-3593-47df-9de8-f94f7f09a403@linux.ibm.com>
In-Reply-To: <20260318200352.1039011-8-hannes@cmpxchg.org>

On 18-Mar-26 20:53, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion.
> The queue is per-NUMA node or per-cgroup, not the intersection. That
> means on a cgrouped system, a node-restricted allocation entering
> reclaim can end up splitting large pages on other nodes:
>
> 	alloc/unmap
> 	  deferred_split_folio()
> 	    list_add_tail(memcg->split_queue)
> 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
> 	for_each_zone_zonelist_nodemask(restricted_nodes)
> 	  mem_cgroup_iter()
> 	    shrink_slab(node, memcg)
> 	      shrink_slab_memcg(node, memcg)
> 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> 	          deferred_split_scan()
> 	            walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the
> cgroup has a single large page on the node of interest, all large
> pages owned by that memcg, including those on other nodes, will be
> split.
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus,
> it streamlines a lot of the list operations and reclaim walks. It's
> used widely by other major shrinkers already. Convert the deferred
> split queue as well.
>
> The list_lru per-memcg heads are instantiated on demand when the
> first object of interest is allocated for a cgroup, by calling
> folio_memcg_list_lru_alloc(). Add calls to where splittable pages
> are created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once,
> so the migration code (between nodes) doesn't need any special care.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/huge_mm.h    |   6 +-
>  include/linux/memcontrol.h |   4 -
>  include/linux/mmzone.h     |  12 --
>  mm/huge_memory.c           | 342 ++++++++++++-------------------------
>  mm/internal.h              |   2 +-
>  mm/khugepaged.c            |   7 +
>  mm/memcontrol.c            |  12 +-
>  mm/memory.c                |  52 +++---
>  mm/mm_init.c               |  15 --
>  9 files changed, 151 insertions(+), 301 deletions(-)
>
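
For readers following along, the node/cgroup mismatch described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_folio`, `scan_per_memcg_queue()` and `scan_per_memcg_node_list()` are hypothetical names invented for illustration, not kernel APIs, and the shrinker bit is modeled simply as "the cgroup has at least one queued folio on this node".

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES  2
#define NR_MEMCGS 2

/* Hypothetical stand-in for a large folio queued for deferred splitting. */
struct toy_folio {
	int memcg;	/* index of the owning cgroup */
	int node;	/* NUMA node the folio resides on */
};

/*
 * Model of the old scheme: one queue per memcg that spans all nodes.
 * The (memcg, node) shrinker bit only decides whether the queue is
 * walked at all. Once it is, every queued folio of that memcg is a
 * split candidate, regardless of the node it sits on.
 */
static size_t scan_per_memcg_queue(const struct toy_folio *f, size_t n,
				   int memcg, int node)
{
	size_t i, visited = 0;
	int bit_set = 0;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	assert(node >= 0 && node < NR_NODES);

	/* Simplified shrinker bit: any folio of this memcg on this node? */
	for (i = 0; i < n; i++)
		if (f[i].memcg == memcg && f[i].node == node)
			bit_set = 1;
	if (!bit_set)
		return 0;

	/* The walk itself ignores the node. */
	for (i = 0; i < n; i++)
		if (f[i].memcg == memcg)
			visited++;
	return visited;
}

/*
 * Model of a list_lru-like layout: one list per (memcg, node) pair,
 * so a node-restricted scan only visits folios on that node.
 */
static size_t scan_per_memcg_node_list(const struct toy_folio *f, size_t n,
				       int memcg, int node)
{
	size_t i, visited = 0;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	assert(node >= 0 && node < NR_NODES);

	for (i = 0; i < n; i++)
		if (f[i].memcg == memcg && f[i].node == node)
			visited++;
	return visited;
}
```

With cgroup 0 owning one folio on node 0 and two on node 1, a scan for (cgroup 0, node 0) visits all three folios in the first model but only one in the second.
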
Hi,

with this series in linux-next (since next-20260324) I see a reproducible panic on s390 in the
dump kernel when running an NVMe standalone dump (ngdump).
This only happens in the capture kernel; a normal boot of the same kernel works fine.

[ 14.350676] Unable to handle kernel pointer dereference in virtual kernel address space
[ 14.350682] Failing address: 4000000000000000 TEID: 4000000000000803 ESOP-2 FSI
[ 14.350686] Fault in home space mode while using kernel ASCE.
[ 14.350689] AS:0000000002798007 R3:000000002d2c4007 S:000000002d2c3001 P:000000000000013d
[ 14.350730] Oops: 0038 ilc:3 [#1]SMP
[ 14.350735] Modules linked in: dm_service_time zfcp scsi_transport_fc uvdevice diag288_wdt nvme prng aes_s390 nvme_core des_s390 libdes zcrypt_cex4 dm_mirror dm_region_hash dm_log scsi_dh_rdac scsi_dh_emc scsi_dh_alua paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey dm_multipath autofs4
[ 14.350760] CPU: 0 UID: 0 PID: 32 Comm: khugepaged Not tainted 7.0.0-rc5-next-20260324
[ 14.350762] Hardware name: IBM 3931 A01 704 (LPAR)
[ 14.350764] Krnl PSW : 0704d00180000000 000003ffe0443a82 (__memcg_list_lru_alloc+0x52/0x1d0)
[ 14.350774] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 14.350776] Krnl GPRS: 0000000000000402 00000000000bece0 0000000000000000 000003ffe1c17928
[ 14.350778] 00000000001c24ca 0000000000000000 0000000000000000 000003ffe1c17948
[ 14.350780] 0000000000000000 00000000000824c0 0000037200098000 4000000000000000
[ 14.350782] 0000000000782400 0000000000000001 0000037fe00f39b8 0000037fe00f3918
[ 14.350788] Krnl Code: 000003ffe0443a72: a7690000 lghi %r6,0
[ 14.350788] 000003ffe0443a76: e380f0a00004 lg %r8,160(%r15)
[ 14.350788] *000003ffe0443a7c: e3b080b80004 lg %r11,184(%r8)
[ 14.350788] >000003ffe0443a82: e330b9400012 lt %r3,2368(%r11)
[ 14.350788] 000003ffe0443a88: a7a40065 brc 10,000003ffe0443b52
[ 14.350788] 000003ffe0443a8c: e3b0f0a00004 lg %r11,160(%r15)
[ 14.350788] 000003ffe0443a92: ec68006f007c cgij %r6,0,8,000003ffe0443b70
[ 14.350788] 000003ffe0443a98: e300b9400014 lgf %r0,2368(%r11)
[ 14.350825] Call Trace:
[ 14.350826] [<000003ffe0443a82>] __memcg_list_lru_alloc+0x52/0x1d0
[ 14.350831] [<000003ffe044529a>] folio_memcg_list_lru_alloc+0xba/0x150
[ 14.350834] [<000003ffe04f279a>] alloc_charge_folio+0x18a/0x250
[ 14.350839] [<000003ffe04f34dc>] collapse_huge_page+0x8c/0x890
[ 14.350841] [<000003ffe04f4222>] collapse_scan_pmd+0x542/0x690
[ 14.350844] [<000003ffe04f65b4>] collapse_single_pmd+0x144/0x240
[ 14.350847] [<000003ffe04f69ce>] collapse_scan_mm_slot.constprop.0+0x31e/0x480
[ 14.350849] [<000003ffe04f6d3c>] khugepaged+0x20c/0x210
[ 14.350852] [<000003ffe019b0a8>] kthread+0x148/0x170
[ 14.350856] [<000003ffe0119fec>] __ret_from_fork+0x3c/0x240
[ 14.350860] [<000003ffe0ffa4b2>] ret_from_fork+0xa/0x30
[ 14.350865] Last Breaking-Event-Address:
[ 14.350865] [<000003ffe0445294>] folio_memcg_list_lru_alloc+0xb4/0x150
[ 14.350870] Kernel panic - not syncing: Fatal exception: panic_on_oops

Environment:
  Arch: s390x (IBM LPAR)
  Kernel: next-20260324
  Config: (can provide if needed)
  Reproducible: always

Steps to Reproduce:
  Install ngdump to an NVMe device partition via 'zipl -d' and initiate a dump (same issue with DASD ldipl-dump).

I have bisected to this specific commit.
  good: 230bbdc110b3 ("mm: list_lru: introduce folio_memcg_list_lru_alloc()")
  bad: b0f512f6e36c ("mm: switch deferred split shrinker to list_lru")

Reverting it on top of linux-next <next-20260327> restores normal Standalone Dump operation.

Let me know if I can provide any other data.

Thanks,
Mikhail