* [REGRESSION] Null pointer dereference while shrinking zswap
@ 2024-04-16 12:19 Christian Heusel
2024-04-16 19:18 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Christian Heusel @ 2024-04-16 12:19 UTC (permalink / raw)
To: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton,
linux-mm, linux-kernel
Cc: David Runge, Richard W.M. Jones, Mark W, regressions
[-- Attachment #1: Type: text/plain, Size: 3596 bytes --]
Hello everyone,
while rebuilding a few packages in Arch Linux we have recently come
across a regression in the linux kernel which was made visible by a test
failure in libguestfs[0], where the booted kernel showed a Call Trace
like the following one:
[ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M
[ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M
[ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M
[ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M
[ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M
[ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M
[ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M
[ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M
[ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M
[ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M
[ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M
[ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
[ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M
[ 218.743494] PKRU: 55555554^M
[ 218.743593] Call Trace:^M
[ 218.743733] <TASK>^M
[ 218.743847] ? __die+0x23/0x70^M
[ 218.743957] ? page_fault_oops+0x171/0x4e0^M
[ 218.744056] ? free_unref_page+0xf6/0x180^M
[ 218.744458] ? exc_page_fault+0x7f/0x180^M
[ 218.744551] ? asm_exc_page_fault+0x26/0x30^M
[ 218.744684] ? memcg_page_state+0x9/0x30^M
[ 218.744779] zswap_shrinker_count+0x9d/0x110^M
[ 218.744896] do_shrink_slab+0x3a/0x360^M
[ 218.744990] shrink_slab+0xc7/0x3c0^M
[ 218.745609] drop_slab+0x85/0x140^M
[ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M
[ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M
[ 218.745912] vfs_write+0x23d/0x400^M
[ 218.745998] ksys_write+0x6f/0xf0^M
[ 218.746080] do_syscall_64+0x64/0xe0^M
[ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M
[ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M
The regression is present in the mainline kernel and also was
independently reported to the redhat bugtracker[1].
I have bisected (see log[2]) the regression between v6.9-rc4 and v6.6
and have landed on the following results (removed unrelated test commit)
as remainders since some of the commits were not buildable for me:
- 7108cc3f765c ("mm: memcg: add per-memcg zswap writeback stat")
- a65b0e7607cc ("zswap: make shrinking memcg-aware")
- b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
I have decided on good/bad commits with the relevant libguestfs tests,
but I think the reproducer in the redhat bugzilla is simpler (although I
only became aware of it during the bisection and therefore didn't test
it myself):
LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 make -C /build/libguestfs/src/libguestfs-1.52.0/tests -k check TESTS=c-api/tests
I hope I have included everything needed to debug this further, if there
is more to add I'm happy to provide more details!
Cheers,
Christian
[0]: https://github.com/libguestfs/libguestfs/issues/139
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2275252
[2]: https://gist.github.com/christian-heusel/d5095c36b72ae90871e27dfed32ddc46
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel @ 2024-04-16 19:18 ` Andrew Morton 2024-04-16 19:57 ` Christian Heusel 2024-04-16 22:14 ` Nhat Pham 2024-04-17 0:33 ` Nhat Pham 2 siblings, 1 reply; 17+ messages in thread From: Andrew Morton @ 2024-04-16 19:18 UTC (permalink / raw) To: Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou On Tue, 16 Apr 2024 14:19:24 +0200 Christian Heusel <christian@heusel.eu> wrote: > Hello everyone, > > while rebuilding a few packages in Arch Linux we have recently come > across a regression in the linux kernel which was made visible by a test > failure in libguestfs[0], where the booted kernel showed a Call Trace > like the following one: > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M (cc more zswap people) Fairly old kernel. We might have fixed it since then, but it would be good to know what fixed this, for backporting reasons. > [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > [ 218.743494] PKRU: 55555554^M > [ 218.743593] Call Trace:^M > [ 218.743733] <TASK>^M > [ 218.743847] ? __die+0x23/0x70^M > [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > [ 218.744056] ? free_unref_page+0xf6/0x180^M > [ 218.744458] ? exc_page_fault+0x7f/0x180^M > [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > [ 218.744684] ? memcg_page_state+0x9/0x30^M > [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > [ 218.744896] do_shrink_slab+0x3a/0x360^M > [ 218.744990] shrink_slab+0xc7/0x3c0^M > [ 218.745609] drop_slab+0x85/0x140^M > [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > [ 218.745912] vfs_write+0x23d/0x400^M > [ 218.745998] ksys_write+0x6f/0xf0^M > [ 218.746080] do_syscall_64+0x64/0xe0^M > [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > > The regression is present in the mainline kernel and also was > independently reported to the redhat bugtracker[1]. > > I have bisected (see log[2]) the regression between v6.9-rc4 and v6.6 > and have landed on the following results (removed unrelated test commit) > as remainders since some of the commits were not buildable for me: > - 7108cc3f765c ("mm: memcg: add per-memcg zswap writeback stat") > - a65b0e7607cc ("zswap: make shrinking memcg-aware") > - b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > > I have decided on good/bad commits with the relevant libguestfs tests, > but I think the reproducer in the redhat bugzilla is simpler (although I > only became aware of it during the bisection and therefore didn't test > it myself): > > LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 make -C /build/libguestfs/src/libguestfs-1.52.0/tests -k check TESTS=c-api/tests > > I hope I have included everything needed to debug this further, if there > is more to add I'm happy to provide more details! > > Cheers, > Christian > > [0]: https://github.com/libguestfs/libguestfs/issues/139 > [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2275252 > [2]: https://gist.github.com/christian-heusel/d5095c36b72ae90871e27dfed32ddc46 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 19:18 ` Andrew Morton @ 2024-04-16 19:57 ` Christian Heusel 0 siblings, 0 replies; 17+ messages in thread From: Christian Heusel @ 2024-04-16 19:57 UTC (permalink / raw) To: Andrew Morton Cc: Seth Jennings, Dan Streetman, Vitaly Wool, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou [-- Attachment #1: Type: text/plain, Size: 783 bytes --] On 24/04/16 12:18PM, Andrew Morton wrote: > On Tue, 16 Apr 2024 14:19:24 +0200 Christian Heusel <christian@heusel.eu> wrote: > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > > (cc more zswap people) > > Fairly old kernel. We might have fixed it since then, but it would be > good to know what fixed this, for backporting reasons. Hey Andrew, I guess you must have missed the end of my previous mail, I have verified that the bug is still present with with the latest mainline release and bisected the issue down to a few commits. The trace I was posting above was just a random one from during the bisection. Sorry if my formatting was a bit confusing .. :) Cheers, Christian [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel 2024-04-16 19:18 ` Andrew Morton @ 2024-04-16 22:14 ` Nhat Pham 2024-04-16 22:29 ` Christian Heusel 2024-04-16 23:29 ` Nhat Pham 2024-04-17 0:33 ` Nhat Pham 2 siblings, 2 replies; 17+ messages in thread From: Nhat Pham @ 2024-04-16 22:14 UTC (permalink / raw) To: Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Chengming Zhou On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: > > Hello everyone, Thanks for the report, Christian! Looking at it now. > > while rebuilding a few packages in Arch Linux we have recently come > across a regression in the linux kernel which was made visible by a test > failure in libguestfs[0], where the booted kernel showed a Call Trace > like the following one: > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M Is this one of the kernel versions that was broken? That looks a bit odd, as zswap shrinker landed on 6.8... > [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > [ 218.743494] PKRU: 55555554^M > [ 218.743593] Call Trace:^M > [ 218.743733] <TASK>^M > [ 218.743847] ? __die+0x23/0x70^M > [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > [ 218.744056] ? free_unref_page+0xf6/0x180^M > [ 218.744458] ? exc_page_fault+0x7f/0x180^M > [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > [ 218.744684] ? memcg_page_state+0x9/0x30^M > [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > [ 218.744896] do_shrink_slab+0x3a/0x360^M > [ 218.744990] shrink_slab+0xc7/0x3c0^M > [ 218.745609] drop_slab+0x85/0x140^M > [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > [ 218.745912] vfs_write+0x23d/0x400^M > [ 218.745998] ksys_write+0x6f/0xf0^M > [ 218.746080] do_syscall_64+0x64/0xe0^M > [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > > The regression is present in the mainline kernel and also was > independently reported to the redhat bugtracker[1]. > > I have bisected (see log[2]) the regression between v6.9-rc4 and v6.6 > and have landed on the following results (removed unrelated test commit) > as remainders since some of the commits were not buildable for me: > - 7108cc3f765c ("mm: memcg: add per-memcg zswap writeback stat") > - a65b0e7607cc ("zswap: make shrinking memcg-aware") > - b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 22:14 ` Nhat Pham @ 2024-04-16 22:29 ` Christian Heusel 2024-04-16 23:29 ` Nhat Pham 1 sibling, 0 replies; 17+ messages in thread From: Christian Heusel @ 2024-04-16 22:29 UTC (permalink / raw) To: Nhat Pham Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Chengming Zhou [-- Attachment #1: Type: text/plain, Size: 977 bytes --] On 24/04/16 03:14PM, Nhat Pham wrote: > Is this one of the kernel versions that was broken? That looks a bit > odd, as zswap shrinker landed on 6.8... Yes that is the build at commit b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") which is one of the kernel versions that is broken, although the version string seems a bit confusing. This is due to the build script invoking "make -s kernelrelease" which gives the following: $ git checkout b5ba474f3f51 [...] $ make -s kernelrelease 6.7.0-rc4-00158-gb5ba474f3f51 The commit is contained in v6.8+ only as you say, so its just confusing in the version string: $ git tag --contains b5ba474f3f51 | grep -v next v6.8 v6.8-rc1 v6.8-rc2 v6.8-rc3 v6.8-rc4 v6.8-rc5 v6.8-rc6 v6.8-rc7 v6.9-rc1 v6.9-rc2 v6.9-rc3 v6.9-rc4 Please see my initial report for the detailed bisection results and the link there to the bisection log. Cheers and thanks for looking into this! Christian [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 22:14 ` Nhat Pham 2024-04-16 22:29 ` Christian Heusel @ 2024-04-16 23:29 ` Nhat Pham 2024-04-17 0:22 ` Nhat Pham 1 sibling, 1 reply; 17+ messages in thread From: Nhat Pham @ 2024-04-16 23:29 UTC (permalink / raw) To: Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Chengming Zhou On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote: > > On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: > > > > Hello everyone, > > Thanks for the report, Christian! Looking at it now. > > > > > while rebuilding a few packages in Arch Linux we have recently come > > across a regression in the linux kernel which was made visible by a test > > failure in libguestfs[0], where the booted kernel showed a Call Trace > > like the following one: > > > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > > Is this one of the kernel versions that was broken? That looks a bit > odd, as zswap shrinker landed on 6.8... Ah ignore this - I understand the versioning now... > > > [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > > [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > > [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > > [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > > [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > > [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > > [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > > [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > > [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > > [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > > [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > > [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > > [ 218.743494] PKRU: 55555554^M > > [ 218.743593] Call Trace:^M > > [ 218.743733] <TASK>^M > > [ 218.743847] ? __die+0x23/0x70^M > > [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > > [ 218.744056] ? free_unref_page+0xf6/0x180^M > > [ 218.744458] ? exc_page_fault+0x7f/0x180^M > > [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > > [ 218.744684] ? memcg_page_state+0x9/0x30^M > > [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > > [ 218.744896] do_shrink_slab+0x3a/0x360^M > > [ 218.744990] shrink_slab+0xc7/0x3c0^M > > [ 218.745609] drop_slab+0x85/0x140^M > > [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > > [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > > [ 218.745912] vfs_write+0x23d/0x400^M > > [ 218.745998] ksys_write+0x6f/0xf0^M > > [ 218.746080] do_syscall_64+0x64/0xe0^M > > [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > > [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > > > > The regression is present in the mainline kernel and also was > > independently reported to the redhat bugtracker[1]. > > > > I have bisected (see log[2]) the regression between v6.9-rc4 and v6.6 > > and have landed on the following results (removed unrelated test commit) > > as remainders since some of the commits were not buildable for me: > > - 7108cc3f765c ("mm: memcg: add per-memcg zswap writeback stat") > > - a65b0e7607cc ("zswap: make shrinking memcg-aware") > > - b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Hmmm, this smells like a memcg reference counting bug to me. The memcg was successfully accessed just a couple of LoCs above, but failed after the stats flushing. My suspicion right now is the new memcg refcnt dance to select an memcg for the global, capacity-based, shrinker, introduced in this patch series. That dance looks solid to me tho - not sure where it goes wrong. And, if it's a reference counting issue, that should pop up in different places (i.e other mysterious memcg traces)... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 23:29 ` Nhat Pham @ 2024-04-17 0:22 ` Nhat Pham 2024-04-17 3:44 ` Chengming Zhou 0 siblings, 1 reply; 17+ messages in thread From: Nhat Pham @ 2024-04-17 0:22 UTC (permalink / raw) To: Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed, Chengming Zhou On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham <nphamcs@gmail.com> wrote: > > On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote: > > > > On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: > > > > > > Hello everyone, > > > > Thanks for the report, Christian! Looking at it now. > > > > > > > > while rebuilding a few packages in Arch Linux we have recently come > > > across a regression in the linux kernel which was made visible by a test > > > failure in libguestfs[0], where the booted kernel showed a Call Trace > > > like the following one: > > > > > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > > > > Is this one of the kernel versions that was broken? That looks a bit > > odd, as zswap shrinker landed on 6.8... > > Ah ignore this - I understand the versioning now... > > > > > > [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > > > [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > > > [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > > > [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > > > [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > > > [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > > > [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > > > [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > > > [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > > > [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > > > [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > > > [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > > > [ 218.743494] PKRU: 55555554^M > > > [ 218.743593] Call Trace:^M > > > [ 218.743733] <TASK>^M > > > [ 218.743847] ? __die+0x23/0x70^M > > > [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > > > [ 218.744056] ? free_unref_page+0xf6/0x180^M > > > [ 218.744458] ? exc_page_fault+0x7f/0x180^M > > > [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > > > [ 218.744684] ? memcg_page_state+0x9/0x30^M > > > [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > > > [ 218.744896] do_shrink_slab+0x3a/0x360^M > > > [ 218.744990] shrink_slab+0xc7/0x3c0^M > > > [ 218.745609] drop_slab+0x85/0x140^M > > > [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > > > [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > > > [ 218.745912] vfs_write+0x23d/0x400^M > > > [ 218.745998] ksys_write+0x6f/0xf0^M > > > [ 218.746080] do_syscall_64+0x64/0xe0^M > > > [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > > > [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > > > Actually, inspecting the code a bit more - can memcg be null here? Specifically, if mem_cgroup_disabled() is true, can we see null memcg here? Looks like in this case, mem_cgroup_iter() can return null, and the first iteration of drop_slab_node() can have null memcg if it's returned by mem_cgroup_iter(). shrink_slab() will still proceed and call do_shrink_slab() if the memcg is null - provided that mem_cgroup_disabled() holds: if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) return shrink_slab_memcg(gfp_mask, nid, memcg, priority); Inside zswap_shrink_count(), all the memcg accessors in this area seem to always check for null memcg (mem_cgroup_lruvec, mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which is the one line that fail. If this is all to it, we can simply add a null check or mem_cgroup_disabled() check here, and use pool stats instead? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-17 0:22 ` Nhat Pham @ 2024-04-17 3:44 ` Chengming Zhou 2024-04-17 14:33 ` Johannes Weiner 0 siblings, 1 reply; 17+ messages in thread From: Chengming Zhou @ 2024-04-17 3:44 UTC (permalink / raw) To: Nhat Pham, Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Johannes Weiner, Yosry Ahmed On 2024/4/17 08:22, Nhat Pham wrote: > On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham <nphamcs@gmail.com> wrote: >> >> On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote: >>> >>> On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: >>>> >>>> Hello everyone, >>> >>> Thanks for the report, Christian! Looking at it now. >>> >>>> >>>> while rebuilding a few packages in Arch Linux we have recently come >>>> across a regression in the linux kernel which was made visible by a test >>>> failure in libguestfs[0], where the booted kernel showed a Call Trace >>>> like the following one: >>>> >>>> [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M >>> >>> Is this one of the kernel versions that was broken? That looks a bit >>> odd, as zswap shrinker landed on 6.8... >> >> Ah ignore this - I understand the versioning now... >> >>> >>>> [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M >>>> [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M >>>> [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M >>>> [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M >>>> [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M >>>> [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M >>>> [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M >>>> [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M >>>> [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M >>>> [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M >>>> [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M >>>> [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M >>>> [ 218.743494] PKRU: 55555554^M >>>> [ 218.743593] Call Trace:^M >>>> [ 218.743733] <TASK>^M >>>> [ 218.743847] ? __die+0x23/0x70^M >>>> [ 218.743957] ? page_fault_oops+0x171/0x4e0^M >>>> [ 218.744056] ? free_unref_page+0xf6/0x180^M >>>> [ 218.744458] ? exc_page_fault+0x7f/0x180^M >>>> [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M >>>> [ 218.744684] ? memcg_page_state+0x9/0x30^M >>>> [ 218.744779] zswap_shrinker_count+0x9d/0x110^M >>>> [ 218.744896] do_shrink_slab+0x3a/0x360^M >>>> [ 218.744990] shrink_slab+0xc7/0x3c0^M >>>> [ 218.745609] drop_slab+0x85/0x140^M >>>> [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M >>>> [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M >>>> [ 218.745912] vfs_write+0x23d/0x400^M >>>> [ 218.745998] ksys_write+0x6f/0xf0^M >>>> [ 218.746080] do_syscall_64+0x64/0xe0^M >>>> [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M >>>> [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M >>>> > > Actually, inspecting the code a bit more - can memcg be null here? > > Specifically, if mem_cgroup_disabled() is true, can we see null memcg > here? Looks like in this case, mem_cgroup_iter() can return null, and > the first iteration of drop_slab_node() can have null memcg if it's > returned by mem_cgroup_iter(). > > shrink_slab() will still proceed and call do_shrink_slab() if the > memcg is null - provided that mem_cgroup_disabled() holds: > > if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) > return shrink_slab_memcg(gfp_mask, nid, memcg, priority); > Ah, I think your analysis is very right, here memcg is NULL but memcg_page_state won't check. > Inside zswap_shrink_count(), all the memcg accessors in this area seem > to always check for null memcg (mem_cgroup_lruvec, > mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which > is the one line that fail. > > If this is all to it, we can simply add a null check or > mem_cgroup_disabled() check here, and use pool stats instead? Both look ok to me. The VM could only set sc->memcg to NULL when memcg disabled, right? Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-17 3:44 ` Chengming Zhou @ 2024-04-17 14:33 ` Johannes Weiner 2024-04-17 15:08 ` Richard W.M. Jones 2024-04-17 17:18 ` Christian Heusel 0 siblings, 2 replies; 17+ messages in thread From: Johannes Weiner @ 2024-04-17 14:33 UTC (permalink / raw) To: Chengming Zhou Cc: Nhat Pham, Christian Heusel, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Yosry Ahmed On Wed, Apr 17, 2024 at 11:44:45AM +0800, Chengming Zhou wrote: > On 2024/4/17 08:22, Nhat Pham wrote: > > On Tue, Apr 16, 2024 at 4:29 PM Nhat Pham <nphamcs@gmail.com> wrote: > >> > >> On Tue, Apr 16, 2024 at 3:14 PM Nhat Pham <nphamcs@gmail.com> wrote: > >>> > >>> On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: > >>>> > >>>> Hello everyone, > >>> > >>> Thanks for the report, Christian! Looking at it now. > >>> > >>>> > >>>> while rebuilding a few packages in Arch Linux we have recently come > >>>> across a regression in the linux kernel which was made visible by a test > >>>> failure in libguestfs[0], where the booted kernel showed a Call Trace > >>>> like the following one: > >>>> > >>>> [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > >>> > >>> Is this one of the kernel versions that was broken? That looks a bit > >>> odd, as zswap shrinker landed on 6.8... > >> > >> Ah ignore this - I understand the versioning now... > >> > >>> > >>>> [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > >>>> [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > >>>> [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > >>>> [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > >>>> [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > >>>> [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > >>>> [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > >>>> [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > >>>> [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > >>>> [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > >>>> [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > >>>> [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > >>>> [ 218.743494] PKRU: 55555554^M > >>>> [ 218.743593] Call Trace:^M > >>>> [ 218.743733] <TASK>^M > >>>> [ 218.743847] ? __die+0x23/0x70^M > >>>> [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > >>>> [ 218.744056] ? free_unref_page+0xf6/0x180^M > >>>> [ 218.744458] ? exc_page_fault+0x7f/0x180^M > >>>> [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > >>>> [ 218.744684] ? memcg_page_state+0x9/0x30^M > >>>> [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > >>>> [ 218.744896] do_shrink_slab+0x3a/0x360^M > >>>> [ 218.744990] shrink_slab+0xc7/0x3c0^M > >>>> [ 218.745609] drop_slab+0x85/0x140^M > >>>> [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > >>>> [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > >>>> [ 218.745912] vfs_write+0x23d/0x400^M > >>>> [ 218.745998] ksys_write+0x6f/0xf0^M > >>>> [ 218.746080] do_syscall_64+0x64/0xe0^M > >>>> [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > >>>> [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > >>>> > > > > Actually, inspecting the code a bit more - can memcg be null here? > > > > Specifically, if mem_cgroup_disabled() is true, can we see null memcg > > here? Looks like in this case, mem_cgroup_iter() can return null, and > > the first iteration of drop_slab_node() can have null memcg if it's > > returned by mem_cgroup_iter(). > > > > shrink_slab() will still proceed and call do_shrink_slab() if the > > memcg is null - provided that mem_cgroup_disabled() holds: > > > > if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) > > return shrink_slab_memcg(gfp_mask, nid, memcg, priority); > > > > Ah, I think your analysis is very right, here memcg is NULL but memcg_page_state > won't check. +1. I could reproduce the NULL crash locally with cgroup_disable=memory, the shrinker enabled, and echo 3 >/proc/sys/vm/drop_caches. > > Inside zswap_shrink_count(), all the memcg accessors in this area seem > > to always check for null memcg (mem_cgroup_lruvec, > > mem_cgroup_zswap_writeback_enabled), *except* memcg_page_state, which > > is the one line that fail. > > > > If this is all to it, we can simply add a null check or > > mem_cgroup_disabled() check here, and use pool stats instead? > > Both look ok to me. The VM could only set sc->memcg to NULL when memcg > disabled, right? Christian, can you please test the below patch on top of current upstream? diff --git a/mm/zswap.c b/mm/zswap.c index caed028945b0..6f8850c44b61 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, if (!gfp_has_io_fs(sc->gfp_mask)) return 0; -#ifdef CONFIG_MEMCG_KMEM - mem_cgroup_flush_stats(memcg); - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); -#else - /* use pool stats instead of memcg stats */ - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; - nr_stored = atomic_read(&zswap_nr_stored); -#endif + /* + * For memcg, use the cgroup-wide ZSWAP stats since we don't + * have them per-node and thus per-lruvec. Careful if memcg is + * runtime-disabled: we can get sc->memcg == NULL, which is ok + * for the lruvec, but not for memcg_page_state(). + * + * Without memcg, use the zswap pool-wide metrics. + */ + if (!mem_cgroup_disabled()) { + mem_cgroup_flush_stats(memcg); + nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; + nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); + } else { + nr_backing = zswap_pool_total_size >> PAGE_SHIFT; + nr_stored = atomic_read(&zswap_nr_stored); + } if (!nr_stored) return 0; ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-17 14:33 ` Johannes Weiner @ 2024-04-17 15:08 ` Richard W.M. Jones 2024-04-17 17:18 ` Christian Heusel 1 sibling, 0 replies; 17+ messages in thread From: Richard W.M. Jones @ 2024-04-17 15:08 UTC (permalink / raw) To: Johannes Weiner Cc: Chengming Zhou, Nhat Pham, Christian Heusel, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Mark W, regressions, Yosry Ahmed On Wed, Apr 17, 2024 at 10:33:24AM -0400, Johannes Weiner wrote: > I could reproduce the NULL crash locally with cgroup_disable=memory, > the shrinker enabled, and echo 3 >/proc/sys/vm/drop_caches. Can confirm that libguestfs uses cgroup_disable=memory for its appliance (as it saves some RAM and we don't need cgroups). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-17 14:33 ` Johannes Weiner 2024-04-17 15:08 ` Richard W.M. Jones @ 2024-04-17 17:18 ` Christian Heusel 2024-04-18 12:40 ` Johannes Weiner 1 sibling, 1 reply; 17+ messages in thread From: Christian Heusel @ 2024-04-17 17:18 UTC (permalink / raw) To: Johannes Weiner Cc: Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Yosry Ahmed [-- Attachment #1: Type: text/plain, Size: 358 bytes --] On 24/04/17 10:33AM, Johannes Weiner wrote: > > Christian, can you please test the below patch on top of current > upstream? > Hey Johannes, I have applied your patch on top of 6.9-rc4 and it did solve the crash for me, thanks for hacking together a fix so quickly! 🤗 Tested-By: Christian Heusel <christian@heusel.eu> Cheers, Christian [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-17 17:18 ` Christian Heusel @ 2024-04-18 12:40 ` Johannes Weiner 2024-04-18 14:25 ` Linux regression tracking (Thorsten Leemhuis) 2024-04-18 20:09 ` Yosry Ahmed 0 siblings, 2 replies; 17+ messages in thread From: Johannes Weiner @ 2024-04-18 12:40 UTC (permalink / raw) To: Christian Heusel Cc: Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Yosry Ahmed On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: > On 24/04/17 10:33AM, Johannes Weiner wrote: > > > > Christian, can you please test the below patch on top of current > > upstream? > > > > Hey Johannes, > > I have applied your patch on top of 6.9-rc4 and it did solve the crash for > me, thanks for hacking together a fix so quickly! 🤗 > > Tested-By: Christian Heusel <christian@heusel.eu> Thanks for confirming it, and sorry about the breakage. Andrew, can you please use the updated changelog below? --- From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 2001 From: Johannes Weiner <hannes@cmpxchg.org> Date: Thu, 18 Apr 2024 08:26:28 -0400 Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disable=memory Christian reports a NULL deref in zswap that he bisected down to the zswap shrinker. The issue also cropped up in the bug trackers of libguestfs [1] and the Red Hat bugzilla [2]. The problem is that when memcg is disabled with the boot time flag, the zswap shrinker might get called with sc->memcg == NULL. This is okay in many places, like the lruvec operations. But it crashes in memcg_page_state() - which is only used due to the non-node accounting of cgroup's the zswap memory to begin with. Nhat spotted that the memcg can be NULL in the memcg-disabled case, and I was then able to reproduce the crash locally as well. [1] https://github.com/libguestfs/libguestfs/issues/139 [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Cc: stable@vger.kernel.org [v6.8] Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org Reported-by: Christian Heusel <christian@heusel.eu> Debugged-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Nhat Pham <nphamcs@gmail.com> Tested-By: Christian Heusel <christian@heusel.eu> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> --- mm/zswap.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index caed028945b0..6f8850c44b61 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, if (!gfp_has_io_fs(sc->gfp_mask)) return 0; -#ifdef CONFIG_MEMCG_KMEM - mem_cgroup_flush_stats(memcg); - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); -#else - /* use pool stats instead of memcg stats */ - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; - nr_stored = atomic_read(&zswap_nr_stored); -#endif + /* + * For memcg, use the cgroup-wide ZSWAP stats since we don't + * have them per-node and thus per-lruvec. Careful if memcg is + * runtime-disabled: we can get sc->memcg == NULL, which is ok + * for the lruvec, but not for memcg_page_state(). + * + * Without memcg, use the zswap pool-wide metrics. + */ + if (!mem_cgroup_disabled()) { + mem_cgroup_flush_stats(memcg); + nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; + nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); + } else { + nr_backing = zswap_pool_total_size >> PAGE_SHIFT; + nr_stored = atomic_read(&zswap_nr_stored); + } if (!nr_stored) return 0; -- 2.44.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-18 12:40 ` Johannes Weiner @ 2024-04-18 14:25 ` Linux regression tracking (Thorsten Leemhuis) 2024-04-18 20:09 ` Yosry Ahmed 1 sibling, 0 replies; 17+ messages in thread From: Linux regression tracking (Thorsten Leemhuis) @ 2024-04-18 14:25 UTC (permalink / raw) To: Johannes Weiner, Christian Heusel Cc: Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions, Yosry Ahmed On 18.04.24 14:40, Johannes Weiner wrote: > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: >> On 24/04/17 10:33AM, Johannes Weiner wrote: > Christian reports a NULL deref in zswap that he bisected down to the > zswap shrinker. The issue also cropped up in the bug trackers of > libguestfs [1] and the Red Hat bugzilla [2]. > > The problem is that when memcg is disabled with the boot time flag, > the zswap shrinker might get called with sc->memcg == NULL. This is > okay in many places, like the lruvec operations. But it crashes in > memcg_page_state() - which is only used due to the non-node accounting > of cgroup's the zswap memory to begin with. > > Nhat spotted that the memcg can be NULL in the memcg-disabled case, > and I was then able to reproduce the crash locally as well. Thx for the fix. Nitpicking: > [1] https://github.com/libguestfs/libguestfs/issues/139 > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 FWIW, those should ideally look like this: Link: https://github.com/libguestfs/libguestfs/issues/139 [1] Link: https://bugzilla.redhat.com/show_bug.cgi?id=2275252 [2] > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > Cc: stable@vger.kernel.org [v6.8] > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org > Reported-by: Christian Heusel <christian@heusel.eu> And here checkpatch.pl should have complained that the above line should ideally be followed by a Link or Closes tag to the report, e.g.: Closes: https://lore.kernel.org/all/3iccc6vjl5gminut3lvpl4va2lbnsgku5ei2d7ylftoofy3n2v@gcfdvtsq6dx2/ Which in this case would be nice, as I'm tracking this regression, hence regzbot will then track the patch and consider the regression resolved once the fix lands in mainline. Ciao, Thorsten ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-18 12:40 ` Johannes Weiner 2024-04-18 14:25 ` Linux regression tracking (Thorsten Leemhuis) @ 2024-04-18 20:09 ` Yosry Ahmed 2024-04-19 14:22 ` Johannes Weiner 1 sibling, 1 reply; 17+ messages in thread From: Yosry Ahmed @ 2024-04-18 20:09 UTC (permalink / raw) To: Johannes Weiner Cc: Christian Heusel, Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions On Thu, Apr 18, 2024 at 5:40 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: > > On 24/04/17 10:33AM, Johannes Weiner wrote: > > > > > > Christian, can you please test the below patch on top of current > > > upstream? > > > > > > > Hey Johannes, > > > > I have applied your patch on top of 6.9-rc4 and it did solve the crash for > > me, thanks for hacking together a fix so quickly! 🤗 > > > > Tested-By: Christian Heusel <christian@heusel.eu> > > Thanks for confirming it, and sorry about the breakage. > > Andrew, can you please use the updated changelog below? > > --- > > From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner <hannes@cmpxchg.org> > Date: Thu, 18 Apr 2024 08:26:28 -0400 > Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disable=memory > > Christian reports a NULL deref in zswap that he bisected down to the > zswap shrinker. The issue also cropped up in the bug trackers of > libguestfs [1] and the Red Hat bugzilla [2]. > > The problem is that when memcg is disabled with the boot time flag, > the zswap shrinker might get called with sc->memcg == NULL. This is > okay in many places, like the lruvec operations. But it crashes in > memcg_page_state() - which is only used due to the non-node accounting > of cgroup's the zswap memory to begin with. > > Nhat spotted that the memcg can be NULL in the memcg-disabled case, > and I was then able to reproduce the crash locally as well. > > [1] https://github.com/libguestfs/libguestfs/issues/139 > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 > > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > Cc: stable@vger.kernel.org [v6.8] > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org > Reported-by: Christian Heusel <christian@heusel.eu> > Debugged-by: Nhat Pham <nphamcs@gmail.com> > Suggested-by: Nhat Pham <nphamcs@gmail.com> > Tested-By: Christian Heusel <christian@heusel.eu> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Thanks for fixing this. A couple of comments/questions below, but anyway LGTM: Acked-by: Yosry Ahmed <yosryahmed@google.com> > --- > mm/zswap.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index caed028945b0..6f8850c44b61 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, > if (!gfp_has_io_fs(sc->gfp_mask)) > return 0; > > -#ifdef CONFIG_MEMCG_KMEM > - mem_cgroup_flush_stats(memcg); > - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; > - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); > -#else > - /* use pool stats instead of memcg stats */ > - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; > - nr_stored = atomic_read(&zswap_nr_stored); > -#endif > + /* > + * For memcg, use the cgroup-wide ZSWAP stats since we don't > + * have them per-node and thus per-lruvec. Careful if memcg is > + * runtime-disabled: we can get sc->memcg == NULL, which is ok > + * for the lruvec, but not for memcg_page_state(). > + * > + * Without memcg, use the zswap pool-wide metrics. > + */ > + if (!mem_cgroup_disabled()) { With the current shrinker code, it seems like we cannot get sc->memcg == NULL unless mem_cgroup_disabled() is true indeed. However, maybe it's better to check if sc->memcg is not NULL directly instead? This would be more resilient in case the shrinker code changes and passing sc->memcg == NULL becomes possible in other cases (e.g. during global shrinking). It seems like other shrinkers do this, for example see count_shadow_nodes() and deferred_split_count(). I am also wondering if we should also check !mem_cgroup_is_root() here? We can avoid the expensive global flush and use the global stats directly in this case. I could also send a follow up patch for this if that's preferred. > + mem_cgroup_flush_stats(memcg); > + nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; > + nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); > + } else { > + nr_backing = zswap_pool_total_size >> PAGE_SHIFT; > + nr_stored = atomic_read(&zswap_nr_stored); > + } > > if (!nr_stored) > return 0; > -- > 2.44.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-18 20:09 ` Yosry Ahmed @ 2024-04-19 14:22 ` Johannes Weiner 2024-04-19 19:10 ` Yosry Ahmed 0 siblings, 1 reply; 17+ messages in thread From: Johannes Weiner @ 2024-04-19 14:22 UTC (permalink / raw) To: Yosry Ahmed Cc: Christian Heusel, Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions On Thu, Apr 18, 2024 at 01:09:22PM -0700, Yosry Ahmed wrote: > On Thu, Apr 18, 2024 at 5:40 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: > > > On 24/04/17 10:33AM, Johannes Weiner wrote: > > > > > > > > Christian, can you please test the below patch on top of current > > > > upstream? > > > > > > > > > > Hey Johannes, > > > > > > I have applied your patch on top of 6.9-rc4 and it did solve the crash for > > > me, thanks for hacking together a fix so quickly! 🤗 > > > > > > Tested-By: Christian Heusel <christian@heusel.eu> > > > > Thanks for confirming it, and sorry about the breakage. > > > > Andrew, can you please use the updated changelog below? > > > > --- > > > > From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 2001 > > From: Johannes Weiner <hannes@cmpxchg.org> > > Date: Thu, 18 Apr 2024 08:26:28 -0400 > > Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disable=memory > > > > Christian reports a NULL deref in zswap that he bisected down to the > > zswap shrinker. The issue also cropped up in the bug trackers of > > libguestfs [1] and the Red Hat bugzilla [2]. > > > > The problem is that when memcg is disabled with the boot time flag, > > the zswap shrinker might get called with sc->memcg == NULL. This is > > okay in many places, like the lruvec operations. But it crashes in > > memcg_page_state() - which is only used due to the non-node accounting > > of cgroup's the zswap memory to begin with. > > > > Nhat spotted that the memcg can be NULL in the memcg-disabled case, > > and I was then able to reproduce the crash locally as well. > > > > [1] https://github.com/libguestfs/libguestfs/issues/139 > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 > > > > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > > Cc: stable@vger.kernel.org [v6.8] > > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org > > Reported-by: Christian Heusel <christian@heusel.eu> > > Debugged-by: Nhat Pham <nphamcs@gmail.com> > > Suggested-by: Nhat Pham <nphamcs@gmail.com> > > Tested-By: Christian Heusel <christian@heusel.eu> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > Thanks for fixing this. A couple of comments/questions below, but anyway LGTM: > > Acked-by: Yosry Ahmed <yosryahmed@google.com> > > > --- > > mm/zswap.c | 25 ++++++++++++++++--------- > > 1 file changed, 16 insertions(+), 9 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index caed028945b0..6f8850c44b61 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, > > if (!gfp_has_io_fs(sc->gfp_mask)) > > return 0; > > > > -#ifdef CONFIG_MEMCG_KMEM > > - mem_cgroup_flush_stats(memcg); > > - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; > > - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); > > -#else > > - /* use pool stats instead of memcg stats */ > > - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; > > - nr_stored = atomic_read(&zswap_nr_stored); > > -#endif > > + /* > > + * For memcg, use the cgroup-wide ZSWAP stats since we don't > > + * have them per-node and thus per-lruvec. Careful if memcg is > > + * runtime-disabled: we can get sc->memcg == NULL, which is ok > > + * for the lruvec, but not for memcg_page_state(). > > + * > > + * Without memcg, use the zswap pool-wide metrics. > > + */ > > + if (!mem_cgroup_disabled()) { > > With the current shrinker code, it seems like we cannot get sc->memcg > == NULL unless mem_cgroup_disabled() is true indeed. However, maybe > it's better to check if sc->memcg is not NULL directly instead? > > This would be more resilient in case the shrinker code changes and > passing sc->memcg == NULL becomes possible in other cases (e.g. during > global shrinking). It seems like other shrinkers do this, for example > see count_shadow_nodes() and deferred_split_count(). Eh, I'm not sure it's better or worse, so it's a bit hard to care. We shouldn't get NULL here when memcg is enabled, and if somebody introduces that bug it's better to catch it early than run into subtle priority inversions when the kernel is deployed to millions of hosts. > I am also wondering if we should also check !mem_cgroup_is_root() > here? We can avoid the expensive global flush and use the global stats > directly in this case. I could also send a follow up patch for this if > that's preferred. I'd rather not proliferate more memcg internals in this code. If this is a concern, optimizing it in the flush and stat functions would make more sense. Reclaim already flushes the subtree before getting here, so odds are good this is a no-op in most cases. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-19 14:22 ` Johannes Weiner @ 2024-04-19 19:10 ` Yosry Ahmed 0 siblings, 0 replies; 17+ messages in thread From: Yosry Ahmed @ 2024-04-19 19:10 UTC (permalink / raw) To: Johannes Weiner Cc: Christian Heusel, Chengming Zhou, Nhat Pham, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions On Fri, Apr 19, 2024 at 7:22 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Thu, Apr 18, 2024 at 01:09:22PM -0700, Yosry Ahmed wrote: > > On Thu, Apr 18, 2024 at 5:40 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > > > > > On Wed, Apr 17, 2024 at 07:18:14PM +0200, Christian Heusel wrote: > > > > On 24/04/17 10:33AM, Johannes Weiner wrote: > > > > > > > > > > Christian, can you please test the below patch on top of current > > > > > upstream? > > > > > > > > > > > > > Hey Johannes, > > > > > > > > I have applied your patch on top of 6.9-rc4 and it did solve the crash for > > > > me, thanks for hacking together a fix so quickly! 🤗 > > > > > > > > Tested-By: Christian Heusel <christian@heusel.eu> > > > > > > Thanks for confirming it, and sorry about the breakage. > > > > > > Andrew, can you please use the updated changelog below? > > > > > > --- > > > > > > From 52f67f5fab6a743c2aedfc8e04a582a9d1025c28 Mon Sep 17 00:00:00 2001 > > > From: Johannes Weiner <hannes@cmpxchg.org> > > > Date: Thu, 18 Apr 2024 08:26:28 -0400 > > > Subject: [PATCH] mm: zswap: fix shrinker NULL crash with cgroup_disable=memory > > > > > > Christian reports a NULL deref in zswap that he bisected down to the > > > zswap shrinker. The issue also cropped up in the bug trackers of > > > libguestfs [1] and the Red Hat bugzilla [2]. > > > > > > The problem is that when memcg is disabled with the boot time flag, > > > the zswap shrinker might get called with sc->memcg == NULL. This is > > > okay in many places, like the lruvec operations. But it crashes in > > > memcg_page_state() - which is only used due to the non-node accounting > > > of cgroup's the zswap memory to begin with. > > > > > > Nhat spotted that the memcg can be NULL in the memcg-disabled case, > > > and I was then able to reproduce the crash locally as well. > > > > > > [1] https://github.com/libguestfs/libguestfs/issues/139 > > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2275252 > > > > > > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > > > Cc: stable@vger.kernel.org [v6.8] > > > Link: https://lkml.kernel.org/r/20240417143324.GA1055428@cmpxchg.org > > > Reported-by: Christian Heusel <christian@heusel.eu> > > > Debugged-by: Nhat Pham <nphamcs@gmail.com> > > > Suggested-by: Nhat Pham <nphamcs@gmail.com> > > > Tested-By: Christian Heusel <christian@heusel.eu> > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > > > Thanks for fixing this. A couple of comments/questions below, but anyway LGTM: > > > > Acked-by: Yosry Ahmed <yosryahmed@google.com> > > > > > --- > > > mm/zswap.c | 25 ++++++++++++++++--------- > > > 1 file changed, 16 insertions(+), 9 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index caed028945b0..6f8850c44b61 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -1331,15 +1331,22 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker, > > > if (!gfp_has_io_fs(sc->gfp_mask)) > > > return 0; > > > > > > -#ifdef CONFIG_MEMCG_KMEM > > > - mem_cgroup_flush_stats(memcg); > > > - nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT; > > > - nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED); > > > -#else > > > - /* use pool stats instead of memcg stats */ > > > - nr_backing = zswap_pool_total_size >> PAGE_SHIFT; > > > - nr_stored = atomic_read(&zswap_nr_stored); > > > -#endif > > > + /* > > > + * For memcg, use the cgroup-wide ZSWAP stats since we don't > > > + * have them per-node and thus per-lruvec. Careful if memcg is > > > + * runtime-disabled: we can get sc->memcg == NULL, which is ok > > > + * for the lruvec, but not for memcg_page_state(). > > > + * > > > + * Without memcg, use the zswap pool-wide metrics. > > > + */ > > > + if (!mem_cgroup_disabled()) { > > > > With the current shrinker code, it seems like we cannot get sc->memcg > > == NULL unless mem_cgroup_disabled() is true indeed. However, maybe > > it's better to check if sc->memcg is not NULL directly instead? > > > > This would be more resilient in case the shrinker code changes and > > passing sc->memcg == NULL becomes possible in other cases (e.g. during > > global shrinking). It seems like other shrinkers do this, for example > > see count_shadow_nodes() and deferred_split_count(). > > Eh, I'm not sure it's better or worse, so it's a bit hard to care. We > shouldn't get NULL here when memcg is enabled, and if somebody > introduces that bug it's better to catch it early than run into subtle > priority inversions when the kernel is deployed to millions of hosts. No strong opinion here, I just thought consistency with other shrinkers would be nice. > > > I am also wondering if we should also check !mem_cgroup_is_root() > > here? We can avoid the expensive global flush and use the global stats > > directly in this case. I could also send a follow up patch for this if > > that's preferred. > > I'd rather not proliferate more memcg internals in this code. If this > is a concern, optimizing it in the flush and stat functions would make > more sense. Reclaim already flushes the subtree before getting here, > so odds are good this is a no-op in most cases. I agree that with per-memcg thresholding, it is very unlikely that this flush does anything. FWIW, if it turns out to be a problem, optimizing it in the flush functions would not be as straightforward, because the memcg code is not aware that zswap has up-to-date global versions of the stats. Anyway, let's table this until someone actually complains, just seemed to me like a low hanging fruit. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [REGRESSION] Null pointer dereference while shrinking zswap 2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel 2024-04-16 19:18 ` Andrew Morton 2024-04-16 22:14 ` Nhat Pham @ 2024-04-17 0:33 ` Nhat Pham 2 siblings, 0 replies; 17+ messages in thread From: Nhat Pham @ 2024-04-17 0:33 UTC (permalink / raw) To: Christian Heusel Cc: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, linux-mm, linux-kernel, David Runge, Richard W.M. Jones, Mark W, regressions On Tue, Apr 16, 2024 at 5:19 AM Christian Heusel <christian@heusel.eu> wrote: > > Hello everyone, > > while rebuilding a few packages in Arch Linux we have recently come > across a regression in the linux kernel which was made visible by a test > failure in libguestfs[0], where the booted kernel showed a Call Trace > like the following one: > > [ 218.738568] CPU: 0 PID: 167 Comm: guestfsd Not tainted 6.7.0-rc4-1-mainline-00158-gb5ba474f3f51 #1 bf39861cf50acae7a79c534e25532f28afe4e593^M > [ 218.739007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.3-1-1 04/01/2014^M > [ 218.739787] RIP: 0010:memcg_page_state+0x9/0x30^M > [ 218.740299] Code: 0d b8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 <48> 8b 87 00 06 00 00 48 63 f6 31 d2 48 8b 04 f0 48 85 c0 48 0f 48^M > [ 218.740727] RSP: 0018:ffffb5fa808dfc10 EFLAGS: 00000202^M > [ 218.740862] RAX: 0000000000000000 RBX: ffffb5fa808dfce0 RCX: 0000000000000002^M > [ 218.741016] RDX: 0000000000000001 RSI: 0000000000000033 RDI: 0000000000000000^M > [ 218.741168] RBP: 0000000000000000 R08: ffff976681ff8000 R09: 0000000000000000^M > [ 218.741322] R10: 0000000000000001 R11: ffff9766833f9d00 R12: ffff9766ffffe780^M > [ 218.742167] R13: 0000000000000000 R14: ffff976680cc1800 R15: ffff976682204d80^M > [ 218.742376] FS: 00007f1479d9f540(0000) GS:ffff9766fbc00000(0000) knlGS:0000000000000000^M > [ 218.742569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M > [ 218.743256] CR2: 0000000000000600 CR3: 0000000103606000 CR4: 0000000000750ef0^M > [ 218.743494] PKRU: 55555554^M > [ 218.743593] Call Trace:^M > [ 218.743733] <TASK>^M > [ 218.743847] ? __die+0x23/0x70^M > [ 218.743957] ? page_fault_oops+0x171/0x4e0^M > [ 218.744056] ? free_unref_page+0xf6/0x180^M > [ 218.744458] ? exc_page_fault+0x7f/0x180^M > [ 218.744551] ? asm_exc_page_fault+0x26/0x30^M > [ 218.744684] ? memcg_page_state+0x9/0x30^M > [ 218.744779] zswap_shrinker_count+0x9d/0x110^M > [ 218.744896] do_shrink_slab+0x3a/0x360^M > [ 218.744990] shrink_slab+0xc7/0x3c0^M > [ 218.745609] drop_slab+0x85/0x140^M > [ 218.745691] drop_caches_sysctl_handler+0x7e/0xd0^M > [ 218.745799] proc_sys_call_handler+0x1c0/0x2e0^M > [ 218.745912] vfs_write+0x23d/0x400^M > [ 218.745998] ksys_write+0x6f/0xf0^M > [ 218.746080] do_syscall_64+0x64/0xe0^M > [ 218.746169] ? exit_to_user_mode_prepare+0x132/0x1f0^M > [ 218.746873] entry_SYSCALL_64_after_hwframe+0x6e/0x76^M > > The regression is present in the mainline kernel and also was > independently reported to the redhat bugtracker[1]. > > I have bisected (see log[2]) the regression between v6.9-rc4 and v6.6 > and have landed on the following results (removed unrelated test commit) > as remainders since some of the commits were not buildable for me: > - 7108cc3f765c ("mm: memcg: add per-memcg zswap writeback stat") > - a65b0e7607cc ("zswap: make shrinking memcg-aware") > - b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") > > I have decided on good/bad commits with the relevant libguestfs tests, > but I think the reproducer in the redhat bugzilla is simpler (although I > only became aware of it during the bisection and therefore didn't test > it myself): > > LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 make -C /build/libguestfs/src/libguestfs-1.52.0/tests -k check TESTS=c-api/tests > I have a suspect for the bug: https://lore.kernel.org/all/CAKEwX=MWPUf1NMGdn+1AkRdOUf25ifAbPyoP9zppPTx3U3Tv2Q@mail.gmail.com/ Feel free to fact check me, but let me see if I can reproduce this bug on my own setting based on this analysis and send a fix accordingly :) ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-04-19 19:10 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-16 12:19 [REGRESSION] Null pointer dereference while shrinking zswap Christian Heusel 2024-04-16 19:18 ` Andrew Morton 2024-04-16 19:57 ` Christian Heusel 2024-04-16 22:14 ` Nhat Pham 2024-04-16 22:29 ` Christian Heusel 2024-04-16 23:29 ` Nhat Pham 2024-04-17 0:22 ` Nhat Pham 2024-04-17 3:44 ` Chengming Zhou 2024-04-17 14:33 ` Johannes Weiner 2024-04-17 15:08 ` Richard W.M. Jones 2024-04-17 17:18 ` Christian Heusel 2024-04-18 12:40 ` Johannes Weiner 2024-04-18 14:25 ` Linux regression tracking (Thorsten Leemhuis) 2024-04-18 20:09 ` Yosry Ahmed 2024-04-19 14:22 ` Johannes Weiner 2024-04-19 19:10 ` Yosry Ahmed 2024-04-17 0:33 ` Nhat Pham
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox