From: Chris Li <chrisl@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
Kairui Song <ryncsn@gmail.com>
Subject: Re: [PATCHv2] zram: free secondary algorithms names
Date: Tue, 24 Sep 2024 08:36:54 -0700 [thread overview]
Message-ID: <CAF8kJuPq+JwvwrG+H+XU23Xz+Do+qy8RPJBBVRsL0YGQzZpfrQ@mail.gmail.com> (raw)
In-Reply-To: <20240917013021.868769-1-senozhatsky@chromium.org>
[-- Attachment #1: Type: text/plain, Size: 10275 bytes --]
Hi Sergey,
The current mm-unstable is breaking my swap stress test again. While there
seems to be multiple bad commits that cause it. I have bisected into this
commit causing kernel warning and followed by BUG().
[ 56.630032] zswap: loaded using pool lzo/zsmalloc
[ 56.718027] zram0: detected capacity change from 16777216 to 0
[ 56.725492] zram: Removed device: zram0
[ 56.740125] ------------[ cut here ]------------
[ 56.744616] WARNING: CPU: 2 PID: 1894 at mm/slub.c:4556
free_large_kmalloc+0x4d/0x80
[ 56.745119] Modules linked in:
[ 56.749551] CPU: 2 UID: 0 PID: 1894 Comm: zram-generator Tainted: G S
6.11.0-rc6+ #33
[ 56.750129] Tainted: [S]=CPU_OUT_OF_SPEC
[ 56.750908] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 09/21/2023
[ 56.751354] RIP: 0010:free_large_kmalloc+0x4d/0x80
[ 56.756120] Code: 00 10 00 00 48 d3 e0 f7 d8 81 e2 c0 00 00 00 75 2f 89
c6 48 89 df e8 82 ff ff ff f0 ff 4b 34 0f 85 e
9 7d f5 00 e9 eb 7d f5 00 <0f> 0b 80 3d a8 f3 9b 02 00 0f 84 bd 7d f5 00 b8
00 f0 ff ff eb d1
[ 56.761370] RSP: 0018:ffffaeaaa3657b20 EFLAGS: 00010246
[ 56.761676] RAX: 0057ffffc0002000 RBX: ffffece0c1f40e80 RCX:
000000008040003f
[ 56.766293] RDX: ffffece0c1f40e88 RSI: ffffffff9a03a131 RDI:
ffffece0c1f40e80
[ 56.770931] RBP: 0000000000200000 R08: ffff95571d256480 R09:
000000008040003f
[ 56.775540] R10: 000000008040003f R11: 000000000000032c R12:
0000000000200000
[ 56.780212] R13: ffff953787c71e40 R14: 0000000000000047 R15:
ffff95379b2e3e20
[ 56.784943] FS: 00007fb0f1d58bc0(0000) GS:ffff95567ed00000(0000)
knlGS:0000000000000000
[ 56.785403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.789937] CR2: 00007f35b6449050 CR3: 00000001112ac006 CR4:
00000000001706f0
[ 56.794784] Call Trace:
[ 56.794941] <TASK>
[ 56.799377] ? free_large_kmalloc+0x4d/0x80
[ 56.799598] ? __warn.cold+0x8e/0xe8
[ 56.799842] ? free_large_kmalloc+0x4d/0x80
[ 56.800065] ? report_bug+0xff/0x140
[ 56.800296] ? handle_bug+0x3c/0x80
[ 56.804703] ? exc_invalid_op+0x17/0x70
[ 56.804912] ? asm_exc_invalid_op+0x1a/0x20
[ 56.805132] ? free_large_kmalloc+0x4d/0x80
[ 56.805344] zram_destroy_comps+0x32/0x70
[ 56.805568] zram_reset_device+0x102/0x190
[ 56.805812] reset_store+0xa6/0x110
[ 56.810207] kernfs_fop_write_iter+0x141/0x1f0
[ 56.814689] vfs_write+0x294/0x460
[ 56.819106] ksys_write+0x6d/0xf0
[ 56.823550] do_syscall_64+0x82/0x160
[ 56.823827] ? __pfx_kfree_link+0x10/0x10
[ 56.824051] ? do_sys_openat2+0x9c/0xe0
[ 56.824263] ? __handle_mm_fault+0xb34/0xfb0
[ 56.828752] ? syscall_exit_to_user_mode+0x10/0x220
[ 56.833220] ? do_syscall_64+0x8e/0x160
[ 56.833429] ? __count_memcg_events+0x77/0x130
[ 56.838021] ? count_memcg_events.constprop.0+0x1a/0x30
[ 56.838318] ? handle_mm_fault+0x1bb/0x2c0
[ 56.838542] ? do_user_addr_fault+0x55a/0x7b0
[ 56.843014] ? exc_page_fault+0x7e/0x180
[ 56.843228] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 56.843831] RIP: 0033:0x7fb0f1f7a984
[ 56.844045] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00
00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 7
4 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5
48 83 ec 20 48 89
[ 56.849247] RSP: 002b:00007ffc7db8fde8 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[ 56.853889] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007fb0f1f7a984
[ 56.858482] RDX: 0000000000000001 RSI: 0000560df4e4ea65 RDI:
0000000000000004
[ 56.863154] RBP: 0000000000000004 R08: 0000560e0e417010 R09:
0000000000000007
[ 56.867794] R10: 00000000000001b6 R11: 0000000000000202 R12:
7fffffffffffffff
[ 56.872980] R13: 00007fb0f1f7a970 R14: 0000560df4e4ea65 R15:
0000560df4e71bd0
[ 56.878043] </TASK>
[ 56.878555] ---[ end trace 0000000000000000 ]---
[ 56.883420] object pointer: 0x00000000f38e5ae7
[ 56.888235] BUG: Bad page state in process zram-generator pfn:407d03a
[ 56.889026] page: refcount:0 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x407d03a
[ 56.889877] flags:
0x57ffffc0002000(reserved|node=1|zone=2|lastcpupid=0x1fffff)
[ 56.894915] raw: 0057ffffc0002000 ffffece0c1f40e88 ffffece0c1f40e88
0000000000000000
[ 56.895771] raw: 0000000000000000 0000000000000000 00000000ffffffff
0000000000000000
[ 56.896562] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 56.897332] Modules linked in:
[ 56.902165] CPU: 2 UID: 0 PID: 1894 Comm: zram-generator Tainted: G S
W 6.11.0-rc6+ #33
[ 56.903155] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 56.908082] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 09/21/2023
[ 56.908918] Call Trace:
[ 56.909484] <TASK>
[ 56.914148] dump_stack_lvl+0x5d/0x80
[ 56.914747] bad_page.cold+0x7a/0x91
[ 56.915318] free_unref_page+0x344/0x520
[ 56.915975] zram_destroy_comps+0x32/0x70
[ 56.916452] zram_reset_device+0x102/0x190
[ 56.917057] reset_store+0xa6/0x110
[ 56.921874] kernfs_fop_write_iter+0x141/0x1f0
[ 56.926685] vfs_write+0x294/0x460
[ 56.931385] ksys_write+0x6d/0xf0
[ 56.936087] do_syscall_64+0x82/0x160
[ 56.936656] ? __pfx_kfree_link+0x10/0x10
[ 56.937257] ? do_sys_openat2+0x9c/0xe0
[ 56.937810] ? __handle_mm_fault+0xb34/0xfb0
[ 56.942593] ? syscall_exit_to_user_mode+0x10/0x220
[ 56.947362] ? do_syscall_64+0x8e/0x160
[ 56.947974] ? __count_memcg_events+0x77/0x130
[ 56.952762] ? count_memcg_events.constprop.0+0x1a/0x30
[ 56.953356] ? handle_mm_fault+0x1bb/0x2c0
[ 56.953937] ? do_user_addr_fault+0x55a/0x7b0
[ 56.958999] ? exc_page_fault+0x7e/0x180
[ 56.959523] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 56.960163] RIP: 0033:0x7fb0f1f7a984
[ 56.960731] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00
00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 7
4 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5
48 83 ec 20 48 89
[ 56.966840] RSP: 002b:00007ffc7db8fde8 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[ 56.971903] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007fb0f1f7a984
[ 56.976953] RDX: 0000000000000001 RSI: 0000560df4e4ea65 RDI:
0000000000000004
[ 56.981946] RBP: 0000000000000004 R08: 0000560e0e417010 R09:
0000000000000007
[ 56.986980] R10: 00000000000001b6 R11: 0000000000000202 R12:
7fffffffffffffff
[ 56.991985] R13: 00007fb0f1f7a970 R14: 0000560df4e4ea65 R15:
0000560df4e71bd0
[ 56.996963] </TASK>
[ 56.997533] Disabling lock debugging due to kernel taint
[ 57.037759] zram: Added device: zram0
[ 57.088669] zram: Added device: zram1
[ 57.249105] zram0: detected capacity change from 0 to 6553600
[ 57.320547] zram1: detected capacity change from 0 to 40960000
[ 57.443012] Adding 3276796k swap on /dev/zram0. Priority:100 extents:1
across:3276796k SS
[ 57.470295] Adding 20479996k swap on /dev/zram1. Priority:0 extents:1
across:20479996k SS
Here is the bisect log:
$ git bisect log
# bad: [684826f8271ad97580b138b9ffd462005e470b99] zram: free secondary
algorithms names
# good: [2cacbdfdee65b18f9952620e762eab043d71b564] mm: swap: add a adaptive
full cluster cache reclaim
git bisect start 'mm-stable' 'HEAD'
# good: [9bfbaa5e44c52422a046ce291469c8ebeb6c475d] mm/damon: move kunit
tests to tests/ subdirectory with _kunit suffix
git bisect good 9bfbaa5e44c52422a046ce291469c8ebeb6c475d
# good: [1e673c8cf7f9c1156f615b7c00f224a8110070da] zram: add dictionary
support to lz4hc
git bisect good 1e673c8cf7f9c1156f615b7c00f224a8110070da
# good: [3c8e44c9b369b3d422516b3f2bf47a6e3c61d1ea] mm: mark special bits
for huge pfn mappings when inject
git bisect good 3c8e44c9b369b3d422516b3f2bf47a6e3c61d1ea
# good: [f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101] vfio/pci: implement
huge_fault support
git bisect good f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
# good: [659c55ef981bb63355a65ffc3b3b5cad562b806a] mm/vma: return the exact
errno in vms_gather_munmap_vmas()
git bisect good 659c55ef981bb63355a65ffc3b3b5cad562b806a
# good: [325efb16da2c840e165d9b620fec8049d4d664cc] mm: add nr argument in
mem_cgroup_swapin_uncharge_swap() helper to support large folios
git bisect good 325efb16da2c840e165d9b620fec8049d4d664cc
# good: [ed8d5b0ce1d738e13c60d6b1a901a56d832e5070] Revert "uprobes: use
vm_special_mapping close() functionality"
git bisect good ed8d5b0ce1d738e13c60d6b1a901a56d832e5070
# good: [2abbcc099ec60844ca7c15214ab12955d3c11e68] uprobes: turn
xol_area->pages[2] into xol_area->page
git bisect good 2abbcc099ec60844ca7c15214ab12955d3c11e68
# first bad commit: [684826f8271ad97580b138b9ffd462005e470b99] zram: free
secondary algorithms names
Sergey told me there is a fix on the way:
https://lore.kernel.org/all/20240923164843.1117010-1-andrej.skvortzov@gmail.com/
This commit did not really break my swap stress test, the test can pass
those kernel oops messages. It is just my bisect script that picks up the
kernel oops and determines that is a bad commit. There is another bad
commit in the current mm-unstable I need to haunt down.
Chris
On Mon, Sep 16, 2024 at 6:30 PM Sergey Senozhatsky <senozhatsky@chromium.org>
wrote:
> We need to kfree() secondary algorithms names when reset
> zram device that had multi-streams, otherwise we leak memory.
>
> Fixes: 001d92735701 ("zram: add recompression algorithm sysfs knob")
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> drivers/block/zram/zram_drv.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f8206ba6cbbb..c3d245617083 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -2115,6 +2115,11 @@ static void zram_destroy_comps(struct zram *zram)
> zram->num_active_comps--;
> }
>
> + for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
> + kfree(zram->comp_algs[prio]);
> + zram->comp_algs[prio] = NULL;
> + }
> +
> zram_comp_params_reset(zram);
> }
>
> --
> 2.46.0.662.g92d0881bb0-goog
>
>
>
[-- Attachment #2: Type: text/html, Size: 11573 bytes --]
next parent reply other threads:[~2024-09-24 15:37 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240911025600.3681789-1-senozhatsky@chromium.org>
[not found] ` <20240917013021.868769-1-senozhatsky@chromium.org>
2024-09-24 15:36 ` Chris Li [this message]
2024-09-24 15:52 ` Chris Li
2024-09-24 18:05 ` Chris Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAF8kJuPq+JwvwrG+H+XU23Xz+Do+qy8RPJBBVRsL0YGQzZpfrQ@mail.gmail.com \
--to=chrisl@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=ryncsn@gmail.com \
--cc=senozhatsky@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox