linux-mm.kvack.org archive mirror
* Re: [PATCHv2] zram: free secondary algorithms names
       [not found] ` <20240917013021.868769-1-senozhatsky@chromium.org>
@ 2024-09-24 15:36   ` Chris Li
  2024-09-24 15:52     ` Chris Li
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Li @ 2024-09-24 15:36 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Minchan Kim, linux-kernel, linux-mm, Kairui Song

Hi Sergey,

The current mm-unstable is breaking my swap stress test again. While there
seem to be multiple bad commits causing it, I have bisected to this commit,
which causes a kernel warning followed by a BUG().

[   56.630032] zswap: loaded using pool lzo/zsmalloc
[   56.718027] zram0: detected capacity change from 16777216 to 0
[   56.725492] zram: Removed device: zram0
[   56.740125] ------------[ cut here ]------------
[   56.744616] WARNING: CPU: 2 PID: 1894 at mm/slub.c:4556 free_large_kmalloc+0x4d/0x80
[   56.745119] Modules linked in:
[   56.749551] CPU: 2 UID: 0 PID: 1894 Comm: zram-generator Tainted: G S                 6.11.0-rc6+ #33
[   56.750129] Tainted: [S]=CPU_OUT_OF_SPEC
[   56.750908] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/21/2023
[   56.751354] RIP: 0010:free_large_kmalloc+0x4d/0x80
[   56.756120] Code: 00 10 00 00 48 d3 e0 f7 d8 81 e2 c0 00 00 00 75 2f 89 c6 48 89 df e8 82 ff ff ff f0 ff 4b 34 0f 85 e9 7d f5 00 e9 eb 7d f5 00 <0f> 0b 80 3d a8 f3 9b 02 00 0f 84 bd 7d f5 00 b8 00 f0 ff ff eb d1
[   56.761370] RSP: 0018:ffffaeaaa3657b20 EFLAGS: 00010246
[   56.761676] RAX: 0057ffffc0002000 RBX: ffffece0c1f40e80 RCX: 000000008040003f
[   56.766293] RDX: ffffece0c1f40e88 RSI: ffffffff9a03a131 RDI: ffffece0c1f40e80
[   56.770931] RBP: 0000000000200000 R08: ffff95571d256480 R09: 000000008040003f
[   56.775540] R10: 000000008040003f R11: 000000000000032c R12: 0000000000200000
[   56.780212] R13: ffff953787c71e40 R14: 0000000000000047 R15: ffff95379b2e3e20
[   56.784943] FS:  00007fb0f1d58bc0(0000) GS:ffff95567ed00000(0000) knlGS:0000000000000000
[   56.785403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.789937] CR2: 00007f35b6449050 CR3: 00000001112ac006 CR4: 00000000001706f0
[   56.794784] Call Trace:
[   56.794941]  <TASK>
[   56.799377]  ? free_large_kmalloc+0x4d/0x80
[   56.799598]  ? __warn.cold+0x8e/0xe8
[   56.799842]  ? free_large_kmalloc+0x4d/0x80
[   56.800065]  ? report_bug+0xff/0x140
[   56.800296]  ? handle_bug+0x3c/0x80
[   56.804703]  ? exc_invalid_op+0x17/0x70
[   56.804912]  ? asm_exc_invalid_op+0x1a/0x20
[   56.805132]  ? free_large_kmalloc+0x4d/0x80
[   56.805344]  zram_destroy_comps+0x32/0x70
[   56.805568]  zram_reset_device+0x102/0x190
[   56.805812]  reset_store+0xa6/0x110
[   56.810207]  kernfs_fop_write_iter+0x141/0x1f0
[   56.814689]  vfs_write+0x294/0x460
[   56.819106]  ksys_write+0x6d/0xf0
[   56.823550]  do_syscall_64+0x82/0x160
[   56.823827]  ? __pfx_kfree_link+0x10/0x10
[   56.824051]  ? do_sys_openat2+0x9c/0xe0
[   56.824263]  ? __handle_mm_fault+0xb34/0xfb0
[   56.828752]  ? syscall_exit_to_user_mode+0x10/0x220
[   56.833220]  ? do_syscall_64+0x8e/0x160
[   56.833429]  ? __count_memcg_events+0x77/0x130
[   56.838021]  ? count_memcg_events.constprop.0+0x1a/0x30
[   56.838318]  ? handle_mm_fault+0x1bb/0x2c0
[   56.838542]  ? do_user_addr_fault+0x55a/0x7b0
[   56.843014]  ? exc_page_fault+0x7e/0x180
[   56.843228]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   56.843831] RIP: 0033:0x7fb0f1f7a984
[   56.844045] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[   56.849247] RSP: 002b:00007ffc7db8fde8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   56.853889] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb0f1f7a984
[   56.858482] RDX: 0000000000000001 RSI: 0000560df4e4ea65 RDI: 0000000000000004
[   56.863154] RBP: 0000000000000004 R08: 0000560e0e417010 R09: 0000000000000007
[   56.867794] R10: 00000000000001b6 R11: 0000000000000202 R12: 7fffffffffffffff
[   56.872980] R13: 00007fb0f1f7a970 R14: 0000560df4e4ea65 R15: 0000560df4e71bd0
[   56.878043]  </TASK>
[   56.878555] ---[ end trace 0000000000000000 ]---
[   56.883420] object pointer: 0x00000000f38e5ae7
[   56.888235] BUG: Bad page state in process zram-generator  pfn:407d03a
[   56.889026] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x407d03a
[   56.889877] flags: 0x57ffffc0002000(reserved|node=1|zone=2|lastcpupid=0x1fffff)
[   56.894915] raw: 0057ffffc0002000 ffffece0c1f40e88 ffffece0c1f40e88 0000000000000000
[   56.895771] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   56.896562] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   56.897332] Modules linked in:
[   56.902165] CPU: 2 UID: 0 PID: 1894 Comm: zram-generator Tainted: G S      W          6.11.0-rc6+ #33
[   56.903155] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   56.908082] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/21/2023
[   56.908918] Call Trace:
[   56.909484]  <TASK>
[   56.914148]  dump_stack_lvl+0x5d/0x80
[   56.914747]  bad_page.cold+0x7a/0x91
[   56.915318]  free_unref_page+0x344/0x520
[   56.915975]  zram_destroy_comps+0x32/0x70
[   56.916452]  zram_reset_device+0x102/0x190
[   56.917057]  reset_store+0xa6/0x110
[   56.921874]  kernfs_fop_write_iter+0x141/0x1f0
[   56.926685]  vfs_write+0x294/0x460
[   56.931385]  ksys_write+0x6d/0xf0
[   56.936087]  do_syscall_64+0x82/0x160
[   56.936656]  ? __pfx_kfree_link+0x10/0x10
[   56.937257]  ? do_sys_openat2+0x9c/0xe0
[   56.937810]  ? __handle_mm_fault+0xb34/0xfb0
[   56.942593]  ? syscall_exit_to_user_mode+0x10/0x220
[   56.947362]  ? do_syscall_64+0x8e/0x160
[   56.947974]  ? __count_memcg_events+0x77/0x130
[   56.952762]  ? count_memcg_events.constprop.0+0x1a/0x30
[   56.953356]  ? handle_mm_fault+0x1bb/0x2c0
[   56.953937]  ? do_user_addr_fault+0x55a/0x7b0
[   56.958999]  ? exc_page_fault+0x7e/0x180
[   56.959523]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   56.960163] RIP: 0033:0x7fb0f1f7a984
[   56.960731] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 06 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[   56.966840] RSP: 002b:00007ffc7db8fde8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   56.971903] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb0f1f7a984
[   56.976953] RDX: 0000000000000001 RSI: 0000560df4e4ea65 RDI: 0000000000000004
[   56.981946] RBP: 0000000000000004 R08: 0000560e0e417010 R09: 0000000000000007
[   56.986980] R10: 00000000000001b6 R11: 0000000000000202 R12: 7fffffffffffffff
[   56.991985] R13: 00007fb0f1f7a970 R14: 0000560df4e4ea65 R15: 0000560df4e71bd0
[   56.996963]  </TASK>
[   56.997533] Disabling lock debugging due to kernel taint
[   57.037759] zram: Added device: zram0
[   57.088669] zram: Added device: zram1
[   57.249105] zram0: detected capacity change from 0 to 6553600
[   57.320547] zram1: detected capacity change from 0 to 40960000
[   57.443012] Adding 3276796k swap on /dev/zram0.  Priority:100 extents:1 across:3276796k SS
[   57.470295] Adding 20479996k swap on /dev/zram1.  Priority:0 extents:1 across:20479996k SS

Here is the bisect log:

$ git bisect log
# bad: [684826f8271ad97580b138b9ffd462005e470b99] zram: free secondary algorithms names
# good: [2cacbdfdee65b18f9952620e762eab043d71b564] mm: swap: add a adaptive full cluster cache reclaim
git bisect start 'mm-stable' 'HEAD'
# good: [9bfbaa5e44c52422a046ce291469c8ebeb6c475d] mm/damon: move kunit tests to tests/ subdirectory with _kunit suffix
git bisect good 9bfbaa5e44c52422a046ce291469c8ebeb6c475d
# good: [1e673c8cf7f9c1156f615b7c00f224a8110070da] zram: add dictionary support to lz4hc
git bisect good 1e673c8cf7f9c1156f615b7c00f224a8110070da
# good: [3c8e44c9b369b3d422516b3f2bf47a6e3c61d1ea] mm: mark special bits for huge pfn mappings when inject
git bisect good 3c8e44c9b369b3d422516b3f2bf47a6e3c61d1ea
# good: [f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101] vfio/pci: implement huge_fault support
git bisect good f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101
# good: [659c55ef981bb63355a65ffc3b3b5cad562b806a] mm/vma: return the exact errno in vms_gather_munmap_vmas()
git bisect good 659c55ef981bb63355a65ffc3b3b5cad562b806a
# good: [325efb16da2c840e165d9b620fec8049d4d664cc] mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
git bisect good 325efb16da2c840e165d9b620fec8049d4d664cc
# good: [ed8d5b0ce1d738e13c60d6b1a901a56d832e5070] Revert "uprobes: use vm_special_mapping close() functionality"
git bisect good ed8d5b0ce1d738e13c60d6b1a901a56d832e5070
# good: [2abbcc099ec60844ca7c15214ab12955d3c11e68] uprobes: turn xol_area->pages[2] into xol_area->page
git bisect good 2abbcc099ec60844ca7c15214ab12955d3c11e68
# first bad commit: [684826f8271ad97580b138b9ffd462005e470b99] zram: free secondary algorithms names

Sergey told me there is a fix on the way:
https://lore.kernel.org/all/20240923164843.1117010-1-andrej.skvortzov@gmail.com/

This commit did not really break my swap stress test; the test can get past
those kernel oops messages. It is just that my bisect script picks up the
kernel oops and marks the commit as bad. There is another bad commit in the
current mm-unstable that I need to hunt down.
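
A bisect script of this kind might classify a boot roughly as follows. This
is only a minimal sketch, not the script actually used here: it assumes the
test kernel's dmesg output has been saved to a file, and the function names
(log_has_oops, bisect_status) and the marker patterns are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for use with `git bisect run`.
# Assumption: the kernel log of the boot under test was saved to a file.

# Return 0 if the saved log contains a kernel WARNING or BUG line.
log_has_oops() {
    # Match the bracketed timestamp, then the marker that starts the message.
    grep -Eq '^\[ *[0-9]+\.[0-9]+\] (WARNING: |BUG: )' "$1"
}

# Map the log to a git-bisect exit status: 0 means good, 1 means bad.
bisect_status() {
    if log_has_oops "$1"; then
        return 1
    fi
    return 0
}
```

With a check like this, any commit whose boot log contains a WARNING or BUG
line is marked bad, even when the stress test itself runs to completion.
Judging on the exit status of the stress test alone would avoid that, at the
cost of no longer catching oopses the test survives.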

Chris



On Mon, Sep 16, 2024 at 6:30 PM Sergey Senozhatsky <senozhatsky@chromium.org>
wrote:

> We need to kfree() the secondary algorithm names when resetting a
> zram device that had multi-streams, otherwise we leak memory.
>
> Fixes: 001d92735701 ("zram: add recompression algorithm sysfs knob")
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  drivers/block/zram/zram_drv.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f8206ba6cbbb..c3d245617083 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -2115,6 +2115,11 @@ static void zram_destroy_comps(struct zram *zram)
>                 zram->num_active_comps--;
>         }
>
> +       for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
> +               kfree(zram->comp_algs[prio]);
> +               zram->comp_algs[prio] = NULL;
> +       }
> +
>         zram_comp_params_reset(zram);
>  }
>
> --
> 2.46.0.662.g92d0881bb0-goog
>
>
>


* Re: [PATCHv2] zram: free secondary algorithms names
  2024-09-24 15:36   ` [PATCHv2] zram: free secondary algorithms names Chris Li
@ 2024-09-24 15:52     ` Chris Li
  2024-09-24 18:05       ` Chris Li
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Li @ 2024-09-24 15:52 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Minchan Kim, linux-kernel, linux-mm, Kairui Song

Hi Sergey,

On Tue, Sep 24, 2024 at 8:36 AM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Sergey,
>
> The current mm-unstable is breaking my swap stress test again. While there seem to be multiple bad commits causing it, I have bisected to this commit, which causes a kernel warning followed by a BUG().
>
> Sergey told me there is a fix on the way:
> https://lore.kernel.org/all/20240923164843.1117010-1-andrej.skvortzov@gmail.com/

I can confirm that the fix in the above thread fixes the kernel oops for me.

Tested-by: Chris Li <chrisl@kernel.org>

Chris





* Re: [PATCHv2] zram: free secondary algorithms names
  2024-09-24 15:52     ` Chris Li
@ 2024-09-24 18:05       ` Chris Li
  0 siblings, 0 replies; 3+ messages in thread
From: Chris Li @ 2024-09-24 18:05 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Minchan Kim, linux-kernel, linux-mm, Kairui Song

On Tue, Sep 24, 2024 at 8:52 AM Chris Li <chriscli@google.com> wrote:
>
> Hi Sergey,
>
> On Tue, Sep 24, 2024 at 8:36 AM Chris Li <chrisl@kernel.org> wrote:
> >
> > Hi Sergey,
> >
> > The current mm-unstable is breaking my swap stress test again. While there seem to be multiple bad commits causing it, I have bisected to this commit, which causes a kernel warning followed by a BUG().
> >
> > Sergey told me there is a fix on the way:
> > https://lore.kernel.org/all/20240923164843.1117010-1-andrej.skvortzov@gmail.com/
>
> I can confirm that the fix in the above thread fixes the kernel oops for me.
>
> Tested-by: Chris Li <chrisl@kernel.org>

Sorry, I have to withdraw that Tested-by. It turns out that although the
initial warning and oops disappear, the swap stress test gets OOM-killed.
I should have waited for the test to complete before sending out emails.

I will report more details of the OOM kill in that email thread.

Chris




end of thread, other threads:[~2024-09-24 18:05 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240911025600.3681789-1-senozhatsky@chromium.org>
     [not found] ` <20240917013021.868769-1-senozhatsky@chromium.org>
2024-09-24 15:36   ` [PATCHv2] zram: free secondary algorithms names Chris Li
2024-09-24 15:52     ` Chris Li
2024-09-24 18:05       ` Chris Li
