From: Usama Arif <usamaarif642@gmail.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Waiman Long <longman@redhat.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Chengming Zhou <zhouchengming@bytedance.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Muchun Song <muchun.song@linux.dev>,
Nhat Pham <nphamcs@gmail.com>,
Yosry Ahmed <yosryahmed@google.com>
Subject: Re: [PATCH v2 5/6] mm/list_lru: split the lock to per-cgroup scope
Date: Mon, 28 Oct 2024 13:22:25 +0000
Message-ID: <6da1b9a9-dc77-48f2-8ab2-7b672e6c11ad@gmail.com>
In-Reply-To: <CAMgjq7D_OA=vYf5SnNnKXjppPFhDqsbYF--6=cOayKiadxuwrQ@mail.gmail.com>
On 27/10/2024 17:26, Kairui Song wrote:
> Hi Usama,
>
>>
>> Hi Kairui,
>>
>> I was testing zswap writeback in mm-unstable, and I think this patch might be breaking things.
>>
>> I have added the panic below
>>
>> [ 130.051024] ------------[ cut here ]------------
>> [ 130.051489] kernel BUG at mm/list_lru.c:321!
>> [ 130.051732] Oops: invalid opcode: 0000 [#1] SMP
>> [ 130.052133] CPU: 1 UID: 0 PID: 4976 Comm: cc1 Not tainted 6.12.0-rc1-00084-g278bd01cdaf1 #276
>> [ 130.052595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.el9 04/01/2014
>> [ 130.053276] RIP: 0010:__list_lru_walk_one+0x1ae/0x1b0
>> [ 130.053983] Code: 7c 24 78 00 74 03 fb eb 00 48 89 d8 48 83 c4 40 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 c6 07 00 eb e8 41 c6 07 00 fb eb e1 0f 0b <0f> 0b 0f 1f 44 00 00 6a 01 e8 44 fe ff ff 48 83 c4 08 c3 66 2e 0f
>> [ 130.055557] RSP: 0000:ffffc90004a2b9a0 EFLAGS: 00010246
>> [ 130.056084] RAX: ffff88805dedf6e8 RBX: 0000000000000071 RCX: 0000000000000005
>> [ 130.057407] RDX: 0000000000000000 RSI: 0000000000000022 RDI: ffff888008a26400
>> [ 130.057794] RBP: ffff88805dedf6d0 R08: 0000000000000402 R09: 0000000000000001
>> [ 130.058579] R10: ffffc90004a2b7e8 R11: 0000000000000000 R12: ffffffff81342930
>> [ 130.058962] R13: ffff888017532ca0 R14: ffffc90004a2bae8 R15: ffff8880175322c8
>> [ 130.059773] FS: 00007ff3f1e21f00(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
>> [ 130.060242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 130.060563] CR2: 00007f428e2e2ed8 CR3: 0000000067db6001 CR4: 0000000000770ef0
>> [ 130.060952] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 130.061658] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 130.062425] PKRU: 55555554
>> [ 130.062578] Call Trace:
>> [ 130.062720] <TASK>
>> [ 130.062941] ? __die_body+0x66/0xb0
>> [ 130.063145] ? die+0x88/0xb0
>> [ 130.063309] ? do_trap+0x9d/0x170
>> [ 130.063499] ? __list_lru_walk_one+0x1ae/0x1b0
>> [ 130.063745] ? __list_lru_walk_one+0x1ae/0x1b0
>> [ 130.063995] ? handle_invalid_op+0x65/0x80
>> [ 130.064223] ? __list_lru_walk_one+0x1ae/0x1b0
>> [ 130.064467] ? exc_invalid_op+0x2f/0x40
>> [ 130.064681] ? asm_exc_invalid_op+0x16/0x20
>> [ 130.064912] ? zswap_shrinker_count+0x1c0/0x1c0
>> [ 130.065172] ? __list_lru_walk_one+0x1ae/0x1b0
>> [ 130.065417] list_lru_walk_one+0xc/0x20
>> [ 130.065630] zswap_shrinker_scan+0x4b/0x80
>> [ 130.065856] do_shrink_slab+0x15f/0x2f0
>> [ 130.066075] shrink_slab+0x2bf/0x3d0
>> [ 130.066276] shrink_node+0x4f0/0x8a0
>> [ 130.066477] do_try_to_free_pages+0x131/0x4d0
>> [ 130.066717] try_to_free_mem_cgroup_pages+0x143/0x220
>> [ 130.067000] try_charge_memcg+0x22a/0x610
>> [ 130.067224] __mem_cgroup_charge+0x74/0x100
>> [ 130.068060] do_pte_missing+0xaa8/0x1020
>> [ 130.068280] handle_mm_fault+0x75d/0x1120
>> [ 130.068502] do_user_addr_fault+0x1c2/0x6f0
>> [ 130.068802] exc_page_fault+0x4f/0xb0
>> [ 130.069014] asm_exc_page_fault+0x22/0x30
>> [ 130.069240] RIP: 0033:0x7ff3f19ede49
>> [ 130.069441] Code: c9 62 e1 7f 29 7f 00 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa f3 aa 48 89 d0 c3 48 3b 15 c9 a3 06 00 77 e7 <62> e1 fe 28 7f 07 62 e1 fe 28 7f 47 01 48 81 fa 80 00 00 00 76 89
>> [ 130.070477] RSP: 002b:00007ffc5c818078 EFLAGS: 00010283
>> [ 130.070830] RAX: 00007ff3efac9000 RBX: 00007ff3f02d1940 RCX: 0000000000000001
>> [ 130.071522] RDX: 00000000000005a8 RSI: 0000000000000000 RDI: 00007ff3efac9000
>> [ 130.072146] RBP: 00007ffc5c8180c0 R08: 0000000003007320 R09: 0000000000000007
>> [ 130.072594] R10: 0000000003007320 R11: 0000000000000012 R12: 00007ff3f1f0e000
>> [ 130.072981] R13: 000000007ffa1e74 R14: 00000000000005a8 R15: 00000000000000b5
>> [ 130.073369] </TASK>
>> [ 130.073496] Modules linked in:
>> [ 130.073701] ---[ end trace 0000000000000000 ]---
>> [ 130.073960] RIP: 0010:__list_lru_walk_one+0x1ae/0x1b0
>> [ 130.074319] Code: 7c 24 78 00 74 03 fb eb 00 48 89 d8 48 83 c4 40 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 c6 07 00 eb e8 41 c6 07 00 fb eb e1 0f 0b <0f> 0b 0f 1f 44 00 00 6a 01 e8 44 fe ff ff 48 83 c4 08 c3 66 2e 0f
>> [ 130.075564] RSP: 0000:ffffc90004a2b9a0 EFLAGS: 00010246
>> [ 130.075897] RAX: ffff88805dedf6e8 RBX: 0000000000000071 RCX: 0000000000000005
>> [ 130.076342] RDX: 0000000000000000 RSI: 0000000000000022 RDI: ffff888008a26400
>> [ 130.076739] RBP: ffff88805dedf6d0 R08: 0000000000000402 R09: 0000000000000001
>> [ 130.077192] R10: ffffc90004a2b7e8 R11: 0000000000000000 R12: ffffffff81342930
>> [ 130.077739] R13: ffff888017532ca0 R14: ffffc90004a2bae8 R15: ffff8880175322c8
>> [ 130.078149] FS: 00007ff3f1e21f00(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
>> [ 130.078764] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 130.079095] CR2: 00007f428e2e2ed8 CR3: 0000000067db6001 CR4: 0000000000770ef0
>> [ 130.079521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 130.080009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 130.080402] PKRU: 55555554
>> [ 130.080713] Kernel panic - not syncing: Fatal exception
>> [ 130.081198] Kernel Offset: disabled
>> [ 130.081396] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Thanks,
>> Usama
>>
>
> Thanks for the report. I converted the list_lru_walk callback to keep the
> list unlocked when LRU_RETRY or LRU_REMOVED_RETRY is returned, but
> didn't notice that shrink_memcg_cb in zswap.c could return LRU_STOP after
> it unlocked the list.
>
> The fix should be simple. Is it easy to reproduce? Can you help verify?
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 79c2d21504a2..1a3caf4c4e14 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -298,9 +298,9 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  		ret = isolate(item, l, cb_arg);
>  		switch (ret) {
>  		/*
> -		 * LRU_RETRY and LRU_REMOVED_RETRY will drop the lru lock,
> -		 * the list traversal will be invalid and have to restart from
> -		 * scratch.
> +		 * LRU_RETRY, LRU_REMOVED_RETRY and LRU_STOP will drop the lru
> +		 * lock, the list traversal will be invalid and have to restart
> +		 * from scratch.
>  		 */
>  		case LRU_RETRY:
>  			goto restart;
> @@ -318,14 +318,13 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  		case LRU_SKIP:
>  			break;
>  		case LRU_STOP:
> -			assert_spin_locked(&l->lock);
>  			goto out;
>  		default:
>  			BUG();
>  		}
>  	}
> -out:
>  	unlock_list_lru(l, irq_off);
> +out:
>  	return isolated;
>  }

Hi Kairui,

With this fix there are no more crashes. Thanks for the quick fix.

FYI, to test it, enable zswap and the zswap shrinker
(echo Y > /sys/module/zswap/parameters/shrinker_enabled)
and build the kernel in a memory-constrained environment
(memory.max set to 1G).

Thanks,
Usama
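
P.S. For anyone following the thread, here is a minimal user-space sketch of the locking contract the fix relies on. It is not the kernel code: the toy_* names, the bool standing in for the spinlock, and the two callbacks are all made up for illustration. It only models the two cases discussed above, a callback that returns LRU_SKIP with the lock still held, and a callback that drops the lock and then returns LRU_STOP, as shrink_memcg_cb is described as doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Subset of the walk return codes discussed in this thread. */
enum lru_status { LRU_SKIP, LRU_STOP };

/* Hypothetical stand-in for a list_lru list; "locked" models the spinlock. */
struct toy_list {
	bool locked;
	int nr_items;
};

static void toy_lock(struct toy_list *l)
{
	assert(!l->locked);	/* double lock would be a bug */
	l->locked = true;
}

static void toy_unlock(struct toy_list *l)
{
	assert(l->locked);	/* unlocking an unlocked list would be a bug */
	l->locked = false;
}

typedef enum lru_status (*toy_isolate_cb)(struct toy_list *l, int item);

/* Callback that leaves the lock held and asks the walker to move on. */
static enum lru_status toy_skip_cb(struct toy_list *l, int item)
{
	(void)item;
	assert(l->locked);
	return LRU_SKIP;
}

/* Callback that drops the lock itself and then asks the walker to stop. */
static enum lru_status toy_stop_cb(struct toy_list *l, int item)
{
	(void)item;
	toy_unlock(l);
	return LRU_STOP;
}

/* Walker with the post-fix shape: the LRU_STOP path skips the unlock. */
static int toy_walk(struct toy_list *l, toy_isolate_cb isolate)
{
	int visited = 0;

	toy_lock(l);
	for (int item = 0; item < l->nr_items; item++) {
		visited++;
		switch (isolate(l, item)) {
		case LRU_SKIP:
			break;		/* lock is still held, keep walking */
		case LRU_STOP:
			goto out;	/* callback already dropped the lock */
		default:
			abort();
		}
	}
	toy_unlock(l);
out:
	return visited;
}
```

Calling toy_walk() with toy_skip_cb visits every item and unlocks once at the end. With toy_stop_cb it stops after the first item and never touches the lock again. If the out: label sat above toy_unlock(), as in the pre-fix layout, the LRU_STOP path would unlock a list the callback had already unlocked.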