From: Andrew Morton <akpm@linux-foundation.org>
To: liuye <liuye@kylinos.cn>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/vmscan: Fix hard LOCKUP in function isolate_lru_folios
Date: Wed, 14 Aug 2024 14:27:43 -0700
Message-ID: <20240814142743.c8227d72be4c5fd9777a4717@linux-foundation.org>
In-Reply-To: <20240814091825.27262-1-liuye@kylinos.cn>
On Wed, 14 Aug 2024 17:18:25 +0800 liuye <liuye@kylinos.cn> wrote:
> This fixes the following hard lockup in isolate_lru_folios() during
> memory reclaim. If the LRU mostly contains ineligible folios, the
> scan may trigger the watchdog.
>
> watchdog: Watchdog detected hard LOCKUP on cpu 173
> RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
> Call Trace:
> _raw_spin_lock_irqsave+0x31/0x40
> folio_lruvec_lock_irqsave+0x5f/0x90
> folio_batch_move_lru+0x91/0x150
> lru_add_drain_per_cpu+0x1c/0x40
> process_one_work+0x17d/0x350
> worker_thread+0x27b/0x3a0
> kthread+0xe8/0x120
> ret_from_fork+0x34/0x50
> ret_from_fork_asm+0x1b/0x30
>
> lruvec->lru_lock owner:
>
> PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0"
> #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
> #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
> #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
> #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
> #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
> [exception RIP: isolate_lru_folios+403]
> RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002
> RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88
> RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048
> RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008
> R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> <NMI exception stack>
> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
> #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
> #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
> #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
> #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
> crash>
Well that's bad.
> Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Merged in 2016.
Can you please describe how to reproduce this? Under what circumstances
does it occur? Why do you think it took eight years to be discovered?
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1655,6 +1655,7 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
> unsigned long skipped = 0;
> unsigned long scan, total_scan, nr_pages;
> + unsigned long max_nr_skipped = 0;
> LIST_HEAD(folios_skipped);
>
> total_scan = 0;
> @@ -1669,10 +1670,12 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> nr_pages = folio_nr_pages(folio);
> total_scan += nr_pages;
>
> - if (folio_zonenum(folio) > sc->reclaim_idx ||
> - skip_cma(folio, sc)) {
> + /* Use max_nr_skipped to prevent a hard LOCKUP */
> + if ((max_nr_skipped < SWAP_CLUSTER_MAX_SKIPPED) &&
> + (folio_zonenum(folio) > sc->reclaim_idx || skip_cma(folio, sc))) {
> nr_skipped[folio_zonenum(folio)] += nr_pages;
> move_to = &folios_skipped;
> + max_nr_skipped++;
> goto move;
> }
It looks like that will fix it, but perhaps something more fundamental
needs to be done - we're doing a tremendous amount of pretty pointless
work here. Answers to my questions above will help us resolve this.
Thanks.
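
For readers following the thread, the effect of the cap in the quoted hunk
can be sketched in userspace C. This is only a toy model, not kernel code:
sim_folio, sim_isolate() and the skip_cap parameter are hypothetical names
invented for illustration, and skip_cap merely stands in for
SWAP_CLUSTER_MAX_SKIPPED, whose actual value is not shown in the quoted
hunk. The model assumes, as the hunk suggests, that skipped folios do not
count toward the scan budget, so an LRU full of ineligible folios is walked
end to end in a single pass unless skipping is bounded.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the zone index matters here. */
struct sim_folio {
	int zone;
};

/*
 * Toy model of the scan loop. Returns how many list entries are visited
 * in one call, i.e. how much work would be done while the LRU lock is held.
 *
 * Entries whose zone is above reclaim_idx are "ineligible". While fewer
 * than skip_cap entries have been skipped, an ineligible entry is skipped
 * without consuming the scan budget. Once the cap is reached, every
 * further entry consumes budget, so the loop ends after at most
 * skip_cap + nr_to_scan visits. Passing ULONG_MAX as skip_cap models the
 * loop without the cap.
 */
static unsigned long sim_isolate(const struct sim_folio *lru, size_t n,
				 unsigned long nr_to_scan, int reclaim_idx,
				 unsigned long skip_cap)
{
	unsigned long scan = 0, skipped = 0, visited = 0;
	size_t i = 0;

	while (scan < nr_to_scan && i < n) {
		const struct sim_folio *folio = &lru[i++];

		visited++;
		if (skipped < skip_cap && folio->zone > reclaim_idx) {
			skipped++;	/* not charged to the scan budget */
			continue;
		}
		scan++;			/* charged to the scan budget */
	}
	return visited;
}
```

With 100000 entries all in zone 2, reclaim_idx = 1 and nr_to_scan = 32, the
uncapped call (skip_cap = ULONG_MAX) visits all 100000 entries, while
skip_cap = 1024 stops after 1056. The model says nothing about what should
happen to ineligible folios once the cap is hit, which is part of the
"something more fundamental" question raised above.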