* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2024-12-24 2:52 [PATCH v3] memcg: fix soft lockup in the OOM process Chen Ridong
@ 2024-12-24 23:06 ` David Rientjes
2025-01-03 16:18 ` Michal Koutný
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: David Rientjes @ 2024-12-24 23:06 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, handai.szj, kamezawa.hiroyu,
linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On Tue, 24 Dec 2024, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A soft lockup was found in a production environment with about 56,000
> tasks in the OOM cgroup; the lockup was triggered while traversing them.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup and it takes
> a long time to traverse all of them, which leads to a soft lockup in the
> OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function once every 1024 iterations. For global OOM, call
> 'touch_softlockup_watchdog' once every 1024 iterations instead to avoid
> the same problem.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
Looks fine to me, although we do a lot of process traversals for oom
kill selection as well and that has never popped up as a significant
concern. We have cases far beyond 56k processes. No objection to the
approach, however.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2024-12-24 2:52 [PATCH v3] memcg: fix soft lockup in the OOM process Chen Ridong
2024-12-24 23:06 ` David Rientjes
@ 2025-01-03 16:18 ` Michal Koutný
2025-01-04 0:52 ` Chen Ridong
2025-01-06 8:45 ` Vlastimil Babka
2025-01-06 9:29 ` Michal Hocko
3 siblings, 1 reply; 15+ messages in thread
From: Michal Koutný @ 2025-01-03 16:18 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, handai.szj, rientjes,
kamezawa.hiroyu, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
[-- Attachment #1: Type: text/plain, Size: 984 bytes --]
Hello.
On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> A soft lockup was found in a production environment with about 56,000
> tasks in the OOM cgroup; the lockup was triggered while traversing them.
Why is this softlockup a problem?
It's a lot of tasks after all, and possibly a slow console (given that
looking for a victim among a comparable number didn't trigger it).
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function once every 1024 iterations. For global OOM, call
> 'touch_softlockup_watchdog' once every 1024 iterations instead to avoid
> the same problem.
This only hides the issue. It could be similarly fixed by simply
decreasing loglevel= ;-)
cond_resched() in the memcg case may be OK, but the arbitrary touch for
the global situation may hide possibly useful troubleshooting information.
(Yeah, cond_resched() won't fit inside an RCU section, as in other global
task iterations.)
0.02€,
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-03 16:18 ` Michal Koutný
@ 2025-01-04 0:52 ` Chen Ridong
0 siblings, 0 replies; 15+ messages in thread
From: Chen Ridong @ 2025-01-04 0:52 UTC (permalink / raw)
To: Michal Koutný
Cc: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, handai.szj, rientjes,
kamezawa.hiroyu, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On 2025/1/4 0:18, Michal Koutný wrote:
> Hello.
>
> On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> A soft lockup was found in a production environment with about 56,000
>> tasks in the OOM cgroup; the lockup was triggered while traversing them.
>
> Why is this softlockup a problem?
> It's a lot of tasks after all, and possibly a slow console (given that
> looking for a victim among a comparable number didn't trigger it).
>
It's not a slow console, but rather console pressure. When many tasks
write to the console concurrently, 'pr_info' becomes slow. In my case,
these tasks all write to the console. I reproduced this issue using a
test kernel module that creates many tasks, all of which just call
'pr_info'.
Best regards,
Ridong
>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function once every 1024 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' once every 1024 iterations instead to avoid
>> the same problem.
>
> This only hides the issue. It could be similarly fixed by simply
> decreasing loglevel= ;-)
>
> cond_resched() in the memcg case may be OK but the arbitrary touch for
> global situation may hide possibly useful troubleshooting information.
> (Yeah, cond_resched() won't fit inside RCU section as in other global
> task iterations.)
>
> 0.02€,
> Michal
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2024-12-24 2:52 [PATCH v3] memcg: fix soft lockup in the OOM process Chen Ridong
2024-12-24 23:06 ` David Rientjes
2025-01-03 16:18 ` Michal Koutný
@ 2025-01-06 8:45 ` Vlastimil Babka
2025-01-13 6:51 ` Chen Ridong
2025-01-06 9:29 ` Michal Hocko
3 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2025-01-06 8:45 UTC (permalink / raw)
To: Chen Ridong, akpm, mhocko, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, handai.szj, rientjes,
kamezawa.hiroyu, RCU
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 12/24/24 03:52, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
+CC RCU
> A soft lockup was found in a production environment with about 56,000
> tasks in the OOM cgroup; the lockup was triggered while traversing them.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup and it takes
> a long time to traverse all of them, which leads to a soft lockup in the
> OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function once every 1024 iterations. For global OOM, call
> 'touch_softlockup_watchdog' once every 1024 iterations instead to avoid
> the same problem.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/memcontrol.c | 7 ++++++-
> mm/oom_kill.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 65fb5eee1466..46f8b372d212 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
> {
> struct mem_cgroup *iter;
> int ret = 0;
> + int i = 0;
>
> BUG_ON(mem_cgroup_is_root(memcg));
>
> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
> struct task_struct *task;
>
> css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
> - while (!ret && (task = css_task_iter_next(&it)))
> + while (!ret && (task = css_task_iter_next(&it))) {
> + /* Avoid potential softlockup warning */
> + if ((++i & 1023) == 0)
> + cond_resched();
> ret = fn(task, arg);
> + }
> css_task_iter_end(&it);
> if (ret) {
> mem_cgroup_iter_break(memcg, iter);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1c485beb0b93..044ebab2c941 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,6 +44,7 @@
> #include <linux/init.h>
> #include <linux/mmu_notifier.h>
> #include <linux/cred.h>
> +#include <linux/nmi.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> struct task_struct *p;
> + int i = 0;
>
> rcu_read_lock();
> - for_each_process(p)
> + for_each_process(p) {
> + /* Avoid potential softlockup warning */
> + if ((++i & 1023) == 0)
> + touch_softlockup_watchdog();
This might suppress the soft lockup, but won't an RCU stall still be detected?
> dump_task(p, oc);
> + }
> rcu_read_unlock();
> }
> }
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-06 8:45 ` Vlastimil Babka
@ 2025-01-13 6:51 ` Chen Ridong
2025-01-14 3:45 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Chen Ridong @ 2025-01-13 6:51 UTC (permalink / raw)
To: Vlastimil Babka, akpm, mhocko, hannes, yosryahmed,
roman.gushchin, shakeel.butt, muchun.song, davidf, handai.szj,
rientjes, kamezawa.hiroyu, RCU
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 2025/1/6 16:45, Vlastimil Babka wrote:
> On 12/24/24 03:52, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>
> +CC RCU
>
>> A soft lockup was found in a production environment with about 56,000
>> tasks in the OOM cgroup; the lockup was triggered while traversing them.
>>
>> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
>> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
>> Hardware name: Huawei Cloud OpenStack Nova, BIOS
>> RIP: 0010:console_unlock+0x343/0x540
>> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
>> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
>> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
>> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
>> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> vprintk_emit+0x193/0x280
>> printk+0x52/0x6e
>> dump_task+0x114/0x130
>> mem_cgroup_scan_tasks+0x76/0x100
>> dump_header+0x1fe/0x210
>> oom_kill_process+0xd1/0x100
>> out_of_memory+0x125/0x570
>> mem_cgroup_out_of_memory+0xb5/0xd0
>> try_charge+0x720/0x770
>> mem_cgroup_try_charge+0x86/0x180
>> mem_cgroup_try_charge_delay+0x1c/0x40
>> do_anonymous_page+0xb5/0x390
>> handle_mm_fault+0xc4/0x1f0
>>
>> This is because thousands of processes are in the OOM cgroup and it takes
>> a long time to traverse all of them, which leads to a soft lockup in the
>> OOM process.
>>
>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function once every 1024 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' once every 1024 iterations instead to avoid
>> the same problem.
>>
>> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> mm/memcontrol.c | 7 ++++++-
>> mm/oom_kill.c | 8 +++++++-
>> 2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 65fb5eee1466..46f8b372d212 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>> {
>> struct mem_cgroup *iter;
>> int ret = 0;
>> + int i = 0;
>>
>> BUG_ON(mem_cgroup_is_root(memcg));
>>
>> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>> struct task_struct *task;
>>
>> css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
>> - while (!ret && (task = css_task_iter_next(&it)))
>> + while (!ret && (task = css_task_iter_next(&it))) {
>> + /* Avoid potential softlockup warning */
>> + if ((++i & 1023) == 0)
>> + cond_resched();
>> ret = fn(task, arg);
>> + }
>> css_task_iter_end(&it);
>> if (ret) {
>> mem_cgroup_iter_break(memcg, iter);
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 1c485beb0b93..044ebab2c941 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,6 +44,7 @@
>> #include <linux/init.h>
>> #include <linux/mmu_notifier.h>
>> #include <linux/cred.h>
>> +#include <linux/nmi.h>
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> else {
>> struct task_struct *p;
>> + int i = 0;
>>
>> rcu_read_lock();
>> - for_each_process(p)
>> + for_each_process(p) {
>> + /* Avoid potential softlockup warning */
>> + if ((++i & 1023) == 0)
>> + touch_softlockup_watchdog();
>
> This might suppress the soft lockup, but won't a rcu stall still be detected?
Yes, an RCU stall was still detected.
For global OOM, the system is likely to be struggling; do we have to do
some work to suppress the RCU stall detector?
Best regards,
Ridong
>
>> dump_task(p, oc);
>> + }
>> rcu_read_unlock();
>> }
>> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-13 6:51 ` Chen Ridong
@ 2025-01-14 3:45 ` Andrew Morton
2025-01-14 8:40 ` Michal Hocko
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2025-01-14 3:45 UTC (permalink / raw)
To: Chen Ridong
Cc: Vlastimil Babka, mhocko, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, handai.szj, rientjes,
kamezawa.hiroyu, RCU, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
> On 2025/1/6 16:45, Vlastimil Babka wrote:
> > On 12/24/24 03:52, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> >
> > +CC RCU
> >
> >> A soft lockup was found in a production environment with about 56,000
> >> tasks in the OOM cgroup; the lockup was triggered while traversing them.
> >>
> >>
>
> ...
>
> >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >> else {
> >> struct task_struct *p;
> >> + int i = 0;
> >>
> >> rcu_read_lock();
> >> - for_each_process(p)
> >> + for_each_process(p) {
> >> + /* Avoid potential softlockup warning */
> >> + if ((++i & 1023) == 0)
> >> + touch_softlockup_watchdog();
> >
> > This might suppress the soft lockup, but won't a rcu stall still be detected?
>
> Yes, an RCU stall was still detected.
> For global OOM, the system is likely to be struggling; do we have to do
> some work to suppress the RCU stall detector?
rcu_cpu_stall_reset()?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 3:45 ` Andrew Morton
@ 2025-01-14 8:40 ` Michal Hocko
2025-01-14 9:20 ` Vlastimil Babka
0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2025-01-14 8:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Chen Ridong, Vlastimil Babka, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, handai.szj, rientjes,
kamezawa.hiroyu, RCU, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>
> >
> >
> > On 2025/1/6 16:45, Vlastimil Babka wrote:
> > > On 12/24/24 03:52, Chen Ridong wrote:
> > >> From: Chen Ridong <chenridong@huawei.com>
> > >
> > > +CC RCU
> > >
> > >> A soft lockup was found in a production environment with about 56,000
> > >> tasks in the OOM cgroup; the lockup was triggered while traversing them.
> > >>
> > >>
> >
> > ...
> >
> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> > >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > >> else {
> > >> struct task_struct *p;
> > >> + int i = 0;
> > >>
> > >> rcu_read_lock();
> > >> - for_each_process(p)
> > >> + for_each_process(p) {
> > >> + /* Avoid potential softlockup warning */
> > >> + if ((++i & 1023) == 0)
> > >> + touch_softlockup_watchdog();
> > >
> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
> >
> > Yes, an RCU stall was still detected.
> > For global OOM, the system is likely to be struggling; do we have to do
> > some work to suppress the RCU stall detector?
>
> rcu_cpu_stall_reset()?
Do we really care about those? The code to iterate over all processes
under RCU has been there (basically) forever, and yet we do not seem to
have many reports of stalls. Chen's situation is specific to memcg OOM,
and touching the global case was mostly for consistency reasons.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 8:40 ` Michal Hocko
@ 2025-01-14 9:20 ` Vlastimil Babka
2025-01-14 9:30 ` Michal Hocko
2025-01-14 12:13 ` Chen Ridong
0 siblings, 2 replies; 15+ messages in thread
From: Vlastimil Babka @ 2025-01-14 9:20 UTC (permalink / raw)
To: Michal Hocko, Andrew Morton
Cc: Chen Ridong, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, handai.szj, rientjes, kamezawa.hiroyu, RCU,
linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 1/14/25 09:40, Michal Hocko wrote:
> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>> > >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> > >> else {
>> > >> struct task_struct *p;
>> > >> + int i = 0;
>> > >>
>> > >> rcu_read_lock();
>> > >> - for_each_process(p)
>> > >> + for_each_process(p) {
>> > >> + /* Avoid potential softlockup warning */
>> > >> + if ((++i & 1023) == 0)
>> > >> + touch_softlockup_watchdog();
>> > >
>> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
>> >
>> > Yes, an RCU stall was still detected.
"was" or "would be"? I thought only the memcg case was observed, or was that
some deliberate stress test of the global case? (or the pr_info() console
stress test mentioned earlier, but created outside of the oom code?)
>> > For global OOM, the system is likely to be struggling; do we have to do
>> > some work to suppress the RCU stall detector?
>>
>> rcu_cpu_stall_reset()?
>
> Do we really care about those? The code to iterate over all processes
> under RCU is there (basically) since ever and yet we do not seem to have
> many reports of stalls? Chen's situation is specific to memcg OOM and
> touching the global case was mostly for consistency reasons.
Then I'd rather not touch the global case if it's theoretical? It's not
even exactly consistent, given it's a cond_resched() in the memcg code
(which can eventually be removed automatically once/if lazy preempt becomes
the sole implementation), while the touch_softlockup_watchdog() would
remain, doing only half of the job?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 9:20 ` Vlastimil Babka
@ 2025-01-14 9:30 ` Michal Hocko
2025-01-14 12:19 ` Chen Ridong
2025-01-14 12:13 ` Chen Ridong
1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2025-01-14 9:30 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Chen Ridong, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, handai.szj, rientjes,
kamezawa.hiroyu, RCU, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
> > On Mon 13-01-25 19:45:46, Andrew Morton wrote:
[...]
> >> > For global OOM, the system is likely to be struggling; do we have to do
> >> > some work to suppress the RCU stall detector?
> >>
> >> rcu_cpu_stall_reset()?
> >
> > Do we really care about those? The code to iterate over all processes
> > under RCU is there (basically) since ever and yet we do not seem to have
> > many reports of stalls? Chen's situation is specific to memcg OOM and
> > touching the global case was mostly for consistency reasons.
>
> Then I'd rather not touch the global case if it's theoretical?
No strong opinion on this on my side. The only actual reason
touch_softlockup_watchdog is there is because the patch originally had an
incorrect cond_resched there. If half-silencing (the soft lockup detector
only) disturbs people, then let's just drop that hunk.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 9:30 ` Michal Hocko
@ 2025-01-14 12:19 ` Chen Ridong
0 siblings, 0 replies; 15+ messages in thread
From: Chen Ridong @ 2025-01-14 12:19 UTC (permalink / raw)
To: Michal Hocko, Vlastimil Babka
Cc: Andrew Morton, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, handai.szj, rientjes, kamezawa.hiroyu, RCU,
linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 2025/1/14 17:30, Michal Hocko wrote:
> On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
>> On 1/14/25 09:40, Michal Hocko wrote:
>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> [...]
>>>>> For global OOM, the system is likely to be struggling; do we have to do
>>>>> some work to suppress the RCU stall detector?
>>>>
>>>> rcu_cpu_stall_reset()?
>>>
>>> Do we really care about those? The code to iterate over all processes
>>> under RCU is there (basically) since ever and yet we do not seem to have
>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>> touching the global case was mostly for consistency reasons.
>>
>> Then I'd rather not touch the global case if it's theoretical?
>
> No strong opinion on this on my side. The only actual reason
> touch_softlockup_watchdog is there is because the patch originally had an
> incorrect cond_resched there. If half-silencing (the soft lockup detector
> only) disturbs people, then let's just drop that hunk.
I have no strong opinion either. If there are no other opinions, I will
drop it.
Best regards,
Ridong
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 9:20 ` Vlastimil Babka
2025-01-14 9:30 ` Michal Hocko
@ 2025-01-14 12:13 ` Chen Ridong
2025-01-14 18:42 ` Paul E. McKenney
1 sibling, 1 reply; 15+ messages in thread
From: Chen Ridong @ 2025-01-14 12:13 UTC (permalink / raw)
To: Vlastimil Babka, Michal Hocko, Andrew Morton
Cc: hannes, yosryahmed, roman.gushchin, shakeel.butt, muchun.song,
davidf, handai.szj, rientjes, kamezawa.hiroyu, RCU, linux-mm,
linux-kernel, cgroups, chenridong, wangweiyang2
On 2025/1/14 17:20, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>
>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>> else {
>>>>>> struct task_struct *p;
>>>>>> + int i = 0;
>>>>>>
>>>>>> rcu_read_lock();
>>>>>> - for_each_process(p)
>>>>>> + for_each_process(p) {
>>>>>> + /* Avoid potential softlockup warning */
>>>>>> + if ((++i & 1023) == 0)
>>>>>> + touch_softlockup_watchdog();
>>>>>
>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>
> >>>> Yes, an RCU stall was still detected.
>
> "was" or "would be"? I thought only the memcg case was observed, or was that
> some deliberate stress test of the global case? (or the pr_info() console
> stress test mentioned earlier, but created outside of the oom code?)
>
It's not easy to reproduce for global OOM, because the pr_info() console
stress test can also lead to other softlockups or RCU warnings (not
caused by the OOM process) while the whole system is struggling. However,
if I add mdelay(1) in the dump_task() function (just to slow down
dump_task, assuming it is slowed by pr_info()) and trigger a global OOM,
RCU warnings can be observed.

I think this verifies that global OOM can trigger RCU warnings in
specific scenarios.
>>>> For global OOM, the system is likely to be struggling; do we have to do
>>>> some work to suppress the RCU stall detector?
>>>
>>> rcu_cpu_stall_reset()?
>>
>> Do we really care about those? The code to iterate over all processes
>> under RCU is there (basically) since ever and yet we do not seem to have
>> many reports of stalls? Chen's situation is specific to memcg OOM and
>> touching the global case was mostly for consistency reasons.
>
> Then I'd rather not touch the global case if it's theoretical? It's not
> even exactly consistent, given it's a cond_resched() in the memcg code (that
> can be eventually automatically removed once/if lazy preempt becomes the
> sole implementation), but the touch_softlockup_watchdog() would remain,
> while doing only half of the job?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 12:13 ` Chen Ridong
@ 2025-01-14 18:42 ` Paul E. McKenney
2025-01-17 6:59 ` chenridong
0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2025-01-14 18:42 UTC (permalink / raw)
To: Chen Ridong
Cc: Vlastimil Babka, Michal Hocko, Andrew Morton, hannes, yosryahmed,
roman.gushchin, shakeel.butt, muchun.song, davidf, handai.szj,
rientjes, kamezawa.hiroyu, RCU, linux-mm, linux-kernel, cgroups,
chenridong, wangweiyang2
On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
>
>
> On 2025/1/14 17:20, Vlastimil Babka wrote:
> > On 1/14/25 09:40, Michal Hocko wrote:
> >> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> >>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>>
> >>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >>>>>> else {
> >>>>>> struct task_struct *p;
> >>>>>> + int i = 0;
> >>>>>>
> >>>>>> rcu_read_lock();
> >>>>>> - for_each_process(p)
> >>>>>> + for_each_process(p) {
> >>>>>> + /* Avoid potential softlockup warning */
> >>>>>> + if ((++i & 1023) == 0)
> >>>>>> + touch_softlockup_watchdog();
> >>>>>
> >>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
> >>>>
> >>>> Yes, an RCU stall was still detected.
> >
> > "was" or "would be"? I thought only the memcg case was observed, or was that
> > some deliberate stress test of the global case? (or the pr_info() console
> > stress test mentioned earlier, but created outside of the oom code?)
> >
>
> It's not easy to reproduce for global OOM, because the pr_info() console
> stress test can also lead to other softlockups or RCU warnings (not
> caused by the OOM process) while the whole system is struggling. However,
> if I add mdelay(1) in the dump_task() function (just to slow down
> dump_task, assuming it is slowed by pr_info()) and trigger a global OOM,
> RCU warnings can be observed.
>
> I think this verifies that global OOM can trigger RCU warnings in
> specific scenarios.
We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
kernel boot parameter that causes RCU CPU stall warnings to suppress
most of the output when there is an ongoing CSD-lock stall.
Would it make sense to do something similar when the system is in OOM,
give or take the traditional difficulty of determining exactly when OOM
starts and ends?
1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")
Thanx, Paul
> >>>> For global OOM, the system is likely to be struggling; do we have to do
> >>>> some work to suppress the RCU stall detector?
> >>>
> >>> rcu_cpu_stall_reset()?
> >>
> >> Do we really care about those? The code to iterate over all processes
> >> under RCU is there (basically) since ever and yet we do not seem to have
> >> many reports of stalls? Chen's situation is specific to memcg OOM and
> >> touching the global case was mostly for consistency reasons.
> >
> > Then I'd rather not touch the global case if it's theoretical? It's not
> > even exactly consistent, given it's a cond_resched() in the memcg code (that
> > can be eventually automatically removed once/if lazy preempt becomes the
> > sole implementation), but the touch_softlockup_watchdog() would remain,
> > while doing only half of the job?
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2025-01-14 18:42 ` Paul E. McKenney
@ 2025-01-17 6:59 ` chenridong
0 siblings, 0 replies; 15+ messages in thread
From: chenridong @ 2025-01-17 6:59 UTC (permalink / raw)
To: paulmck, Chen Ridong
Cc: Vlastimil Babka, Michal Hocko, Andrew Morton, hannes, yosryahmed,
roman.gushchin, shakeel.butt, muchun.song, davidf, handai.szj,
rientjes, kamezawa.hiroyu, RCU, linux-mm, linux-kernel, cgroups,
wangweiyang2
On 2025/1/15 2:42, Paul E. McKenney wrote:
> On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
>>
>>
>> On 2025/1/14 17:20, Vlastimil Babka wrote:
>>> On 1/14/25 09:40, Michal Hocko wrote:
>>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>>
>>>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>>>> else {
>>>>>>>> struct task_struct *p;
>>>>>>>> + int i = 0;
>>>>>>>>
>>>>>>>> rcu_read_lock();
>>>>>>>> - for_each_process(p)
>>>>>>>> + for_each_process(p) {
>>>>>>>> + /* Avoid potential softlockup warning */
>>>>>>>> + if ((++i & 1023) == 0)
>>>>>>>> + touch_softlockup_watchdog();
>>>>>>>
> >>>>>>> This might suppress the soft lockup, but won't an RCU stall still be detected?
>>>>>>
> >>>>>> Yes, an RCU stall was still detected.
>>>
>>> "was" or "would be"? I thought only the memcg case was observed, or was that
>>> some deliberate stress test of the global case? (or the pr_info() console
>>> stress test mentioned earlier, but created outside of the oom code?)
>>>
>>
>> It's not easy to reproduce for global OOM, because the pr_info() console
>> stress test can also lead to other softlockups or RCU warnings (not
>> caused by the OOM process) while the whole system is struggling. However,
>> if I add mdelay(1) in the dump_task() function (just to slow down
>> dump_task, assuming this is slowed by pr_info()) and trigger a global
>> OOM, RCU warnings can be observed.
>>
>> I think this verifies that global OOM can trigger RCU warnings in
>> specific scenarios.
>
> We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
> kernel boot parameter that causes RCU CPU stall warnings to suppress
> most of the output when there is an ongoing CSD-lock stall.
>
> Would it make sense to do something similar when the system is in OOM,
> give or take the traditional difficulty of determining exactly when OOM
> starts and ends?
>
> 1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")
>
> Thanx, Paul
>
I prefer to just drop it.
Unlike memcg OOM, global OOM rarely happens. Although the test above
'verified' that the RCU warning can be observed, we haven't encountered
it in practice. Besides, other RCU warnings may also be observed during
global OOM, and it's difficult to circumvent all of the warnings.
Best regards,
Ridong
> >>>>>> For global OOM, the system is likely to struggle; do we have to do some
> >>>>>> work to suppress the RCU stall detector?
>>>>>
>>>>> rcu_cpu_stall_reset()?
>>>>
>>>> Do we really care about those? The code to iterate over all processes
>>>> under RCU is there (basically) since ever and yet we do not seem to have
>>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>>> touching the global case was mostly for consistency reasons.
>>>
> >>> Then I'd rather not touch the global case if it's theoretical? It's not
> >>> even exactly consistent, given it's a cond_resched() in the memcg code (that
> >>> can be eventually automatically removed once/if lazy preempt becomes the
> >>> sole implementation), but the touch_softlockup_watchdog() would remain,
> >>> while doing only half of the job?
>>
>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] memcg: fix soft lockup in the OOM process
2024-12-24 2:52 [PATCH v3] memcg: fix soft lockup in the OOM process Chen Ridong
` (2 preceding siblings ...)
2025-01-06 8:45 ` Vlastimil Babka
@ 2025-01-06 9:29 ` Michal Hocko
3 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2025-01-06 9:29 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, handai.szj, rientjes,
kamezawa.hiroyu, linux-mm, linux-kernel, cgroups, chenridong,
wangweiyang2
On Tue 24-12-24 02:52:38, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A soft lockup issue was found in the product: about 56,000 tasks were
> in the OOM cgroup, and they were being traversed when the soft lockup
> was triggered.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup, and it takes
> a long time to traverse all of them. As a result, this leads to a soft
> lockup in the OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function once per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' once per 1000 iterations to avoid this issue.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
LGTM, I would really not overthink this too much. PREEMPT_NONE and soft
lockups will hopefully soon become a non-issue.
Acked-by: Michal Hocko <mhocko@suse.com>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 15+ messages in thread