* [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
@ 2025-02-12 2:57 Chen Ridong
2025-02-12 3:24 ` Chen Ridong
2025-02-12 8:57 ` Michal Hocko
0 siblings, 2 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12 2:57 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, mkoutny, paulmck
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
From: Chen Ridong <chenridong@huawei.com>
Unlike memcg OOM, which is relatively common, global OOM events are rare
and typically indicate that the entire system is under severe memory
pressure. Commit ade81479c7dd ("memcg: fix soft lockup in the OOM
process") added a touch_softlockup_watchdog() call to the global OOM
handler to suppress soft lockup warnings. However, while this change
suppresses soft lockup warnings, it does not address RCU stalls, which can
still be detected and may cause unnecessary disturbances. Simply remove the
modification from the global OOM handler.
Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/oom_kill.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..2d8b27604ef8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,7 +44,6 @@
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
-#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
-		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				touch_softlockup_watchdog();
+		for_each_process(p)
 			dump_task(p, oc);
-		}
 		rcu_read_unlock();
 	}
 }
--
2.34.1
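
For readers unfamiliar with the idiom in the removed hunk, here is a minimal
userspace sketch of the `(++i & 1023) == 0` rate limiting. It is not kernel
code: fake_touch_watchdog() and walk_tasks() are hypothetical names invented
for this illustration, and the stand-in only counts calls instead of
resetting any watchdog.

```c
/* Hypothetical stand-in for touch_softlockup_watchdog(); it only counts calls. */
static unsigned long touches;

static void fake_touch_watchdog(void)
{
	touches++;
}

/*
 * Walk nr_tasks simulated tasks the way the removed loop did:
 * pre-increment the counter and touch the watchdog whenever the
 * low 10 bits are all zero, i.e. on every 1024th iteration.
 * Returns how many times the (fake) watchdog was touched.
 */
static unsigned long walk_tasks(unsigned long nr_tasks)
{
	unsigned long n;
	int i = 0;

	touches = 0;
	for (n = 0; n < nr_tasks; n++) {
		if ((++i & 1023) == 0)
			fake_touch_watchdog();
		/* dump_task(p, oc) would run here in the kernel */
	}
	return touches;
}
```

Calling walk_tasks(1023) returns 0 and walk_tasks(1024) returns 1, so the
reset fires once per 1024 dumped tasks. The mask keeps the per-iteration cost
to an increment and a bitwise AND. As the commit message notes, this only
quiets the soft lockup detector; the sketch does not model RCU stall
detection, which such a loop leaves untouched.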
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 2:57 [PATCH] mm/oom_kill: revert watchdog reset in global OOM process Chen Ridong
@ 2025-02-12 3:24 ` Chen Ridong
2025-02-12 8:57 ` Michal Hocko
1 sibling, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12 3:24 UTC (permalink / raw)
To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, mkoutny, paulmck
Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2
On 2025/2/12 10:57, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
>
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/oom_kill.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
> #include <linux/init.h>
> #include <linux/mmu_notifier.h>
> #include <linux/cred.h>
> -#include <linux/nmi.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> struct task_struct *p;
> - int i = 0;
>
> rcu_read_lock();
> - for_each_process(p) {
> - /* Avoid potential softlockup warning */
> - if ((++i & 1023) == 0)
> - touch_softlockup_watchdog();
> + for_each_process(p)
> dump_task(p, oc);
> - }
> rcu_read_unlock();
> }
> }
Add discussion link:
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 2:57 [PATCH] mm/oom_kill: revert watchdog reset in global OOM process Chen Ridong
2025-02-12 3:24 ` Chen Ridong
@ 2025-02-12 8:57 ` Michal Hocko
2025-02-12 9:19 ` Chen Ridong
1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2025-02-12 8:57 UTC (permalink / raw)
To: Chen Ridong
Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, mkoutny, paulmck, linux-mm,
linux-kernel, cgroups, chenridong, wangweiyang2
On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
>
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
But this is not really fixing anything, is it? While this doesn't
address a potential RCU stall it doesn't address any actual problem.
So why do we want to do this?
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/oom_kill.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
> #include <linux/init.h>
> #include <linux/mmu_notifier.h>
> #include <linux/cred.h>
> -#include <linux/nmi.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> struct task_struct *p;
> - int i = 0;
>
> rcu_read_lock();
> - for_each_process(p) {
> - /* Avoid potential softlockup warning */
> - if ((++i & 1023) == 0)
> - touch_softlockup_watchdog();
> + for_each_process(p)
> dump_task(p, oc);
> - }
> rcu_read_unlock();
> }
> }
> --
> 2.34.1
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 8:57 ` Michal Hocko
@ 2025-02-12 9:19 ` Chen Ridong
2025-02-12 9:34 ` Vlastimil Babka
0 siblings, 1 reply; 7+ messages in thread
From: Chen Ridong @ 2025-02-12 9:19 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, vbabka, mkoutny, paulmck, linux-mm,
linux-kernel, cgroups, chenridong, wangweiyang2
On 2025/2/12 16:57, Michal Hocko wrote:
> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>> and typically indicate that the entire system is under severe memory
>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>> process") added the touch_softlockup_watchdog in the global OOM handler to
>> suppress the soft lockup issues. However, while this change can suppress
>> soft lockup warnings, it does not address RCU stalls, which can still be
>> detected and may cause unnecessary disturbances. Simply remove the
>> modification from the global OOM handler.
>>
>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>
> But this is not really fixing anything, is it? While this doesn't
> address a potential RCU stall it doesn't address any actual problem.
> So why do we want to do this?
>
[1]
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
As previously discussed, the work I have done on the global OOM is 'half
of the job'. Based on our discussions, I thought that it would be best
to abandon this approach for global OOM. Therefore, I am sending this
patch to revert the changes.
Or just leave it?
Best regards,
Ridong
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> mm/oom_kill.c | 8 +-------
>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 25923cfec9c6..2d8b27604ef8 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,7 +44,6 @@
>> #include <linux/init.h>
>> #include <linux/mmu_notifier.h>
>> #include <linux/cred.h>
>> -#include <linux/nmi.h>
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> else {
>> struct task_struct *p;
>> - int i = 0;
>>
>> rcu_read_lock();
>> - for_each_process(p) {
>> - /* Avoid potential softlockup warning */
>> - if ((++i & 1023) == 0)
>> - touch_softlockup_watchdog();
>> + for_each_process(p)
>> dump_task(p, oc);
>> - }
>> rcu_read_unlock();
>> }
>> }
>> --
>> 2.34.1
>
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 9:19 ` Chen Ridong
@ 2025-02-12 9:34 ` Vlastimil Babka
2025-02-12 9:52 ` Chen Ridong
2025-02-12 11:58 ` Michal Hocko
0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2025-02-12 9:34 UTC (permalink / raw)
To: Chen Ridong, Michal Hocko
Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, mkoutny, paulmck, linux-mm, linux-kernel,
cgroups, chenridong, wangweiyang2
On 2/12/25 10:19, Chen Ridong wrote:
>
>
> On 2025/2/12 16:57, Michal Hocko wrote:
>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>> and typically indicate that the entire system is under severe memory
>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>> suppress the soft lockup issues. However, while this change can suppress
>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>> detected and may cause unnecessary disturbances. Simply remove the
>>> modification from the global OOM handler.
>>>
>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>
>> But this is not really fixing anything, is it? While this doesn't
>> address a potential RCU stall it doesn't address any actual problem.
>> So why do we want to do this?
>>
>
>
> [1]
> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>
> As previously discussed, the work I have done on the global OOM is 'half
> of the job'. Based on our discussions, I thought that it would be best
> to abandon this approach for global OOM. Therefore, I am sending this
> patch to revert the changes.
>
> Or just leave it?
I suggested that part doesn't need to be in the patch, but if it was merged
with it, we can just leave it there. Thanks.
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 9:34 ` Vlastimil Babka
@ 2025-02-12 9:52 ` Chen Ridong
2025-02-12 11:58 ` Michal Hocko
1 sibling, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12 9:52 UTC (permalink / raw)
To: Vlastimil Babka, Michal Hocko
Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
muchun.song, davidf, mkoutny, paulmck, linux-mm, linux-kernel,
cgroups, chenridong, wangweiyang2
On 2025/2/12 17:34, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
>>
>>
>> On 2025/2/12 16:57, Michal Hocko wrote:
>>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>>> and typically indicate that the entire system is under severe memory
>>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>>> suppress the soft lockup issues. However, while this change can suppress
>>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>>> detected and may cause unnecessary disturbances. Simply remove the
>>>> modification from the global OOM handler.
>>>>
>>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>>
>>> But this is not really fixing anything, is it? While this doesn't
>>> address a potential RCU stall it doesn't address any actual problem.
>>> So why do we want to do this?
>>>
>>
>>
>> [1]
>> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>>
>> As previously discussed, the work I have done on the global OOM is 'half
>> of the job'. Based on our discussions, I thought that it would be best
>> to abandon this approach for global OOM. Therefore, I am sending this
>> patch to revert the changes.
>>
>> Or just leave it?
>
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.
I see. Thank you very much.
Best regards,
Ridong
* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
2025-02-12 9:34 ` Vlastimil Babka
2025-02-12 9:52 ` Chen Ridong
@ 2025-02-12 11:58 ` Michal Hocko
1 sibling, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2025-02-12 11:58 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Chen Ridong, akpm, hannes, yosryahmed, roman.gushchin,
shakeel.butt, muchun.song, davidf, mkoutny, paulmck, linux-mm,
linux-kernel, cgroups, chenridong, wangweiyang2
On Wed 12-02-25 10:34:06, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
> >
> >
> > On 2025/2/12 16:57, Michal Hocko wrote:
> >> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> >>> From: Chen Ridong <chenridong@huawei.com>
> >>>
> >>> Unlike memcg OOM, which is relatively common, global OOM events are rare
> >>> and typically indicate that the entire system is under severe memory
> >>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> >>> process") added the touch_softlockup_watchdog in the global OOM handler to
> >>> suppress the soft lockup issues. However, while this change can suppress
> >>> soft lockup warnings, it does not address RCU stalls, which can still be
> >>> detected and may cause unnecessary disturbances. Simply remove the
> >>> modification from the global OOM handler.
> >>>
> >>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> >>
> >> But this is not really fixing anything, is it? While this doesn't
> >> address a potential RCU stall it doesn't address any actual problem.
> >> So why do we want to do this?
> >>
> >
> >
> > [1]
> > https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> >
> > As previously discussed, the work I have done on the global OOM is 'half
> > of the job'. Based on our discussions, I thought that it would be best
> > to abandon this approach for global OOM. Therefore, I am sending this
> > patch to revert the changes.
> >
> > Or just leave it?
>
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.
Agreed!
--
Michal Hocko
SUSE Labs