linux-mm.kvack.org archive mirror
* [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
@ 2025-02-12  2:57 Chen Ridong
  2025-02-12  3:24 ` Chen Ridong
  2025-02-12  8:57 ` Michal Hocko
  0 siblings, 2 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12  2:57 UTC
  To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, vbabka, mkoutny, paulmck
  Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2

From: Chen Ridong <chenridong@huawei.com>

Unlike memcg OOM, which is relatively common, global OOM events are rare
and typically indicate that the entire system is under severe memory
pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
process") added the touch_softlockup_watchdog in the global OOM handler to
suppress the soft lockup issues. However, while this change can suppress
soft lockup warnings, it does not address RCU stalls, which can still be
detected and may cause unnecessary disturbances. Simply remove the
modification from the global OOM handler.

Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/oom_kill.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..2d8b27604ef8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,7 +44,6 @@
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
-#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
-		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				touch_softlockup_watchdog();
+		for_each_process(p)
 			dump_task(p, oc);
-		}
 		rcu_read_unlock();
 	}
 }
-- 
2.34.1
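
For readers unfamiliar with the idiom in the removed hunk, here is a minimal
userspace sketch of the "(++i & 1023) == 0" test, which fires once every 1024
iterations. It is only an illustration under stated assumptions:
fake_touch_watchdog() and count_touches() are hypothetical names that stand in
for touch_softlockup_watchdog() and the for_each_process() loop, and the
counter is unsigned here (the patch used int) to avoid signed overflow on very
long walks.

```c
#include <stddef.h>

/* Hypothetical stand-in for touch_softlockup_watchdog(); it only counts calls. */
static size_t touches;

static void fake_touch_watchdog(void)
{
	touches++;
}

/*
 * Walk nr_items entries and "touch" once every 1024 iterations, using the
 * same (++i & 1023) == 0 test as the removed hunk.
 * Returns how many times the touch fired.
 */
size_t count_touches(size_t nr_items)
{
	size_t n;
	unsigned int i = 0;

	touches = 0;
	for (n = 0; n < nr_items; n++) {
		/* The low 10 bits of i are all zero only at multiples of 1024. */
		if ((++i & 1023) == 0)
			fake_touch_watchdog();
		/* Per-item work (dump_task() in the patch) would go here. */
	}
	return touches;
}
```

Because the mask test only resets the soft lockup watchdog, a long walk inside
rcu_read_lock() can still be reported as an RCU stall, which is the limitation
the commit message describes.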




* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  2:57 [PATCH] mm/oom_kill: revert watchdog reset in global OOM process Chen Ridong
@ 2025-02-12  3:24 ` Chen Ridong
  2025-02-12  8:57 ` Michal Hocko
  1 sibling, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12  3:24 UTC
  To: akpm, mhocko, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, vbabka, mkoutny, paulmck
  Cc: linux-mm, linux-kernel, cgroups, chenridong, wangweiyang2



On 2025/2/12 10:57, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }

Adding the discussion link:
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/




* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  2:57 [PATCH] mm/oom_kill: revert watchdog reset in global OOM process Chen Ridong
  2025-02-12  3:24 ` Chen Ridong
@ 2025-02-12  8:57 ` Michal Hocko
  2025-02-12  9:19   ` Chen Ridong
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2025-02-12  8:57 UTC
  To: Chen Ridong
  Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, vbabka, mkoutny, paulmck, linux-mm,
	linux-kernel, cgroups, chenridong, wangweiyang2

On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")

But this is not really fixing anything, is it? While this doesn't
address a potential RCU stall it doesn't address any actual problem.
So why do we want to do this?

> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }
> -- 
> 2.34.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  8:57 ` Michal Hocko
@ 2025-02-12  9:19   ` Chen Ridong
  2025-02-12  9:34     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Chen Ridong @ 2025-02-12  9:19 UTC
  To: Michal Hocko
  Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, vbabka, mkoutny, paulmck, linux-mm,
	linux-kernel, cgroups, chenridong, wangweiyang2



On 2025/2/12 16:57, Michal Hocko wrote:
> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>> and typically indicate that the entire system is under severe memory
>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>> process") added the touch_softlockup_watchdog in the global OOM handler to
>> suppress the soft lockup issues. However, while this change can suppress
>> soft lockup warnings, it does not address RCU stalls, which can still be
>> detected and may cause unnecessary disturbances. Simply remove the
>> modification from the global OOM handler.
>>
>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> 
> But this is not really fixing anything, is it? While this doesn't
> address a potential RCU stall it doesn't address any actual problem.
> So why do we want to do this?
> 


[1]
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/

As previously discussed, the work I have done on the global OOM is 'half
of the job'. Based on our discussions, I thought that it would be best
to abandon this approach for global OOM. Therefore, I am sending this
patch to revert the changes.

Or just leave it?

Best regards,
Ridong

>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/oom_kill.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 25923cfec9c6..2d8b27604ef8 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,7 +44,6 @@
>>  #include <linux/init.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/cred.h>
>> -#include <linux/nmi.h>
>>  
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>  	else {
>>  		struct task_struct *p;
>> -		int i = 0;
>>  
>>  		rcu_read_lock();
>> -		for_each_process(p) {
>> -			/* Avoid potential softlockup warning */
>> -			if ((++i & 1023) == 0)
>> -				touch_softlockup_watchdog();
>> +		for_each_process(p)
>>  			dump_task(p, oc);
>> -		}
>>  		rcu_read_unlock();
>>  	}
>>  }
>> -- 
>> 2.34.1
> 




* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  9:19   ` Chen Ridong
@ 2025-02-12  9:34     ` Vlastimil Babka
  2025-02-12  9:52       ` Chen Ridong
  2025-02-12 11:58       ` Michal Hocko
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2025-02-12  9:34 UTC
  To: Chen Ridong, Michal Hocko
  Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, mkoutny, paulmck, linux-mm, linux-kernel,
	cgroups, chenridong, wangweiyang2

On 2/12/25 10:19, Chen Ridong wrote:
> 
> 
> On 2025/2/12 16:57, Michal Hocko wrote:
>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>> and typically indicate that the entire system is under severe memory
>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>> suppress the soft lockup issues. However, while this change can suppress
>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>> detected and may cause unnecessary disturbances. Simply remove the
>>> modification from the global OOM handler.
>>>
>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>> 
>> But this is not really fixing anything, is it? While this doesn't
>> address a potential RCU stall it doesn't address any actual problem.
>> So why do we want to do this?
>> 
> 
> 
> [1]
> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> 
> As previously discussed, the work I have done on the global OOM is 'half
> of the job'. Based on our discussions, I thought that it would be best
> to abandon this approach for global OOM. Therefore, I am sending this
> patch to revert the changes.
> 
> Or just leave it?

I suggested that part doesn't need to be in the patch, but if it was merged
with it, we can just leave it there. Thanks.




* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  9:34     ` Vlastimil Babka
@ 2025-02-12  9:52       ` Chen Ridong
  2025-02-12 11:58       ` Michal Hocko
  1 sibling, 0 replies; 7+ messages in thread
From: Chen Ridong @ 2025-02-12  9:52 UTC
  To: Vlastimil Babka, Michal Hocko
  Cc: akpm, hannes, yosryahmed, roman.gushchin, shakeel.butt,
	muchun.song, davidf, mkoutny, paulmck, linux-mm, linux-kernel,
	cgroups, chenridong, wangweiyang2



On 2025/2/12 17:34, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
>>
>>
>> On 2025/2/12 16:57, Michal Hocko wrote:
>>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>>> and typically indicate that the entire system is under severe memory
>>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>>> suppress the soft lockup issues. However, while this change can suppress
>>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>>> detected and may cause unnecessary disturbances. Simply remove the
>>>> modification from the global OOM handler.
>>>>
>>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>>
>>> But this is not really fixing anything, is it? While this doesn't
>>> address a potential RCU stall it doesn't address any actual problem.
>>> So why do we want to do this?
>>>
>>
>>
>> [1]
>> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>>
>> As previously discussed, the work I have done on the global OOM is 'half
>> of the job'. Based on our discussions, I thought that it would be best
>> to abandon this approach for global OOM. Therefore, I am sending this
>> patch to revert the changes.
>>
>> Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

I see. Thank you very much.

Best regards,
Ridong




* Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
  2025-02-12  9:34     ` Vlastimil Babka
  2025-02-12  9:52       ` Chen Ridong
@ 2025-02-12 11:58       ` Michal Hocko
  1 sibling, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2025-02-12 11:58 UTC
  To: Vlastimil Babka
  Cc: Chen Ridong, akpm, hannes, yosryahmed, roman.gushchin,
	shakeel.butt, muchun.song, davidf, mkoutny, paulmck, linux-mm,
	linux-kernel, cgroups, chenridong, wangweiyang2

On Wed 12-02-25 10:34:06, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
> > 
> > 
> > On 2025/2/12 16:57, Michal Hocko wrote:
> >> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> >>> From: Chen Ridong <chenridong@huawei.com>
> >>>
> >>> Unlike memcg OOM, which is relatively common, global OOM events are rare
> >>> and typically indicate that the entire system is under severe memory
> >>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> >>> process") added the touch_softlockup_watchdog in the global OOM handler to
> >>> suppress the soft lockup issues. However, while this change can suppress
> >>> soft lockup warnings, it does not address RCU stalls, which can still be
> >>> detected and may cause unnecessary disturbances. Simply remove the
> >>> modification from the global OOM handler.
> >>>
> >>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> >> 
> >> But this is not really fixing anything, is it? While this doesn't
> >> address a potential RCU stall it doesn't address any actual problem.
> >> So why do we want to do this?
> >> 
> > 
> > 
> > [1]
> > https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> > 
> > As previously discussed, the work I have done on the global OOM is 'half
> > of the job'. Based on our discussions, I thought that it would be best
> > to abandon this approach for global OOM. Therefore, I am sending this
> > patch to revert the changes.
> > 
> > Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

Agreed!

-- 
Michal Hocko
SUSE Labs


