* [PATCH v8 0/3] Improvements to Victim Process Thawing and OOM Reaper Traversal Order
@ 2025-09-09 9:06 zhongjinji
2025-09-09 9:06 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: zhongjinji @ 2025-09-09 9:06 UTC (permalink / raw)
To: mhocko
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han, zhongjinji
This patch series focuses on optimizing victim process thawing and refining
the traversal order of the OOM reaper. Since __thaw_task() thaws only a
single thread of the victim, it cannot guarantee that a frozen OOM victim
exits. Patches 1 and 2 thaw the entire victim process to ensure that OOM
victims are able to terminate themselves. Even when the oom_reaper is
delayed, patch 3 is still beneficial for reaping processes with a large
address space footprint, and it also greatly improves process_mrelease().
---
v7 -> v8:
- Introduce thaw_oom_process() for thawing OOM victims. [12]
- Use RCU protection for thread traversal in thaw_oom_process.
v6 -> v7:
- Thaw the victim process to ensure that it can terminate on its own. [10]
- Since the delayed reaper is no longer skipped, I'm not sure whether patch 2
  will still be accepted. Revise the changelog for patch 2. [11]
- Remove report tags
v5 -> v6:
- Use mas_for_each_rev() for VMA traversal [6]
- Simplify the judgment of whether to delay in queue_oom_reaper() [7]
- Refine changelog to better capture the essence of the changes [8]
- Use READ_ONCE(tsk->frozen) instead of checking mm and additional
checks inside for_each_process(), as it is sufficient [9]
- Add report tags because fengbaopeng and tianxiaobin reported the
high load issue of the reaper
v4 -> v5:
- Detect frozen state directly, avoid special futex handling. [3]
- Use mas_find_rev() for VMA traversal to avoid skipping entries. [4]
- Only check should_delay_oom_reap() in queue_oom_reaper(). [5]
v3 -> v4:
- Renamed functions and parameters for clarity. [2]
- Added should_delay_oom_reap() for OOM reap decisions.
- Traverse maple tree in reverse for improved behavior.
v2 -> v3:
- Fixed Subject prefix error.
v1 -> v2:
- Check robust_list for all threads, not just one. [1]
Reference:
[1] https://lore.kernel.org/linux-mm/u3mepw3oxj7cywezna4v72y2hvyc7bafkmsbirsbfuf34zpa7c@b23sc3rvp2gp/
[2] https://lore.kernel.org/linux-mm/87cy99g3k6.ffs@tglx/
[3] https://lore.kernel.org/linux-mm/aKRWtjRhE_HgFlp2@tiehlicka/
[4] https://lore.kernel.org/linux-mm/26larxehoe3a627s4fxsqghriwctays4opm4hhme3uk7ybjc5r@pmwh4s4yv7lm/
[5] https://lore.kernel.org/linux-mm/d5013a33-c08a-44c5-a67f-9dc8fd73c969@lucifer.local/
[6] https://lore.kernel.org/linux-mm/nwh7gegmvoisbxlsfwslobpbqku376uxdj2z32owkbftvozt3x@4dfet73fh2yy/
[7] https://lore.kernel.org/linux-mm/af4edeaf-d3c9-46a9-a300-dbaf5936e7d6@lucifer.local/
[8] https://lore.kernel.org/linux-mm/aK71W1ITmC_4I_RY@tiehlicka/
[9] https://lore.kernel.org/linux-mm/jzzdeczuyraup2zrspl6b74muf3bly2a3acejfftcldfmz4ekk@s5mcbeim34my/
[10] https://lore.kernel.org/linux-mm/aLWmf6qZHTA0hMpU@tiehlicka/
[11] https://lore.kernel.org/linux-mm/aLVOICSkyvVRKD94@tiehlicka/
[12] https://lore.kernel.org/linux-mm/aLg0QZQ5kXNJgDMF@tiehlicka/
The earlier post:
v7: https://lore.kernel.org/linux-mm/20250903092729.10611-1-zhongjinji@honor.com/
v6: https://lore.kernel.org/linux-mm/20250829065550.29571-1-zhongjinji@honor.com/
v5: https://lore.kernel.org/linux-mm/20250825133855.30229-1-zhongjinji@honor.com/
v4: https://lore.kernel.org/linux-mm/20250814135555.17493-1-zhongjinji@honor.com/
v3: https://lore.kernel.org/linux-mm/20250804030341.18619-1-zhongjinji@honor.com/
v2: https://lore.kernel.org/linux-mm/20250801153649.23244-1-zhongjinji@honor.com/
v1: https://lore.kernel.org/linux-mm/20250731102904.8615-1-zhongjinji@honor.com/
zhongjinji (3):
mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
mm/oom_kill: Thaw the entire OOM victim process
mm/oom_kill: OOM reaper walks VMA maple tree in reverse order
include/linux/freezer.h | 2 ++
kernel/freezer.c | 19 +++++++++++++++++++
mm/oom_kill.c | 16 +++++++++++-----
3 files changed, 32 insertions(+), 5 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 9:06 [PATCH v8 0/3] Improvements to Victim Process Thawing and OOM Reaper Traversal Order zhongjinji
@ 2025-09-09 9:06 ` zhongjinji
2025-09-09 9:15 ` Michal Hocko
2025-09-09 9:06 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process zhongjinji
2025-09-09 9:06 ` [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order zhongjinji
2 siblings, 1 reply; 17+ messages in thread
From: zhongjinji @ 2025-09-09 9:06 UTC (permalink / raw)
To: mhocko
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han, zhongjinji
The OOM killer is a mechanism that selects and kills processes when the
system runs out of memory, in order to reclaim resources and keep the system
stable. However, an OOM victim cannot terminate on its own when it is
frozen, because __thaw_task() thaws only one thread of the victim while
the other threads remain in the frozen state.
Since __thaw_task() does not fully thaw the OOM victim for self-termination,
introduce thaw_oom_process() to properly thaw OOM victims.
Signed-off-by: zhongjinji <zhongjinji@honor.com>
---
include/linux/freezer.h | 2 ++
kernel/freezer.c | 19 +++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index b303472255be..19a4b57950cd 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -47,6 +47,7 @@ extern int freeze_processes(void);
extern int freeze_kernel_threads(void);
extern void thaw_processes(void);
extern void thaw_kernel_threads(void);
+extern void thaw_oom_process(struct task_struct *p);
static inline bool try_to_freeze(void)
{
@@ -80,6 +81,7 @@ static inline int freeze_processes(void) { return -ENOSYS; }
static inline int freeze_kernel_threads(void) { return -ENOSYS; }
static inline void thaw_processes(void) {}
static inline void thaw_kernel_threads(void) {}
+static inline void thaw_oom_process(struct task_struct *p) {}
static inline bool try_to_freeze(void) { return false; }
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 6a96149aede9..17970e0be8a7 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -206,6 +206,25 @@ void __thaw_task(struct task_struct *p)
wake_up_state(p, TASK_FROZEN);
}
+/*
+ * thaw_oom_process - thaw the OOM victim process
+ * @p: process to be thawed
+ *
+ * Sets TIF_MEMDIE for all threads in the process group and thaws them.
+ * Threads with TIF_MEMDIE are ignored by the freezer.
+ */
+void thaw_oom_process(struct task_struct *p)
+{
+ struct task_struct *t;
+
+ rcu_read_lock();
+ for_each_thread(p, t) {
+ set_tsk_thread_flag(t, TIF_MEMDIE);
+ __thaw_task(t);
+ }
+ rcu_read_unlock();
+}
+
/**
* set_freezable - make %current freezable
*
--
2.17.1
* [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process
2025-09-09 9:06 [PATCH v8 0/3] Improvements to Victim Process Thawing and OOM Reaper Traversal Order zhongjinji
2025-09-09 9:06 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
@ 2025-09-09 9:06 ` zhongjinji
2025-09-09 9:15 ` Michal Hocko
2025-09-09 9:06 ` [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order zhongjinji
2 siblings, 1 reply; 17+ messages in thread
From: zhongjinji @ 2025-09-09 9:06 UTC (permalink / raw)
To: mhocko
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han, zhongjinji
The OOM killer is a mechanism that selects and kills processes when the
system runs out of memory, in order to reclaim resources and keep the system
stable. However, an OOM victim cannot terminate on its own when it is
frozen, because __thaw_task() thaws only one thread of the victim while
the other threads remain in the frozen state.
This change thaws the entire victim process when an OOM kill occurs,
ensuring that the OOM victim can terminate on its own.
Signed-off-by: zhongjinji <zhongjinji@honor.com>
---
mm/oom_kill.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..ffa50a1f0132 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk)
mmgrab(tsk->signal->oom_mm);
/*
- * Make sure that the task is woken up from uninterruptible sleep
+ * Make sure that the process is woken up from uninterruptible sleep
* if it is frozen because OOM killer wouldn't be able to free
- * any memory and livelock. freezing_slow_path will tell the freezer
- * that TIF_MEMDIE tasks should be ignored.
+ * any memory and livelock.
*/
- __thaw_task(tsk);
+ thaw_oom_process(tsk);
atomic_inc(&oom_victims);
cred = get_task_cred(tsk);
trace_mark_victim(tsk, cred->uid.val);
--
2.17.1
* [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order
2025-09-09 9:06 [PATCH v8 0/3] Improvements to Victim Process Thawing and OOM Reaper Traversal Order zhongjinji
2025-09-09 9:06 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
2025-09-09 9:06 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process zhongjinji
@ 2025-09-09 9:06 ` zhongjinji
2025-09-09 16:29 ` Suren Baghdasaryan
2 siblings, 1 reply; 17+ messages in thread
From: zhongjinji @ 2025-09-09 9:06 UTC (permalink / raw)
To: mhocko
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han, zhongjinji
Although the oom_reaper is delayed to give the OOM victim a chance to
clean up its address space, this might take a while, especially for
processes with a large address space footprint. In those cases the
oom_reaper might start racing with the dying task and compete for shared
resources - e.g. page table lock contention has been observed.
Reduce those races by reaping the oom victim from the other end of the
address space.
It is also a significant improvement for process_mrelease(). When a process
is killed, process_mrelease() is used to reap the killed process's address
space, and it often runs concurrently with the dying task. The test data
show that after applying the patch, lock contention is greatly reduced
while reaping the killed process.
The test was run on arm64.
Without the patch:
|--99.57%-- oom_reaper
| |--0.28%-- [hit in function]
| |--73.58%-- unmap_page_range
| | |--8.67%-- [hit in function]
| | |--41.59%-- __pte_offset_map_lock
| | |--29.47%-- folio_remove_rmap_ptes
| | |--16.11%-- tlb_flush_mmu
| | |--1.66%-- folio_mark_accessed
| | |--0.74%-- free_swap_and_cache_nr
| | |--0.69%-- __tlb_remove_folio_pages
| |--19.94%-- tlb_finish_mmu
| |--3.21%-- folio_remove_rmap_ptes
| |--1.16%-- __tlb_remove_folio_pages
| |--1.16%-- folio_mark_accessed
| |--0.36%-- __pte_offset_map_lock
With the patch:
|--99.53%-- oom_reaper
| |--55.77%-- unmap_page_range
| | |--20.49%-- [hit in function]
| | |--58.30%-- folio_remove_rmap_ptes
| | |--11.48%-- tlb_flush_mmu
| | |--3.33%-- folio_mark_accessed
| | |--2.65%-- __tlb_remove_folio_pages
| | |--1.37%-- _raw_spin_lock
| | |--0.68%-- __mod_lruvec_page_state
| | |--0.51%-- __pte_offset_map_lock
| |--32.21%-- tlb_finish_mmu
| |--6.93%-- folio_remove_rmap_ptes
| |--1.90%-- __tlb_remove_folio_pages
| |--1.55%-- folio_mark_accessed
| |--0.69%-- __pte_offset_map_lock
Signed-off-by: zhongjinji <zhongjinji@honor.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
---
mm/oom_kill.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ffa50a1f0132..52d285da5ba4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -516,7 +516,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
- VMA_ITERATOR(vmi, mm, 0);
+ MA_STATE(mas, &mm->mm_mt, ULONG_MAX, ULONG_MAX);
/*
* Tell all users of get_user/copy_from_user etc... that the content
@@ -526,7 +526,13 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
*/
set_bit(MMF_UNSTABLE, &mm->flags);
- for_each_vma(vmi, vma) {
+ /*
+ * It might start racing with the dying task and compete for shared
+ * resources - e.g. page table lock contention has been observed.
+ * Reduce those races by reaping the oom victim from the other end
+ * of the address space.
+ */
+ mas_for_each_rev(&mas, vma, 0) {
if (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP))
continue;
--
2.17.1
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 9:06 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
@ 2025-09-09 9:15 ` Michal Hocko
2025-09-09 16:27 ` Suren Baghdasaryan
0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2025-09-09 9:15 UTC (permalink / raw)
To: zhongjinji
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue 09-09-25 17:06:57, zhongjinji wrote:
> OOM killer is a mechanism that selects and kills processes when the system
> runs out of memory to reclaim resources and keep the system stable.
> However, the oom victim cannot terminate on its own when it is frozen,
> because __thaw_task() only thaws one thread of the victim, while
> the other threads remain in the frozen state.
>
> Since __thaw_task did not fully thaw the OOM victim for self-termination,
> introduce thaw_oom_process() to properly thaw OOM victims.
You will need s@thaw_oom_process@thaw_processes@
I would also add the caller in this patch.
> Signed-off-by: zhongjinji <zhongjinji@honor.com>
Other than that looks good to me. With the above fixed feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> include/linux/freezer.h | 2 ++
> kernel/freezer.c | 19 +++++++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index b303472255be..19a4b57950cd 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -47,6 +47,7 @@ extern int freeze_processes(void);
> extern int freeze_kernel_threads(void);
> extern void thaw_processes(void);
> extern void thaw_kernel_threads(void);
> +extern void thaw_oom_process(struct task_struct *p);
>
> static inline bool try_to_freeze(void)
> {
> @@ -80,6 +81,7 @@ static inline int freeze_processes(void) { return -ENOSYS; }
> static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> static inline void thaw_processes(void) {}
> static inline void thaw_kernel_threads(void) {}
> +static inline void thaw_oom_process(struct task_struct *p) {}
>
> static inline bool try_to_freeze(void) { return false; }
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index 6a96149aede9..17970e0be8a7 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -206,6 +206,25 @@ void __thaw_task(struct task_struct *p)
> wake_up_state(p, TASK_FROZEN);
> }
>
> +/*
> + * thaw_oom_process - thaw the OOM victim process
> + * @p: process to be thawed
> + *
> + * Sets TIF_MEMDIE for all threads in the process group and thaws them.
> + * Threads with TIF_MEMDIE are ignored by the freezer.
> + */
> +void thaw_oom_process(struct task_struct *p)
> +{
> + struct task_struct *t;
> +
> + rcu_read_lock();
> + for_each_thread(p, t) {
> + set_tsk_thread_flag(t, TIF_MEMDIE);
> + __thaw_task(t);
> + }
> + rcu_read_unlock();
> +}
> +
> /**
> * set_freezable - make %current freezable
> *
> --
> 2.17.1
--
Michal Hocko
SUSE Labs
* Re: [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process
2025-09-09 9:06 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process zhongjinji
@ 2025-09-09 9:15 ` Michal Hocko
2025-09-09 11:41 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
2025-09-09 16:23 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process Suren Baghdasaryan
0 siblings, 2 replies; 17+ messages in thread
From: Michal Hocko @ 2025-09-09 9:15 UTC (permalink / raw)
To: zhongjinji
Cc: rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, surenb, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue 09-09-25 17:06:58, zhongjinji wrote:
> OOM killer is a mechanism that selects and kills processes when the system
> runs out of memory to reclaim resources and keep the system stable.
> However, the oom victim cannot terminate on its own when it is frozen,
> because __thaw_task() only thaws one thread of the victim, while
> the other threads remain in the frozen state.
>
> This change will thaw the entire victim process when OOM occurs,
> ensuring that the oom victim can terminate on its own.
fold this into patch 1.
>
> Signed-off-by: zhongjinji <zhongjinji@honor.com>
> ---
> mm/oom_kill.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..ffa50a1f0132 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk)
> mmgrab(tsk->signal->oom_mm);
>
> /*
> - * Make sure that the task is woken up from uninterruptible sleep
> + * Make sure that the process is woken up from uninterruptible sleep
> * if it is frozen because OOM killer wouldn't be able to free
> - * any memory and livelock. freezing_slow_path will tell the freezer
> - * that TIF_MEMDIE tasks should be ignored.
> + * any memory and livelock.
> */
> - __thaw_task(tsk);
> + thaw_oom_process(tsk);
> atomic_inc(&oom_victims);
> cred = get_task_cred(tsk);
> trace_mark_victim(tsk, cred->uid.val);
> --
> 2.17.1
--
Michal Hocko
SUSE Labs
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 9:15 ` Michal Hocko
@ 2025-09-09 11:41 ` zhongjinji
2025-09-09 11:59 ` Michal Hocko
2025-09-09 16:23 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process Suren Baghdasaryan
1 sibling, 1 reply; 17+ messages in thread
From: zhongjinji @ 2025-09-09 11:41 UTC (permalink / raw)
To: mhocko
Cc: akpm, feng.han, lenb, liam.howlett, linux-kernel, linux-mm,
linux-pm, liulu.liu, lorenzo.stoakes, pavel, rafael, rientjes,
shakeel.butt, surenb, tglx, zhongjinji
> On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > OOM killer is a mechanism that selects and kills processes when the system
> > runs out of memory to reclaim resources and keep the system stable.
> > However, the oom victim cannot terminate on its own when it is frozen,
> > because __thaw_task() only thaws one thread of the victim, while
> > the other threads remain in the frozen state.
> >
> > Since __thaw_task did not fully thaw the OOM victim for self-termination,
> > introduce thaw_oom_process() to properly thaw OOM victims.
>
> You will need s@thaw_oom_process@thaw_processes@
The reason for using thaw_oom_process() is that it sets the TIF_MEMDIE flag
of every thread it thaws, which means the function can only be used to
thaw processes terminated by the OOM killer.
thaw_processes() has already been defined in kernel/power/process.c.
Would it be better to use thaw_process() instead?
I am concerned that others might misunderstand such a thaw_process():
because it sets TIF_MEMDIE on all threads, it could only be used to thaw
processes killed by the OOM killer.
If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed
regardless of the cgroup state. Should we add a separate function to set the
TIF_MEMDIE state for all threads, like the implementation below?
-/*
- * thaw_oom_process - thaw the OOM victim process
- * @p: process to be thawed
- *
- * Sets TIF_MEMDIE for all threads in the process group and thaws them.
- * Threads with TIF_MEMDIE are ignored by the freezer.
- */
-void thaw_oom_process(struct task_struct *p)
+void thaw_process(struct task_struct *p)
{
struct task_struct *t;
rcu_read_lock();
for_each_thread(p, t) {
- set_tsk_thread_flag(t, TIF_MEMDIE);
__thaw_task(t);
}
rcu_read_unlock();
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 52d285da5ba4..67b65b249757 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -753,6 +753,17 @@ static inline void queue_oom_reaper(struct task_struct *tsk)
}
#endif /* CONFIG_MMU */
+void mark_oom_victim_die(struct task_struct *p)
+{
+ struct task_struct *t;
+
+ rcu_read_lock();
+ for_each_thread(p, t) {
+ set_tsk_thread_flag(t, TIF_MEMDIE);
+ }
+ rcu_read_unlock();
+}
+
/**
* mark_oom_victim - mark the given task as OOM victim
* @tsk: task to mark
@@ -782,7 +793,8 @@ static void mark_oom_victim(struct task_struct *tsk)
* if it is frozen because OOM killer wouldn't be able to free
* any memory and livelock.
*/
- thaw_oom_process(tsk);
+ mark_oom_victim_die(tsk);
+ thaw_process(tsk);
> I would also add the caller in this patch.
>
> > Signed-off-by: zhongjinji <zhongjinji@honor.com>
>
> Other than that looks good to me. With the above fixed feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 11:41 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
@ 2025-09-09 11:59 ` Michal Hocko
2025-09-09 13:51 ` zhongjinji
0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2025-09-09 11:59 UTC (permalink / raw)
To: zhongjinji
Cc: akpm, feng.han, lenb, liam.howlett, linux-kernel, linux-mm,
linux-pm, liulu.liu, lorenzo.stoakes, pavel, rafael, rientjes,
shakeel.butt, surenb, tglx
On Tue 09-09-25 19:41:31, zhongjinji wrote:
> > On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > > OOM killer is a mechanism that selects and kills processes when the system
> > > runs out of memory to reclaim resources and keep the system stable.
> > > However, the oom victim cannot terminate on its own when it is frozen,
> > > because __thaw_task() only thaws one thread of the victim, while
> > > the other threads remain in the frozen state.
> > >
> > > Since __thaw_task did not fully thaw the OOM victim for self-termination,
> > > introduce thaw_oom_process() to properly thaw OOM victims.
> >
> > You will need s@thaw_oom_process@thaw_processes@
>
> The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the
> thawed thread will be set, which means this function can only be used to
> thaw processes terminated by the OOM killer.
Just do not set the flag inside the function. I would even say do not
set TIF_MEMDIE to the rest of the thread group at all. More on that
below
> thaw_processes has already been defined in kernel/power/process.c.
> Would it be better to use thaw_process instead?
Sorry I meant thaw_process as thaw_processes is handling all the
processes.
> I am concerned that others might misunderstand the thaw_process function.
> thaw_process sets all threads to the TIF_MEMDIE state, so it can only be
> used to thaw processes killed by the OOM killer.
And that is the reason why it shouldn't be doing that. It should thaw
the whole thread group. That's it.
> If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed
> regardless of the cgroup state.
Why would that be the case. TIF_MEMDIE should only denote the victim
should be able to access memory reserves. Why the whole thread group
needs that? While more threads could be caught in the allocation path
this is a sort of boost at best. It cannot guarantee any forward
progress and we have kept marking only the first thread that way without
any issues.
> Should we add a function to set the TIF_MEMDIE
> state for all threads, like the implementation below?
>
> -/*
> - * thaw_oom_process - thaw the OOM victim process
> - * @p: process to be thawed
> - *
> - * Sets TIF_MEMDIE for all threads in the process group and thaws them.
> - * Threads with TIF_MEMDIE are ignored by the freezer.
> - */
> -void thaw_oom_process(struct task_struct *p)
> +void thaw_process(struct task_struct *p)
> {
> struct task_struct *t;
>
> rcu_read_lock();
> for_each_thread(p, t) {
> - set_tsk_thread_flag(t, TIF_MEMDIE);
> __thaw_task(t);
> }
> rcu_read_unlock();
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 52d285da5ba4..67b65b249757 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -753,6 +753,17 @@ static inline void queue_oom_reaper(struct task_struct *tsk)
> }
> #endif /* CONFIG_MMU */
>
> +void mark_oom_victim_die(struct task_struct *p)
> +{
> + struct task_struct *t;
> +
> + rcu_read_lock();
> + for_each_thread(p, t) {
> + set_tsk_thread_flag(t, TIF_MEMDIE);
> + }
> + rcu_read_unlock();
> +}
> +
> /**
> * mark_oom_victim - mark the given task as OOM victim
> * @tsk: task to mark
> @@ -782,7 +793,8 @@ static void mark_oom_victim(struct task_struct *tsk)
> * if it is frozen because OOM killer wouldn't be able to free
> * any memory and livelock.
> */
> - thaw_oom_process(tsk);
> + mark_oom_victim_die(tsk);
> + thaw_process(tsk);
>
> > I would also add the caller in this patch.
> >
> > > Signed-off-by: zhongjinji <zhongjinji@honor.com>
> >
> > Other than that looks good to me. With the above fixed feel free to add
> > Acked-by: Michal Hocko <mhocko@suse.com>
--
Michal Hocko
SUSE Labs
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 11:59 ` Michal Hocko
@ 2025-09-09 13:51 ` zhongjinji
2025-09-09 14:02 ` Michal Hocko
0 siblings, 1 reply; 17+ messages in thread
From: zhongjinji @ 2025-09-09 13:51 UTC (permalink / raw)
To: mhocko
Cc: akpm, feng.han, lenb, liam.howlett, linux-kernel, linux-mm,
linux-pm, liulu.liu, lorenzo.stoakes, pavel, rafael, rientjes,
shakeel.butt, surenb, tglx, zhongjinji
> On Tue 09-09-25 19:41:31, zhongjinji wrote:
> > > On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > > > OOM killer is a mechanism that selects and kills processes when the system
> > > > runs out of memory to reclaim resources and keep the system stable.
> > > > However, the oom victim cannot terminate on its own when it is frozen,
> > > > because __thaw_task() only thaws one thread of the victim, while
> > > > the other threads remain in the frozen state.
> > > >
> > > > Since __thaw_task did not fully thaw the OOM victim for self-termination,
> > > > introduce thaw_oom_process() to properly thaw OOM victims.
> > >
> > > You will need s@thaw_oom_process@thaw_processes@
> >
> > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the
> > thawed thread will be set, which means this function can only be used to
> > thaw processes terminated by the OOM killer.
>
> Just do not set the flag inside the function. I would even say do not
> set TIF_MEMDIE to the rest of the thread group at all. More on that
> below
>
> > thaw_processes has already been defined in kernel/power/process.c.
> > Would it be better to use thaw_process instead?
>
> Sorry I meant thaw_process as thaw_processes is handling all the
> processes.
>
> > I am concerned that others might misunderstand the thaw_process function.
> > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be
> > used to thaw processes killed by the OOM killer.
>
> And that is the reason why it shouldn't be doing that. It should thaw
> the whole thread group. That's it.
>
> > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed
> > regardless of the cgroup state.
>
> Why would that be the case. TIF_MEMDIE should only denote the victim
> should be able to access memory reserves. Why the whole thread group
> needs that? While more threads could be caught in the allocation path
> this is a sort of boost at best. It cannot guarantee any forward
> progress and we have kept marking only the first thread that way without
> any issues.
When a process is frozen, all its threads enter __refrigerator() (in kernel/freezer.c).
When __thaw_task is called, the threads are woken up and check the freezing(current)
state (in __refrigerator). The freezing check is implemented via freezing_slow_path.
When TIF_MEMDIE is set for a thread, freezing_slow_path will return false, allowing
the thread to exit the infinite loop in __refrigerator(), and thus the thread will
be thawed.
The following pseudocode shows how TIF_MEMDIE works in thread thawing:
__refrigerator()
    for (;;) {
        freezing = freezing(current);
            -> freezing_slow_path()
                   if (test_tsk_thread_flag(p, TIF_MEMDIE))
                       return false;
        if (!freezing)
            break;
        schedule();
    }
Since thread_info is not shared within a thread group, TIF_MEMDIE for each thread
must be set so that all threads can be thawed.
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 13:51 ` zhongjinji
@ 2025-09-09 14:02 ` Michal Hocko
2025-09-09 14:47 ` zhongjinji
0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2025-09-09 14:02 UTC (permalink / raw)
To: zhongjinji
Cc: akpm, feng.han, lenb, liam.howlett, linux-kernel, linux-mm,
linux-pm, liulu.liu, lorenzo.stoakes, pavel, rafael, rientjes,
shakeel.butt, surenb, tglx
On Tue 09-09-25 21:51:52, zhongjinji wrote:
> > On Tue 09-09-25 19:41:31, zhongjinji wrote:
> > > > On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > > > > OOM killer is a mechanism that selects and kills processes when the system
> > > > > runs out of memory to reclaim resources and keep the system stable.
> > > > > However, the oom victim cannot terminate on its own when it is frozen,
> > > > > because __thaw_task() only thaws one thread of the victim, while
> > > > > the other threads remain in the frozen state.
> > > > >
> > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination,
> > > > > introduce thaw_oom_process() to properly thaw OOM victims.
> > > >
> > > > You will need s@thaw_oom_process@thaw_processes@
> > >
> > > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the
> > > thawed thread will be set, which means this function can only be used to
> > > thaw processes terminated by the OOM killer.
> >
> > Just do not set the flag inside the function. I would even say do not
> > set TIF_MEMDIE to the rest of the thread group at all. More on that
> > below
> >
> > > thaw_processes has already been defined in kernel/power/process.c.
> > > Would it be better to use thaw_process instead?
> >
> > Sorry I meant thaw_process as thaw_processes is handling all the
> > processes.
> >
> > > I am concerned that others might misunderstand the thaw_process function.
> > > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be
> > > used to thaw processes killed by the OOM killer.
> >
> > And that is the reason why it shouldn't be doing that. It should thaw
> > the whole thread group. That's it.
> >
> > > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed
> > > regardless of the cgroup state.
> >
> > Why would that be the case. TIF_MEMDIE should only denote the victim
> > should be able to access memory reserves. Why the whole thread group
> > needs that? While more threads could be caught in the allocation path
> > this is a sort of boost at best. It cannot guarantee any forward
> > progress and we have kept marking only the first thread that way without
> > any issues.
>
> When a process is frozen, all its threads enter __refrigerator() (in kernel/freezer.c).
> When __thaw_task is called, the threads are woken up and check the freezing(current)
> state (in __refrigerator). The freezing check is implemented via freezing_slow_path.
> When TIF_MEMDIE is set for a thread, freezing_slow_path will return false, allowing
> the thread to exit the infinite loop in __refrigerator(), and thus the thread will
> be thawed.
>
> The following call trace shows how TIF_MEMDIE takes effect when thawing a thread:
> __refrigerator()
>     for (;;) {
>         freezing = freezing(current)
>             -> freezing_slow_path()
>                 if (test_tsk_thread_flag(p, TIF_MEMDIE))
>                     return false;
>         if (!freezing)
>             break;
>         schedule();
>     }
OK, I see. We could deal with that by checking tsk_is_oom_victim()
instead of TIF_MEMDIE
> Since thread_info is not shared within a thread group, TIF_MEMDIE for each thread
> must be set so that all threads can be thawed.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 14:02 ` Michal Hocko
@ 2025-09-09 14:47 ` zhongjinji
0 siblings, 0 replies; 17+ messages in thread
From: zhongjinji @ 2025-09-09 14:47 UTC (permalink / raw)
To: mhocko
Cc: akpm, feng.han, lenb, liam.howlett, linux-kernel, linux-mm,
linux-pm, liulu.liu, lorenzo.stoakes, pavel, rafael, rientjes,
shakeel.butt, surenb, tglx, zhongjinji
> > [...]
>
> OK, I see. We could deal with that by checking tsk_is_oom_victim()
> instead of TIF_MEMDIE
Thank you, this looks great. oom_reserves_allowed() in page_alloc.c suggests
that tsk_is_oom_victim() alone is not always effective; I will check it.
* Re: [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process
2025-09-09 9:15 ` Michal Hocko
2025-09-09 11:41 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
@ 2025-09-09 16:23 ` Suren Baghdasaryan
1 sibling, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2025-09-09 16:23 UTC (permalink / raw)
To: Michal Hocko
Cc: zhongjinji, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue, Sep 9, 2025 at 2:15 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 09-09-25 17:06:58, zhongjinji wrote:
> > OOM killer is a mechanism that selects and kills processes when the system
> > runs out of memory to reclaim resources and keep the system stable.
> > However, the oom victim cannot terminate on its own when it is frozen,
> > because __thaw_task() only thaws one thread of the victim, while
> > the other threads remain in the frozen state.
> >
> > This change will thaw the entire victim process when OOM occurs,
> > ensuring that the oom victim can terminate on its own.
>
> fold this into patch 1.
+1
With that done,
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> >
> > Signed-off-by: zhongjinji <zhongjinji@honor.com>
> > ---
> > mm/oom_kill.c | 7 +++----
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 25923cfec9c6..ffa50a1f0132 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk)
> > mmgrab(tsk->signal->oom_mm);
> >
> > /*
> > - * Make sure that the task is woken up from uninterruptible sleep
> > + * Make sure that the process is woken up from uninterruptible sleep
> > * if it is frozen because OOM killer wouldn't be able to free
> > - * any memory and livelock. freezing_slow_path will tell the freezer
> > - * that TIF_MEMDIE tasks should be ignored.
> > + * any memory and livelock.
> > */
> > - __thaw_task(tsk);
> > + thaw_oom_process(tsk);
> > atomic_inc(&oom_victims);
> > cred = get_task_cred(tsk);
> > trace_mark_victim(tsk, cred->uid.val);
> > --
> > 2.17.1
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 9:15 ` Michal Hocko
@ 2025-09-09 16:27 ` Suren Baghdasaryan
2025-09-09 16:44 ` Michal Hocko
0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2025-09-09 16:27 UTC (permalink / raw)
To: Michal Hocko
Cc: zhongjinji, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue, Sep 9, 2025 at 2:15 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > [...]
>
> You will need s@thaw_oom_process@thaw_processes@
Do you suggest renaming thaw_oom_process() into thaw_processes()
(s/thaw_oom_process/thaw_processes)? If so, I don't think that's a
better name considering the function sets TIF_MEMDIE flag. From that
perspective less generic thaw_oom_process() seems appropriate, no?
>
> I would also add the caller in this patch.
>
> > Signed-off-by: zhongjinji <zhongjinji@honor.com>
>
> Other than that looks good to me. With the above fixed feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> > ---
> > include/linux/freezer.h | 2 ++
> > kernel/freezer.c | 19 +++++++++++++++++++
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> > index b303472255be..19a4b57950cd 100644
> > --- a/include/linux/freezer.h
> > +++ b/include/linux/freezer.h
> > @@ -47,6 +47,7 @@ extern int freeze_processes(void);
> > extern int freeze_kernel_threads(void);
> > extern void thaw_processes(void);
> > extern void thaw_kernel_threads(void);
> > +extern void thaw_oom_process(struct task_struct *p);
> >
> > static inline bool try_to_freeze(void)
> > {
> > @@ -80,6 +81,7 @@ static inline int freeze_processes(void) { return -ENOSYS; }
> > static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> > static inline void thaw_processes(void) {}
> > static inline void thaw_kernel_threads(void) {}
> > +static inline void thaw_oom_process(struct task_struct *p) {}
> >
> > static inline bool try_to_freeze(void) { return false; }
> >
> > diff --git a/kernel/freezer.c b/kernel/freezer.c
> > index 6a96149aede9..17970e0be8a7 100644
> > --- a/kernel/freezer.c
> > +++ b/kernel/freezer.c
> > @@ -206,6 +206,25 @@ void __thaw_task(struct task_struct *p)
> > wake_up_state(p, TASK_FROZEN);
> > }
> >
> > +/*
> > + * thaw_oom_process - thaw the OOM victim process
> > + * @p: process to be thawed
> > + *
> > + * Sets TIF_MEMDIE for all threads in the process group and thaws them.
> > + * Threads with TIF_MEMDIE are ignored by the freezer.
> > + */
> > +void thaw_oom_process(struct task_struct *p)
> > +{
> > + struct task_struct *t;
> > +
> > + rcu_read_lock();
> > + for_each_thread(p, t) {
> > + set_tsk_thread_flag(t, TIF_MEMDIE);
> > + __thaw_task(t);
> > + }
> > + rcu_read_unlock();
> > +}
> > +
> > /**
> > * set_freezable - make %current freezable
> > *
> > --
> > 2.17.1
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order
2025-09-09 9:06 ` [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order zhongjinji
@ 2025-09-09 16:29 ` Suren Baghdasaryan
2025-09-09 16:30 ` Suren Baghdasaryan
0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2025-09-09 16:29 UTC (permalink / raw)
To: zhongjinji
Cc: mhocko, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue, Sep 9, 2025 at 2:07 AM zhongjinji <zhongjinji@honor.com> wrote:
>
> Although the oom_reaper is delayed and it gives the oom victim chance to
> clean up its address space this might take a while especially for
> processes with a large address space footprint. In those cases
> oom_reaper might start racing with the dying task and compete for shared
> resources - e.g. page table lock contention has been observed.
>
> Reduce those races by reaping the oom victim from the other end of the
> address space.
>
> It is also a significant improvement for process_mrelease(). When a process
> is killed, process_mrelease is used to reap the killed process and often
> runs concurrently with the dying task. The test data shows that after
> applying the patch, lock contention is greatly reduced during the procedure
> of reaping the killed process.
>
> The test is based on arm64.
>
> Without the patch:
> |--99.57%-- oom_reaper
> | |--0.28%-- [hit in function]
> | |--73.58%-- unmap_page_range
> | | |--8.67%-- [hit in function]
> | | |--41.59%-- __pte_offset_map_lock
> | | |--29.47%-- folio_remove_rmap_ptes
> | | |--16.11%-- tlb_flush_mmu
> | | |--1.66%-- folio_mark_accessed
> | | |--0.74%-- free_swap_and_cache_nr
> | | |--0.69%-- __tlb_remove_folio_pages
> | |--19.94%-- tlb_finish_mmu
> | |--3.21%-- folio_remove_rmap_ptes
> | |--1.16%-- __tlb_remove_folio_pages
> | |--1.16%-- folio_mark_accessed
> | |--0.36%-- __pte_offset_map_lock
>
> With the patch:
> |--99.53%-- oom_reaper
> | |--55.77%-- unmap_page_range
> | | |--20.49%-- [hit in function]
> | | |--58.30%-- folio_remove_rmap_ptes
> | | |--11.48%-- tlb_flush_mmu
> | | |--3.33%-- folio_mark_accessed
> | | |--2.65%-- __tlb_remove_folio_pages
> | | |--1.37%-- _raw_spin_lock
> | | |--0.68%-- __mod_lruvec_page_state
> | | |--0.51%-- __pte_offset_map_lock
> | |--32.21%-- tlb_finish_mmu
> | |--6.93%-- folio_remove_rmap_ptes
> | |--1.90%-- __tlb_remove_folio_pages
> | |--1.55%-- folio_mark_accessed
> | |--0.69%-- __pte_offset_map_lock
>
> Signed-off-by: zhongjinji <zhongjinji@honor.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Suren Baghdsaryan <surenb@google.com>
> ---
> mm/oom_kill.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ffa50a1f0132..52d285da5ba4 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -516,7 +516,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
> {
> struct vm_area_struct *vma;
> bool ret = true;
> - VMA_ITERATOR(vmi, mm, 0);
> + MA_STATE(mas, &mm->mm_mt, ULONG_MAX, ULONG_MAX);
>
> /*
> * Tell all users of get_user/copy_from_user etc... that the content
> @@ -526,7 +526,13 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
> */
> set_bit(MMF_UNSTABLE, &mm->flags);
>
> - for_each_vma(vmi, vma) {
> + /*
> + * It might start racing with the dying task and compete for shared
> + * resources - e.g. page table lock contention has been observed.
> + * Reduce those races by reaping the oom victim from the other end
> + * of the address space.
> + */
> + mas_for_each_rev(&mas, vma, 0) {
> if (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP))
> continue;
>
> --
> 2.17.1
>
* Re: [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order
2025-09-09 16:29 ` Suren Baghdasaryan
@ 2025-09-09 16:30 ` Suren Baghdasaryan
0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2025-09-09 16:30 UTC (permalink / raw)
To: zhongjinji
Cc: mhocko, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue, Sep 9, 2025 at 9:29 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Sep 9, 2025 at 2:07 AM zhongjinji <zhongjinji@honor.com> wrote:
> > [...]
>
> Reviewed-by: Suren Baghdsaryan <surenb@google.com>
Apparently I misspelled my own last name :)
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> > [...]
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 16:27 ` Suren Baghdasaryan
@ 2025-09-09 16:44 ` Michal Hocko
2025-09-09 16:53 ` Suren Baghdasaryan
0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2025-09-09 16:44 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: zhongjinji, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue 09-09-25 09:27:54, Suren Baghdasaryan wrote:
> On Tue, Sep 9, 2025 at 2:15 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > > [...]
> >
> > You will need s@thaw_oom_process@thaw_processes@
>
> Do you suggest renaming thaw_oom_process() into thaw_processes()
> (s/thaw_oom_process/thaw_processes)? If so, I don't think that's a
> better name considering the function sets TIF_MEMDIE flag. From that
> perspective less generic thaw_oom_process() seems appropriate, no?
Please see the discussion for the patch 2.
TL;DR yes rename and drop TIF_MEMDIE part and update freezer to check
tsk_is_oom_victim rather than TIF_MEMDIE.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims
2025-09-09 16:44 ` Michal Hocko
@ 2025-09-09 16:53 ` Suren Baghdasaryan
0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2025-09-09 16:53 UTC (permalink / raw)
To: Michal Hocko
Cc: zhongjinji, rientjes, shakeel.butt, akpm, tglx, liam.howlett,
lorenzo.stoakes, lenb, rafael, pavel, linux-mm, linux-pm,
linux-kernel, liulu.liu, feng.han
On Tue, Sep 9, 2025 at 9:44 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 09-09-25 09:27:54, Suren Baghdasaryan wrote:
> > On Tue, Sep 9, 2025 at 2:15 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 09-09-25 17:06:57, zhongjinji wrote:
> > > > [...]
> > >
> > > You will need s@thaw_oom_process@thaw_processes@
> >
> > Do you suggest renaming thaw_oom_process() into thaw_processes()
> > (s/thaw_oom_process/thaw_processes)? If so, I don't think that's a
> > better name considering the function sets TIF_MEMDIE flag. From that
> > perspective less generic thaw_oom_process() seems appropriate, no?
>
> Please see the discussion for the patch 2.
> TL;DR yes rename and drop TIF_MEMDIE part and update freezer to check
> tsk_is_oom_victim rather than TIF_MEMDIE.
Oh, sorry. For some reason that part of the email thread ended up as a
separate email in my mailbox and I missed it. Your suggestion there
sounds great.
>
> --
> Michal Hocko
> SUSE Labs
Thread overview: 17+ messages
2025-09-09 9:06 [PATCH v8 0/3] Improvements to Victim Process Thawing and OOM Reaper Traversal Order zhongjinji
2025-09-09 9:06 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
2025-09-09 9:15 ` Michal Hocko
2025-09-09 16:27 ` Suren Baghdasaryan
2025-09-09 16:44 ` Michal Hocko
2025-09-09 16:53 ` Suren Baghdasaryan
2025-09-09 9:06 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process zhongjinji
2025-09-09 9:15 ` Michal Hocko
2025-09-09 11:41 ` [PATCH v8 1/3] mm/oom_kill: Introduce thaw_oom_process() for thawing OOM victims zhongjinji
2025-09-09 11:59 ` Michal Hocko
2025-09-09 13:51 ` zhongjinji
2025-09-09 14:02 ` Michal Hocko
2025-09-09 14:47 ` zhongjinji
2025-09-09 16:23 ` [PATCH v8 2/3] mm/oom_kill: Thaw the entire OOM victim process Suren Baghdasaryan
2025-09-09 9:06 ` [PATCH v8 3/3] mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse order zhongjinji
2025-09-09 16:29 ` Suren Baghdasaryan
2025-09-09 16:30 ` Suren Baghdasaryan