* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-15 20:46 ` Johannes Weiner
@ 2025-12-18 6:55 ` Dipendra Khadka
2025-12-18 7:06 ` Dipendra Khadka
2025-12-18 7:25 ` Dipendra Khadka
2 siblings, 0 replies; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18 6:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
linux-mm, linux-kernel
Hi Johannes,
On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > In try_charge_memcg(), reorder the retry logic checks to follow the
> > early-exit pattern by testing for dying task before decrementing the
> > retry counter:
> >
> > Before:
> >         if (nr_retries--)
> >                 goto retry;
> >
> >         if (passed_oom && task_is_dying())
> >                 goto nomem;
> >
> > After:
> >         if (passed_oom && task_is_dying())
> >                 goto nomem;
> >
> >         if (nr_retries--)
> >                 goto retry;
> >
> > This makes the control flow more obvious: check exit conditions first,
> > then decide whether to retry. When current task is dying (e.g., has
> > received SIGKILL or is exiting), we should exit immediately rather than
> > consuming a retry count first.
> >
> > No functional change for the common case where task is not dying.
>
> It's definitely a functional change, not just code clarification.
> The oom kill resets nr_retries. This means that currently, even an OOM
> victim is going to retry a full set of reclaims, even if they are
> hopeless. After your patch, it'll retry for other reasons but can bail
> much earlier as well. Check the other conditions.
> The dying task / OOM victim allocation path is tricky and it tends to
> fail us in the rarest and most difficult to debug scenarios. There
> should be a good reason to change it.
>
Thank you for the feedback. Let me clarify the scenario this patch addresses:
The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other process
was killed.
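For reference, the helper is defined in mm/memcontrol.c roughly as
below (paraphrased from memory, so please double-check against your
tree); the key point is that it only ever inspects current:

    /* Roughly the mm/memcontrol.c helper; it only looks at current. */
    static bool task_is_dying(void)
    {
            return tsk_is_oom_victim(current) ||
                   fatal_signal_pending(current) ||
                   (current->flags & PF_EXITING);
    }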
Two scenarios:
1. Normal allocator triggers OOM:
- Process A allocates → triggers OOM
- Process B is killed (victim)
- Process A continues with reset retries - task_is_dying() = false for A
→ Unchanged by my patch
2. Victim tries to allocate:
- Process B (victim, TIF_MEMDIE set) tries to allocate
- task_is_dying() = true
- Current code: wastes retries on hopeless reclaims
- My patch: exits immediately
→ Optimization for this case
The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label
The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles
Best Regards,
Dipendra
* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-15 20:46 ` Johannes Weiner
2025-12-18 6:55 ` Dipendra Khadka
@ 2025-12-18 7:06 ` Dipendra Khadka
2025-12-18 7:28 ` Shakeel Butt
2025-12-18 7:25 ` Dipendra Khadka
2 siblings, 1 reply; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18 7:06 UTC (permalink / raw)
To: Johannes Weiner
Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
linux-mm, linux-kernel
Hi Johannes,
Thank you for the feedback. Let me clarify the scenario this patch
addresses.
On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > In try_charge_memcg(), reorder the retry logic checks to follow the
> > early-exit pattern by testing for dying task before decrementing the
> > retry counter:
> >
> > Before:
> >         if (nr_retries--)
> >                 goto retry;
> >
> >         if (passed_oom && task_is_dying())
> >                 goto nomem;
> >
> > After:
> >         if (passed_oom && task_is_dying())
> >                 goto nomem;
> >
> >         if (nr_retries--)
> >                 goto retry;
> >
> > This makes the control flow more obvious: check exit conditions first,
> > then decide whether to retry. When current task is dying (e.g., has
> > received SIGKILL or is exiting), we should exit immediately rather than
> > consuming a retry count first.
> >
> > No functional change for the common case where task is not dying.
>
> It's definitely a functional change, not just code clarification.
>
> The oom kill resets nr_retries. This means that currently, even an OOM
> victim is going to retry a full set of reclaims, even if they are
> hopeless. After your patch, it'll retry for other reasons but can bail
> much earlier as well. Check the other conditions.
>
> The dying task / OOM victim allocation path is tricky and it tends to
> fail us in the rarest and most difficult to debug scenarios. There
> should be a good reason to change it.
The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other
process was killed.
Two scenarios:
1. Normal allocator triggers OOM:
- Process A allocates → triggers OOM
- Process B is killed (victim)
- Process A continues with reset retries - task_is_dying() = false for A
→ Unchanged by my patch
2. Victim tries to allocate:
- Process B (victim, TIF_MEMDIE set) tries to allocate
- task_is_dying() = true
- Current code: wastes retries on hopeless reclaims
- My patch: exits immediately
→ Optimization for this case
The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label
The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles
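To make the interaction with the retry counter concrete, below is a
simplified sketch of the tail of the retry path in try_charge_memcg()
as I read it; other bail-out conditions in between are omitted, so
please treat it as an approximation rather than verbatim code:

    if (nr_retries--)
            goto retry;

    /* Avoid an endless loop for tasks bypassed by the oom killer. */
    if (passed_oom && task_is_dying())
            goto nomem;

    /* A successful memcg OOM kill refills the retry budget. */
    if (mem_cgroup_oom(mem_over_limit, gfp_mask,
                       get_order(nr_pages * PAGE_SIZE))) {
            passed_oom = true;
            nr_retries = MAX_RECLAIM_RETRIES;
            goto retry;
    }

So once passed_oom is set, a hopeless victim still burns through a full
set of reclaim retries before the dying check is reached; the patch only
moves that check ahead of the decrement.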
Would you like me to provide evidence showing the unnecessary retries,
or run specific tests to verify the safety mechanisms are sufficient?
Best Regards,
Dipendra
* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-18 7:06 ` Dipendra Khadka
@ 2025-12-18 7:28 ` Shakeel Butt
2025-12-18 7:36 ` Dipendra Khadka
0 siblings, 1 reply; 8+ messages in thread
From: Shakeel Butt @ 2025-12-18 7:28 UTC (permalink / raw)
To: Dipendra Khadka
Cc: Johannes Weiner, akpm, mhocko, roman.gushchin, muchun.song,
cgroups, linux-mm, linux-kernel
On Thu, Dec 18, 2025 at 12:51:04PM +0545, Dipendra Khadka wrote:
> Hi Johannes,
>
> Thank you for the feedback. Let me clarify the scenario this patch
> addresses.
>
> On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > > In try_charge_memcg(), reorder the retry logic checks to follow the
> > > early-exit pattern by testing for dying task before decrementing the
> > > retry counter:
> > >
> > > Before:
> > >         if (nr_retries--)
> > >                 goto retry;
> > >
> > >         if (passed_oom && task_is_dying())
> > >                 goto nomem;
> > >
> > > After:
> > >         if (passed_oom && task_is_dying())
> > >                 goto nomem;
> > >
> > >         if (nr_retries--)
> > >                 goto retry;
> > >
> > > This makes the control flow more obvious: check exit conditions first,
> > > then decide whether to retry. When current task is dying (e.g., has
> > > received SIGKILL or is exiting), we should exit immediately rather than
> > > consuming a retry count first.
> > >
> > > No functional change for the common case where task is not dying.
> >
> > It's definitely a functional change, not just code clarification.
> >
> > The oom kill resets nr_retries. This means that currently, even an OOM
> > victim is going to retry a full set of reclaims, even if they are
> > hopeless. After your patch, it'll retry for other reasons but can bail
> > much earlier as well. Check the other conditions.
> >
> > The dying task / OOM victim allocation path is tricky and it tends to
> > fail us in the rarest and most difficult to debug scenarios. There
> > should be a good reason to change it.
>
> The task_is_dying() check in try_charge_memcg() identifies when the
> CURRENT task (the caller) is the OOM victim - not when some other
> process was killed.
>
> Two scenarios:
>
> 1. Normal allocator triggers OOM:
> - Process A allocates → triggers OOM
> - Process B is killed (victim)
> - Process A continues with reset retries - task_is_dying() = false for A
> → Unchanged by my patch
>
> 2. Victim tries to allocate:
> - Process B (victim, TIF_MEMDIE set) tries to allocate
> - task_is_dying() = true
> - Current code: wastes retries on hopeless reclaims
Why hopeless?
> - My patch: exits immediately
> → Optimization for this case
Why optimize for this case?
>
> The victim has three safety mechanisms that make the retries unnecessary:
> 1. oom_reaper proactively frees its memory
Since oom_reaper will reap the memory of the killed process, do we
really care if the killed process is delayed a bit due to reclaim?
> 2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
How is this relevant here?
> 3. Critical allocations with __GFP_NOFAIL still reach force: label
Same, how is this relevant to victim safety?
>
> The retry loop for a dying victim is futile because:
> - Reclaim won't help (victim is being killed to free memory!)
> - Victim will exit regardless
> - Just wastes CPU cycles
>
> Would you like me to provide evidence showing the unnecessary retries,
> or run specific tests to verify the safety mechanisms are sufficient?
>
> Best Regards,
> Dipendra
* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-18 7:28 ` Shakeel Butt
@ 2025-12-18 7:36 ` Dipendra Khadka
2025-12-18 8:08 ` Shakeel Butt
0 siblings, 1 reply; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18 7:36 UTC (permalink / raw)
To: shakeel.butt
Cc: akpm, cgroups, hannes, kdipendra88, linux-kernel, linux-mm,
mhocko, muchun.song, roman.gushchin
> Why hopeless?
Because in this specific path the allocating task is already the OOM
victim (TIF_MEMDIE set). Any reclaim attempt performed by that task is
unlikely to make progress for its own allocation, since the kernel has
already decided that freeing this task’s memory is the resolution
mechanism. Reclaim may free some pages globally, but the victim itself
will still be exiting shortly, making retries for its own allocation
non-actionable.
> Why optimize for this case?
I agree this is a narrow case, but it is also a delicate one. The
motivation is not performance in the general sense, but avoiding extra
complexity and repeated reclaim attempts in a code path that is already
operating under exceptional conditions. The early exit reduces retry
churn for a task that is guaranteed to terminate, without affecting
non-victim allocators.
> Since oom_reaper will reap the memory of the killed process, do we
> really care about if killed process is delayed a bit due to reclaim?
Not strongly from a functional standpoint. The concern is more about
control flow clarity and avoiding unnecessary reclaim loops while the
task is already in a terminal state. I agree that this is not a
correctness issue by itself, but rather an attempt to avoid redundant
work in an already resolved situation.
> How is this relevant here?
This was meant to explain why exiting early does not introduce new
failure modes for the victim task. Even if the victim still performs
allocations briefly, the slowpath mechanisms already allow limited
forward progress. I agree this does not directly justify the reordering
by itself.
> Same, how is this relevant to victim safety?
Same answer here — these mechanisms ensure that the victim does not
regress functionally if retries are skipped, but they are not intended
as the primary justification for the change.
The primary intent of the patch is to avoid retrying reclaim for the
current task once it has been marked as dying, not to change OOM
resolution behavior. If this rationale is insufficient, I’m happy to
drop the patch or rework it with clearer justification or measurable
evidence.
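For concreteness, written out as a diff hunk, the change amounts to the
following (reconstructed from the Before/After in the commit message
rather than copied from the patch, with surrounding context trimmed):

    -       if (nr_retries--)
    -               goto retry;
    -
    -       if (passed_oom && task_is_dying())
    -               goto nomem;
    +       if (passed_oom && task_is_dying())
    +               goto nomem;
    +
    +       if (nr_retries--)
    +               goto retry;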
* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-18 7:36 ` Dipendra Khadka
@ 2025-12-18 8:08 ` Shakeel Butt
0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2025-12-18 8:08 UTC (permalink / raw)
To: Dipendra Khadka
Cc: akpm, cgroups, hannes, linux-kernel, linux-mm, mhocko,
muchun.song, roman.gushchin
These feel like AI-generated responses and give me the impression that
you do not really understand what you have written here.
On Thu, Dec 18, 2025 at 07:36:13AM +0000, Dipendra Khadka wrote:
> > Why hopeless?
>
> Because in this specific path the allocating task is already the OOM
> victim (TIF_MEMDIE set). Any reclaim attempt performed by that task is
> unlikely to make progress for its own allocation, since the kernel has
> already decided that freeing this task’s memory is the resolution
> mechanism. Reclaim may free some pages globally, but the victim itself
> will still be exiting shortly, making retries for its own allocation
> non-actionable.
Besides its own allocation, these retries also keep the usage of the
container below its limit. Under extreme conditions we allow allocations
above the limit. Here we are failing the charge request. Will we start
seeing more ENOMEMs in the exit path? Do you have a call stack where
such retries are happening?
>
> > Why optimize for this case?
>
> I agree this is a narrow case, but it is also a delicate one.
How is it a delicate one, and what exactly does 'delicate' mean here?
> The
> motivation is not performance in the general sense, but avoiding extra
> complexity and repeated reclaim attempts in a code path that is already
> operating under exceptional conditions. The early exit reduces retry
> churn for a task that is guaranteed to terminate, without affecting
> non-victim allocators.
>
> > Since oom_reaper will reap the memory of the killed process, do we
> > really care if the killed process is delayed a bit due to reclaim?
>
> Not strongly from a functional standpoint. The concern is more about
> control flow clarity and avoiding unnecessary reclaim loops while the
> task is already in a terminal state. I agree that this is not a
> correctness issue by itself, but rather an attempt to avoid redundant
> work in an already resolved situation.
How is it a resolved situation? The dying process is in reclaim due to
the limit.
>
> > How is this relevant here?
>
> This was meant to explain why exiting early does not introduce new
> failure modes for the victim task.
More chances of ENOMEMs?
> Even if the victim still performs
> allocations briefly, the slowpath mechanisms
What slowpath mechanisms?
> already allow limited
> forward progress. I agree this does not directly justify the reordering
> by itself.
>
> > Same, how is this relevant to victim safety?
>
> Same answer here — these mechanisms ensure that the victim does not
> regress functionally if retries are skipped,
How is GFP_NOFAIL doing a force charge relevant to the case you are
trying to optimize?
* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
2025-12-15 20:46 ` Johannes Weiner
2025-12-18 6:55 ` Dipendra Khadka
2025-12-18 7:06 ` Dipendra Khadka
@ 2025-12-18 7:25 ` Dipendra Khadka
2 siblings, 0 replies; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18 7:25 UTC (permalink / raw)
To: hannes
Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
linux-mm, linux-kernel
Hi Johannes,
Thank you for the feedback. Let me clarify the scenario this patch addresses.
The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other
process was killed.
Two scenarios:
1. Normal allocator triggers OOM:
- Process A allocates → triggers OOM
- Process B is killed (victim)
- Process A continues with reset retries - task_is_dying() = false for A
→ Unchanged by my patch
2. Victim tries to allocate:
- Process B (victim, TIF_MEMDIE set) tries to allocate
- task_is_dying() = true
- Current code: wastes retries on hopeless reclaims
- My patch: exits immediately
→ Optimization for this case
The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label
The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles
Would you like me to provide evidence showing the unnecessary retries,
or run specific tests to verify the safety mechanisms are sufficient?
Best Regards,
Dipendra