[PATCH] mm/memcg: reorder retry checks for clarity in try_charge

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
@ 2025-12-15 14:54 Dipendra Khadka
  2025-12-15 20:46 ` Johannes Weiner
  0 siblings, 1 reply; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-15 14:54 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	cgroups, linux-mm, linux-kernel

In try_charge_memcg(), reorder the retry logic checks to follow the
early-exit pattern by testing for dying task before decrementing the
retry counter:

Before:
    if (nr_retries--)
        goto retry;
    
    if (passed_oom && task_is_dying())
        goto nomem;

After:
    if (passed_oom && task_is_dying())
        goto nomem;
    
    if (nr_retries--)
        goto retry;

This makes the control flow more obvious: check exit conditions first,
then decide whether to retry. When current task is dying (e.g., has
received SIGKILL or is exiting), we should exit immediately rather than
consuming a retry count first.

No functional change for the common case where task is not dying.

Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
---
 mm/memcontrol.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..7c9145538683 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2412,16 +2412,16 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
 		goto retry;

+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
+
 	if (nr_retries--)
 		goto retry;
 
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	/* Avoid endless loop for tasks bypassed by the oom killer */
-	if (passed_oom && task_is_dying())
-		goto nomem;
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
-- 
2.43.0



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-15 14:54 [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg Dipendra Khadka
@ 2025-12-15 20:46 ` Johannes Weiner
  2025-12-18  6:55   ` Dipendra Khadka
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Johannes Weiner @ 2025-12-15 20:46 UTC (permalink / raw)
  To: Dipendra Khadka
  Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
	linux-mm, linux-kernel

On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> In try_charge_memcg(), reorder the retry logic checks to follow the
> early-exit pattern by testing for dying task before decrementing the
> retry counter:
> 
> Before:
>     if (nr_retries--)
>         goto retry;
>     
>     if (passed_oom && task_is_dying())
>         goto nomem;
> 
> After:
>     if (passed_oom && task_is_dying())
>         goto nomem;
>     
>     if (nr_retries--)
>         goto retry;
> 
> This makes the control flow more obvious: check exit conditions first,
> then decide whether to retry. When current task is dying (e.g., has
> received SIGKILL or is exiting), we should exit immediately rather than
> consuming a retry count first.
> 
> No functional change for the common case where task is not dying.

It's definitely a functional change, not just code clarification.

The oom kill resets nr_retries. This means that currently, even an OOM
victim is going to retry a full set of reclaims, even if they are
hopeless. After your patch, it'll retry for other reasons but can bail
much earlier as well. Check the other conditions.

The dying task / OOM victim allocation path is tricky and it tends to
fail us in the rarest and most difficult to debug scenarios. There
should be a good reason to change it.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-15 20:46 ` Johannes Weiner
@ 2025-12-18  6:55   ` Dipendra Khadka
  2025-12-18  7:06   ` Dipendra Khadka
  2025-12-18  7:25   ` Dipendra Khadka
  2 siblings, 0 replies; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18  6:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
	linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]

Hi Johannes,


On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > In try_charge_memcg(), reorder the retry logic checks to follow the
> > early-exit pattern by testing for dying task before decrementing the
> > retry counter:
> >
> > Before:
> >     if (nr_retries--)
> >         goto retry;
> >
> >     if (passed_oom && task_is_dying())
> >         goto nomem;
> >
> > After:
> >     if (passed_oom && task_is_dying())
> >         goto nomem;
> >
> >     if (nr_retries--)
> >         goto retry;
> >
> > This makes the control flow more obvious: check exit conditions first,
> > then decide whether to retry. When current task is dying (e.g., has
> > received SIGKILL or is exiting), we should exit immediately rather than
> > consuming a retry count first.
> >
> > No functional change for the common case where task is not dying.
>
> It's definitely a functional change, not just code clarification.



The oom kill resets nr_retries. This means that currently, even an OOM
> victim is going to retry a full set of reclaims, even if they are
> hopeless. After your patch, it'll retry for other reasons but can bail
> much earlier as well. Check the other conditions.


> The dying task / OOM victim allocation path is tricky and it tends to
> fail us in the rarest and most difficult to debug scenarios. There
> should be a good reason to change it.
>

Thank you for the feedback. Let me clarify the scenario this patch addresses
:

The task_is_dying() check in try_charge_memcg() identifies when the
 CURRENT task (the caller) is the OOM victim - not when some other process
was killed.

Two scenarios:

1. Normal allocator triggers OOM:
  - Process A allocates → triggers OOM
  - Process B is killed (victim)
  - Process A continues with reset retries - task_is_dying() = false for A
  → Unchanged by my patch

 2. Victim tries to allocate:
 - Process B (victim, TIF_MEMDIE set) tries to allocate
  - task_is_dying() = true
  - Current code: wastes retries on hopeless reclaims
  - My patch: exits immediately
  → Optimization for this case

 The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label

The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles

 Best Regards,
 Dipendra

[-- Attachment #2: Type: text/html, Size: 8729 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-15 20:46 ` Johannes Weiner
  2025-12-18  6:55   ` Dipendra Khadka
@ 2025-12-18  7:06   ` Dipendra Khadka
  2025-12-18  7:28     ` Shakeel Butt
  2025-12-18  7:25   ` Dipendra Khadka
  2 siblings, 1 reply; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18  7:06 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
	linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

Hi Johannes,

Thank you for the feedback. Let me clarify the scenario this patch
addresses.

On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > In try_charge_memcg(), reorder the retry logic checks to follow the
> > early-exit pattern by testing for dying task before decrementing the
> > retry counter:
> >
> > Before:
> >     if (nr_retries--)
> >         goto retry;
> >
> >     if (passed_oom && task_is_dying())
> >         goto nomem;
> >
> > After:
> >     if (passed_oom && task_is_dying())
> >         goto nomem;
> >
> >     if (nr_retries--)
> >         goto retry;
> >
> > This makes the control flow more obvious: check exit conditions first,
> > then decide whether to retry. When current task is dying (e.g., has
> > received SIGKILL or is exiting), we should exit immediately rather than
> > consuming a retry count first.
> >
> > No functional change for the common case where task is not dying.
>
> It's definitely a functional change, not just code clarification.
>
> The oom kill resets nr_retries. This means that currently, even an OOM
> victim is going to retry a full set of reclaims, even if they are
> hopeless. After your patch, it'll retry for other reasons but can bail
> much earlier as well. Check the other conditions.
>
> The dying task / OOM victim allocation path is tricky and it tends to
> fail us in the rarest and most difficult to debug scenarios. There
> should be a good reason to change it.

The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other
process was killed.

Two scenarios:

1. Normal allocator triggers OOM:
  - Process A allocates → triggers OOM
  - Process B is killed (victim)
  - Process A continues with reset retries - task_is_dying() = false for A
  → Unchanged by my patch

2. Victim tries to allocate:
 - Process B (victim, TIF_MEMDIE set) tries to allocate
  - task_is_dying() = true
  - Current code: wastes retries on hopeless reclaims
  - My patch: exits immediately
  → Optimization for this case

The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label

The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles

Would you like me to provide evidence showing the unnecessary retries,
or run specific tests to verify the safety mechanisms are sufficient?

Best Regards,
Dipendra

[-- Attachment #2: Type: text/html, Size: 3321 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-18  7:06   ` Dipendra Khadka
@ 2025-12-18  7:28     ` Shakeel Butt
  2025-12-18  7:36       ` Dipendra Khadka
  0 siblings, 1 reply; 8+ messages in thread
From: Shakeel Butt @ 2025-12-18  7:28 UTC (permalink / raw)
  To: Dipendra Khadka
  Cc: Johannes Weiner, akpm, mhocko, roman.gushchin, muchun.song,
	cgroups, linux-mm, linux-kernel

On Thu, Dec 18, 2025 at 12:51:04PM +0545, Dipendra Khadka wrote:
> Hi Johannes,
> 
> Thank you for the feedback. Let me clarify the scenario this patch
> addresses.
> 
> On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> > > In try_charge_memcg(), reorder the retry logic checks to follow the
> > > early-exit pattern by testing for dying task before decrementing the
> > > retry counter:
> > >
> > > Before:
> > >     if (nr_retries--)
> > >         goto retry;
> > >
> > >     if (passed_oom && task_is_dying())
> > >         goto nomem;
> > >
> > > After:
> > >     if (passed_oom && task_is_dying())
> > >         goto nomem;
> > >
> > >     if (nr_retries--)
> > >         goto retry;
> > >
> > > This makes the control flow more obvious: check exit conditions first,
> > > then decide whether to retry. When current task is dying (e.g., has
> > > received SIGKILL or is exiting), we should exit immediately rather than
> > > consuming a retry count first.
> > >
> > > No functional change for the common case where task is not dying.
> >
> > It's definitely a functional change, not just code clarification.
> >
> > The oom kill resets nr_retries. This means that currently, even an OOM
> > victim is going to retry a full set of reclaims, even if they are
> > hopeless. After your patch, it'll retry for other reasons but can bail
> > much earlier as well. Check the other conditions.
> >
> > The dying task / OOM victim allocation path is tricky and it tends to
> > fail us in the rarest and most difficult to debug scenarios. There
> > should be a good reason to change it.
> 
> The task_is_dying() check in try_charge_memcg() identifies when the
> CURRENT task (the caller) is the OOM victim - not when some other
> process was killed.
> 
> Two scenarios:
> 
> 1. Normal allocator triggers OOM:
>   - Process A allocates → triggers OOM
>   - Process B is killed (victim)
>   - Process A continues with reset retries - task_is_dying() = false for A
>   → Unchanged by my patch
> 
> 2. Victim tries to allocate:
>  - Process B (victim, TIF_MEMDIE set) tries to allocate
>   - task_is_dying() = true
>   - Current code: wastes retries on hopeless reclaims

Why hopeless?

>   - My patch: exits immediately
>   → Optimization for this case

Why optimize for this case?

> 
> The victim has three safety mechanisms that make the retries unnecessary:
> 1. oom_reaper proactively frees its memory

Since oom_reaper will reap the memory of the killed process, do we
really care about if killed process is delayed a bit due to reclaim?

> 2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()

How is this relevant here?

> 3. Critical allocations with __GFP_NOFAIL still reach force: label

Same, how is this relevant to victim safety?

> 
> The retry loop for a dying victim is futile because:
> - Reclaim won't help (victim is being killed to free memory!)
> - Victim will exit regardless
> - Just wastes CPU cycles
> 
> Would you like me to provide evidence showing the unnecessary retries,
> or run specific tests to verify the safety mechanisms are sufficient?
> 
> Best Regards,
> Dipendra


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-18  7:28     ` Shakeel Butt
@ 2025-12-18  7:36       ` Dipendra Khadka
  2025-12-18  8:08         ` Shakeel Butt
  0 siblings, 1 reply; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18  7:36 UTC (permalink / raw)
  To: shakeel.butt
  Cc: akpm, cgroups, hannes, kdipendra88, linux-kernel, linux-mm,
	mhocko, muchun.song, roman.gushchin

> Why hopeless?

Because in this specific path the allocating task is already the OOM
victim (TIF_MEMDIE set). Any reclaim attempt performed by that task is
unlikely to make progress for its own allocation, since the kernel has
already decided that freeing this task’s memory is the resolution
mechanism. Reclaim may free some pages globally, but the victim itself
will still be exiting shortly, making retries for its own allocation
non-actionable.

> Why optimize for this case?

I agree this is a narrow case, but it is also a delicate one. The
motivation is not performance in the general sense, but avoiding extra
complexity and repeated reclaim attempts in a code path that is already
operating under exceptional conditions. The early exit reduces retry
churn for a task that is guaranteed to terminate, without affecting
non-victim allocators.

> Since oom_reaper will reap the memory of the killed process, do we
> really care about if killed process is delayed a bit due to reclaim?

Not strongly from a functional standpoint. The concern is more about
control flow clarity and avoiding unnecessary reclaim loops while the
task is already in a terminal state. I agree that this is not a
correctness issue by itself, but rather an attempt to avoid redundant
work in an already resolved situation.

> How is this relevant here?

This was meant to explain why exiting early does not introduce new
failure modes for the victim task. Even if the victim still performs
allocations briefly, the slowpath mechanisms already allow limited
forward progress. I agree this does not directly justify the reordering
by itself.

> Same, how is this relevant to victim safety?

Same answer here — these mechanisms ensure that the victim does not
regress functionally if retries are skipped, but they are not intended
as the primary justification for the change.

The primary intent of the patch is to avoid retrying reclaim for the
current task once it has been marked as dying, not to change OOM
resolution behavior. If this rationale is insufficient, I’m happy to
drop the patch or rework it with clearer justification or measurable
evidence.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-18  7:36       ` Dipendra Khadka
@ 2025-12-18  8:08         ` Shakeel Butt
  0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2025-12-18  8:08 UTC (permalink / raw)
  To: Dipendra Khadka
  Cc: akpm, cgroups, hannes, linux-kernel, linux-mm, mhocko,
	muchun.song, roman.gushchin

This feels like AI generated responses and giving me the impression that
you are not really understanding what you have written here.

On Thu, Dec 18, 2025 at 07:36:13AM +0000, Dipendra Khadka wrote:
> > Why hopeless?
> 
> Because in this specific path the allocating task is already the OOM
> victim (TIF_MEMDIE set). Any reclaim attempt performed by that task is
> unlikely to make progress for its own allocation, since the kernel has
> already decided that freeing this task’s memory is the resolution
> mechanism. Reclaim may free some pages globally, but the victim itself
> will still be exiting shortly, making retries for its own allocation
> non-actionable.

Beside its own allocation, these retries also keep the usage of the
container below its limit. Under extreme condition we allow allocations
above the limit. Here we are failing the charge request. Will we start
seeing more ENOMEM in the exit path? Do you have a call stack where such
retries are happening?

> 
> > Why optimize for this case?
> 
> I agree this is a narrow case, but it is also a delicate one.

How is it a delicate one and what exactly delicate mean here?

> The
> motivation is not performance in the general sense, but avoiding extra
> complexity and repeated reclaim attempts in a code path that is already
> operating under exceptional conditions. The early exit reduces retry
> churn for a task that is guaranteed to terminate, without affecting
> non-victim allocators.
> 
> > Since oom_reaper will reap the memory of the killed process, do we
> > really care about if killed process is delayed a bit due to reclaim?
> 
> Not strongly from a functional standpoint. The concern is more about
> control flow clarity and avoiding unnecessary reclaim loops while the
> task is already in a terminal state. I agree that this is not a
> correctness issue by itself, but rather an attempt to avoid redundant
> work in an already resolved situation.

How is it a resolved situation? The dying process is in reclaim due to
limit.

> 
> > How is this relevant here?
> 
> This was meant to explain why exiting early does not introduce new
> failure modes for the victim task.

More chances of ENOMEMs?

> Even if the victim still performs
> allocations briefly, the slowpath mechanisms 

What slowpath mechanisms?

> already allow limited
> forward progress. I agree this does not directly justify the reordering
> by itself.
> 
> > Same, how is this relevant to victim safety?
> 
> Same answer here — these mechanisms ensure that the victim does not
> regress functionally if retries are skipped,

How GFP_NOFAIL doing force charge is relevant to the case you are trying
to optimize?



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg
  2025-12-15 20:46 ` Johannes Weiner
  2025-12-18  6:55   ` Dipendra Khadka
  2025-12-18  7:06   ` Dipendra Khadka
@ 2025-12-18  7:25   ` Dipendra Khadka
  2 siblings, 0 replies; 8+ messages in thread
From: Dipendra Khadka @ 2025-12-18  7:25 UTC (permalink / raw)
  To: hannes
  Cc: akpm, mhocko, roman.gushchin, shakeel.butt, muchun.song, cgroups,
	linux-mm, linux-kernel

Hi Johannes,

Thank you for the feedback. Let me clarify the scenario this patch addresses.

The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other 
process was killed.

Two scenarios:

1. Normal allocator triggers OOM:
  - Process A allocates → triggers OOM
  - Process B is killed (victim)
  - Process A continues with reset retries - task_is_dying() = false for A
  → Unchanged by my patch

2. Victim tries to allocate:
 - Process B (victim, TIF_MEMDIE set) tries to allocate
  - task_is_dying() = true
  - Current code: wastes retries on hopeless reclaims
  - My patch: exits immediately
  → Optimization for this case

The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label

The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles

Would you like me to provide evidence showing the unnecessary retries,
or run specific tests to verify the safety mechanisms are sufficient?

Best Regards,
Dipendra

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-12-18  8:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-15 14:54 [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg Dipendra Khadka
2025-12-15 20:46 ` Johannes Weiner
2025-12-18  6:55   ` Dipendra Khadka
2025-12-18  7:06   ` Dipendra Khadka
2025-12-18  7:28     ` Shakeel Butt
2025-12-18  7:36       ` Dipendra Khadka
2025-12-18  8:08         ` Shakeel Butt
2025-12-18  7:25   ` Dipendra Khadka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox