Hi Johannes,


On Tue, 16 Dec 2025 at 02:31, Johannes Weiner <hannes@cmpxchg.org> wrote:
On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> In try_charge_memcg(), reorder the retry logic checks to follow the
> early-exit pattern by testing for dying task before decrementing the
> retry counter:
>
> Before:
>     if (nr_retries--)
>         goto retry;
>     
>     if (passed_oom && task_is_dying())
>         goto nomem;
>
> After:
>     if (passed_oom && task_is_dying())
>         goto nomem;
>     
>     if (nr_retries--)
>         goto retry;
>
> This makes the control flow more obvious: check exit conditions first,
> then decide whether to retry. When current task is dying (e.g., has
> received SIGKILL or is exiting), we should exit immediately rather than
> consuming a retry count first.
>
> No functional change for the common case where task is not dying.

It's definitely a functional change, not just code clarification.
 
The oom kill resets nr_retries. This means that currently, even an OOM
victim is going to retry a full set of reclaims, even if they are
hopeless. After your patch, it'll retry for other reasons but can bail
much earlier as well. Check the other conditions.

The dying task / OOM victim allocation path is tricky and it tends to
fail us in the rarest and most difficult to debug scenarios. There
should be a good reason to change it.

Thank you for the feedback. Let me clarify the scenario this patch addresses: 

The task_is_dying() check in try_charge_memcg() identifies when the 
 CURRENT task (the caller) is the OOM victim - not when some other process was killed. 

Two scenarios: 

1. Normal allocator triggers OOM: 
  - Process A allocates → triggers OOM 
  - Process B is killed (victim) 
  - Process A continues with reset retries - task_is_dying() = false for
  → Unchanged by my patch 

 2. Victim tries to allocate:
 - Process B (victim, TIF_MEMDIE set) tries to allocate 
  - task_is_dying() = true 
  - Current code: wastes retries on hopeless reclaims 
  - My patch: exits immediately 
  → Optimization for this case 

 The victim has three safety mechanisms that make the retries unnecessary: 
1. oom_reaper proactively frees its memory 
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed() 
3. Critical allocations with __GFP_NOFAIL still reach force: label 

The retry loop for a dying victim is futile because: 
- Reclaim won't help (victim is being killed to free memory!) 
- Victim will exit regardless 
- Just wastes CPU cycles 

 Best Regards,
 Dipendra