Hi Johannes, Thank you for the feedback. Let me clarify the scenario this patch addresses. On Tue, 16 Dec 2025 at 02:31, Johannes Weiner wrote: > > On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote: > > In try_charge_memcg(), reorder the retry logic checks to follow the > > early-exit pattern by testing for dying task before decrementing the > > retry counter: > > > > Before: > > if (nr_retries--) > > goto retry; > > > > if (passed_oom && task_is_dying()) > > goto nomem; > > > > After: > > if (passed_oom && task_is_dying()) > > goto nomem; > > > > if (nr_retries--) > > goto retry; > > > > This makes the control flow more obvious: check exit conditions first, > > then decide whether to retry. When current task is dying (e.g., has > > received SIGKILL or is exiting), we should exit immediately rather than > > consuming a retry count first. > > > > No functional change for the common case where task is not dying. > > It's definitely a functional change, not just code clarification. > > The oom kill resets nr_retries. This means that currently, even an OOM > victim is going to retry a full set of reclaims, even if they are > hopeless. After your patch, it'll retry for other reasons but can bail > much earlier as well. Check the other conditions. > > The dying task / OOM victim allocation path is tricky and it tends to > fail us in the rarest and most difficult to debug scenarios. There > should be a good reason to change it. The task_is_dying() check in try_charge_memcg() identifies when the CURRENT task (the caller) is the OOM victim - not when some other process was killed. Two scenarios: 1. Normal allocator triggers OOM: - Process A allocates → triggers OOM - Process B is killed (victim) - Process A continues with reset retries - task_is_dying() = false for A → Unchanged by my patch 2. Victim tries to allocate: - Process B (victim, TIF_MEMDIE set) tries to allocate - task_is_dying() = true - Current code: wastes retries on hopeless reclaims - My patch: exits immediately → Optimization for this case The victim has three safety mechanisms that make the retries unnecessary: 1. oom_reaper proactively frees its memory 2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed() 3. Critical allocations with __GFP_NOFAIL still reach force: label The retry loop for a dying victim is futile because: - Reclaim won't help (victim is being killed to free memory!) - Victim will exit regardless - Just wastes CPU cycles Would you like me to provide evidence showing the unnecessary retries, or run specific tests to verify the safety mechanisms are sufficient? Best Regards, Dipendra