linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	Balbir Singh <balbirs@nvidia.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	hakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	Nhat Pham <nphamcs@gmail.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
Date: Tue, 14 Jan 2025 20:42:18 +0100	[thread overview]
Message-ID: <Z4a-GllRm7KABAu7@tiehlicka> (raw)
In-Reply-To: <20250114192322.GB1115056@cmpxchg.org>

On Tue 14-01-25 14:23:22, Johannes Weiner wrote:
> On Tue, Jan 14, 2025 at 07:13:07PM +0100, Michal Hocko wrote:
> > Anyway, have you tried to reproduce with 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 7b3503d12aaf..9c30c442e3b0 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1627,7 +1627,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	 * A few threads which were not waiting at mutex_lock_killable() can
> >  	 * fail to bail out. Therefore, check again after holding oom_lock.
> >  	 */
> > -	ret = task_is_dying() || out_of_memory(&oc);
> > +	ret = out_of_memory(&oc);
> >  
> >  unlock:
> >  	mutex_unlock(&oom_lock);
> > 
> > proposed by Johannes earlier? This should help to trigger the oom reaper
> > to free up some memory.
> 
> Yes, I was wondering about that too.
> 
> If the OOM reaper can be our reliable way of forward progress, we
> don't need any reserve or headroom beyond memory.max.
> 
> IIRC it can fail if somebody is holding mmap_sem for writing. The exit
> path at some point takes that, but also around the time it frees up
> all its memory voluntarily, so that should be fine. Are you aware of
> other scenarios where it can fail?

Setting MMF_OOM_SKIP is the final moment when oom reaper can act. This
is after exit_mm_release which releases futex. Also get_user callers
shouldn't be holding exclusive mmap_lock as that would deadlock when PF
path takes the read lock, right?

> What if everything has been swapped out already and there is nothing
> to reap? IOW, only unreclaimable/kernel memory remaining in the group.

Yes, this is possible. It is also possible the the oom victim depletes
oom reserves globally and fail the allocation resulting in the same
problem. Reserves do buy some time but do not solve the underlying
issue.

> It still seems to me that allowing the OOM victim (and only the OOM
> victim) to bypass memory.max is the only guarantee to progress.
> 
> I'm not really concerned about side effects. Any runaway allocation in
> the exit path (like the vmalloc one you referenced before) is a much
> bigger concern for exceeding the physical OOM reserves in the page
> allocator. What's a containment failure for cgroups would be a memory
> deadlock at the system level. It's a class of kernel bug that needs
> fixing, not something we can really work around in the cgroup code.

I do agreee that a memory deadlock is not really proper way to deal with
the issue. I have to admit that my understanding was based on ENOMEM
being properly propagated out of in kernel user page faults. It seems I
was wrong about that. On the other hand wouldn't that be a proper way to
deal with the issue? Relying on allocations never failing is quite
fragile.

-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2025-01-14 19:42 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-12 16:57 Rik van Riel
2024-12-12 17:06 ` Yosry Ahmed
2024-12-12 17:51   ` Shakeel Butt
2024-12-12 18:02     ` Rik van Riel
2024-12-12 18:18       ` Nhat Pham
2024-12-12 18:11   ` Nhat Pham
2024-12-12 18:30   ` Johannes Weiner
2024-12-12 21:35     ` Shakeel Butt
2024-12-12 21:41       ` Yosry Ahmed
2024-12-13  0:32     ` Roman Gushchin
2024-12-13  4:42       ` Johannes Weiner
2024-12-16 15:39     ` Michal Hocko
2025-01-14 16:09       ` Johannes Weiner
2025-01-14 16:46         ` Michal Hocko
2025-01-14 16:51           ` Rik van Riel
2025-01-14 17:00             ` Michal Hocko
2025-01-14 17:11               ` Rik van Riel
2025-01-14 18:13                 ` Michal Hocko
2025-01-14 19:23                   ` Johannes Weiner
2025-01-14 19:42                     ` Michal Hocko [this message]
2025-01-15 17:35                       ` Rik van Riel
2025-01-15 19:41                         ` Michal Hocko
2025-01-14 16:54           ` Michal Hocko
2025-01-14 16:56             ` Rik van Riel
2025-01-14 16:56             ` Michal Hocko
2024-12-12 18:31 ` Roman Gushchin
2024-12-12 20:00   ` Rik van Riel
2024-12-13  0:49     ` Roman Gushchin
2024-12-13  2:54     ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4a-GllRm7KABAu7@tiehlicka \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbirs@nvidia.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox