From: Yosry Ahmed <yosryahmed@google.com>
To: Rik van Riel <riel@surriel.com>
Cc: Balbir Singh <balbirs@nvidia.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
hakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com,
Nhat Pham <nphamcs@gmail.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
Date: Thu, 12 Dec 2024 09:06:25 -0800 [thread overview]
Message-ID: <CAJD7tkY=bHv0obOpRiOg4aLMYNkbEjfOtpVSSzNJgVSwkzaNpA@mail.gmail.com> (raw)
In-Reply-To: <20241212115754.38f798b3@fangorn>
On Thu, Dec 12, 2024 at 8:58 AM Rik van Riel <riel@surriel.com> wrote:
>
> A task already in exit can get stuck trying to allocate pages, if its
> cgroup is at the memory.max limit, the cgroup is using zswap, but
> zswap writeback is enabled, and the remaining memory in the cgroup is
> not compressible.
>
> This seems like an unlikely confluence of events, but it can happen
> quite easily if a cgroup is OOM killed due to exceeding its memory.max
> limit, and all the tasks in the cgroup are trying to exit simultaneously.
>
> When this happens, it can sometimes take hours for tasks to exit,
> as they are all trying to squeeze things into zswap to bring the group's
> memory consumption below memory.max.
>
> Allowing these exiting programs to push some memory from their own
> cgroup into swap allows them to quickly bring the cgroup's memory
> consumption below memory.max, and exit in seconds rather than hours.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
Thanks for sending a v2.
I still think maybe this needs to be fixed on the memcg side, at least
by not making exiting tasks try really hard to reclaim memory to the
point where this becomes a problem. IIUC there could be other reasons
why reclaim may take too long, but maybe not as pathological as this
case to be fair. I will let the memcg maintainers chime in for this.
If there's a fundamental reason why this cannot be fixed on the memcg
side, I don't object to this change.
Nhat, any objections on your end? I think your fleet workloads were
the first users of this interface. Does this break their expectations?
> ---
> v2: use mm_match_cgroup as suggested by Yosry Ahmed
>
> mm/memcontrol.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..ba1cd9c04a02 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5371,6 +5371,18 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> if (!zswap_is_enabled())
> return true;
>
> + /*
> + * Always allow exiting tasks to push data to swap. A process in
> + * the middle of exit cannot get OOM killed, but may need to push
> + * uncompressible data to swap in order to get the cgroup memory
> + * use below the limit, and make progress with the exit.
> + */
> + if (unlikely(current->flags & PF_EXITING)) {
> + struct mm_struct *mm = READ_ONCE(current->mm);
> + if (mm && mm_match_cgroup(mm, memcg))
> + return true;
> + }
> +
> for (; memcg; memcg = parent_mem_cgroup(memcg))
> if (!READ_ONCE(memcg->zswap_writeback))
> return false;
> --
> 2.47.0
>
>
next prev parent reply other threads:[~2024-12-12 17:07 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-12 16:57 Rik van Riel
2024-12-12 17:06 ` Yosry Ahmed [this message]
2024-12-12 17:51 ` Shakeel Butt
2024-12-12 18:02 ` Rik van Riel
2024-12-12 18:18 ` Nhat Pham
2024-12-12 18:11 ` Nhat Pham
2024-12-12 18:30 ` Johannes Weiner
2024-12-12 21:35 ` Shakeel Butt
2024-12-12 21:41 ` Yosry Ahmed
2024-12-13 0:32 ` Roman Gushchin
2024-12-13 4:42 ` Johannes Weiner
2024-12-16 15:39 ` Michal Hocko
2025-01-14 16:09 ` Johannes Weiner
2025-01-14 16:46 ` Michal Hocko
2025-01-14 16:51 ` Rik van Riel
2025-01-14 17:00 ` Michal Hocko
2025-01-14 17:11 ` Rik van Riel
2025-01-14 18:13 ` Michal Hocko
2025-01-14 19:23 ` Johannes Weiner
2025-01-14 19:42 ` Michal Hocko
2025-01-15 17:35 ` Rik van Riel
2025-01-15 19:41 ` Michal Hocko
2025-01-14 16:54 ` Michal Hocko
2025-01-14 16:56 ` Rik van Riel
2025-01-14 16:56 ` Michal Hocko
2024-12-12 18:31 ` Roman Gushchin
2024-12-12 20:00 ` Rik van Riel
2024-12-13 0:49 ` Roman Gushchin
2024-12-13 2:54 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJD7tkY=bHv0obOpRiOg4aLMYNkbEjfOtpVSSzNJgVSwkzaNpA@mail.gmail.com' \
--to=yosryahmed@google.com \
--cc=akpm@linux-foundation.org \
--cc=balbirs@nvidia.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=riel@surriel.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox