Date: Thu, 12 Dec 2024 15:00:03 -0500
From: Rik van Riel
To: Roman Gushchin
Cc: Yosry Ahmed, Balbir Singh, Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Nhat Pham
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
Message-ID: <20241212150003.1a0ed845@fangorn>
References: <20241212115754.38f798b3@fangorn>

On Thu, 12 Dec 2024 18:31:57 +0000
Roman Gushchin wrote:

> Is it about a single task or groups of tasks or the entire cgroup?
> If former, why it's a problem? A tight memcg limit can slow things down
> in general and I don't see why we should treat the exit() path differently.
>

I think the exit path does need to be treated a little differently,
since this exit may be the only way such a cgroup can free up memory.

> If it's about the entire cgroup and we have essentially a deadlock,
> I feel like we need to look into the oom reaper side.

You mean something like the below?

I have not tested it yet, because we don't have any stuck
cgroups right now among the workloads that I'm monitoring.

---8<---

From c0e545fd45bd3ee24cd79b3d3e9b375e968ef460 Mon Sep 17 00:00:00 2001
From: Rik van Riel
Date: Thu, 12 Dec 2024 14:50:49 -0500
Subject: [PATCH] memcg,oom: speed up reclaim for exiting tasks

When a memcg reaches its memory limit, and reclaim becomes unavailable
or slow for some reason, for example only zswap is available, but zswap
writeback is disabled, it can take a long time for tasks to exit, and
for the cgroup to get back to normal (or cleaned up).

Speed up memcg reclaim for exiting tasks by limiting how much work
reclaim does, and by invoking the OOM reaper if reclaim does not
free up enough memory to allow the task to make progress.

Signed-off-by: Rik van Riel
---
 include/linux/oom.h |  8 ++++++++
 mm/memcontrol.c     | 11 +++++++++++
 mm/oom_kill.c       |  6 +-----
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 1e0fc6931ce9..b2d9cf936664 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -111,4 +111,12 @@ extern void oom_killer_enable(void);
 
 extern struct task_struct *find_lock_task_mm(struct task_struct *p);
 
+#ifdef CONFIG_MMU
+extern void queue_oom_reaper(struct task_struct *tsk);
+#else
+static inline void queue_oom_reaper(struct task_struct *tsk)
+{
+}
+#endif
+
 #endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..21f42758d430 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2231,6 +2231,9 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (!gfpflags_allow_blocking(gfp_mask))
 		goto nomem;
 
+	if (unlikely(current->flags & PF_EXITING))
+		gfp_mask |= __GFP_NORETRY;
+
 	memcg_memory_event(mem_over_limit, MEMCG_MAX);
 	raised_max_event = true;
 
@@ -2284,6 +2287,14 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto retry;
 	}
 nomem:
+	/*
+	 * We ran out of memory while inside exit. Maybe the OOM
+	 * reaper can help reduce cgroup memory use and get things
+	 * moving again?
+	 */
+	if (unlikely(current->flags & PF_EXITING))
+		queue_oom_reaper(current);
+
 	/*
 	 * Memcg doesn't have a dedicated reserve for atomic
 	 * allocations. But like the global atomic pool, we need to
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1c485beb0b93..8d5278e45c63 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -686,7 +686,7 @@ static void wake_oom_reaper(struct timer_list *timer)
  * before the exit path is able to wake the futex waiters.
  */
 #define OOM_REAPER_DELAY (2*HZ)
-static void queue_oom_reaper(struct task_struct *tsk)
+void queue_oom_reaper(struct task_struct *tsk)
 {
 	/* mm is already queued? */
 	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
@@ -735,10 +735,6 @@ static int __init oom_init(void)
 	return 0;
 }
 subsys_initcall(oom_init)
-#else
-static inline void queue_oom_reaper(struct task_struct *tsk)
-{
-}
 #endif /* CONFIG_MMU */
 
 /**
-- 
2.47.0
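
The control flow the patch aims for can be modelled in userspace. What
follows is only a rough sketch under stated assumptions, not kernel code:
struct toy_memcg, struct toy_task, toy_try_charge(), toy_queue_oom_reaper()
and the TOY_* constants are hypothetical names invented for illustration,
reclaim is reduced to subtracting a fixed number of pages per pass, and the
"already queued" bit is a plain C11 atomic standing in for
MMF_OOM_REAP_QUEUED.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PF_EXITING, __GFP_NORETRY and the retry budget. */
#define TOY_PF_EXITING   0x1u
#define TOY_GFP_NORETRY  0x2u
#define TOY_MAX_RETRIES  16

struct toy_task {
	unsigned int flags;
	atomic_bool reap_queued;	/* models the "mm is already queued" bit */
	int reap_requests;		/* times the reaper was actually queued */
};

struct toy_memcg {
	long usage;
	long limit;
	long reclaim_per_pass;		/* pages one reclaim pass can free; may be 0 */
};

/* Queue the task for reaping at most once, like the test_and_set_bit() check. */
void toy_queue_oom_reaper(struct toy_task *t)
{
	if (atomic_exchange(&t->reap_queued, true))
		return;			/* already queued */
	t->reap_requests++;
}

/*
 * Try to charge nr pages. Returns 0 on success, -1 when the charge fails.
 * *passes reports how many reclaim passes were needed.
 */
int toy_try_charge(struct toy_memcg *cg, struct toy_task *t,
		   unsigned int gfp, long nr, int *passes)
{
	int retries = TOY_MAX_RETRIES;

	assert(nr > 0);
	*passes = 0;

	/* Exiting tasks should not spend a long time in reclaim. */
	if (t->flags & TOY_PF_EXITING)
		gfp |= TOY_GFP_NORETRY;

	for (;;) {
		if (cg->usage + nr <= cg->limit) {
			cg->usage += nr;
			return 0;
		}

		/* One reclaim pass: free what we can, never below zero. */
		long freed = cg->reclaim_per_pass < cg->usage ?
			     cg->reclaim_per_pass : cg->usage;
		cg->usage -= freed;
		(*passes)++;

		if (cg->usage + nr <= cg->limit)
			continue;	/* reclaim made room, retry the charge */
		if (gfp & TOY_GFP_NORETRY)
			break;		/* give up after a single pass */
		if (--retries <= 0)
			break;
	}

	/* Out of memory during exit: ask the reaper for help, once. */
	if (t->flags & TOY_PF_EXITING)
		toy_queue_oom_reaper(t);
	return -1;
}
```

In this model, with reclaim_per_pass set to 0 (roughly the "only zswap is
available, but zswap writeback is disabled" case from the changelog), a task
with TOY_PF_EXITING set gives up after one reclaim pass and requests the
reaper exactly once, however often the charge is retried. A task without the
flag goes through all TOY_MAX_RETRIES passes before failing and never
touches the reaper.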