From: Yosry Ahmed
Date: Thu, 12 Dec 2024 09:06:25 -0800
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
To: Rik van Riel
Cc: Balbir Singh, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com,
    Nhat Pham
References: <20241212115754.38f798b3@fangorn>
In-Reply-To: <20241212115754.38f798b3@fangorn>

On Thu, Dec 12, 2024 at 8:58 AM Rik van Riel wrote:
>
> A task already in exit can get stuck trying to allocate pages, if its
> cgroup is at the memory.max limit, the cgroup is using zswap, but
> zswap writeback is disabled, and the remaining memory in the cgroup is
> not compressible.
>
> This seems like an unlikely confluence of events, but it can happen
> quite easily if a cgroup is OOM killed due to exceeding its memory.max
> limit, and all the tasks in the cgroup are trying to exit simultaneously.
>
> When this happens, it can sometimes take hours for tasks to exit,
> as they are all trying to squeeze things into zswap to bring the group's
> memory consumption below memory.max.
>
> Allowing these exiting programs to push some memory from their own
> cgroup into swap allows them to quickly bring the cgroup's memory
> consumption below memory.max, and exit in seconds rather than hours.
>
> Signed-off-by: Rik van Riel

Thanks for sending a v2.

I still think maybe this needs to be fixed on the memcg side, at least
by not making exiting tasks try really hard to reclaim memory to the
point where this becomes a problem. IIUC there could be other reasons
why reclaim may take too long, but maybe not as pathological as this
case to be fair. I will let the memcg maintainers chime in for this.

If there's a fundamental reason why this cannot be fixed on the memcg
side, I don't object to this change.

Nhat, any objections on your end? I think your fleet workloads were
the first users of this interface. Does this break their expectations?

> ---
> v2: use mm_match_cgroup as suggested by Yosry Ahmed
>
>  mm/memcontrol.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..ba1cd9c04a02 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5371,6 +5371,18 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
>         if (!zswap_is_enabled())
>                 return true;
>
> +       /*
> +        * Always allow exiting tasks to push data to swap. A process in
> +        * the middle of exit cannot get OOM killed, but may need to push
> +        * uncompressible data to swap in order to get the cgroup memory
> +        * use below the limit, and make progress with the exit.
> +        */
> +       if (unlikely(current->flags & PF_EXITING)) {
> +               struct mm_struct *mm = READ_ONCE(current->mm);
> +               if (mm && mm_match_cgroup(mm, memcg))
> +                       return true;
> +       }
> +
>         for (; memcg; memcg = parent_mem_cgroup(memcg))
>                 if (!READ_ONCE(memcg->zswap_writeback))
>                         return false;
> --
> 2.47.0
>
>
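
For anyone following the control flow in the quoted hunk, here is a
minimal user-space sketch of the check. It is not kernel code:
struct memcg_model, struct task_model, MODEL_PF_EXITING,
memcg_is_descendant(), writeback_enabled() and model_demo() are all
hypothetical stand-ins. The descendant walk is only my reading of what
mm_match_cgroup() tests, and the trailing "return true" is assumed from
the part of the function that the hunk does not show.

```c
/* User-space model only; every type and name here is a simplified
 * stand-in, not the kernel's definition. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x4u   /* stand-in flag bit for PF_EXITING */

struct memcg_model {
    struct memcg_model *parent;     /* NULL for the root cgroup */
    bool zswap_writeback;           /* per-cgroup writeback setting */
};

struct task_model {
    unsigned int flags;
    struct memcg_model *mm_memcg;   /* cgroup owning the task's mm, or NULL */
};

/* Assumed semantics of mm_match_cgroup(): true if child is ancestor
 * itself or sits anywhere below it in the hierarchy. */
static bool memcg_is_descendant(const struct memcg_model *child,
                                const struct memcg_model *ancestor)
{
    for (; child; child = child->parent)
        if (child == ancestor)
            return true;
    return false;
}

/* Models the patched mem_cgroup_zswap_writeback_enabled(). */
bool writeback_enabled(const struct task_model *cur,
                       const struct memcg_model *memcg,
                       bool zswap_enabled)
{
    /* With zswap off, nothing blocks pages from going to swap. */
    if (!zswap_enabled)
        return true;

    /* The new shortcut: an exiting task may always write back pages
     * charged to its own cgroup. */
    if ((cur->flags & MODEL_PF_EXITING) && cur->mm_memcg &&
        memcg_is_descendant(cur->mm_memcg, memcg))
        return true;

    /* Existing rule: any ancestor with writeback off forbids it. */
    for (; memcg; memcg = memcg->parent)
        if (!memcg->zswap_writeback)
            return false;

    return true;    /* assumed tail of the function, not in the hunk */
}

/* Small walk-through of the three interesting cases. */
void model_demo(void)
{
    struct memcg_model root = { NULL, true };
    struct memcg_model job = { &root, false };      /* writeback off */
    struct memcg_model sibling = { &root, false };
    struct task_model running = { 0, &job };
    struct task_model exiting = { MODEL_PF_EXITING, &job };

    assert(!writeback_enabled(&running, &job, true));
    assert(writeback_enabled(&exiting, &job, true));
    assert(!writeback_enabled(&exiting, &sibling, true));
}
```

In this model a running task in a cgroup with writeback off is still
refused, an exiting task charged to that cgroup is allowed through, and
an exiting task reclaiming from a sibling cgroup is not, which is the
narrowing that mm_match_cgroup() appears to be there for.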