Date: Thu, 12 Dec 2024 12:45:13 -0800
From: Johannes Weiner
To: Michal Hocko
Cc: Rik van Riel, kernel-team@meta.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, cgroups@vger.kernel.org
Subject: Re: [PATCH] mm: allow exiting processes to exceed the memory.max limit
Message-ID: <20241212204513.GA50370@cmpxchg.org>
References: <20241209124233.3543f237@fangorn>

On Mon, Dec 09, 2024 at 07:08:19PM +0100, Michal Hocko wrote:
> On Mon 09-12-24 12:42:33, Rik van Riel wrote:
> > It is possible for programs to get stuck in exit, when their
> > memcg is at or above the memory.max limit, and things like
> > the do_futex() call from mm_release() need to page memory in.
> >
> > This can hang forever, but it really doesn't have to.
>
> Are you sure this is really happening?
>
> >
> > The amount of memory that the exit path will page into memory
> > should be relatively small, and letting exit proceed faster
> > will free up memory faster.
> >
> > Allow PF_EXITING tasks to bypass the cgroup memory.max limit
> > the same way PF_MEMALLOC already does.
> >
> > Signed-off-by: Rik van Riel
> > ---
> >  mm/memcontrol.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 7b3503d12aaf..d1abef1138ff 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2218,11 +2218,12 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >
> >  	/*
> >  	 * Prevent unbounded recursion when reclaim operations need to
> > -	 * allocate memory. This might exceed the limits temporarily,
> > -	 * but we prefer facilitating memory reclaim and getting back
> > -	 * under the limit over triggering OOM kills in these cases.
> > +	 * allocate memory, or the process is exiting. This might exceed
> > +	 * the limits temporarily, but we prefer facilitating memory reclaim
> > +	 * and getting back under the limit over triggering OOM kills in
> > +	 * these cases.
> >  	 */
> > -	if (unlikely(current->flags & PF_MEMALLOC))
> > +	if (unlikely(current->flags & (PF_MEMALLOC | PF_EXITING)))
> >  		goto force;
>
> We already have task_is_dying() bail out. Why is that insufficient?

Note that the current one goes to nomem, which causes the fault to
simply retry. It doesn't actually make forward progress.

> It is currently hitting when the oom situation is triggered while your
> patch is triggering this much earlier. We used to do that in the past
> but this got changed by a4ebf1b6ca1e ("memcg: prohibit unconditional
> exceeding the limit of dying tasks"). I believe the situation in vmalloc
> has changed since then but I suspect the fundamental problem that the
> amount of memory dying tasks could allocate a lot of memory stays.

Before that patch, *every* exiting task was allowed to bypass. That
doesn't seem right, either. But IMO this patch then tossed the baby
out with the bathwater; at least the OOM vic needs to make progress.

> There is still this
> : It has been observed that it is not really hard to trigger these
> : bypasses and cause global OOM situation.
> that really needs to be re-evaluated.

This is quite vague, yeah. And not clear if a single task was doing
this, or a large number of concurrently exiting tasks all being
allowed to bypass without even trying.

I'm guessing the latter, simply because OOM victims *are* allowed to
tap into the page_alloc reserves; we'd have seen deadlocks if a single
task's exit path vmallocing could blow the lid on these.

I sent a patch in the other thread, we should discuss over there. I
just wanted to address those two points made here.
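
For illustration only, the difference between the "nomem" and "force"
exits discussed above can be modeled in user space. This is a rough
sketch, not kernel code: the MODEL_* flags, struct memcg_model, the
policy enum and both functions are hypothetical names invented here,
and reclaim and the OOM killer are left out entirely. The third policy
(only the OOM victim bypasses) is just one possible reading of "at
least the OOM vic needs to make progress"; it is not the patch
mentioned as sent in the other thread.

```c
/*
 * User-space model of the charge decision discussed in this thread.
 * All names below are hypothetical; nothing here is kernel API.
 */

/* Illustrative task flags standing in for the kernel's task state. */
enum {
	MODEL_MEMALLOC   = 1 << 0,	/* stands in for PF_MEMALLOC */
	MODEL_EXITING    = 1 << 1,	/* stands in for PF_EXITING */
	MODEL_OOM_VICTIM = 1 << 2,	/* task was chosen by the OOM killer */
};

enum charge_result { CHARGE_OK, CHARGE_FORCED, CHARGE_NOMEM };

enum charge_policy {
	POLICY_DYING_TO_NOMEM,	/* dying tasks fail the charge and must retry */
	POLICY_EXITING_FORCED,	/* quoted patch: exiting tasks bypass the limit */
	POLICY_VICTIM_FORCED,	/* assumption: only the OOM victim bypasses */
};

struct memcg_model {
	long usage;	/* pages currently charged */
	long max;	/* models memory.max, in pages */
};

/* Try to charge nr_pages; over the limit, the policy picks nomem or force. */
enum charge_result model_try_charge(struct memcg_model *cg, unsigned int flags,
				    long nr_pages, enum charge_policy policy)
{
	if (cg->usage + nr_pages <= cg->max) {
		cg->usage += nr_pages;
		return CHARGE_OK;
	}

	/* Reclaim context may always exceed the limit in this model. */
	if (flags & MODEL_MEMALLOC)
		goto force;
	if (policy == POLICY_EXITING_FORCED && (flags & MODEL_EXITING))
		goto force;
	if (policy == POLICY_VICTIM_FORCED && (flags & MODEL_OOM_VICTIM))
		goto force;

	/* Nothing is freed in this model, so a retry sees the same state. */
	return CHARGE_NOMEM;

force:
	cg->usage += nr_pages;	/* temporarily exceed the limit */
	return CHARGE_FORCED;
}

/*
 * Model a fault that retries on nomem. Returns the attempt number on
 * which the charge went through, or -1 if it never did.
 */
int model_fault_attempts(struct memcg_model *cg, unsigned int flags,
			 long nr_pages, enum charge_policy policy,
			 int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (model_try_charge(cg, flags, nr_pages, policy) != CHARGE_NOMEM)
			return i;
	}
	return -1;
}
```

With a group sitting exactly at its limit, an exiting task under
POLICY_DYING_TO_NOMEM never gets its page no matter how often the fault
is retried, while either bypass policy lets the charge through on the
first attempt at the cost of briefly exceeding the limit.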