From: Shakeel Butt
Date: Fri, 12 Jan 2024 13:27:54 -0800
Subject: Re: [PATCH] mm: memcontrol: don't throttle dying tasks on memory.high
To: Roman Gushchin
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Muchun Song, Tejun Heo, Dan Schatzberg, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240111132902.389862-1-hannes@cmpxchg.org> <20240111192807.GA424308@cmpxchg.org>

On Fri, Jan 12, 2024 at 1:00 PM Roman Gushchin wrote:
>
> On Fri, Jan 12, 2024 at 11:04:06AM -0800, Shakeel Butt wrote:
> > On Thu, Jan 11, 2024 at 11:28 AM Johannes Weiner wrote:
> > >
> > [...]
> > >
> > > From 6124a13cb073f5ff06b9c1309505bc937d65d6e5 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner
> > > Date: Thu, 11 Jan 2024 07:18:47 -0500
> > > Subject: [PATCH] mm: memcontrol: don't throttle dying tasks on memory.high
> > >
> > > While investigating hosts with high cgroup memory pressures, Tejun
> > > found culprit zombie tasks that were holding on to a lot of
> > > memory, had SIGKILL pending, but were stuck in memory.high reclaim.
> > >
> > > In the past, we used to always force-charge allocations from tasks
> > > that were exiting in order to accelerate them dying and freeing up
> > > their rss. This changed for memory.max in a4ebf1b6ca1e ("memcg:
> > > prohibit unconditional exceeding the limit of dying tasks"); it noted
> > > that this can cause (userspace inducable) containment failures, so it
> > > added a mandatory reclaim and OOM kill cycle before forcing charges.
> > > At the time, memory.high enforcement was handled in the userspace
> > > return path, which isn't reached by dying tasks, and so memory.high
> > > was still never enforced by dying tasks.
> > >
> > > When c9afe31ec443 ("memcg: synchronously enforce memory.high for large
> > > overcharges") added synchronous reclaim for memory.high, it added
> > > unconditional memory.high enforcement for dying tasks as well. The
> > > callstack shows that this path is where the zombie is stuck in.
> > >
> > > We need to accelerate dying tasks getting past memory.high, but we
> > > cannot do it quite the same way as we do for memory.max: memory.max is
> > > enforced strictly, and tasks aren't allowed to move past it without
> > > FIRST reclaiming and OOM killing if necessary. This ensures very small
> > > levels of excess. With memory.high, though, enforcement happens lazily
> > > after the charge, and OOM killing is never triggered. A lot of
> > > concurrent threads could have pushed, or could actively be pushing,
> > > the cgroup into excess. The dying task will enter reclaim on every
> > > allocation attempt, with little hope of restoring balance.
> > >
> > > To fix this, skip synchronous memory.high enforcement on dying tasks
> > > altogether again. Update memory.high path documentation while at it.
> > >
> > > Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")
> > > Reported-by: Tejun Heo
> > > Signed-off-by: Johannes Weiner
> >
> > Acked-by: Shakeel Butt
> >
> > I am wondering if you have seen or suspected a similar issue but for
> > remote memcg charging. For example pageout on a global reclaim which
> > has to allocate buffers for some other memcg.
>
> You mean dying tasks entering a direct reclaim mode?
> Or kswapd being stuck in the reclaim path?

No, I mean a normal task (neither dying nor kswapd) doing global reclaim
that may have to do pageout, which may trigger allocation of buffer
heads in folio_alloc_buffers(). We increase
current->memcg_nr_pages_over_high irrespective of whether current is in
the target memcg or not. Basically, I just want to know if this is a
real concern or can be ignored for now.
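
To make the remote-charging question concrete, here is a toy userspace
model. It is only a sketch and not kernel code: struct toy_memcg,
struct toy_task, toy_charge() and toy_handle_over_high() are
hypothetical names invented for illustration, and the only thing the
model assumes is what is described above, namely that the charging
task's over-high counter is bumped whether or not the task belongs to
the charged cgroup, and that dying tasks skip memory.high enforcement.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; these are not the kernel's structures. */
struct toy_memcg {
    const char *name;
    long usage;               /* pages currently charged */
    long high;                /* memory.high, in pages */
};

struct toy_task {
    struct toy_memcg *memcg;  /* cgroup the task itself belongs to */
    long nr_pages_over_high;  /* models current->memcg_nr_pages_over_high */
    bool dying;               /* models a task with a fatal signal pending */
};

/*
 * Charge nr pages to target on behalf of cur. The counter on cur is
 * bumped whenever target is over its high limit, without checking
 * whether cur is a member of target (the remote-charging case).
 */
static void toy_charge(struct toy_task *cur, struct toy_memcg *target, long nr)
{
    assert(nr > 0);
    target->usage += nr;
    if (target->usage > target->high)
        cur->nr_pages_over_high += nr;
}

/*
 * Models the memory.high enforcement point. Returns true if the task
 * would go on to reclaim and throttling, false if there is nothing to
 * do or the task is dying and enforcement is skipped.
 */
static bool toy_handle_over_high(struct toy_task *cur)
{
    if (cur->nr_pages_over_high == 0)
        return false;
    cur->nr_pages_over_high = 0;
    if (cur->dying)
        return false;         /* the behaviour the patch restores */
    return true;
}
```

In this model, a task in cgroup A that charges 20 pages to an
over-limit cgroup B ends up with nr_pages_over_high == 20 even though
A's own usage never changed, so a non-dying task would still reach the
enforcement path. Whether that matters in practice is exactly the open
question; the model does not answer it.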