From: Nhat Pham
Date: Thu, 12 Dec 2024 10:18:51 -0800
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
To: Rik van Riel
Cc: Shakeel Butt, Yosry Ahmed, Balbir Singh, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com

On Thu, Dec 12, 2024 at 10:03 AM Rik van Riel wrote:
>
> On Thu, 2024-12-12 at 09:51 -0800, Shakeel Butt wrote:
> >
> > The fundamental issue is that the exiting process (killed by oomd or
> > simple exit) has to allocated memory but the cgroup is at limit and the
> > reclaim is very very slow.
> >
> > I can see attacking this issue with multiple angles.
>
> Besides your proposed ideas, I suppose we could also limit
> the gfp_mask of an exiting reclaimer with eg. __GFP_NORETRY,
> but I do not know how effective that would be, since a single
> pass through the memory reclaim code was still taking dozens
> of seconds when I traced the "stuck" workloads.

I know we already discussed this, but it'd be nice if we could let the
exiting task go ahead with the page fault and bypass the memory
limits, if the page fault is crucial for it to make forward progress.
Not sure how feasible that is, or how to decide which page fault is
really crucial though :)

For the pathological case where memory.zswap.writeback is disabled in
particular, another thing we can do here is to make these
incompressible pages ineligible for further reclaim attempts, either
by putting them on a non-reclaim LRU, or by putting them in the zswap
LRU to maintain total ordering of the LRUs. That way we can move on to
other sources (slab caches for example) quicker, or fail earlier? That
said, it remains to be seen what will happen if these incompressible
pages are literally all that are left...?

I'm biased toward this idea though, because it has other benefits.
Maybe I'm just looking for excuses to revive the project ;)
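
To make the "non-reclaim LRU" idea a bit more concrete, here is a toy
userspace model, not kernel code. Everything in it (Page, reclaim_pass,
total_scanned, the "parked" list) is a made-up name for illustration,
and it assumes that a page either compresses and gets freed, or fails
to compress and stays in memory because writeback is disabled. It only
counts how many pages repeated reclaim passes have to look at, with
and without parking the incompressible ones.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page on a reclaimable LRU."""
    pfn: int
    compressible: bool


def reclaim_pass(lru, parked, park_incompressible):
    """Scan every page currently on the LRU once.

    Returns (reclaimed, scanned). Compressible pages are treated as
    stored and freed. Incompressible pages are either rotated back onto
    the LRU (so the next pass scans them again) or moved to the
    'parked' list, which later passes never look at.
    """
    reclaimed = 0
    scanned = 0
    for _ in range(len(lru)):
        page = lru.popleft()
        scanned += 1
        if page.compressible:
            reclaimed += 1
        elif park_incompressible:
            parked.append(page)
        else:
            lru.append(page)
    return reclaimed, scanned


def total_scanned(pages, passes, park_incompressible):
    """Total pages scanned over several back-to-back reclaim passes."""
    lru = deque(pages)
    parked = []
    total = 0
    for _ in range(passes):
        _, scanned = reclaim_pass(lru, parked, park_incompressible)
        total += scanned
    return total


if __name__ == "__main__":
    # Two compressible pages followed by three incompressible ones.
    pages = [Page(pfn=i, compressible=(i < 2)) for i in range(5)]
    print("rotate back:", total_scanned(pages, 3, False))
    print("park       :", total_scanned(pages, 3, True))
```

In this model, parking means the second and later passes have nothing
left to rescan, which is the "move on to other sources quicker, or fail
earlier" effect. It says nothing about the open question above, i.e.
what to do when the parked pages are all that is left.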