From: Yafang Shao
Date: Mon, 26 Feb 2024 20:34:07 +0800
Subject: Re: [RFC PATCH] mm: Add reclaim type to memory.reclaim
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
References: <20240225114204.50459-1-laoar.shao@gmail.com>

On Mon, Feb 26, 2024 at 12:17 PM Matthew Wilcox wrote:
>
> On Sun, Feb 25, 2024 at 07:42:04PM +0800, Yafang Shao wrote:
> > In our container environment, we've observed that certain containers may
> > accumulate more than 40GB of slabs, predominantly negative dentries. These
> > negative dentries remain unreclaimed unless there is memory pressure. Even
> > after the containers exit, these negative dentries persist. To manage disk
> > storage efficiently, we employ an agent that identifies container images
> > eligible for destruction once all instances of that image exit.
>
> I understand why you've written this patch, but we really do need to fix
> this for non-container workloads.
> See also:
>
> https://lore.kernel.org/all/20220402072103.5140-1-hdanton@sina.com/
>
> https://lore.kernel.org/linux-fsdevel/1611235185-1685-1-git-send-email-gautham.ananthakrishna@oracle.com/
>
> https://lore.kernel.org/all/YjDvRPuxPN0GsxLB@casper.infradead.org/
>
> I'm sure there have been many other threads on this over the years.

Thank you for sharing your insights.

I've reviewed the proposals and related discussions. It appears that no
consensus has been reached yet on how to tackle the issue. While I may
not fully comprehend all aspects of the discussions, it seems that the
challenges stemming from slab shrinking can be distilled into four key
questions:

- When should the shrinker be triggered?
- Which task is responsible for performing the shrinking?
- Which slab should be reclaimed?
- How many slabs should be reclaimed?

Addressing all these questions within the kernel might introduce
unnecessary complexity. Instead, one potential approach could be to
extend the functionality of memory.reclaim or introduce a new
interface, such as memory.shrinker, and delegate decision-making to
userspace based on the workload. Since memory.reclaim is also supported
in the root memcg, it can effectively address issues outside of
container environments. Here's a rough idea, which needs validation:

1. Expose detailed shrinker information via debugfs

   We've already exposed details of the slab through
   /sys/kernel/debug/slab, so extending this to include shrinker details
   shouldn't be too challenging. For example, for the dentry shrinker,
   we could expose
   /sys/kernel/debug/shrinker/super_cache_scan/{shrinker_id, kmem_cache, ...}.

2. Shrink specific slabs with a specific count

   This could be implemented by extending memory.reclaim with parameters
   like "shrinker_id=" and "scan_count=". Currently, memory.reclaim is
   byte-based, which isn't ideal for shrinkers due to the deferred
   freeing of slabs. Using scan_count to specify the number of slabs to
   reclaim could be more effective.
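
To make idea 2 a bit more concrete, here is a minimal userspace sketch
of how an agent might format such a request. Everything in it is an
assumption for illustration: "shrinker_id=" and "scan_count=" are only
the parameters proposed above and are not part of the current
memory.reclaim interface, the space-separated key=value layout is a
guess, and the helper names and the value 3 for the shrinker id are
hypothetical.

```python
from pathlib import Path

# Keys proposed in this mail; NOT accepted by current kernels.
PROPOSED_KEYS = ("shrinker_id", "scan_count")


def build_reclaim_request(shrinker_id: int, scan_count: int) -> str:
    """Format the proposed (hypothetical) key=value request string."""
    if shrinker_id < 0:
        raise ValueError("shrinker_id must be non-negative")
    if scan_count <= 0:
        raise ValueError("scan_count must be positive")
    return f"shrinker_id={shrinker_id} scan_count={scan_count}"


def parse_reclaim_request(request: str) -> dict:
    """Parse the string roughly the way a handler might (assumed format)."""
    fields = {}
    for token in request.split():
        key, sep, value = token.partition("=")
        if not sep or key not in PROPOSED_KEYS:
            raise ValueError(f"unexpected token: {token!r}")
        fields[key] = int(value)
    return fields


def request_shrink(cgroup_dir: str, shrinker_id: int, scan_count: int) -> None:
    """Write the request to <cgroup_dir>/memory.reclaim.

    A kernel without the proposed extension would be expected to reject
    this write, so this only shows what the agent side could look like.
    """
    target = Path(cgroup_dir) / "memory.reclaim"
    target.write_text(build_reclaim_request(shrinker_id, scan_count))


if __name__ == "__main__":
    req = build_reclaim_request(shrinker_id=3, scan_count=1000)
    print(req)
    print(parse_reclaim_request(req))
```

The point of the sketch is only that the agent would pick the shrinker
and the count itself, based on what it reads from debugfs, rather than
passing a byte target and leaving the choice to the kernel.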

These are preliminary ideas, and I welcome any feedback.

Additionally, since this patch offers a straightforward solution to
address the issue in container environments, would it be feasible to
apply this patch initially?

-- 
Regards
Yafang