From: Yu Zhao
Date: Sun, 6 Mar 2022 17:49:49 -0700
Subject: Re: [RFC] Mechanism to induce memory reclaim
To: David Rientjes, Andrea Righi
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Dave Hansen, Linux-MM, Yosry Ahmed, Wei Xu, Shakeel Butt, Greg Thelen
In-Reply-To: <5df21376-7dd1-bf81-8414-32a73cea45dd@google.com>

On Sun, Mar 6, 2022 at 4:11 PM David Rientjes wrote:
>
> Hi everybody,
>
> We'd like to discuss formalizing a mechanism to induce memory reclaim by
> the kernel.
>
> The current multigenerational LRU proposal introduces a debugfs
> mechanism[1] for this. The "TMO: Transparent Memory Offloading in
> Datacenters" paper also discusses a per-memcg mechanism[2]. While the
> former can be used for debugging of MGLRU, both can quite powerfully be
> used for proactive reclaim.
>
> Google's datacenters use a similar per-memcg mechanism for the same
> purpose. Thus, formalizing the mechanism would allow our userspace to use
> an upstream supported interface that will be stable and consistent.
>
> This could be an incremental addition to MGLRU's lru_gen debugfs mechanism
> but, since the concept has no direct dependency on the work, we believe it
> is useful independent of the reclaim mechanism in use (both with and
> without CONFIG_LRU_GEN).
>
> Idea: introduce a per-node sysfs mechanism for inducing memory reclaim
> that can be useful for global (non-memcg constrained) reclaim and possible
> even if memcg is not enabled in the kernel or mounted. This could
> optionally take a memcg id to induce reclaim for a memcg hierarchy.
>
> IOW, this would be a /sys/devices/system/node/nodeN/reclaim mechanism for
> each NUMA node N on the system. (It would be similar to the existing
> per-node sysfs "compact" mechanism used to trigger compaction from
> userspace.)
>
> Userspace would write the following to this file:
> - nr_to_reclaim pages
> - swappiness factor
> - memcg_id of the hierarchy to reclaim from, if any[*]
> - flags to specify context, if any[**]
>
> [*] if global reclaim or memcg is not enabled/mounted, this is 0 since
>     this is the return value of mem_cgroup_id()
> [**] this is offered for extensibility to specify the context in which
>      reclaim is being done (clean file pages only, demotion for memory
>      tiering vs eviction, etc), otherwise 0
>
> An alternative may be to introduce a /sys/kernel/mm/reclaim mechanism that
> also takes a nodemask to reclaim from. The kernel would reclaim memory
> over the set of nodes passed to it.
>
> Some questions to get discussion going:
>
> - Overall feedback or suggestions for the proposal in general?
>
> - This proposal uses a value specified in pages to reclaim; this could be
>   a number of bytes instead. I have no strong opinion, does anybody
>   else?
>
> - Should this be a per-node mechanism under sysfs like the existing
>   "compact" mechanism or should it be implemented as a single file that
>   can optionally specify a nodemask to reclaim from?
>
> Thanks!
>
> [1] https://lore.kernel.org/linux-mm/20220208081902.3550911-12-yuzhao@google.com
> [2] https://dl.acm.org/doi/10.1145/3503222.3507731 (Section 3.3)

Adding Canonical who also provided additional use cases [3] for this
potential ABI.

[3] https://lore.kernel.org/lkml/20201005081313.732745-1-andrea.righi@canonical.com/
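
For illustration only, here is a rough sketch of how userspace might
drive the per-node file described in the quoted proposal. The RFC lists
the four values but does not specify an encoding, so the space-separated
decimal format, the field order, and the helper names (reclaim_path,
format_request, induce_reclaim) are all assumptions made for this
example, not a proposed ABI.

```python
def reclaim_path(node: int) -> str:
    """Return the per-node sysfs path named in the RFC for NUMA node `node`."""
    return f"/sys/devices/system/node/node{node}/reclaim"


def format_request(nr_to_reclaim: int, swappiness: int,
                   memcg_id: int = 0, flags: int = 0) -> str:
    """Encode the four values the RFC says userspace would write.

    ASSUMPTION: four space-separated decimal fields, in the order the
    RFC lists them. memcg_id=0 means global reclaim and flags=0 means
    no special context, matching the footnotes in the RFC.
    """
    fields = (nr_to_reclaim, swappiness, memcg_id, flags)
    if any(value < 0 for value in fields):
        raise ValueError("all fields must be non-negative")
    return " ".join(str(value) for value in fields)


def induce_reclaim(node: int, nr_to_reclaim: int, swappiness: int,
                   memcg_id: int = 0, flags: int = 0) -> None:
    """Write one reclaim request to the (hypothetical) per-node file.

    Raises OSError if the running kernel does not expose the file or
    rejects the request.
    """
    request = format_request(nr_to_reclaim, swappiness, memcg_id, flags)
    with open(reclaim_path(node), "w") as f:
        f.write(request)
```

With these assumptions, induce_reclaim(0, 1024, 60) would ask node 0 to
reclaim 1024 pages globally with a swappiness of 60. Under the
/sys/kernel/mm/reclaim alternative, the same request would need an
extra nodemask field instead of a per-node path.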