Date: Sun, 6 Mar 2022 15:11:23 -0800 (PST)
From: David Rientjes
To: Andrew Morton, Johannes Weiner, Michal Hocko, Yu Zhao, Dave Hansen
cc: linux-mm@kvack.org, Yosry Ahmed, Wei Xu, Shakeel Butt, Greg Thelen
Subject: [RFC] Mechanism to induce memory reclaim
Message-ID: <5df21376-7dd1-bf81-8414-32a73cea45dd@google.com>

Hi everybody,

We'd like to discuss formalizing a mechanism to induce memory reclaim by the kernel.

The current multigenerational LRU (MGLRU) proposal introduces a debugfs mechanism[1] for this. The "TMO: Transparent Memory Offloading in Datacenters" paper also discusses a per-memcg mechanism[2]. While the former is useful for debugging MGLRU, both can also be used, quite powerfully, for proactive reclaim. Google's datacenters use a similar per-memcg mechanism for the same purpose. Formalizing the mechanism would therefore let our userspace use an upstream-supported interface that is stable and consistent.

This could be an incremental addition to MGLRU's lru_gen debugfs mechanism but, since the concept has no direct dependency on that work, we believe it is useful independent of the reclaim mechanism in use (both with and without CONFIG_LRU_GEN).

Idea: introduce a per-node sysfs mechanism for inducing memory reclaim that is useful for global (non-memcg constrained) reclaim and usable even if memcg is not enabled in the kernel or not mounted. It could optionally take a memcg id to induce reclaim for a memcg hierarchy.

IOW, this would be a /sys/devices/system/node/nodeN/reclaim mechanism for each NUMA node N on the system. (It would be similar to the existing per-node sysfs "compact" mechanism used to trigger compaction from userspace.)
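For illustration, a minimal C sketch of how a userspace agent might address such per-node files. The path layout follows the existing per-node sysfs directories; the helpers node_attr_path() and write_node_attr() are hypothetical, and only the "compact" attribute exists today. The "reclaim" attribute is the proposal, not an existing ABI.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "/sys/devices/system/node/node<N>/<attr>"
 * into buf. Returns 0 on success, -1 if the path does not fit.
 */
static int node_attr_path(char *buf, size_t len, int node, const char *attr)
{
	int n = snprintf(buf, len, "/sys/devices/system/node/node%d/%s",
			 node, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write val to a per-node sysfs attribute.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_node_attr(int node, const char *attr, const char *val)
{
	char path[128];
	ssize_t ret;
	int fd, err;

	if (node_attr_path(path, sizeof(path), node, attr))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	ret = write(fd, val, strlen(val));
	err = ret < 0 ? -errno : 0;
	close(fd);
	return err;
}
```

With today's kernels, write_node_attr(0, "compact", "1") triggers compaction of node 0 (root required). Under this proposal, the same per-node addressing would be used to write a reclaim request to node0/reclaim instead.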
Userspace would write the following to this file:

 - nr_to_reclaim pages
 - swappiness factor
 - memcg_id of the hierarchy to reclaim from, if any[*]
 - flags to specify context, if any[**]

 [*] for global reclaim, or if memcg is not enabled/mounted, this is 0, since that is the return value of mem_cgroup_id() in that case

 [**] this is offered for extensibility, to specify the context in which reclaim is being done (clean file pages only, demotion for memory tiering vs eviction, etc); otherwise 0

An alternative may be to introduce a /sys/kernel/mm/reclaim mechanism that also takes a nodemask to reclaim from. The kernel would reclaim memory over the set of nodes passed to it.

Some questions to get discussion going:

 - Overall feedback or suggestions for the proposal in general?
 - This proposal specifies the amount to reclaim in pages; this could be a number of bytes instead. I have no strong opinion; does anybody else?
 - Should this be a per-node mechanism under sysfs like the existing "compact" mechanism, or should it be implemented as a single file that can optionally specify a nodemask to reclaim from?

Thanks!

 [1] https://lore.kernel.org/linux-mm/20220208081902.3550911-12-yuzhao@google.com
 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 (Section 3.3)
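To make the proposed fields concrete, a minimal C sketch of how userspace might encode a request. The proposal does not define a write format, so the space-separated "nr_to_reclaim swappiness memcg_id flags" layout, struct reclaim_request, format_reclaim_request() and bytes_to_pages() are all hypothetical, as is the assumption that swappiness shares the 0..200 range of vm.swappiness. bytes_to_pages() illustrates the pages-vs-bytes question: a bytes-based interface would need rounding like this somewhere, in userspace or in the kernel.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical request; fields mirror the list in the proposal. */
struct reclaim_request {
	unsigned long nr_to_reclaim;	/* number of pages to reclaim */
	unsigned int swappiness;	/* assumed to mirror vm.swappiness, 0..200 */
	unsigned short memcg_id;	/* 0 for global reclaim or no memcg */
	unsigned int flags;		/* reclaim context; 0 for the default */
};

/* Round a byte count up to whole pages of the running system. */
static unsigned long bytes_to_pages(unsigned long long bytes)
{
	long ps = sysconf(_SC_PAGESIZE);
	/* Fall back to 4 KiB if the page size cannot be queried. */
	unsigned long long page_size = ps > 0 ? (unsigned long long)ps : 4096ULL;

	return (unsigned long)((bytes + page_size - 1) / page_size);
}

/*
 * Encode req as "nr_to_reclaim swappiness memcg_id flags\n" (an assumed
 * format). Returns the string length, or -1 if swappiness is out of the
 * assumed range or buf is too small.
 */
static int format_reclaim_request(char *buf, size_t len,
				  const struct reclaim_request *req)
{
	int n;

	if (req->swappiness > 200)
		return -1;

	n = snprintf(buf, len, "%lu %u %hu %u\n", req->nr_to_reclaim,
		     req->swappiness, req->memcg_id, req->flags);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

In this sketch, a request for 1024 pages with swappiness 60, global reclaim and default flags encodes as "1024 60 0 0", which userspace would then write to the proposed /sys/devices/system/node/nodeN/reclaim file.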