From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    shuah@kernel.org
Subject: [PATCH v3 0/5] workload-specific and memory pressure-driven zswap writeback
Date: Tue, 17 Oct 2023 16:21:47 -0700
Message-Id: <20231017232152.2605440-1-nphamcs@gmail.com>

Changelog:
v3:
  * Add a patch to export per-cgroup zswap writeback counters
  * Add a patch to update zswap's kselftest
  * Separate the new list_lru functions into their own prep patch
  * Do not start from the top of the hierarchy when encountering a memcg
    that is not online for the global limit zswap writeback (patch 2)
    (suggested by Yosry Ahmed)
  * Do not
    remove the swap entry from list_lru in __read_swapcache_async()
    (patch 2) (suggested by Yosry Ahmed)
  * Remove a redundant zswap pool get (patch 2)
    (reported by Ryan Roberts)
  * Use an atomic for nr_zswap_protected (instead of the lruvec's lock)
    (patch 5) (suggested by Yosry Ahmed)
  * Remove the per-cgroup zswap shrinker knob (patch 5)
    (suggested by Yosry Ahmed)
v2:
  * Fix loongarch compiler errors
  * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to
   perform workload-specific shrinking - a memcg under memory pressure
   cannot determine which pages in the pool it owns, and often ends up
   writing back pages from other memcgs. This issue has been previously
   observed in practice and mitigated by simply disabling
   memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not
   have an avenue for a memcg to free up its own memory locked up in
   the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit.
   This means that if we set the limit too high, cold data that is
   unlikely to be used again will reside in the pool, wasting precious
   memory. It is hard to predict how much zswap space will be needed
   ahead of time, as this depends on the workload (specifically, on
   factors such as memory access patterns and compressibility of the
   memory pages).

This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performing workload-specific
(i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
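
To illustrate the idea of keeping one LRU per (memcg, NUMA node) pair,
here is a minimal userspace sketch in C. It is a toy model only, not the
code in this series: struct entry, struct lru, lru_add() and shrink()
are hypothetical names made up for illustration, the fixed-size arrays
stand in for real lists, and nothing here uses the kernel's list_lru
API. The point it shows is that a shrink request scoped to one memcg
and node can only ever pick entries that memcg owns.

```c
#include <assert.h>
#include <stdio.h>

#define NR_MEMCGS   2   /* hypothetical number of cgroups */
#define NR_NODES    2   /* hypothetical number of NUMA nodes */
#define MAX_ENTRIES 16  /* toy capacity of each list */

/* Hypothetical stand-in for one compressed page stored in the pool. */
struct entry {
    int id;
    int memcg;
    int node;
};

/* One list per (memcg, node) pair; index 0 holds the coldest entry. */
struct lru {
    struct entry items[MAX_ENTRIES];
    int count;
};

static struct lru lrus[NR_MEMCGS][NR_NODES];

/* Append an entry to the tail (hot end) of the list of its owner. */
static void lru_add(struct entry e)
{
    assert(e.memcg >= 0 && e.memcg < NR_MEMCGS);
    assert(e.node >= 0 && e.node < NR_NODES);

    struct lru *l = &lrus[e.memcg][e.node];
    assert(l->count < MAX_ENTRIES);
    l->items[l->count++] = e;
}

/*
 * "Write back" up to nr of the coldest entries owned by the given memcg
 * on the given node. Entries of other memcgs are never touched, because
 * they live on different lists. Returns the number of entries evicted.
 */
static int shrink(int memcg, int node, int nr)
{
    struct lru *l = &lrus[memcg][node];
    int n = nr < l->count ? nr : l->count;

    for (int i = 0; i < n; i++)
        printf("write back entry %d (memcg %d, node %d)\n",
               l->items[i].id, memcg, node);

    /* Slide the surviving entries down to the front of the array. */
    for (int i = n; i < l->count; i++)
        l->items[i - n] = l->items[i];
    l->count -= n;

    return n;
}
```

With a single global list, shrink() would have to walk entries in age
order regardless of owner, which is the behavior described in issue 1
above. With per-owner lists, a call such as shrink(1, 0, 2) drains only
memcg 1's entries on node 0 and leaves memcg 0's list untouched.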

As a proof of concept, we ran the following synthetic benchmark: we
built the Linux kernel in a memory-limited cgroup and allocated some
cold data in tmpfs, to see whether the shrinker could write it out and
improve overall performance. Depending on the amount of cold data
generated, we observed a 14% to 35% reduction in the kernel CPU time
used by the kernel builds.

Domenico Cerasuolo (3):
  zswap: make shrinking memcg-aware
  mm: memcg: add per-memcg zswap writeback stat
  selftests: cgroup: update per-memcg zswap writeback selftest

Nhat Pham (2):
  mm: list_lru: allow external numa node and cgroup tracking
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst      |   7 +
 include/linux/list_lru.h                    |  38 +++
 include/linux/memcontrol.h                  |   7 +
 include/linux/mmzone.h                      |  14 +
 mm/list_lru.c                               |  43 ++-
 mm/memcontrol.c                             |  15 +
 mm/mmzone.c                                 |   3 +
 mm/swap.h                                   |   3 +-
 mm/swap_state.c                             |  38 ++-
 mm/zswap.c                                  | 335 ++++++++++++++++----
 tools/testing/selftests/cgroup/test_zswap.c |  74 +++--
 11 files changed, 485 insertions(+), 92 deletions(-)

-- 
2.34.1