Date: Wed, 6 Dec 2023 11:10:50 +0700
From: Bagas Sanjaya
To: Nhat Pham, akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org,
    kernel-team@meta.com, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org, shuah@kernel.org
Subject: Re: [PATCH v8 0/6] workload-specific and memory pressure-driven zswap writeback
References: <20231130194023.4102148-1-nphamcs@gmail.com>
In-Reply-To: <20231130194023.4102148-1-nphamcs@gmail.com>

On Thu, Nov 30, 2023 at 11:40:17AM -0800, Nhat Pham wrote:
> Changelog:
> v8:
>    * Fixed a couple of build errors in the case of !CONFIG_MEMCG
>    * Simplified the online memcg selection scheme for the zswap global
>      limit reclaim (suggested by Michal Hocko and Johannes Weiner)
>      (patch 2 and patch 3)
>    * Added a new kconfig to allow users to enable the zswap shrinker by
>      default (suggested by Johannes Weiner) (patch 6)
> v7:
>    * Added the mem_cgroup_iter_online() function to the API for the new
>      behavior (suggested by Andrew Morton) (patch 2)
>    * Fixed a missing list_lru_del -> list_lru_del_obj (patch 1)
> v6:
>    * Rebase on top of latest mm-unstable.
>    * Fix/improve the in-code documentation of the new list_lru
>      manipulation functions (patch 1)
> v5:
>    * Replace reference getting with an rcu_read_lock() section for
>      zswap lru modifications (suggested by Yosry)
>    * Add a new prep patch that allows mem_cgroup_iter() to return
>      an online cgroup.
>    * Add a callback that updates pool->next_shrink when the cgroup is
>      offlined (suggested by Yosry Ahmed, Johannes Weiner)
> v4:
>    * Rename list_lru_add to list_lru_add_obj and __list_lru_add to
>      list_lru_add (patch 1) (suggested by Johannes Weiner and
>      Yosry Ahmed)
>    * Some cleanups on the memcg aware LRU patch (patch 2)
>      (suggested by Yosry Ahmed)
>    * Use event interface for the new per-cgroup writeback counters.
>      (patch 3) (suggested by Yosry Ahmed)
>    * Abstract zswap's lruvec states and handling into
>      zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
> v3:
>    * Add a patch to export per-cgroup zswap writeback counters
>    * Add a patch to update zswap's kselftest
>    * Separate the new list_lru functions into their own prep patch
>    * Do not start from the top of the hierarchy when encountering a memcg
>      that is not online for the global limit zswap writeback (patch 2)
>      (suggested by Yosry Ahmed)
>    * Do not remove the swap entry from list_lru in
>      __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
>    * Removed a redundant zswap pool getting (patch 2)
>      (reported by Ryan Roberts)
>    * Use atomic for the nr_zswap_protected (instead of lruvec's lock)
>      (patch 5) (suggested by Yosry Ahmed)
>    * Remove the per-cgroup zswap shrinker knob (patch 5)
>      (suggested by Yosry Ahmed)
> v2:
>    * Fix loongarch compiler errors
>    * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
>
> There are currently several issues with zswap writeback:
>
> 1. There is only a single global LRU for zswap, making it impossible to
>    perform workload-specific shrinking - a memcg under memory pressure
>    cannot determine which pages in the pool it owns, and often ends up
>    writing pages from other memcgs. This issue has been previously
>    observed in practice and mitigated by simply disabling
>    memcg-initiated shrinking:
>
>    https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
>
>    But this solution leaves a lot to be desired, as we still do not
>    have an avenue for a memcg to free up its own memory locked up in
>    the zswap pool.
>
> 2. We only shrink the zswap pool when the user-defined limit is hit.
>    This means that if we set the limit too high, cold data that are
>    unlikely to be used again will reside in the pool, wasting precious
>    memory.
>    It is hard to predict how much zswap space will be needed
>    ahead of time, as this depends on the workload (specifically, on
>    factors such as memory access patterns and compressibility of the
>    memory pages).
>
> This patch series solves these issues by separating the global zswap
> LRU into per-memcg and per-NUMA LRUs, and performing workload-specific
> (i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
> new shrinker does not have any parameter that must be tuned by the
> user, and can be opted in or out on a per-memcg basis.
>
> As a proof of concept, we ran the following synthetic benchmark:
> build the Linux kernel in a memory-limited cgroup, and allocate some
> cold data in tmpfs to see if the shrinker could write them out and
> improve the overall performance. Depending on the amount of cold data
> generated, we observe from 14% to 35% reduction in kernel CPU time used
> in the kernel builds.
>
> Domenico Cerasuolo (3):
>   zswap: make shrinking memcg-aware
>   mm: memcg: add per-memcg zswap writeback stat
>   selftests: cgroup: update per-memcg zswap writeback selftest
>
> Nhat Pham (3):
>   list_lru: allows explicit memcg and NUMA node selection
>   memcontrol: implement mem_cgroup_tryget_online()
>   zswap: shrinks zswap pool based on memory pressure
>
>  Documentation/admin-guide/mm/zswap.rst      |  10 +
>  drivers/android/binder_alloc.c              |   7 +-
>  fs/dcache.c                                 |   8 +-
>  fs/gfs2/quota.c                             |   6 +-
>  fs/inode.c                                  |   4 +-
>  fs/nfs/nfs42xattr.c                         |   8 +-
>  fs/nfsd/filecache.c                         |   4 +-
>  fs/xfs/xfs_buf.c                            |   6 +-
>  fs/xfs/xfs_dquot.c                          |   2 +-
>  fs/xfs/xfs_qm.c                             |   2 +-
>  include/linux/list_lru.h                    |  54 ++-
>  include/linux/memcontrol.h                  |  15 +
>  include/linux/mmzone.h                      |   2 +
>  include/linux/vm_event_item.h               |   1 +
>  include/linux/zswap.h                       |  27 +-
>  mm/Kconfig                                  |  14 +
>  mm/list_lru.c                               |  48 ++-
>  mm/memcontrol.c                             |   3 +
>  mm/mmzone.c                                 |   1 +
>  mm/swap.h                                   |   3 +-
>  mm/swap_state.c                             |  26 +-
>  mm/vmstat.c                                 |   1 +
>  mm/workingset.c                             |   4 +-
>  mm/zswap.c                                  | 456 +++++++++++++++++---
>  tools/testing/selftests/cgroup/test_zswap.c |  74 ++--
>  25 files changed, 661 insertions(+), 125 deletions(-)
>

Carrying over from v7:

Tested-by: Bagas Sanjaya

-- 
An old man doll... just what I always wanted! - Clara