From: Chris Li
Date: Wed, 8 Nov 2023 11:46:20 -0800
Subject: Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback
To: Nhat Pham
Cc: Andrew Morton, Johannes Weiner, Domenico Cerasuolo, Yosry Ahmed,
    Seth Jennings, Dan Streetman, Vitaly Wool, mhocko@kernel.org,
    roman.gushchin@linux.dev, Shakeel Butt, muchun.song@linux.dev,
    linux-mm, kernel-team@meta.com, LKML, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    shuah@kernel.org
In-Reply-To: <20231106183159.3562879-1-nphamcs@gmail.com>

Hi Nhat,

Sorry for being late to the party. I want to take a look at your patch
series. However, I wasn't able to "git am" it cleanly on current
mm-stable, mm-unstable, or linux tip.

$ git am patches/v5_20231106_nphamcs_workload_specific_and_memory_pressure_driven_zswap_writeback.mbx
Applying: list_lru: allows explicit memcg and NUMA node selection
Applying: memcontrol: allows mem_cgroup_iter() to check for onlineness
Applying: zswap: make shrinking memcg-aware (fix)
error: patch failed: mm/zswap.c:174
error: mm/zswap.c: patch does not apply
Patch failed at 0003 zswap: make shrinking memcg-aware (fix)

What is the base of your patches? A git hash or a branch I can pull
from would be nice.

Thanks

Chris

On Mon, Nov 6, 2023 at 10:32 AM Nhat Pham wrote:
>
> Changelog:
> v5:
>    * Replace reference getting with an rcu_read_lock() section for
>      zswap lru modifications (suggested by Yosry)
>    * Add a new prep patch that allows mem_cgroup_iter() to return
>      online cgroup.
>    * Add a callback that updates pool->next_shrink when the cgroup is
>      offlined (suggested by Yosry Ahmed, Johannes Weiner)
> v4:
>    * Rename list_lru_add to list_lru_add_obj and __list_lru_add to
>      list_lru_add (patch 1) (suggested by Johannes Weiner and
>      Yosry Ahmed)
>    * Some cleanups on the memcg aware LRU patch (patch 2)
>      (suggested by Yosry Ahmed)
>    * Use event interface for the new per-cgroup writeback counters.
>      (patch 3) (suggested by Yosry Ahmed)
>    * Abstract zswap's lruvec states and handling into
>      zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
> v3:
>    * Add a patch to export per-cgroup zswap writeback counters
>    * Add a patch to update zswap's kselftest
>    * Separate the new list_lru functions into their own prep patch
>    * Do not start from the top of the hierarchy when encountering a
>      memcg that is not online for the global limit zswap writeback
>      (patch 2) (suggested by Yosry Ahmed)
>    * Do not remove the swap entry from list_lru in
>      __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
>    * Removed a redundant zswap pool getting (patch 2)
>      (reported by Ryan Roberts)
>    * Use atomic for the nr_zswap_protected (instead of lruvec's lock)
>      (patch 5) (suggested by Yosry Ahmed)
>    * Remove the per-cgroup zswap shrinker knob (patch 5)
>      (suggested by Yosry Ahmed)
> v2:
>    * Fix loongarch compiler errors
>    * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
>
> There are currently several issues with zswap writeback:
>
> 1. There is only a single global LRU for zswap, making it impossible to
>    perform workload-specific shrinking - a memcg under memory pressure
>    cannot determine which pages in the pool it owns, and often ends up
>    writing pages from other memcgs. This issue has been previously
>    observed in practice and mitigated by simply disabling
>    memcg-initiated shrinking:
>
>    https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
>
>    But this solution leaves a lot to be desired, as we still do not
>    have an avenue for a memcg to free up its own memory locked up in
>    the zswap pool.
>
> 2. We only shrink the zswap pool when the user-defined limit is hit.
>    This means that if we set the limit too high, cold data that are
>    unlikely to be used again will reside in the pool, wasting precious
>    memory.
>    It is hard to predict how much zswap space will be needed
>    ahead of time, as this depends on the workload (specifically, on
>    factors such as memory access patterns and compressibility of the
>    memory pages).
>
> This patch series solves these issues by separating the global zswap
> LRU into per-memcg and per-NUMA LRUs, and performing workload-specific
> (i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
> new shrinker does not have any parameter that must be tuned by the
> user, and can be opted in or out on a per-memcg basis.
>
> As a proof of concept, we ran the following synthetic benchmark:
> build the linux kernel in a memory-limited cgroup, and allocate some
> cold data in tmpfs to see if the shrinker could write them out and
> improve the overall performance. Depending on the amount of cold data
> generated, we observe from 14% to 35% reduction in kernel CPU time used
> in the kernel builds.
>
> Domenico Cerasuolo (3):
>   zswap: make shrinking memcg-aware
>   mm: memcg: add per-memcg zswap writeback stat
>   selftests: cgroup: update per-memcg zswap writeback selftest
>
> Nhat Pham (3):
>   list_lru: allows explicit memcg and NUMA node selection
>   memcontrol: allows mem_cgroup_iter() to check for onlineness
>   zswap: shrinks zswap pool based on memory pressure
>
>  Documentation/admin-guide/mm/zswap.rst      |   7 +
>  drivers/android/binder_alloc.c              |   5 +-
>  fs/dcache.c                                 |   8 +-
>  fs/gfs2/quota.c                             |   6 +-
>  fs/inode.c                                  |   4 +-
>  fs/nfs/nfs42xattr.c                         |   8 +-
>  fs/nfsd/filecache.c                         |   4 +-
>  fs/xfs/xfs_buf.c                            |   6 +-
>  fs/xfs/xfs_dquot.c                          |   2 +-
>  fs/xfs/xfs_qm.c                             |   2 +-
>  include/linux/list_lru.h                    |  46 ++-
>  include/linux/memcontrol.h                  |   9 +-
>  include/linux/mmzone.h                      |   2 +
>  include/linux/vm_event_item.h               |   1 +
>  include/linux/zswap.h                       |  27 +-
>  mm/list_lru.c                               |  48 ++-
>  mm/memcontrol.c                             |  20 +-
>  mm/mmzone.c                                 |   1 +
>  mm/shrinker.c                               |   4 +-
>  mm/swap.h                                   |   3 +-
>  mm/swap_state.c                             |  26 +-
>  mm/vmscan.c                                 |  26 +-
>  mm/vmstat.c                                 |   1 +
>  mm/workingset.c                             |   4 +-
>  mm/zswap.c                                  | 430 +++++++++++++++++---
>  tools/testing/selftests/cgroup/test_zswap.c |  74 ++--
>  26 files changed, 625 insertions(+), 149 deletions(-)
>
> --
> 2.34.1
>
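
For readers following the quoted cover letter, issue 1 and the proposed
per-memcg, per-NUMA split can be illustrated with a small user-space toy
model. This is only a sketch under stated assumptions: the names
GlobalLRU, PerMemcgNodeLRU and demo are made up for this example and are
not the kernel's list_lru or zswap interfaces, "writeback" is reduced to
returning the evicted entries, and real concerns such as locking, memcg
hierarchy and cgroup offlining are ignored.

```python
from collections import OrderedDict, defaultdict


class GlobalLRU:
    """Toy model of a single LRU shared by every cgroup (issue 1)."""

    def __init__(self):
        self.entries = OrderedDict()  # key -> owning memcg, oldest first

    def add(self, memcg, node, key):
        # The node is ignored: there is only one list.
        self.entries[key] = memcg

    def shrink(self, memcg, nr):
        """Evict the nr oldest entries, regardless of who asked."""
        evicted = []
        for _ in range(min(nr, len(self.entries))):
            key, owner = self.entries.popitem(last=False)
            evicted.append((owner, key))
        return evicted


class PerMemcgNodeLRU:
    """Toy model of one LRU per (memcg, NUMA node) pair."""

    def __init__(self):
        # (memcg, node) -> keys in insertion order, oldest first
        self.lists = defaultdict(OrderedDict)

    def add(self, memcg, node, key):
        self.lists[(memcg, node)][key] = None

    def shrink(self, memcg, nr):
        """Evict up to nr oldest entries, only from lists owned by memcg."""
        evicted = []
        for owner, node in sorted(self.lists):
            if owner != memcg:
                continue
            lru = self.lists[(owner, node)]
            while lru and len(evicted) < nr:
                key, _ = lru.popitem(last=False)
                evicted.append((owner, key))
        return evicted


def demo(lru):
    """Cgroups A and B interleave stores; A then comes under pressure."""
    for i in range(3):
        lru.add("A", 0, f"a{i}")
        lru.add("B", 0, f"b{i}")
    return lru.shrink("A", 2)


if __name__ == "__main__":
    print("global LRU:   ", demo(GlobalLRU()))
    print("per-memcg LRU:", demo(PerMemcgNodeLRU()))
```

In this model the global list evicts one of B's entries when A shrinks,
while the split lists evict only A's own entries, which is the behaviour
the series is aiming for.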