From: Nhat Pham
Date: Tue, 12 Mar 2024 16:12:32 +0700
Subject: Re: [PATCH 1/2] mm: zswap: optimize zswap pool size tracking
To: Johannes Weiner
Cc: Andrew Morton, Yosry Ahmed, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240311161214.1145168-1-hannes@cmpxchg.org>
References: <20240311161214.1145168-1-hannes@cmpxchg.org>
On Mon, Mar 11, 2024 at 11:12 PM Johannes Weiner wrote:
>
> Profiling the munmap() of a zswapped memory region shows 50%(!) of the
> total cycles currently going into updating the zswap_pool_total_size.
>
> There are three consumers of this counter:
> - store, to enforce the globally configured pool limit
> - meminfo & debugfs, to report the size to the user
> - shrink, to determine the batch size for each cycle
>
> Instead of aggregating every time an entry enters or exits the zswap
> pool, aggregate the value from the zpools on-demand:
>
> - Stores aggregate the counter anyway upon success. Aggregating to
>   check the limit instead is the same amount of work.
>
> - Meminfo & debugfs might benefit somewhat from a pre-aggregated
>   counter, but aren't exactly hotpaths.
>
> - Shrinking can aggregate once for every cycle instead of doing it for
>   every freed entry. As the shrinker might work on tens or hundreds of
>   objects per scan cycle, this is a large reduction in aggregations.

Nice!

>
> The paths that benefit dramatically are swapin, swapoff, and
> unmaps. There could be millions of pages being processed until
> somebody asks for the pool size again. This eliminates the pool size
> updates from those paths entirely.
>
> Signed-off-by: Johannes Weiner

With your fixlet applied:
Reviewed-by: Nhat Pham

> ---
>  fs/proc/meminfo.c     |  3 +-
>  include/linux/zswap.h |  2 +-
>  mm/zswap.c            | 98 +++++++++++++++++++++----------------------
>  3 files changed, 49 insertions(+), 54 deletions(-)
>
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 45af9a989d40..245171d9164b 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -89,8 +89,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>          show_val_kb(m, "SwapTotal:      ", i.totalswap);
>          show_val_kb(m, "SwapFree:       ", i.freeswap);
>  #ifdef CONFIG_ZSWAP
> -        seq_printf(m,  "Zswap:          %8lu kB\n",
> -                   (unsigned long)(zswap_pool_total_size >> 10));
> +        show_val_kb(m, "Zswap:          ", zswap_total_pages());
>          seq_printf(m,  "Zswapped:       %8lu kB\n",
>                     (unsigned long)atomic_read(&zswap_stored_pages) <<
>                     (PAGE_SHIFT - 10));
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index 341aea490070..2a85b941db97 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -7,7 +7,6 @@
>
>  struct lruvec;
>
> -extern u64 zswap_pool_total_size;
>  extern atomic_t zswap_stored_pages;
>
>  #ifdef CONFIG_ZSWAP
> @@ -27,6 +26,7 @@ struct zswap_lruvec_state {
>          atomic_long_t nr_zswap_protected;
>  };
>
> +unsigned long zswap_total_pages(void);
>  bool zswap_store(struct folio *folio);
>  bool zswap_load(struct folio *folio);
>  void zswap_invalidate(swp_entry_t swp);
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 9a3237752082..7c39327a7cc2 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -43,8 +43,6 @@
>  /*********************************
>  * statistics
>  **********************************/
> -/* Total bytes used by the compressed storage */
> -u64 zswap_pool_total_size;
>  /* The number of compressed pages currently stored in zswap */
>  atomic_t zswap_stored_pages = ATOMIC_INIT(0);
>  /* The number of same-value filled pages currently stored in zswap */
> @@ -264,45 +262,6 @@ static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
>          pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,         \
>                   zpool_get_type((p)->zpools[0]))
>
> -static bool zswap_is_full(void)
> -{
> -        return totalram_pages() * zswap_max_pool_percent / 100 <
> -                        DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
> -}
> -
> -static bool zswap_can_accept(void)
> -{
> -        return totalram_pages() * zswap_accept_thr_percent / 100 *
> -                                zswap_max_pool_percent / 100 >
> -                        DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
> -}
> -
> -static u64 get_zswap_pool_size(struct zswap_pool *pool)
> -{
> -        u64 pool_size = 0;
> -        int i;
> -
> -        for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
> -                pool_size += zpool_get_total_size(pool->zpools[i]);
> -
> -        return pool_size;
> -}
> -
> -static void zswap_update_total_size(void)
> -{
> -        struct zswap_pool *pool;
> -        u64 total = 0;
> -
> -        rcu_read_lock();
> -
> -        list_for_each_entry_rcu(pool, &zswap_pools, list)
> -                total += get_zswap_pool_size(pool);
> -
> -        rcu_read_unlock();
> -
> -        zswap_pool_total_size = total;
> -}
> -
>  /*********************************
>  * pool functions
>  **********************************/
> @@ -540,6 +499,28 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
>          return NULL;
>  }
>
> +static unsigned long zswap_max_pages(void)
> +{
> +        return totalram_pages() * zswap_max_pool_percent / 100;
> +}
> +
> +unsigned long zswap_total_pages(void)
> +{
> +        struct zswap_pool *pool;
> +        u64 total = 0;
> +
> +        rcu_read_lock();
> +        list_for_each_entry_rcu(pool, &zswap_pools, list) {
> +                int i;
> +
> +                for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
> +                        total += zpool_get_total_size(pool->zpools[i]);
> +        }
> +        rcu_read_unlock();
> +
> +        return total >> PAGE_SHIFT;
> +}
> +
>  /*********************************
>  * param callbacks
>  **********************************/
> @@ -912,7 +893,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
>          }
>          zswap_entry_cache_free(entry);
>          atomic_dec(&zswap_stored_pages);
> -        zswap_update_total_size();
>  }
>
>  /*
> @@ -1317,7 +1297,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
>          nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
>  #else
>          /* use pool stats instead of memcg stats */
> -        nr_backing = zswap_pool_total_size >> PAGE_SHIFT;
> +        nr_backing = zswap_total_pages();
>          nr_stored = atomic_read(&zswap_nr_stored);
>  #endif
>
> @@ -1385,6 +1365,10 @@ static void shrink_worker(struct work_struct *w)
>  {
>          struct mem_cgroup *memcg;
>          int ret, failures = 0;
> +        unsigned long thr;
> +
> +        /* Reclaim down to the accept threshold */
> +        thr = zswap_max_pages() * zswap_accept_thr_percent / 100;
>
>          /* global reclaim will select cgroup in a round-robin fashion. */
>          do {
> @@ -1432,10 +1416,9 @@ static void shrink_worker(struct work_struct *w)
>                          break;
>                  if (ret && ++failures == MAX_RECLAIM_RETRIES)
>                          break;
> -
>  resched:
>                  cond_resched();
> -        } while (!zswap_can_accept());
> +        } while (zswap_total_pages() > thr);
>  }
>
>  static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
> @@ -1476,6 +1459,7 @@ bool zswap_store(struct folio *folio)
>          struct zswap_entry *entry, *dupentry;
>          struct obj_cgroup *objcg = NULL;
>          struct mem_cgroup *memcg = NULL;
> +        unsigned long max_pages, cur_pages;
>
>          VM_WARN_ON_ONCE(!folio_test_locked(folio));
>          VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1487,6 +1471,7 @@ bool zswap_store(struct folio *folio)
>          if (!zswap_enabled)
>                  goto check_old;
>
> +        /* Check cgroup limits */
>          objcg = get_obj_cgroup_from_folio(folio);
>          if (objcg && !obj_cgroup_may_zswap(objcg)) {
>                  memcg = get_mem_cgroup_from_objcg(objcg);
> @@ -1497,15 +1482,20 @@ bool zswap_store(struct folio *folio)
>                  mem_cgroup_put(memcg);
>          }
>
> -        /* reclaim space if needed */
> -        if (zswap_is_full()) {
> +        /* Check global limits */
> +        cur_pages = zswap_total_pages();
> +        max_pages = zswap_max_pages();
> +
> +        if (cur_pages >= max_pages) {
>                  zswap_pool_limit_hit++;
>                  zswap_pool_reached_full = true;
>                  goto shrink;
>          }
>
>          if (zswap_pool_reached_full) {
> -                if (!zswap_can_accept())
> +                unsigned long thr = max_pages * zswap_accept_thr_percent / 100;
> +
> +                if (cur_pages > thr)
>                          goto shrink;
>                  else
>                          zswap_pool_reached_full = false;
> @@ -1581,7 +1571,6 @@ bool zswap_store(struct folio *folio)
>
>          /* update stats */
>          atomic_inc(&zswap_stored_pages);
> -        zswap_update_total_size();
>          count_vm_event(ZSWPOUT);
>
>          return true;
> @@ -1711,6 +1700,13 @@ void zswap_swapoff(int type)
>
>  static struct dentry *zswap_debugfs_root;
>
> +static int debugfs_get_total_size(void *data, u64 *val)
> +{
> +        *val = zswap_total_pages() * PAGE_SIZE;
> +        return 0;
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu");
> +
>  static int zswap_debugfs_init(void)
>  {
>          if (!debugfs_initialized())
> @@ -1732,8 +1728,8 @@ static int zswap_debugfs_init(void)
>                             zswap_debugfs_root, &zswap_reject_compress_poor);
>          debugfs_create_u64("written_back_pages", 0444,
>                             zswap_debugfs_root, &zswap_written_back_pages);
> -        debugfs_create_u64("pool_total_size", 0444,
> -                           zswap_debugfs_root, &zswap_pool_total_size);
> +        debugfs_create_file("pool_total_size", 0444,
> +                            zswap_debugfs_root, NULL, &total_size_fops);
>          debugfs_create_atomic_t("stored_pages", 0444,
>                                  zswap_debugfs_root, &zswap_stored_pages);
>          debugfs_create_atomic_t("same_filled_pages", 0444,
> --
> 2.44.0
>