From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 6 Jun 2024 10:45:59 -0700
Subject: Re: [PATCH] mm: zswap: limit number of zpools based on CPU and RAM
To: Takero Funaki
Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240606165303.431215-1-flintglass@gmail.com>

On Thu, Jun 6, 2024 at 9:53 AM Takero Funaki wrote:
>
> This patch limits the number of zpools used by zswap on smaller
> systems.
>
> Currently, zswap allocates 32 pools unconditionally. This was
> implemented to reduce contention on per-zpool locks. However, it incurs
> allocation overhead by distributing pages across pools and wastes
> memory on systems with fewer CPUs and less RAM.
>
> This patch allocates approximately 2*CPU zpools, with a minimum of 1
> zpool for single-CPU systems and up to 32 zpools for systems with 16 or
> more CPUs. This number is sufficient to keep the probability of a
> thread busy-waiting under 40%. The upper limit of 32 zpools remains
> unchanged.
>
> For memory, it limits the count to 1 zpool per 60MB of memory under the
> default 20% max pool size limit, assuming the best case with no
> fragmentation in zspages. It expects 90% pool usage for zsmalloc.
>
> Signed-off-by: Takero Funaki

There are a lot of magic numbers in this patch, and it seems like it's
all based on theory. I don't object to making the number of zpools
dynamic in some way, but unless we do it in a data-driven way where we
understand the implications, I think the added complexity and
inconsistency are not justified.

For example, 2*CPU zpools is overkill and will cause a lot of
fragmentation. We use 32 zpools right now for machines with hundreds of
CPUs. I know that you are keeping 32 as the limit, but why 2*CPUs if
nr_cpus <= 16?

Also, the limitation based on memory size assumes that zsmalloc is the
only allocator used by zswap, which is unfortunately not the case.

The current implementation using 32 zpools all the time is not perfect,
and I did write a patch to make it at least min(nr_cpus, 32), but it is
simple and it works. Complexity should be justified.
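For reference, the table in the quoted comment below follows directly
from the formula stated there, 1 - (1 - 1/pool)^(threads - 1). A quick
userspace sketch (plain C linked against libm, not kernel code)
reproduces it:

#include <stdio.h>
#include <math.h>

/* Chance that a thread's hashed zpool is already held by another thread. */
static double contention(int threads, int pools)
{
	return 1.0 - pow(1.0 - 1.0 / pools, threads - 1);
}

int main(void)
{
	static const int threads[] = { 2, 4, 6, 8, 10, 16, 18 };
	int i, pools;

	printf("%8s ", "thr\\pool");
	for (pools = 2; pools <= 32; pools *= 2)
		printf("%6d", pools);
	printf("\n");

	for (i = 0; i < 7; i++) {
		printf("%8d ", threads[i]);
		for (pools = 2; pools <= 32; pools *= 2)
			printf("%6.2f", contention(threads[i], pools));
		printf("\n");
	}
	return 0;
}

Built with "gcc table.c -lm", this prints the same numbers as the
comment, so the arithmetic checks out; whether a 40% busy-wait
probability is the right target is exactly the data-driven question
raised above.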
> ---
>  mm/zswap.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 60 insertions(+), 7 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 4de342a63bc2..e957bfdeaf70 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -124,8 +124,11 @@ static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
>  module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
>  		   uint, 0644);
>
> -/* Number of zpools in zswap_pool (empirically determined for scalability) */
> -#define ZSWAP_NR_ZPOOLS 32
> +/*
> + * Number of max zpools in zswap_pool (empirically determined for scalability)
> + * This must be a power of 2, for pointer hashing.
> + */
> +#define ZSWAP_NR_ZPOOLS_MAX 32
>
>  /* Enable/disable memory pressure-based shrinker. */
>  static bool zswap_shrinker_enabled = IS_ENABLED(
> @@ -157,12 +160,13 @@ struct crypto_acomp_ctx {
>   * needs to be verified that it's still valid in the tree.
>   */
>  struct zswap_pool {
> -	struct zpool *zpools[ZSWAP_NR_ZPOOLS];
> +	struct zpool *zpools[ZSWAP_NR_ZPOOLS_MAX];
>  	struct crypto_acomp_ctx __percpu *acomp_ctx;
>  	struct percpu_ref ref;
>  	struct list_head list;
>  	struct work_struct release_work;
>  	struct hlist_node node;
> +	unsigned char nr_zpools_order;
>  	char tfm_name[CRYPTO_MAX_ALG_NAME];
>  };
>
> @@ -243,11 +247,55 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
>  	pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name,		\
>  		 zpool_get_type((p)->zpools[0]))
>
> +static unsigned long zswap_max_pages(void);
> +
>  /*********************************
>  * pool functions
>  **********************************/
>  static void __zswap_pool_empty(struct percpu_ref *ref);
>
> +/*
> + * Estimate the optimal number of zpools based on CPU and memory.
> + *
> + * For CPUs, aim for 40% or lower probability of busy-waiting from a thread,
> + * assuming all cores are accessing zswap concurrently.
> + * The threshold is chosen for the simplicity of the formula:
> + * The probability is 1-(1-(1/pool))^(thr-1). For a 40% threshold, this is
> + * approximately pool = 2 * threads, rounded up to a power of 2.
> + * Threads \ Pools
> + *           2     4     8     16    32
> + *   2      0.50  0.25< 0.13  0.06  0.03
> + *   4      0.88  0.58  0.33< 0.18  0.09
> + *   6      0.97  0.76  0.49  0.28< 0.15
> + *   8      0.99  0.87  0.61  0.36< 0.20
> + *  10      1.00  0.92  0.70  0.44  0.25<
> + *  16      1.00  0.99  0.87  0.62  0.38<
> + *  18      1.00  0.99  0.90  0.67  0.42
> + *
> + * For memory, expect 90% pool usage for zsmalloc in the best case.
> + * Assuming uniform distribution, we need to store:
> + *   590        : sum of pages_per_zspage
> + *   * 0.5      : about half of each zspage is empty if no fragmentation
> + *   / (1-0.9)  : 90% target usage
> + *   = 2950     : expected max pages of a zpool,
> + * equivalent to 60MB RAM for a 20% max_pool_percent.
> + */
> +static void __zswap_set_nr_zpools(struct zswap_pool *pool)
> +{
> +	unsigned long mem = zswap_max_pages();
> +	unsigned long cpu = num_online_cpus();
> +
> +	mem = DIV_ROUND_UP(mem, 2950);
> +	mem = min(max(1, mem), ZSWAP_NR_ZPOOLS_MAX);
> +
> +	if (cpu <= 1)
> +		cpu = 1;
> +	else
> +		cpu = 1 << ilog2(min(cpu * 2, ZSWAP_NR_ZPOOLS_MAX));
> +
> +	pool->nr_zpools_order = ilog2(min(mem, cpu));
> +}
> +
>  static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  {
>  	int i;
> @@ -271,7 +319,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  	if (!pool)
>  		return NULL;
>
> -	for (i = 0; i < ZSWAP_NR_ZPOOLS; i++) {
> +	__zswap_set_nr_zpools(pool);
> +
> +	for (i = 0; i < (1 << pool->nr_zpools_order); i++) {
>  		/* unique name for each pool specifically required by zsmalloc */
>  		snprintf(name, 38, "zswap%x",
>  			 atomic_inc_return(&zswap_pools_count));
> @@ -372,7 +422,7 @@ static void zswap_pool_destroy(struct zswap_pool *pool)
>  	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
>  	free_percpu(pool->acomp_ctx);
>
> -	for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
> +	for (i = 0; i < (1 << pool->nr_zpools_order); i++)
>  		zpool_destroy_pool(pool->zpools[i]);
>  	kfree(pool);
>  }
> @@ -513,7 +563,7 @@ unsigned long zswap_total_pages(void)
>  	list_for_each_entry_rcu(pool, &zswap_pools, list) {
>  		int i;
>
> -		for (i = 0; i < ZSWAP_NR_ZPOOLS; i++)
> +		for (i = 0; i < (1 << pool->nr_zpools_order); i++)
>  			total += zpool_get_total_pages(pool->zpools[i]);
>  	}
>  	rcu_read_unlock();
> @@ -822,7 +872,10 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
>
>  static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
>  {
> -	return entry->pool->zpools[hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS))];
> +	if (entry->pool->nr_zpools_order == 0)
> +		return entry->pool->zpools[0];
> +
> +	return entry->pool->zpools[hash_ptr(entry, entry->pool->nr_zpools_order)];
>  }
>
>  /*
> --
> 2.43.0
>
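For completeness, the simpler sizing mentioned above (min(nr_cpus, 32),
kept a power of two so hash_ptr() can still index the array) would look
roughly like the sketch below. This is an illustration of the idea
only, not the actual patch referred to:

/*
 * Hypothetical sketch: cap the number of zpools at min(nr_cpus, 32),
 * rounded down to a power of two for hash_ptr() indexing.
 */
static void zswap_set_nr_zpools(struct zswap_pool *pool)
{
	unsigned int nr = min(num_online_cpus(), 32U);

	/* ilog2() rounds down, keeping 1 << order <= nr */
	pool->nr_zpools_order = ilog2(nr);
}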