From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nhat Pham <nphamcs@gmail.com>
Date: Thu, 20 Feb 2025 15:28:42 -0800
Subject: Re: [PATCH v6 16/16] mm: zswap: Fix for zstd performance regression with 2M folios.
In-Reply-To: <20250206072102.29045-17-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com> <20250206072102.29045-17-kanchana.p.sridhar@intel.com>
MIME-Version: 1.0
To: Kanchana P Sridhar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	yosry.ahmed@linux.dev, chengming.zhou@linux.dev, usamaarif642@gmail.com,
	ryan.roberts@arm.com, 21cnbao@gmail.com, akpm@linux-foundation.org,
	linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
	ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com,
	wajdi.k.feghali@intel.com, vinodh.gopal@intel.com
Content-Type: text/plain; charset="UTF-8"

On Wed, Feb 5, 2025 at 11:21 PM Kanchana P Sridhar wrote:
>
> With the previous patch that enables support for batch compressions in
> zswap_compress_folio(), a 6.2% throughput regression was seen with zstd and
> 2M folios, using vm-scalability/usemem.
>
> For compressors that don't support batching, this was root-caused to the
> following zswap_store_folio() structure:
>
> Batched stores:
> ---------------
>  - Allocate all entries,
>  - Compress all entries,
>  - Store all entries in xarray/LRU.

Can you clarify why the above structure leads to performance regression?

>
> Hence, the above structure is maintained only for batched stores, and the
> following structure is implemented for sequential stores of large folio pages,
> that fixes the zstd regression, while preserving common code paths for batched
> and sequential stores of a folio:
>
> Sequential stores:
> ------------------
> For each page in folio:
>  - allocate an entry,
>  - compress the page,
>  - store the entry in xarray/LRU.
>
> This is submitted as a separate patch only for code review purposes. I will
> squash this with the previous commit in subsequent versions of this
> patch-series.
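
To make sure I'm reading the two layouts correctly, here is a standalone
restatement of the description above (not the kernel code; alloc_entry(),
compress_page() and store_entry() are made-up stubs standing in for entry
allocation, compression, and the xarray/LRU insertion):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in stubs for entry allocation, compression, and xarray/LRU insertion. */
static bool alloc_entry(size_t i)    { (void)i; return true; }
static bool compress_page(size_t i)  { (void)i; return true; }
static void store_entry(size_t i)    { (void)i; }

/* Batched layout: three full passes over the folio's pages. */
static bool store_batched(size_t nr_pages)
{
        size_t i;

        for (i = 0; i < nr_pages; i++)
                if (!alloc_entry(i))
                        return false;
        for (i = 0; i < nr_pages; i++)
                if (!compress_page(i))
                        return false;
        for (i = 0; i < nr_pages; i++)
                store_entry(i);
        return true;
}

/* Sequential layout: one pass; each page is allocated, compressed and stored in turn. */
static bool store_sequential(size_t nr_pages)
{
        size_t i;

        for (i = 0; i < nr_pages; i++) {
                if (!alloc_entry(i) || !compress_page(i))
                        return false;
                store_entry(i);
        }
        return true;
}

int main(void)
{
        /* A 2M folio with 4K pages is 512 pages. */
        printf("batched: %d, sequential: %d\n",
               store_batched(512), store_sequential(512));
        return 0;
}

If the cost really is the extra passes (e.g. entries and pages falling out of
cache between the allocate/compress/store sweeps), it would be nice to spell
that out in the changelog.
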
>
> Signed-off-by: Kanchana P Sridhar
> ---
>  mm/zswap.c | 193 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 111 insertions(+), 82 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f1cba77eda62..7bfc720a6201 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1592,40 +1592,32 @@ static bool zswap_batch_compress(struct folio *folio,
>         return ret;
>  }
>
> -static bool zswap_compress_folio(struct folio *folio,
> -                                struct zswap_entry *entries[],
> -                                struct zswap_pool *pool)
> +static __always_inline bool zswap_compress_folio(struct folio *folio,
> +                                                struct zswap_entry *entries[],
> +                                                long from_index,
> +                                                struct zswap_pool *pool,
> +                                                unsigned int batch_size,
> +                                                struct crypto_acomp_ctx *acomp_ctx)
>  {
> -       long index, nr_pages = folio_nr_pages(folio);
> -       struct crypto_acomp_ctx *acomp_ctx;
> -       unsigned int batch_size;
> +       long index = from_index, nr_pages = folio_nr_pages(folio);
>         bool ret = true;
>
> -       acomp_ctx = acomp_ctx_get_cpu_lock(pool);
> -       batch_size = acomp_ctx->nr_reqs;
> -
>         if ((batch_size > 1) && (nr_pages > 1)) {
> -               for (index = 0; index < nr_pages; index += batch_size) {
> +               for (; index < nr_pages; index += batch_size) {
>
>                         if (!zswap_batch_compress(folio, index, batch_size,
>                                                   &entries[index], pool, acomp_ctx)) {
>                                 ret = false;
> -                               goto unlock_acomp_ctx;
> +                               break;
>                         }
>                 }
>         } else {
> -               for (index = 0; index < nr_pages; ++index) {
> -                       struct page *page = folio_page(folio, index);
> +               struct page *page = folio_page(folio, index);
>
> -                       if (!zswap_compress(page, entries[index], pool, acomp_ctx)) {
> -                               ret = false;
> -                               goto unlock_acomp_ctx;
> -                       }
> -               }
> +               if (!zswap_compress(page, entries[index], pool, acomp_ctx))
> +                       ret = false;
>         }
>
> -unlock_acomp_ctx:
> -       acomp_ctx_put_unlock(acomp_ctx);
>         return ret;
>  }
>
> @@ -1637,92 +1629,128 @@ static bool zswap_compress_folio(struct folio *folio,
>   * handles to ERR_PTR(-EINVAL) at allocation time, and the fact that the
>   * entry's handle is subsequently modified only upon a successful zpool_malloc()
>   * after the page is compressed.
> + *
> + * For compressors that don't support batching, the following structure
> + * showed a performance regression with zstd/2M folios:
> + *
> + * Batched stores:
> + * ---------------
> + * - Allocate all entries,
> + * - Compress all entries,
> + * - Store all entries in xarray/LRU.
> + *
> + * Hence, the above structure is maintained only for batched stores, and the
> + * following structure is implemented for sequential stores of large folio pages,
> + * that fixes the regression, while preserving common code paths for batched
> + * and sequential stores of a folio:
> + *
> + * Sequential stores:
> + * ------------------
> + * For each page in folio:
> + * - allocate an entry,
> + * - compress the page,
> + * - store the entry in xarray/LRU.
>   */
>  static bool zswap_store_folio(struct folio *folio,
>                               struct obj_cgroup *objcg,
>                               struct zswap_pool *pool)
>  {
> -       long index, from_index = 0, nr_pages = folio_nr_pages(folio);
> +       long index = 0, from_index = 0, nr_pages, nr_folio_pages = folio_nr_pages(folio);
>         struct zswap_entry **entries = NULL;
> +       struct crypto_acomp_ctx *acomp_ctx;
>         int node_id = folio_nid(folio);
> +       unsigned int batch_size;
>
> -       entries = kmalloc(nr_pages * sizeof(*entries), GFP_KERNEL);
> +       entries = kmalloc(nr_folio_pages * sizeof(*entries), GFP_KERNEL);
>         if (!entries)
>                 return false;
>
> -       for (index = 0; index < nr_pages; ++index) {
> -               entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
> +       acomp_ctx = acomp_ctx_get_cpu_lock(pool);
> +       batch_size = acomp_ctx->nr_reqs;
>
> -               if (!entries[index]) {
> -                       zswap_reject_kmemcache_fail++;
> -                       nr_pages = index;
> -                       goto store_folio_failed;
> +       nr_pages = (batch_size > 1) ? nr_folio_pages : 1;
> +
> +       while (1) {
> +               for (index = from_index; index < nr_pages; ++index) {
> +                       entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
> +
> +                       if (!entries[index]) {
> +                               zswap_reject_kmemcache_fail++;
> +                               nr_pages = index;
> +                               goto store_folio_failed;
> +                       }
> +
> +                       entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
>                 }
>
> -               entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
> -       }
> +               if (!zswap_compress_folio(folio, entries, from_index, pool, batch_size, acomp_ctx))
> +                       goto store_folio_failed;
>
> -       if (!zswap_compress_folio(folio, entries, pool))
> -               goto store_folio_failed;
> +               for (index = from_index; index < nr_pages; ++index) {
> +                       swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
> +                       struct zswap_entry *old, *entry = entries[index];
>
> -       for (index = 0; index < nr_pages; ++index) {
> -               swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
> -               struct zswap_entry *old, *entry = entries[index];
> +                       old = xa_store(swap_zswap_tree(page_swpentry),
> +                                      swp_offset(page_swpentry),
> +                                      entry, GFP_KERNEL);
> +                       if (xa_is_err(old)) {
> +                               int err = xa_err(old);
>
> -               old = xa_store(swap_zswap_tree(page_swpentry),
> -                              swp_offset(page_swpentry),
> -                              entry, GFP_KERNEL);
> -               if (xa_is_err(old)) {
> -                       int err = xa_err(old);
> +                               WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +                               zswap_reject_alloc_fail++;
> +                               from_index = index;
> +                               goto store_folio_failed;
> +                       }
>
> -                       WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> -                       zswap_reject_alloc_fail++;
> -                       from_index = index;
> -                       goto store_folio_failed;
> -               }
> +                       /*
> +                        * We may have had an existing entry that became stale when
> +                        * the folio was redirtied and now the new version is being
> +                        * swapped out. Get rid of the old.
> +                        */
> +                       if (old)
> +                               zswap_entry_free(old);
>
> -               /*
> -                * We may have had an existing entry that became stale when
> -                * the folio was redirtied and now the new version is being
> -                * swapped out. Get rid of the old.
> -                */
> -               if (old)
> -                       zswap_entry_free(old);
> +                       /*
> +                        * The entry is successfully compressed and stored in the tree, there is
> +                        * no further possibility of failure. Grab refs to the pool and objcg,
> +                        * charge zswap memory, and increment zswap_stored_pages.
> +                        * The opposite actions will be performed by zswap_entry_free()
> +                        * when the entry is removed from the tree.
> +                        */
> +                       zswap_pool_get(pool);
> +                       if (objcg) {
> +                               obj_cgroup_get(objcg);
> +                               obj_cgroup_charge_zswap(objcg, entry->length);
> +                       }
> +                       atomic_long_inc(&zswap_stored_pages);
>
> -               /*
> -                * The entry is successfully compressed and stored in the tree, there is
> -                * no further possibility of failure. Grab refs to the pool and objcg,
> -                * charge zswap memory, and increment zswap_stored_pages.
> -                * The opposite actions will be performed by zswap_entry_free()
> -                * when the entry is removed from the tree.
> -                */
> -               zswap_pool_get(pool);
> -               if (objcg) {
> -                       obj_cgroup_get(objcg);
> -                       obj_cgroup_charge_zswap(objcg, entry->length);
> +                       /*
> +                        * We finish initializing the entry while it's already in xarray.
> +                        * This is safe because:
> +                        *
> +                        * 1. Concurrent stores and invalidations are excluded by folio lock.
> +                        *
> +                        * 2. Writeback is excluded by the entry not being on the LRU yet.
> +                        *    The publishing order matters to prevent writeback from seeing
> +                        *    an incoherent entry.
> +                        */
> +                       entry->pool = pool;
> +                       entry->swpentry = page_swpentry;
> +                       entry->objcg = objcg;
> +                       entry->referenced = true;
> +                       if (entry->length) {
> +                               INIT_LIST_HEAD(&entry->lru);
> +                               zswap_lru_add(&zswap_list_lru, entry);
> +                       }
>                 }
> -               atomic_long_inc(&zswap_stored_pages);
>
> -               /*
> -                * We finish initializing the entry while it's already in xarray.
> -                * This is safe because:
> -                *
> -                * 1. Concurrent stores and invalidations are excluded by folio lock.
> -                *
> -                * 2. Writeback is excluded by the entry not being on the LRU yet.
> -                *    The publishing order matters to prevent writeback from seeing
> -                *    an incoherent entry.
> -                */
> -               entry->pool = pool;
> -               entry->swpentry = page_swpentry;
> -               entry->objcg = objcg;
> -               entry->referenced = true;
> -               if (entry->length) {
> -                       INIT_LIST_HEAD(&entry->lru);
> -                       zswap_lru_add(&zswap_list_lru, entry);
> -               }
> +               from_index = nr_pages++;
> +
> +               if (nr_pages > nr_folio_pages)
> +                       break;
>         }
>
> +       acomp_ctx_put_unlock(acomp_ctx);
>         kfree(entries);
>         return true;
>
> @@ -1734,6 +1762,7 @@ static bool zswap_store_folio(struct folio *folio,
>                 zswap_entry_cache_free(entries[index]);
>         }
>
> +       acomp_ctx_put_unlock(acomp_ctx);
>         kfree(entries);
>         return false;
>  }
> --
> 2.27.0
>
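
One more question, mostly to check my reading of the new outer loop in
zswap_store_folio(): with batch_size == 1 it degenerates into one outer
iteration per page, and with batch_size > 1 it runs exactly once over the
whole folio, right? Here is a standalone sketch of just that control flow
(process_range() is a made-up stub for the per-chunk alloc/compress/store
work, not a real zswap helper):

#include <stdbool.h>
#include <stdio.h>

/* Stub standing in for the alloc + compress + store work done per chunk. */
static bool process_range(long from, long to)
{
        printf("process pages [%ld, %ld)\n", from, to);
        return true;
}

/*
 * Mirrors the loop bounds in the new zswap_store_folio(): nr_pages starts
 * at nr_folio_pages for batched stores (one outer iteration), or at 1 for
 * sequential stores (one outer iteration per page).
 */
static bool store_folio(long nr_folio_pages, unsigned int batch_size)
{
        long from_index = 0;
        long nr_pages = (batch_size > 1) ? nr_folio_pages : 1;

        while (1) {
                if (!process_range(from_index, nr_pages))
                        return false;

                from_index = nr_pages++;

                if (nr_pages > nr_folio_pages)
                        break;
        }
        return true;
}

int main(void)
{
        store_folio(4, 8);      /* batched: one pass over all 4 pages */
        store_folio(4, 1);      /* sequential: four single-page passes */
        return 0;
}

If that is the intent, a one-line comment above the while (1) spelling out
the two cases would make the loop easier to follow.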