Re: [RESEND PATCH v9 00/19] zswap compression batching

From: Nhat Pham <nphamcs@gmail.com>
To: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	hannes@cmpxchg.org,  yosry.ahmed@linux.dev,
	chengming.zhou@linux.dev, usamaarif642@gmail.com,
	 ryan.roberts@arm.com, 21cnbao@gmail.com,
	ying.huang@linux.alibaba.com,  akpm@linux-foundation.org,
	senozhatsky@chromium.org,  linux-crypto@vger.kernel.org,
	herbert@gondor.apana.org.au,  davem@davemloft.net,
	clabbe@baylibre.com, ardb@kernel.org,  ebiggers@google.com,
	surenb@google.com, kristen.c.accardi@intel.com,
	 vinicius.gomes@intel.com, wajdi.k.feghali@intel.com,
	vinodh.gopal@intel.com
Subject: Re: [RESEND PATCH v9 00/19] zswap compression batching
Date: Thu, 8 May 2025 14:20:04 -0700
Message-ID: <CAKEwX=MybjpmXVxM3QbfdQyXOv2xq87CZKzh1w2pdxucwSMttA@mail.gmail.com>
In-Reply-To: <CAKEwX=NJm-9zodgb_UC2z+vshw98MmcqZDw_xvbQWaaU29eGMw@mail.gmail.com>

On Thu, May 8, 2025 at 1:55 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Thu, May 8, 2025 at 12:41 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > Compression Batching:
> > =====================
> >
> > This patch-series introduces batch compression of pages in large folios to
> > improve zswap swapout latency. It preserves the existing zswap protocols
> > for non-batching software compressors by calling crypto_acomp sequentially
> > per page in the batch. Additionally, in support of hardware accelerators
> > that can process a batch as an integral unit, the patch-series creates
> > generic batching interfaces in crypto_acomp, and calls the
> > crypto_acomp_batch_compress() interface in zswap_compress() for compressors
> > that intrinsically support batching.
> >
> > The patch series provides a proof point by using the Intel Analytics
> > Accelerator (IAA) for implementing the compress/decompress batching API
> > using hardware parallelism in the iaa_crypto driver and another proof point
> > with a sequential software compressor, zstd.
>
> Any plan on doing hardware accelerated/offloaded/parallelized zstd? :)
>
> >
> > SUMMARY:
> > ========
> >
> >   The first proof point is to test with IAA using a sequential call (fully
> >   synchronous, compress one page at a time) vs. a batching call (fully
> >   asynchronous, submit a batch to IAA for parallel compression, then poll for
> >   completion statuses).
> >
> >     The performance testing data with usemem 30 processes and kernel
> >     compilation test using 32 threads, show 67%-77% throughput gains and
> >     28%-32% sys time reduction (usemem30) and 2-3% sys time reduction
> >     (kernel compilation) with zswap_store() large folios using IAA compress
> >     batching as compared to IAA sequential.
> >
> >   The second proof point is to make sure that software algorithms such as
> >   zstd do not regress. The data indicates that for sequential software
> >   algorithms a performance gain is achieved.
> >
> >     With the performance optimizations implemented in patches 18 and 19 of
> >     v9, zstd usemem30 throughput increases by 1%, along with a 6%-8% sys time
> >     reduction. With kernel compilation using zstd, we get a 0.4%-3.2%
> >     reduction in sys time. These optimizations pertain to common code
> >     paths, removing redundant branches/computes, using prefetchw() of the
> >     zswap entry before it is written, and selectively annotating branches
> >     with likely()/unlikely() compiler directives to minimize branch
> >     mis-prediction penalty. Additionally, using the batching code for
> >     non-batching compressors to sequentially compress/store batches of up
> >     to ZSWAP_MAX_BATCH_SIZE (8) pages seems to help, most likely due to
> >     cache locality of working set structures such as the array of
> >     zswap_entry-s for the batch.
>
> Nice!
>
> >
> >     Our internal validation of zstd with the batching interface vs. IAA with
> >     the batching interface on Emerald Rapids has shown that IAA
> >     compress/decompress batching gives 21.3% more memory savings as compared
> >     to zstd, for 5% performance loss as compared to the baseline without any
> >     memory pressure. IAA batching demonstrates more than 2X the memory
> >     savings obtained by zstd at this 95% performance KPI.
> >     The compression ratio with IAA is 2.23, and with zstd 2.96. Even with
> >     this compression ratio deficit for IAA, batching is extremely
>
> I'm confused. How does IAA give more memory savings, while having a
> worse compression ratio? How do you define memory savings here?
>
> >     beneficial. As we improve the compression ratio of the IAA accelerator,
> >     we expect to see even better memory savings with IAA as compared to
> >     software compressors.
> >
> >
> >   Batching Roadmap:
> >   =================
> >
> >   1) Compression batching within large folios (this series).
> >
> >   2) Reclaim batching of hybrid folios:
> >
> >      We can expect to see even more significant performance and throughput
> >      improvements if we use the parallelism offered by IAA to do reclaim
> >      batching of 4K/large folios (really any-order folios), and using the
> >      zswap_store() high throughput compression pipeline to batch-compress
> >      pages comprising these folios, not just batching within large
> >      folios. This is the reclaim batching patch 13 in v1, which we expect
> >      to submit in a separate patch-series.
>
> Are you aware of the current kcompressd work:
>
> https://lore.kernel.org/all/20250430082651.3152444-1-qun-wei.lin@mediatek.com/
>
> It basically offloads compression work into a separate kernel thread
> (kcompressd), for kswapd reclaim.
>
> This might provide you with a more natural place to perform batch
> compression - instead of compressing one page at a time from the
> worker thread's queue, you can grab a batch worth of pages and feed it
> to IAA.
>
> Downside is it only applies to indirect reclaim. Proactive and direct
> reclaimers are not covered, unfortunately.
>
> >
> >   3) Decompression batching:
> >
> >      We have developed a zswap load batching interface for IAA to be used
> >      for parallel decompression batching, using swapin_readahead().
> >
> >   These capabilities are architected so as to be useful to zswap and
> >   zram. We are actively working on integrating these components with zram.
>
> Yeah problem with readahead is you can potentially get different
> backends in the batch, and modifying readahead code is pretty ugly :)
> But we'll see...
>

Another place where you can do decompression batching is zswap
writeback :) Right now, we decompress the pages and write them back
one page at a time. You can, however, grab a batch worth of them, feed
them to IAA for processing, and then submit them all for IO :)
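
To make the idea concrete, here is a rough user-space model of the two
writeback shapes. It is only a sketch, not the prototype and not kernel
code: zlib stands in for the compressor, FakeSwapDevice and all function
names are made up for illustration, and BATCH_SIZE = 8 is an assumption
borrowed from the ZSWAP_MAX_BATCH_SIZE value quoted above. It only counts
IO submissions and does not model IAA parallelism.

```python
import zlib

PAGE_SIZE = 4096
BATCH_SIZE = 8  # assumption: mirrors ZSWAP_MAX_BATCH_SIZE from the cover letter


def decompress_batch(compressed_pages):
    """Stand-in for a batched decompress call.

    A hardware accelerator could work on these in parallel; this
    model just loops over them.
    """
    return [zlib.decompress(c) for c in compressed_pages]


class FakeSwapDevice:
    """Hypothetical backing device that records IO submissions."""

    def __init__(self):
        self.slots = {}
        self.submissions = 0

    def submit(self, writes):
        """Accept one or more (slot, page) writes as a single submission."""
        self.submissions += 1
        for slot, page in writes:
            self.slots[slot] = page


def writeback_one_at_a_time(entries, dev):
    # Current shape: decompress one page, submit it, repeat.
    for slot, comp in entries:
        dev.submit([(slot, zlib.decompress(comp))])


def writeback_batched(entries, dev, batch_size=BATCH_SIZE):
    # Batched shape: grab a batch, decompress it as a unit, submit once.
    for i in range(0, len(entries), batch_size):
        batch = entries[i:i + batch_size]
        pages = decompress_batch([comp for _, comp in batch])
        dev.submit([(slot, page) for (slot, _), page in zip(batch, pages)])


# 20 distinct pages, compressed and keyed by swap slot.
pages = [bytes([i]) * PAGE_SIZE for i in range(20)]
entries = [(slot, zlib.compress(p)) for slot, p in enumerate(pages)]

seq_dev, batch_dev = FakeSwapDevice(), FakeSwapDevice()
writeback_one_at_a_time(entries, seq_dev)
writeback_batched(entries, batch_dev)
print(seq_dev.submissions, batch_dev.submissions)  # prints "20 3"
```

Both paths leave the same data on the device; the batched one just issues
3 submissions (8 + 8 + 4 pages) instead of 20. Whether that helps in
practice depends on the block layer and the device.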

I have a prototype that performs batch writeback (mostly for IO
efficiency purposes) - lmk if you want to play with it. The problem,
as usual, is benchmarking :)
