From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"yosry.ahmed@linux.dev" <yosry.ahmed@linux.dev>,
"nphamcs@gmail.com" <nphamcs@gmail.com>,
"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"ying.huang@linux.alibaba.com" <ying.huang@linux.alibaba.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"senozhatsky@chromium.org" <senozhatsky@chromium.org>,
"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
"herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
"davem@davemloft.net" <davem@davemloft.net>,
"clabbe@baylibre.com" <clabbe@baylibre.com>,
"ardb@kernel.org" <ardb@kernel.org>,
"ebiggers@google.com" <ebiggers@google.com>,
"surenb@google.com" <surenb@google.com>,
"Accardi, Kristen C" <kristen.c.accardi@intel.com>,
"Gomes, Vinicius" <vinicius.gomes@intel.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
"Gopal, Vinodh" <vinodh.gopal@intel.com>,
"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v11 22/24] mm: zswap: Allocate pool batching resources if the compressor supports batching.
Date: Thu, 28 Aug 2025 22:47:21 +0000 [thread overview]
Message-ID: <PH7PR11MB81211DD54822167C6BA238D5C93BA@PH7PR11MB8121.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CAGsJ_4zjPxSjrSomm3E3gOuW+AqiTKwUHJ34q9m9aJb3y3vEKw@mail.gmail.com>
> -----Original Message-----
> From: Barry Song <21cnbao@gmail.com>
> Sent: Thursday, August 28, 2025 2:40 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosry.ahmed@linux.dev; nphamcs@gmail.com;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> ryan.roberts@arm.com; ying.huang@linux.alibaba.com; akpm@linux-
> foundation.org; senozhatsky@chromium.org; linux-crypto@vger.kernel.org;
> herbert@gondor.apana.org.au; davem@davemloft.net;
> clabbe@baylibre.com; ardb@kernel.org; ebiggers@google.com;
> surenb@google.com; Accardi, Kristen C <kristen.c.accardi@intel.com>;
> Gomes, Vinicius <vinicius.gomes@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v11 22/24] mm: zswap: Allocate pool batching resources
> if the compressor supports batching.
>
> On Wed, Aug 27, 2025 at 12:06 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Barry Song <21cnbao@gmail.com>
> > > Sent: Monday, August 25, 2025 10:17 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; yosry.ahmed@linux.dev; nphamcs@gmail.com;
> > > chengming.zhou@linux.dev; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; ying.huang@linux.alibaba.com; akpm@linux-
> > > foundation.org; senozhatsky@chromium.org; linux-
> crypto@vger.kernel.org;
> > > herbert@gondor.apana.org.au; davem@davemloft.net;
> > > clabbe@baylibre.com; ardb@kernel.org; ebiggers@google.com;
> > > surenb@google.com; Accardi, Kristen C <kristen.c.accardi@intel.com>;
> > > Gomes, Vinicius <vinicius.gomes@intel.com>; Feghali, Wajdi K
> > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v11 22/24] mm: zswap: Allocate pool batching
> resources
> > > if the compressor supports batching.
> > >
> > > > > > > [...]
> > > > > > > >
> > > > > > > > +	/*
> > > > > > > > +	 * Set the unit of compress batching for large folios, for quick
> > > > > > > > +	 * retrieval in the zswap_compress() fast path:
> > > > > > > > +	 * If the compressor is sequential (@pool->compr_batch_size is 1),
> > > > > > > > +	 * large folios will be compressed in batches of ZSWAP_MAX_BATCH_SIZE
> > > > > > > > +	 * pages, where each page in the batch is compressed sequentially.
> > > > > > > > +	 * We see better performance by processing the folio in batches of
> > > > > > > > +	 * ZSWAP_MAX_BATCH_SIZE, due to cache locality of working set
> > > > > > > > +	 * structures.
> > > > > > > > +	 */
> > > > > > > > +	pool->batch_size = (pool->compr_batch_size > 1) ?
> > > > > > > > +			pool->compr_batch_size : ZSWAP_MAX_BATCH_SIZE;
> > > > > > > > +
> > > > > > > > zswap_pool_debug("created", pool);
> > > > > > > >
> > > > > > > > return pool;
> > > > > > > >
> > > > > > >
> > > > > > > It’s hard to follow — you add batch_size and compr_batch_size in this
> > > > > > > patch, but only use them in another. Could we merge the related changes
> > > > > > > into one patch instead of splitting them into several that don’t work
> > > > > > > independently?
> > > > > >
> > > > > > Hi Barry,
> > > > > >
> > > > > > Thanks for reviewing the code and for your comments! Sure, I can merge
> > > > > > this patch with the next one. I was trying to keep the changes modularized
> > > > > > to a) zswap_cpu_comp_prepare(), b) zswap_store() and c) zswap_compress()
> > > > > > so the changes are broken into smaller parts, but I can see how this can
> > > > > > make the changes appear disjointed.
> > > > > >
> > > > > > One thing though: the commit logs for each of the patches will
> > > > > > also probably need to be merged, since I have tried to explain the
> > > > > > changes in detail.
> > > > >
> > > > > It’s fine to merge the changelog and present the story as a whole. Do we
> > > >
> > > > Sure.
> > > >
> > > > > really need both pool->batch_size and pool->compr_batch_size? I assume
> > > > > pool->batch_size = pool->compr_batch_size if HW supports batch; otherwise
> > > > > pool->compr_batch_size = 1.
> > > >
> > > > Actually not exactly. We have found value in compressing in batches of
> > > > ZSWAP_MAX_BATCH_SIZE even for software compressors. Latency benefits
> > > > from cache locality of working-set data. Hence the approach that we have
> > > > settled on is pool->batch_size = ZSWAP_MAX_BATCH_SIZE if the compressor
> > > > does not support batching (i.e., if pool->compr_batch_size is 1).
> > > > If it does, then pool->batch_size = pool->compr_batch_size.
> > >
> > > I understand that even without a hardware batch, you can still
> > > have some software batching that excludes compression.
> > >
> > > However, based on the code below, it looks like
> > > pool->compr_batch_size is almost always either equal to
> > > pool->batch_size or 1:
> > >
> > > +	pool->compr_batch_size = min(ZSWAP_MAX_BATCH_SIZE,
> > > +				     crypto_acomp_batch_size(acomp_ctx->acomp));
> >
> > I would like to explain some of the considerations in coming up with this
> > approach:
> >
> > 1) The compression algorithm gets to decide an optimal batch-size.
> > For a hardware accelerator such as IAA, this value could be different
> > than ZSWAP_MAX_BATCH_SIZE.
> >
> > 2) ZSWAP_MAX_BATCH_SIZE acts as a limiting factor to the # of acomp_ctx
> > per-CPU resources that will be allocated in zswap_cpu_comp_prepare();
> > as per Yosry's suggestion. This helps limit the memory overhead for
> > batching algorithms.
> >
> > 3) If a batching algorithm works with a batch size "X", where
> >    1 < X < ZSWAP_MAX_BATCH_SIZE, two things need to happen:
> >    a) We want to only allocate "X" per-CPU resources.
> >    b) We want to process the folio in batches of "X", not ZSWAP_MAX_BATCH_SIZE,
> >       to avail of the benefits of hardware parallelism. This is the compression
> >       step size you also mention.
> >    In particular, we cannot treat batch_size as ZSWAP_MAX_BATCH_SIZE,
> >    and send a batch of ZSWAP_MAX_BATCH_SIZE pages to zswap_compress()
> >    in this case. For e.g., what if the compress step-size is 6, but the new
> >    zswap_store_pages() introduced in patch 23 sends 8 pages to
> >    zswap_compress() because ZSWAP_MAX_BATCH_SIZE is set to 8?
> >    The code in zswap_compress() could get quite messy, which will impact
> >    latency.
>
> If ZSWAP_MAX_BATCH_SIZE is set to 8 and there is no hardware batching,
> compression is done with a step size of 1. If the hardware step size is 4,
> compression occurs in two steps. If the hardware step size is 6, the first
> compression uses a step size of 6, and the second uses a step size of 2.
> Do you think this will work?
Hi Barry,

This would be non-optimal from code simplicity and latency perspectives.
One of the benefits of using the hardware accelerator's "batch parallelism"
is cost amortization across the batch. We might lose this benefit if we make
multiple calls to zswap_compress() to ask the hardware accelerator to
batch compress in smaller batches. Compression throughput would also
be sub-optimal.

In my patch-series, the rule is simple: if an algorithm has specified a
batch-size, carve out the folio by that "batch-size" # of pages to be
compressed as a batch in zswap_compress(). This custom batch-size
is capped at ZSWAP_MAX_BATCH_SIZE.

If an algorithm has not specified a batch-size, the default batch-size
is 1. In this case, carve out the folio by ZSWAP_MAX_BATCH_SIZE
# of pages to be compressed as a batch in zswap_compress().
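
To make the rule concrete, here is a minimal, hypothetical sketch. It is not
the code in the series; zswap_store_batch() and store_batch_of_pages() are
made-up names used only to illustrate how the folio gets carved into batches:

static int zswap_store_batch(struct zswap_pool *pool, struct folio *folio)
{
	long nr_pages = folio_nr_pages(folio);
	/*
	 * compr_batch_size > 1: the compressor declared its own batch-size,
	 * already capped at ZSWAP_MAX_BATCH_SIZE at pool creation time.
	 * Otherwise, batch in units of ZSWAP_MAX_BATCH_SIZE for cache locality.
	 */
	unsigned int step = (pool->compr_batch_size > 1) ?
				pool->compr_batch_size : ZSWAP_MAX_BATCH_SIZE;
	long i, nr;

	for (i = 0; i < nr_pages; i += step) {
		nr = min_t(long, step, nr_pages - i);

		/* made-up helper: compress 'nr' pages of the folio as one batch */
		int err = store_batch_of_pages(pool, folio, i, nr);

		if (err)
			return err;
	}
	return 0;
}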
>
> I don’t quite understand why you want to save
> ZSWAP_MAX_BATCH_SIZE - X resources, since even without hardware batching
> you are still allocating all ZSWAP_MAX_BATCH_SIZE resources. This is the
> case for all platforms except yours.
Not sure I understand. Just to clarify, this is not done to save on resources,
but rather for the reasons stated above.

We are already saving on resources by allocating only
"pool->compr_batch_size" number of resources
(*not* ZSWAP_MAX_BATCH_SIZE resources):

	pool->compr_batch_size = min(ZSWAP_MAX_BATCH_SIZE,
				     crypto_acomp_batch_size(acomp_ctx->acomp));
For non-Intel platforms, this means only 1 dst buffer is allocated, as
explained in the commit log for this patch.
" A "u8 compr_batch_size" member is added to "struct zswap_pool", as per
Yosry's suggestion. pool->compr_batch_size is set as the minimum of the
compressor's max batch-size and ZSWAP_MAX_BATCH_SIZE. Accordingly, it
proceeds to allocate the necessary compression dst buffers in the
per-CPU acomp_ctx."
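
For reference, a rough sketch of what that per-CPU allocation amounts to,
assuming the structures in this series (a "buffers" array in struct
crypto_acomp_ctx); the function name below is made up, the exact code in
zswap_cpu_comp_prepare() may differ, and error unwinding is omitted:

static int alloc_batching_dst_buffers(struct crypto_acomp_ctx *acomp_ctx,
				      struct zswap_pool *pool, int cpu)
{
	unsigned int i;

	/* capped at ZSWAP_MAX_BATCH_SIZE; 1 for non-batching compressors */
	pool->compr_batch_size = min(ZSWAP_MAX_BATCH_SIZE,
				     crypto_acomp_batch_size(acomp_ctx->acomp));

	/* allocate only compr_batch_size dst buffers, not ZSWAP_MAX_BATCH_SIZE */
	acomp_ctx->buffers = kcalloc_node(pool->compr_batch_size,
					  sizeof(*acomp_ctx->buffers),
					  GFP_KERNEL, cpu_to_node(cpu));
	if (!acomp_ctx->buffers)
		return -ENOMEM;

	for (i = 0; i < pool->compr_batch_size; i++) {
		acomp_ctx->buffers[i] = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL,
						     cpu_to_node(cpu));
		if (!acomp_ctx->buffers[i])
			return -ENOMEM;
	}
	return 0;
}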
Thanks,
Kanchana
>
> Thanks
> Barry