linux-mm.kvack.org archive mirror
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	Nhat Pham <nphamcs@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
	"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"21cnbao@gmail.com" <21cnbao@gmail.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"clabbe@baylibre.com" <clabbe@baylibre.com>,
	"ardb@kernel.org" <ardb@kernel.org>,
	"ebiggers@google.com" <ebiggers@google.com>,
	"surenb@google.com" <surenb@google.com>,
	"Accardi, Kristen C" <kristen.c.accardi@intel.com>,
	"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
	"Gopal, Vinodh" <vinodh.gopal@intel.com>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v4 09/10] mm: zswap: Allocate pool batching resources if the crypto_alg supports batching.
Date: Tue, 3 Dec 2024 22:24:17 +0000	[thread overview]
Message-ID: <SJ0PR11MB5678F262108A0ED22A84698BC9362@SJ0PR11MB5678.namprd11.prod.outlook.com> (raw)
In-Reply-To: <SJ0PR11MB56789D21AAA0970F9B8AB5CFC9362@SJ0PR11MB5678.namprd11.prod.outlook.com>


> -----Original Message-----
> From: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Sent: Tuesday, December 3, 2024 2:18 PM
> To: Yosry Ahmed <yosryahmed@google.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>; Nhat Pham
> <nphamcs@gmail.com>; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> akpm@linux-foundation.org; linux-crypto@vger.kernel.org;
> davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> <kristen.c.accardi@intel.com>; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal, Vinodh <vinodh.gopal@intel.com>; Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com>
> Subject: RE: [PATCH v4 09/10] mm: zswap: Allocate pool batching resources if
> the crypto_alg supports batching.
> 
> 
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Tuesday, December 3, 2024 1:44 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>; Nhat Pham
> > <nphamcs@gmail.com>; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org;
> > hannes@cmpxchg.org; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com; ryan.roberts@arm.com; ying.huang@intel.com;
> > 21cnbao@gmail.com; akpm@linux-foundation.org; linux-
> > crypto@vger.kernel.org; davem@davemloft.net; clabbe@baylibre.com;
> > ardb@kernel.org; ebiggers@google.com; surenb@google.com; Accardi,
> > Kristen C <kristen.c.accardi@intel.com>; Feghali, Wajdi K
> > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v4 09/10] mm: zswap: Allocate pool batching resources
> if
> > the crypto_alg supports batching.
> >
> > On Tue, Dec 3, 2024 at 1:37 PM Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Sent: Tuesday, December 3, 2024 12:01 AM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > Cc: Nhat Pham <nphamcs@gmail.com>; linux-kernel@vger.kernel.org;
> > linux-
> > > > mm@kvack.org; hannes@cmpxchg.org; yosryahmed@google.com;
> > > > chengming.zhou@linux.dev; usamaarif642@gmail.com;
> > > > ryan.roberts@arm.com; ying.huang@intel.com; 21cnbao@gmail.com;
> > > > akpm@linux-foundation.org; linux-crypto@vger.kernel.org;
> > > > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> > > > ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> > > > <kristen.c.accardi@intel.com>; Feghali, Wajdi K
> > <wajdi.k.feghali@intel.com>;
> > > > Gopal, Vinodh <vinodh.gopal@intel.com>
> > > > Subject: Re: [PATCH v4 09/10] mm: zswap: Allocate pool batching
> > resources if
> > > > the crypto_alg supports batching.
> > > >
> > > > On Tue, Dec 03, 2024 at 12:30:30AM +0000, Sridhar, Kanchana P wrote:
> > > > >
> > > > > > Why do we need this "can_batch" field? IIUC, this can be
> determined
> > > > > > from the compressor internal fields itself, no?
> > > > > >
> > > > > > acomp_has_async_batching(acomp);
> > > > > >
> > > > > > Is this just for convenience, or is this actually an expensive thing to
> > > > compute?
> > > > >
> > > > > Thanks for your comments. This is a good question. I tried not to
> > > > > imply that batching resources have been allocated for the cpu based
> > > > > only on what acomp_has_async_batching() returns. It is possible that
> > > > > the cpu onlining code ran into an -ENOMEM error on any particular
> > > > > cpu. In this case, I set the pool->can_batch to "false", mainly for
> > > > > convenience, so that zswap can be somewhat insulated from migration.
> > > > > I agree that this may not be the best solution; and whether or not
> > > > > batching is enabled can be directly determined just before the call
> > > > > to crypto_acomp_batch_compress() based on:
> > > > >
> > > > > acomp_ctx->nr_reqs == SWAP_CRYPTO_BATCH_SIZE;
> > > >
> > > > With ahash request chaining, the idea is to accumulate as much
> > > > data as you can before you provide it to the Crypto API.  The
> > > > API is responsible for dividing it up if the underlying driver
> > > > is only able to handle one request at a time.
> > > >
> > > > So that would be the ideal model to use for compression as well.
> > > > Provide as much data as you can and let the API handle the case
> > > > where the data needs to be divided up.
> > >
> > > Thanks for this suggestion! This sounds like a clean way to handle the
> > > batching/sequential compress/decompress within the crypto API as long
> > > as it can be contained in the crypto acompress layer.
> > > If the zswap maintainers don't have any objections, I can look into the
> > > feasibility of doing this.
> >
> > Does this mean that instead of zswap breaking down the folio into
> > SWAP_CRYPTO_BATCH_SIZE -sized batches, we pass all the pages to the
> > crypto layer and let it do the batching as it pleases?
> 
> If I understand Herbert's suggestion correctly, I think what he meant was
> that we always allocate only SWAP_CRYPTO_BATCH_SIZE # of buffers (say, 8)
> in zswap during cpu onlining. The acomp_has_async_batching() API can
> be used to determine whether to allocate more than one acomp_req and
> crypto_wait (fyi, I am creating SWAP_CRYPTO_BATCH_SIZE # of crypto_wait
> for the request chaining with the goal of understanding performance wrt
> the existing implementation of crypto_acomp_batch_compress()).
> In zswap_store_folio(), we process the large folio in batches of 8 pages
> and call crypto_acomp_batch_compress() for each batch. Based on earlier
> discussions in this thread, it might make sense to add a bool option to
> crypto_acomp_batch_compress() as follows:
> 
> static inline bool crypto_acomp_batch_compress(struct acomp_req *reqs[],
> 					       struct crypto_wait *waits[],
> 					       struct page *pages[],
> 					       u8 *dsts[],
> 					       unsigned int dlens[],
> 					       int errors[],
> 					       int nr_pages,
> 					       bool parallel);
> 
> zswap would acquire the per-cpu acomp_ctx->mutex, and pass
> (acomp_ctx->nr_reqs == SWAP_CRYPTO_BATCH_SIZE) as the "parallel"
> parameter. This indicates to crypto_acomp_batch_compress() whether or
> not SWAP_CRYPTO_BATCH_SIZE # of elements are available in "reqs" and
> "waits".
> 
> If we have multiple "reqs" (parallel == true), we use request chaining
> (or the existing asynchronous poll implementation) for IAA batching.
> If (parallel == false), crypto_acomp_batch_compress() will look
> something like this:
> 
> static inline bool crypto_acomp_batch_compress(struct acomp_req *reqs[],
> 					       struct crypto_wait *waits[],
> 					       struct page *pages[],
> 					       u8 *dsts[],
> 					       unsigned int dlens[],
> 					       int errors[],
> 					       int nr_pages,
> 					       bool parallel)
> {
> 	if (!parallel) {
> 		struct scatterlist input, output;
> 		int comp_ret, i;
> 		u8 *dst;
> 
> 		for (i = 0; i < nr_pages; ++i) {
> 			/*
> 			 * For pages[i], buffers[i], dlens[i]: borrow the first
> 			 * half of zswap_compress() functionality:
> 			 */
> 			dst = acomp_ctx->buffers[i];
> 			sg_init_table(&input, 1);
> 			sg_set_page(&input, pages[i], PAGE_SIZE, 0);
> 			sg_init_one(&output, dst, PAGE_SIZE * 2);
> 			acomp_request_set_params(acomp_ctx->reqs[0], &input,
> 						 &output, PAGE_SIZE, dlens[i]);
> 
> 			comp_ret = crypto_wait_req(
> 					crypto_acomp_compress(acomp_ctx->reqs[0]),
> 					acomp_ctx->waits[0]);
> 			errors[i] = comp_ret;
> 			dlens[i] = acomp_ctx->reqs[0]->dlen;
> 		}
> 	}
> 
> 	/*
> 	 * At this point we would have sequentially compressed the batch.
> 	 * zswap_store_folio() can process the buffers and dlens using
> 	 * common code for batching and non-batching compressors.
> 	 */
> 	return true;
> }
> 
> IIUC, this suggestion appears to be along the lines of using common
> code in zswap as far as possible, for compressors that do and do not
> support batching. Herbert can correct me if I am wrong.
> 
> If this is indeed the case, the memory penalty for software compressors
> would be:
> 1) pre-allocating SWAP_CRYPTO_BATCH_SIZE acomp_ctx->buffers in
>     zswap_cpu_comp_prepare().
> 2) SWAP_CRYPTO_BATCH_SIZE stack variables for pages and dlens in
>     zswap_store_folio().
> 
> This would be an additional memory penalty for what we gain by
> having the common code paths in zswap for compressors that do
> and do not support batching.

Alternatively, we could always use request chaining, even for software
compressors, at the cost of a larger per-cpu memory penalty from
allocating SWAP_CRYPTO_BATCH_SIZE # of reqs/waits by default. I don't
know whether this would have functional issues, since the chain of
requests would still be processed sequentially (basically all requests
are added to a list), but maybe this is what Herbert is suggesting (not sure).
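
To make the sequential-processing point concrete, here is a small
userspace model (hypothetical: mock_req, compress_page() and
process_chain() are stand-ins of my own, not the kernel crypto API). A
software backend handed a whole chain of requests would simply walk the
list one request at a time, producing the same per-page results as
issuing each request individually:

```c
/*
 * Hypothetical userspace sketch of "request chaining processed
 * sequentially": requests are linked on a list, and the backend
 * walks the list one request at a time. compress_page() is a
 * trivial stand-in for the real compressor.
 */
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define BATCH_SIZE 8

struct mock_req {
	const unsigned char *src;	/* input "page" */
	unsigned char *dst;		/* output buffer */
	unsigned int dlen;		/* bytes produced */
	int err;
	struct mock_req *next;		/* chain link */
};

/*
 * Stand-in compressor: a reversible byte-wise transform so the
 * results are easy to verify; the real code would submit the
 * request to crypto_acomp_compress() instead.
 */
static int compress_page(const unsigned char *src, unsigned char *dst,
			 unsigned int *dlen)
{
	for (size_t i = 0; i < PAGE_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
	*dlen = PAGE_SIZE;
	return 0;
}

/*
 * "Request chaining": the entire chain is handed over at once,
 * but a software backend just processes it sequentially.
 */
static bool process_chain(struct mock_req *head)
{
	bool ok = true;
	struct mock_req *r;

	for (r = head; r; r = r->next) {
		r->err = compress_page(r->src, r->dst, &r->dlen);
		if (r->err)
			ok = false;
	}
	return ok;
}
```

So the batching gain for software compressors would come only from the
single hand-off, not from any parallelism in the backend.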

Thanks,
Kanchana

> 
> Thanks,
> Kanchana
> 
> >
> > It sounds nice on the surface, but this implies that we have to
> > allocate folio_nr_pages() buffers in zswap, an allocation that is
> > essentially the same size as the folio itself. While the allocation
> > does not need to be contiguous, making a large number of allocations
> > in the reclaim path is definitely not something we want. For a 2M THP,
> > we'd need to allocate 2M in zswap_store().
> >
> > If we choose to keep preallocating, assuming the maximum THP size is
> > 2M, we need to allocate 2M * nr_cpus worth of buffers. That's a lot of
> > memory.
> >
> > I feel like I am missing something.
> >
> > >
> > > Thanks,
> > > Kanchana
> > >
> > > >
> > > > Cheers,
> > > > --
> > > > Email: Herbert Xu <herbert@gondor.apana.org.au>
> > > > Home Page: http://gondor.apana.org.au/~herbert/
> > > > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
