RE: [PATCH v13 22/22] mm: zswap: Batched zswap_compress() with compress batching of large folios.

From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"nphamcs@gmail.com" <nphamcs@gmail.com>,
	"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
	"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"21cnbao@gmail.com" <21cnbao@gmail.com>,
	"ying.huang@linux.alibaba.com" <ying.huang@linux.alibaba.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"senozhatsky@chromium.org" <senozhatsky@chromium.org>,
	"sj@kernel.org" <sj@kernel.org>,
	"kasong@tencent.com" <kasong@tencent.com>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"clabbe@baylibre.com" <clabbe@baylibre.com>,
	"ardb@kernel.org" <ardb@kernel.org>,
	"ebiggers@google.com" <ebiggers@google.com>,
	"surenb@google.com" <surenb@google.com>,
	"Accardi, Kristen C" <kristen.c.accardi@intel.com>,
	"Gomes, Vinicius" <vinicius.gomes@intel.com>,
	"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
	"Gopal, Vinodh" <vinodh.gopal@intel.com>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v13 22/22] mm: zswap: Batched zswap_compress() with compress batching of large folios.
Date: Fri, 14 Nov 2025 06:43:21 +0000
Message-ID: <SJ2PR11MB8472610CE6EF5BA83BCC8D2EC9CAA@SJ2PR11MB8472.namprd11.prod.outlook.com>
In-Reply-To: <ifqmrypobhqxlkh734md5it22vggmkvqo2t2uy7hgch5hmlyln@flqi75fwmfd4>


> -----Original Message-----
> From: Yosry Ahmed <yosry.ahmed@linux.dev>
> Sent: Thursday, November 13, 2025 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> ying.huang@linux.alibaba.com; akpm@linux-foundation.org;
> senozhatsky@chromium.org; sj@kernel.org; kasong@tencent.com; linux-
> crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> <kristen.c.accardi@intel.com>; Gomes, Vinicius <vinicius.gomes@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v13 22/22] mm: zswap: Batched zswap_compress() with
> compress batching of large folios.
> 
> On Thu, Nov 13, 2025 at 11:55:10PM +0000, Sridhar, Kanchana P wrote:
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosry.ahmed@linux.dev>
> > > Sent: Thursday, November 13, 2025 1:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com;
> chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> > > ying.huang@linux.alibaba.com; akpm@linux-foundation.org;
> > > senozhatsky@chromium.org; sj@kernel.org; kasong@tencent.com; linux-
> > > crypto@vger.kernel.org; herbert@gondor.apana.org.au;
> > > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org;
> > > ebiggers@google.com; surenb@google.com; Accardi, Kristen C
> > > <kristen.c.accardi@intel.com>; Gomes, Vinicius
> <vinicius.gomes@intel.com>;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v13 22/22] mm: zswap: Batched zswap_compress()
> with
> > > compress batching of large folios.
> > >
> [..]
> > > > +		/*
> > > > +		 * If a page cannot be compressed into a size smaller than
> > > > +		 * PAGE_SIZE, save the content as is without a compression,
> > > to
> > > > +		 * keep the LRU order of writebacks.  If writeback is disabled,
> > > > +		 * reject the page since it only adds metadata overhead.
> > > > +		 * swap_writeout() will put the page back to the active LRU
> > > list
> > > > +		 * in the case.
> > > > +		 *
> > > > +		 * It is assumed that any compressor that sets the output
> > > length
> > > > +		 * to 0 or a value >= PAGE_SIZE will also return a negative
> > > > +		 * error status in @err; i.e, will not return a successful
> > > > +		 * compression status in @err in this case.
> > > > +		 */
> > >
> > > Ugh, checking the compression error and checking the compression length
> > > are now in separate places so we need to check if writeback is disabled
> > > in separate places and store the page as-is. It's ugly, and I think the
> > > current code is not correct.
> >
> > The code is 100% correct. You need to spend more time understanding
> > the code. I have stated my assumption above in the comments to
> > help in understanding the "why".
> >
> > From a maintainer, I would expect more responsible statements than
> > this. A flippant remark made without understanding the code (and,
> > disparaging the comments intended to help you do this), can impact
> > someone's career. I am held accountable in my job based on your
> > comments.
> >
> > That said, I have worked tirelessly and innovated to make the code
> > compliant with Herbert's suggestions (which btw have enabled an
> > elegant batching implementation and code commonality for IAA and
> > software compressors), validated it thoroughly for IAA and ZSTD to
> > ensure that both demonstrate performance improvements, which
> > are crucial for memory savings. I am proud of this work.
> >
> >
> > >
> > > > +		if (err && !wb_enabled)
> > > > +			goto compress_error;
> > > > +
> > > > +		for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k) {
> > > > +			j = k + i;
> > >
> > > Please use meaningful iterator names rather than i, j, and k and the huge
> > > comment explaining what they are.
> >
> > I happen to have a different view: having longer iterator names firstly makes
> > code seem "verbose" and detracts from readability, not to mention
> exceeding the
> > 80-character line limit. The comments are essential for code maintainability
> > and avoid out-of-bounds errors when the next zswap developer wants to
> > optimize the code.
> >
> > One drawback of i/j/k iterators is mis-typing errors which cannot be caught
> > at compile time. Let me think some more about how to strike a good
> balance.
> >
> > >
> > > > +			dst = acomp_ctx->buffers[k];
> > > > +			dlen = sg->length | *errp;
> > >
> > > Why are we doing this?
> > >
> > > > +
> > > > +			if (dlen < 0) {
> > >
> > > We should do the incompressible page handling also if dlen is PAGE_SIZE,
> > > or if the compression failed (I guess that's the intention of bit OR'ing
> > > with *errp?)
> >
> > Yes, indeed: that's the intention of bit OR'ing with *errp.
> 
> ..and you never really answered my question. In the exising code we
> store the page as incompressible if writeback is enabled AND
> crypto_wait_req() fails or dlen is zero or PAGE_SIZE. We check above
> if crypto_wait_req() fails and writeback is disabled, but what about the
> rest?

Let me explain this some more. The new code only relies on the assumption
that if dlen is zero or >= PAGE_SIZE, the compressor will not return a 0
("successful status"). In other words, the compressor will return an error status
in this case, which is expected to be a negative error code.
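
A minimal userspace sketch of that assumption, and of why a status-only
check would then match an explicit status-and-length check. This is not
the kernel code: SKETCH_PAGE_SIZE and all three helper names are
hypothetical, and the status is assumed to be 0 or a negative error code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u	/* hypothetical stand-in for PAGE_SIZE */

/* The assumption: success (err == 0) implies 0 < dlen < PAGE_SIZE. */
static bool obeys_assumption(int err, unsigned int dlen)
{
	bool bad_len = dlen == 0 || dlen >= SKETCH_PAGE_SIZE;

	assert(err <= 0);	/* status is 0 or a negative error code */
	return !(err == 0 && bad_len);
}

/* Explicit check: looks at both the status and the length. */
static bool incompressible_explicit(int err, unsigned int dlen)
{
	return err < 0 || dlen == 0 || dlen >= SKETCH_PAGE_SIZE;
}

/* Status-only check: relies on the assumption to cover bad lengths. */
static bool incompressible_status_only(int err)
{
	assert(err <= 0);	/* status is 0 or a negative error code */
	return err < 0;
}
```

For every (err, dlen) pair that obeys the assumption, the two checks give
the same answer. A pair such as (0, SKETCH_PAGE_SIZE) violates the
assumption, and only the explicit check flags it.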

Under these (hopefully valid) assumptions, the code handles the simple case,
where compression returns an error status and writeback is disabled, with
the "goto compress_error".

The rest is handled by these:

1) First, I need to adapt the use of sg_outputs->sgl->length to represent the
compressed length for software compressors, so I do this after
crypto_wait_req() returns:

                acomp_ctx->sg_outputs->sgl->length = acomp_ctx->req->dlen;

I did not want to propose any changes to the protocols of the crypto
software compressors.

2) After the check for the "if (err && !wb_enabled)" case, the new code has this:

                for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k) {
                        j = k + i;
                        dst = acomp_ctx->buffers[k];
                        dlen = sg->length | *errp;

                        if (dlen < 0) {
                                dlen = PAGE_SIZE;
                                dst = kmap_local_page(folio_page(folio, start + j));
                        }

For batching compressors, namely iaa_crypto, the sg->length of each
individual output SG list follows the requirements from Herbert: each
sg->length is either the compressed length or the error status (a negative
error code).

So, to decide whether to store the page as incompressible, all I need is
to test whether sg->length is negative (for batching compressors), or
whether sg->length bit-OR'ed with the crypto_wait_req() return status
("err") is negative (for software compressors). Both tests are accomplished
by the "dlen = sg->length | *errp;".
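
The bit-OR can be checked in isolation with a minimal userspace sketch.
This is not the kernel code: combine_len_and_status() is a hypothetical
helper, and it assumes a two's-complement int and a status that is 0 or a
negative error code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper mirroring "dlen = sg->length | *errp".
 * A negative result means "store the page uncompressed".
 * The unsigned-to-int conversion assumes two's-complement wraparound,
 * the same way the "(int)sg->length" casts in the patch do.
 */
static int combine_len_and_status(unsigned int sg_length, int status)
{
	assert(status <= 0);	/* 0 on success, negative error code otherwise */
	return (int)sg_length | status;
}
```

With status 0, the result is just the length: combine_len_and_status(1200, 0)
is 1200. With a negative status such as -ENOSPC, the sign bit is set and the
result is negative whatever the length is. With an error code already stored
in the length field, as (unsigned int)-EINVAL, and status 0, the result is
also negative. A length of 4096 with status 0 stays positive, so that case is
only caught if the compressor reports an error for it.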

I believe this maintains backward compatibility with the existing code.
Please let me know if you agree.

> 
> We don't check again if writeback is enabled before storing the page is
> incompressible, and we do not check if dlen is zero or PAGE_SIZE. Are
> these cases no longer possible?

I hope the above explanation clarifies things some more. These cases
are possible, and as long as they return an error status, they should be
correctly handled by the new code.

> 
> Also, why use errp, why not explicitly use the appropriate error code?
> It's also unclear to me why the error code is always zero with HW
> compression?

This is because of the sg->length requirements (compressed length or error)
for the batching interface suggested by Herbert. Hence, I define err_sg
as 0 upfront and set errp to &err_sg for batching compressors. For software
compressors, errp is set to &err, so the above check always applies
the software compressor's error status to the compressed length via
the bit-OR to determine if the page needs to be stored uncompressed.
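
A minimal userspace sketch of this errp selection. This is not the kernel
code: struct status_sketch, pick_errp() and store_uncompressed() are
hypothetical names that only model the err/err_sg/errp relationship
described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two status variables. */
struct status_sketch {
	int err;	/* return status of the compress call */
	int err_sg;	/* always 0: batching errors live in sg->length */
};

/* Batching compressors use &err_sg, software compressors use &err. */
static const int *pick_errp(const struct status_sketch *s, bool batching)
{
	return batching ? &s->err_sg : &s->err;
}

/* Mirrors the "dlen = sg->length | *errp; if (dlen < 0)" test. */
static bool store_uncompressed(unsigned int sg_length, const int *errp)
{
	assert(*errp <= 0);	/* 0 or a negative error code */
	return ((int)sg_length | *errp) < 0;
}
```

With err = -EINVAL and err_sg = 0, a software compressor (batching == false)
sees the page flagged for uncompressed storage regardless of the length,
while a batching compressor is flagged only when its own sg->length carries
a negative error code.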


> 
> >
> > >
> > > > +				dlen = PAGE_SIZE;
> > > > +				dst = kmap_local_page(folio_page(folio, start
> > > + j));
> > > > +			}
> > > > +
> > > > +			handle = zs_malloc(pool->zs_pool, dlen, gfp, nid);
> > > >
> > > > -	zs_obj_write(pool->zs_pool, handle, dst, dlen);
> > > > -	entry->handle = handle;
> > > > -	entry->length = dlen;
> > > > +			if (IS_ERR_VALUE(handle)) {
> > > > +				if (PTR_ERR((void *)handle) == -ENOSPC)
> > > > +					zswap_reject_compress_poor++;
> > > > +				else
> > > > +					zswap_reject_alloc_fail++;
> > > >
> > > > -unlock:
> > > > -	if (mapped)
> > > > -		kunmap_local(dst);
> > > > -	if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
> > > > -		zswap_reject_compress_poor++;
> > > > -	else if (comp_ret)
> > > > -		zswap_reject_compress_fail++;
> > > > -	else if (alloc_ret)
> > > > -		zswap_reject_alloc_fail++;
> > > > +				goto err_unlock;
> > > > +			}
> > > > +
> > > > +			zs_obj_write(pool->zs_pool, handle, dst, dlen);
> > > > +			entries[j]->handle = handle;
> > > > +			entries[j]->length = dlen;
> > > > +			if (dst != acomp_ctx->buffers[k])
> > > > +				kunmap_local(dst);
> > > > +		}
> > > > +	} /* finished compress and store nr_pages. */
> > > > +
> > > > +	mutex_unlock(&acomp_ctx->mutex);
> > > > +	return true;
> > > > +
> > > > +compress_error:
> > > > +	for_each_sg(acomp_ctx->sg_outputs->sgl, sg, nr_comps, k) {
> > > > +		if ((int)sg->length < 0) {
> > > > +			if ((int)sg->length == -ENOSPC)
> > > > +				zswap_reject_compress_poor++;
> > > > +			else
> > > > +				zswap_reject_compress_fail++;
> > > > +		}
> > > > +	}
> > > >
> > > > +err_unlock:
> > > >  	mutex_unlock(&acomp_ctx->mutex);
> > > > -	return comp_ret == 0 && alloc_ret == 0;
> > > > +	return false;
> > > >  }
> > > >
> > > >  static bool zswap_decompress(struct zswap_entry *entry, struct folio
> > > *folio)
> > > > @@ -1488,12 +1604,9 @@ static bool zswap_store_pages(struct folio
> > > *folio,
> > > >  		INIT_LIST_HEAD(&entries[i]->lru);
> > > >  	}
> > > >
> > > > -	for (i = 0; i < nr_pages; ++i) {
> > > > -		struct page *page = folio_page(folio, start + i);
> > > > -
> > > > -		if (!zswap_compress(page, entries[i], pool, wb_enabled))
> > > > -			goto store_pages_failed;
> > > > -	}
> > > > +	if (unlikely(!zswap_compress(folio, start, nr_pages, entries, pool,
> > > > +				     nid, wb_enabled)))
> > > > +		goto store_pages_failed;
> > > >
> > > >  	for (i = 0; i < nr_pages; ++i) {
> > > >  		struct zswap_entry *old, *entry = entries[i];
> > > > --
> > > > 2.27.0
> > > >
