From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"nphamcs@gmail.com" <nphamcs@gmail.com>,
"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"21cnbao@gmail.com" <21cnbao@gmail.com>,
"ying.huang@linux.alibaba.com" <ying.huang@linux.alibaba.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"senozhatsky@chromium.org" <senozhatsky@chromium.org>,
"sj@kernel.org" <sj@kernel.org>,
"kasong@tencent.com" <kasong@tencent.com>,
"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
"herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
"davem@davemloft.net" <davem@davemloft.net>,
"clabbe@baylibre.com" <clabbe@baylibre.com>,
"ardb@kernel.org" <ardb@kernel.org>,
"ebiggers@google.com" <ebiggers@google.com>,
"surenb@google.com" <surenb@google.com>,
"Accardi, Kristen C" <kristen.c.accardi@intel.com>,
"Gomes, Vinicius" <vinicius.gomes@intel.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
"Gopal, Vinodh" <vinodh.gopal@intel.com>
Subject: Re: [PATCH v12 20/23] mm: zswap: Per-CPU acomp_ctx resources exist from pool creation to deletion.
Date: Tue, 30 Sep 2025 21:20:02 +0000 [thread overview]
Message-ID: <6xb7feds424kfld4udwmbtccftwnnx6vmbpvmjcwlionfdlmuj@vz4uzh6tog5g> (raw)
In-Reply-To: <SA3PR11MB81209CF02417CED3C0459504C91AA@SA3PR11MB8120.namprd11.prod.outlook.com>
> > > > > static struct zswap_pool *zswap_pool_create(char *compressor)
> > > > > {
> > > > > struct zswap_pool *pool;
> > > > > @@ -263,19 +287,43 @@ static struct zswap_pool
> > > > *zswap_pool_create(char *compressor)
> > > > >
> > > > > strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
> > > > >
> > > > > - pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
> > > > > + /* Many things rely on the zero-initialization. */
> > > > > + pool->acomp_ctx = alloc_percpu_gfp(*pool->acomp_ctx,
> > > > > + GFP_KERNEL | __GFP_ZERO);
> > > > > if (!pool->acomp_ctx) {
> > > > > pr_err("percpu alloc failed\n");
> > > > > goto error;
> > > > > }
> > > > >
> > > > > - for_each_possible_cpu(cpu)
> > > > > - mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> > > > > -
> > > > > + /*
> > > > > + * This is serialized against CPU hotplug operations. Hence, cores
> > > > > + * cannot be offlined until this finishes.
> > > > > + * In case of errors, we need to goto "ref_fail" instead of "error"
> > > > > + * because there is no teardown callback registered anymore, for
> > > > > + * cpuhp_state_add_instance() to de-allocate resources as it rolls
> > > > back
> > > > > + * state on cores before the CPU on which error was encountered.
> > > > > + */
> > > > > ret =
> > > > cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
> > > > > &pool->node);
> > > > > +
> > > > > + /*
> > > > > + * We only needed the multi state instance add operation to invoke
> > > > the
> > > > > + * startup callback for all cores without cores getting offlined. Since
> > > > > + * the acomp_ctx resources will now only be de-allocated when the
> > > > pool
> > > > > + * is destroyed, we can safely remove the multi state instance. This
> > > > > + * minimizes (but does not eliminate) the possibility of
> > > > > + * zswap_cpu_comp_prepare() being invoked again due to a CPU
> > > > > + * offline-online transition. Removing the instance also prevents race
> > > > > + * conditions between CPU onlining after initial pool creation, and
> > > > > + * acomp_ctx_dealloc() freeing the acomp_ctx resources.
> > > > > + * Note that we delete the instance before checking the error status
> > > > of
> > > > > + * the node list add operation because we want the instance removal
> > > > even
> > > > > + * in case of errors in the former.
> > > > > + */
> > > > > + cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
> > > > &pool->node);
> > > > > +
> > > >
> > > > I don't understand what's wrong with the current flow? We call
> > > > cpuhp_state_remove_instance() in pool deletion before freeing up the
> > > > per-CPU resources. Why is this not enough?
> > >
> > > This is because with the changes proposed in this commit, the multi state
> > > add instance is used during pool creation as a way to create acomp_ctx
> > > resources correctly with just the offline/online state transitions guaranteed
> > > by CPU hotplug, without needing additional mutex locking as in the
> > mainline.
> > > In other words, the consistency wrt safely creating/deleting acomp_ctx
> > > resources with the changes being proposed is accomplished by the hotplug
> > > state transitions guarantee. Stated differently, the hotplug framework
> > > helps enforce the new design during pool creation without relying on the
> > > mutex and subsequent simplifications during zswap_[de]compress()
> > > proposed in this commit.
> > >
> > > Once this is done, deleting the CPU hotplug state seems cleaner, and
> > reflects
> > > the change in policy of the resources' lifetime. It also prevents race
> > conditions
> > > between zswap_cpu_comp_prepare() and acomp_ctx_dealloc() called from
> > > zswap_pool_destroy().
> >
> > How is a race with zswap_cpu_comp_prepare() possible if we call
> > cpuhp_state_remove_instance() before acomp_ctx_dealloc() in the pool
> > deletion path?
>
> Good point. I agree, calling cpuhp_state_remove_instance() before
> acomp_ctx_dealloc() will not cause a race. However, if we consider the
> time from pool creation to deletion: if there is an online-offline-online
> transition, can zswap_cpu_comp_prepare() race with the call to
> cpuhp_state_remove_instance()? If so, wouldn't this cause unpredictable
> behavior?
How will this race happen?
cpuhp_state_remove_instance() is called while a pool is being destroyed,
while zswap_cpu_comp_prepare() while the pool is being created or during
CPU onlining.
The former cannot race, and the latter should be synchronized by hotplug
code.
>
> I agree, this can occur even with the code in this commit, but there is
> less risk of things going wrong because we remove the CPU hotplug
> instance before the pool is added to zswap_pools.
>
> Further, removing the CPU hotplug instance directly codifies the
> intent of this commit, i.e., to use this as a facilitator and manage memory
> allotted to acomp_ctx, but not to manage those resources' lifetime
> thereafter.
>
> Do you see any advantage of keeping the call to cpuhp_state_remove_instance()
> occur before acomp_ctx_dealloc() in zswap_pool_destroy()? Please let me know
> if I am missing something.
What about more CPUs going online? Without the hotplug instance we don't
get per-CPU resources for those. We are not using the hotplug mechanism
just to facilitate per-CPU resource allocation, we use it to
automatically allocate resources for newly onlined CPUs without having
to preallocate for all possible CPUs.
Also, this makes the code more difficult to reason about, and is an
unncessary change from the current behavior.
The only change needed is to drop the teardown callback and do the
freeing in the pool destruction path instead.
next prev parent reply other threads:[~2025-09-30 21:20 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-26 3:34 [PATCH v12 00/23] zswap compression batching with optimized iaa_crypto driver Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 01/23] crypto: iaa - Reorganize the iaa_crypto driver code Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 02/23] crypto: iaa - New architecture for IAA device WQ comp/decomp usage & core mapping Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 03/23] crypto: iaa - Simplify, consistency of function parameters, minor stats bug fix Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 04/23] crypto: iaa - Descriptor allocation timeouts with mitigations Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 05/23] crypto: iaa - iaa_wq uses percpu_refs for get/put reference counting Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 06/23] crypto: iaa - Simplify the code flow in iaa_compress() and iaa_decompress() Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 07/23] crypto: iaa - Refactor hardware descriptor setup into separate procedures Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 08/23] crypto: iaa - Simplified, efficient job submissions for non-irq mode Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 09/23] crypto: iaa - Deprecate exporting add/remove IAA compression modes Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 10/23] crypto: iaa - Expect a single scatterlist for a [de]compress request's src/dst Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 11/23] crypto: iaa - Rearchitect the iaa_crypto driver to be usable by zswap and zram Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 12/23] crypto: iaa - Enablers for submitting descriptors then polling for completion Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 13/23] crypto: acomp - Define a unit_size in struct acomp_req to enable batching Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 14/23] crypto: iaa - IAA Batching for parallel compressions/decompressions Kanchana P Sridhar
2025-10-17 1:09 ` Herbert Xu
2025-10-17 4:04 ` Sridhar, Kanchana P
2025-10-17 4:09 ` Herbert Xu
2025-10-17 5:12 ` Sergey Senozhatsky
2025-10-17 5:49 ` Sridhar, Kanchana P
2025-09-26 3:34 ` [PATCH v12 15/23] crypto: iaa - Enable async mode and make it the default Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 16/23] crypto: iaa - Disable iaa_verify_compress by default Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 17/23] crypto: iaa - Submit the two largest source buffers first in decompress batching Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 18/23] crypto: iaa - Add deflate-iaa-dynamic compression mode Kanchana P Sridhar
2025-09-26 3:34 ` [PATCH v12 19/23] crypto: acomp - Add crypto_acomp_batch_size() to get an algorithm's batch-size Kanchana P Sridhar
2025-10-17 1:04 ` Herbert Xu
2025-10-17 4:02 ` Sridhar, Kanchana P
2025-09-26 3:34 ` [PATCH v12 20/23] mm: zswap: Per-CPU acomp_ctx resources exist from pool creation to deletion Kanchana P Sridhar
2025-09-30 15:49 ` Yosry Ahmed
2025-09-30 18:20 ` Sridhar, Kanchana P
2025-09-30 18:29 ` Yosry Ahmed
2025-09-30 21:00 ` Sridhar, Kanchana P
2025-09-30 21:20 ` Yosry Ahmed [this message]
2025-09-30 21:56 ` Sridhar, Kanchana P
2025-10-01 15:33 ` Yosry Ahmed
2025-10-01 17:37 ` Sridhar, Kanchana P
2025-09-26 3:35 ` [PATCH v12 21/23] mm: zswap: Consistently use IS_ERR_OR_NULL() to check acomp_ctx resources Kanchana P Sridhar
2025-09-26 3:35 ` [PATCH v12 22/23] mm: zswap: zswap_store() will process a large folio in batches Kanchana P Sridhar
2025-10-01 16:19 ` Yosry Ahmed
2025-10-01 21:20 ` Sridhar, Kanchana P
2025-10-03 19:10 ` Sridhar, Kanchana P
2025-10-13 17:47 ` Sridhar, Kanchana P
2025-10-13 17:58 ` Sridhar, Kanchana P
2025-10-14 15:29 ` Yosry Ahmed
2025-10-14 16:35 ` Nhat Pham
2025-10-15 3:42 ` Sridhar, Kanchana P
2025-10-15 17:04 ` Nhat Pham
2025-10-15 22:15 ` Sridhar, Kanchana P
2025-10-15 22:24 ` Yosry Ahmed
2025-10-15 22:36 ` Nhat Pham
2025-10-16 0:56 ` Sridhar, Kanchana P
2025-10-31 22:15 ` Sridhar, Kanchana P
2025-09-26 3:35 ` [PATCH v12 23/23] mm: zswap: Batched zswap_compress() with compress batching of large folios Kanchana P Sridhar
2025-10-22 5:18 ` Herbert Xu
2025-10-31 22:18 ` Sridhar, Kanchana P
2025-10-13 18:03 ` [PATCH v12 00/23] zswap compression batching with optimized iaa_crypto driver Sridhar, Kanchana P
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6xb7feds424kfld4udwmbtccftwnnx6vmbpvmjcwlionfdlmuj@vz4uzh6tog5g \
--to=yosry.ahmed@linux.dev \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=ardb@kernel.org \
--cc=chengming.zhou@linux.dev \
--cc=clabbe@baylibre.com \
--cc=davem@davemloft.net \
--cc=ebiggers@google.com \
--cc=hannes@cmpxchg.org \
--cc=herbert@gondor.apana.org.au \
--cc=kanchana.p.sridhar@intel.com \
--cc=kasong@tencent.com \
--cc=kristen.c.accardi@intel.com \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=senozhatsky@chromium.org \
--cc=sj@kernel.org \
--cc=surenb@google.com \
--cc=usamaarif642@gmail.com \
--cc=vinicius.gomes@intel.com \
--cc=vinodh.gopal@intel.com \
--cc=wajdi.k.feghali@intel.com \
--cc=ying.huang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox