From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
hannes@cmpxchg.org, yosry.ahmed@linux.dev, nphamcs@gmail.com,
chengming.zhou@linux.dev, usamaarif642@gmail.com,
ryan.roberts@arm.com, 21cnbao@gmail.com,
ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
senozhatsky@chromium.org, sj@kernel.org, kasong@tencent.com,
linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
ebiggers@google.com, surenb@google.com,
kristen.c.accardi@intel.com, vinicius.gomes@intel.com,
giovanni.cabiddu@intel.com
Cc: wajdi.k.feghali@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v14 23/26] mm: zswap: Tie per-CPU acomp_ctx lifetime to the pool.
Date: Sat, 24 Jan 2026 19:35:34 -0800
Message-ID: <20260125033537.334628-24-kanchana.p.sridhar@intel.com>
In-Reply-To: <20260125033537.334628-1-kanchana.p.sridhar@intel.com>
Currently, per-CPU acomp_ctx are allocated on pool creation and/or CPU
hotplug, and destroyed on pool destruction or CPU hotunplug. This
complicates lifetime management merely to save memory while a CPU is
offline, which is an uncommon state.
Simplify lifetime management by allocating per-CPU acomp_ctx once on
pool creation (or CPU hotplug for CPUs onlined later), and keeping them
allocated until the pool is destroyed.
Refactor cleanup code from zswap_cpu_comp_dead() into
acomp_ctx_dealloc() to be used elsewhere.
The main benefit of using the CPU hotplug multi state instance startup
callback to allocate the acomp_ctx resources is that it prevents the
cores from being offlined until the multi state instance addition call
returns.
From Documentation/core-api/cpu_hotplug.rst:
"The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations."
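The serialization described in the quote can be pictured with a small
userspace model. This is only a sketch, not kernel code: hotplug_lock,
model_add_instance(), model_cpu_down() and model_startup() are
hypothetical names, and a pthread mutex stands in for the hotplug
synchronization. In the kernel an offline request would wait for the
instance addition to finish; the model reports -EBUSY instead so that it
can stay single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the CPU hotplug synchronization. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool cpu_online[NR_CPUS] = { true, true, true, false };
static int offline_attempts_blocked;

/* Model of a CPU-offline request: it needs the hotplug lock. */
static int model_cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	/* The kernel would block here; the model fails fast instead. */
	if (pthread_mutex_trylock(&hotplug_lock) != 0)
		return -EBUSY;

	cpu_online[cpu] = false;
	pthread_mutex_unlock(&hotplug_lock);
	return 0;
}

typedef int (*startup_fn)(int cpu);

/*
 * Model of adding a multi-state instance: the startup callback runs for
 * every online CPU while the hotplug lock is held.
 */
static int model_add_instance(startup_fn startup)
{
	int ret = 0;

	pthread_mutex_lock(&hotplug_lock);
	for (int cpu = 0; cpu < NR_CPUS && !ret; cpu++)
		if (cpu_online[cpu])
			ret = startup(cpu);
	pthread_mutex_unlock(&hotplug_lock);

	return ret;
}

/* Startup callback that tries to offline its own CPU and counts refusals. */
static int model_startup(int cpu)
{
	if (model_cpu_down(cpu) == -EBUSY)
		offline_attempts_blocked++;
	return 0;
}
```

Calling model_add_instance(model_startup) runs the callback for the
three online CPUs, and each offline attempt made from inside the
callback is refused. Once the call returns, model_cpu_down() succeeds.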
Furthermore, zswap_[de]compress() cannot contend with
zswap_cpu_comp_prepare() because:
- During pool creation/deletion, the pool is not in the zswap_pools
list.
- During CPU hot[un]plug, the CPU is not yet online, as Yosry pointed
out. zswap_cpu_comp_prepare() will be run on a control CPU,
since CPUHP_MM_ZSWP_POOL_PREPARE is in the PREPARE section of "enum
cpuhp_state".
In both these cases, any recursions into zswap reclaim from
zswap_cpu_comp_prepare() will be handled by the old pool.
The above two observations enable the following simplifications:
1) zswap_cpu_comp_prepare():
a) acomp_ctx mutex locking:
If the process gets migrated while zswap_cpu_comp_prepare() is
running, it will complete on the new CPU. In case of failures, we
pass the acomp_ctx pointer obtained at the start of
zswap_cpu_comp_prepare() to acomp_ctx_dealloc(), which again, can
only undergo migration. There appear to be no contention
scenarios that might cause inconsistent values of acomp_ctx's
members. Hence, it seems there is no need for
mutex_lock(&acomp_ctx->mutex) in zswap_cpu_comp_prepare().
b) acomp_ctx mutex initialization:
Since the pool is not yet on the zswap_pools list, we don't need to
initialize the per-CPU acomp_ctx mutex in zswap_pool_create().
The mutex initialization is moved back into
zswap_cpu_comp_prepare().
c) Subsequent CPU offline-online transitions:
zswap_cpu_comp_prepare() checks upfront if acomp_ctx->acomp is
valid. If so, it returns success. This should handle any CPU
offline-online transitions after pool creation is done.
2) CPU offline vis-a-vis zswap ops:
Let's suppose the process is migrated to another CPU before the
current CPU goes offline. If zswap_[de]compress() holds the
acomp_ctx->mutex lock of the offlined CPU, that mutex will be
released once it completes on the new CPU. Since there is no
teardown callback, there is no possibility of UAF.
3) Pool creation/deletion and process migration to another CPU:
During pool creation/deletion, the pool is not in the zswap_pools
list. Hence it cannot contend with zswap ops on that CPU. However,
the process can get migrated.
a) Pool creation --> zswap_cpu_comp_prepare() --> process migrated:
* Old CPU offline: no-op.
* zswap_cpu_comp_prepare() continues to run on the new CPU to
finish allocating acomp_ctx resources for the offlined CPU.
b) Pool deletion --> acomp_ctx_dealloc() --> process migrated:
* Old CPU offline: no-op.
* acomp_ctx_dealloc() continues to run on the new CPU to finish
de-allocating acomp_ctx resources for the offlined CPU.
4) Pool deletion vis-a-vis CPU onlining:
The call to cpuhp_state_remove_instance() cannot race with
zswap_cpu_comp_prepare() because of hotplug synchronization.
The current acomp_ctx_get_cpu_lock()/acomp_ctx_put_unlock() are
deleted. Instead, zswap_[de]compress() directly call
mutex_[un]lock(&acomp_ctx->mutex).
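The resulting lifetime rules can be sketched as a userspace model. It
is an illustration under stated assumptions, not the patch itself:
struct ctx_model, struct pool_model and the *_model_* functions are
hypothetical names, malloc()/free() stand in for the crypto and kmalloc
allocations, and a pthread mutex stands in for acomp_ctx->mutex. Unlike
the patch's acomp_ctx_dealloc(), the model's dealloc helper also resets
the pointers so that calling it twice is harmless; that is a choice made
for this sketch only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical userspace stand-in for struct crypto_acomp_ctx. */
struct ctx_model {
	void *buffer;
	void *acomp;	/* non-NULL means "already initialized" */
	void *req;
	pthread_mutex_t mutex;
};

struct pool_model {
	struct ctx_model ctx[NR_CPUS];
};

/* Free whatever was allocated; resetting the pointers is a sketch-only choice. */
static void ctx_model_dealloc(struct ctx_model *ctx)
{
	if (!ctx)
		return;
	free(ctx->req);
	free(ctx->acomp);
	free(ctx->buffer);
	ctx->req = ctx->acomp = ctx->buffer = NULL;
}

/* Idempotent "startup callback": a second call for the same CPU is a no-op. */
static int ctx_model_prepare(struct ctx_model *ctx)
{
	if (ctx->acomp)
		return 0;

	ctx->buffer = malloc(4096);
	if (!ctx->buffer)
		return -1;
	ctx->acomp = malloc(1);
	if (!ctx->acomp)
		goto fail;
	ctx->req = malloc(1);
	if (!ctx->req)
		goto fail;

	/* Nothing can reach this ctx yet, so no locking is needed before init. */
	pthread_mutex_init(&ctx->mutex, NULL);
	return 0;
fail:
	ctx_model_dealloc(ctx);
	return -1;
}

/* A [de]compress-like user: take the mutex directly, no retry loop. */
static bool ctx_model_use(struct ctx_model *ctx)
{
	bool ok;

	assert(ctx != NULL);
	pthread_mutex_lock(&ctx->mutex);
	ok = ctx->buffer && ctx->acomp && ctx->req;
	pthread_mutex_unlock(&ctx->mutex);
	return ok;
}

/* Resources are released only here, never on "CPU offline". */
static void pool_model_destroy(struct pool_model *pool)
{
	if (!pool)
		return;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		ctx_model_dealloc(&pool->ctx[cpu]);
	free(pool);
}

static struct pool_model *pool_model_create(void)
{
	/* calloc() mirrors the zero-initialized per-CPU allocation. */
	struct pool_model *pool = calloc(1, sizeof(*pool));

	if (!pool)
		return NULL;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (ctx_model_prepare(&pool->ctx[cpu])) {
			pool_model_destroy(pool);
			return NULL;
		}
	}
	return pool;
}
```

In this model a CPU going offline does nothing at all, so a user that
still holds a pointer to that CPU's ctx keeps seeing valid resources,
and a later ctx_model_prepare() for the same CPU returns early without
reallocating.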
The per-CPU memory cost of not deleting the acomp_ctx resources upon CPU
offlining, and only deleting them when the pool is destroyed, is as
follows, on x86_64:
IAA with 8 dst buffers for batching: 64.34 KB
Software compressors with 1 dst buffer: 8.28 KB
This cost is only paid when a CPU is offlined, until it is onlined
again.
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
mm/zswap.c | 164 +++++++++++++++++++++--------------------------------
1 file changed, 66 insertions(+), 98 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 038e240c03dd..9480d54264e4 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -241,6 +241,20 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
**********************************/
static void __zswap_pool_empty(struct percpu_ref *ref);
+static void acomp_ctx_dealloc(struct crypto_acomp_ctx *acomp_ctx)
+{
+ if (IS_ERR_OR_NULL(acomp_ctx))
+ return;
+
+ if (!IS_ERR_OR_NULL(acomp_ctx->req))
+ acomp_request_free(acomp_ctx->req);
+
+ if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
+ crypto_free_acomp(acomp_ctx->acomp);
+
+ kfree(acomp_ctx->buffer);
+}
+
static struct zswap_pool *zswap_pool_create(char *compressor)
{
struct zswap_pool *pool;
@@ -262,19 +276,27 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
- pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
+ /* Many things rely on the zero-initialization. */
+ pool->acomp_ctx = alloc_percpu_gfp(*pool->acomp_ctx,
+ GFP_KERNEL | __GFP_ZERO);
if (!pool->acomp_ctx) {
pr_err("percpu alloc failed\n");
goto error;
}
- for_each_possible_cpu(cpu)
- mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
-
+ /*
+ * This is serialized against CPU hotplug operations. Hence, cores
+ * cannot be offlined until this finishes.
+ */
ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
&pool->node);
+
+ /*
+ * cpuhp_state_add_instance() will not clean up on failure since
+ * we don't register a hotunplug callback.
+ */
if (ret)
- goto error;
+ goto cpuhp_add_fail;
/* being the current pool takes 1 ref; this func expects the
* caller to always add the new pool as the current pool
@@ -291,6 +313,10 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
ref_fail:
cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+
+cpuhp_add_fail:
+ for_each_possible_cpu(cpu)
+ acomp_ctx_dealloc(per_cpu_ptr(pool->acomp_ctx, cpu));
error:
if (pool->acomp_ctx)
free_percpu(pool->acomp_ctx);
@@ -321,9 +347,15 @@ static struct zswap_pool *__zswap_pool_create_fallback(void)
static void zswap_pool_destroy(struct zswap_pool *pool)
{
+ int cpu;
+
zswap_pool_debug("destroying", pool);
cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+
+ for_each_possible_cpu(cpu)
+ acomp_ctx_dealloc(per_cpu_ptr(pool->acomp_ctx, cpu));
+
free_percpu(pool->acomp_ctx);
zs_destroy_pool(pool->zs_pool);
@@ -735,39 +767,36 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
- struct crypto_acomp *acomp = NULL;
- struct acomp_req *req = NULL;
- u8 *buffer = NULL;
- int ret;
+ int ret = -ENOMEM;
- buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
- if (!buffer) {
- ret = -ENOMEM;
- goto fail;
+ /*
+ * To handle cases where the CPU goes through online-offline-online
+ * transitions, we return if the acomp_ctx has already been initialized.
+ */
+ if (acomp_ctx->acomp) {
+ WARN_ON_ONCE(IS_ERR(acomp_ctx->acomp));
+ return 0;
}
- acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
- if (IS_ERR(acomp)) {
+ acomp_ctx->buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
+ if (!acomp_ctx->buffer)
+ return ret;
+
+ acomp_ctx->acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
+ if (IS_ERR(acomp_ctx->acomp)) {
pr_err("could not alloc crypto acomp %s : %ld\n",
- pool->tfm_name, PTR_ERR(acomp));
- ret = PTR_ERR(acomp);
+ pool->tfm_name, PTR_ERR(acomp_ctx->acomp));
+ ret = PTR_ERR(acomp_ctx->acomp);
goto fail;
}
- req = acomp_request_alloc(acomp);
- if (!req) {
+ acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
+ if (!acomp_ctx->req) {
pr_err("could not alloc crypto acomp_request %s\n",
pool->tfm_name);
- ret = -ENOMEM;
goto fail;
}
- /*
- * Only hold the mutex after completing allocations, otherwise we may
- * recurse into zswap through reclaim and attempt to hold the mutex
- * again resulting in a deadlock.
- */
- mutex_lock(&acomp_ctx->mutex);
crypto_init_wait(&acomp_ctx->wait);
/*
@@ -775,83 +804,19 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
* crypto_wait_req(); if the backend of acomp is scomp, the callback
* won't be called, crypto_wait_req() will return without blocking.
*/
- acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ acomp_request_set_callback(acomp_ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
crypto_req_done, &acomp_ctx->wait);
- acomp_ctx->buffer = buffer;
- acomp_ctx->acomp = acomp;
- acomp_ctx->req = req;
-
acomp_request_set_unit_size(acomp_ctx->req, PAGE_SIZE);
- mutex_unlock(&acomp_ctx->mutex);
+ mutex_init(&acomp_ctx->mutex);
return 0;
fail:
- if (!IS_ERR_OR_NULL(acomp))
- crypto_free_acomp(acomp);
- kfree(buffer);
+ acomp_ctx_dealloc(acomp_ctx);
return ret;
}
-static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
-{
- struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
- struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
- struct acomp_req *req;
- struct crypto_acomp *acomp;
- u8 *buffer;
-
- if (IS_ERR_OR_NULL(acomp_ctx))
- return 0;
-
- mutex_lock(&acomp_ctx->mutex);
- req = acomp_ctx->req;
- acomp = acomp_ctx->acomp;
- buffer = acomp_ctx->buffer;
- acomp_ctx->req = NULL;
- acomp_ctx->acomp = NULL;
- acomp_ctx->buffer = NULL;
- mutex_unlock(&acomp_ctx->mutex);
-
- /*
- * Do the actual freeing after releasing the mutex to avoid subtle
- * locking dependencies causing deadlocks.
- */
- if (!IS_ERR_OR_NULL(req))
- acomp_request_free(req);
- if (!IS_ERR_OR_NULL(acomp))
- crypto_free_acomp(acomp);
- kfree(buffer);
-
- return 0;
-}
-
-static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
-{
- struct crypto_acomp_ctx *acomp_ctx;
-
- for (;;) {
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
- mutex_lock(&acomp_ctx->mutex);
- if (likely(acomp_ctx->req))
- return acomp_ctx;
- /*
- * It is possible that we were migrated to a different CPU after
- * getting the per-CPU ctx but before the mutex was acquired. If
- * the old CPU got offlined, zswap_cpu_comp_dead() could have
- * already freed ctx->req (among other things) and set it to
- * NULL. Just try again on the new CPU that we ended up on.
- */
- mutex_unlock(&acomp_ctx->mutex);
- }
-}
-
-static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
-{
- mutex_unlock(&acomp_ctx->mutex);
-}
-
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -864,7 +829,9 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
u8 *dst;
bool mapped = false;
- acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+ acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+ mutex_lock(&acomp_ctx->mutex);
+
dst = acomp_ctx->buffer;
sg_init_table(&input, 1);
sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -927,7 +894,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
else if (alloc_ret)
zswap_reject_alloc_fail++;
- acomp_ctx_put_unlock(acomp_ctx);
+ mutex_unlock(&acomp_ctx->mutex);
return comp_ret == 0 && alloc_ret == 0;
}
@@ -939,7 +906,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
int decomp_ret = 0, dlen = PAGE_SIZE;
u8 *src, *obj;
- acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+ acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+ mutex_lock(&acomp_ctx->mutex);
obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length,
acomp_ctx->buffer);
@@ -971,7 +939,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
read_done:
zs_obj_read_end(pool->zs_pool, entry->handle, entry->length, obj);
- acomp_ctx_put_unlock(acomp_ctx);
+ mutex_unlock(&acomp_ctx->mutex);
if (!decomp_ret && dlen == PAGE_SIZE)
return true;
@@ -1797,7 +1765,7 @@ static int zswap_setup(void)
ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
"mm/zswap_pool:prepare",
zswap_cpu_comp_prepare,
- zswap_cpu_comp_dead);
+ NULL);
if (ret)
goto hp_fail;
--
2.27.0