linux-mm.kvack.org archive mirror
* [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups
@ 2023-12-18 11:50 Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 1/6] mm/zswap: change dstmem size to one page Chengming Zhou
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

Hi everyone,

Changes in v3:
- Collect Reviewed-by tag.
- Drop the __zswap_store() refactoring part.
- Link to v2: https://lore.kernel.org/r/20231213-zswap-dstmem-v2-0-daa5d9ae41a7@bytedance.com

Changes in v2:
- Add more changelog and test data about changing dstmem to one page.
- Reorder patches to put the dstmem reuse and the __zswap_load()
  refactoring together; still refactor after the dstmem reuse since we
  don't want to handle __zswap_load() failure due to memory allocation
  failure in zswap_writeback_entry().
- Append a patch to directly use percpu mutex and buffer in load/store
  and refactor out __zswap_store() to simplify zswap_store().
- Link to v1: https://lore.kernel.org/r/20231213-zswap-dstmem-v1-0-896763369d04@bytedance.com

This series is split from [1] to include only the zswap dstmem reuse
optimizations and cleanups; the other part, the rbtree breakdown, will
be deferred and retested after the rbtree is converted to an xarray.

The problem this series optimizes is that zswap_load() and
zswap_writeback_entry() have to allocate a temporary buffer to support
the !zpool_can_sleep_mapped() case. We can avoid that by reusing the
percpu crypto_acomp_ctx->dstmem, which is also used by zswap_store()
and protected by the same percpu crypto_acomp_ctx->mutex.
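
For illustration, with this series the !zpool_can_sleep_mapped()
decompress path changes roughly from allocating a buffer per call:

	tmp = kmalloc(entry->length, GFP_KERNEL);
	...
	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
	memcpy(tmp, src, entry->length);
	src = tmp;
	zpool_unmap_handle(zpool, entry->handle);
	...
	kfree(tmp);

to reusing the percpu buffer under the percpu mutex:

	mutex_lock(acomp_ctx->mutex);
	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
	if (!zpool_can_sleep_mapped(zpool)) {
		memcpy(acomp_ctx->dstmem, src, entry->length);
		src = acomp_ctx->dstmem;
		zpool_unmap_handle(zpool, entry->handle);
	}
	/* ... decompress, then unlock the mutex ... */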

[1] https://lore.kernel.org/all/20231206-zswap-lock-optimize-v1-0-e25b059f9c3a@bytedance.com/

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
Chengming Zhou (6):
      mm/zswap: change dstmem size to one page
      mm/zswap: reuse dstmem when decompress
      mm/zswap: refactor out __zswap_load()
      mm/zswap: cleanup zswap_load()
      mm/zswap: cleanup zswap_writeback_entry()
      mm/zswap: directly use percpu mutex and buffer in load/store

 mm/zswap.c | 209 +++++++++++++++++++++++--------------------------------------
 1 file changed, 77 insertions(+), 132 deletions(-)
---
base-commit: 1f242c1964cf9b8d663a2fd72159b296205a8126
change-id: 20231213-zswap-dstmem-d828f563303d

Best regards,
-- 
Chengming Zhou <zhouchengming@bytedance.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/6] mm/zswap: change dstmem size to one page
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 2/6] mm/zswap: reuse dstmem when decompress Chengming Zhou
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

Change the dstmem size from 2 * PAGE_SIZE to one page, since we need
at most one page when compressing, and "dlen" is also PAGE_SIZE in
acomp_request_set_params(). If the output size were larger than
PAGE_SIZE we wouldn't want to store it in zswap anyway.

So change it to one page and delete the stale comment.

There is no record of why we needed 2 pages; it has been 2 * PAGE_SIZE
since the time zswap was first merged.

According to Yosry and Nhat, one potential reason is that we used to
store a zswap header containing the swap entry in the compressed page
for writeback purposes, but we don't do that anymore.

This patch works well in kernel build testing, even when the input data
doesn't compress at all (i.e. dlen == PAGE_SIZE), which we can see
with the bpftrace tool:

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
@[1]: 2
@[0]: 12011430
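
If more detail is wanted, the same probe could also record the full
size distribution (assuming the same zpool_malloc() arguments), e.g.:

bpftrace -e 'k:zpool_malloc {@sizes = hist(arg1)}'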

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 7ee54a3d8281..976f278aa507 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -707,7 +707,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
 	struct mutex *mutex;
 	u8 *dst;
 
-	dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
 	if (!dst)
 		return -ENOMEM;
 
@@ -1662,8 +1662,7 @@ bool zswap_store(struct folio *folio)
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
 
-	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
-	sg_init_one(&output, dst, PAGE_SIZE * 2);
+	sg_init_one(&output, dst, PAGE_SIZE);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
 	/*
 	 * it maybe looks a little bit silly that we send an asynchronous request,

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 2/6] mm/zswap: reuse dstmem when decompress
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 1/6] mm/zswap: change dstmem size to one page Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 3/6] mm/zswap: refactor out __zswap_load() Chengming Zhou
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

In the !zpool_can_sleep_mapped() case, such as zsmalloc, we first need
to copy the entry->handle memory into a temporary buffer, which is
allocated with kmalloc.

Obviously we can reuse the per-compressor dstmem to avoid allocating
it every time, since it is per-CPU and protected by the percpu mutex.
Note the percpu mutex now has to be held before copying into dstmem,
so mutex_lock() is moved ahead of zpool_map_handle().

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 44 ++++++++++++--------------------------------
 1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 976f278aa507..6b872744e962 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1417,19 +1417,13 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct crypto_acomp_ctx *acomp_ctx;
 	struct zpool *pool = zswap_find_zpool(entry);
 	bool page_was_allocated;
-	u8 *src, *tmp = NULL;
+	u8 *src;
 	unsigned int dlen;
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
 
-	if (!zpool_can_sleep_mapped(pool)) {
-		tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
-		if (!tmp)
-			return -ENOMEM;
-	}
-
 	/* try to allocate swap cache page */
 	mpol = get_task_policy(current);
 	page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
@@ -1465,15 +1459,15 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	/* decompress */
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 	dlen = PAGE_SIZE;
+	mutex_lock(acomp_ctx->mutex);
 
 	src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
 	if (!zpool_can_sleep_mapped(pool)) {
-		memcpy(tmp, src, entry->length);
-		src = tmp;
+		memcpy(acomp_ctx->dstmem, src, entry->length);
+		src = acomp_ctx->dstmem;
 		zpool_unmap_handle(pool, entry->handle);
 	}
 
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
@@ -1482,9 +1476,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	dlen = acomp_ctx->req->dlen;
 	mutex_unlock(acomp_ctx->mutex);
 
-	if (!zpool_can_sleep_mapped(pool))
-		kfree(tmp);
-	else
+	if (zpool_can_sleep_mapped(pool))
 		zpool_unmap_handle(pool, entry->handle);
 
 	BUG_ON(ret);
@@ -1508,9 +1500,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	return ret;
 
 fail:
-	if (!zpool_can_sleep_mapped(pool))
-		kfree(tmp);
-
 	/*
 	 * If we get here because the page is already in swapcache, a
 	 * load may be happening concurrently. It is safe and okay to
@@ -1771,7 +1760,7 @@ bool zswap_load(struct folio *folio)
 	struct zswap_entry *entry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	u8 *src, *dst, *tmp;
+	u8 *src, *dst;
 	struct zpool *zpool;
 	unsigned int dlen;
 	bool ret;
@@ -1796,26 +1785,19 @@ bool zswap_load(struct folio *folio)
 	}
 
 	zpool = zswap_find_zpool(entry);
-	if (!zpool_can_sleep_mapped(zpool)) {
-		tmp = kmalloc(entry->length, GFP_KERNEL);
-		if (!tmp) {
-			ret = false;
-			goto freeentry;
-		}
-	}
 
 	/* decompress */
 	dlen = PAGE_SIZE;
-	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 
+	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
 	if (!zpool_can_sleep_mapped(zpool)) {
-		memcpy(tmp, src, entry->length);
-		src = tmp;
+		memcpy(acomp_ctx->dstmem, src, entry->length);
+		src = acomp_ctx->dstmem;
 		zpool_unmap_handle(zpool, entry->handle);
 	}
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
@@ -1826,15 +1808,13 @@ bool zswap_load(struct folio *folio)
 
 	if (zpool_can_sleep_mapped(zpool))
 		zpool_unmap_handle(zpool, entry->handle);
-	else
-		kfree(tmp);
 
 	ret = true;
 stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);
-freeentry:
+
 	spin_lock(&tree->lock);
 	if (ret && zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 3/6] mm/zswap: refactor out __zswap_load()
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 1/6] mm/zswap: change dstmem size to one page Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 2/6] mm/zswap: reuse dstmem when decompress Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-19 11:59   ` Chris Li
  2023-12-18 11:50 ` [PATCH v3 4/6] mm/zswap: cleanup zswap_load() Chengming Zhou
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

zswap_load() and zswap_writeback_entry() share the same code that
decompresses the data from a zswap_entry into a page, so refactor the
common part out as __zswap_load(entry, page).

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 92 ++++++++++++++++++++++----------------------------------------
 1 file changed, 32 insertions(+), 60 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 6b872744e962..3433bd6b3cef 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1392,6 +1392,35 @@ static int zswap_enabled_param_set(const char *val,
 	return ret;
 }
 
+static void __zswap_load(struct zswap_entry *entry, struct page *page)
+{
+	struct zpool *zpool = zswap_find_zpool(entry);
+	struct scatterlist input, output;
+	struct crypto_acomp_ctx *acomp_ctx;
+	u8 *src;
+
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
+
+	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
+	if (!zpool_can_sleep_mapped(zpool)) {
+		memcpy(acomp_ctx->dstmem, src, entry->length);
+		src = acomp_ctx->dstmem;
+		zpool_unmap_handle(zpool, entry->handle);
+	}
+
+	sg_init_one(&input, src, entry->length);
+	sg_init_table(&output, 1);
+	sg_set_page(&output, page, PAGE_SIZE, 0);
+	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
+	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
+	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
+	mutex_unlock(acomp_ctx->mutex);
+
+	if (zpool_can_sleep_mapped(zpool))
+		zpool_unmap_handle(zpool, entry->handle);
+}
+
 /*********************************
 * writeback code
 **********************************/
@@ -1413,12 +1442,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	swp_entry_t swpentry = entry->swpentry;
 	struct page *page;
 	struct mempolicy *mpol;
-	struct scatterlist input, output;
-	struct crypto_acomp_ctx *acomp_ctx;
-	struct zpool *pool = zswap_find_zpool(entry);
 	bool page_was_allocated;
-	u8 *src;
-	unsigned int dlen;
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
@@ -1456,31 +1480,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	}
 	spin_unlock(&tree->lock);
 
-	/* decompress */
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	dlen = PAGE_SIZE;
-	mutex_lock(acomp_ctx->mutex);
-
-	src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
-	if (!zpool_can_sleep_mapped(pool)) {
-		memcpy(acomp_ctx->dstmem, src, entry->length);
-		src = acomp_ctx->dstmem;
-		zpool_unmap_handle(pool, entry->handle);
-	}
-
-	sg_init_one(&input, src, entry->length);
-	sg_init_table(&output, 1);
-	sg_set_page(&output, page, PAGE_SIZE, 0);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
-	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	dlen = acomp_ctx->req->dlen;
-	mutex_unlock(acomp_ctx->mutex);
-
-	if (zpool_can_sleep_mapped(pool))
-		zpool_unmap_handle(pool, entry->handle);
-
-	BUG_ON(ret);
-	BUG_ON(dlen != PAGE_SIZE);
+	__zswap_load(entry, page);
 
 	/* page is up to date */
 	SetPageUptodate(page);
@@ -1758,11 +1758,7 @@ bool zswap_load(struct folio *folio)
 	struct page *page = &folio->page;
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry;
-	struct scatterlist input, output;
-	struct crypto_acomp_ctx *acomp_ctx;
-	u8 *src, *dst;
-	struct zpool *zpool;
-	unsigned int dlen;
+	u8 *dst;
 	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
@@ -1784,31 +1780,7 @@ bool zswap_load(struct folio *folio)
 		goto stats;
 	}
 
-	zpool = zswap_find_zpool(entry);
-
-	/* decompress */
-	dlen = PAGE_SIZE;
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
-
-	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
-	if (!zpool_can_sleep_mapped(zpool)) {
-		memcpy(acomp_ctx->dstmem, src, entry->length);
-		src = acomp_ctx->dstmem;
-		zpool_unmap_handle(zpool, entry->handle);
-	}
-
-	sg_init_one(&input, src, entry->length);
-	sg_init_table(&output, 1);
-	sg_set_page(&output, page, PAGE_SIZE, 0);
-	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
-	if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
-		WARN_ON(1);
-	mutex_unlock(acomp_ctx->mutex);
-
-	if (zpool_can_sleep_mapped(zpool))
-		zpool_unmap_handle(zpool, entry->handle);
-
+	__zswap_load(entry, page);
 	ret = true;
 stats:
 	count_vm_event(ZSWPIN);

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 4/6] mm/zswap: cleanup zswap_load()
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
                   ` (2 preceding siblings ...)
  2023-12-18 11:50 ` [PATCH v3 3/6] mm/zswap: refactor out __zswap_load() Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-19 12:47   ` Chris Li
  2023-12-18 11:50 ` [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry() Chengming Zhou
  2023-12-18 11:50 ` [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Chengming Zhou
  5 siblings, 1 reply; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

After the common decompress part has moved into __zswap_load(), we can
clean up zswap_load() a little.

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 3433bd6b3cef..86886276cb81 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio)
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry;
 	u8 *dst;
-	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
@@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio)
 		dst = kmap_local_page(page);
 		zswap_fill_page(dst, entry->value);
 		kunmap_local(dst);
-		ret = true;
-		goto stats;
+	} else {
+		__zswap_load(entry, page);
 	}
 
-	__zswap_load(entry, page);
-	ret = true;
-stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);
 
 	spin_lock(&tree->lock);
-	if (ret && zswap_exclusive_loads_enabled) {
+	if (zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);
 		folio_mark_dirty(folio);
 	} else if (entry->length) {
@@ -1798,7 +1794,7 @@ bool zswap_load(struct folio *folio)
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);
 
-	return ret;
+	return true;
 }
 
 void zswap_invalidate(int type, pgoff_t offset)

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry()
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
                   ` (3 preceding siblings ...)
  2023-12-18 11:50 ` [PATCH v3 4/6] mm/zswap: cleanup zswap_load() Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-19 12:50   ` Chris Li
  2023-12-18 11:50 ` [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Chengming Zhou
  5 siblings, 1 reply; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

Also, after the common decompress part has moved into __zswap_load(),
we can clean up zswap_writeback_entry() a little.

Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 86886276cb81..2c349fd88904 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1443,7 +1443,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct page *page;
 	struct mempolicy *mpol;
 	bool page_was_allocated;
-	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
@@ -1453,15 +1452,18 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
 				NO_INTERLEAVE_INDEX, &page_was_allocated, true);
 	if (!page) {
-		ret = -ENOMEM;
-		goto fail;
+		/*
+		 * If we get here because the page is already in swapcache, a
+		 * load may be happening concurrently. It is safe and okay to
+		 * not free the entry. It is also okay to return !0.
+		 */
+		return -ENOMEM;
 	}
 
 	/* Found an existing page, we raced with load/swapin */
 	if (!page_was_allocated) {
 		put_page(page);
-		ret = -EEXIST;
-		goto fail;
+		return -EEXIST;
 	}
 
 	/*
@@ -1475,8 +1477,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
 		spin_unlock(&tree->lock);
 		delete_from_swap_cache(page_folio(page));
-		ret = -ENOMEM;
-		goto fail;
+		return -ENOMEM;
 	}
 	spin_unlock(&tree->lock);
 
@@ -1497,15 +1498,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	__swap_writepage(page, &wbc);
 	put_page(page);
 
-	return ret;
-
-fail:
-	/*
-	 * If we get here because the page is already in swapcache, a
-	 * load may be happening concurrently. It is safe and okay to
-	 * not free the entry. It is also okay to return !0.
-	 */
-	return ret;
+	return 0;
 }
 
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
                   ` (4 preceding siblings ...)
  2023-12-18 11:50 ` [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry() Chengming Zhou
@ 2023-12-18 11:50 ` Chengming Zhou
  2023-12-19 13:29   ` Chris Li
  5 siblings, 1 reply; 22+ messages in thread
From: Chengming Zhou @ 2023-12-18 11:50 UTC (permalink / raw)
  To: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Chris Li, Andrew Morton, Nhat Pham
  Cc: Chris Li, Yosry Ahmed, linux-kernel, Chengming Zhou, linux-mm, Nhat Pham

Since we started reusing the dstmem in the load path, it is confusing
that acomp_ctx->dstmem and acomp_ctx->mutex are now used for purposes
other than what their names suggest.

Yosry suggested removing these two fields from acomp_ctx, directly
using zswap_dstmem and zswap_mutex in both the load and store paths,
renaming them, and adding proper comments above their definitions
noting that they are generic percpu buffers for the load and store
paths.

So this patch removes dstmem and mutex from acomp_ctx, and renames
zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
both the load and store paths.

Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 69 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 37 insertions(+), 32 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 2c349fd88904..71bdcd552e5b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -166,8 +166,6 @@ struct crypto_acomp_ctx {
 	struct crypto_acomp *acomp;
 	struct acomp_req *req;
 	struct crypto_wait wait;
-	u8 *dstmem;
-	struct mutex *mutex;
 };
 
 /*
@@ -694,7 +692,7 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
 /*********************************
 * per-cpu code
 **********************************/
-static DEFINE_PER_CPU(u8 *, zswap_dstmem);
+static DEFINE_PER_CPU(u8 *, zswap_buffer);
 /*
  * If users dynamically change the zpool type and compressor at runtime, i.e.
  * zswap is running, zswap can have more than one zpool on one cpu, but they
@@ -702,39 +700,39 @@ static DEFINE_PER_CPU(u8 *, zswap_dstmem);
  */
 static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
 
-static int zswap_dstmem_prepare(unsigned int cpu)
+static int zswap_buffer_prepare(unsigned int cpu)
 {
 	struct mutex *mutex;
-	u8 *dst;
+	u8 *buf;
 
-	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
-	if (!dst)
+	buf = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
+	if (!buf)
 		return -ENOMEM;
 
 	mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
 	if (!mutex) {
-		kfree(dst);
+		kfree(buf);
 		return -ENOMEM;
 	}
 
 	mutex_init(mutex);
-	per_cpu(zswap_dstmem, cpu) = dst;
+	per_cpu(zswap_buffer, cpu) = buf;
 	per_cpu(zswap_mutex, cpu) = mutex;
 	return 0;
 }
 
-static int zswap_dstmem_dead(unsigned int cpu)
+static int zswap_buffer_dead(unsigned int cpu)
 {
 	struct mutex *mutex;
-	u8 *dst;
+	u8 *buf;
 
 	mutex = per_cpu(zswap_mutex, cpu);
 	kfree(mutex);
 	per_cpu(zswap_mutex, cpu) = NULL;
 
-	dst = per_cpu(zswap_dstmem, cpu);
-	kfree(dst);
-	per_cpu(zswap_dstmem, cpu) = NULL;
+	buf = per_cpu(zswap_buffer, cpu);
+	kfree(buf);
+	per_cpu(zswap_buffer, cpu) = NULL;
 
 	return 0;
 }
@@ -772,9 +770,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
 				   crypto_req_done, &acomp_ctx->wait);
 
-	acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
-	acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
-
 	return 0;
 }
 
@@ -1397,15 +1392,21 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
 	struct zpool *zpool = zswap_find_zpool(entry);
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	u8 *src;
+	u8 *src, *buf;
+	int cpu;
+	struct mutex *mutex;
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
+	cpu = raw_smp_processor_id();
+	mutex = per_cpu(zswap_mutex, cpu);
+	mutex_lock(mutex);
+
+	acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
 
 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
 	if (!zpool_can_sleep_mapped(zpool)) {
-		memcpy(acomp_ctx->dstmem, src, entry->length);
-		src = acomp_ctx->dstmem;
+		buf = per_cpu(zswap_buffer, cpu);
+		memcpy(buf, src, entry->length);
+		src = buf;
 		zpool_unmap_handle(zpool, entry->handle);
 	}
 
@@ -1415,7 +1416,7 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
 	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
-	mutex_unlock(acomp_ctx->mutex);
+	mutex_unlock(mutex);
 
 	if (zpool_can_sleep_mapped(zpool))
 		zpool_unmap_handle(zpool, entry->handle);
@@ -1551,6 +1552,8 @@ bool zswap_store(struct folio *folio)
 	u8 *src, *dst;
 	gfp_t gfp;
 	int ret;
+	int cpu;
+	struct mutex *mutex;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1636,11 +1639,13 @@ bool zswap_store(struct folio *folio)
 	}
 
 	/* compress */
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	cpu = raw_smp_processor_id();
+	mutex = per_cpu(zswap_mutex, cpu);
+	mutex_lock(mutex);
 
-	mutex_lock(acomp_ctx->mutex);
+	acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
+	dst = per_cpu(zswap_buffer, cpu);
 
-	dst = acomp_ctx->dstmem;
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
 
@@ -1683,7 +1688,7 @@ bool zswap_store(struct folio *folio)
 	buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
 	memcpy(buf, dst, dlen);
 	zpool_unmap_handle(zpool, handle);
-	mutex_unlock(acomp_ctx->mutex);
+	mutex_unlock(mutex);
 
 	/* populate entry */
 	entry->swpentry = swp_entry(type, offset);
@@ -1726,7 +1731,7 @@ bool zswap_store(struct folio *folio)
 	return true;
 
 put_dstmem:
-	mutex_unlock(acomp_ctx->mutex);
+	mutex_unlock(mutex);
 put_pool:
 	zswap_pool_put(entry->pool);
 freepage:
@@ -1902,10 +1907,10 @@ static int zswap_setup(void)
 	}
 
 	ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
-				zswap_dstmem_prepare, zswap_dstmem_dead);
+				zswap_buffer_prepare, zswap_buffer_dead);
 	if (ret) {
-		pr_err("dstmem alloc failed\n");
-		goto dstmem_fail;
+		pr_err("buffer alloc failed\n");
+		goto buffer_fail;
 	}
 
 	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
@@ -1940,7 +1945,7 @@ static int zswap_setup(void)
 		zswap_pool_destroy(pool);
 hp_fail:
 	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
-dstmem_fail:
+buffer_fail:
 	kmem_cache_destroy(zswap_entry_cache);
 cache_fail:
 	/* if built-in, we aren't unloaded on failure; don't allow use */

-- 
b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/6] mm/zswap: refactor out __zswap_load()
  2023-12-18 11:50 ` [PATCH v3 3/6] mm/zswap: refactor out __zswap_load() Chengming Zhou
@ 2023-12-19 11:59   ` Chris Li
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Li @ 2023-12-19 11:59 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, Nhat Pham, linux-kernel,
	linux-mm

Acked-by: Chris Li <chrisl@kernel.org> (Google)

Chris

On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> zswap_load() and zswap_writeback_entry() share the same code that
> decompresses the data from a zswap_entry into a page, so refactor the
> common part out as __zswap_load(entry, page).
>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 92 ++++++++++++++++++++++----------------------------------------
>  1 file changed, 32 insertions(+), 60 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 6b872744e962..3433bd6b3cef 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1392,6 +1392,35 @@ static int zswap_enabled_param_set(const char *val,
>         return ret;
>  }
>
> +static void __zswap_load(struct zswap_entry *entry, struct page *page)
> +{
> +       struct zpool *zpool = zswap_find_zpool(entry);
> +       struct scatterlist input, output;
> +       struct crypto_acomp_ctx *acomp_ctx;
> +       u8 *src;
> +
> +       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +       mutex_lock(acomp_ctx->mutex);
> +
> +       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> +       if (!zpool_can_sleep_mapped(zpool)) {
> +               memcpy(acomp_ctx->dstmem, src, entry->length);
> +               src = acomp_ctx->dstmem;
> +               zpool_unmap_handle(zpool, entry->handle);
> +       }
> +
> +       sg_init_one(&input, src, entry->length);
> +       sg_init_table(&output, 1);
> +       sg_set_page(&output, page, PAGE_SIZE, 0);
> +       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
> +       BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
> +       BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> +       mutex_unlock(acomp_ctx->mutex);
> +
> +       if (zpool_can_sleep_mapped(zpool))
> +               zpool_unmap_handle(zpool, entry->handle);
> +}
> +
>  /*********************************
>  * writeback code
>  **********************************/
> @@ -1413,12 +1442,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         swp_entry_t swpentry = entry->swpentry;
>         struct page *page;
>         struct mempolicy *mpol;
> -       struct scatterlist input, output;
> -       struct crypto_acomp_ctx *acomp_ctx;
> -       struct zpool *pool = zswap_find_zpool(entry);
>         bool page_was_allocated;
> -       u8 *src;
> -       unsigned int dlen;
>         int ret;
>         struct writeback_control wbc = {
>                 .sync_mode = WB_SYNC_NONE,
> @@ -1456,31 +1480,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         }
>         spin_unlock(&tree->lock);
>
> -       /* decompress */
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       dlen = PAGE_SIZE;
> -       mutex_lock(acomp_ctx->mutex);
> -
> -       src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO);
> -       if (!zpool_can_sleep_mapped(pool)) {
> -               memcpy(acomp_ctx->dstmem, src, entry->length);
> -               src = acomp_ctx->dstmem;
> -               zpool_unmap_handle(pool, entry->handle);
> -       }
> -
> -       sg_init_one(&input, src, entry->length);
> -       sg_init_table(&output, 1);
> -       sg_set_page(&output, page, PAGE_SIZE, 0);
> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> -       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> -       dlen = acomp_ctx->req->dlen;
> -       mutex_unlock(acomp_ctx->mutex);
> -
> -       if (zpool_can_sleep_mapped(pool))
> -               zpool_unmap_handle(pool, entry->handle);
> -
> -       BUG_ON(ret);
> -       BUG_ON(dlen != PAGE_SIZE);
> +       __zswap_load(entry, page);
>
>         /* page is up to date */
>         SetPageUptodate(page);
> @@ -1758,11 +1758,7 @@ bool zswap_load(struct folio *folio)
>         struct page *page = &folio->page;
>         struct zswap_tree *tree = zswap_trees[type];
>         struct zswap_entry *entry;
> -       struct scatterlist input, output;
> -       struct crypto_acomp_ctx *acomp_ctx;
> -       u8 *src, *dst;
> -       struct zpool *zpool;
> -       unsigned int dlen;
> +       u8 *dst;
>         bool ret;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> @@ -1784,31 +1780,7 @@ bool zswap_load(struct folio *folio)
>                 goto stats;
>         }
>
> -       zpool = zswap_find_zpool(entry);
> -
> -       /* decompress */
> -       dlen = PAGE_SIZE;
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       mutex_lock(acomp_ctx->mutex);
> -
> -       src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> -       if (!zpool_can_sleep_mapped(zpool)) {
> -               memcpy(acomp_ctx->dstmem, src, entry->length);
> -               src = acomp_ctx->dstmem;
> -               zpool_unmap_handle(zpool, entry->handle);
> -       }
> -
> -       sg_init_one(&input, src, entry->length);
> -       sg_init_table(&output, 1);
> -       sg_set_page(&output, page, PAGE_SIZE, 0);
> -       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> -       if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
> -               WARN_ON(1);
> -       mutex_unlock(acomp_ctx->mutex);
> -
> -       if (zpool_can_sleep_mapped(zpool))
> -               zpool_unmap_handle(zpool, entry->handle);
> -
> +       __zswap_load(entry, page);
>         ret = true;
>  stats:
>         count_vm_event(ZSWPIN);
>
> --
> b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/6] mm/zswap: cleanup zswap_load()
  2023-12-18 11:50 ` [PATCH v3 4/6] mm/zswap: cleanup zswap_load() Chengming Zhou
@ 2023-12-19 12:47   ` Chris Li
  2023-12-19 14:07     ` Chengming Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Li @ 2023-12-19 12:47 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, Nhat Pham, linux-kernel,
	linux-mm

On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> After the common decompress part has moved into __zswap_load(), we can
> clean up zswap_load() a little.
>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 3433bd6b3cef..86886276cb81 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio)
>         struct zswap_tree *tree = zswap_trees[type];
>         struct zswap_entry *entry;
>         u8 *dst;
> -       bool ret;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>
> @@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio)
>                 dst = kmap_local_page(page);
>                 zswap_fill_page(dst, entry->value);
>                 kunmap_local(dst);
> -               ret = true;
> -               goto stats;
> +       } else {
> +               __zswap_load(entry, page);

Very minor nitpick. With this change you only take out "ret"; you
don't have to remove the goto. Personally I prefer the version with
the goto, because if (!entry->length) is a rare case, keeping the
normal execution flow unindented is the more streamlined form.

If you keep the else statement without the goto, you can move
__zswap_load(entry, page) into the if branch so that the most common
case goes through the if statement rather than the else.
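
Something like this, just sketching the control flow (untested):

	if (entry->length) {
		__zswap_load(entry, page);
	} else {
		dst = kmap_local_page(page);
		zswap_fill_page(dst, entry->value);
		kunmap_local(dst);
	}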

I also think this commit could be folded into the previous one. As I
said, this is a minor comment; it is your call.

Acked-by: Chris Li <chrisl@kernel.org> (Google)

Chris


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry()
  2023-12-18 11:50 ` [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry() Chengming Zhou
@ 2023-12-19 12:50   ` Chris Li
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Li @ 2023-12-19 12:50 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, Nhat Pham, linux-kernel,
	linux-mm

Acked-by: Chris Li <chrisl@kernel.org> (Google)

I also think this one can be folded into patch 3; too trivial to be a separate patch.
Your call.

Chris

On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> Also, after the common decompress part has moved into __zswap_load(),
> we can clean up zswap_writeback_entry() a little.
>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 25 +++++++++----------------
>  1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 86886276cb81..2c349fd88904 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1443,7 +1443,6 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         struct page *page;
>         struct mempolicy *mpol;
>         bool page_was_allocated;
> -       int ret;
>         struct writeback_control wbc = {
>                 .sync_mode = WB_SYNC_NONE,
>         };
> @@ -1453,15 +1452,18 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
>                                 NO_INTERLEAVE_INDEX, &page_was_allocated, true);
>         if (!page) {
> -               ret = -ENOMEM;
> -               goto fail;
> +               /*
> +                * If we get here because the page is already in swapcache, a
> +                * load may be happening concurrently. It is safe and okay to
> +                * not free the entry. It is also okay to return !0.
> +                */
> +               return -ENOMEM;
>         }
>
>         /* Found an existing page, we raced with load/swapin */
>         if (!page_was_allocated) {
>                 put_page(page);
> -               ret = -EEXIST;
> -               goto fail;
> +               return -EEXIST;
>         }
>
>         /*
> @@ -1475,8 +1477,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
>                 spin_unlock(&tree->lock);
>                 delete_from_swap_cache(page_folio(page));
> -               ret = -ENOMEM;
> -               goto fail;
> +               return -ENOMEM;
>         }
>         spin_unlock(&tree->lock);
>
> @@ -1497,15 +1498,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         __swap_writepage(page, &wbc);
>         put_page(page);
>
> -       return ret;
> -
> -fail:
> -       /*
> -        * If we get here because the page is already in swapcache, a
> -        * load may be happening concurrently. It is safe and okay to
> -        * not free the entry. It is also okay to return !0.
> -        */
> -       return ret;
> +       return 0;
>  }
>
>  static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
>
> --
> b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-18 11:50 ` [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Chengming Zhou
@ 2023-12-19 13:29   ` Chris Li
  2023-12-19 18:43     ` Nhat Pham
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Li @ 2023-12-19 13:29 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, Nhat Pham, linux-kernel,
	linux-mm

Hi Chengming and Yosry,

On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> Since we started reusing the dstmem in the load path, it is confusing
> that acomp_ctx->dstmem and acomp_ctx->mutex are now used for purposes
> other than what their names suggest.
>
> Yosry suggested removing these two fields from acomp_ctx, directly
> using zswap_dstmem and zswap_mutex in both the load and store paths,
> renaming them, and adding proper comments above their definitions
> noting that they are generic percpu buffers for the load and store
> paths.
>
> So this patch removes dstmem and mutex from acomp_ctx, and renames
> zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
> both the load and store paths.

Sorry for joining this discussion late.

I get the rename of "dstmem" to "buffer", because the buffer is used
for both load and store. What I don't get is why we move it out of
the acomp_ctx struct. Now we have 3 per-cpu entries: buffer, mutex
and acomp_ctx. I think we should do the reverse and fold these three
per-cpu entries into one struct, the acomp_ctx. Each per-cpu lookup
involves a dance around the cpu id, disabling preemption etc., while
loading a struct member is just a plain memory load. It seems to me
it would be more optimal to combine these three per-cpu entries into
acomp_ctx and do the per-cpu lookup for acomp_ctx only once.
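
Roughly what I have in mind (untested sketch):

	struct crypto_acomp_ctx {
		struct crypto_acomp *acomp;
		struct acomp_req *req;
		struct crypto_wait wait;
		u8 *buffer;
		struct mutex mutex;
	};

	/* one per-cpu lookup in load/store, everything else via the ctx */
	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);
	...
	memcpy(acomp_ctx->buffer, src, entry->length);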

>
> Suggested-by: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/zswap.c | 69 +++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 37 insertions(+), 32 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2c349fd88904..71bdcd552e5b 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -166,8 +166,6 @@ struct crypto_acomp_ctx {
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
>         struct crypto_wait wait;
> -       u8 *dstmem;
> -       struct mutex *mutex;
>  };
>
>  /*
> @@ -694,7 +692,7 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
>  /*********************************
>  * per-cpu code
>  **********************************/
> -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> +static DEFINE_PER_CPU(u8 *, zswap_buffer);
>  /*
>   * If users dynamically change the zpool type and compressor at runtime, i.e.
>   * zswap is running, zswap can have more than one zpool on one cpu, but they
> @@ -702,39 +700,39 @@ static DEFINE_PER_CPU(u8 *, zswap_dstmem);
>   */
>  static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
>
> -static int zswap_dstmem_prepare(unsigned int cpu)
> +static int zswap_buffer_prepare(unsigned int cpu)
>  {
>         struct mutex *mutex;
> -       u8 *dst;
> +       u8 *buf;
>
> -       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> -       if (!dst)
> +       buf = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> +       if (!buf)
>                 return -ENOMEM;
>
>         mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
>         if (!mutex) {
> -               kfree(dst);
> +               kfree(buf);
>                 return -ENOMEM;
>         }
>
>         mutex_init(mutex);
> -       per_cpu(zswap_dstmem, cpu) = dst;
> +       per_cpu(zswap_buffer, cpu) = buf;
>         per_cpu(zswap_mutex, cpu) = mutex;
>         return 0;
>  }
>
> -static int zswap_dstmem_dead(unsigned int cpu)
> +static int zswap_buffer_dead(unsigned int cpu)
>  {
>         struct mutex *mutex;
> -       u8 *dst;
> +       u8 *buf;
>
>         mutex = per_cpu(zswap_mutex, cpu);
>         kfree(mutex);
>         per_cpu(zswap_mutex, cpu) = NULL;
>
> -       dst = per_cpu(zswap_dstmem, cpu);
> -       kfree(dst);
> -       per_cpu(zswap_dstmem, cpu) = NULL;
> +       buf = per_cpu(zswap_buffer, cpu);
> +       kfree(buf);
> +       per_cpu(zswap_buffer, cpu) = NULL;
>
>         return 0;
>  }
> @@ -772,9 +770,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>                                    crypto_req_done, &acomp_ctx->wait);
>
> -       acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> -       acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> -
>         return 0;
>  }
>
> @@ -1397,15 +1392,21 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
>         struct zpool *zpool = zswap_find_zpool(entry);
>         struct scatterlist input, output;
>         struct crypto_acomp_ctx *acomp_ctx;
> -       u8 *src;
> +       u8 *src, *buf;
> +       int cpu;
> +       struct mutex *mutex;
>
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> -       mutex_lock(acomp_ctx->mutex);
> +       cpu = raw_smp_processor_id();
> +       mutex = per_cpu(zswap_mutex, cpu);
First per-cpu call.
> +       mutex_lock(mutex);
> +
> +       acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
Second per-cpu call.
>
>         src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
>         if (!zpool_can_sleep_mapped(zpool)) {
> -               memcpy(acomp_ctx->dstmem, src, entry->length);
> -               src = acomp_ctx->dstmem;
> +               buf = per_cpu(zswap_buffer, cpu);

Here is the function call that does the third per-cpu lookup. I think
doing one per-cpu lookup and loading the rest from the context is more
optimal.
Conceptually it is cleaner as well: it is clear what the mutex is
supposed to protect, namely the compression context struct. With the
mutex moved out as a separate per-cpu variable, its protection scope
is less clear.

What am I missing?

Chris


> +               memcpy(buf, src, entry->length);
> +               src = buf;
>                 zpool_unmap_handle(zpool, entry->handle);
>         }
>
> @@ -1415,7 +1416,7 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
>         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> -       mutex_unlock(acomp_ctx->mutex);
> +       mutex_unlock(mutex);
>
>         if (zpool_can_sleep_mapped(zpool))
>                 zpool_unmap_handle(zpool, entry->handle);
> @@ -1551,6 +1552,8 @@ bool zswap_store(struct folio *folio)
>         u8 *src, *dst;
>         gfp_t gfp;
>         int ret;
> +       int cpu;
> +       struct mutex *mutex;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1636,11 +1639,13 @@ bool zswap_store(struct folio *folio)
>         }
>
>         /* compress */
> -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> +       cpu = raw_smp_processor_id();
> +       mutex = per_cpu(zswap_mutex, cpu);
> +       mutex_lock(mutex);
>
> -       mutex_lock(acomp_ctx->mutex);
> +       acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
> +       dst = per_cpu(zswap_buffer, cpu);
>
> -       dst = acomp_ctx->dstmem;
>         sg_init_table(&input, 1);
>         sg_set_page(&input, page, PAGE_SIZE, 0);
>
> @@ -1683,7 +1688,7 @@ bool zswap_store(struct folio *folio)
>         buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
>         memcpy(buf, dst, dlen);
>         zpool_unmap_handle(zpool, handle);
> -       mutex_unlock(acomp_ctx->mutex);
> +       mutex_unlock(mutex);
>
>         /* populate entry */
>         entry->swpentry = swp_entry(type, offset);
> @@ -1726,7 +1731,7 @@ bool zswap_store(struct folio *folio)
>         return true;
>
>  put_dstmem:
> -       mutex_unlock(acomp_ctx->mutex);
> +       mutex_unlock(mutex);
>  put_pool:
>         zswap_pool_put(entry->pool);
>  freepage:
> @@ -1902,10 +1907,10 @@ static int zswap_setup(void)
>         }
>
>         ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> -                               zswap_dstmem_prepare, zswap_dstmem_dead);
> +                               zswap_buffer_prepare, zswap_buffer_dead);
>         if (ret) {
> -               pr_err("dstmem alloc failed\n");
> -               goto dstmem_fail;
> +               pr_err("buffer alloc failed\n");
> +               goto buffer_fail;
>         }
>
>         ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
> @@ -1940,7 +1945,7 @@ static int zswap_setup(void)
>                 zswap_pool_destroy(pool);
>  hp_fail:
>         cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> -dstmem_fail:
> +buffer_fail:
>         kmem_cache_destroy(zswap_entry_cache);
>  cache_fail:
>         /* if built-in, we aren't unloaded on failure; don't allow use */
>
> --
> b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 4/6] mm/zswap: cleanup zswap_load()
  2023-12-19 12:47   ` Chris Li
@ 2023-12-19 14:07     ` Chengming Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-19 14:07 UTC (permalink / raw)
  To: Chris Li
  Cc: Seth Jennings, Yosry Ahmed, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, Nhat Pham, linux-kernel,
	linux-mm

On 2023/12/19 20:47, Chris Li wrote:
> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> After the common decompress part has moved into __zswap_load(), we can
>> clean up zswap_load() a little.
>>
>> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/zswap.c | 12 ++++--------
>>  1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index 3433bd6b3cef..86886276cb81 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1759,7 +1759,6 @@ bool zswap_load(struct folio *folio)
>>         struct zswap_tree *tree = zswap_trees[type];
>>         struct zswap_entry *entry;
>>         u8 *dst;
>> -       bool ret;
>>
>>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>>
>> @@ -1776,19 +1775,16 @@ bool zswap_load(struct folio *folio)
>>                 dst = kmap_local_page(page);
>>                 zswap_fill_page(dst, entry->value);
>>                 kunmap_local(dst);
>> -               ret = true;
>> -               goto stats;
>> +       } else {
>> +               __zswap_load(entry, page);
> 
> Very minor nitpick. With this change you only take out "ret"; you
> don't have to remove the goto. Personally I prefer the version with
> the goto, because if (!entry->length) is a rare case, keeping the
> normal execution flow unindented is the more streamlined form.
> 
> If you keep the else statement without the goto, you can move
> __zswap_load(entry, page) into the if branch so that the most common
> case goes through the if statement rather than the else.

Makes sense. I will change it if a new version is needed.

> 
> I also think this commit could be folded into the previous one. As I
> said, this is a minor comment; it is your call.
> 
> Acked-by: Chris Li <chrisl@kernel.org> (Google)

Thanks for the review!


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 13:29   ` Chris Li
@ 2023-12-19 18:43     ` Nhat Pham
  2023-12-19 21:39       ` Yosry Ahmed
  0 siblings, 1 reply; 22+ messages in thread
From: Nhat Pham @ 2023-12-19 18:43 UTC (permalink / raw)
  To: Chris Li
  Cc: Chengming Zhou, Seth Jennings, Yosry Ahmed, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Chengming and Yosry,
>
> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
> >
> > Since we started reusing the dstmem in the load path, it is confusing
> > that acomp_ctx->dstmem and acomp_ctx->mutex are now used for purposes
> > other than what their names suggest.
> >
> > Yosry suggested removing these two fields from acomp_ctx, directly
> > using zswap_dstmem and zswap_mutex in both the load and store paths,
> > renaming them, and adding proper comments above their definitions
> > noting that they are generic percpu buffers for the load and store
> > paths.
> >
> > So this patch removes dstmem and mutex from acomp_ctx, and renames
> > zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
> > both the load and store paths.
>
> Sorry for joining this discussion late.
>
> I get the rename of "dstmem" to "buffer", because the buffer is used
> for both load and store. What I don't get is why we move it out of
> the acomp_ctx struct. Now we have 3 per-cpu entries: buffer, mutex
> and acomp_ctx. I think we should do the reverse and fold these three
> per-cpu entries into one struct, the acomp_ctx. Each per-cpu lookup
> involves a dance around the cpu id, disabling preemption etc., while
> loading a struct member is just a plain memory load. It seems to me
> it would be more optimal to combine these three per-cpu entries into
> acomp_ctx and do the per-cpu lookup for acomp_ctx only once.

I agree with Chris. From a practicality POV, what Chris says here
makes sense. From a semantic POV, this buffer is only used in
(de)compression contexts - be it store, load, or writeback - so its
belonging to the original struct still makes sense to me. Why separate
it out without any benefit? Just rename the old field to buffer or
zswap_buffer and call it a day? It will be a smaller patch too!

>
> >
> > Suggested-by: Yosry Ahmed <yosryahmed@google.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > ---
> >  mm/zswap.c | 69 +++++++++++++++++++++++++++++++++-----------------------------
> >  1 file changed, 37 insertions(+), 32 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 2c349fd88904..71bdcd552e5b 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -166,8 +166,6 @@ struct crypto_acomp_ctx {
> >         struct crypto_acomp *acomp;
> >         struct acomp_req *req;
> >         struct crypto_wait wait;
> > -       u8 *dstmem;
> > -       struct mutex *mutex;
> >  };
> >
> >  /*
> > @@ -694,7 +692,7 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
> >  /*********************************
> >  * per-cpu code
> >  **********************************/
> > -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> > +static DEFINE_PER_CPU(u8 *, zswap_buffer);
> >  /*
> >   * If users dynamically change the zpool type and compressor at runtime, i.e.
> >   * zswap is running, zswap can have more than one zpool on one cpu, but they
> > @@ -702,39 +700,39 @@ static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> >   */
> >  static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> >
> > -static int zswap_dstmem_prepare(unsigned int cpu)
> > +static int zswap_buffer_prepare(unsigned int cpu)
> >  {
> >         struct mutex *mutex;
> > -       u8 *dst;
> > +       u8 *buf;
> >
> > -       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> > -       if (!dst)
> > +       buf = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> > +       if (!buf)
> >                 return -ENOMEM;
> >
> >         mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
> >         if (!mutex) {
> > -               kfree(dst);
> > +               kfree(buf);
> >                 return -ENOMEM;
> >         }
> >
> >         mutex_init(mutex);
> > -       per_cpu(zswap_dstmem, cpu) = dst;
> > +       per_cpu(zswap_buffer, cpu) = buf;
> >         per_cpu(zswap_mutex, cpu) = mutex;
> >         return 0;
> >  }
> >
> > -static int zswap_dstmem_dead(unsigned int cpu)
> > +static int zswap_buffer_dead(unsigned int cpu)
> >  {
> >         struct mutex *mutex;
> > -       u8 *dst;
> > +       u8 *buf;
> >
> >         mutex = per_cpu(zswap_mutex, cpu);
> >         kfree(mutex);
> >         per_cpu(zswap_mutex, cpu) = NULL;
> >
> > -       dst = per_cpu(zswap_dstmem, cpu);
> > -       kfree(dst);
> > -       per_cpu(zswap_dstmem, cpu) = NULL;
> > +       buf = per_cpu(zswap_buffer, cpu);
> > +       kfree(buf);
> > +       per_cpu(zswap_buffer, cpu) = NULL;
> >
> >         return 0;
> >  }
> > @@ -772,9 +770,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
> >         acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> >                                    crypto_req_done, &acomp_ctx->wait);
> >
> > -       acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> > -       acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> > -
> >         return 0;
> >  }
> >
> > @@ -1397,15 +1392,21 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
> >         struct zpool *zpool = zswap_find_zpool(entry);
> >         struct scatterlist input, output;
> >         struct crypto_acomp_ctx *acomp_ctx;
> > -       u8 *src;
> > +       u8 *src, *buf;
> > +       int cpu;
> > +       struct mutex *mutex;
> >
> > -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > -       mutex_lock(acomp_ctx->mutex);
> > +       cpu = raw_smp_processor_id();
> > +       mutex = per_cpu(zswap_mutex, cpu);
> First per cpu call.
> > +       mutex_lock(mutex);
> > +
> > +       acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
> Second per cpu
> >
> >         src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> >         if (!zpool_can_sleep_mapped(zpool)) {
> > -               memcpy(acomp_ctx->dstmem, src, entry->length);
> > -               src = acomp_ctx->dstmem;
> > +               buf = per_cpu(zswap_buffer, cpu);
>
> Here is the function that does the third per_cpu call. I think doing
> one per_cpu and the rest of them load from the context is more
> optimal.
> Conceptually it is cleaner as well. It is clear what this mutex is
> supposed to protect. It is protecting the compression context struct.
> Move it out as per cpu, it is less clear what is the protection scope
> of the mutex.
>
> What am I missing?
>
> Chris
>
>
> > +               memcpy(buf, src, entry->length);
> > +               src = buf;
> >                 zpool_unmap_handle(zpool, entry->handle);
> >         }
> >
> > @@ -1415,7 +1416,7 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
> >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
> >         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
> >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > -       mutex_unlock(acomp_ctx->mutex);
> > +       mutex_unlock(mutex);
> >
> >         if (zpool_can_sleep_mapped(zpool))
> >                 zpool_unmap_handle(zpool, entry->handle);
> > @@ -1551,6 +1552,8 @@ bool zswap_store(struct folio *folio)
> >         u8 *src, *dst;
> >         gfp_t gfp;
> >         int ret;
> > +       int cpu;
> > +       struct mutex *mutex;
> >
> >         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> >         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > @@ -1636,11 +1639,13 @@ bool zswap_store(struct folio *folio)
> >         }
> >
> >         /* compress */
> > -       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > +       cpu = raw_smp_processor_id();
> > +       mutex = per_cpu(zswap_mutex, cpu);
> > +       mutex_lock(mutex);
> >
> > -       mutex_lock(acomp_ctx->mutex);
> > +       acomp_ctx = per_cpu_ptr(entry->pool->acomp_ctx, cpu);
> > +       dst = per_cpu(zswap_buffer, cpu);
> >
> > -       dst = acomp_ctx->dstmem;
> >         sg_init_table(&input, 1);
> >         sg_set_page(&input, page, PAGE_SIZE, 0);
> >
> > @@ -1683,7 +1688,7 @@ bool zswap_store(struct folio *folio)
> >         buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
> >         memcpy(buf, dst, dlen);
> >         zpool_unmap_handle(zpool, handle);
> > -       mutex_unlock(acomp_ctx->mutex);
> > +       mutex_unlock(mutex);
> >
> >         /* populate entry */
> >         entry->swpentry = swp_entry(type, offset);
> > @@ -1726,7 +1731,7 @@ bool zswap_store(struct folio *folio)
> >         return true;
> >
> >  put_dstmem:
> > -       mutex_unlock(acomp_ctx->mutex);
> > +       mutex_unlock(mutex);
> >  put_pool:
> >         zswap_pool_put(entry->pool);
> >  freepage:
> > @@ -1902,10 +1907,10 @@ static int zswap_setup(void)
> >         }
> >
> >         ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> > -                               zswap_dstmem_prepare, zswap_dstmem_dead);
> > +                               zswap_buffer_prepare, zswap_buffer_dead);
> >         if (ret) {
> > -               pr_err("dstmem alloc failed\n");
> > -               goto dstmem_fail;
> > +               pr_err("buffer alloc failed\n");
> > +               goto buffer_fail;
> >         }
> >
> >         ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
> > @@ -1940,7 +1945,7 @@ static int zswap_setup(void)
> >                 zswap_pool_destroy(pool);
> >  hp_fail:
> >         cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> > -dstmem_fail:
> > +buffer_fail:
> >         kmem_cache_destroy(zswap_entry_cache);
> >  cache_fail:
> >         /* if built-in, we aren't unloaded on failure; don't allow use */
> >
> > --
> > b4 0.10.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 18:43     ` Nhat Pham
@ 2023-12-19 21:39       ` Yosry Ahmed
  2023-12-19 22:48         ` Chris Li
  2023-12-20 12:20         ` Chengming Zhou
  0 siblings, 2 replies; 22+ messages in thread
From: Yosry Ahmed @ 2023-12-19 21:39 UTC (permalink / raw)
  To: Nhat Pham
  Cc: Chris Li, Chengming Zhou, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
> >
> > Hi Chengming and Yosry,
> >
> > On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
> > <zhouchengming@bytedance.com> wrote:
> > >
> > > Since the introduce of reusing the dstmem in the load path, it seems
> > > confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex
> > > now for purposes other than what the naming suggests.
> > >
> > > Yosry suggested removing these two fields from acomp_ctx, and directly
> > > using zswap_dstmem and zswap_mutex in both the load and store paths,
> > > rename them, and add proper comments above their definitions that they
> > > are for generic percpu buffering on the load and store paths.
> > >
> > > So this patch remove dstmem and mutex from acomp_ctx, and rename the
> > > zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
> > > the load and store paths.
> >
> > Sorry joining this discussion late.
> >
> > I get the rename of "dstmem" to "buffer". Because the buffer is used
> > for both load and store as well. What I don't get is that, why do we
> > move it out of the acomp_ctx struct. Now we have 3 per cpu entry:
> > buffer, mutex and acomp_ctx. I think we should do the reverse, fold
> > this three per cpu entry into one struct the acomp_ctx. Each per_cpu
> > load() has a sequence of dance around the cpu id and disable preempt
> > etc, while each of the struct member load is just a plan memory load.
> > It seems to me it would be more optimal to combine this three per cpu
> > entry into acomp_ctx. Just do the per cpu for the acomp_ctx once.
>
> I agree with Chris. From a practicality POV, what Chris says here
> makes sense. From a semantic POV, this buffer is only used in
> (de)compression contexts - be it in store, load, or writeback - so it
> belonging to the orignal struct still makes sense to me. Why separate
> it out, without any benefits. Just rename the old field buffer or
> zswap_buffer and call it a day? It will be a smaller patch too!
>

My main concern is that the struct name is specific to the crypto
acomp stuff, while the buffer and mutex are not.
How about we keep them in the struct, but refactor it as follows:

struct zswap_ctx {
    struct {
        struct crypto_acomp *acomp;
        struct acomp_req *req;
        struct crypto_wait wait;
    }  acomp_ctx;
    u8 *dstmem;
    struct mutex *mutex;
};

, and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
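
To make the access pattern concrete, a rough store-path sketch under
this layout (names follow the proposal above, the rest mirrors the
existing code; illustrative only):

struct zswap_ctx *ctx = raw_cpu_ptr(entry->pool->ctx);

mutex_lock(ctx->mutex);
dst = ctx->dstmem;
...
/* crypto acomp members are now one level deeper */
acomp_request_set_params(ctx->acomp_ctx.req, &input, &output,
                         PAGE_SIZE, dlen);
ret = crypto_wait_req(crypto_acomp_compress(ctx->acomp_ctx.req),
                      &ctx->acomp_ctx.wait);
...
mutex_unlock(ctx->mutex);

It is still a single per-cpu lookup; only the member access gets deeper.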


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 21:39       ` Yosry Ahmed
@ 2023-12-19 22:48         ` Chris Li
  2023-12-19 23:04           ` Yosry Ahmed
  2023-12-20 12:20         ` Chengming Zhou
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Li @ 2023-12-19 22:48 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nhat Pham, Chengming Zhou, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

Hi Yosry,

On Tue, Dec 19, 2023 at 1:39 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> My main concern is that the struct name is specific for the crypto
> acomp stuff, but that buffer and mutex are not.
> How about we keep it in the struct, but refactor the struct as follows:

If it is the naming of the struct you are not happy about, we can
change the naming.

>
> struct zswap_ctx {
>     struct {
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
>         struct crypto_wait wait;
>     }  acomp_ctx;
>     u8 *dstmem;
>     struct mutex *mutex;
> };

Compression and decompression require the buffer and mutex, and the
mutex is not used for anything other than compress and decompress, right?
In my mind, they are fine staying in the struct. I am not sure adding
a nested acomp_ctx level provides anything; it just makes accessing the
structure members deeper.

If you care about separating out the crypto acomp, how about just
dropping the acomp_ctx member name and making the inner struct anonymous:
struct zswap_comp_ctx {
    struct /* crypto acomp context */ {
        struct crypto_acomp *acomp;
        struct acomp_req *req;
        struct crypto_wait wait;
    };
    u8 *dstmem;
    struct mutex *mutex;
};

Then we remove the other per_cpu() loads as well.

I also think the original struct name is fine; we don't need to change
it. The value/code-change ratio for renaming the struct alone is low.
On the other hand, if we can get rid of some per-cpu loads, that value
is high enough to justify the patch.
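
For reference, with the buffer and mutex kept inside the per-cpu
struct, the load path needs only one per-cpu lookup; a rough sketch
based on the existing __zswap_load() quoted above:

acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
mutex_lock(acomp_ctx->mutex);

src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
if (!zpool_can_sleep_mapped(zpool)) {
        memcpy(acomp_ctx->dstmem, src, entry->length);
        src = acomp_ctx->dstmem;
        zpool_unmap_handle(zpool, entry->handle);
}
...
mutex_unlock(acomp_ctx->mutex);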

Chris


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 22:48         ` Chris Li
@ 2023-12-19 23:04           ` Yosry Ahmed
  2023-12-19 23:33             ` Chris Li
  0 siblings, 1 reply; 22+ messages in thread
From: Yosry Ahmed @ 2023-12-19 23:04 UTC (permalink / raw)
  To: Chris Li
  Cc: Nhat Pham, Chengming Zhou, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

On Tue, Dec 19, 2023 at 2:49 PM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Yosry,
>
> On Tue, Dec 19, 2023 at 1:39 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > My main concern is that the struct name is specific for the crypto
> > acomp stuff, but that buffer and mutex are not.
> > How about we keep it in the struct, but refactor the struct as follows:
>
> If it is the naming of the struct you are not happy about. We can
> change the naming.
>
> >
> > struct zswap_ctx {
> >     struct {
> >         struct crypto_acomp *acomp;
> >         struct acomp_req *req;
> >         struct crypto_wait wait;
> >     }  acomp_ctx;
> >     u8 *dstmem;
> >     struct mutex *mutex;
> > };
>
> The compression and decompression requires the buffer and mutex. The
> mutex is not used other than compress and decompress, right?
> In my mind, they are fine staying in the struct. I am not sure adding
> an level acomp_ctx provides anything. It makes access structure
> members deeper.
>
> If you care about separating out the crypto acomp,  how about just
> remove acomp_ctx and make it an anonymous structure.
> struct zswap_comp_ctx {
>     struct /* cryto acomp context */ {
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
>         struct crypto_wait wait;
>      };
>      u8 *dstmem;
>      struct mutex *mutex;
>  };

I prefer naming the internal struct, but I am fine with an anonymous
struct as well. I just think it's a semantically sound separation.

>
> Then we remove other per_cpu_load as well.
>
> I also think the original struct name is fine, we don't need to change
> the struct name.

The original struct name makes it seem like the data in the struct is
only used for the crypto acomp API, which is not the case.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 23:04           ` Yosry Ahmed
@ 2023-12-19 23:33             ` Chris Li
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Li @ 2023-12-19 23:33 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nhat Pham, Chengming Zhou, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

Hi Yosry,


On Tue, Dec 19, 2023 at 3:05 PM Yosry Ahmed <yosryahmed@google.com> wrote:

> > The compression and decompression requires the buffer and mutex. The
> > mutex is not used other than compress and decompress, right?
> > In my mind, they are fine staying in the struct. I am not sure adding
> > an level acomp_ctx provides anything. It makes access structure
> > members deeper.
> >
> > If you care about separating out the crypto acomp,  how about just
> > remove acomp_ctx and make it an anonymous structure.
> > struct zswap_comp_ctx {
> >     struct /* cryto acomp context */ {
> >         struct crypto_acomp *acomp;
> >         struct acomp_req *req;
> >         struct crypto_wait wait;
> >      };
> >      u8 *dstmem;
> >      struct mutex *mutex;
> >  };
>
> I prefer naming the internal struct, but I am fine with an anonymous
> struct as well. I just think it's a semantically sound separation.

Ack.

>
> >
> > Then we remove other per_cpu_load as well.
> >
> > I also think the original struct name is fine, we don't need to change
> > the struct name.
>
> The original struct name makes it seems like the data in the struct is
> only used for the crypto acomp API, which is not the case.

The mutex and buffer are used in association with the crypto acomp API;
that is why I think it is fine for them to stay within the struct as well.
Using the anonymous struct to separate them out is marginally better. I
think we are in agreement here.

Chris


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-19 21:39       ` Yosry Ahmed
  2023-12-19 22:48         ` Chris Li
@ 2023-12-20 12:20         ` Chengming Zhou
  2023-12-21  0:19           ` Yosry Ahmed
  2023-12-22 17:37           ` Chris Li
  1 sibling, 2 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-20 12:20 UTC (permalink / raw)
  To: Yosry Ahmed, Nhat Pham, Chris Li
  Cc: Seth Jennings, Vitaly Wool, Dan Streetman, Johannes Weiner,
	Andrew Morton, linux-kernel, linux-mm

On 2023/12/20 05:39, Yosry Ahmed wrote:
> On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
>>>
>>> Hi Chengming and Yosry,
>>>
>>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
>>> <zhouchengming@bytedance.com> wrote:
>>>>
>>>> Since the introduce of reusing the dstmem in the load path, it seems
>>>> confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex
>>>> now for purposes other than what the naming suggests.
>>>>
>>>> Yosry suggested removing these two fields from acomp_ctx, and directly
>>>> using zswap_dstmem and zswap_mutex in both the load and store paths,
>>>> rename them, and add proper comments above their definitions that they
>>>> are for generic percpu buffering on the load and store paths.
>>>>
>>>> So this patch remove dstmem and mutex from acomp_ctx, and rename the
>>>> zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
>>>> the load and store paths.
>>>
>>> Sorry joining this discussion late.
>>>
>>> I get the rename of "dstmem" to "buffer". Because the buffer is used
>>> for both load and store as well. What I don't get is that, why do we
>>> move it out of the acomp_ctx struct. Now we have 3 per cpu entry:
>>> buffer, mutex and acomp_ctx. I think we should do the reverse, fold
>>> this three per cpu entry into one struct the acomp_ctx. Each per_cpu
>>> load() has a sequence of dance around the cpu id and disable preempt
>>> etc, while each of the struct member load is just a plan memory load.
>>> It seems to me it would be more optimal to combine this three per cpu
>>> entry into acomp_ctx. Just do the per cpu for the acomp_ctx once.
>>
>> I agree with Chris. From a practicality POV, what Chris says here
>> makes sense. From a semantic POV, this buffer is only used in
>> (de)compression contexts - be it in store, load, or writeback - so it
>> belonging to the orignal struct still makes sense to me. Why separate
>> it out, without any benefits. Just rename the old field buffer or
>> zswap_buffer and call it a day? It will be a smaller patch too!
>>
> 
> My main concern is that the struct name is specific for the crypto
> acomp stuff, but that buffer and mutex are not.
> How about we keep it in the struct, but refactor the struct as follows:
> 
> struct zswap_ctx {
>     struct {
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
>         struct crypto_wait wait;
>     }  acomp_ctx;
>     u8 *dstmem;
>     struct mutex *mutex;
> };
> 
> , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?

I think there are two viewpoints here; both work OK for me.

The first is ownership or lifetime: the percpu mutex and dstmem are
shared between all pools, so no single pool owns the mutex and dstmem,
nor does the percpu acomp_ctx in each pool.

The second is usage: the percpu mutex and dstmem are used by the
percpu acomp_ctx in each pool, so it's convenient to put pointers
to them in the percpu acomp_ctx.

Actually I think it's simpler to let the percpu acomp_ctx have its own
mutex and dstmem, which are in fact the necessary parts when it uses
the acomp interfaces.
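
The per-cpu struct would then own both pieces directly, roughly like
this (same members as today, just allocated per pool instead of
pointing at the shared percpu variables):

struct crypto_acomp_ctx {
        struct crypto_acomp *acomp;
        struct acomp_req *req;
        struct crypto_wait wait;
        u8 *dstmem;
        struct mutex *mutex;
};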

This way, we could delete the percpu mutex and dstmem, and their hotplug
callbacks, so they are no longer shared between all pools. Maybe we will
have many pools at the same time in the future, like different compression
algorithms or different zpools for different memcg workloads. Who knows? :-)

So how about this patch below? Just RFC.

Subject: [PATCH] mm/zswap: make each acomp_ctx has its own mutex and dstmem

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/cpuhotplug.h |  1 -
 mm/zswap.c                 | 86 ++++++++++++--------------------------
 2 files changed, 26 insertions(+), 61 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index efc0c0b07efb..c3e06e21766a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -124,7 +124,6 @@ enum cpuhp_state {
 	CPUHP_ARM_BL_PREPARE,
 	CPUHP_TRACE_RB_PREPARE,
 	CPUHP_MM_ZS_PREPARE,
-	CPUHP_MM_ZSWP_MEM_PREPARE,
 	CPUHP_MM_ZSWP_POOL_PREPARE,
 	CPUHP_KVM_PPC_BOOK3S_PREPARE,
 	CPUHP_ZCOMP_PREPARE,
diff --git a/mm/zswap.c b/mm/zswap.c
index 2c349fd88904..37301f1a80a9 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -694,63 +694,31 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
 /*********************************
 * per-cpu code
 **********************************/
-static DEFINE_PER_CPU(u8 *, zswap_dstmem);
-/*
- * If users dynamically change the zpool type and compressor at runtime, i.e.
- * zswap is running, zswap can have more than one zpool on one cpu, but they
- * are sharing dtsmem. So we need this mutex to be per-cpu.
- */
-static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
-
-static int zswap_dstmem_prepare(unsigned int cpu)
-{
-	struct mutex *mutex;
-	u8 *dst;
-
-	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
-	if (!dst)
-		return -ENOMEM;
-
-	mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
-	if (!mutex) {
-		kfree(dst);
-		return -ENOMEM;
-	}
-
-	mutex_init(mutex);
-	per_cpu(zswap_dstmem, cpu) = dst;
-	per_cpu(zswap_mutex, cpu) = mutex;
-	return 0;
-}
-
-static int zswap_dstmem_dead(unsigned int cpu)
-{
-	struct mutex *mutex;
-	u8 *dst;
-
-	mutex = per_cpu(zswap_mutex, cpu);
-	kfree(mutex);
-	per_cpu(zswap_mutex, cpu) = NULL;
-
-	dst = per_cpu(zswap_dstmem, cpu);
-	kfree(dst);
-	per_cpu(zswap_dstmem, cpu) = NULL;
-
-	return 0;
-}
-
 static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 {
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
 	struct crypto_acomp *acomp;
 	struct acomp_req *req;
+	int ret = 0;
+
+	acomp_ctx->dstmem = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
+	if (!acomp_ctx->dstmem)
+		return -ENOMEM;
+
+	acomp_ctx->mutex = kmalloc_node(sizeof(struct mutex), GFP_KERNEL, cpu_to_node(cpu));
+	if (!acomp_ctx->mutex) {
+		ret = -ENOMEM;
+		goto mutex_fail;
+	}
+	mutex_init(acomp_ctx->mutex);

 	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
 	if (IS_ERR(acomp)) {
 		pr_err("could not alloc crypto acomp %s : %ld\n",
 				pool->tfm_name, PTR_ERR(acomp));
-		return PTR_ERR(acomp);
+		ret = PTR_ERR(acomp);
+		goto acomp_fail;
 	}
 	acomp_ctx->acomp = acomp;

@@ -758,8 +726,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	if (!req) {
 		pr_err("could not alloc crypto acomp_request %s\n",
 		       pool->tfm_name);
-		crypto_free_acomp(acomp_ctx->acomp);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto req_fail;
 	}
 	acomp_ctx->req = req;

@@ -772,10 +740,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
 				   crypto_req_done, &acomp_ctx->wait);

-	acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
-	acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
-
 	return 0;
+req_fail:
+	crypto_free_acomp(acomp_ctx->acomp);
+acomp_fail:
+	kfree(acomp_ctx->mutex);
+mutex_fail:
+	kfree(acomp_ctx->dstmem);
+
+	return ret;
 }

 static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
@@ -788,6 +761,8 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 			acomp_request_free(acomp_ctx->req);
 		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
 			crypto_free_acomp(acomp_ctx->acomp);
+		kfree(acomp_ctx->mutex);
+		kfree(acomp_ctx->dstmem);
 	}

 	return 0;
@@ -1901,13 +1876,6 @@ static int zswap_setup(void)
 		goto cache_fail;
 	}

-	ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
-				zswap_dstmem_prepare, zswap_dstmem_dead);
-	if (ret) {
-		pr_err("dstmem alloc failed\n");
-		goto dstmem_fail;
-	}
-
 	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
 				      "mm/zswap_pool:prepare",
 				      zswap_cpu_comp_prepare,
@@ -1939,8 +1907,6 @@ static int zswap_setup(void)
 	if (pool)
 		zswap_pool_destroy(pool);
 hp_fail:
-	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
-dstmem_fail:
 	kmem_cache_destroy(zswap_entry_cache);
 cache_fail:
 	/* if built-in, we aren't unloaded on failure; don't allow use */
--
2.20.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-20 12:20         ` Chengming Zhou
@ 2023-12-21  0:19           ` Yosry Ahmed
  2023-12-25 14:39             ` Chengming Zhou
  2023-12-22 17:37           ` Chris Li
  1 sibling, 1 reply; 22+ messages in thread
From: Yosry Ahmed @ 2023-12-21  0:19 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Nhat Pham, Chris Li, Seth Jennings, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, linux-kernel, linux-mm

On Wed, Dec 20, 2023 at 4:20 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> On 2023/12/20 05:39, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >>
> >> On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
> >>>
> >>> Hi Chengming and Yosry,
> >>>
> >>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
> >>> <zhouchengming@bytedance.com> wrote:
> >>>>
> >>>> Since the introduce of reusing the dstmem in the load path, it seems
> >>>> confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex
> >>>> now for purposes other than what the naming suggests.
> >>>>
> >>>> Yosry suggested removing these two fields from acomp_ctx, and directly
> >>>> using zswap_dstmem and zswap_mutex in both the load and store paths,
> >>>> rename them, and add proper comments above their definitions that they
> >>>> are for generic percpu buffering on the load and store paths.
> >>>>
> >>>> So this patch remove dstmem and mutex from acomp_ctx, and rename the
> >>>> zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
> >>>> the load and store paths.
> >>>
> >>> Sorry joining this discussion late.
> >>>
> >>> I get the rename of "dstmem" to "buffer". Because the buffer is used
> >>> for both load and store as well. What I don't get is that, why do we
> >>> move it out of the acomp_ctx struct. Now we have 3 per cpu entry:
> >>> buffer, mutex and acomp_ctx. I think we should do the reverse, fold
> >>> this three per cpu entry into one struct the acomp_ctx. Each per_cpu
> >>> load() has a sequence of dance around the cpu id and disable preempt
> >>> etc, while each of the struct member load is just a plan memory load.
> >>> It seems to me it would be more optimal to combine this three per cpu
> >>> entry into acomp_ctx. Just do the per cpu for the acomp_ctx once.
> >>
> >> I agree with Chris. From a practicality POV, what Chris says here
> >> makes sense. From a semantic POV, this buffer is only used in
> >> (de)compression contexts - be it in store, load, or writeback - so it
> >> belonging to the orignal struct still makes sense to me. Why separate
> >> it out, without any benefits. Just rename the old field buffer or
> >> zswap_buffer and call it a day? It will be a smaller patch too!
> >>
> >
> > My main concern is that the struct name is specific for the crypto
> > acomp stuff, but that buffer and mutex are not.
> > How about we keep it in the struct, but refactor the struct as follows:
> >
> > struct zswap_ctx {
> >     struct {
> >         struct crypto_acomp *acomp;
> >         struct acomp_req *req;
> >         struct crypto_wait wait;
> >     }  acomp_ctx;
> >     u8 *dstmem;
> >     struct mutex *mutex;
> > };
> >
> > , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
>
> I think there are two viewpoints here, both works ok to me.
>
> The first is from ownship or lifetime, these percpu mutex and dstmem
> are shared between all pools, so no one pool own the mutex and dstmem,
> nor does the percpu acomp_ctx in each pool.
>
> The second is from usage, these percpu mutex and dstmem are used by
> the percpu acomp_ctx in each pool, so it's easy to use to put pointers
> to them in the percpu acomp_ctx.
>
> Actually I think it's simpler to let the percpu acomp_ctx has its own
> mutex and dstmem, which in fact are the necessary parts when it use
> the acomp interfaces.
>
> This way, we could delete the percpu mutex and dstmem, and its hotplugs,
> and not shared anymore between all pools. Maybe we would have many pools
> at the same time in the future, like different compression algorithm or
> different zpool for different memcg workloads. Who knows? :-)
>
> So how about this patch below? Just RFC.

The general approach looks fine to me, although I still prefer we
reorganize the struct as Chris and I discussed: rename
crypto_acomp_ctx to a generic name, add an (anonymous) struct for the
crypto_acomp stuff, and rename dstmem to buffer or so.

I think we can also make the mutex a static part of the struct; is there
any advantage to dynamically allocating it?
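
Embedding it would drop one allocation and one failure path. A minimal
sketch of what I mean (not part of the RFC patch above):

struct crypto_acomp_ctx {
        struct crypto_acomp *acomp;
        struct acomp_req *req;
        struct crypto_wait wait;
        u8 *dstmem;
        struct mutex mutex;     /* embedded, no separate kmalloc */
};

/* in zswap_cpu_comp_prepare() */
mutex_init(&acomp_ctx->mutex);

/* in the load/store paths */
mutex_lock(&acomp_ctx->mutex);
...
mutex_unlock(&acomp_ctx->mutex);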

>
> Subject: [PATCH] mm/zswap: make each acomp_ctx has its own mutex and dstmem
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  include/linux/cpuhotplug.h |  1 -
>  mm/zswap.c                 | 86 ++++++++++++--------------------------
>  2 files changed, 26 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index efc0c0b07efb..c3e06e21766a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -124,7 +124,6 @@ enum cpuhp_state {
>         CPUHP_ARM_BL_PREPARE,
>         CPUHP_TRACE_RB_PREPARE,
>         CPUHP_MM_ZS_PREPARE,
> -       CPUHP_MM_ZSWP_MEM_PREPARE,
>         CPUHP_MM_ZSWP_POOL_PREPARE,
>         CPUHP_KVM_PPC_BOOK3S_PREPARE,
>         CPUHP_ZCOMP_PREPARE,
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2c349fd88904..37301f1a80a9 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -694,63 +694,31 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
>  /*********************************
>  * per-cpu code
>  **********************************/
> -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> -/*
> - * If users dynamically change the zpool type and compressor at runtime, i.e.
> - * zswap is running, zswap can have more than one zpool on one cpu, but they
> - * are sharing dtsmem. So we need this mutex to be per-cpu.
> - */
> -static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> -
> -static int zswap_dstmem_prepare(unsigned int cpu)
> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> -       if (!dst)
> -               return -ENOMEM;
> -
> -       mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
> -       if (!mutex) {
> -               kfree(dst);
> -               return -ENOMEM;
> -       }
> -
> -       mutex_init(mutex);
> -       per_cpu(zswap_dstmem, cpu) = dst;
> -       per_cpu(zswap_mutex, cpu) = mutex;
> -       return 0;
> -}
> -
> -static int zswap_dstmem_dead(unsigned int cpu)
> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       mutex = per_cpu(zswap_mutex, cpu);
> -       kfree(mutex);
> -       per_cpu(zswap_mutex, cpu) = NULL;
> -
> -       dst = per_cpu(zswap_dstmem, cpu);
> -       kfree(dst);
> -       per_cpu(zswap_dstmem, cpu) = NULL;
> -
> -       return 0;
> -}
> -
>  static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  {
>         struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>         struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
> +       int ret = 0;
> +
> +       acomp_ctx->dstmem = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->dstmem)
> +               return -ENOMEM;
> +
> +       acomp_ctx->mutex = kmalloc_node(sizeof(struct mutex), GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->mutex) {
> +               ret = -ENOMEM;
> +               goto mutex_fail;
> +       }
> +       mutex_init(acomp_ctx->mutex);
>
>         acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>         if (IS_ERR(acomp)) {
>                 pr_err("could not alloc crypto acomp %s : %ld\n",
>                                 pool->tfm_name, PTR_ERR(acomp));
> -               return PTR_ERR(acomp);
> +               ret = PTR_ERR(acomp);
> +               goto acomp_fail;
>         }
>         acomp_ctx->acomp = acomp;
>
> @@ -758,8 +726,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         if (!req) {
>                 pr_err("could not alloc crypto acomp_request %s\n",
>                        pool->tfm_name);
> -               crypto_free_acomp(acomp_ctx->acomp);
> -               return -ENOMEM;
> +               ret = -ENOMEM;
> +               goto req_fail;
>         }
>         acomp_ctx->req = req;
>
> @@ -772,10 +740,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>                                    crypto_req_done, &acomp_ctx->wait);
>
> -       acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> -       acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> -
>         return 0;
> +req_fail:
> +       crypto_free_acomp(acomp_ctx->acomp);
> +acomp_fail:
> +       kfree(acomp_ctx->mutex);
> +mutex_fail:
> +       kfree(acomp_ctx->dstmem);
> +
> +       return ret;
>  }
>
>  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> @@ -788,6 +761,8 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>                         acomp_request_free(acomp_ctx->req);
>                 if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>                         crypto_free_acomp(acomp_ctx->acomp);
> +               kfree(acomp_ctx->mutex);
> +               kfree(acomp_ctx->dstmem);
>         }
>
>         return 0;
> @@ -1901,13 +1876,6 @@ static int zswap_setup(void)
>                 goto cache_fail;
>         }
>
> -       ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> -                               zswap_dstmem_prepare, zswap_dstmem_dead);
> -       if (ret) {
> -               pr_err("dstmem alloc failed\n");
> -               goto dstmem_fail;
> -       }
> -
>         ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
>                                       "mm/zswap_pool:prepare",
>                                       zswap_cpu_comp_prepare,
> @@ -1939,8 +1907,6 @@ static int zswap_setup(void)
>         if (pool)
>                 zswap_pool_destroy(pool);
>  hp_fail:
> -       cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> -dstmem_fail:
>         kmem_cache_destroy(zswap_entry_cache);
>  cache_fail:
>         /* if built-in, we aren't unloaded on failure; don't allow use */
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-20 12:20         ` Chengming Zhou
  2023-12-21  0:19           ` Yosry Ahmed
@ 2023-12-22 17:37           ` Chris Li
  2023-12-25 14:32             ` Chengming Zhou
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Li @ 2023-12-22 17:37 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Yosry Ahmed, Nhat Pham, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

Hi Chengming,

The patch looks good to me.

Acked-by: Chris Li <chrisl@kernel.org> (Google)

On Wed, Dec 20, 2023 at 4:21 AM Chengming Zhou
<zhouchengming@bytedance.com> wrote:
>
> On 2023/12/20 05:39, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >>
> >> On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
> >>>
> >>> Hi Chengming and Yosry,
> >>>
> >>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
> >>> <zhouchengming@bytedance.com> wrote:
> >>>>
> >>>> Since the introduce of reusing the dstmem in the load path, it seems
> >>>> confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex
> >>>> now for purposes other than what the naming suggests.
> >>>>
> >>>> Yosry suggested removing these two fields from acomp_ctx, and directly
> >>>> using zswap_dstmem and zswap_mutex in both the load and store paths,
> >>>> rename them, and add proper comments above their definitions that they
> >>>> are for generic percpu buffering on the load and store paths.
> >>>>
> >>>> So this patch remove dstmem and mutex from acomp_ctx, and rename the
> >>>> zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
> >>>> the load and store paths.
> >>>
> >>> Sorry joining this discussion late.
> >>>
> >>> I get the rename of "dstmem" to "buffer". Because the buffer is used
> >>> for both load and store as well. What I don't get is that, why do we
> >>> move it out of the acomp_ctx struct. Now we have 3 per cpu entry:
> >>> buffer, mutex and acomp_ctx. I think we should do the reverse, fold
> >>> this three per cpu entry into one struct the acomp_ctx. Each per_cpu
> >>> load() has a sequence of dance around the cpu id and disable preempt
> >>> etc, while each of the struct member load is just a plan memory load.
> >>> It seems to me it would be more optimal to combine this three per cpu
> >>> entry into acomp_ctx. Just do the per cpu for the acomp_ctx once.
> >>
> >> I agree with Chris. From a practicality POV, what Chris says here
> >> makes sense. From a semantic POV, this buffer is only used in
> >> (de)compression contexts - be it in store, load, or writeback - so it
> >> belonging to the orignal struct still makes sense to me. Why separate
> >> it out, without any benefits. Just rename the old field buffer or
> >> zswap_buffer and call it a day? It will be a smaller patch too!
> >>
> >
> > My main concern is that the struct name is specific for the crypto
> > acomp stuff, but that buffer and mutex are not.
> > How about we keep it in the struct, but refactor the struct as follows:
> >
> > struct zswap_ctx {
> >     struct {
> >         struct crypto_acomp *acomp;
> >         struct acomp_req *req;
> >         struct crypto_wait wait;
> >     }  acomp_ctx;
> >     u8 *dstmem;
> >     struct mutex *mutex;
> > };
> >
> > , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
>
> I think there are two viewpoints here, both works ok to me.
>
> The first is from ownship or lifetime, these percpu mutex and dstmem
> are shared between all pools, so no one pool own the mutex and dstmem,
> nor does the percpu acomp_ctx in each pool.
>
> The second is from usage, these percpu mutex and dstmem are used by
> the percpu acomp_ctx in each pool, so it's easy to use to put pointers
> to them in the percpu acomp_ctx.
>
> Actually I think it's simpler to let the percpu acomp_ctx has its own
> mutex and dstmem, which in fact are the necessary parts when it use
> the acomp interfaces.

Agree, that is why I prefer to keep the struct together. I am fine
with what Yosry suggested and the anonymous struct; I just consider it
not critically necessary.

>
> This way, we could delete the percpu mutex and dstmem, and its hotplugs,

That is the real value of this patch. Thanks for doing that.

> and not shared anymore between all pools. Maybe we would have many pools
> at the same time in the future, like different compression algorithm or
> different zpool for different memcg workloads. Who knows? :-)

As long as zswap is not re-entrant, i.e. we never have a nested page
fault that causes a zswap_load() within another zswap_load(), I think
we are fine having more than one pool share the buffer. In fact, if we
ever trigger a nested zswap load, I expect the kernel to deadlock on
the nested mutex acquire because the mutex is already taken. We would
see a visible deadlock rather than silent memory corruption.
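
Illustration only, assuming a nested load ends up taking the same
per-cpu mutex that the outer load already holds:

/* outer zswap_load() takes the per-cpu mutex ... */
mutex_lock(acomp_ctx->mutex);
...
/*
 * ... so a hypothetical nested zswap_load() taking the same mutex
 * self-deadlocks here (lockdep would flag it) instead of silently
 * reusing the buffer underneath the outer load.
 */
mutex_lock(acomp_ctx->mutex);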

>
> So how about this patch below? Just RFC.
>
> Subject: [PATCH] mm/zswap: make each acomp_ctx has its own mutex and dstmem

Thank you for doing that; you can consider submitting it as the real
patch instead of the RFC. I see real value in this patch removing some
per-CPU fields.

>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  include/linux/cpuhotplug.h |  1 -
>  mm/zswap.c                 | 86 ++++++++++++--------------------------
>  2 files changed, 26 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index efc0c0b07efb..c3e06e21766a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -124,7 +124,6 @@ enum cpuhp_state {
>         CPUHP_ARM_BL_PREPARE,
>         CPUHP_TRACE_RB_PREPARE,
>         CPUHP_MM_ZS_PREPARE,
> -       CPUHP_MM_ZSWP_MEM_PREPARE,
>         CPUHP_MM_ZSWP_POOL_PREPARE,
>         CPUHP_KVM_PPC_BOOK3S_PREPARE,
>         CPUHP_ZCOMP_PREPARE,
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2c349fd88904..37301f1a80a9 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -694,63 +694,31 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
>  /*********************************
>  * per-cpu code
>  **********************************/
> -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> -/*
> - * If users dynamically change the zpool type and compressor at runtime, i.e.
> - * zswap is running, zswap can have more than one zpool on one cpu, but they
> - * are sharing dtsmem. So we need this mutex to be per-cpu.
> - */
> -static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> -
> -static int zswap_dstmem_prepare(unsigned int cpu)

Nice!

Chris

> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> -       if (!dst)
> -               return -ENOMEM;
> -
> -       mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
> -       if (!mutex) {
> -               kfree(dst);
> -               return -ENOMEM;
> -       }
> -
> -       mutex_init(mutex);
> -       per_cpu(zswap_dstmem, cpu) = dst;
> -       per_cpu(zswap_mutex, cpu) = mutex;
> -       return 0;
> -}
> -
> -static int zswap_dstmem_dead(unsigned int cpu)
> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       mutex = per_cpu(zswap_mutex, cpu);
> -       kfree(mutex);
> -       per_cpu(zswap_mutex, cpu) = NULL;
> -
> -       dst = per_cpu(zswap_dstmem, cpu);
> -       kfree(dst);
> -       per_cpu(zswap_dstmem, cpu) = NULL;
> -
> -       return 0;
> -}
> -
>  static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  {
>         struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>         struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
> +       int ret = 0;
> +
> +       acomp_ctx->dstmem = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->dstmem)
> +               return -ENOMEM;
> +
> +       acomp_ctx->mutex = kmalloc_node(sizeof(struct mutex), GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->mutex) {
> +               ret = -ENOMEM;
> +               goto mutex_fail;
> +       }
> +       mutex_init(acomp_ctx->mutex);
>
>         acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>         if (IS_ERR(acomp)) {
>                 pr_err("could not alloc crypto acomp %s : %ld\n",
>                                 pool->tfm_name, PTR_ERR(acomp));
> -               return PTR_ERR(acomp);
> +               ret = PTR_ERR(acomp);
> +               goto acomp_fail;
>         }
>         acomp_ctx->acomp = acomp;
>
> @@ -758,8 +726,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         if (!req) {
>                 pr_err("could not alloc crypto acomp_request %s\n",
>                        pool->tfm_name);
> -               crypto_free_acomp(acomp_ctx->acomp);
> -               return -ENOMEM;
> +               ret = -ENOMEM;
> +               goto req_fail;
>         }
>         acomp_ctx->req = req;
>
> @@ -772,10 +740,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>                                    crypto_req_done, &acomp_ctx->wait);
>
> -       acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> -       acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> -
>         return 0;
> +req_fail:
> +       crypto_free_acomp(acomp_ctx->acomp);
> +acomp_fail:
> +       kfree(acomp_ctx->mutex);
> +mutex_fail:
> +       kfree(acomp_ctx->dstmem);
> +
> +       return ret;
>  }
>
>  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> @@ -788,6 +761,8 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>                         acomp_request_free(acomp_ctx->req);
>                 if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>                         crypto_free_acomp(acomp_ctx->acomp);
> +               kfree(acomp_ctx->mutex);
> +               kfree(acomp_ctx->dstmem);
>         }
>
>         return 0;
> @@ -1901,13 +1876,6 @@ static int zswap_setup(void)
>                 goto cache_fail;
>         }
>
> -       ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> -                               zswap_dstmem_prepare, zswap_dstmem_dead);
> -       if (ret) {
> -               pr_err("dstmem alloc failed\n");
> -               goto dstmem_fail;
> -       }
> -
>         ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
>                                       "mm/zswap_pool:prepare",
>                                       zswap_cpu_comp_prepare,
> @@ -1939,8 +1907,6 @@ static int zswap_setup(void)
>         if (pool)
>                 zswap_pool_destroy(pool);
>  hp_fail:
> -       cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> -dstmem_fail:
>         kmem_cache_destroy(zswap_entry_cache);
>  cache_fail:
>         /* if built-in, we aren't unloaded on failure; don't allow use */
> --
> 2.20.1
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-22 17:37           ` Chris Li
@ 2023-12-25 14:32             ` Chengming Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-25 14:32 UTC (permalink / raw)
  To: Chris Li
  Cc: Yosry Ahmed, Nhat Pham, Seth Jennings, Vitaly Wool,
	Dan Streetman, Johannes Weiner, Andrew Morton, linux-kernel,
	linux-mm

On 2023/12/23 01:37, Chris Li wrote:
> Hi Chengming,
> 
> The patch looks good to me.
> 
> Acked-by: Chris Li <chrisl@kernel.org> (Google)
> 

Thanks.

[...]
>>
>> I think there are two viewpoints here, both works ok to me.
>>
>> The first is from ownship or lifetime, these percpu mutex and dstmem
>> are shared between all pools, so no one pool own the mutex and dstmem,
>> nor does the percpu acomp_ctx in each pool.
>>
>> The second is from usage, these percpu mutex and dstmem are used by
>> the percpu acomp_ctx in each pool, so it's easy to use to put pointers
>> to them in the percpu acomp_ctx.
>>
>> Actually I think it's simpler to let the percpu acomp_ctx has its own
>> mutex and dstmem, which in fact are the necessary parts when it use
>> the acomp interfaces.
> 
> Agree, that is why I prefer to keep the struct together. I am fine
> with what Yosry suggested and the anonymous struct, just consider it
> is not critically necessary.
> 

Agree, I have no strong opinion about it, so I will just leave it as it is.

>>
>> This way, we could delete the percpu mutex and dstmem, and its hotplugs,
> 
> That is the real value of this patch. Thanks for doing that.
> 
>> and not shared anymore between all pools. Maybe we would have many pools
>> at the same time in the future, like different compression algorithm or
>> different zpool for different memcg workloads. Who knows? :-)
> 
> As long as the zswap is not re-enterable, e.g. never have the nested
> page fault that causes zswap_load within another zswap_load, I think
> we are fine having more than one pool share the buffer. In fact, if we
> trigger the nested zswap load, I expect the kernel will dead lock on
> the nested mutex acquire because the mutex is already taken. We will
> know about kernel panic rather than selient memory corruption.
> 
>>
>> So how about this patch below? Just RFC.
>>
>> Subject: [PATCH] mm/zswap: make each acomp_ctx has its own mutex and dstmem
> 
> Thank you for doing that, you can consider submitting it as the real
> patch instead of the RFC. I see real value in this patch removing some
> per CPU fields.

Ok, will update.

Thanks!


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
  2023-12-21  0:19           ` Yosry Ahmed
@ 2023-12-25 14:39             ` Chengming Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chengming Zhou @ 2023-12-25 14:39 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nhat Pham, Chris Li, Seth Jennings, Vitaly Wool, Dan Streetman,
	Johannes Weiner, Andrew Morton, linux-kernel, linux-mm

On 2023/12/21 08:19, Yosry Ahmed wrote:
> On Wed, Dec 20, 2023 at 4:20 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
>>
>> On 2023/12/20 05:39, Yosry Ahmed wrote:
>>> On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham <nphamcs@gmail.com> wrote:
>>>>
>>>> On Tue, Dec 19, 2023 at 5:29 AM Chris Li <chrisl@kernel.org> wrote:
>>>>>
>>>>> Hi Chengming and Yosry,
>>>>>
>>>>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou
>>>>> <zhouchengming@bytedance.com> wrote:
>>>>>>
>>>>>> Since the introduce of reusing the dstmem in the load path, it seems
>>>>>> confusing that we are now using acomp_ctx->dstmem and acomp_ctx->mutex
>>>>>> now for purposes other than what the naming suggests.
>>>>>>
>>>>>> Yosry suggested removing these two fields from acomp_ctx, and directly
>>>>>> using zswap_dstmem and zswap_mutex in both the load and store paths,
>>>>>> rename them, and add proper comments above their definitions that they
>>>>>> are for generic percpu buffering on the load and store paths.
>>>>>>
>>>>>> So this patch remove dstmem and mutex from acomp_ctx, and rename the
>>>>>> zswap_dstmem to zswap_buffer, using the percpu mutex and buffer on
>>>>>> the load and store paths.
>>>>>
>>>>> Sorry joining this discussion late.
>>>>>
>>>>> I get the rename of "dstmem" to "buffer". Because the buffer is used
>>>>> for both load and store as well. What I don't get is that, why do we
>>>>> move it out of the acomp_ctx struct. Now we have 3 per cpu entry:
>>>>> buffer, mutex and acomp_ctx. I think we should do the reverse, fold
>>>>> this three per cpu entry into one struct the acomp_ctx. Each per_cpu
>>>>> load() has a sequence of dance around the cpu id and disable preempt
>>>>> etc, while each of the struct member load is just a plan memory load.
>>>>> It seems to me it would be more optimal to combine this three per cpu
>>>>> entry into acomp_ctx. Just do the per cpu for the acomp_ctx once.
>>>>
>>>> I agree with Chris. From a practicality POV, what Chris says here
>>>> makes sense. From a semantic POV, this buffer is only used in
>>>> (de)compression contexts - be it in store, load, or writeback - so it
>>>> belonging to the orignal struct still makes sense to me. Why separate
>>>> it out, without any benefits. Just rename the old field buffer or
>>>> zswap_buffer and call it a day? It will be a smaller patch too!
>>>>
>>>
>>> My main concern is that the struct name is specific for the crypto
>>> acomp stuff, but that buffer and mutex are not.
>>> How about we keep it in the struct, but refactor the struct as follows:
>>>
>>> struct zswap_ctx {
>>>     struct {
>>>         struct crypto_acomp *acomp;
>>>         struct acomp_req *req;
>>>         struct crypto_wait wait;
>>>     }  acomp_ctx;
>>>     u8 *dstmem;
>>>     struct mutex *mutex;
>>> };
>>>
>>> , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
>>
>> I think there are two viewpoints here, both works ok to me.
>>
>> The first is from ownship or lifetime, these percpu mutex and dstmem
>> are shared between all pools, so no one pool own the mutex and dstmem,
>> nor does the percpu acomp_ctx in each pool.
>>
>> The second is from usage, these percpu mutex and dstmem are used by
>> the percpu acomp_ctx in each pool, so it's easy to use to put pointers
>> to them in the percpu acomp_ctx.
>>
>> Actually I think it's simpler to let the percpu acomp_ctx has its own
>> mutex and dstmem, which in fact are the necessary parts when it use
>> the acomp interfaces.
>>
>> This way, we could delete the percpu mutex and dstmem, and its hotplugs,
>> and not shared anymore between all pools. Maybe we would have many pools
>> at the same time in the future, like different compression algorithm or
>> different zpool for different memcg workloads. Who knows? :-)
>>
>> So how about this patch below? Just RFC.
> 
> The general approach looks fine to me, although I still prefer we
> reorganize the struct as Chris and I discussed: rename
> crypto_acomp_ctx to a generic name, add a (anonymous) struct for the
> crypto_acomp stuff, rename dstmem to buffer or so.
> 
> I think we can also make the mutex a static part of the struct, any
> advantage to dynamically allocating it?

Agree, it doesn't seem to have much advantage to me, so I can change it
to a static part of the struct. As for the restructure, I have no strong
opinion about it; maybe it's better to leave it as it is.

Thanks!


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-12-25 14:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-18 11:50 [PATCH v3 0/6] mm/zswap: dstmem reuse optimizations and cleanups Chengming Zhou
2023-12-18 11:50 ` [PATCH v3 1/6] mm/zswap: change dstmem size to one page Chengming Zhou
2023-12-18 11:50 ` [PATCH v3 2/6] mm/zswap: reuse dstmem when decompress Chengming Zhou
2023-12-18 11:50 ` [PATCH v3 3/6] mm/zswap: refactor out __zswap_load() Chengming Zhou
2023-12-19 11:59   ` Chris Li
2023-12-18 11:50 ` [PATCH v3 4/6] mm/zswap: cleanup zswap_load() Chengming Zhou
2023-12-19 12:47   ` Chris Li
2023-12-19 14:07     ` Chengming Zhou
2023-12-18 11:50 ` [PATCH v3 5/6] mm/zswap: cleanup zswap_writeback_entry() Chengming Zhou
2023-12-19 12:50   ` Chris Li
2023-12-18 11:50 ` [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store Chengming Zhou
2023-12-19 13:29   ` Chris Li
2023-12-19 18:43     ` Nhat Pham
2023-12-19 21:39       ` Yosry Ahmed
2023-12-19 22:48         ` Chris Li
2023-12-19 23:04           ` Yosry Ahmed
2023-12-19 23:33             ` Chris Li
2023-12-20 12:20         ` Chengming Zhou
2023-12-21  0:19           ` Yosry Ahmed
2023-12-25 14:39             ` Chengming Zhou
2023-12-22 17:37           ` Chris Li
2023-12-25 14:32             ` Chengming Zhou
