linux-mm.kvack.org archive mirror
* [PATCH v1 0/2] Vectorize and simplify zswap_store_page().
@ 2024-11-27 22:53 Kanchana P Sridhar
  2024-11-27 22:53 ` [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio Kanchana P Sridhar
  2024-11-27 22:53 ` [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching Kanchana P Sridhar
  0 siblings, 2 replies; 18+ messages in thread
From: Kanchana P Sridhar @ 2024-11-27 22:53 UTC (permalink / raw)
  To: linux-kernel, linux-mm, hannes, yosryahmed, nphamcs,
	chengming.zhou, usamaarif642, ryan.roberts, 21cnbao, akpm
  Cc: wajdi.k.feghali, vinodh.gopal, kanchana.p.sridhar

This patch series vectorizes and simplifies the existing
zswap_store()/zswap_store_page() implementation so that the IAA compress
batching functionality in [1] can be developed seamlessly, with minimal
impact to the zswap_store() code, while still having a single consolidated
implementation of zswap_store() for both batching and non-batching
compression algorithms. This will make the code easier to maintain.
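
For reference, the resulting zswap_store() batching loop looks roughly like
this (condensed from patch 1 of this series; the rest of zswap_store() and
its error handling are omitted, and "put_pool" is the existing cleanup label):

    long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
    size_t compressed_bytes = 0;

    /* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
    for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
         (si < nr_pages) && (ei < nr_pages);
         si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
            ssize_t bytes;

            bytes = zswap_store_pages(folio, si, ei, objcg, pool);
            if (bytes < 0)
                    goto put_pool;
            compressed_bytes += bytes;
    }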

These changes have been developed based on code review comments in [2].

I would greatly appreciate suggestions for further improving the patches in
this series.

Once this patch series is reviewed, I intend to incorporate these patches
in the IAA compress batching series [1], to develop a v5 of that series.

The main focus of testing this specific series was to make sure there are no
performance regressions. The usemem test was run with 30 processes for three
folio configurations, with zstd and with deflate-iaa:

1) 4k folios
2) 16k/32k/64k folios
3) 2M folios


System setup for testing:
=========================
Testing of this patch series was done with mm-unstable as of 11-18-2024,
commit 5a7056135bb6, without and with this patch series.
Data was gathered on an Intel Sapphire Rapids server: dual-socket, 56 cores
per socket, 4 IAA devices per socket, 503 GiB RAM, and a 525G SSD partition
used as swap. Core frequency was fixed at 2500 MHz.

Other kernel configuration parameters:

    zswap compressor  : zstd, deflate-iaa
    zswap allocator   : zsmalloc
    vm.page-cluster   : 2
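
For reference, a minimal sketch of how these parameters are typically set at
runtime, shown for the zstd runs (the deflate-iaa runs would substitute
"deflate-iaa" for "zstd"):

    echo Y        > /sys/module/zswap/parameters/enabled
    echo zstd     > /sys/module/zswap/parameters/compressor
    echo zsmalloc > /sys/module/zswap/parameters/zpool
    echo 2        > /proc/sys/vm/page-cluster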

IAA "compression verification" is enabled and IAA is run in the sync mode.
1WQ is configured per IAA device, and handles both, compressions and
decompressions.


Regression testing (usemem30):
==============================
The vm-scalability "usemem" test was run in a cgroup whose memory.high
was fixed at 150G. There is no swap limit set for the cgroup. 30 usemem
processes were run, each allocating and writing 10G of memory, and sleeping
for 10 sec before exiting:

usemem --init-time -w -O -s 10 -n 30 10g
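
For context, a minimal sketch of the cgroup setup described above (cgroup v2
with the memory controller enabled is assumed; the cgroup name is illustrative):

    mkdir /sys/fs/cgroup/usemem-test
    echo 150G > /sys/fs/cgroup/usemem-test/memory.high
    echo $$   > /sys/fs/cgroup/usemem-test/cgroup.procs  # then run usemem as above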


 4k folios: zstd:
 ================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor               zstd                zstd          zstd         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   4,783,479           4,755,909     4,868,751     
 Avg throughput (KB/s)       159,449             158,530       162,291     
 elapsed time (sec)           127.46              129.70        125.70     
 sys time (sec)             3,088.65            3,143.92      3,071.41     
                                                                              
 -------------------------------------------------------------------------------
 memcg_high                  437,178             428,090       451,918        
 memcg_swap_fail                   0                   0             0        
 zswpout                  48,931,290          48,931,325    48,932,080       
 zswpin                          390                 398           380        
 pswpout                           0                   0             0        
 pswpin                            0                   0             0        
 thp_swpout                        0                   0             0        
 thp_swpout_fallback               0                   0             0        
 pgmajfault                    3,231               3,636         3,627        
 swap_ra                          93                  90            91        
 swap_ra_hit                      48                  43            48        
 -------------------------------------------------------------------------------


 4k folios: deflate-iaa:
 =======================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor        deflate-iaa         deflate-iaa   deflate-iaa         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   5,155,471           5,397,318     5,231,233     
 Avg throughput (KB/s)       171,849             179,910       174,374     
 elapsed time (sec)           108.35              104.93        107.81     
 sys time (sec)             2,400.32            2,293.43      2,395.95     
                                                                              
 -------------------------------------------------------------------------------
 memcg_high                  670,635             634,770       632,160        
 memcg_swap_fail                   0                   0             0        
 zswpout                  62,098,929          57,334,719    58,221,779       
 zswpin                          425                 402           392        
 pswpout                           0                   0             0        
 pswpin                            0                   0             0        
 thp_swpout                        0                   0             0        
 thp_swpout_fallback               0                   0             0        
 pgmajfault                    3,271               3,641         3,632        
 swap_ra                         103                 101            93        
 swap_ra_hit                      47                  48            45        
 -------------------------------------------------------------------------------


 16k/32k/64k folios: zstd:
 =========================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor               zstd                zstd          zstd         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   6,284,634           6,227,125     6,221,686
 Avg throughput (KB/s)       209,487             207,570       207,389
 elapsed time (sec)           107.64              110.57        109.96
 sys time (sec)             2,566.69            2,636.39      2,615.76
                                                                      
 -------------------------------------------------------------------------------
 memcg_high                  477,219             476,572       477,768
 memcg_swap_fail               1,040               1,089         1,088
 zswpout                  48,931,670          48,931,991    48,931,829
 zswpin                          384                 400           397
 pswpout                           0                   0             0
 pswpin                            0                   0             0
 thp_swpout                        0                   0             0
 thp_swpout_fallback               0                   0             0
 16kB-swpout_fallback              0                   0             0                                    
 32kB_swpout_fallback              0                   0             0
 64kB_swpout_fallback          1,040               1,089         1,088
 pgmajfault                    3,258               3,271         3,265
 swap_ra                          95                 106           101
 swap_ra_hit                      46                  56            54
 ZSWPOUT-16kB                      2                   3             5
 ZSWPOUT-32kB                      0                   1             1
 ZSWPOUT-64kB              3,057,203           3,057,162     3,057,147
 SWPOUT-16kB                       0                   0             0
 SWPOUT-32kB                       0                   0             0
 SWPOUT-64kB                       0                   0             0
 -------------------------------------------------------------------------------


 16k/32k/64k folios: deflate-iaa:
 ================================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor        deflate-iaa         deflate-iaa   deflate-iaa         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   7,149,906           7,268,900     7,126,761
 Avg throughput (KB/s)       238,330             242,296       237,558
 elapsed time (sec)            84.38               87.44         84.18
 sys time (sec)             1,844.32            1,847.65      1,741.97
                                                                      
 -------------------------------------------------------------------------------
 memcg_high                  616,897             704,278       585,911
 memcg_swap_fail               2,734               1,858         1,708
 zswpout                  55,520,017          60,188,111    52,639,745
 zswpin                          491                 393           444
 pswpout                           0                   0             0
 pswpin                            0                   0             0
 thp_swpout                        0                   0             0
 thp_swpout_fallback               0                   0             0
 16kB-swpout_fallback              0                   0             0                      
 32kB_swpout_fallback              0                   0             0
 64kB_swpout_fallback          2,734               1,858         1,708
 pgmajfault                    3,314               3,266         3,277
 swap_ra                         128                 103           154
 swap_ra_hit                      49                  46            90
 ZSWPOUT-16kB                      4                   4             3
 ZSWPOUT-32kB                      2                   1             1
 ZSWPOUT-64kB              3,467,400           3,759,882     3,288,260
 SWPOUT-16kB                       0                   0             0
 SWPOUT-32kB                       0                   0             0
 SWPOUT-64kB                       0                   0             0
 -------------------------------------------------------------------------------


 2M folios: zstd:
 ================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor               zstd                zstd          zstd         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   6,466,700           6,544,384     6,532,820 
 Avg throughput (KB/s)       215,556             218,146       217,760
 elapsed time (sec)           106.80              106.29        105.45
 sys time (sec)             2,420.88            2,462.67      2,380.86
                                                                      
 -------------------------------------------------------------------------------
 memcg_high                   60,926              58,746        62,680
 memcg_swap_fail                  44                  62            60
 zswpout                  48,892,828          48,936,840    48,934,265
 zswpin                          391                 406           391
 pswpout                           0                   0             0
 pswpin                            0                   0             0
 thp_swpout                        0                   0             0
 thp_swpout_fallback              44                  62            60
 pgmajfault                    4,907               4,793         5,461
 swap_ra                       5,070               4,693         6,605
 swap_ra_hit                   5,024               4,639         6,556
 ZSWPOUT-2048kB               95,442              95,509        95,506
 SWPOUT-2048kB                     0                   0             0
 -------------------------------------------------------------------------------


 2M folios: deflate-iaa:
 =======================

 -------------------------------------------------------------------------------
                       mm-unstable-11-18-2024   v1 of this patch-series
                                                 Patch 1       Patch 2 
 -------------------------------------------------------------------------------
 zswap compressor        deflate-iaa         deflate-iaa   deflate-iaa         
 vm.page-cluster                   2                   2             2
 -------------------------------------------------------------------------------
 Total throughput (KB/s)   7,245,936           7,589,698     7,470,639
 Avg throughput (KB/s)       241,531             252,989       249,021
 elapsed time (sec)            84.44               82.77         82.54
 sys time (sec)             1,753.41            1,681.53      1,674.63
                                                                      
 -------------------------------------------------------------------------------
 memcg_high                   79,259              85,642        84,382
 memcg_swap_fail                 139               1,429         2,163
 zswpout                  57,701,156          59,347,201    58,657,587
 zswpin                          419                 467           469
 pswpout                           0                   0             0
 pswpin                            0                   0             0
 thp_swpout                        0                   0             0
 thp_swpout_fallback             139               1,429         2,163
 pgmajfault                   11,542              19,689        28,301
 swap_ra                      24,613              47,622        73,288
 swap_ra_hit                  24,555              47,535        73,203
 ZSWPOUT-2048kB              112,515             114,659       112,860
 SWPOUT-2048kB                     0                   0             0
 -------------------------------------------------------------------------------


Summary:
========
There are no noticeable performance regressions with this patch series.


Changes in v1:
==============
1) Addressed code review comments from Yosry and Johannes in [2]. Thanks
   both!

 [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
 [2]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/


Thanks,
Kanchana



Kanchana P Sridhar (2):
  mm: zswap: Modified zswap_store_page() to process multiple pages in a
    folio.
  mm: zswap: zswap_store_pages() simplifications for batching.

 include/linux/zswap.h |   1 +
 mm/zswap.c            | 199 ++++++++++++++++++++++++++++--------------
 2 files changed, 135 insertions(+), 65 deletions(-)


base-commit: 5a7056135bb69da2ce0a42eb8c07968c1331777b
-- 
2.27.0




* [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-11-27 22:53 [PATCH v1 0/2] Vectorize and simplify zswap_store_page() Kanchana P Sridhar
@ 2024-11-27 22:53 ` Kanchana P Sridhar
  2024-12-02 19:33   ` Yosry Ahmed
  2024-12-03  3:23   ` Chengming Zhou
  2024-11-27 22:53 ` [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching Kanchana P Sridhar
  1 sibling, 2 replies; 18+ messages in thread
From: Kanchana P Sridhar @ 2024-11-27 22:53 UTC (permalink / raw)
  To: linux-kernel, linux-mm, hannes, yosryahmed, nphamcs,
	chengming.zhou, usamaarif642, ryan.roberts, 21cnbao, akpm
  Cc: wajdi.k.feghali, vinodh.gopal, kanchana.p.sridhar

Modified zswap_store() to store the folio in batches of
SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored zswap_store_page()
into zswap_store_pages() that processes a range of pages in the folio.
zswap_store_pages() is a vectorized version of zswap_store_page().

For now, zswap_store_pages() will sequentially compress these pages with
zswap_compress().

These changes are follow-up to code review comments received for [1], and
are intended to set up zswap_store() for batching with Intel IAA.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 include/linux/zswap.h |   1 +
 mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
 2 files changed, 88 insertions(+), 67 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index d961ead91bf1..05a81e750744 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -7,6 +7,7 @@
 
 struct lruvec;
 
+#define SWAP_CRYPTO_BATCH_SIZE 8UL
 extern atomic_long_t zswap_stored_pages;
 
 #ifdef CONFIG_ZSWAP
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb23..b09d1023e775 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
-static ssize_t zswap_store_page(struct page *page,
-				struct obj_cgroup *objcg,
-				struct zswap_pool *pool)
+/*
+ * Store multiple pages in @folio, starting from the page at index @si up to
+ * and including the page at index @ei.
+ */
+static ssize_t zswap_store_pages(struct folio *folio,
+				 long si,
+				 long ei,
+				 struct obj_cgroup *objcg,
+				 struct zswap_pool *pool)
 {
-	swp_entry_t page_swpentry = page_swap_entry(page);
+	struct page *page;
+	swp_entry_t page_swpentry;
 	struct zswap_entry *entry, *old;
+	size_t compressed_bytes = 0;
+	u8 nr_pages = ei - si + 1;
+	u8 i;
+
+	for (i = 0; i < nr_pages; ++i) {
+		page = folio_page(folio, si + i);
+		page_swpentry = page_swap_entry(page);
+
+		/* allocate entry */
+		entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
+		if (!entry) {
+			zswap_reject_kmemcache_fail++;
+			return -EINVAL;
+		}
 
-	/* allocate entry */
-	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
-	if (!entry) {
-		zswap_reject_kmemcache_fail++;
-		return -EINVAL;
-	}
-
-	if (!zswap_compress(page, entry, pool))
-		goto compress_failed;
+		if (!zswap_compress(page, entry, pool))
+			goto compress_failed;
 
-	old = xa_store(swap_zswap_tree(page_swpentry),
-		       swp_offset(page_swpentry),
-		       entry, GFP_KERNEL);
-	if (xa_is_err(old)) {
-		int err = xa_err(old);
+		old = xa_store(swap_zswap_tree(page_swpentry),
+			       swp_offset(page_swpentry),
+			       entry, GFP_KERNEL);
+		if (xa_is_err(old)) {
+			int err = xa_err(old);
 
-		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
-		zswap_reject_alloc_fail++;
-		goto store_failed;
-	}
+			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
+			zswap_reject_alloc_fail++;
+			goto store_failed;
+		}
 
-	/*
-	 * We may have had an existing entry that became stale when
-	 * the folio was redirtied and now the new version is being
-	 * swapped out. Get rid of the old.
-	 */
-	if (old)
-		zswap_entry_free(old);
+		/*
+		 * We may have had an existing entry that became stale when
+		 * the folio was redirtied and now the new version is being
+		 * swapped out. Get rid of the old.
+		 */
+		if (old)
+			zswap_entry_free(old);
 
-	/*
-	 * The entry is successfully compressed and stored in the tree, there is
-	 * no further possibility of failure. Grab refs to the pool and objcg.
-	 * These refs will be dropped by zswap_entry_free() when the entry is
-	 * removed from the tree.
-	 */
-	zswap_pool_get(pool);
-	if (objcg)
-		obj_cgroup_get(objcg);
+		/*
+		 * The entry is successfully compressed and stored in the tree, there is
+		 * no further possibility of failure. Grab refs to the pool and objcg.
+		 * These refs will be dropped by zswap_entry_free() when the entry is
+		 * removed from the tree.
+		 */
+		zswap_pool_get(pool);
+		if (objcg)
+			obj_cgroup_get(objcg);
 
-	/*
-	 * We finish initializing the entry while it's already in xarray.
-	 * This is safe because:
-	 *
-	 * 1. Concurrent stores and invalidations are excluded by folio lock.
-	 *
-	 * 2. Writeback is excluded by the entry not being on the LRU yet.
-	 *    The publishing order matters to prevent writeback from seeing
-	 *    an incoherent entry.
-	 */
-	entry->pool = pool;
-	entry->swpentry = page_swpentry;
-	entry->objcg = objcg;
-	entry->referenced = true;
-	if (entry->length) {
-		INIT_LIST_HEAD(&entry->lru);
-		zswap_lru_add(&zswap_list_lru, entry);
-	}
+		/*
+		 * We finish initializing the entry while it's already in xarray.
+		 * This is safe because:
+		 *
+		 * 1. Concurrent stores and invalidations are excluded by folio lock.
+		 *
+		 * 2. Writeback is excluded by the entry not being on the LRU yet.
+		 *    The publishing order matters to prevent writeback from seeing
+		 *    an incoherent entry.
+		 */
+		entry->pool = pool;
+		entry->swpentry = page_swpentry;
+		entry->objcg = objcg;
+		entry->referenced = true;
+		if (entry->length) {
+			INIT_LIST_HEAD(&entry->lru);
+			zswap_lru_add(&zswap_list_lru, entry);
+		}
 
-	return entry->length;
+		compressed_bytes += entry->length;
+		continue;
 
 store_failed:
-	zpool_free(pool->zpool, entry->handle);
+		zpool_free(pool->zpool, entry->handle);
 compress_failed:
-	zswap_entry_cache_free(entry);
-	return -EINVAL;
+		zswap_entry_cache_free(entry);
+		return -EINVAL;
+	}
+
+	return compressed_bytes;
 }
 
 bool zswap_store(struct folio *folio)
@@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
 	struct zswap_pool *pool;
 	size_t compressed_bytes = 0;
 	bool ret = false;
-	long index;
+	long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
@@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
 		mem_cgroup_put(memcg);
 	}
 
-	for (index = 0; index < nr_pages; ++index) {
-		struct page *page = folio_page(folio, index);
+	/* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
+	for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
+	     ((si < nr_pages) && (ei < nr_pages));
+	     si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
 		ssize_t bytes;
 
-		bytes = zswap_store_page(page, objcg, pool);
+		bytes = zswap_store_pages(folio, si, ei, objcg, pool);
 		if (bytes < 0)
 			goto put_pool;
 		compressed_bytes += bytes;
@@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
 		struct zswap_entry *entry;
 		struct xarray *tree;
 
-		for (index = 0; index < nr_pages; ++index) {
-			tree = swap_zswap_tree(swp_entry(type, offset + index));
-			entry = xa_erase(tree, offset + index);
+		for (si = 0; si < nr_pages; ++si) {
+			tree = swap_zswap_tree(swp_entry(type, offset + si));
+			entry = xa_erase(tree, offset + si);
 			if (entry)
 				zswap_entry_free(entry);
 		}
-- 
2.27.0




* [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-11-27 22:53 [PATCH v1 0/2] Vectorize and simplify zswap_store_page() Kanchana P Sridhar
  2024-11-27 22:53 ` [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio Kanchana P Sridhar
@ 2024-11-27 22:53 ` Kanchana P Sridhar
  2024-11-28  7:00   ` Chengming Zhou
  1 sibling, 1 reply; 18+ messages in thread
From: Kanchana P Sridhar @ 2024-11-27 22:53 UTC (permalink / raw)
  To: linux-kernel, linux-mm, hannes, yosryahmed, nphamcs,
	chengming.zhou, usamaarif642, ryan.roberts, 21cnbao, akpm
  Cc: wajdi.k.feghali, vinodh.gopal, kanchana.p.sridhar

In order to set up zswap_store_pages() to enable a clean batching
implementation in [1], this patch implements the following changes:

1) Addition of zswap_alloc_entries() which will allocate zswap entries for
   all pages in the specified range for the folio, upfront. If this fails,
   we return an error status to zswap_store().

2) Addition of zswap_compress_pages() that calls zswap_compress() for each
   page, and returns false if any zswap_compress() fails, so
   zswap_store_page() can cleanup resources allocated and return an error
   status to zswap_store().

3) A "store_pages_failed" label that is a catch-all for all failure points
   in zswap_store_pages(). This facilitates cleaner error handling within
   zswap_store_pages(), which will become important for IAA compress
   batching in [1].

[1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 71 insertions(+), 22 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index b09d1023e775..db80c66e2205 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
+static bool zswap_compress_pages(struct page *pages[],
+				 struct zswap_entry *entries[],
+				 u8 nr_pages,
+				 struct zswap_pool *pool)
+{
+	u8 i;
+
+	for (i = 0; i < nr_pages; ++i) {
+		if (!zswap_compress(pages[i], entries[i], pool))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Allocate @nr zswap entries for storing @nr pages in a folio.
+ * If any one of the entry allocation fails, delete all entries allocated
+ * thus far, and return false.
+ * If @nr entries are successfully allocated, set each entry's "handle"
+ * to "ERR_PTR(-EINVAL)" to denote that the handle has not yet been allocated.
+ */
+static bool zswap_alloc_entries(struct zswap_entry *entries[], int node_id, u8 nr)
+{
+	u8 i;
+
+	for (i = 0; i < nr; ++i) {
+		entries[i] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
+		if (!entries[i]) {
+			u8 j;
+
+			zswap_reject_kmemcache_fail++;
+			for (j = 0; j < i; ++j)
+				zswap_entry_cache_free(entries[j]);
+			return false;
+		}
+
+		entries[i]->handle = (unsigned long)ERR_PTR(-EINVAL);
+	}
+
+	return true;
+}
+
 /*
  * Store multiple pages in @folio, starting from the page at index @si up to
  * and including the page at index @ei.
+ * The error handling from all failure points is handled by the
+ * "store_pages_failed" label, based on the initial ERR_PTR(-EINVAL) value for
+ * the zswap_entry's handle set by zswap_alloc_entries(), and the fact that the
+ * entry's handle is subsequently modified only upon a successful zpool_malloc().
  */
 static ssize_t zswap_store_pages(struct folio *folio,
 				 long si,
@@ -1419,26 +1466,25 @@ static ssize_t zswap_store_pages(struct folio *folio,
 				 struct obj_cgroup *objcg,
 				 struct zswap_pool *pool)
 {
-	struct page *page;
-	swp_entry_t page_swpentry;
-	struct zswap_entry *entry, *old;
+	struct zswap_entry *entries[SWAP_CRYPTO_BATCH_SIZE], *old;
+	struct page *pages[SWAP_CRYPTO_BATCH_SIZE];
 	size_t compressed_bytes = 0;
 	u8 nr_pages = ei - si + 1;
 	u8 i;
 
-	for (i = 0; i < nr_pages; ++i) {
-		page = folio_page(folio, si + i);
-		page_swpentry = page_swap_entry(page);
+	/* allocate entries */
+	if (!zswap_alloc_entries(entries, folio_nid(folio), nr_pages))
+		return -EINVAL;
 
-		/* allocate entry */
-		entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
-		if (!entry) {
-			zswap_reject_kmemcache_fail++;
-			return -EINVAL;
-		}
+	for (i = 0; i < nr_pages; ++i)
+		pages[i] = folio_page(folio, si + i);
 
-		if (!zswap_compress(page, entry, pool))
-			goto compress_failed;
+	if (!zswap_compress_pages(pages, entries, nr_pages, pool))
+		goto store_pages_failed;
+
+	for (i = 0; i < nr_pages; ++i) {
+		swp_entry_t page_swpentry = page_swap_entry(pages[i]);
+		struct zswap_entry *entry = entries[i];
 
 		old = xa_store(swap_zswap_tree(page_swpentry),
 			       swp_offset(page_swpentry),
@@ -1448,7 +1494,7 @@ static ssize_t zswap_store_pages(struct folio *folio,
 
 			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
 			zswap_reject_alloc_fail++;
-			goto store_failed;
+			goto store_pages_failed;
 		}
 
 		/*
@@ -1489,16 +1535,19 @@ static ssize_t zswap_store_pages(struct folio *folio,
 		}
 
 		compressed_bytes += entry->length;
-		continue;
-
-store_failed:
-		zpool_free(pool->zpool, entry->handle);
-compress_failed:
-		zswap_entry_cache_free(entry);
-		return -EINVAL;
 	}
 
 	return compressed_bytes;
+
+store_pages_failed:
+	for (i = 0; i < nr_pages; ++i) {
+		if (!IS_ERR_VALUE(entries[i]->handle))
+			zpool_free(pool->zpool, entries[i]->handle);
+
+		zswap_entry_cache_free(entries[i]);
+	}
+
+	return -EINVAL;
 }
 
 bool zswap_store(struct folio *folio)
-- 
2.27.0




* Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-11-27 22:53 ` [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching Kanchana P Sridhar
@ 2024-11-28  7:00   ` Chengming Zhou
  2024-12-02 19:32     ` Yosry Ahmed
  2024-12-03  0:17     ` Nhat Pham
  0 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2024-11-28  7:00 UTC (permalink / raw)
  To: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, yosryahmed,
	nphamcs, usamaarif642, ryan.roberts, 21cnbao, akpm
  Cc: wajdi.k.feghali, vinodh.gopal

On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> In order to set up zswap_store_pages() to enable a clean batching
> implementation in [1], this patch implements the following changes:
> 
> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
>     all pages in the specified range for the folio, upfront. If this fails,
>     we return an error status to zswap_store().
> 
> 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
>     page, and returns false if any zswap_compress() fails, so
>     zswap_store_page() can cleanup resources allocated and return an error
>     status to zswap_store().
> 
> 3) A "store_pages_failed" label that is a catch-all for all failure points
>     in zswap_store_pages(). This facilitates cleaner error handling within
>     zswap_store_pages(), which will become important for IAA compress
>     batching in [1].
> 
> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> 
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 71 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index b09d1023e775..db80c66e2205 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
>   * main API
>   **********************************/
>   
> +static bool zswap_compress_pages(struct page *pages[],
> +				 struct zswap_entry *entries[],
> +				 u8 nr_pages,
> +				 struct zswap_pool *pool)
> +{
> +	u8 i;
> +
> +	for (i = 0; i < nr_pages; ++i) {
> +		if (!zswap_compress(pages[i], entries[i], pool))
> +			return false;
> +	}
> +
> +	return true;
> +}

How about introducing a `zswap_compress_folio()` interface which
can be used by `zswap_store()`?
```
zswap_store()
	nr_pages = folio_nr_pages(folio)

	entries = zswap_alloc_entries(nr_pages)

	ret = zswap_compress_folio(folio, entries, pool)

	// store entries into xarray and LRU list
```

And this version `zswap_compress_folio()` is very simple for now:
```
zswap_compress_folio()
	nr_pages = folio_nr_pages(folio)

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		if (!zswap_compress(page, &entries[index], pool))
			return false;
	}

	return true;
```
This can be easily extended to support your "batched" version.

Then the old `zswap_store_page()` could be removed.

The good point is simplicity, that we don't need to slice folio
into multiple batches, then repeat the common operations for each
batch, like preparing entries, storing into xarray and LRU list...

Thanks.



* Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-11-28  7:00   ` Chengming Zhou
@ 2024-12-02 19:32     ` Yosry Ahmed
  2024-12-03  1:01       ` Sridhar, Kanchana P
  2024-12-03  0:17     ` Nhat Pham
  1 sibling, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-12-02 19:32 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, nphamcs,
	usamaarif642, ryan.roberts, 21cnbao, akpm, wajdi.k.feghali,
	vinodh.gopal

On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
<chengming.zhou@linux.dev> wrote:
>
> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > In order to set up zswap_store_pages() to enable a clean batching
> > implementation in [1], this patch implements the following changes:
> >
> > 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> >     all pages in the specified range for the folio, upfront. If this fails,
> >     we return an error status to zswap_store().
> >
> > 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
> >     page, and returns false if any zswap_compress() fails, so
> >     zswap_store_page() can cleanup resources allocated and return an error
> >     status to zswap_store().
> >
> > 3) A "store_pages_failed" label that is a catch-all for all failure points
> >     in zswap_store_pages(). This facilitates cleaner error handling within
> >     zswap_store_pages(), which will become important for IAA compress
> >     batching in [1].
> >
> > [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
> >   1 file changed, 71 insertions(+), 22 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index b09d1023e775..db80c66e2205 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
> >   * main API
> >   **********************************/
> >
> > +static bool zswap_compress_pages(struct page *pages[],
> > +                              struct zswap_entry *entries[],
> > +                              u8 nr_pages,
> > +                              struct zswap_pool *pool)
> > +{
> > +     u8 i;
> > +
> > +     for (i = 0; i < nr_pages; ++i) {
> > +             if (!zswap_compress(pages[i], entries[i], pool))
> > +                     return false;
> > +     }
> > +
> > +     return true;
> > +}
>
> How about introducing a `zswap_compress_folio()` interface which
> can be used by `zswap_store()`?
> ```
> zswap_store()
>         nr_pages = folio_nr_pages(folio)
>
>         entries = zswap_alloc_entries(nr_pages)
>
>         ret = zswap_compress_folio(folio, entries, pool)
>
>         // store entries into xarray and LRU list
> ```
>
> And this version `zswap_compress_folio()` is very simple for now:
> ```
> zswap_compress_folio()
>         nr_pages = folio_nr_pages(folio)
>
>         for (index = 0; index < nr_pages; ++index) {
>                 struct page *page = folio_page(folio, index);
>
>                 if (!zswap_compress(page, &entries[index], pool))
>                         return false;
>         }
>
>         return true;
> ```
> This can be easily extended to support your "batched" version.
>
> Then the old `zswap_store_page()` could be removed.
>
> The good point is simplicity, that we don't need to slice folio
> into multiple batches, then repeat the common operations for each
> batch, like preparing entries, storing into xarray and LRU list...

+1

Also, I don't like the helpers hiding some of the loops and leaving
others. As Johannes said, please keep all the iteration over pages at
the same function level where possible to make the code clear.

This should not be a separate series either. When I said divide into
chunks, I meant leave out the multiple-folio batching and focus on
batching pages in a single large folio, not breaking down the series
into multiple ones. Not a big deal tho :)

>
> Thanks.



* Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-11-27 22:53 ` [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio Kanchana P Sridhar
@ 2024-12-02 19:33   ` Yosry Ahmed
  2024-12-03  1:13     ` Sridhar, Kanchana P
  2024-12-03  3:23   ` Chengming Zhou
  1 sibling, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-12-02 19:33 UTC (permalink / raw)
  To: Kanchana P Sridhar
  Cc: linux-kernel, linux-mm, hannes, nphamcs, chengming.zhou,
	usamaarif642, ryan.roberts, 21cnbao, akpm, wajdi.k.feghali,
	vinodh.gopal

On Wed, Nov 27, 2024 at 2:53 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> Modified zswap_store() to store the folio in batches of
> SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored zswap_store_page()
> into zswap_store_pages() that processes a range of pages in the folio.
> zswap_store_pages() is a vectorized version of zswap_store_page().
>
> For now, zswap_store_pages() will sequentially compress these pages with
> zswap_compress().
>
> These changes are follow-up to code review comments received for [1], and
> are intended to set up zswap_store() for batching with Intel IAA.
>
> [1]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
>
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>  include/linux/zswap.h |   1 +
>  mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
>  2 files changed, 88 insertions(+), 67 deletions(-)
>
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index d961ead91bf1..05a81e750744 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -7,6 +7,7 @@
>
>  struct lruvec;
>
> +#define SWAP_CRYPTO_BATCH_SIZE 8UL
>  extern atomic_long_t zswap_stored_pages;
>
>  #ifdef CONFIG_ZSWAP
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb23..b09d1023e775 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct *w)
>  * main API
>  **********************************/
>
> -static ssize_t zswap_store_page(struct page *page,
> -                               struct obj_cgroup *objcg,
> -                               struct zswap_pool *pool)
> +/*
> + * Store multiple pages in @folio, starting from the page at index @si up to
> + * and including the page at index @ei.
> + */
> +static ssize_t zswap_store_pages(struct folio *folio,
> +                                long si,
> +                                long ei,
> +                                struct obj_cgroup *objcg,
> +                                struct zswap_pool *pool)
>  {
> -       swp_entry_t page_swpentry = page_swap_entry(page);
> +       struct page *page;
> +       swp_entry_t page_swpentry;
>         struct zswap_entry *entry, *old;
> +       size_t compressed_bytes = 0;
> +       u8 nr_pages = ei - si + 1;
> +       u8 i;
> +
> +       for (i = 0; i < nr_pages; ++i) {
> +               page = folio_page(folio, si + i);
> +               page_swpentry = page_swap_entry(page);
> +
> +               /* allocate entry */
> +               entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> +               if (!entry) {
> +                       zswap_reject_kmemcache_fail++;
> +                       return -EINVAL;
> +               }

I think this patch is wrong on its own; for example, if an allocation
fails in the above loop, we exit without cleaning up previous
allocations. I think it's fixed in patch 2, but we cannot introduce
bugs in-between patches. I think the helpers in patch 2 don't really
help, as I mentioned. Please combine the changes and keep them in the
main series (unless you have a reason not to).

>
> -       /* allocate entry */
> -       entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> -       if (!entry) {
> -               zswap_reject_kmemcache_fail++;
> -               return -EINVAL;
> -       }
> -
> -       if (!zswap_compress(page, entry, pool))
> -               goto compress_failed;
> +               if (!zswap_compress(page, entry, pool))
> +                       goto compress_failed;
>
> -       old = xa_store(swap_zswap_tree(page_swpentry),
> -                      swp_offset(page_swpentry),
> -                      entry, GFP_KERNEL);
> -       if (xa_is_err(old)) {
> -               int err = xa_err(old);
> +               old = xa_store(swap_zswap_tree(page_swpentry),
> +                              swp_offset(page_swpentry),
> +                              entry, GFP_KERNEL);
> +               if (xa_is_err(old)) {
> +                       int err = xa_err(old);
>
> -               WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> -               zswap_reject_alloc_fail++;
> -               goto store_failed;
> -       }
> +                       WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +                       zswap_reject_alloc_fail++;
> +                       goto store_failed;
> +               }
>
> -       /*
> -        * We may have had an existing entry that became stale when
> -        * the folio was redirtied and now the new version is being
> -        * swapped out. Get rid of the old.
> -        */
> -       if (old)
> -               zswap_entry_free(old);
> +               /*
> +                * We may have had an existing entry that became stale when
> +                * the folio was redirtied and now the new version is being
> +                * swapped out. Get rid of the old.
> +                */
> +               if (old)
> +                       zswap_entry_free(old);
>
> -       /*
> -        * The entry is successfully compressed and stored in the tree, there is
> -        * no further possibility of failure. Grab refs to the pool and objcg.
> -        * These refs will be dropped by zswap_entry_free() when the entry is
> -        * removed from the tree.
> -        */
> -       zswap_pool_get(pool);
> -       if (objcg)
> -               obj_cgroup_get(objcg);
> +               /*
> +                * The entry is successfully compressed and stored in the tree, there is
> +                * no further possibility of failure. Grab refs to the pool and objcg.
> +                * These refs will be dropped by zswap_entry_free() when the entry is
> +                * removed from the tree.
> +                */
> +               zswap_pool_get(pool);
> +               if (objcg)
> +                       obj_cgroup_get(objcg);
>
> -       /*
> -        * We finish initializing the entry while it's already in xarray.
> -        * This is safe because:
> -        *
> -        * 1. Concurrent stores and invalidations are excluded by folio lock.
> -        *
> -        * 2. Writeback is excluded by the entry not being on the LRU yet.
> -        *    The publishing order matters to prevent writeback from seeing
> -        *    an incoherent entry.
> -        */
> -       entry->pool = pool;
> -       entry->swpentry = page_swpentry;
> -       entry->objcg = objcg;
> -       entry->referenced = true;
> -       if (entry->length) {
> -               INIT_LIST_HEAD(&entry->lru);
> -               zswap_lru_add(&zswap_list_lru, entry);
> -       }
> +               /*
> +                * We finish initializing the entry while it's already in xarray.
> +                * This is safe because:
> +                *
> +                * 1. Concurrent stores and invalidations are excluded by folio lock.
> +                *
> +                * 2. Writeback is excluded by the entry not being on the LRU yet.
> +                *    The publishing order matters to prevent writeback from seeing
> +                *    an incoherent entry.
> +                */
> +               entry->pool = pool;
> +               entry->swpentry = page_swpentry;
> +               entry->objcg = objcg;
> +               entry->referenced = true;
> +               if (entry->length) {
> +                       INIT_LIST_HEAD(&entry->lru);
> +                       zswap_lru_add(&zswap_list_lru, entry);
> +               }
>
> -       return entry->length;
> +               compressed_bytes += entry->length;
> +               continue;
>
>  store_failed:
> -       zpool_free(pool->zpool, entry->handle);
> +               zpool_free(pool->zpool, entry->handle);
>  compress_failed:
> -       zswap_entry_cache_free(entry);
> -       return -EINVAL;
> +               zswap_entry_cache_free(entry);
> +               return -EINVAL;
> +       }
> +
> +       return compressed_bytes;
>  }
>
>  bool zswap_store(struct folio *folio)
> @@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
>         struct zswap_pool *pool;
>         size_t compressed_bytes = 0;
>         bool ret = false;
> -       long index;
> +       long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
>                 mem_cgroup_put(memcg);
>         }
>
> -       for (index = 0; index < nr_pages; ++index) {
> -               struct page *page = folio_page(folio, index);
> +       /* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
> +       for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
> +            ((si < nr_pages) && (ei < nr_pages));
> +            si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
>                 ssize_t bytes;
>
> -               bytes = zswap_store_page(page, objcg, pool);
> +               bytes = zswap_store_pages(folio, si, ei, objcg, pool);
>                 if (bytes < 0)
>                         goto put_pool;
>                 compressed_bytes += bytes;
> @@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
>                 struct zswap_entry *entry;
>                 struct xarray *tree;
>
> -               for (index = 0; index < nr_pages; ++index) {
> -                       tree = swap_zswap_tree(swp_entry(type, offset + index));
> -                       entry = xa_erase(tree, offset + index);
> +               for (si = 0; si < nr_pages; ++si) {
> +                       tree = swap_zswap_tree(swp_entry(type, offset + si));
> +                       entry = xa_erase(tree, offset + si);
>                         if (entry)
>                                 zswap_entry_free(entry);
>                 }
> --
> 2.27.0
>



* Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-11-28  7:00   ` Chengming Zhou
  2024-12-02 19:32     ` Yosry Ahmed
@ 2024-12-03  0:17     ` Nhat Pham
  2024-12-03  1:15       ` Sridhar, Kanchana P
  1 sibling, 1 reply; 18+ messages in thread
From: Nhat Pham @ 2024-12-03  0:17 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, yosryahmed,
	usamaarif642, ryan.roberts, 21cnbao, akpm, wajdi.k.feghali,
	vinodh.gopal

On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
<chengming.zhou@linux.dev> wrote:
>
> How about introducing a `zswap_compress_folio()` interface which
> can be used by `zswap_store()`?
> ```
> zswap_store()
>         nr_pages = folio_nr_pages(folio)
>
>         entries = zswap_alloc_entries(nr_pages)
>
>         ret = zswap_compress_folio(folio, entries, pool)
>
>         // store entries into xarray and LRU list
> ```
>
> And this version `zswap_compress_folio()` is very simple for now:
> ```
> zswap_compress_folio()
>         nr_pages = folio_nr_pages(folio)
>
>         for (index = 0; index < nr_pages; ++index) {
>                 struct page *page = folio_page(folio, index);
>
>                 if (!zswap_compress(page, &entries[index], pool))
>                         return false;
>         }
>
>         return true;
> ```
> This can be easily extended to support your "batched" version.
>
> Then the old `zswap_store_page()` could be removed.
>
> The good point is simplicity, that we don't need to slice folio
> into multiple batches, then repeat the common operations for each
> batch, like preparing entries, storing into xarray and LRU list...
>

+1.



* RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-02 19:32     ` Yosry Ahmed
@ 2024-12-03  1:01       ` Sridhar, Kanchana P
  2024-12-03  3:06         ` Chengming Zhou
  0 siblings, 1 reply; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03  1:01 UTC (permalink / raw)
  To: Yosry Ahmed, Chengming Zhou
  Cc: linux-kernel, linux-mm, hannes, nphamcs, usamaarif642,
	ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K, Gopal, Vinodh,
	Sridhar, Kanchana P

Hi Chengming, Yosry,

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 11:33 AM
> To: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> <chengming.zhou@linux.dev> wrote:
> >
> > On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > > In order to set up zswap_store_pages() to enable a clean batching
> > > implementation in [1], this patch implements the following changes:
> > >
> > > 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> > >     all pages in the specified range for the folio, upfront. If this fails,
> > >     we return an error status to zswap_store().
> > >
> > > 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> each
> > >     page, and returns false if any zswap_compress() fails, so
> > >     zswap_store_page() can cleanup resources allocated and return an
> error
> > >     status to zswap_store().
> > >
> > > 3) A "store_pages_failed" label that is a catch-all for all failure points
> > >     in zswap_store_pages(). This facilitates cleaner error handling within
> > >     zswap_store_pages(), which will become important for IAA compress
> > >     batching in [1].
> > >
> > > [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> > >
> > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > ---
> > >   mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++----
> ---------
> > >   1 file changed, 71 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index b09d1023e775..db80c66e2205 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> *w)
> > >   * main API
> > >   **********************************/
> > >
> > > +static bool zswap_compress_pages(struct page *pages[],
> > > +                              struct zswap_entry *entries[],
> > > +                              u8 nr_pages,
> > > +                              struct zswap_pool *pool)
> > > +{
> > > +     u8 i;
> > > +
> > > +     for (i = 0; i < nr_pages; ++i) {
> > > +             if (!zswap_compress(pages[i], entries[i], pool))
> > > +                     return false;
> > > +     }
> > > +
> > > +     return true;
> > > +}
> >
> > How about introducing a `zswap_compress_folio()` interface which
> > can be used by `zswap_store()`?
> > ```
> > zswap_store()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         entries = zswap_alloc_entries(nr_pages)
> >
> >         ret = zswap_compress_folio(folio, entries, pool)
> >
> >         // store entries into xarray and LRU list
> > ```
> >
> > And this version `zswap_compress_folio()` is very simple for now:
> > ```
> > zswap_compress_folio()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         for (index = 0; index < nr_pages; ++index) {
> >                 struct page *page = folio_page(folio, index);
> >
> >                 if (!zswap_compress(page, &entries[index], pool))
> >                         return false;
> >         }
> >
> >         return true;
> > ```
> > This can be easily extended to support your "batched" version.
> >
> > Then the old `zswap_store_page()` could be removed.
> >
> > The good point is simplicity, that we don't need to slice folio
> > into multiple batches, then repeat the common operations for each
> > batch, like preparing entries, storing into xarray and LRU list...
> 
> +1

Thanks for the code review comments. One question though: would
it make sense to trade off the memory footprint cost against the code
simplification? For instance, let's say we want to store a 64k folio.
We would allocate memory for 16 zswap entries, and if one of the
compressions fails, we would deallocate memory for all 16 zswap
entries. Could this lead to zswap_entry kmem_cache starvation and
subsequent zswap_store() failures in multi-process scenarios?

In other words, allocating entries in smaller batches -- more specifically,
only the compress batch size -- seems to strike a balance in terms of
memory footprint, while mitigating the starvation aspect, and possibly
also helping latency (allocating, and potentially deallocating, a large
number of zswap entries could impact latency).

If we agree on the merits of processing a large folio in smaller batches,
this in turn requires that we store each smaller batch of entries in the
xarray/LRU before moving to the next batch. Which means all the
zswap_store() ops need to be done for a batch before moving on to the
next one.
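
To put rough numbers on this (assuming 4K base pages and the
SWAP_CRYPTO_BATCH_SIZE of 8 from patch 1), for a 2M folio, i.e. 512 pages:

  whole-folio allocation : 512 zswap_entry allocations up front; all 512
                           are freed if any one compression fails.
  per-batch allocation   : at most 8 entries outstanding at a time; a
                           failure frees at most 8.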

> 
> Also, I don't like the helpers hiding some of the loops and leaving
> others. As Johannes said, please keep all the iteration over pages at
> the same function level where possible to make the code clear.

Sure. I can either inline all the loops into zswap_store_pages(), or convert
all iterations into helpers with a consistent signature:

zswap_<proc_name>(arrayed_struct, nr_pages);

Please let me know which would work best. Thanks!

> 
> This should not be a separate series either. When I said divide into
> chunks, I meant leave out the multiple-folio batching and focus on
> batching pages in a single large folio, not breaking down the series
> into multiple ones. Not a big deal tho :)

I understand. I am trying to de-couple and develop in parallel the
following, which I intend to converge into a v5 of the original series [1]:
  a) Vectorization, followed by batching of zswap_store() of large folios.
  b) acomp request chaining suggestions from Herbert, which could
       change the existing v4 implementation of the
       crypto_acomp_batch_compress() API that zswap would need to
       call for IAA compress batching.

[1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935

Thanks,
Kanchana

> 
> >
> > Thanks.


* RE: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-12-02 19:33   ` Yosry Ahmed
@ 2024-12-03  1:13     ` Sridhar, Kanchana P
  2024-12-03  5:34       ` Yosry Ahmed
  0 siblings, 1 reply; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03  1:13 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kernel, linux-mm, hannes, nphamcs, chengming.zhou,
	usamaarif642, ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K,
	Gopal, Vinodh, Sridhar, Kanchana P


> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 11:34 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> process multiple pages in a folio.
> 
> On Wed, Nov 27, 2024 at 2:53 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Modified zswap_store() to store the folio in batches of
> > SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored
> zswap_store_page()
> > into zswap_store_pages() that processes a range of pages in the folio.
> > zswap_store_pages() is a vectorized version of zswap_store_page().
> >
> > For now, zswap_store_pages() will sequentially compress these pages with
> > zswap_compress().
> >
> > These changes are follow-up to code review comments received for [1], and
> > are intended to set up zswap_store() for batching with Intel IAA.
> >
> > [1]: https://patchwork.kernel.org/project/linux-
> mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >  include/linux/zswap.h |   1 +
> >  mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
> >  2 files changed, 88 insertions(+), 67 deletions(-)
> >
> > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > index d961ead91bf1..05a81e750744 100644
> > --- a/include/linux/zswap.h
> > +++ b/include/linux/zswap.h
> > @@ -7,6 +7,7 @@
> >
> >  struct lruvec;
> >
> > +#define SWAP_CRYPTO_BATCH_SIZE 8UL
> >  extern atomic_long_t zswap_stored_pages;
> >
> >  #ifdef CONFIG_ZSWAP
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb23..b09d1023e775 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct
> *w)
> >  * main API
> >  **********************************/
> >
> > -static ssize_t zswap_store_page(struct page *page,
> > -                               struct obj_cgroup *objcg,
> > -                               struct zswap_pool *pool)
> > +/*
> > + * Store multiple pages in @folio, starting from the page at index @si up to
> > + * and including the page at index @ei.
> > + */
> > +static ssize_t zswap_store_pages(struct folio *folio,
> > +                                long si,
> > +                                long ei,
> > +                                struct obj_cgroup *objcg,
> > +                                struct zswap_pool *pool)
> >  {
> > -       swp_entry_t page_swpentry = page_swap_entry(page);
> > +       struct page *page;
> > +       swp_entry_t page_swpentry;
> >         struct zswap_entry *entry, *old;
> > +       size_t compressed_bytes = 0;
> > +       u8 nr_pages = ei - si + 1;
> > +       u8 i;
> > +
> > +       for (i = 0; i < nr_pages; ++i) {
> > +               page = folio_page(folio, si + i);
> > +               page_swpentry = page_swap_entry(page);
> > +
> > +               /* allocate entry */
> > +               entry = zswap_entry_cache_alloc(GFP_KERNEL,
> page_to_nid(page));
> > +               if (!entry) {
> > +                       zswap_reject_kmemcache_fail++;
> > +                       return -EINVAL;
> > +               }
> 
> I think this patch is wrong on its own, for example if an allocation
> fails in the above loop we exit without cleaning up previous
> allocations. I think it's fixed in patch 2 but we cannot introduce

I think there might be a misunderstanding. zswap_store_pages() will
clean up the local resources allocated during an iteration of the for loop,
upon an error in that iteration. If you look at the "goto store_failed" and
"goto compress_failed" statements, this should explain what I mean. If an
allocation fails for an iteration, we simply return -EINVAL, and zswap_store()
will jump to the "put_pool" label with "ret" still false, which will delete
all zswap entries for the folio (those allocated up until the error iteration
in zswap_store_pages(), or potentially already in the xarray).

Hence, there is no bug and each of the two patches is correct by
itself AFAICT, but please let me know if I am missing anything. Thanks!

> bugs in-between patches. I think the helpers in patch 2 don't really
> help as I mentioned. Please combine the changes and keep them in the
> main series (unless you have a reason not to).

Sure. As noted in my earlier response to the comments received for patch 2,
I can either inline all the iterations or create helpers for all the
iterations over the pages in a batch. I would appreciate your suggestion on
which would be the better approach.

Thanks,
Kanchana

> 
> >
> > -       /* allocate entry */
> > -       entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> > -       if (!entry) {
> > -               zswap_reject_kmemcache_fail++;
> > -               return -EINVAL;
> > -       }
> > -
> > -       if (!zswap_compress(page, entry, pool))
> > -               goto compress_failed;
> > +               if (!zswap_compress(page, entry, pool))
> > +                       goto compress_failed;
> >
> > -       old = xa_store(swap_zswap_tree(page_swpentry),
> > -                      swp_offset(page_swpentry),
> > -                      entry, GFP_KERNEL);
> > -       if (xa_is_err(old)) {
> > -               int err = xa_err(old);
> > +               old = xa_store(swap_zswap_tree(page_swpentry),
> > +                              swp_offset(page_swpentry),
> > +                              entry, GFP_KERNEL);
> > +               if (xa_is_err(old)) {
> > +                       int err = xa_err(old);
> >
> > -               WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n",
> err);
> > -               zswap_reject_alloc_fail++;
> > -               goto store_failed;
> > -       }
> > +                       WARN_ONCE(err != -ENOMEM, "unexpected xarray error:
> %d\n", err);
> > +                       zswap_reject_alloc_fail++;
> > +                       goto store_failed;
> > +               }
> >
> > -       /*
> > -        * We may have had an existing entry that became stale when
> > -        * the folio was redirtied and now the new version is being
> > -        * swapped out. Get rid of the old.
> > -        */
> > -       if (old)
> > -               zswap_entry_free(old);
> > +               /*
> > +                * We may have had an existing entry that became stale when
> > +                * the folio was redirtied and now the new version is being
> > +                * swapped out. Get rid of the old.
> > +                */
> > +               if (old)
> > +                       zswap_entry_free(old);
> >
> > -       /*
> > -        * The entry is successfully compressed and stored in the tree, there is
> > -        * no further possibility of failure. Grab refs to the pool and objcg.
> > -        * These refs will be dropped by zswap_entry_free() when the entry is
> > -        * removed from the tree.
> > -        */
> > -       zswap_pool_get(pool);
> > -       if (objcg)
> > -               obj_cgroup_get(objcg);
> > +               /*
> > +                * The entry is successfully compressed and stored in the tree,
> there is
> > +                * no further possibility of failure. Grab refs to the pool and objcg.
> > +                * These refs will be dropped by zswap_entry_free() when the
> entry is
> > +                * removed from the tree.
> > +                */
> > +               zswap_pool_get(pool);
> > +               if (objcg)
> > +                       obj_cgroup_get(objcg);
> >
> > -       /*
> > -        * We finish initializing the entry while it's already in xarray.
> > -        * This is safe because:
> > -        *
> > -        * 1. Concurrent stores and invalidations are excluded by folio lock.
> > -        *
> > -        * 2. Writeback is excluded by the entry not being on the LRU yet.
> > -        *    The publishing order matters to prevent writeback from seeing
> > -        *    an incoherent entry.
> > -        */
> > -       entry->pool = pool;
> > -       entry->swpentry = page_swpentry;
> > -       entry->objcg = objcg;
> > -       entry->referenced = true;
> > -       if (entry->length) {
> > -               INIT_LIST_HEAD(&entry->lru);
> > -               zswap_lru_add(&zswap_list_lru, entry);
> > -       }
> > +               /*
> > +                * We finish initializing the entry while it's already in xarray.
> > +                * This is safe because:
> > +                *
> > +                * 1. Concurrent stores and invalidations are excluded by folio
> lock.
> > +                *
> > +                * 2. Writeback is excluded by the entry not being on the LRU yet.
> > +                *    The publishing order matters to prevent writeback from seeing
> > +                *    an incoherent entry.
> > +                */
> > +               entry->pool = pool;
> > +               entry->swpentry = page_swpentry;
> > +               entry->objcg = objcg;
> > +               entry->referenced = true;
> > +               if (entry->length) {
> > +                       INIT_LIST_HEAD(&entry->lru);
> > +                       zswap_lru_add(&zswap_list_lru, entry);
> > +               }
> >
> > -       return entry->length;
> > +               compressed_bytes += entry->length;
> > +               continue;
> >
> >  store_failed:
> > -       zpool_free(pool->zpool, entry->handle);
> > +               zpool_free(pool->zpool, entry->handle);
> >  compress_failed:
> > -       zswap_entry_cache_free(entry);
> > -       return -EINVAL;
> > +               zswap_entry_cache_free(entry);
> > +               return -EINVAL;
> > +       }
> > +
> > +       return compressed_bytes;
> >  }
> >
> >  bool zswap_store(struct folio *folio)
> > @@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
> >         struct zswap_pool *pool;
> >         size_t compressed_bytes = 0;
> >         bool ret = false;
> > -       long index;
> > +       long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
> >
> >         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> >         VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > @@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
> >                 mem_cgroup_put(memcg);
> >         }
> >
> > -       for (index = 0; index < nr_pages; ++index) {
> > -               struct page *page = folio_page(folio, index);
> > +       /* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
> > +       for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
> > +            ((si < nr_pages) && (ei < nr_pages));
> > +            si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
> >                 ssize_t bytes;
> >
> > -               bytes = zswap_store_page(page, objcg, pool);
> > +               bytes = zswap_store_pages(folio, si, ei, objcg, pool);
> >                 if (bytes < 0)
> >                         goto put_pool;
> >                 compressed_bytes += bytes;
> > @@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
> >                 struct zswap_entry *entry;
> >                 struct xarray *tree;
> >
> > -               for (index = 0; index < nr_pages; ++index) {
> > -                       tree = swap_zswap_tree(swp_entry(type, offset + index));
> > -                       entry = xa_erase(tree, offset + index);
> > +               for (si = 0; si < nr_pages; ++si) {
> > +                       tree = swap_zswap_tree(swp_entry(type, offset + si));
> > +                       entry = xa_erase(tree, offset + si);
> >                         if (entry)
> >                                 zswap_entry_free(entry);
> >                 }
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-03  0:17     ` Nhat Pham
@ 2024-12-03  1:15       ` Sridhar, Kanchana P
  0 siblings, 0 replies; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03  1:15 UTC (permalink / raw)
  To: Nhat Pham, Chengming Zhou
  Cc: linux-kernel, linux-mm, hannes, yosryahmed, usamaarif642,
	ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K, Gopal, Vinodh,
	Sridhar, Kanchana P


> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Monday, December 2, 2024 4:17 PM
> To: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> yosryahmed@google.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> <chengming.zhou@linux.dev> wrote:
> >
> > How about introducing a `zswap_compress_folio()` interface which
> > can be used by `zswap_store()`?
> > ```
> > zswap_store()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         entries = zswap_alloc_entries(nr_pages)
> >
> >         ret = zswap_compress_folio(folio, entries, pool)
> >
> >         // store entries into xarray and LRU list
> > ```
> >
> > And this version `zswap_compress_folio()` is very simple for now:
> > ```
> > zswap_compress_folio()
> >         nr_pages = folio_nr_pages(folio)
> >
> >         for (index = 0; index < nr_pages; ++index) {
> >                 struct page *page = folio_page(folio, index);
> >
> >                 if (!zswap_compress(page, &entries[index], pool))
> >                         return false;
> >         }
> >
> >         return true;
> > ```
> > This can be easily extended to support your "batched" version.
> >
> > Then the old `zswap_store_page()` could be removed.
> >
> > The good point is simplicity, that we don't need to slice folio
> > into multiple batches, then repeat the common operations for each
> > batch, like preparing entries, storing into xarray and LRU list...
> >
> 
> +1.

Thanks Nhat. I raised some potential considerations in my reply
to Chengming's and Yosry's comments. I would appreciate it if you could
add follow-up suggestions to that reply.

Thanks,
Kanchana

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-03  1:01       ` Sridhar, Kanchana P
@ 2024-12-03  3:06         ` Chengming Zhou
  2024-12-03  4:18           ` Sridhar, Kanchana P
  0 siblings, 1 reply; 18+ messages in thread
From: Chengming Zhou @ 2024-12-03  3:06 UTC (permalink / raw)
  To: Sridhar, Kanchana P, Yosry Ahmed
  Cc: linux-kernel, linux-mm, hannes, nphamcs, usamaarif642,
	ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K, Gopal, Vinodh

On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> Hi Chengming, Yosry,
> 
>> -----Original Message-----
>> From: Yosry Ahmed <yosryahmed@google.com>
>> Sent: Monday, December 2, 2024 11:33 AM
>> To: Chengming Zhou <chengming.zhou@linux.dev>
>> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
>> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
>> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
>> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
>> for batching.
>>
>> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
>> <chengming.zhou@linux.dev> wrote:
>>>
>>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
>>>> In order to set up zswap_store_pages() to enable a clean batching
>>>> implementation in [1], this patch implements the following changes:
>>>>
>>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
>>>>      all pages in the specified range for the folio, upfront. If this fails,
>>>>      we return an error status to zswap_store().
>>>>
>>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
>> each
>>>>      page, and returns false if any zswap_compress() fails, so
>>>>      zswap_store_page() can cleanup resources allocated and return an
>> error
>>>>      status to zswap_store().
>>>>
>>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
>>>>      in zswap_store_pages(). This facilitates cleaner error handling within
>>>>      zswap_store_pages(), which will become important for IAA compress
>>>>      batching in [1].
>>>>
>>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
>>>>
>>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
>>>> ---
>>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++----
>> ---------
>>>>    1 file changed, 71 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/mm/zswap.c b/mm/zswap.c
>>>> index b09d1023e775..db80c66e2205 100644
>>>> --- a/mm/zswap.c
>>>> +++ b/mm/zswap.c
>>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
>> *w)
>>>>    * main API
>>>>    **********************************/
>>>>
>>>> +static bool zswap_compress_pages(struct page *pages[],
>>>> +                              struct zswap_entry *entries[],
>>>> +                              u8 nr_pages,
>>>> +                              struct zswap_pool *pool)
>>>> +{
>>>> +     u8 i;
>>>> +
>>>> +     for (i = 0; i < nr_pages; ++i) {
>>>> +             if (!zswap_compress(pages[i], entries[i], pool))
>>>> +                     return false;
>>>> +     }
>>>> +
>>>> +     return true;
>>>> +}
>>>
>>> How about introducing a `zswap_compress_folio()` interface which
>>> can be used by `zswap_store()`?
>>> ```
>>> zswap_store()
>>>          nr_pages = folio_nr_pages(folio)
>>>
>>>          entries = zswap_alloc_entries(nr_pages)
>>>
>>>          ret = zswap_compress_folio(folio, entries, pool)
>>>
>>>          // store entries into xarray and LRU list
>>> ```
>>>
>>> And this version `zswap_compress_folio()` is very simple for now:
>>> ```
>>> zswap_compress_folio()
>>>          nr_pages = folio_nr_pages(folio)
>>>
>>>          for (index = 0; index < nr_pages; ++index) {
>>>                  struct page *page = folio_page(folio, index);
>>>
>>>                  if (!zswap_compress(page, &entries[index], pool))
>>>                          return false;
>>>          }
>>>
>>>          return true;
>>> ```
>>> This can be easily extended to support your "batched" version.
>>>
>>> Then the old `zswap_store_page()` could be removed.
>>>
>>> The good point is simplicity, that we don't need to slice folio
>>> into multiple batches, then repeat the common operations for each
>>> batch, like preparing entries, storing into xarray and LRU list...
>>
>> +1
> 
> Thanks for the code review comments. One question though: would
> it make sense to trade-off the memory footprint cost with the code
> simplification? For instance, lets say we want to store a 64k folio.
> We would allocate memory for 16 zswap entries, and lets say one of
> the compressions fails, we would deallocate memory for all 16 zswap
> entries. Could this lead to zswap_entry kmem_cache starvation and
> subsequent zswap_store() failures in multiple processes scenarios?

Ah, I get your concern. But it's the unlikely case, right?

If the case you mentioned above happens a lot, then yes, I think we should
optimize the memory footprint to avoid the allocation and deallocation.

On the other hand, we should consider that a folio will be compressed
successfully in most cases, so we have to allocate all the entries
eventually.

Based on your consideration, I think your way is ok too, although
I think patch 2/2 should be dropped, since it hides the page loops
in smaller functions, as Yosry mentioned too.

> 
> In other words, allocating entries in smaller batches -- more specifically,
> only the compress batchsize -- seems to strike a balance in terms of
> memory footprint, while mitigating the starvation aspect, and possibly
> also helping latency (allocating a large # of zswap entries and potentially
> deallocating, could impact latency).

If we consider the likely case (compression succeeds), the overall
latency should be better, right? Since we can bulk-allocate all the
entries at first, and bulk-insert into the xarray and LRU at the end.

> 
> If we agree with the merits of processing a large folio in smaller batches:
> this in turn requires we store the smaller batches of entries in the
> xarray/LRU before moving to the next batch. Which means all the
> zswap_store() ops need to be done for a batch before moving to the next
> batch.
> 

Either way is ok for me, based on your memory footprint consideration
above.

Thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-11-27 22:53 ` [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio Kanchana P Sridhar
  2024-12-02 19:33   ` Yosry Ahmed
@ 2024-12-03  3:23   ` Chengming Zhou
  2024-12-03  4:37     ` Sridhar, Kanchana P
  1 sibling, 1 reply; 18+ messages in thread
From: Chengming Zhou @ 2024-12-03  3:23 UTC (permalink / raw)
  To: Kanchana P Sridhar, linux-kernel, linux-mm, hannes, yosryahmed,
	nphamcs, usamaarif642, ryan.roberts, 21cnbao, akpm
  Cc: wajdi.k.feghali, vinodh.gopal

On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> Modified zswap_store() to store the folio in batches of
> SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored zswap_store_page()
> into zswap_store_pages() that processes a range of pages in the folio.
> zswap_store_pages() is a vectorized version of zswap_store_page().

Maybe I missed something, but this refactoring looks confusing to me.

Why can't zswap_store_pages() just reuse the existing zswap_store_page()?
```
zswap_store_pages()
	compressed_bytes = 0
	for each page in this batch
		size = zswap_store_page(page)
		if (size < 0)
			return size;
		compressed_bytes += size;
	return compressed_bytes;
```

Thanks.

> 
> For now, zswap_store_pages() will sequentially compress these pages with
> zswap_compress().
> 
> These changes are follow-up to code review comments received for [1], and
> are intended to set up zswap_store() for batching with Intel IAA.
> 
> [1]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
> 
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>   include/linux/zswap.h |   1 +
>   mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
>   2 files changed, 88 insertions(+), 67 deletions(-)
> 
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index d961ead91bf1..05a81e750744 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -7,6 +7,7 @@
>   
>   struct lruvec;
>   
> +#define SWAP_CRYPTO_BATCH_SIZE 8UL
>   extern atomic_long_t zswap_stored_pages;
>   
>   #ifdef CONFIG_ZSWAP
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb23..b09d1023e775 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct *w)
>   * main API
>   **********************************/
>   
> -static ssize_t zswap_store_page(struct page *page,
> -				struct obj_cgroup *objcg,
> -				struct zswap_pool *pool)
> +/*
> + * Store multiple pages in @folio, starting from the page at index @si up to
> + * and including the page at index @ei.
> + */
> +static ssize_t zswap_store_pages(struct folio *folio,
> +				 long si,
> +				 long ei,
> +				 struct obj_cgroup *objcg,
> +				 struct zswap_pool *pool)
>   {
> -	swp_entry_t page_swpentry = page_swap_entry(page);
> +	struct page *page;
> +	swp_entry_t page_swpentry;
>   	struct zswap_entry *entry, *old;
> +	size_t compressed_bytes = 0;
> +	u8 nr_pages = ei - si + 1;
> +	u8 i;
> +
> +	for (i = 0; i < nr_pages; ++i) {
> +		page = folio_page(folio, si + i);
> +		page_swpentry = page_swap_entry(page);
> +
> +		/* allocate entry */
> +		entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> +		if (!entry) {
> +			zswap_reject_kmemcache_fail++;
> +			return -EINVAL;
> +		}
>   
> -	/* allocate entry */
> -	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> -	if (!entry) {
> -		zswap_reject_kmemcache_fail++;
> -		return -EINVAL;
> -	}
> -
> -	if (!zswap_compress(page, entry, pool))
> -		goto compress_failed;
> +		if (!zswap_compress(page, entry, pool))
> +			goto compress_failed;
>   
> -	old = xa_store(swap_zswap_tree(page_swpentry),
> -		       swp_offset(page_swpentry),
> -		       entry, GFP_KERNEL);
> -	if (xa_is_err(old)) {
> -		int err = xa_err(old);
> +		old = xa_store(swap_zswap_tree(page_swpentry),
> +			       swp_offset(page_swpentry),
> +			       entry, GFP_KERNEL);
> +		if (xa_is_err(old)) {
> +			int err = xa_err(old);
>   
> -		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> -		zswap_reject_alloc_fail++;
> -		goto store_failed;
> -	}
> +			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +			zswap_reject_alloc_fail++;
> +			goto store_failed;
> +		}
>   
> -	/*
> -	 * We may have had an existing entry that became stale when
> -	 * the folio was redirtied and now the new version is being
> -	 * swapped out. Get rid of the old.
> -	 */
> -	if (old)
> -		zswap_entry_free(old);
> +		/*
> +		 * We may have had an existing entry that became stale when
> +		 * the folio was redirtied and now the new version is being
> +		 * swapped out. Get rid of the old.
> +		 */
> +		if (old)
> +			zswap_entry_free(old);
>   
> -	/*
> -	 * The entry is successfully compressed and stored in the tree, there is
> -	 * no further possibility of failure. Grab refs to the pool and objcg.
> -	 * These refs will be dropped by zswap_entry_free() when the entry is
> -	 * removed from the tree.
> -	 */
> -	zswap_pool_get(pool);
> -	if (objcg)
> -		obj_cgroup_get(objcg);
> +		/*
> +		 * The entry is successfully compressed and stored in the tree, there is
> +		 * no further possibility of failure. Grab refs to the pool and objcg.
> +		 * These refs will be dropped by zswap_entry_free() when the entry is
> +		 * removed from the tree.
> +		 */
> +		zswap_pool_get(pool);
> +		if (objcg)
> +			obj_cgroup_get(objcg);
>   
> -	/*
> -	 * We finish initializing the entry while it's already in xarray.
> -	 * This is safe because:
> -	 *
> -	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> -	 *
> -	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> -	 *    The publishing order matters to prevent writeback from seeing
> -	 *    an incoherent entry.
> -	 */
> -	entry->pool = pool;
> -	entry->swpentry = page_swpentry;
> -	entry->objcg = objcg;
> -	entry->referenced = true;
> -	if (entry->length) {
> -		INIT_LIST_HEAD(&entry->lru);
> -		zswap_lru_add(&zswap_list_lru, entry);
> -	}
> +		/*
> +		 * We finish initializing the entry while it's already in xarray.
> +		 * This is safe because:
> +		 *
> +		 * 1. Concurrent stores and invalidations are excluded by folio lock.
> +		 *
> +		 * 2. Writeback is excluded by the entry not being on the LRU yet.
> +		 *    The publishing order matters to prevent writeback from seeing
> +		 *    an incoherent entry.
> +		 */
> +		entry->pool = pool;
> +		entry->swpentry = page_swpentry;
> +		entry->objcg = objcg;
> +		entry->referenced = true;
> +		if (entry->length) {
> +			INIT_LIST_HEAD(&entry->lru);
> +			zswap_lru_add(&zswap_list_lru, entry);
> +		}
>   
> -	return entry->length;
> +		compressed_bytes += entry->length;
> +		continue;
>   
>   store_failed:
> -	zpool_free(pool->zpool, entry->handle);
> +		zpool_free(pool->zpool, entry->handle);
>   compress_failed:
> -	zswap_entry_cache_free(entry);
> -	return -EINVAL;
> +		zswap_entry_cache_free(entry);
> +		return -EINVAL;
> +	}
> +
> +	return compressed_bytes;
>   }
>   
>   bool zswap_store(struct folio *folio)
> @@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
>   	struct zswap_pool *pool;
>   	size_t compressed_bytes = 0;
>   	bool ret = false;
> -	long index;
> +	long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
>   
>   	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>   	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> @@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
>   		mem_cgroup_put(memcg);
>   	}
>   
> -	for (index = 0; index < nr_pages; ++index) {
> -		struct page *page = folio_page(folio, index);
> +	/* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
> +	for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
> +	     ((si < nr_pages) && (ei < nr_pages));
> +	     si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
>   		ssize_t bytes;
>   
> -		bytes = zswap_store_page(page, objcg, pool);
> +		bytes = zswap_store_pages(folio, si, ei, objcg, pool);
>   		if (bytes < 0)
>   			goto put_pool;
>   		compressed_bytes += bytes;
> @@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
>   		struct zswap_entry *entry;
>   		struct xarray *tree;
>   
> -		for (index = 0; index < nr_pages; ++index) {
> -			tree = swap_zswap_tree(swp_entry(type, offset + index));
> -			entry = xa_erase(tree, offset + index);
> +		for (si = 0; si < nr_pages; ++si) {
> +			tree = swap_zswap_tree(swp_entry(type, offset + si));
> +			entry = xa_erase(tree, offset + si);
>   			if (entry)
>   				zswap_entry_free(entry);
>   		}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-03  3:06         ` Chengming Zhou
@ 2024-12-03  4:18           ` Sridhar, Kanchana P
  2024-12-03  5:49             ` Yosry Ahmed
  0 siblings, 1 reply; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03  4:18 UTC (permalink / raw)
  To: Chengming Zhou, Yosry Ahmed
  Cc: linux-kernel, linux-mm, hannes, nphamcs, usamaarif642,
	ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K, Gopal, Vinodh,
	Sridhar, Kanchana P


> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Monday, December 2, 2024 7:06 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> <yosryahmed@google.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > Hi Chengming, Yosry,
> >
> >> -----Original Message-----
> >> From: Yosry Ahmed <yosryahmed@google.com>
> >> Sent: Monday, December 2, 2024 11:33 AM
> >> To: Chengming Zhou <chengming.zhou@linux.dev>
> >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> >> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> simplifications
> >> for batching.
> >>
> >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> >> <chengming.zhou@linux.dev> wrote:
> >>>
> >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> >>>> In order to set up zswap_store_pages() to enable a clean batching
> >>>> implementation in [1], this patch implements the following changes:
> >>>>
> >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> >>>>      all pages in the specified range for the folio, upfront. If this fails,
> >>>>      we return an error status to zswap_store().
> >>>>
> >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> >> each
> >>>>      page, and returns false if any zswap_compress() fails, so
> >>>>      zswap_store_page() can cleanup resources allocated and return an
> >> error
> >>>>      status to zswap_store().
> >>>>
> >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> >>>>      in zswap_store_pages(). This facilitates cleaner error handling within
> >>>>      zswap_store_pages(), which will become important for IAA compress
> >>>>      batching in [1].
> >>>>
> >>>> [1]: https://patchwork.kernel.org/project/linux-
> mm/list/?series=911935
> >>>>
> >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> >>>> ---
> >>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-
> ---
> >> ---------
> >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> >>>>
> >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> >>>> index b09d1023e775..db80c66e2205 100644
> >>>> --- a/mm/zswap.c
> >>>> +++ b/mm/zswap.c
> >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> >> *w)
> >>>>    * main API
> >>>>    **********************************/
> >>>>
> >>>> +static bool zswap_compress_pages(struct page *pages[],
> >>>> +                              struct zswap_entry *entries[],
> >>>> +                              u8 nr_pages,
> >>>> +                              struct zswap_pool *pool)
> >>>> +{
> >>>> +     u8 i;
> >>>> +
> >>>> +     for (i = 0; i < nr_pages; ++i) {
> >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> >>>> +                     return false;
> >>>> +     }
> >>>> +
> >>>> +     return true;
> >>>> +}
> >>>
> >>> How about introducing a `zswap_compress_folio()` interface which
> >>> can be used by `zswap_store()`?
> >>> ```
> >>> zswap_store()
> >>>          nr_pages = folio_nr_pages(folio)
> >>>
> >>>          entries = zswap_alloc_entries(nr_pages)
> >>>
> >>>          ret = zswap_compress_folio(folio, entries, pool)
> >>>
> >>>          // store entries into xarray and LRU list
> >>> ```
> >>>
> >>> And this version `zswap_compress_folio()` is very simple for now:
> >>> ```
> >>> zswap_compress_folio()
> >>>          nr_pages = folio_nr_pages(folio)
> >>>
> >>>          for (index = 0; index < nr_pages; ++index) {
> >>>                  struct page *page = folio_page(folio, index);
> >>>
> >>>                  if (!zswap_compress(page, &entries[index], pool))
> >>>                          return false;
> >>>          }
> >>>
> >>>          return true;
> >>> ```
> >>> This can be easily extended to support your "batched" version.
> >>>
> >>> Then the old `zswap_store_page()` could be removed.
> >>>
> >>> The good point is simplicity, that we don't need to slice folio
> >>> into multiple batches, then repeat the common operations for each
> >>> batch, like preparing entries, storing into xarray and LRU list...
> >>
> >> +1
> >
> > Thanks for the code review comments. One question though: would
> > it make sense to trade-off the memory footprint cost with the code
> > simplification? For instance, lets say we want to store a 64k folio.
> > We would allocate memory for 16 zswap entries, and lets say one of
> > the compressions fails, we would deallocate memory for all 16 zswap
> > entries. Could this lead to zswap_entry kmem_cache starvation and
> > subsequent zswap_store() failures in multiple processes scenarios?
> 
> Ah, I get your consideration. But it's the unlikely case, right?
> 
> If the case you mentioned above happens a lot, I think yes, we should
> optimize its memory footprint to avoid allocation and deallocation.

Thanks Chengming. I see your point. Let me gather performance data
for the two options, and share.

> 
> On the other hand, we should consider a folio would be compressed
> successfully in most cases. So we have to allocate all entries
> eventually.
> 
> Based on your consideration, I think your way is ok too, although
> I think the patch 2/2 should be dropped, since it hides pages loop
> in smaller functions, as Yosry mentioned too.

My main intent with patch 2/2 was to make the error handling
path common and simpler, regardless of whether errors are encountered
during compression, zpool_malloc() or the xarray store. Hence, I initialize
each allocated zswap_entry's handle in zswap_alloc_entries() to
ERR_PTR(-EINVAL), so it is easy for the common error handling code in
patch 2 to determine whether the handle was allocated (and hence needs
to be freed). This benefits the batching code by eliminating the need to
track which stage of zswap_store_pages() saw an error, and hence which
resources need to be deleted.
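
To make the idea concrete, here is a minimal sketch (simplified and
hypothetical, not the exact patch 2/2 code; the xarray/LRU publishing and
its cleanup via zswap_store() are omitted):
```
/*
 * Simplified sketch, not the exact patch code: each entry's handle starts
 * out as an error value, so a single catch-all label can tell whether
 * zpool memory must be freed for a given entry.
 */
static ssize_t zswap_store_pages_sketch(struct page *pages[],
					struct zswap_entry *entries[],
					u8 nr_pages, struct zswap_pool *pool)
{
	size_t compressed_bytes = 0;
	u8 i, nr_allocated = 0;

	for (i = 0; i < nr_pages; ++i) {
		entries[i] = zswap_entry_cache_alloc(GFP_KERNEL,
						     page_to_nid(pages[i]));
		if (!entries[i]) {
			zswap_reject_kmemcache_fail++;
			goto store_pages_failed;
		}
		/* Mark "no zpool allocation yet" for the cleanup path. */
		entries[i]->handle = (unsigned long)ERR_PTR(-EINVAL);
		nr_allocated++;
	}

	for (i = 0; i < nr_pages; ++i) {
		if (!zswap_compress(pages[i], entries[i], pool))
			goto store_pages_failed;
		compressed_bytes += entries[i]->length;
	}

	return compressed_bytes;

store_pages_failed:
	for (i = 0; i < nr_allocated; ++i) {
		/* Only successfully compressed entries hold zpool memory. */
		if (!IS_ERR_VALUE(entries[i]->handle))
			zpool_free(pool->zpool, entries[i]->handle);
		zswap_entry_cache_free(entries[i]);
	}
	return -EINVAL;
}
```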

My key consideration is to keep the batching error handling code simple,
hence these changes in patch 2. The changes described above would
help batching, and should not impact the non-batching case, as indicated
by the regression testing data I've included in the cover letter.

I don't mind inlining the implementation of the helper functions, as I
mentioned in my response to Yosry. I am hoping the error handling
simplifications are acceptable, since they will help the batching code.

> 
> >
> > In other words, allocating entries in smaller batches -- more specifically,
> > only the compress batchsize -- seems to strike a balance in terms of
> > memory footprint, while mitigating the starvation aspect, and possibly
> > also helping latency (allocating a large # of zswap entries and potentially
> > deallocating, could impact latency).
> 
> If we consider the likely case (compress successfully), the whole
> latency should be better, right? Since we can bulk allocate all
> entries at first, and bulk insert to xarray and LRU at last.

I think so too, but I would like to confirm with some experiments and report back.

> 
> >
> > If we agree with the merits of processing a large folio in smaller batches:
> > this in turn requires we store the smaller batches of entries in the
> > xarray/LRU before moving to the next batch. Which means all the
> > zswap_store() ops need to be done for a batch before moving to the next
> > batch.
> >
> 
> Both way is ok for me based on your memory footprint consideration
> above.

Sounds good, thanks!

Thanks,
Kanchana

> 
> Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-12-03  3:23   ` Chengming Zhou
@ 2024-12-03  4:37     ` Sridhar, Kanchana P
  0 siblings, 0 replies; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03  4:37 UTC (permalink / raw)
  To: Chengming Zhou, linux-kernel, linux-mm, hannes, yosryahmed,
	nphamcs, usamaarif642, ryan.roberts, 21cnbao, akpm, Sridhar,
	Kanchana P
  Cc: Feghali, Wajdi K, Gopal, Vinodh


> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Monday, December 2, 2024 7:24 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> yosryahmed@google.com; nphamcs@gmail.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org
> Cc: Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> process multiple pages in a folio.
> 
> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > Modified zswap_store() to store the folio in batches of
> > SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored
> zswap_store_page()
> > into zswap_store_pages() that processes a range of pages in the folio.
> > zswap_store_pages() is a vectorized version of zswap_store_page().
> 
> Maybe I missed something, but this refactor change looks confused to me.
> 
> Why not zswap_store_pages() just reuse the existing zswap_store_page()?
> ```
> zswap_store_pages()
> 	compressed_bytes = 0
> 	for each page in this batch
> 		size = zswap_store_page(page)
> 		if (size < 0)
> 			return size;
> 		compressed_bytes += size;
> 	return compressed_bytes;
> ```

zswap_store_pages() is just a vectorized version of zswap_store_page() that
operates on (a chunk of) the pages of a large folio. If the compressor supports
batching, this allows us to batch-compress the chunk of pages. For example, IAA
can potentially compress the chunk of pages in parallel. If the compressor
does not support batching, each page in the "batch" (IOW the "chunk of pages"
in the folio) will be compressed by calling zswap_compress() on each page, as
is done in patch 2/2 currently.

If we agree with the zswap_store() / zswap_store_pages() structure introduced
in this patch-series, it allows me to add batching in a really seamless
manner: the change would be contained to a code branch that either calls
zswap_batch_compress() on all the pages in the batch, or zswap_compress()
on each page.
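
As a rough illustration (not the exact code in [1]; "can_batch" and the
zswap_batch_compress() signature below are placeholders for the IAA
batching support):
```
/*
 * Illustrative sketch only: "can_batch" and zswap_batch_compress() stand
 * in for the IAA compress batching in [1]; the per-page fallback is the
 * existing zswap_compress() path.
 */
if (pool->can_batch) {
	/* Hardware (e.g. IAA) compresses the whole batch in parallel. */
	if (!zswap_batch_compress(pages, entries, nr_pages, pool))
		goto store_pages_failed;
} else {
	for (i = 0; i < nr_pages; ++i) {
		if (!zswap_compress(pages[i], entries[i], pool))
			goto store_pages_failed;
	}
}
```
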
Yosry had also suggested this in an earlier comment to [1]:

"For example, most of zswap_store() should remain the same. It is still
getting a folio to compress, the only difference is that we will
parallelize the page compressions. zswap_store_page() is where some
changes need to be made. Instead of a single function that handles the
storage of each page, we need a vectorized function that handles the
storage of N pages in a folio (allocate zswap_entry's, do xarray
insertions, etc). This should be refactoring in a separate patch."

Besides these immediate benefits, I believe processing the pages of
a large folio in chunks allows us to explore batch-processing of other
ops, for instance, adding the zswap entries to the xarray, to further
reduce zswap_store() latency for large folios.

I hope this explains things a little better.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/

Thanks,
Kanchana

> 
> Thanks.
> 
> >
> > For now, zswap_store_pages() will sequentially compress these pages with
> > zswap_compress().
> >
> > These changes are follow-up to code review comments received for [1], and
> > are intended to set up zswap_store() for batching with Intel IAA.
> >
> > [1]: https://patchwork.kernel.org/project/linux-
> mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >   include/linux/zswap.h |   1 +
> >   mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
> >   2 files changed, 88 insertions(+), 67 deletions(-)
> >
> > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > index d961ead91bf1..05a81e750744 100644
> > --- a/include/linux/zswap.h
> > +++ b/include/linux/zswap.h
> > @@ -7,6 +7,7 @@
> >
> >   struct lruvec;
> >
> > +#define SWAP_CRYPTO_BATCH_SIZE 8UL
> >   extern atomic_long_t zswap_stored_pages;
> >
> >   #ifdef CONFIG_ZSWAP
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb23..b09d1023e775 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct
> *w)
> >   * main API
> >   **********************************/
> >
> > -static ssize_t zswap_store_page(struct page *page,
> > -				struct obj_cgroup *objcg,
> > -				struct zswap_pool *pool)
> > +/*
> > + * Store multiple pages in @folio, starting from the page at index @si up to
> > + * and including the page at index @ei.
> > + */
> > +static ssize_t zswap_store_pages(struct folio *folio,
> > +				 long si,
> > +				 long ei,
> > +				 struct obj_cgroup *objcg,
> > +				 struct zswap_pool *pool)
> >   {
> > -	swp_entry_t page_swpentry = page_swap_entry(page);
> > +	struct page *page;
> > +	swp_entry_t page_swpentry;
> >   	struct zswap_entry *entry, *old;
> > +	size_t compressed_bytes = 0;
> > +	u8 nr_pages = ei - si + 1;
> > +	u8 i;
> > +
> > +	for (i = 0; i < nr_pages; ++i) {
> > +		page = folio_page(folio, si + i);
> > +		page_swpentry = page_swap_entry(page);
> > +
> > +		/* allocate entry */
> > +		entry = zswap_entry_cache_alloc(GFP_KERNEL,
> page_to_nid(page));
> > +		if (!entry) {
> > +			zswap_reject_kmemcache_fail++;
> > +			return -EINVAL;
> > +		}
> >
> > -	/* allocate entry */
> > -	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> > -	if (!entry) {
> > -		zswap_reject_kmemcache_fail++;
> > -		return -EINVAL;
> > -	}
> > -
> > -	if (!zswap_compress(page, entry, pool))
> > -		goto compress_failed;
> > +		if (!zswap_compress(page, entry, pool))
> > +			goto compress_failed;
> >
> > -	old = xa_store(swap_zswap_tree(page_swpentry),
> > -		       swp_offset(page_swpentry),
> > -		       entry, GFP_KERNEL);
> > -	if (xa_is_err(old)) {
> > -		int err = xa_err(old);
> > +		old = xa_store(swap_zswap_tree(page_swpentry),
> > +			       swp_offset(page_swpentry),
> > +			       entry, GFP_KERNEL);
> > +		if (xa_is_err(old)) {
> > +			int err = xa_err(old);
> >
> > -		WARN_ONCE(err != -ENOMEM, "unexpected xarray error:
> %d\n", err);
> > -		zswap_reject_alloc_fail++;
> > -		goto store_failed;
> > -	}
> > +			WARN_ONCE(err != -ENOMEM, "unexpected xarray
> error: %d\n", err);
> > +			zswap_reject_alloc_fail++;
> > +			goto store_failed;
> > +		}
> >
> > -	/*
> > -	 * We may have had an existing entry that became stale when
> > -	 * the folio was redirtied and now the new version is being
> > -	 * swapped out. Get rid of the old.
> > -	 */
> > -	if (old)
> > -		zswap_entry_free(old);
> > +		/*
> > +		 * We may have had an existing entry that became stale when
> > +		 * the folio was redirtied and now the new version is being
> > +		 * swapped out. Get rid of the old.
> > +		 */
> > +		if (old)
> > +			zswap_entry_free(old);
> >
> > -	/*
> > -	 * The entry is successfully compressed and stored in the tree, there is
> > -	 * no further possibility of failure. Grab refs to the pool and objcg.
> > -	 * These refs will be dropped by zswap_entry_free() when the entry is
> > -	 * removed from the tree.
> > -	 */
> > -	zswap_pool_get(pool);
> > -	if (objcg)
> > -		obj_cgroup_get(objcg);
> > +		/*
> > +		 * The entry is successfully compressed and stored in the tree,
> there is
> > +		 * no further possibility of failure. Grab refs to the pool and
> objcg.
> > +		 * These refs will be dropped by zswap_entry_free() when the
> entry is
> > +		 * removed from the tree.
> > +		 */
> > +		zswap_pool_get(pool);
> > +		if (objcg)
> > +			obj_cgroup_get(objcg);
> >
> > -	/*
> > -	 * We finish initializing the entry while it's already in xarray.
> > -	 * This is safe because:
> > -	 *
> > -	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > -	 *
> > -	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > -	 *    The publishing order matters to prevent writeback from seeing
> > -	 *    an incoherent entry.
> > -	 */
> > -	entry->pool = pool;
> > -	entry->swpentry = page_swpentry;
> > -	entry->objcg = objcg;
> > -	entry->referenced = true;
> > -	if (entry->length) {
> > -		INIT_LIST_HEAD(&entry->lru);
> > -		zswap_lru_add(&zswap_list_lru, entry);
> > -	}
> > +		/*
> > +		 * We finish initializing the entry while it's already in xarray.
> > +		 * This is safe because:
> > +		 *
> > +		 * 1. Concurrent stores and invalidations are excluded by folio
> lock.
> > +		 *
> > +		 * 2. Writeback is excluded by the entry not being on the LRU
> yet.
> > +		 *    The publishing order matters to prevent writeback from
> seeing
> > +		 *    an incoherent entry.
> > +		 */
> > +		entry->pool = pool;
> > +		entry->swpentry = page_swpentry;
> > +		entry->objcg = objcg;
> > +		entry->referenced = true;
> > +		if (entry->length) {
> > +			INIT_LIST_HEAD(&entry->lru);
> > +			zswap_lru_add(&zswap_list_lru, entry);
> > +		}
> >
> > -	return entry->length;
> > +		compressed_bytes += entry->length;
> > +		continue;
> >
> >   store_failed:
> > -	zpool_free(pool->zpool, entry->handle);
> > +		zpool_free(pool->zpool, entry->handle);
> >   compress_failed:
> > -	zswap_entry_cache_free(entry);
> > -	return -EINVAL;
> > +		zswap_entry_cache_free(entry);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return compressed_bytes;
> >   }
> >
> >   bool zswap_store(struct folio *folio)
> > @@ -1492,7 +1510,7 @@ bool zswap_store(struct folio *folio)
> >   	struct zswap_pool *pool;
> >   	size_t compressed_bytes = 0;
> >   	bool ret = false;
> > -	long index;
> > +	long si, ei, incr = SWAP_CRYPTO_BATCH_SIZE;
> >
> >   	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> >   	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > @@ -1526,11 +1544,13 @@ bool zswap_store(struct folio *folio)
> >   		mem_cgroup_put(memcg);
> >   	}
> >
> > -	for (index = 0; index < nr_pages; ++index) {
> > -		struct page *page = folio_page(folio, index);
> > +	/* Store the folio in batches of SWAP_CRYPTO_BATCH_SIZE pages. */
> > +	for (si = 0, ei = min(si + incr - 1, nr_pages - 1);
> > +	     ((si < nr_pages) && (ei < nr_pages));
> > +	     si = ei + 1, ei = min(si + incr - 1, nr_pages - 1)) {
> >   		ssize_t bytes;
> >
> > -		bytes = zswap_store_page(page, objcg, pool);
> > +		bytes = zswap_store_pages(folio, si, ei, objcg, pool);
> >   		if (bytes < 0)
> >   			goto put_pool;
> >   		compressed_bytes += bytes;
> > @@ -1565,9 +1585,9 @@ bool zswap_store(struct folio *folio)
> >   		struct zswap_entry *entry;
> >   		struct xarray *tree;
> >
> > -		for (index = 0; index < nr_pages; ++index) {
> > -			tree = swap_zswap_tree(swp_entry(type, offset +
> index));
> > -			entry = xa_erase(tree, offset + index);
> > +		for (si = 0; si < nr_pages; ++si) {
> > +			tree = swap_zswap_tree(swp_entry(type, offset +
> si));
> > +			entry = xa_erase(tree, offset + si);
> >   			if (entry)
> >   				zswap_entry_free(entry);
> >   		}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-12-03  1:13     ` Sridhar, Kanchana P
@ 2024-12-03  5:34       ` Yosry Ahmed
  2024-12-03 21:25         ` Sridhar, Kanchana P
  0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-12-03  5:34 UTC (permalink / raw)
  To: Sridhar, Kanchana P
  Cc: linux-kernel, linux-mm, hannes, nphamcs, chengming.zhou,
	usamaarif642, ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K,
	Gopal, Vinodh

On Mon, Dec 2, 2024 at 5:13 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Monday, December 2, 2024 11:34 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> > akpm@linux-foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> > Gopal, Vinodh <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> > process multiple pages in a folio.
> >
> > On Wed, Nov 27, 2024 at 2:53 PM Kanchana P Sridhar
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > > Modified zswap_store() to store the folio in batches of
> > > SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored
> > zswap_store_page()
> > > into zswap_store_pages() that processes a range of pages in the folio.
> > > zswap_store_pages() is a vectorized version of zswap_store_page().
> > >
> > > For now, zswap_store_pages() will sequentially compress these pages with
> > > zswap_compress().
> > >
> > > These changes are follow-up to code review comments received for [1], and
> > > are intended to set up zswap_store() for batching with Intel IAA.
> > >
> > > [1]: https://patchwork.kernel.org/project/linux-
> > mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
> > >
> > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > ---
> > >  include/linux/zswap.h |   1 +
> > >  mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
> > >  2 files changed, 88 insertions(+), 67 deletions(-)
> > >
> > > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > > index d961ead91bf1..05a81e750744 100644
> > > --- a/include/linux/zswap.h
> > > +++ b/include/linux/zswap.h
> > > @@ -7,6 +7,7 @@
> > >
> > >  struct lruvec;
> > >
> > > +#define SWAP_CRYPTO_BATCH_SIZE 8UL
> > >  extern atomic_long_t zswap_stored_pages;
> > >
> > >  #ifdef CONFIG_ZSWAP
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index f6316b66fb23..b09d1023e775 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct
> > *w)
> > >  * main API
> > >  **********************************/
> > >
> > > -static ssize_t zswap_store_page(struct page *page,
> > > -                               struct obj_cgroup *objcg,
> > > -                               struct zswap_pool *pool)
> > > +/*
> > > + * Store multiple pages in @folio, starting from the page at index @si up to
> > > + * and including the page at index @ei.
> > > + */
> > > +static ssize_t zswap_store_pages(struct folio *folio,
> > > +                                long si,
> > > +                                long ei,
> > > +                                struct obj_cgroup *objcg,
> > > +                                struct zswap_pool *pool)
> > >  {
> > > -       swp_entry_t page_swpentry = page_swap_entry(page);
> > > +       struct page *page;
> > > +       swp_entry_t page_swpentry;
> > >         struct zswap_entry *entry, *old;
> > > +       size_t compressed_bytes = 0;
> > > +       u8 nr_pages = ei - si + 1;
> > > +       u8 i;
> > > +
> > > +       for (i = 0; i < nr_pages; ++i) {
> > > +               page = folio_page(folio, si + i);
> > > +               page_swpentry = page_swap_entry(page);
> > > +
> > > +               /* allocate entry */
> > > +               entry = zswap_entry_cache_alloc(GFP_KERNEL,
> > page_to_nid(page));
> > > +               if (!entry) {
> > > +                       zswap_reject_kmemcache_fail++;
> > > +                       return -EINVAL;
> > > +               }
> >
> > I think this patch is wrong on its own, for example if an allocation
> > fails in the above loop we exit without cleaning up previous
> > allocations. I think it's fixed in patch 2 but we cannot introduce
>
> I think there might be a misunderstanding. zswap_store_pages() will
> clean up local resources allocated during an iteration of the for loop,
> upon an error in that iteration. If you see the "goto store_failed" and
> "goto compress_failed" this would explain what I mean. If an allocation
> fails for an iteration, we simply return -EINVAL, and zswap_store()
> will goto the "put_pool" label with "ret" still false, which will delete
> all zswap entries for the folio (allocated up until the error iteration in
> zswap_store_pages(); or potentially already in the xarray).
>
> Hence, there is no bug and each of the two patches are correct by
> themselves AFAICT, but please let me know if I am missing anything. Thanks!

Uh yes, the cleanup is in zswap_store().

>
> > bugs in-between patches. I think the helpers in patch 2 don't really
> > help as I mentioned. Please combine the changes and keep them in the
> > main series (unless you have a reason not to).
>
> Sure. As noted in my earlier response to comments received for patch 2,
> I can either inline all iterations or create helpers for all iterations over
> the pages in a batch. Appreciate your suggestions on which would be a
> better approach.

I think leaving them open-coded will be clearer for now. We can
revisit the code organization later if it gets out of hand.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-03  4:18           ` Sridhar, Kanchana P
@ 2024-12-03  5:49             ` Yosry Ahmed
  2024-12-03 21:28               ` Sridhar, Kanchana P
  0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-12-03  5:49 UTC (permalink / raw)
  To: Sridhar, Kanchana P
  Cc: Chengming Zhou, linux-kernel, linux-mm, hannes, nphamcs,
	usamaarif642, ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K,
	Gopal, Vinodh

On Mon, Dec 2, 2024 at 8:19 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Chengming Zhou <chengming.zhou@linux.dev>
> > Sent: Monday, December 2, 2024 7:06 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> > <yosryahmed@google.com>
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> > for batching.
> >
> > On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > > Hi Chengming, Yosry,
> > >
> > >> -----Original Message-----
> > >> From: Yosry Ahmed <yosryahmed@google.com>
> > >> Sent: Monday, December 2, 2024 11:33 AM
> > >> To: Chengming Zhou <chengming.zhou@linux.dev>
> > >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > >> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > simplifications
> > >> for batching.
> > >>
> > >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> > >> <chengming.zhou@linux.dev> wrote:
> > >>>
> > >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > >>>> In order to set up zswap_store_pages() to enable a clean batching
> > >>>> implementation in [1], this patch implements the following changes:
> > >>>>
> > >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> > >>>>      all pages in the specified range for the folio, upfront. If this fails,
> > >>>>      we return an error status to zswap_store().
> > >>>>
> > >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for
> > >> each
> > >>>>      page, and returns false if any zswap_compress() fails, so
> > >>>>      zswap_store_page() can cleanup resources allocated and return an
> > >> error
> > >>>>      status to zswap_store().
> > >>>>
> > >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> > >>>>      in zswap_store_pages(). This facilitates cleaner error handling within
> > >>>>      zswap_store_pages(), which will become important for IAA compress
> > >>>>      batching in [1].
> > >>>>
> > > >>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> > >>>>
> > >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > >>>> ---
> > > >>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
> > >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> > >>>>
> > >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> > >>>> index b09d1023e775..db80c66e2205 100644
> > >>>> --- a/mm/zswap.c
> > >>>> +++ b/mm/zswap.c
> > >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct
> > >> *w)
> > >>>>    * main API
> > >>>>    **********************************/
> > >>>>
> > >>>> +static bool zswap_compress_pages(struct page *pages[],
> > >>>> +                              struct zswap_entry *entries[],
> > >>>> +                              u8 nr_pages,
> > >>>> +                              struct zswap_pool *pool)
> > >>>> +{
> > >>>> +     u8 i;
> > >>>> +
> > >>>> +     for (i = 0; i < nr_pages; ++i) {
> > >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> > >>>> +                     return false;
> > >>>> +     }
> > >>>> +
> > >>>> +     return true;
> > >>>> +}
> > >>>
> > >>> How about introducing a `zswap_compress_folio()` interface which
> > >>> can be used by `zswap_store()`?
> > >>> ```
> > >>> zswap_store()
> > >>>          nr_pages = folio_nr_pages(folio)
> > >>>
> > >>>          entries = zswap_alloc_entries(nr_pages)
> > >>>
> > >>>          ret = zswap_compress_folio(folio, entries, pool)
> > >>>
> > >>>          // store entries into xarray and LRU list
> > >>> ```
> > >>>
> > >>> And this version `zswap_compress_folio()` is very simple for now:
> > >>> ```
> > >>> zswap_compress_folio()
> > >>>          nr_pages = folio_nr_pages(folio)
> > >>>
> > >>>          for (index = 0; index < nr_pages; ++index) {
> > >>>                  struct page *page = folio_page(folio, index);
> > >>>
> > >>>                  if (!zswap_compress(page, &entries[index], pool))
> > >>>                          return false;
> > >>>          }
> > >>>
> > >>>          return true;
> > >>> ```
> > >>> This can be easily extended to support your "batched" version.
> > >>>
> > >>> Then the old `zswap_store_page()` could be removed.
> > >>>
> > >>> The good point is simplicity, that we don't need to slice folio
> > >>> into multiple batches, then repeat the common operations for each
> > >>> batch, like preparing entries, storing into xarray and LRU list...
> > >>
> > >> +1
> > >
> > > Thanks for the code review comments. One question though: would
> > > it make sense to trade off the memory footprint cost against the code
> > > simplification? For instance, let's say we want to store a 64k folio.
> > > We would allocate memory for 16 zswap entries, and if one of the
> > > compressions fails, we would deallocate memory for all 16 zswap
> > > entries. Could this lead to zswap_entry kmem_cache starvation and
> > > subsequent zswap_store() failures in multi-process scenarios?
> >
> > Ah, I see your concern. But that's the unlikely case, right?
> >
> > If the case you mentioned above happens a lot, I think yes, we should
> > optimize its memory footprint to avoid allocation and deallocation.
>
> Thanks Chengming. I see your point. Let me gather performance data
> for the two options, and share.

Yeah I think we shouldn't optimize for the uncommon case, not until
there's a real problem that needs fixing.

>
> >
> > On the other hand, we should consider that a folio will be compressed
> > successfully in most cases. So we have to allocate all entries
> > eventually.
> >
> > Based on your consideration, I think your way is OK too, although
> > I think patch 2/2 should be dropped, since it hides the loop over
> > pages in smaller functions, as Yosry mentioned too.
>
> My main intent with patch 2/2 was to set up the error handling
> path to be common and simpler, whether errors were encountered
> during compression/zpool_malloc/xarray store. Hence, I initialize the
> allocated zswap_entry's handle in zswap_alloc_entries() to ERR_PTR(-EINVAL),
> so it is easy for the common error handling code in patch 2 to determine
> if the handle was allocated (and hence needs to be freed). This benefits
> the batching code by eliminating the need to maintain state as to which
> stage of zswap_store_pages() sees an error, based on which resources
> would need to be deleted.
>
> My key consideration is to keep the batching error handling code simple,
> hence these changes in patch 2. The changes described above would
> help batching, and should not impact the non-batching case, as indicated
> by the regression testing data I've included in the cover letter.
>
> I don't mind inlining the implementation of the helper functions, as I
> mentioned in my response to Yosry. I am hoping the error handling
> simplifications are acceptable, since they will help the batching code.

I think having the loops open-coded should still be better than
separate helpers. But I understand not wanting to have the loops
directly in zswap_store(), as the error handling would be simpler if
we do it in a separate function like zswap_store_pages().

How about we just move the loop from zswap_store() to
zswap_store_page() and call it zswap_store_folio()? When batching is
added I imagine we may need to split the loop into two loops before
and after zswap_compress_folio(), which isn't very neat but is
probably fine.


^ permalink raw reply	[flat|nested] 18+ messages in thread
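
To make the suggestion above concrete, here is a standalone schematic of the
proposed organization: one upfront allocation loop, a single
zswap_compress_folio()-style helper, and a second loop for publishing, with a
catch-all label that unwinds on any failure. The names mirror the proposal,
but every body is a userspace stand-in, not the eventual kernel patch.

```c
/*
 * Standalone schematic (not the actual kernel patch) of the suggested
 * zswap_store_folio() shape: allocate all entries up front, compress the
 * whole folio through one helper, then publish; one catch-all label frees
 * whatever was created if any step fails.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

struct entry { void *handle; };

/* stand-in for the suggested zswap_compress_folio() */
static bool compress_folio(struct entry *entries[], int nr_pages)
{
	for (int i = 0; i < nr_pages; i++) {
		entries[i]->handle = malloc(64);   /* stand-in for zswap_compress() */
		if (!entries[i]->handle)
			return false;
	}
	return true;
}

/* stand-in for the suggested zswap_store_folio() */
static bool store_folio(int nr_pages)
{
	struct entry *entries[64] = { NULL };
	int i;

	if (nr_pages > 64)
		return false;

	/* loop 1 (before compression): allocate all entries up front */
	for (i = 0; i < nr_pages; i++) {
		entries[i] = calloc(1, sizeof(*entries[i]));
		if (!entries[i])
			goto store_folio_failed;
	}

	if (!compress_folio(entries, nr_pages))
		goto store_folio_failed;

	/*
	 * loop 2 (after compression): this is where entries would be
	 * published into the xarray and LRU; the toy just releases them.
	 */
	for (i = 0; i < nr_pages; i++) {
		free(entries[i]->handle);
		free(entries[i]);
	}
	return true;

store_folio_failed:
	for (i = 0; i < nr_pages && entries[i]; i++) {
		free(entries[i]->handle);   /* calloc left this NULL if unset */
		free(entries[i]);
	}
	return false;
}

int main(void)
{
	return store_folio(16) ? 0 : 1;
}
```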

* RE: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio.
  2024-12-03  5:34       ` Yosry Ahmed
@ 2024-12-03 21:25         ` Sridhar, Kanchana P
  0 siblings, 0 replies; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03 21:25 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: linux-kernel, linux-mm, hannes, nphamcs, chengming.zhou,
	usamaarif642, ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K,
	Gopal, Vinodh, Sridhar, Kanchana P


> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 9:34 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> process multiple pages in a folio.
> 
> On Mon, Dec 2, 2024 at 5:13 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Monday, December 2, 2024 11:34 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com;
> chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com; ryan.roberts@arm.com; 21cnbao@gmail.com;
> > > akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>;
> > > Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to
> > > process multiple pages in a folio.
> > >
> > > On Wed, Nov 27, 2024 at 2:53 PM Kanchana P Sridhar
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > > Modified zswap_store() to store the folio in batches of
> > > > SWAP_CRYPTO_BATCH_SIZE pages. Accordingly, refactored
> > > zswap_store_page()
> > > > into zswap_store_pages() that processes a range of pages in the folio.
> > > > zswap_store_pages() is a vectorized version of zswap_store_page().
> > > >
> > > > For now, zswap_store_pages() will sequentially compress these pages
> > > > with zswap_compress().
> > > >
> > > > These changes are follow-up to code review comments received for [1], and
> > > > are intended to set up zswap_store() for batching with Intel IAA.
> > > >
> > > > [1]: https://patchwork.kernel.org/project/linux-mm/patch/20241123070127.332773-11-kanchana.p.sridhar@intel.com/
> > > >
> > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > > ---
> > > >  include/linux/zswap.h |   1 +
> > > >  mm/zswap.c            | 154 ++++++++++++++++++++++++------------------
> > > >  2 files changed, 88 insertions(+), 67 deletions(-)
> > > >
> > > > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > > > index d961ead91bf1..05a81e750744 100644
> > > > --- a/include/linux/zswap.h
> > > > +++ b/include/linux/zswap.h
> > > > @@ -7,6 +7,7 @@
> > > >
> > > >  struct lruvec;
> > > >
> > > > +#define SWAP_CRYPTO_BATCH_SIZE 8UL
> > > >  extern atomic_long_t zswap_stored_pages;
> > > >
> > > >  #ifdef CONFIG_ZSWAP
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index f6316b66fb23..b09d1023e775 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1409,78 +1409,96 @@ static void shrink_worker(struct work_struct *w)
> > > >  * main API
> > > >  **********************************/
> > > >
> > > > -static ssize_t zswap_store_page(struct page *page,
> > > > -                               struct obj_cgroup *objcg,
> > > > -                               struct zswap_pool *pool)
> > > > +/*
> > > > + * Store multiple pages in @folio, starting from the page at index @si up to
> > > > + * and including the page at index @ei.
> > > > + */
> > > > +static ssize_t zswap_store_pages(struct folio *folio,
> > > > +                                long si,
> > > > +                                long ei,
> > > > +                                struct obj_cgroup *objcg,
> > > > +                                struct zswap_pool *pool)
> > > >  {
> > > > -       swp_entry_t page_swpentry = page_swap_entry(page);
> > > > +       struct page *page;
> > > > +       swp_entry_t page_swpentry;
> > > >         struct zswap_entry *entry, *old;
> > > > +       size_t compressed_bytes = 0;
> > > > +       u8 nr_pages = ei - si + 1;
> > > > +       u8 i;
> > > > +
> > > > +       for (i = 0; i < nr_pages; ++i) {
> > > > +               page = folio_page(folio, si + i);
> > > > +               page_swpentry = page_swap_entry(page);
> > > > +
> > > > +               /* allocate entry */
> > > > +               entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
> > > > +               if (!entry) {
> > > > +                       zswap_reject_kmemcache_fail++;
> > > > +                       return -EINVAL;
> > > > +               }
> > >
> > > I think this patch is wrong on its own, for example if an allocation
> > > fails in the above loop we exit without cleaning up previous
> > > allocations. I think it's fixed in patch 2 but we cannot introduce
> >
> > I think there might be a misunderstanding. zswap_store_pages() will
> > clean up local resources allocated during an iteration of the for loop,
> > upon an error in that iteration. If you see the "goto store_failed" and
> > "goto compress_failed" this would explain what I mean. If an allocation
> > fails for an iteration, we simply return -EINVAL, and zswap_store()
> > will goto the "put_pool" label with "ret" still false, which will delete
> > all zswap entries for the folio (allocated up until the error iteration in
> > zswap_store_pages(); or potentially already in the xarray).
> >
> > Hence, there is no bug and each of the two patches is correct by itself
> > AFAICT, but please let me know if I am missing anything. Thanks!
> 
> Uh yes, the cleanup is in zswap_store().
> 
> >
> > > bugs in-between patches. I think the helpers in patch 2 don't really
> > > help as I mentioned. Please combine the changes and keep them in the
> > > main series (unless you have a reason not to).
> >
> > Sure. As noted in my earlier response to comments received for patch 2,
> > I can either inline all iterations or create helpers for all iterations over
> > the pages in a batch. Appreciate your suggestions on which would be a
> > better approach.
> 
> I think leaving them open-coded will be clearer for now. We can
> revisit the code organization later if it gets out of hand.

Sounds good!

Thanks,
Kanchana

^ permalink raw reply	[flat|nested] 18+ messages in thread
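
For reference, the batching loop that patch 1 adds to zswap_store() -- slicing
the folio into SWAP_CRYPTO_BATCH_SIZE-page ranges and handing each [si, ei]
pair to zswap_store_pages() -- looks roughly like the standalone sketch below.
The helpers and the byte accounting are simplified stand-ins; only the slicing
logic is meant to reflect the patch under review.

```c
/*
 * Standalone sketch (not the posted patch) of slicing a folio into
 * SWAP_CRYPTO_BATCH_SIZE-page batches and storing each [si, ei] range.
 */
#include <stdio.h>

#define SWAP_CRYPTO_BATCH_SIZE 8L

/* stand-in for zswap_store_pages(folio, si, ei, objcg, pool) */
static long store_pages(long si, long ei)
{
	printf("storing pages [%ld, %ld]\n", si, ei);
	return (ei - si + 1) * 100;   /* pretend this is the compressed size */
}

/* stand-in for the batched part of zswap_store() */
static long store_folio(long nr_pages)
{
	long si, ei, compressed_bytes = 0;

	for (si = 0; si < nr_pages; si += SWAP_CRYPTO_BATCH_SIZE) {
		ei = si + SWAP_CRYPTO_BATCH_SIZE - 1;
		if (ei > nr_pages - 1)
			ei = nr_pages - 1;   /* final, possibly partial, batch */
		compressed_bytes += store_pages(si, ei);
	}
	return compressed_bytes;
}

int main(void)
{
	/* e.g. a 64k folio of 4k pages: two batches of 8 */
	printf("total compressed bytes: %ld\n", store_folio(16));
	return 0;
}
```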

* RE: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
  2024-12-03  5:49             ` Yosry Ahmed
@ 2024-12-03 21:28               ` Sridhar, Kanchana P
  0 siblings, 0 replies; 18+ messages in thread
From: Sridhar, Kanchana P @ 2024-12-03 21:28 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Chengming Zhou, linux-kernel, linux-mm, hannes, nphamcs,
	usamaarif642, ryan.roberts, 21cnbao, akpm, Feghali, Wajdi K,
	Gopal, Vinodh, Sridhar, Kanchana P


> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Monday, December 2, 2024 9:50 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> nphamcs@gmail.com; usamaarif642@gmail.com; ryan.roberts@arm.com;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
> for batching.
> 
> On Mon, Dec 2, 2024 at 8:19 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > Sent: Monday, December 2, 2024 7:06 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Yosry Ahmed
> > > <yosryahmed@google.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-
> foundation.org;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> simplifications
> > > for batching.
> > >
> > > On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> > > > Hi Chengming, Yosry,
> > > >
> > > >> -----Original Message-----
> > > >> From: Yosry Ahmed <yosryahmed@google.com>
> > > >> Sent: Monday, December 2, 2024 11:33 AM
> > > >> To: Chengming Zhou <chengming.zhou@linux.dev>
> > > >> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; linux-
> > > >> kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
> > > >> nphamcs@gmail.com; usamaarif642@gmail.com;
> ryan.roberts@arm.com;
> > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > >> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages()
> > > simplifications
> > > >> for batching.
> > > >>
> > > >> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou
> > > >> <chengming.zhou@linux.dev> wrote:
> > > >>>
> > > >>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
> > > >>>> In order to set up zswap_store_pages() to enable a clean batching
> > > >>>> implementation in [1], this patch implements the following changes:
> > > >>>>
> > > >>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
> > > >>>>      all pages in the specified range for the folio, upfront. If this fails,
> > > >>>>      we return an error status to zswap_store().
> > > >>>>
> > > >>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
> > > >>>>      page, and returns false if any zswap_compress() fails, so
> > > >>>>      zswap_store_page() can cleanup resources allocated and return an error
> > > >>>>      status to zswap_store().
> > > >>>>
> > > >>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
> > > >>>>      in zswap_store_pages(). This facilitates cleaner error handling within
> > > >>>>      zswap_store_pages(), which will become important for IAA compress
> > > >>>>      batching in [1].
> > > >>>>
> > > >>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
> > > >>>>
> > > >>>> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > >>>> ---
> > > >>>>    mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
> > > >>>>    1 file changed, 71 insertions(+), 22 deletions(-)
> > > >>>>
> > > >>>> diff --git a/mm/zswap.c b/mm/zswap.c
> > > >>>> index b09d1023e775..db80c66e2205 100644
> > > >>>> --- a/mm/zswap.c
> > > >>>> +++ b/mm/zswap.c
> > > >>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
> > > >>>>    * main API
> > > >>>>    **********************************/
> > > >>>>
> > > >>>> +static bool zswap_compress_pages(struct page *pages[],
> > > >>>> +                              struct zswap_entry *entries[],
> > > >>>> +                              u8 nr_pages,
> > > >>>> +                              struct zswap_pool *pool)
> > > >>>> +{
> > > >>>> +     u8 i;
> > > >>>> +
> > > >>>> +     for (i = 0; i < nr_pages; ++i) {
> > > >>>> +             if (!zswap_compress(pages[i], entries[i], pool))
> > > >>>> +                     return false;
> > > >>>> +     }
> > > >>>> +
> > > >>>> +     return true;
> > > >>>> +}
> > > >>>
> > > >>> How about introducing a `zswap_compress_folio()` interface which
> > > >>> can be used by `zswap_store()`?
> > > >>> ```
> > > >>> zswap_store()
> > > >>>          nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>          entries = zswap_alloc_entries(nr_pages)
> > > >>>
> > > >>>          ret = zswap_compress_folio(folio, entries, pool)
> > > >>>
> > > >>>          // store entries into xarray and LRU list
> > > >>> ```
> > > >>>
> > > >>> And this version `zswap_compress_folio()` is very simple for now:
> > > >>> ```
> > > >>> zswap_compress_folio()
> > > >>>          nr_pages = folio_nr_pages(folio)
> > > >>>
> > > >>>          for (index = 0; index < nr_pages; ++index) {
> > > >>>                  struct page *page = folio_page(folio, index);
> > > >>>
> > > >>>                  if (!zswap_compress(page, &entries[index], pool))
> > > >>>                          return false;
> > > >>>          }
> > > >>>
> > > >>>          return true;
> > > >>> ```
> > > >>> This can be easily extended to support your "batched" version.
> > > >>>
> > > >>> Then the old `zswap_store_page()` could be removed.
> > > >>>
> > > >>> The good point is simplicity, that we don't need to slice folio
> > > >>> into multiple batches, then repeat the common operations for each
> > > >>> batch, like preparing entries, storing into xarray and LRU list...
> > > >>
> > > >> +1
> > > >
> > > > Thanks for the code review comments. One question though: would
> > > > it make sense to trade off the memory footprint cost against the code
> > > > simplification? For instance, let's say we want to store a 64k folio.
> > > > We would allocate memory for 16 zswap entries, and if one of the
> > > > compressions fails, we would deallocate memory for all 16 zswap
> > > > entries. Could this lead to zswap_entry kmem_cache starvation and
> > > > subsequent zswap_store() failures in multi-process scenarios?
> > >
> > > Ah, I see your concern. But that's the unlikely case, right?
> > >
> > > If the case you mentioned above happens a lot, I think yes, we should
> > > optimize its memory footprint to avoid allocation and deallocation.
> >
> > Thanks Chengming. I see your point. Let me gather performance data
> > for the two options, and share.
> 
> Yeah I think we shouldn't optimize for the uncommon case, not until
> there's a real problem that needs fixing.

Agreed.

> 
> >
> > >
> > > On the other hand, we should consider that a folio will be compressed
> > > successfully in most cases. So we have to allocate all entries
> > > eventually.
> > >
> > > Based on your consideration, I think your way is OK too, although
> > > I think patch 2/2 should be dropped, since it hides the loop over
> > > pages in smaller functions, as Yosry mentioned too.
> >
> > My main intent with patch 2/2 was to set up the error handling
> > path to be common and simpler, whether errors were encountered
> > during compression/zpool_malloc/xarray store. Hence, I initialize the
> > allocated zswap_entry's handle in zswap_alloc_entries() to ERR_PTR(-EINVAL),
> > so it is easy for the common error handling code in patch 2 to determine
> > if the handle was allocated (and hence needs to be freed). This benefits
> > the batching code by eliminating the need to maintain state as to which
> > stage of zswap_store_pages() sees an error, based on which resources
> > would need to be deleted.
> >
> > My key consideration is to keep the batching error handling code simple,
> > hence these changes in patch 2. The changes described above would
> > help batching, and should not impact the non-batching case, as indicated
> > by the regression testing data I've included in the cover letter.
> >
> > I don't mind inlining the implementation of the helper functions, as I
> > mentioned in my response to Yosry. I am hoping the error handling
> > simplifications are acceptable, since they will help the batching code.
> 
> I think having the loops open-coded should still be better than
> separate helpers. But I understand not wanting to have the loops
> directly in zswap_store(), as the error handling would be simpler if
> we do it in a separate function like zswap_store_pages().
> 
> How about we just move the loop from zswap_store() to
> zswap_store_page() and call it zswap_store_folio()? When batching is
> added I imagine we may need to split the loop into two loops before
> and after zswap_compress_folio(), which isn't very neat but is
> probably fine.

Sure, this sounds like a good way to organize the code. I will proceed
as suggested.

Thanks,
Kanchana

^ permalink raw reply	[flat|nested] 18+ messages in thread
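
The ERR_PTR(-EINVAL) sentinel idea quoted above -- pre-marking every handle so
a single catch-all label can tell which ones actually need freeing -- can be
sketched in standalone form as follows. ERR_SENTINEL stands in for the
kernel's ERR_PTR(-EINVAL), and the allocation and failure points are
simulated; this illustrates the pattern, not the posted patch.

```c
/*
 * Standalone sketch (not kernel code) of the sentinel-handle pattern:
 * each entry's handle starts out as an error sentinel, so one common
 * error path can tell "never allocated" from "allocated and must be
 * freed" without tracking which stage failed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define ERR_SENTINEL ((void *)-1)   /* stand-in for ERR_PTR(-EINVAL) */

struct entry { void *handle; };

static bool store_pages(struct entry *entries, int nr, int fail_at)
{
	int i;

	/* every handle starts as the sentinel: "nothing to free yet" */
	for (i = 0; i < nr; i++)
		entries[i].handle = ERR_SENTINEL;

	for (i = 0; i < nr; i++) {
		if (i == fail_at)                 /* simulate a compression failure */
			goto store_pages_failed;
		entries[i].handle = malloc(64);   /* stand-in for zpool allocation */
		if (!entries[i].handle) {
			entries[i].handle = ERR_SENTINEL;
			goto store_pages_failed;
		}
	}
	return true;

store_pages_failed:
	/* common catch-all: free only handles that were really allocated */
	for (i = 0; i < nr; i++)
		if (entries[i].handle != ERR_SENTINEL)
			free(entries[i].handle);
	return false;
}

int main(void)
{
	struct entry entries[8];

	printf("%s\n", store_pages(entries, 8, 5) ? "stored" : "cleaned up");
	return 0;
}
```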

end of thread, other threads:[~2024-12-03 21:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-27 22:53 [PATCH v1 0/2] Vectorize and simplify zswap_store_page() Kanchana P Sridhar
2024-11-27 22:53 ` [PATCH v1 1/2] mm: zswap: Modified zswap_store_page() to process multiple pages in a folio Kanchana P Sridhar
2024-12-02 19:33   ` Yosry Ahmed
2024-12-03  1:13     ` Sridhar, Kanchana P
2024-12-03  5:34       ` Yosry Ahmed
2024-12-03 21:25         ` Sridhar, Kanchana P
2024-12-03  3:23   ` Chengming Zhou
2024-12-03  4:37     ` Sridhar, Kanchana P
2024-11-27 22:53 ` [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching Kanchana P Sridhar
2024-11-28  7:00   ` Chengming Zhou
2024-12-02 19:32     ` Yosry Ahmed
2024-12-03  1:01       ` Sridhar, Kanchana P
2024-12-03  3:06         ` Chengming Zhou
2024-12-03  4:18           ` Sridhar, Kanchana P
2024-12-03  5:49             ` Yosry Ahmed
2024-12-03 21:28               ` Sridhar, Kanchana P
2024-12-03  0:17     ` Nhat Pham
2024-12-03  1:15       ` Sridhar, Kanchana P
