* Re: [RFC PATCH v2 0/7] mm: zswap: move writeback LRU from zpool to zswap [not found] <20230606145611.704392-1-cerasuolodomenico@gmail.com> @ 2023-06-07 9:16 ` Yosry Ahmed 2023-06-07 9:23 ` Domenico Cerasuolo [not found] ` <20230606145611.704392-8-cerasuolodomenico@gmail.com> ` (6 subsequent siblings) 7 siblings, 1 reply; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:16 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > This series aims to improve the zswap reclaim mechanism by reorganizing > the LRU management. In the current implementation, the LRU is maintained > within each zpool driver, resulting in duplicated code across the three > drivers. The proposed change consists in moving the LRU management from > the individual implementations up to the zswap layer. > > The primary objective of this refactoring effort is to simplify the > codebase. By unifying the reclaim loop and consolidating LRU handling > within zswap, we can eliminate redundant code and improve > maintainability. Additionally, this change enables the reclamation of > stored pages in their actual LRU order. Presently, the zpool drivers > link backing pages in an LRU, causing compressed pages with different > LRU positions to be written back simultaneously. > > The series consists of several patches. The first patch implements the > LRU and the reclaim loop in zswap, but it is not used yet because all > three driver implementations are marked as zpool_evictable. > The following three commits modify each zpool driver to be not > zpool_evictable, allowing the use of the reclaim loop in zswap. > As the drivers removed their shrink functions, the zpool interface is > then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink. > Finally, the code in zswap is further cleaned up by simplifying the > writeback function and removing the now unnecessary zswap_header. > > Based on mm-stable + commit 399ab221f3ff > ("mm: zswap: shrink until can accept") currently in mm-unstable. I tested this + commit fe1d1f7d0fb5 ("mm: zswap: support exclusive loads") currently in mm-unstable, using zsmalloc and CONFIG_ZSWAP_EXCLUSIVE_LOADS=y. I only ran basic zswap tests with manual writeback induction and made sure everything is sane. I obviously hope you did more involved testing :) The only problem I came across is the conflict with fe1d1f7d0fb5, and I suggested the fix in patch 1. With the fix, everything seems correct. 
So I guess, FWIW for all the patches except 2 & 3 (for zbud and z3fold):
Tested-by: Yosry Ahmed <yosryahmed@google.com>

>
> V2:
> - fixed lru list init/del/del_init (Johannes)
> - renamed pool.lock to lru_lock and added lock ordering comment (Yosry)
> - trimmed zsmalloc even more (Johannes | Nhat)
> - moved ref drop out of writeback function (Johannes)
>
> Domenico Cerasuolo (7):
>   mm: zswap: add pool shrinking mechanism
>   mm: zswap: remove page reclaim logic from zbud
>   mm: zswap: remove page reclaim logic from z3fold
>   mm: zswap: remove page reclaim logic from zsmalloc
>   mm: zswap: remove shrink from zpool interface
>   mm: zswap: simplify writeback function
>   mm: zswap: remove zswap_header
>
>  include/linux/zpool.h |  19 +-
>  mm/z3fold.c           | 249 +-------------------------
>  mm/zbud.c             | 167 +-----------------
>  mm/zpool.c            |  48 +----
>  mm/zsmalloc.c         | 396 ++----------------------------------------
>  mm/zswap.c            | 186 +++++++++++---------
>  6 files changed, 130 insertions(+), 935 deletions(-)
>
> --
> 2.34.1
>
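For context, the reclaim path being tested above has roughly the following shape once the LRU moves into zswap, as the code reads at the end of the series (a condensed sketch assembled from the cover letter and the zswap_shrink() hunks quoted later in this thread; refcounting on the way out, the failure path that puts the entry back on the LRU, and lock-ordering details are trimmed, so treat it as an illustration rather than the literal patch):

	/* One LRU per zswap pool, protected by its own lock (patch 1). */
	struct zswap_pool {
		/* ... existing members ... */
		struct list_head lru;		/* zswap entries, most recently added first */
		spinlock_t lru_lock;
	};

	static int zswap_shrink(struct zswap_pool *pool)
	{
		struct zswap_entry *lru_entry;
		struct zswap_tree *tree;
		pgoff_t swpoffset;

		spin_lock(&pool->lru_lock);
		if (list_empty(&pool->lru)) {
			spin_unlock(&pool->lru_lock);
			return -EINVAL;
		}
		lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
		list_del_init(&lru_entry->lru);
		/*
		 * Once lru_lock is dropped the entry may be freed, so copy what
		 * is needed to the stack and re-validate under the tree lock.
		 */
		swpoffset = swp_offset(lru_entry->swpentry);
		tree = zswap_trees[swp_type(lru_entry->swpentry)];
		spin_unlock(&pool->lru_lock);

		spin_lock(&tree->lock);
		if (zswap_rb_search(&tree->rbroot, swpoffset) != lru_entry) {
			spin_unlock(&tree->lock);
			return -EAGAIN;		/* entry went away, caller retries */
		}
		zswap_entry_get(lru_entry);	/* pin it across the writeback */
		spin_unlock(&tree->lock);

		return zswap_writeback_entry(lru_entry, tree);
	}

Each zpool driver then loses its own LRU, eviction callback and shrink entry point, which is where the large negative diffstat above comes from.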
* Re: [RFC PATCH v2 0/7] mm: zswap: move writeback LRU from zpool to zswap 2023-06-07 9:16 ` [RFC PATCH v2 0/7] mm: zswap: move writeback LRU from zpool to zswap Yosry Ahmed @ 2023-06-07 9:23 ` Domenico Cerasuolo 2023-06-07 9:32 ` Yosry Ahmed 0 siblings, 1 reply; 28+ messages in thread From: Domenico Cerasuolo @ 2023-06-07 9:23 UTC (permalink / raw) To: Yosry Ahmed Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 11:16 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > <cerasuolodomenico@gmail.com> wrote: > > > > This series aims to improve the zswap reclaim mechanism by reorganizing > > the LRU management. In the current implementation, the LRU is maintained > > within each zpool driver, resulting in duplicated code across the three > > drivers. The proposed change consists in moving the LRU management from > > the individual implementations up to the zswap layer. > > > > The primary objective of this refactoring effort is to simplify the > > codebase. By unifying the reclaim loop and consolidating LRU handling > > within zswap, we can eliminate redundant code and improve > > maintainability. Additionally, this change enables the reclamation of > > stored pages in their actual LRU order. Presently, the zpool drivers > > link backing pages in an LRU, causing compressed pages with different > > LRU positions to be written back simultaneously. > > > > The series consists of several patches. The first patch implements the > > LRU and the reclaim loop in zswap, but it is not used yet because all > > three driver implementations are marked as zpool_evictable. > > The following three commits modify each zpool driver to be not > > zpool_evictable, allowing the use of the reclaim loop in zswap. > > As the drivers removed their shrink functions, the zpool interface is > > then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink. > > Finally, the code in zswap is further cleaned up by simplifying the > > writeback function and removing the now unnecessary zswap_header. > > > > Based on mm-stable + commit 399ab221f3ff > > ("mm: zswap: shrink until can accept") currently in mm-unstable. > > I tested this + commit fe1d1f7d0fb5 ("mm: zswap: support exclusive > loads") currently in mm-unstable, using zsmalloc and > CONFIG_ZSWAP_EXCLUSIVE_LOADS=y. I only ran basic zswap tests with > manual writeback induction and made sure everything is sane. I > obviously hope you did more involved testing :) > > The only problem I came across is the conflict with fe1d1f7d0fb5, and > I suggested the fix in patch 1. With the fix, everything seems > correct. > > So I guess, FWIW for all the patches except 2 & 3 (for zbud and z3fold): > Tested-by: Yosry Ahmed <yosryahmed@google.com> Thanks a lot for the effort! I'll rebase and test it again before submitting the new version. 
> > > > > V2: > > - fixed lru list init/del/del_init (Johannes) > > - renamed pool.lock to lru_lock and added lock ordering comment (Yosry) > > - trimmed zsmalloc even more (Johannes | Nhat) > > - moved ref drop out of writeback function (Johannes) > > > > Domenico Cerasuolo (7): > > mm: zswap: add pool shrinking mechanism > > mm: zswap: remove page reclaim logic from zbud > > mm: zswap: remove page reclaim logic from z3fold > > mm: zswap: remove page reclaim logic from zsmalloc > > mm: zswap: remove shrink from zpool interface > > mm: zswap: simplify writeback function > > mm: zswap: remove zswap_header > > > > include/linux/zpool.h | 19 +- > > mm/z3fold.c | 249 +------------------------- > > mm/zbud.c | 167 +----------------- > > mm/zpool.c | 48 +---- > > mm/zsmalloc.c | 396 ++---------------------------------------- > > mm/zswap.c | 186 +++++++++++--------- > > 6 files changed, 130 insertions(+), 935 deletions(-) > > > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 0/7] mm: zswap: move writeback LRU from zpool to zswap 2023-06-07 9:23 ` Domenico Cerasuolo @ 2023-06-07 9:32 ` Yosry Ahmed 0 siblings, 0 replies; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:32 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 2:24 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > On Wed, Jun 7, 2023 at 11:16 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > > <cerasuolodomenico@gmail.com> wrote: > > > > > > This series aims to improve the zswap reclaim mechanism by reorganizing > > > the LRU management. In the current implementation, the LRU is maintained > > > within each zpool driver, resulting in duplicated code across the three > > > drivers. The proposed change consists in moving the LRU management from > > > the individual implementations up to the zswap layer. > > > > > > The primary objective of this refactoring effort is to simplify the > > > codebase. By unifying the reclaim loop and consolidating LRU handling > > > within zswap, we can eliminate redundant code and improve > > > maintainability. Additionally, this change enables the reclamation of > > > stored pages in their actual LRU order. Presently, the zpool drivers > > > link backing pages in an LRU, causing compressed pages with different > > > LRU positions to be written back simultaneously. > > > > > > The series consists of several patches. The first patch implements the > > > LRU and the reclaim loop in zswap, but it is not used yet because all > > > three driver implementations are marked as zpool_evictable. > > > The following three commits modify each zpool driver to be not > > > zpool_evictable, allowing the use of the reclaim loop in zswap. > > > As the drivers removed their shrink functions, the zpool interface is > > > then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink. > > > Finally, the code in zswap is further cleaned up by simplifying the > > > writeback function and removing the now unnecessary zswap_header. > > > > > > Based on mm-stable + commit 399ab221f3ff > > > ("mm: zswap: shrink until can accept") currently in mm-unstable. > > > > I tested this + commit fe1d1f7d0fb5 ("mm: zswap: support exclusive > > loads") currently in mm-unstable, using zsmalloc and > > CONFIG_ZSWAP_EXCLUSIVE_LOADS=y. I only ran basic zswap tests with > > manual writeback induction and made sure everything is sane. I > > obviously hope you did more involved testing :) > > > > The only problem I came across is the conflict with fe1d1f7d0fb5, and > > I suggested the fix in patch 1. With the fix, everything seems > > correct. > > > > So I guess, FWIW for all the patches except 2 & 3 (for zbud and z3fold): > > Tested-by: Yosry Ahmed <yosryahmed@google.com> > > Thanks a lot for the effort! I'll rebase and test it again before submitting the > new version. 
Perhaps give v2 a little bit more time to give other folks a chance to take a look as well, save yourself (and probably Andrew) the trouble of sending a new version for every single review :) > > > > > > > > > V2: > > > - fixed lru list init/del/del_init (Johannes) > > > - renamed pool.lock to lru_lock and added lock ordering comment (Yosry) > > > - trimmed zsmalloc even more (Johannes | Nhat) > > > - moved ref drop out of writeback function (Johannes) > > > > > > Domenico Cerasuolo (7): > > > mm: zswap: add pool shrinking mechanism > > > mm: zswap: remove page reclaim logic from zbud > > > mm: zswap: remove page reclaim logic from z3fold > > > mm: zswap: remove page reclaim logic from zsmalloc > > > mm: zswap: remove shrink from zpool interface > > > mm: zswap: simplify writeback function > > > mm: zswap: remove zswap_header > > > > > > include/linux/zpool.h | 19 +- > > > mm/z3fold.c | 249 +------------------------- > > > mm/zbud.c | 167 +----------------- > > > mm/zpool.c | 48 +---- > > > mm/zsmalloc.c | 396 ++---------------------------------------- > > > mm/zswap.c | 186 +++++++++++--------- > > > 6 files changed, 130 insertions(+), 935 deletions(-) > > > > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 7/7] mm: zswap: remove zswap_header [not found] ` <20230606145611.704392-8-cerasuolodomenico@gmail.com> @ 2023-06-07 9:30 ` Yosry Ahmed 2023-06-09 16:10 ` Domenico Cerasuolo 0 siblings, 1 reply; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:30 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Previously, zswap_header served the purpose of storing the swpentry > within zpool pages. This allowed zpool implementations to pass relevant > information to the writeback function. However, with the current > implementation, writeback is directly handled within zswap. > Consequently, there is no longer a necessity for zswap_header, as the > swp_entry_t can be stored directly in zswap_entry. > > Suggested-by: Yosry Ahmed <yosryahmed@google.com> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Thanks for this cleanup. It gives us back some of the memory used by list_head in zswap entry, and remove an unnecessary zpool map operation. > --- > mm/zswap.c | 52 ++++++++++++++++++++++------------------------------ > 1 file changed, 22 insertions(+), 30 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index ef8604812352..f689444dd5a7 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -193,7 +193,7 @@ struct zswap_pool { > */ > struct zswap_entry { > struct rb_node rbnode; > - pgoff_t offset; > + swp_entry_t swpentry; > int refcount; > unsigned int length; > struct zswap_pool *pool; > @@ -205,10 +205,6 @@ struct zswap_entry { > struct list_head lru; > }; > > -struct zswap_header { > - swp_entry_t swpentry; > -}; > - > /* > * The tree lock in the zswap_tree struct protects a few things: > * - the rbtree > @@ -250,7 +246,7 @@ static bool zswap_has_pool; > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > zpool_get_type((p)->zpool)) > > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > +static int zswap_writeback_entry(struct zswap_entry *entry, > struct zswap_tree *tree); > static int zswap_pool_get(struct zswap_pool *pool); > static void zswap_pool_put(struct zswap_pool *pool); > @@ -311,12 +307,14 @@ static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset) > { > struct rb_node *node = root->rb_node; > struct zswap_entry *entry; > + pgoff_t entry_offset; > > while (node) { > entry = rb_entry(node, struct zswap_entry, rbnode); > - if (entry->offset > offset) > + entry_offset = swp_offset(entry->swpentry); > + if (entry_offset > offset) > node = node->rb_left; > - else if (entry->offset < offset) > + else if (entry_offset < offset) > node = node->rb_right; > else > return entry; > @@ -333,13 +331,15 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry, > { > struct rb_node **link = &root->rb_node, *parent = NULL; > struct zswap_entry *myentry; > + pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry); > > while (*link) { > parent = *link; > myentry = rb_entry(parent, struct zswap_entry, rbnode); > - if (myentry->offset > entry->offset) > + myentry_offset = swp_offset(myentry->swpentry); > + if (myentry_offset > entry_offset) > link = &(*link)->rb_left; > - else if (myentry->offset < entry->offset) > + else if (myentry_offset < entry_offset) This whole myentry thing is very confusing to me. 
I initially thought myentry would be the entry passed in as an argument. Can we change the naming here to make it more consistent with zswap_rb_search() naming? > link = &(*link)->rb_right; > else { > *dupentry = myentry; > @@ -598,7 +598,6 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > static int zswap_shrink(struct zswap_pool *pool) > { > struct zswap_entry *lru_entry, *tree_entry = NULL; > - struct zswap_header *zhdr; > struct zswap_tree *tree; > int swpoffset; > int ret; > @@ -611,15 +610,13 @@ static int zswap_shrink(struct zswap_pool *pool) > } > lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > list_del_init(&lru_entry->lru); > - zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > - tree = zswap_trees[swp_type(zhdr->swpentry)]; > - zpool_unmap_handle(pool->zpool, lru_entry->handle); > /* > * Once the pool lock is dropped, the lru_entry might get freed. The > * swpoffset is copied to the stack, and lru_entry isn't deref'd again > * until the entry is verified to still be alive in the tree. > */ > - swpoffset = swp_offset(zhdr->swpentry); > + swpoffset = swp_offset(lru_entry->swpentry); > + tree = zswap_trees[swp_type(lru_entry->swpentry)]; > spin_unlock(&pool->lru_lock); > > /* hold a reference from tree so it won't be freed during writeback */ > @@ -633,7 +630,7 @@ static int zswap_shrink(struct zswap_pool *pool) > } > spin_unlock(&tree->lock); > > - ret = zswap_writeback_entry(lru_entry, zhdr, tree); > + ret = zswap_writeback_entry(lru_entry, tree); > > spin_lock(&tree->lock); > if (ret) { > @@ -1046,10 +1043,10 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > * the swap cache, the compressed version stored by zswap can be > * freed. > */ > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > +static int zswap_writeback_entry(struct zswap_entry *entry, > struct zswap_tree *tree) > { > - swp_entry_t swpentry = zhdr->swpentry; > + swp_entry_t swpentry = entry->swpentry; > struct page *page; > struct scatterlist input, output; > struct crypto_acomp_ctx *acomp_ctx; > @@ -1089,7 +1086,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > * writing. 
> */ > spin_lock(&tree->lock); > - if (zswap_rb_search(&tree->rbroot, entry->offset) != entry) { > + if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) { > spin_unlock(&tree->lock); > delete_from_swap_cache(page_folio(page)); > ret = -ENOMEM; > @@ -1101,8 +1098,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > dlen = PAGE_SIZE; > > - zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > - src = (u8 *)zhdr + sizeof(struct zswap_header); > + src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > if (!zpool_can_sleep_mapped(pool)) { > memcpy(tmp, src, entry->length); > src = tmp; > @@ -1196,11 +1192,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > struct obj_cgroup *objcg = NULL; > struct zswap_pool *pool; > int ret; > - unsigned int hlen, dlen = PAGE_SIZE; > + unsigned int dlen = PAGE_SIZE; > unsigned long handle, value; > char *buf; > u8 *src, *dst; > - struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) }; > gfp_t gfp; > > /* THP isn't supported */ > @@ -1245,7 +1240,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > src = kmap_atomic(page); > if (zswap_is_page_same_filled(src, &value)) { > kunmap_atomic(src); > - entry->offset = offset; > + entry->swpentry = swp_entry(type, offset); > entry->length = 0; > entry->value = value; > atomic_inc(&zswap_same_filled_pages); > @@ -1299,11 +1294,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > } > > /* store */ > - hlen = sizeof(zhdr); > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > if (zpool_malloc_support_movable(entry->pool->zpool)) > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > - ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle); > + ret = zpool_malloc(entry->pool->zpool, dlen, gfp, &handle); > if (ret == -ENOSPC) { > zswap_reject_compress_poor++; > goto put_dstmem; > @@ -1313,13 +1307,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > goto put_dstmem; > } > buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_WO); > - memcpy(buf, &zhdr, hlen); > - memcpy(buf + hlen, dst, dlen); > + memcpy(buf, dst, dlen); > zpool_unmap_handle(entry->pool->zpool, handle); > mutex_unlock(acomp_ctx->mutex); > > /* populate entry */ > - entry->offset = offset; > + entry->swpentry = swp_entry(type, offset); > entry->handle = handle; > entry->length = dlen; > > @@ -1418,7 +1411,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > /* decompress */ > dlen = PAGE_SIZE; > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > - src += sizeof(struct zswap_header); > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > memcpy(tmp, src, entry->length); > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 28+ messages in thread
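To make the saving concrete: before this patch every compressed object in the zpool carried a small header just so writeback could recover the swap entry, and reading that header meant an extra zpool_map_handle()/zpool_unmap_handle() round trip. A rough picture of the two object layouts (illustration only, not code from the patch):

	/*
	 * Before: the swap entry is stored inside the zpool object itself.
	 *
	 *	zpool object: [ struct zswap_header { swp_entry_t swpentry; } | compressed data ]
	 *
	 * After: swpentry lives in struct zswap_entry, each object is
	 * sizeof(struct zswap_header) smaller, and writeback no longer maps
	 * the handle just to find the right tree and offset. This claws back
	 * some of the per-entry memory that the list_head added earlier in
	 * the series costs.
	 *
	 *	zpool object: [ compressed data ]
	 */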
* Re: [RFC PATCH v2 7/7] mm: zswap: remove zswap_header 2023-06-07 9:30 ` [RFC PATCH v2 7/7] mm: zswap: remove zswap_header Yosry Ahmed @ 2023-06-09 16:10 ` Domenico Cerasuolo 2023-06-09 17:13 ` Yosry Ahmed 0 siblings, 1 reply; 28+ messages in thread From: Domenico Cerasuolo @ 2023-06-09 16:10 UTC (permalink / raw) To: Yosry Ahmed Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > <cerasuolodomenico@gmail.com> wrote: > > > > Previously, zswap_header served the purpose of storing the swpentry > > within zpool pages. This allowed zpool implementations to pass relevant > > information to the writeback function. However, with the current > > implementation, writeback is directly handled within zswap. > > Consequently, there is no longer a necessity for zswap_header, as the > > swp_entry_t can be stored directly in zswap_entry. > > > > Suggested-by: Yosry Ahmed <yosryahmed@google.com> > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > Thanks for this cleanup. It gives us back some of the memory used by > list_head in zswap entry, and remove an unnecessary zpool map > operation. > > > --- > > mm/zswap.c | 52 ++++++++++++++++++++++------------------------------ > > 1 file changed, 22 insertions(+), 30 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index ef8604812352..f689444dd5a7 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -193,7 +193,7 @@ struct zswap_pool { > > */ > > struct zswap_entry { > > struct rb_node rbnode; > > - pgoff_t offset; > > + swp_entry_t swpentry; > > int refcount; > > unsigned int length; > > struct zswap_pool *pool; > > @@ -205,10 +205,6 @@ struct zswap_entry { > > struct list_head lru; > > }; > > > > -struct zswap_header { > > - swp_entry_t swpentry; > > -}; > > - > > /* > > * The tree lock in the zswap_tree struct protects a few things: > > * - the rbtree > > @@ -250,7 +246,7 @@ static bool zswap_has_pool; > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > zpool_get_type((p)->zpool)) > > > > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > +static int zswap_writeback_entry(struct zswap_entry *entry, > > struct zswap_tree *tree); > > static int zswap_pool_get(struct zswap_pool *pool); > > static void zswap_pool_put(struct zswap_pool *pool); > > @@ -311,12 +307,14 @@ static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset) > > { > > struct rb_node *node = root->rb_node; > > struct zswap_entry *entry; > > + pgoff_t entry_offset; > > > > while (node) { > > entry = rb_entry(node, struct zswap_entry, rbnode); > > - if (entry->offset > offset) > > + entry_offset = swp_offset(entry->swpentry); > > + if (entry_offset > offset) > > node = node->rb_left; > > - else if (entry->offset < offset) > > + else if (entry_offset < offset) > > node = node->rb_right; > > else > > return entry; > > @@ -333,13 +331,15 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry, > > { > > struct rb_node **link = &root->rb_node, *parent = NULL; > > struct zswap_entry *myentry; > > + pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry); > > > > while (*link) { > > parent = *link; > > myentry = rb_entry(parent, struct zswap_entry, rbnode); > > - if (myentry->offset > entry->offset) > > + myentry_offset = swp_offset(myentry->swpentry); > > + if 
(myentry_offset > entry_offset) > > link = &(*link)->rb_left; > > - else if (myentry->offset < entry->offset) > > + else if (myentry_offset < entry_offset) > > This whole myentry thing is very confusing to me. I initially thought > myentry would be the entry passed in as an argument. Can we change the > naming here to make it more consistent with zswap_rb_search() naming? > I'm not sure I understood the suggestion, is it related to the addition of myentry_offset variable or is myentry itself that you would like to change? > > link = &(*link)->rb_right; > > else { > > *dupentry = myentry; > > @@ -598,7 +598,6 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > static int zswap_shrink(struct zswap_pool *pool) > > { > > struct zswap_entry *lru_entry, *tree_entry = NULL; > > - struct zswap_header *zhdr; > > struct zswap_tree *tree; > > int swpoffset; > > int ret; > > @@ -611,15 +610,13 @@ static int zswap_shrink(struct zswap_pool *pool) > > } > > lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > list_del_init(&lru_entry->lru); > > - zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > - tree = zswap_trees[swp_type(zhdr->swpentry)]; > > - zpool_unmap_handle(pool->zpool, lru_entry->handle); > > /* > > * Once the pool lock is dropped, the lru_entry might get freed. The > > * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > * until the entry is verified to still be alive in the tree. > > */ > > - swpoffset = swp_offset(zhdr->swpentry); > > + swpoffset = swp_offset(lru_entry->swpentry); > > + tree = zswap_trees[swp_type(lru_entry->swpentry)]; > > spin_unlock(&pool->lru_lock); > > > > /* hold a reference from tree so it won't be freed during writeback */ > > @@ -633,7 +630,7 @@ static int zswap_shrink(struct zswap_pool *pool) > > } > > spin_unlock(&tree->lock); > > > > - ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > + ret = zswap_writeback_entry(lru_entry, tree); > > > > spin_lock(&tree->lock); > > if (ret) { > > @@ -1046,10 +1043,10 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > > * the swap cache, the compressed version stored by zswap can be > > * freed. > > */ > > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > +static int zswap_writeback_entry(struct zswap_entry *entry, > > struct zswap_tree *tree) > > { > > - swp_entry_t swpentry = zhdr->swpentry; > > + swp_entry_t swpentry = entry->swpentry; > > struct page *page; > > struct scatterlist input, output; > > struct crypto_acomp_ctx *acomp_ctx; > > @@ -1089,7 +1086,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > > * writing. 
> > */ > > spin_lock(&tree->lock); > > - if (zswap_rb_search(&tree->rbroot, entry->offset) != entry) { > > + if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) { > > spin_unlock(&tree->lock); > > delete_from_swap_cache(page_folio(page)); > > ret = -ENOMEM; > > @@ -1101,8 +1098,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > dlen = PAGE_SIZE; > > > > - zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > - src = (u8 *)zhdr + sizeof(struct zswap_header); > > + src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > if (!zpool_can_sleep_mapped(pool)) { > > memcpy(tmp, src, entry->length); > > src = tmp; > > @@ -1196,11 +1192,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > struct obj_cgroup *objcg = NULL; > > struct zswap_pool *pool; > > int ret; > > - unsigned int hlen, dlen = PAGE_SIZE; > > + unsigned int dlen = PAGE_SIZE; > > unsigned long handle, value; > > char *buf; > > u8 *src, *dst; > > - struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) }; > > gfp_t gfp; > > > > /* THP isn't supported */ > > @@ -1245,7 +1240,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > src = kmap_atomic(page); > > if (zswap_is_page_same_filled(src, &value)) { > > kunmap_atomic(src); > > - entry->offset = offset; > > + entry->swpentry = swp_entry(type, offset); > > entry->length = 0; > > entry->value = value; > > atomic_inc(&zswap_same_filled_pages); > > @@ -1299,11 +1294,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > } > > > > /* store */ > > - hlen = sizeof(zhdr); > > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > > if (zpool_malloc_support_movable(entry->pool->zpool)) > > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > > - ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle); > > + ret = zpool_malloc(entry->pool->zpool, dlen, gfp, &handle); > > if (ret == -ENOSPC) { > > zswap_reject_compress_poor++; > > goto put_dstmem; > > @@ -1313,13 +1307,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > goto put_dstmem; > > } > > buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_WO); > > - memcpy(buf, &zhdr, hlen); > > - memcpy(buf + hlen, dst, dlen); > > + memcpy(buf, dst, dlen); > > zpool_unmap_handle(entry->pool->zpool, handle); > > mutex_unlock(acomp_ctx->mutex); > > > > /* populate entry */ > > - entry->offset = offset; > > + entry->swpentry = swp_entry(type, offset); > > entry->handle = handle; > > entry->length = dlen; > > > > @@ -1418,7 +1411,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > /* decompress */ > > dlen = PAGE_SIZE; > > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > > - src += sizeof(struct zswap_header); > > > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > > memcpy(tmp, src, entry->length); > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 7/7] mm: zswap: remove zswap_header 2023-06-09 16:10 ` Domenico Cerasuolo @ 2023-06-09 17:13 ` Yosry Ahmed 0 siblings, 0 replies; 28+ messages in thread From: Yosry Ahmed @ 2023-06-09 17:13 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Fri, Jun 9, 2023 at 9:10 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > On Wed, Jun 7, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > > <cerasuolodomenico@gmail.com> wrote: > > > > > > Previously, zswap_header served the purpose of storing the swpentry > > > within zpool pages. This allowed zpool implementations to pass relevant > > > information to the writeback function. However, with the current > > > implementation, writeback is directly handled within zswap. > > > Consequently, there is no longer a necessity for zswap_header, as the > > > swp_entry_t can be stored directly in zswap_entry. > > > > > > Suggested-by: Yosry Ahmed <yosryahmed@google.com> > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > > > Thanks for this cleanup. It gives us back some of the memory used by > > list_head in zswap entry, and remove an unnecessary zpool map > > operation. > > > > > --- > > > mm/zswap.c | 52 ++++++++++++++++++++++------------------------------ > > > 1 file changed, 22 insertions(+), 30 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index ef8604812352..f689444dd5a7 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -193,7 +193,7 @@ struct zswap_pool { > > > */ > > > struct zswap_entry { > > > struct rb_node rbnode; > > > - pgoff_t offset; > > > + swp_entry_t swpentry; > > > int refcount; > > > unsigned int length; > > > struct zswap_pool *pool; > > > @@ -205,10 +205,6 @@ struct zswap_entry { > > > struct list_head lru; > > > }; > > > > > > -struct zswap_header { > > > - swp_entry_t swpentry; > > > -}; > > > - > > > /* > > > * The tree lock in the zswap_tree struct protects a few things: > > > * - the rbtree > > > @@ -250,7 +246,7 @@ static bool zswap_has_pool; > > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > > zpool_get_type((p)->zpool)) > > > > > > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > > +static int zswap_writeback_entry(struct zswap_entry *entry, > > > struct zswap_tree *tree); > > > static int zswap_pool_get(struct zswap_pool *pool); > > > static void zswap_pool_put(struct zswap_pool *pool); > > > @@ -311,12 +307,14 @@ static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset) > > > { > > > struct rb_node *node = root->rb_node; > > > struct zswap_entry *entry; > > > + pgoff_t entry_offset; > > > > > > while (node) { > > > entry = rb_entry(node, struct zswap_entry, rbnode); > > > - if (entry->offset > offset) > > > + entry_offset = swp_offset(entry->swpentry); > > > + if (entry_offset > offset) > > > node = node->rb_left; > > > - else if (entry->offset < offset) > > > + else if (entry_offset < offset) > > > node = node->rb_right; > > > else > > > return entry; > > > @@ -333,13 +331,15 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry, > > > { > > > struct rb_node **link = &root->rb_node, *parent = NULL; > > > struct zswap_entry *myentry; > > > + pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry); > > > > > > while (*link) { > > > parent = *link; > > > 
myentry = rb_entry(parent, struct zswap_entry, rbnode); > > > - if (myentry->offset > entry->offset) > > > + myentry_offset = swp_offset(myentry->swpentry); > > > + if (myentry_offset > entry_offset) > > > link = &(*link)->rb_left; > > > - else if (myentry->offset < entry->offset) > > > + else if (myentry_offset < entry_offset) > > > > This whole myentry thing is very confusing to me. I initially thought > > myentry would be the entry passed in as an argument. Can we change the > > naming here to make it more consistent with zswap_rb_search() naming? > > > > I'm not sure I understood the suggestion, is it related to the addition of > myentry_offset variable or is myentry itself that you would like to change? Just the variable naming myentry (which existed before your patch) and myentry_offset is confusing, because (at least to me) they seem like they refer to the argument, but they actually refer to the search iterator. It's also inconsistent with the naming convention used in zswap_rb_search(). It's just a nit though, don't overthink it :) > > > > link = &(*link)->rb_right; > > > else { > > > *dupentry = myentry; > > > @@ -598,7 +598,6 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > > static int zswap_shrink(struct zswap_pool *pool) > > > { > > > struct zswap_entry *lru_entry, *tree_entry = NULL; > > > - struct zswap_header *zhdr; > > > struct zswap_tree *tree; > > > int swpoffset; > > > int ret; > > > @@ -611,15 +610,13 @@ static int zswap_shrink(struct zswap_pool *pool) > > > } > > > lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > > list_del_init(&lru_entry->lru); > > > - zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > > - tree = zswap_trees[swp_type(zhdr->swpentry)]; > > > - zpool_unmap_handle(pool->zpool, lru_entry->handle); > > > /* > > > * Once the pool lock is dropped, the lru_entry might get freed. The > > > * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > > * until the entry is verified to still be alive in the tree. > > > */ > > > - swpoffset = swp_offset(zhdr->swpentry); > > > + swpoffset = swp_offset(lru_entry->swpentry); > > > + tree = zswap_trees[swp_type(lru_entry->swpentry)]; > > > spin_unlock(&pool->lru_lock); > > > > > > /* hold a reference from tree so it won't be freed during writeback */ > > > @@ -633,7 +630,7 @@ static int zswap_shrink(struct zswap_pool *pool) > > > } > > > spin_unlock(&tree->lock); > > > > > > - ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > > + ret = zswap_writeback_entry(lru_entry, tree); > > > > > > spin_lock(&tree->lock); > > > if (ret) { > > > @@ -1046,10 +1043,10 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > > > * the swap cache, the compressed version stored by zswap can be > > > * freed. > > > */ > > > -static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > > +static int zswap_writeback_entry(struct zswap_entry *entry, > > > struct zswap_tree *tree) > > > { > > > - swp_entry_t swpentry = zhdr->swpentry; > > > + swp_entry_t swpentry = entry->swpentry; > > > struct page *page; > > > struct scatterlist input, output; > > > struct crypto_acomp_ctx *acomp_ctx; > > > @@ -1089,7 +1086,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > > > * writing. 
> > > */ > > > spin_lock(&tree->lock); > > > - if (zswap_rb_search(&tree->rbroot, entry->offset) != entry) { > > > + if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) { > > > spin_unlock(&tree->lock); > > > delete_from_swap_cache(page_folio(page)); > > > ret = -ENOMEM; > > > @@ -1101,8 +1098,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header > > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > > dlen = PAGE_SIZE; > > > > > > - zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > > - src = (u8 *)zhdr + sizeof(struct zswap_header); > > > + src = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > > if (!zpool_can_sleep_mapped(pool)) { > > > memcpy(tmp, src, entry->length); > > > src = tmp; > > > @@ -1196,11 +1192,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > struct obj_cgroup *objcg = NULL; > > > struct zswap_pool *pool; > > > int ret; > > > - unsigned int hlen, dlen = PAGE_SIZE; > > > + unsigned int dlen = PAGE_SIZE; > > > unsigned long handle, value; > > > char *buf; > > > u8 *src, *dst; > > > - struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) }; > > > gfp_t gfp; > > > > > > /* THP isn't supported */ > > > @@ -1245,7 +1240,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > src = kmap_atomic(page); > > > if (zswap_is_page_same_filled(src, &value)) { > > > kunmap_atomic(src); > > > - entry->offset = offset; > > > + entry->swpentry = swp_entry(type, offset); > > > entry->length = 0; > > > entry->value = value; > > > atomic_inc(&zswap_same_filled_pages); > > > @@ -1299,11 +1294,10 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > } > > > > > > /* store */ > > > - hlen = sizeof(zhdr); > > > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > > > if (zpool_malloc_support_movable(entry->pool->zpool)) > > > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > > > - ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle); > > > + ret = zpool_malloc(entry->pool->zpool, dlen, gfp, &handle); > > > if (ret == -ENOSPC) { > > > zswap_reject_compress_poor++; > > > goto put_dstmem; > > > @@ -1313,13 +1307,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > goto put_dstmem; > > > } > > > buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_WO); > > > - memcpy(buf, &zhdr, hlen); > > > - memcpy(buf + hlen, dst, dlen); > > > + memcpy(buf, dst, dlen); > > > zpool_unmap_handle(entry->pool->zpool, handle); > > > mutex_unlock(acomp_ctx->mutex); > > > > > > /* populate entry */ > > > - entry->offset = offset; > > > + entry->swpentry = swp_entry(type, offset); > > > entry->handle = handle; > > > entry->length = dlen; > > > > > > @@ -1418,7 +1411,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > > /* decompress */ > > > dlen = PAGE_SIZE; > > > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > > > - src += sizeof(struct zswap_header); > > > > > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > > > memcpy(tmp, src, entry->length); > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
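To make the nit concrete: zswap_rb_search() names its rbtree iterator "entry" and its lookup key "offset", while zswap_rb_insert() names its argument "entry" and its iterator "myentry". One hypothetical way the insert loop could read after such a rename (the names "cursor"/"cursor_offset" are purely illustrative and not part of the posted series):

	static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
				   struct zswap_entry **dupentry)
	{
		struct rb_node **link = &root->rb_node, *parent = NULL;
		struct zswap_entry *cursor;	/* node being inspected, was "myentry" */
		pgoff_t cursor_offset, entry_offset = swp_offset(entry->swpentry);

		while (*link) {
			parent = *link;
			cursor = rb_entry(parent, struct zswap_entry, rbnode);
			cursor_offset = swp_offset(cursor->swpentry);
			if (cursor_offset > entry_offset)
				link = &(*link)->rb_left;
			else if (cursor_offset < entry_offset)
				link = &(*link)->rb_right;
			else {
				*dupentry = cursor;
				return -EEXIST;
			}
		}
		rb_link_node(&entry->rbnode, parent, link);
		rb_insert_color(&entry->rbnode, root);
		return 0;
	}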
* Re: [RFC PATCH v2 2/7] mm: zswap: remove page reclaim logic from zbud
  [not found] ` <20230606145611.704392-3-cerasuolodomenico@gmail.com>
@ 2023-06-08 16:02 ` Johannes Weiner
  0 siblings, 0 replies; 28+ messages in thread
From: Johannes Weiner @ 2023-06-08 16:02 UTC
To: Domenico Cerasuolo
Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet,
    sjenning, nphamcs, akpm, linux-kernel, kernel-team

Hi Domenico,

On Tue, Jun 06, 2023 at 04:56:06PM +0200, Domenico Cerasuolo wrote:
> With the recent enhancement to zswap enabling direct page writeback, the
> need for the shrink code in zbud has become obsolete. As a result, this
> commit removes the page reclaim logic from zbud entirely.
>
> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>

I find the changelog a bit difficult to understand. How about:

	Switch zbud to the new generic zswap LRU and remove its custom
	implementation.

> @@ -585,7 +425,7 @@ static struct zpool_driver zbud_zpool_driver = {
> 	.destroy = zbud_zpool_destroy,
> 	.malloc = zbud_zpool_malloc,
> 	.free = zbud_zpool_free,
> -	.shrink = zbud_zpool_shrink,
> +	.shrink = NULL,
> 	.map = zbud_zpool_map,
> 	.unmap = zbud_zpool_unmap,
> 	.total_size = zbud_zpool_total_size,

Like Minchan pointed out in the zsmalloc patch, you can just remove the
line as the member is NULL by default.

With that, please add:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
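As background for that suggestion: any member omitted from a C designated initializer is implicitly zero-initialized, so a function pointer that should be NULL needs no explicit assignment. The hunk can therefore simply drop the line, e.g. (sketch showing only the callbacks visible in the hunk; the remaining members are elided):

	static struct zpool_driver zbud_zpool_driver = {
		/* ... other members unchanged ... */
		.destroy	= zbud_zpool_destroy,
		.malloc		= zbud_zpool_malloc,
		.free		= zbud_zpool_free,
		/* no .shrink line: omitted members default to zero/NULL */
		.map		= zbud_zpool_map,
		.unmap		= zbud_zpool_unmap,
		.total_size	= zbud_zpool_total_size,
	};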
* Re: [RFC PATCH v2 3/7] mm: zswap: remove page reclaim logic from z3fold
  [not found] ` <20230606145611.704392-4-cerasuolodomenico@gmail.com>
@ 2023-06-08 16:05 ` Johannes Weiner
  0 siblings, 0 replies; 28+ messages in thread
From: Johannes Weiner @ 2023-06-08 16:05 UTC
To: Domenico Cerasuolo
Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet,
    sjenning, nphamcs, akpm, linux-kernel, kernel-team

On Tue, Jun 06, 2023 at 04:56:07PM +0200, Domenico Cerasuolo wrote:
> With the recent enhancement to zswap enabling direct page writeback, the
> need for the shrink code in z3fold has become obsolete. As a result,
> this commit removes the page reclaim logic from z3fold entirely.

> @@ -1649,7 +1409,7 @@ static struct zpool_driver z3fold_zpool_driver = {
> 	.destroy = z3fold_zpool_destroy,
> 	.malloc = z3fold_zpool_malloc,
> 	.free = z3fold_zpool_free,
> -	.shrink = z3fold_zpool_shrink,
> +	.shrink = NULL,
> 	.map = z3fold_zpool_map,
> 	.unmap = z3fold_zpool_unmap,
> 	.total_size = z3fold_zpool_total_size,

Same two points of feedback on the changelog and the NULL as for zbud.
Otherwise,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc [not found] ` <20230606145611.704392-5-cerasuolodomenico@gmail.com> @ 2023-06-07 17:23 ` Nhat Pham 2023-06-07 17:45 ` Minchan Kim 2023-06-08 16:07 ` Johannes Weiner 2 siblings, 0 replies; 28+ messages in thread From: Nhat Pham @ 2023-06-07 17:23 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > With the recent enhancement to zswap enabling direct page writeback, the > need for the shrink code in zsmalloc has become obsolete. As a result, > this commit removes the page reclaim logic from zsmalloc entirely. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zsmalloc.c | 393 ++------------------------------------------------ > 1 file changed, 13 insertions(+), 380 deletions(-) > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 02f7f414aade..75386283dba0 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -107,21 +107,8 @@ > */ > #define OBJ_ALLOCATED_TAG 1 > > -#ifdef CONFIG_ZPOOL > -/* > - * The second least-significant bit in the object's header identifies if the > - * value stored at the header is a deferred handle from the last reclaim > - * attempt. > - * > - * As noted above, this is valid because we have room for two bits. > - */ > -#define OBJ_DEFERRED_HANDLE_TAG 2 > -#define OBJ_TAG_BITS 2 > -#define OBJ_TAG_MASK (OBJ_ALLOCATED_TAG | OBJ_DEFERRED_HANDLE_TAG) > -#else > #define OBJ_TAG_BITS 1 > #define OBJ_TAG_MASK OBJ_ALLOCATED_TAG > -#endif /* CONFIG_ZPOOL */ > > #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) > #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) > @@ -227,12 +214,6 @@ struct link_free { > * Handle of allocated object. > */ > unsigned long handle; > -#ifdef CONFIG_ZPOOL > - /* > - * Deferred handle of a reclaimed object. > - */ > - unsigned long deferred_handle; > -#endif > }; > }; > > @@ -250,13 +231,6 @@ struct zs_pool { > /* Compact classes */ > struct shrinker shrinker; > > -#ifdef CONFIG_ZPOOL > - /* List tracking the zspages in LRU order by most recently added object */ > - struct list_head lru; > - struct zpool *zpool; > - const struct zpool_ops *zpool_ops; > -#endif > - > #ifdef CONFIG_ZSMALLOC_STAT > struct dentry *stat_dentry; > #endif > @@ -279,13 +253,6 @@ struct zspage { > unsigned int freeobj; > struct page *first_page; > struct list_head list; /* fullness list */ > - > -#ifdef CONFIG_ZPOOL > - /* links the zspage to the lru list in the pool */ > - struct list_head lru; > - bool under_reclaim; > -#endif > - > struct zs_pool *pool; > rwlock_t lock; > }; > @@ -393,14 +360,7 @@ static void *zs_zpool_create(const char *name, gfp_t gfp, > * different contexts and its caller must provide a valid > * gfp mask. 
> */ > - struct zs_pool *pool = zs_create_pool(name); > - > - if (pool) { > - pool->zpool = zpool; > - pool->zpool_ops = zpool_ops; > - } > - > - return pool; > + return zs_create_pool(name); > } > > static void zs_zpool_destroy(void *pool) > @@ -422,27 +382,6 @@ static void zs_zpool_free(void *pool, unsigned long handle) > zs_free(pool, handle); > } > > -static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries); > - > -static int zs_zpool_shrink(void *pool, unsigned int pages, > - unsigned int *reclaimed) > -{ > - unsigned int total = 0; > - int ret = -EINVAL; > - > - while (total < pages) { > - ret = zs_reclaim_page(pool, 8); > - if (ret < 0) > - break; > - total++; > - } > - > - if (reclaimed) > - *reclaimed = total; > - > - return ret; > -} > - > static void *zs_zpool_map(void *pool, unsigned long handle, > enum zpool_mapmode mm) > { > @@ -481,7 +420,7 @@ static struct zpool_driver zs_zpool_driver = { > .malloc_support_movable = true, > .malloc = zs_zpool_malloc, > .free = zs_zpool_free, > - .shrink = zs_zpool_shrink, > + .shrink = NULL, > .map = zs_zpool_map, > .unmap = zs_zpool_unmap, > .total_size = zs_zpool_total_size, > @@ -884,14 +823,6 @@ static inline bool obj_allocated(struct page *page, void *obj, unsigned long *ph > return obj_tagged(page, obj, phandle, OBJ_ALLOCATED_TAG); > } > > -#ifdef CONFIG_ZPOOL > -static bool obj_stores_deferred_handle(struct page *page, void *obj, > - unsigned long *phandle) > -{ > - return obj_tagged(page, obj, phandle, OBJ_DEFERRED_HANDLE_TAG); > -} > -#endif > - > static void reset_page(struct page *page) > { > __ClearPageMovable(page); > @@ -922,39 +853,6 @@ static int trylock_zspage(struct zspage *zspage) > return 0; > } > > -#ifdef CONFIG_ZPOOL > -static unsigned long find_deferred_handle_obj(struct size_class *class, > - struct page *page, int *obj_idx); > - > -/* > - * Free all the deferred handles whose objects are freed in zs_free. 
> - */ > -static void free_handles(struct zs_pool *pool, struct size_class *class, > - struct zspage *zspage) > -{ > - int obj_idx = 0; > - struct page *page = get_first_page(zspage); > - unsigned long handle; > - > - while (1) { > - handle = find_deferred_handle_obj(class, page, &obj_idx); > - if (!handle) { > - page = get_next_page(page); > - if (!page) > - break; > - obj_idx = 0; > - continue; > - } > - > - cache_free_handle(pool, handle); > - obj_idx++; > - } > -} > -#else > -static inline void free_handles(struct zs_pool *pool, struct size_class *class, > - struct zspage *zspage) {} > -#endif > - > static void __free_zspage(struct zs_pool *pool, struct size_class *class, > struct zspage *zspage) > { > @@ -969,9 +867,6 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class, > VM_BUG_ON(get_zspage_inuse(zspage)); > VM_BUG_ON(fg != ZS_INUSE_RATIO_0); > > - /* Free all deferred handles from zs_free */ > - free_handles(pool, class, zspage); > - > next = page = get_first_page(zspage); > do { > VM_BUG_ON_PAGE(!PageLocked(page), page); > @@ -1006,9 +901,6 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class, > } > > remove_zspage(class, zspage, ZS_INUSE_RATIO_0); > -#ifdef CONFIG_ZPOOL > - list_del(&zspage->lru); > -#endif > __free_zspage(pool, class, zspage); > } > > @@ -1054,11 +946,6 @@ static void init_zspage(struct size_class *class, struct zspage *zspage) > off %= PAGE_SIZE; > } > > -#ifdef CONFIG_ZPOOL > - INIT_LIST_HEAD(&zspage->lru); > - zspage->under_reclaim = false; > -#endif > - > set_freeobj(zspage, 0); > } > > @@ -1525,20 +1412,13 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp) > /* We completely set up zspage so mark them as movable */ > SetZsPageMovable(pool, zspage); > out: > -#ifdef CONFIG_ZPOOL > - /* Add/move zspage to beginning of LRU */ > - if (!list_empty(&zspage->lru)) > - list_del(&zspage->lru); > - list_add(&zspage->lru, &pool->lru); > -#endif > - > spin_unlock(&pool->lock); > > return handle; > } > EXPORT_SYMBOL_GPL(zs_malloc); > > -static void obj_free(int class_size, unsigned long obj, unsigned long *handle) > +static void obj_free(int class_size, unsigned long obj) > { > struct link_free *link; > struct zspage *zspage; > @@ -1554,25 +1434,12 @@ static void obj_free(int class_size, unsigned long obj, unsigned long *handle) > vaddr = kmap_atomic(f_page); > link = (struct link_free *)(vaddr + f_offset); > > - if (handle) { > -#ifdef CONFIG_ZPOOL > - /* Stores the (deferred) handle in the object's header */ > - *handle |= OBJ_DEFERRED_HANDLE_TAG; > - *handle &= ~OBJ_ALLOCATED_TAG; > - > - if (likely(!ZsHugePage(zspage))) > - link->deferred_handle = *handle; > - else > - f_page->index = *handle; > -#endif > - } else { > - /* Insert this object in containing zspage's freelist */ > - if (likely(!ZsHugePage(zspage))) > - link->next = get_freeobj(zspage) << OBJ_TAG_BITS; > - else > - f_page->index = 0; > - set_freeobj(zspage, f_objidx); > - } > + /* Insert this object in containing zspage's freelist */ > + if (likely(!ZsHugePage(zspage))) > + link->next = get_freeobj(zspage) << OBJ_TAG_BITS; > + else > + f_page->index = 0; > + set_freeobj(zspage, f_objidx); > > kunmap_atomic(vaddr); > mod_zspage_inuse(zspage, -1); > @@ -1600,21 +1467,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle) > class = zspage_class(pool, zspage); > > class_stat_dec(class, ZS_OBJS_INUSE, 1); > - > -#ifdef CONFIG_ZPOOL > - if (zspage->under_reclaim) { > - /* > - * Reclaim needs the handles during writeback. 
It'll free > - * them along with the zspage when it's done with them. > - * > - * Record current deferred handle in the object's header. > - */ > - obj_free(class->size, obj, &handle); > - spin_unlock(&pool->lock); > - return; > - } > -#endif > - obj_free(class->size, obj, NULL); > + obj_free(class->size, obj); > > fullness = fix_fullness_group(class, zspage); > if (fullness == ZS_INUSE_RATIO_0) > @@ -1735,18 +1588,6 @@ static unsigned long find_alloced_obj(struct size_class *class, > return find_tagged_obj(class, page, obj_idx, OBJ_ALLOCATED_TAG); > } > > -#ifdef CONFIG_ZPOOL > -/* > - * Find object storing a deferred handle in header in zspage from index object > - * and return handle. > - */ > -static unsigned long find_deferred_handle_obj(struct size_class *class, > - struct page *page, int *obj_idx) > -{ > - return find_tagged_obj(class, page, obj_idx, OBJ_DEFERRED_HANDLE_TAG); > -} > -#endif > - > struct zs_compact_control { > /* Source spage for migration which could be a subpage of zspage */ > struct page *s_page; > @@ -1786,7 +1627,7 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class, > zs_object_copy(class, free_obj, used_obj); > obj_idx++; > record_obj(handle, free_obj); > - obj_free(class->size, used_obj, NULL); > + obj_free(class->size, used_obj); > } > > /* Remember last position in this iteration */ > @@ -1846,7 +1687,7 @@ static int putback_zspage(struct size_class *class, struct zspage *zspage) > return fullness; > } > > -#if defined(CONFIG_ZPOOL) || defined(CONFIG_COMPACTION) > +#ifdef CONFIG_COMPACTION > /* > * To prevent zspage destroy during migration, zspage freeing should > * hold locks of all pages in the zspage. > @@ -1888,24 +1729,7 @@ static void lock_zspage(struct zspage *zspage) > } > migrate_read_unlock(zspage); > } > -#endif /* defined(CONFIG_ZPOOL) || defined(CONFIG_COMPACTION) */ > - > -#ifdef CONFIG_ZPOOL > -/* > - * Unlocks all the pages of the zspage. > - * > - * pool->lock must be held before this function is called > - * to prevent the underlying pages from migrating. 
> - */ > -static void unlock_zspage(struct zspage *zspage) > -{ > - struct page *page = get_first_page(zspage); > - > - do { > - unlock_page(page); > - } while ((page = get_next_page(page)) != NULL); > -} > -#endif /* CONFIG_ZPOOL */ > +#endif /* CONFIG_COMPACTION */ > > static void migrate_lock_init(struct zspage *zspage) > { > @@ -2126,9 +1950,6 @@ static void async_free_zspage(struct work_struct *work) > VM_BUG_ON(fullness != ZS_INUSE_RATIO_0); > class = pool->size_class[class_idx]; > spin_lock(&pool->lock); > -#ifdef CONFIG_ZPOOL > - list_del(&zspage->lru); > -#endif > __free_zspage(pool, class, zspage); > spin_unlock(&pool->lock); > } > @@ -2474,10 +2295,6 @@ struct zs_pool *zs_create_pool(const char *name) > */ > zs_register_shrinker(pool); > > -#ifdef CONFIG_ZPOOL > - INIT_LIST_HEAD(&pool->lru); > -#endif > - > return pool; > > err: > @@ -2520,190 +2337,6 @@ void zs_destroy_pool(struct zs_pool *pool) > } > EXPORT_SYMBOL_GPL(zs_destroy_pool); > > -#ifdef CONFIG_ZPOOL > -static void restore_freelist(struct zs_pool *pool, struct size_class *class, > - struct zspage *zspage) > -{ > - unsigned int obj_idx = 0; > - unsigned long handle, off = 0; /* off is within-page offset */ > - struct page *page = get_first_page(zspage); > - struct link_free *prev_free = NULL; > - void *prev_page_vaddr = NULL; > - > - /* in case no free object found */ > - set_freeobj(zspage, (unsigned int)(-1UL)); > - > - while (page) { > - void *vaddr = kmap_atomic(page); > - struct page *next_page; > - > - while (off < PAGE_SIZE) { > - void *obj_addr = vaddr + off; > - > - /* skip allocated object */ > - if (obj_allocated(page, obj_addr, &handle)) { > - obj_idx++; > - off += class->size; > - continue; > - } > - > - /* free deferred handle from reclaim attempt */ > - if (obj_stores_deferred_handle(page, obj_addr, &handle)) > - cache_free_handle(pool, handle); > - > - if (prev_free) > - prev_free->next = obj_idx << OBJ_TAG_BITS; > - else /* first free object found */ > - set_freeobj(zspage, obj_idx); > - > - prev_free = (struct link_free *)vaddr + off / sizeof(*prev_free); > - /* if last free object in a previous page, need to unmap */ > - if (prev_page_vaddr) { > - kunmap_atomic(prev_page_vaddr); > - prev_page_vaddr = NULL; > - } > - > - obj_idx++; > - off += class->size; > - } > - > - /* > - * Handle the last (full or partial) object on this page. > - */ > - next_page = get_next_page(page); > - if (next_page) { > - if (!prev_free || prev_page_vaddr) { > - /* > - * There is no free object in this page, so we can safely > - * unmap it. > - */ > - kunmap_atomic(vaddr); > - } else { > - /* update prev_page_vaddr since prev_free is on this page */ > - prev_page_vaddr = vaddr; > - } > - } else { /* this is the last page */ > - if (prev_free) { > - /* > - * Reset OBJ_TAG_BITS bit to last link to tell > - * whether it's allocated object or not. 
> - */ > - prev_free->next = -1UL << OBJ_TAG_BITS; > - } > - > - /* unmap previous page (if not done yet) */ > - if (prev_page_vaddr) { > - kunmap_atomic(prev_page_vaddr); > - prev_page_vaddr = NULL; > - } > - > - kunmap_atomic(vaddr); > - } > - > - page = next_page; > - off %= PAGE_SIZE; > - } > -} > - > -static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries) > -{ > - int i, obj_idx, ret = 0; > - unsigned long handle; > - struct zspage *zspage; > - struct page *page; > - int fullness; > - > - /* Lock LRU and fullness list */ > - spin_lock(&pool->lock); > - if (list_empty(&pool->lru)) { > - spin_unlock(&pool->lock); > - return -EINVAL; > - } > - > - for (i = 0; i < retries; i++) { > - struct size_class *class; > - > - zspage = list_last_entry(&pool->lru, struct zspage, lru); > - list_del(&zspage->lru); > - > - /* zs_free may free objects, but not the zspage and handles */ > - zspage->under_reclaim = true; > - > - class = zspage_class(pool, zspage); > - fullness = get_fullness_group(class, zspage); > - > - /* Lock out object allocations and object compaction */ > - remove_zspage(class, zspage, fullness); > - > - spin_unlock(&pool->lock); > - cond_resched(); > - > - /* Lock backing pages into place */ > - lock_zspage(zspage); > - > - obj_idx = 0; > - page = get_first_page(zspage); > - while (1) { > - handle = find_alloced_obj(class, page, &obj_idx); > - if (!handle) { > - page = get_next_page(page); > - if (!page) > - break; > - obj_idx = 0; > - continue; > - } > - > - /* > - * This will write the object and call zs_free. > - * > - * zs_free will free the object, but the > - * under_reclaim flag prevents it from freeing > - * the zspage altogether. This is necessary so > - * that we can continue working with the > - * zspage potentially after the last object > - * has been freed. > - */ > - ret = pool->zpool_ops->evict(pool->zpool, handle); > - if (ret) > - goto next; > - > - obj_idx++; > - } > - > -next: > - /* For freeing the zspage, or putting it back in the pool and LRU list. */ > - spin_lock(&pool->lock); > - zspage->under_reclaim = false; > - > - if (!get_zspage_inuse(zspage)) { > - /* > - * Fullness went stale as zs_free() won't touch it > - * while the page is removed from the pool. Fix it > - * up for the check in __free_zspage(). > - */ > - zspage->fullness = ZS_INUSE_RATIO_0; > - > - __free_zspage(pool, class, zspage); > - spin_unlock(&pool->lock); > - return 0; > - } > - > - /* > - * Eviction fails on one of the handles, so we need to restore zspage. > - * We need to rebuild its freelist (and free stored deferred handles), > - * put it back to the correct size class, and add it to the LRU list. > - */ > - restore_freelist(pool, class, zspage); > - putback_zspage(class, zspage); > - list_add(&zspage->lru, &pool->lru); > - unlock_zspage(zspage); > - } > - > - spin_unlock(&pool->lock); > - return -EAGAIN; > -} > -#endif /* CONFIG_ZPOOL */ > - > static int __init zs_init(void) > { > int ret; > -- > 2.34.1 > LGTM! Thanks for the clean-up, Domenico. Acked-by: Nhat Pham <nphamcs@gmail.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc [not found] ` <20230606145611.704392-5-cerasuolodomenico@gmail.com> 2023-06-07 17:23 ` [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc Nhat Pham @ 2023-06-07 17:45 ` Minchan Kim 2023-06-08 16:07 ` Johannes Weiner 2 siblings, 0 replies; 28+ messages in thread From: Minchan Kim @ 2023-06-07 17:45 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 06, 2023 at 04:56:08PM +0200, Domenico Cerasuolo wrote: > With the recent enhancement to zswap enabling direct page writeback, the > need for the shrink code in zsmalloc has become obsolete. As a result, > this commit removes the page reclaim logic from zsmalloc entirely. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zsmalloc.c | 393 ++------------------------------------------------ > 1 file changed, 13 insertions(+), 380 deletions(-) > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 02f7f414aade..75386283dba0 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -107,21 +107,8 @@ > */ > #define OBJ_ALLOCATED_TAG 1 > > -#ifdef CONFIG_ZPOOL > -/* > - * The second least-significant bit in the object's header identifies if the > - * value stored at the header is a deferred handle from the last reclaim > - * attempt. > - * > - * As noted above, this is valid because we have room for two bits. > - */ > -#define OBJ_DEFERRED_HANDLE_TAG 2 > -#define OBJ_TAG_BITS 2 > -#define OBJ_TAG_MASK (OBJ_ALLOCATED_TAG | OBJ_DEFERRED_HANDLE_TAG) > -#else > #define OBJ_TAG_BITS 1 > #define OBJ_TAG_MASK OBJ_ALLOCATED_TAG > -#endif /* CONFIG_ZPOOL */ > > #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) > #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) > @@ -227,12 +214,6 @@ struct link_free { > * Handle of allocated object. > */ > unsigned long handle; > -#ifdef CONFIG_ZPOOL > - /* > - * Deferred handle of a reclaimed object. > - */ > - unsigned long deferred_handle; > -#endif > }; > }; > > @@ -250,13 +231,6 @@ struct zs_pool { > /* Compact classes */ > struct shrinker shrinker; > > -#ifdef CONFIG_ZPOOL > - /* List tracking the zspages in LRU order by most recently added object */ > - struct list_head lru; > - struct zpool *zpool; > - const struct zpool_ops *zpool_ops; > -#endif > - > #ifdef CONFIG_ZSMALLOC_STAT > struct dentry *stat_dentry; > #endif > @@ -279,13 +253,6 @@ struct zspage { > unsigned int freeobj; > struct page *first_page; > struct list_head list; /* fullness list */ > - > -#ifdef CONFIG_ZPOOL > - /* links the zspage to the lru list in the pool */ > - struct list_head lru; > - bool under_reclaim; > -#endif > - > struct zs_pool *pool; > rwlock_t lock; > }; > @@ -393,14 +360,7 @@ static void *zs_zpool_create(const char *name, gfp_t gfp, > * different contexts and its caller must provide a valid > * gfp mask. 
> */ > - struct zs_pool *pool = zs_create_pool(name); > - > - if (pool) { > - pool->zpool = zpool; > - pool->zpool_ops = zpool_ops; > - } > - > - return pool; > + return zs_create_pool(name); > } > > static void zs_zpool_destroy(void *pool) > @@ -422,27 +382,6 @@ static void zs_zpool_free(void *pool, unsigned long handle) > zs_free(pool, handle); > } > > -static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries); > - > -static int zs_zpool_shrink(void *pool, unsigned int pages, > - unsigned int *reclaimed) > -{ > - unsigned int total = 0; > - int ret = -EINVAL; > - > - while (total < pages) { > - ret = zs_reclaim_page(pool, 8); > - if (ret < 0) > - break; > - total++; > - } > - > - if (reclaimed) > - *reclaimed = total; > - > - return ret; > -} > - > static void *zs_zpool_map(void *pool, unsigned long handle, > enum zpool_mapmode mm) > { > @@ -481,7 +420,7 @@ static struct zpool_driver zs_zpool_driver = { > .malloc_support_movable = true, > .malloc = zs_zpool_malloc, > .free = zs_zpool_free, > - .shrink = zs_zpool_shrink, > + .shrink = NULL, You can simply delete the line instead since the NULL is default behavior. Other than that, Super nice. Thanks, Domenico! Acked-by: Minchan Kim <minchan@kernel.org> ^ permalink raw reply [flat|nested] 28+ messages in thread
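A side note on why the explicit NULL is redundant: C designated initializers zero-initialize every member that is not named, so an unset function pointer in an initialized struct is already NULL. A minimal stand-alone sketch (generic C with made-up names, not the actual kernel structs):

	#include <stdio.h>

	struct ops {
		int  (*shrink)(void *pool);	/* never named below */
		void (*free)(void *pool);
	};

	static const struct ops example_ops = {
		.free = NULL,	/* explicit NULL -- same effect as simply omitting the line */
	};

	int main(void)
	{
		/* prints "shrink is NULL" */
		printf("shrink is %s\n", example_ops.shrink ? "set" : "NULL");
		return 0;
	}

Dropping the `.shrink = NULL` line from the zpool_driver initializers therefore changes nothing at runtime.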
* Re: [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc [not found] ` <20230606145611.704392-5-cerasuolodomenico@gmail.com> 2023-06-07 17:23 ` [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc Nhat Pham 2023-06-07 17:45 ` Minchan Kim @ 2023-06-08 16:07 ` Johannes Weiner 2 siblings, 0 replies; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 16:07 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Tue, Jun 06, 2023 at 04:56:08PM +0200, Domenico Cerasuolo wrote: > With the recent enhancement to zswap enabling direct page writeback, the > need for the shrink code in zsmalloc has become obsolete. As a result, > this commit removes the page reclaim logic from zsmalloc entirely. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zsmalloc.c | 393 ++------------------------------------------------ > 1 file changed, 13 insertions(+), 380 deletions(-) Yowsa. Acked-by: Johannes Weiner <hannes@cmpxchg.org> ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20230606145611.704392-6-cerasuolodomenico@gmail.com>]
* Re: [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface [not found] ` <20230606145611.704392-6-cerasuolodomenico@gmail.com> @ 2023-06-07 9:19 ` Yosry Ahmed 2023-06-08 16:10 ` Johannes Weiner 2023-06-08 17:51 ` Nhat Pham 2 siblings, 0 replies; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:19 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Now that all three zswap backends have removed their shrink code, it is > no longer necessary for the zpool interface to include shrink/writeback > endpoints. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> I will leave reviewing the driver-specific patches to the respective maintainers, but this cleanup LGTM otherwise. Reviewed-by: Yosry Ahmed <yosryahmed@google.com> > --- > include/linux/zpool.h | 19 ++--------------- > mm/z3fold.c | 5 +---- > mm/zbud.c | 5 +---- > mm/zpool.c | 48 ++----------------------------------------- > mm/zsmalloc.c | 5 +---- > mm/zswap.c | 27 +++++++----------------- > 6 files changed, 14 insertions(+), 95 deletions(-) > > diff --git a/include/linux/zpool.h b/include/linux/zpool.h > index e8997010612a..6b15a4213de5 100644 > --- a/include/linux/zpool.h > +++ b/include/linux/zpool.h > @@ -14,10 +14,6 @@ > > struct zpool; > > -struct zpool_ops { > - int (*evict)(struct zpool *pool, unsigned long handle); > -}; > - > /* > * Control how a handle is mapped. It will be ignored if the > * implementation does not support it. Its use is optional. > @@ -40,7 +36,7 @@ enum zpool_mapmode { > bool zpool_has_pool(char *type); > > struct zpool *zpool_create_pool(const char *type, const char *name, > - gfp_t gfp, const struct zpool_ops *ops); > + gfp_t gfp); > > const char *zpool_get_type(struct zpool *pool); > > @@ -53,9 +49,6 @@ int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, > > void zpool_free(struct zpool *pool, unsigned long handle); > > -int zpool_shrink(struct zpool *pool, unsigned int pages, > - unsigned int *reclaimed); > - > void *zpool_map_handle(struct zpool *pool, unsigned long handle, > enum zpool_mapmode mm); > > @@ -72,7 +65,6 @@ u64 zpool_get_total_size(struct zpool *pool); > * @destroy: destroy a pool. > * @malloc: allocate mem from a pool. > * @free: free mem from a pool. > - * @shrink: shrink the pool. > * @sleep_mapped: whether zpool driver can sleep during map. > * @map: map a handle. > * @unmap: unmap a handle. 
> @@ -87,10 +79,7 @@ struct zpool_driver { > atomic_t refcount; > struct list_head list; > > - void *(*create)(const char *name, > - gfp_t gfp, > - const struct zpool_ops *ops, > - struct zpool *zpool); > + void *(*create)(const char *name, gfp_t gfp); > void (*destroy)(void *pool); > > bool malloc_support_movable; > @@ -98,9 +87,6 @@ struct zpool_driver { > unsigned long *handle); > void (*free)(void *pool, unsigned long handle); > > - int (*shrink)(void *pool, unsigned int pages, > - unsigned int *reclaimed); > - > bool sleep_mapped; > void *(*map)(void *pool, unsigned long handle, > enum zpool_mapmode mm); > @@ -113,7 +99,6 @@ void zpool_register_driver(struct zpool_driver *driver); > > int zpool_unregister_driver(struct zpool_driver *driver); > > -bool zpool_evictable(struct zpool *pool); > bool zpool_can_sleep_mapped(struct zpool *pool); > > #endif > diff --git a/mm/z3fold.c b/mm/z3fold.c > index 4af8741553ac..e84de91ecccb 100644 > --- a/mm/z3fold.c > +++ b/mm/z3fold.c > @@ -1364,9 +1364,7 @@ static const struct movable_operations z3fold_mops = { > * zpool > ****************/ > > -static void *z3fold_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *z3fold_zpool_create(const char *name, gfp_t gfp) > { > return z3fold_create_pool(name, gfp); > } > @@ -1409,7 +1407,6 @@ static struct zpool_driver z3fold_zpool_driver = { > .destroy = z3fold_zpool_destroy, > .malloc = z3fold_zpool_malloc, > .free = z3fold_zpool_free, > - .shrink = NULL, > .map = z3fold_zpool_map, > .unmap = z3fold_zpool_unmap, > .total_size = z3fold_zpool_total_size, > diff --git a/mm/zbud.c b/mm/zbud.c > index 19bc662ef5e9..2190cc1f37b3 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -380,9 +380,7 @@ static u64 zbud_get_pool_size(struct zbud_pool *pool) > * zpool > ****************/ > > -static void *zbud_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *zbud_zpool_create(const char *name, gfp_t gfp) > { > return zbud_create_pool(gfp); > } > @@ -425,7 +423,6 @@ static struct zpool_driver zbud_zpool_driver = { > .destroy = zbud_zpool_destroy, > .malloc = zbud_zpool_malloc, > .free = zbud_zpool_free, > - .shrink = NULL, > .map = zbud_zpool_map, > .unmap = zbud_zpool_unmap, > .total_size = zbud_zpool_total_size, > diff --git a/mm/zpool.c b/mm/zpool.c > index 6a19c4a58f77..846410479c2f 100644 > --- a/mm/zpool.c > +++ b/mm/zpool.c > @@ -133,7 +133,6 @@ EXPORT_SYMBOL(zpool_has_pool); > * @type: The type of the zpool to create (e.g. zbud, zsmalloc) > * @name: The name of the zpool (e.g. zram0, zswap) > * @gfp: The GFP flags to use when allocating the pool. > - * @ops: The optional ops callback. > * > * This creates a new zpool of the specified type. The gfp flags will be > * used when allocating memory, if the implementation supports it. If the > @@ -145,8 +144,7 @@ EXPORT_SYMBOL(zpool_has_pool); > * > * Returns: New zpool on success, NULL on failure. 
> */ > -struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp, > - const struct zpool_ops *ops) > +struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp) > { > struct zpool_driver *driver; > struct zpool *zpool; > @@ -173,7 +171,7 @@ struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp, > } > > zpool->driver = driver; > - zpool->pool = driver->create(name, gfp, ops, zpool); > + zpool->pool = driver->create(name, gfp); > > if (!zpool->pool) { > pr_err("couldn't create %s pool\n", type); > @@ -279,30 +277,6 @@ void zpool_free(struct zpool *zpool, unsigned long handle) > zpool->driver->free(zpool->pool, handle); > } > > -/** > - * zpool_shrink() - Shrink the pool size > - * @zpool: The zpool to shrink. > - * @pages: The number of pages to shrink the pool. > - * @reclaimed: The number of pages successfully evicted. > - * > - * This attempts to shrink the actual memory size of the pool > - * by evicting currently used handle(s). If the pool was > - * created with no zpool_ops, or the evict call fails for any > - * of the handles, this will fail. If non-NULL, the @reclaimed > - * parameter will be set to the number of pages reclaimed, > - * which may be more than the number of pages requested. > - * > - * Implementations must guarantee this to be thread-safe. > - * > - * Returns: 0 on success, negative value on error/failure. > - */ > -int zpool_shrink(struct zpool *zpool, unsigned int pages, > - unsigned int *reclaimed) > -{ > - return zpool->driver->shrink ? > - zpool->driver->shrink(zpool->pool, pages, reclaimed) : -EINVAL; > -} > - > /** > * zpool_map_handle() - Map a previously allocated handle into memory > * @zpool: The zpool that the handle was allocated from > @@ -359,24 +333,6 @@ u64 zpool_get_total_size(struct zpool *zpool) > return zpool->driver->total_size(zpool->pool); > } > > -/** > - * zpool_evictable() - Test if zpool is potentially evictable > - * @zpool: The zpool to test > - * > - * Zpool is only potentially evictable when it's created with struct > - * zpool_ops.evict and its driver implements struct zpool_driver.shrink. > - * > - * However, it doesn't necessarily mean driver will use zpool_ops.evict > - * in its implementation of zpool_driver.shrink. It could do internal > - * defragmentation instead. > - * > - * Returns: true if potentially evictable; false otherwise. > - */ > -bool zpool_evictable(struct zpool *zpool) > -{ > - return zpool->driver->shrink; > -} > - > /** > * zpool_can_sleep_mapped - Test if zpool can sleep when do mapped. 
> * @zpool: The zpool to test > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 75386283dba0..634daa19b6c2 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -351,9 +351,7 @@ static void record_obj(unsigned long handle, unsigned long obj) > > #ifdef CONFIG_ZPOOL > > -static void *zs_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *zs_zpool_create(const char *name, gfp_t gfp) > { > /* > * Ignore global gfp flags: zs_malloc() may be invoked from > @@ -420,7 +418,6 @@ static struct zpool_driver zs_zpool_driver = { > .malloc_support_movable = true, > .malloc = zs_zpool_malloc, > .free = zs_zpool_free, > - .shrink = NULL, > .map = zs_zpool_map, > .unmap = zs_zpool_unmap, > .total_size = zs_zpool_total_size, > diff --git a/mm/zswap.c b/mm/zswap.c > index c99bafcefecf..2831bf56b168 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -254,10 +254,6 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > static int zswap_pool_get(struct zswap_pool *pool); > static void zswap_pool_put(struct zswap_pool *pool); > > -static const struct zpool_ops zswap_zpool_ops = { > - .evict = zswap_writeback_entry > -}; > - > static bool zswap_is_full(void) > { > return totalram_pages() * zswap_max_pool_percent / 100 < > @@ -375,12 +371,9 @@ static void zswap_free_entry(struct zswap_entry *entry) > if (!entry->length) > atomic_dec(&zswap_same_filled_pages); > else { > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (!zpool_evictable(entry->pool->zpool)) { > - spin_lock(&entry->pool->lru_lock); > - list_del(&entry->lru); > - spin_unlock(&entry->pool->lru_lock); > - } > + spin_lock(&entry->pool->lru_lock); > + list_del(&entry->lru); > + spin_unlock(&entry->pool->lru_lock); > zpool_free(entry->pool->zpool, entry->handle); > zswap_pool_put(entry->pool); > } > @@ -659,12 +652,8 @@ static void shrink_worker(struct work_struct *w) > shrink_work); > int ret, failures = 0; > > - /* zpool_evictable will be removed once all 3 backends have migrated */ > do { > - if (zpool_evictable(pool->zpool)) > - ret = zpool_shrink(pool->zpool, 1, NULL); > - else > - ret = zswap_shrink(pool); > + ret = zswap_shrink(pool); > if (ret) { > zswap_reject_reclaim_fail++; > if (ret != -EAGAIN) > @@ -702,7 +691,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > /* unique name for each pool specifically required by zsmalloc */ > snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count)); > > - pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops); > + pool->zpool = zpool_create_pool(type, name, gfp); > if (!pool->zpool) { > pr_err("%s zpool not available\n", type); > goto error; > @@ -1388,8 +1377,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > zswap_entry_put(tree, dupentry); > } > } while (ret == -EEXIST); > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + if (entry->length) { > spin_lock(&entry->pool->lru_lock); > list_add(&entry->lru, &entry->pool->lru); > spin_unlock(&entry->pool->lru_lock); > @@ -1495,8 +1483,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > freeentry: > spin_lock(&tree->lock); > zswap_entry_put(tree, entry); > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + if (entry->length) { > 
spin_lock(&entry->pool->lru_lock); > list_move(&entry->lru, &entry->pool->lru); > spin_unlock(&entry->pool->lru_lock); > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface [not found] ` <20230606145611.704392-6-cerasuolodomenico@gmail.com> 2023-06-07 9:19 ` [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface Yosry Ahmed @ 2023-06-08 16:10 ` Johannes Weiner 2023-06-08 17:51 ` Nhat Pham 2 siblings, 0 replies; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 16:10 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Tue, Jun 06, 2023 at 04:56:09PM +0200, Domenico Cerasuolo wrote: > @@ -40,7 +36,7 @@ enum zpool_mapmode { > bool zpool_has_pool(char *type); > > struct zpool *zpool_create_pool(const char *type, const char *name, > - gfp_t gfp, const struct zpool_ops *ops); > + gfp_t gfp); This fits into a single line now. Otherwise, the patch looks great to me. Acked-by: Johannes Weiner <hannes@cmpxchg.org> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface [not found] ` <20230606145611.704392-6-cerasuolodomenico@gmail.com> 2023-06-07 9:19 ` [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface Yosry Ahmed 2023-06-08 16:10 ` Johannes Weiner @ 2023-06-08 17:51 ` Nhat Pham 2 siblings, 0 replies; 28+ messages in thread From: Nhat Pham @ 2023-06-08 17:51 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Now that all three zswap backends have removed their shrink code, it is > no longer necessary for the zpool interface to include shrink/writeback > endpoints. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > include/linux/zpool.h | 19 ++--------------- > mm/z3fold.c | 5 +---- > mm/zbud.c | 5 +---- > mm/zpool.c | 48 ++----------------------------------------- > mm/zsmalloc.c | 5 +---- > mm/zswap.c | 27 +++++++----------------- > 6 files changed, 14 insertions(+), 95 deletions(-) > > diff --git a/include/linux/zpool.h b/include/linux/zpool.h > index e8997010612a..6b15a4213de5 100644 > --- a/include/linux/zpool.h > +++ b/include/linux/zpool.h > @@ -14,10 +14,6 @@ > > struct zpool; > > -struct zpool_ops { > - int (*evict)(struct zpool *pool, unsigned long handle); > -}; > - > /* > * Control how a handle is mapped. It will be ignored if the > * implementation does not support it. Its use is optional. > @@ -40,7 +36,7 @@ enum zpool_mapmode { > bool zpool_has_pool(char *type); > > struct zpool *zpool_create_pool(const char *type, const char *name, > - gfp_t gfp, const struct zpool_ops *ops); > + gfp_t gfp); > > const char *zpool_get_type(struct zpool *pool); > > @@ -53,9 +49,6 @@ int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, > > void zpool_free(struct zpool *pool, unsigned long handle); > > -int zpool_shrink(struct zpool *pool, unsigned int pages, > - unsigned int *reclaimed); > - > void *zpool_map_handle(struct zpool *pool, unsigned long handle, > enum zpool_mapmode mm); > > @@ -72,7 +65,6 @@ u64 zpool_get_total_size(struct zpool *pool); > * @destroy: destroy a pool. > * @malloc: allocate mem from a pool. > * @free: free mem from a pool. > - * @shrink: shrink the pool. > * @sleep_mapped: whether zpool driver can sleep during map. > * @map: map a handle. > * @unmap: unmap a handle. 
> @@ -87,10 +79,7 @@ struct zpool_driver { > atomic_t refcount; > struct list_head list; > > - void *(*create)(const char *name, > - gfp_t gfp, > - const struct zpool_ops *ops, > - struct zpool *zpool); > + void *(*create)(const char *name, gfp_t gfp); > void (*destroy)(void *pool); > > bool malloc_support_movable; > @@ -98,9 +87,6 @@ struct zpool_driver { > unsigned long *handle); > void (*free)(void *pool, unsigned long handle); > > - int (*shrink)(void *pool, unsigned int pages, > - unsigned int *reclaimed); > - > bool sleep_mapped; > void *(*map)(void *pool, unsigned long handle, > enum zpool_mapmode mm); > @@ -113,7 +99,6 @@ void zpool_register_driver(struct zpool_driver *driver); > > int zpool_unregister_driver(struct zpool_driver *driver); > > -bool zpool_evictable(struct zpool *pool); > bool zpool_can_sleep_mapped(struct zpool *pool); > > #endif > diff --git a/mm/z3fold.c b/mm/z3fold.c > index 4af8741553ac..e84de91ecccb 100644 > --- a/mm/z3fold.c > +++ b/mm/z3fold.c > @@ -1364,9 +1364,7 @@ static const struct movable_operations z3fold_mops = { > * zpool > ****************/ > > -static void *z3fold_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *z3fold_zpool_create(const char *name, gfp_t gfp) > { > return z3fold_create_pool(name, gfp); > } > @@ -1409,7 +1407,6 @@ static struct zpool_driver z3fold_zpool_driver = { > .destroy = z3fold_zpool_destroy, > .malloc = z3fold_zpool_malloc, > .free = z3fold_zpool_free, > - .shrink = NULL, > .map = z3fold_zpool_map, > .unmap = z3fold_zpool_unmap, > .total_size = z3fold_zpool_total_size, > diff --git a/mm/zbud.c b/mm/zbud.c > index 19bc662ef5e9..2190cc1f37b3 100644 > --- a/mm/zbud.c > +++ b/mm/zbud.c > @@ -380,9 +380,7 @@ static u64 zbud_get_pool_size(struct zbud_pool *pool) > * zpool > ****************/ > > -static void *zbud_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *zbud_zpool_create(const char *name, gfp_t gfp) > { > return zbud_create_pool(gfp); > } > @@ -425,7 +423,6 @@ static struct zpool_driver zbud_zpool_driver = { > .destroy = zbud_zpool_destroy, > .malloc = zbud_zpool_malloc, > .free = zbud_zpool_free, > - .shrink = NULL, > .map = zbud_zpool_map, > .unmap = zbud_zpool_unmap, > .total_size = zbud_zpool_total_size, > diff --git a/mm/zpool.c b/mm/zpool.c > index 6a19c4a58f77..846410479c2f 100644 > --- a/mm/zpool.c > +++ b/mm/zpool.c > @@ -133,7 +133,6 @@ EXPORT_SYMBOL(zpool_has_pool); > * @type: The type of the zpool to create (e.g. zbud, zsmalloc) > * @name: The name of the zpool (e.g. zram0, zswap) > * @gfp: The GFP flags to use when allocating the pool. > - * @ops: The optional ops callback. > * > * This creates a new zpool of the specified type. The gfp flags will be > * used when allocating memory, if the implementation supports it. If the > @@ -145,8 +144,7 @@ EXPORT_SYMBOL(zpool_has_pool); > * > * Returns: New zpool on success, NULL on failure. 
> */ > -struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp, > - const struct zpool_ops *ops) > +struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp) > { > struct zpool_driver *driver; > struct zpool *zpool; > @@ -173,7 +171,7 @@ struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp, > } > > zpool->driver = driver; > - zpool->pool = driver->create(name, gfp, ops, zpool); > + zpool->pool = driver->create(name, gfp); > > if (!zpool->pool) { > pr_err("couldn't create %s pool\n", type); > @@ -279,30 +277,6 @@ void zpool_free(struct zpool *zpool, unsigned long handle) > zpool->driver->free(zpool->pool, handle); > } > > -/** > - * zpool_shrink() - Shrink the pool size > - * @zpool: The zpool to shrink. > - * @pages: The number of pages to shrink the pool. > - * @reclaimed: The number of pages successfully evicted. > - * > - * This attempts to shrink the actual memory size of the pool > - * by evicting currently used handle(s). If the pool was > - * created with no zpool_ops, or the evict call fails for any > - * of the handles, this will fail. If non-NULL, the @reclaimed > - * parameter will be set to the number of pages reclaimed, > - * which may be more than the number of pages requested. > - * > - * Implementations must guarantee this to be thread-safe. > - * > - * Returns: 0 on success, negative value on error/failure. > - */ > -int zpool_shrink(struct zpool *zpool, unsigned int pages, > - unsigned int *reclaimed) > -{ > - return zpool->driver->shrink ? > - zpool->driver->shrink(zpool->pool, pages, reclaimed) : -EINVAL; > -} > - > /** > * zpool_map_handle() - Map a previously allocated handle into memory > * @zpool: The zpool that the handle was allocated from > @@ -359,24 +333,6 @@ u64 zpool_get_total_size(struct zpool *zpool) > return zpool->driver->total_size(zpool->pool); > } > > -/** > - * zpool_evictable() - Test if zpool is potentially evictable > - * @zpool: The zpool to test > - * > - * Zpool is only potentially evictable when it's created with struct > - * zpool_ops.evict and its driver implements struct zpool_driver.shrink. > - * > - * However, it doesn't necessarily mean driver will use zpool_ops.evict > - * in its implementation of zpool_driver.shrink. It could do internal > - * defragmentation instead. > - * > - * Returns: true if potentially evictable; false otherwise. > - */ > -bool zpool_evictable(struct zpool *zpool) > -{ > - return zpool->driver->shrink; > -} > - > /** > * zpool_can_sleep_mapped - Test if zpool can sleep when do mapped. 
> * @zpool: The zpool to test > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 75386283dba0..634daa19b6c2 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -351,9 +351,7 @@ static void record_obj(unsigned long handle, unsigned long obj) > > #ifdef CONFIG_ZPOOL > > -static void *zs_zpool_create(const char *name, gfp_t gfp, > - const struct zpool_ops *zpool_ops, > - struct zpool *zpool) > +static void *zs_zpool_create(const char *name, gfp_t gfp) > { > /* > * Ignore global gfp flags: zs_malloc() may be invoked from > @@ -420,7 +418,6 @@ static struct zpool_driver zs_zpool_driver = { > .malloc_support_movable = true, > .malloc = zs_zpool_malloc, > .free = zs_zpool_free, > - .shrink = NULL, > .map = zs_zpool_map, > .unmap = zs_zpool_unmap, > .total_size = zs_zpool_total_size, > diff --git a/mm/zswap.c b/mm/zswap.c > index c99bafcefecf..2831bf56b168 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -254,10 +254,6 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > static int zswap_pool_get(struct zswap_pool *pool); > static void zswap_pool_put(struct zswap_pool *pool); > > -static const struct zpool_ops zswap_zpool_ops = { > - .evict = zswap_writeback_entry > -}; > - > static bool zswap_is_full(void) > { > return totalram_pages() * zswap_max_pool_percent / 100 < > @@ -375,12 +371,9 @@ static void zswap_free_entry(struct zswap_entry *entry) > if (!entry->length) > atomic_dec(&zswap_same_filled_pages); > else { > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (!zpool_evictable(entry->pool->zpool)) { > - spin_lock(&entry->pool->lru_lock); > - list_del(&entry->lru); > - spin_unlock(&entry->pool->lru_lock); > - } > + spin_lock(&entry->pool->lru_lock); > + list_del(&entry->lru); > + spin_unlock(&entry->pool->lru_lock); > zpool_free(entry->pool->zpool, entry->handle); > zswap_pool_put(entry->pool); > } > @@ -659,12 +652,8 @@ static void shrink_worker(struct work_struct *w) > shrink_work); > int ret, failures = 0; > > - /* zpool_evictable will be removed once all 3 backends have migrated */ > do { > - if (zpool_evictable(pool->zpool)) > - ret = zpool_shrink(pool->zpool, 1, NULL); > - else > - ret = zswap_shrink(pool); > + ret = zswap_shrink(pool); > if (ret) { > zswap_reject_reclaim_fail++; > if (ret != -EAGAIN) > @@ -702,7 +691,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > /* unique name for each pool specifically required by zsmalloc */ > snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count)); > > - pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops); > + pool->zpool = zpool_create_pool(type, name, gfp); > if (!pool->zpool) { > pr_err("%s zpool not available\n", type); > goto error; > @@ -1388,8 +1377,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > zswap_entry_put(tree, dupentry); > } > } while (ret == -EEXIST); > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + if (entry->length) { > spin_lock(&entry->pool->lru_lock); > list_add(&entry->lru, &entry->pool->lru); > spin_unlock(&entry->pool->lru_lock); > @@ -1495,8 +1483,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > freeentry: > spin_lock(&tree->lock); > zswap_entry_put(tree, entry); > - /* zpool_evictable will be removed once all 3 backends have migrated */ > - if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + if (entry->length) { > 
spin_lock(&entry->pool->lru_lock); > list_move(&entry->lru, &entry->pool->lru); > spin_unlock(&entry->pool->lru_lock); > -- > 2.34.1 > zsmalloc's shrink removal looks good to me. Acked-by: Nhat Pham <nphamcs@gmail.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20230606145611.704392-7-cerasuolodomenico@gmail.com>]
* Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function [not found] ` <20230606145611.704392-7-cerasuolodomenico@gmail.com> @ 2023-06-07 9:26 ` Yosry Ahmed 2023-06-09 10:23 ` Domenico Cerasuolo 2023-06-08 16:48 ` Johannes Weiner 1 sibling, 1 reply; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:26 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Previously, in zswap, the writeback function was passed to zpool drivers > for their usage in calling the writeback operation. However, since the > drivers did not possess references to entries and the function was > specifically designed to work with handles, the writeback process has > been modified to occur directly within zswap. Consequently, this change > allows for some simplification of the writeback function, taking into > account the updated workflow. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zswap.c | 69 ++++++++++++++---------------------------------------- > 1 file changed, 17 insertions(+), 52 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index 2831bf56b168..ef8604812352 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -250,7 +250,8 @@ static bool zswap_has_pool; > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > zpool_get_type((p)->zpool)) > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > + struct zswap_tree *tree); > static int zswap_pool_get(struct zswap_pool *pool); > static void zswap_pool_put(struct zswap_pool *pool); > > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool) > } > spin_unlock(&tree->lock); > > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > + ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > spin_lock(&tree->lock); > if (ret) { > spin_lock(&pool->lru_lock); > list_move(&lru_entry->lru, &pool->lru); > spin_unlock(&pool->lru_lock); > + zswap_entry_put(tree, tree_entry); > + } else { > + /* free the local reference */ > + zswap_entry_put(tree, tree_entry); > + /* free the entry if it's not been invalidated*/ > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset)) > + zswap_entry_put(tree, tree_entry); The comment that was here about the 2 possible cases was useful imo, maybe keep it? Also, I am not sure why we need to do a tree search vs. just reading the refcount here before the first put. We can even make zswap_entry_put() return the refcount after the put to know if we need the additional put or not. Can anyone think of any reason why we need to explicitly search the tree here? > } > - zswap_entry_put(tree, tree_entry); > spin_unlock(&tree->lock); > > return ret ? -EAGAIN : 0; > @@ -1039,16 +1046,14 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > * the swap cache, the compressed version stored by zswap can be > * freed. 
> */ > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > + struct zswap_tree *tree) > { > - struct zswap_header *zhdr; > - swp_entry_t swpentry; > - struct zswap_tree *tree; > - pgoff_t offset; > - struct zswap_entry *entry; > + swp_entry_t swpentry = zhdr->swpentry; > struct page *page; > struct scatterlist input, output; > struct crypto_acomp_ctx *acomp_ctx; > + struct zpool *pool = entry->pool->zpool; > > u8 *src, *tmp = NULL; > unsigned int dlen; > @@ -1063,25 +1068,6 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > return -ENOMEM; > } > > - /* extract swpentry from data */ > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > - swpentry = zhdr->swpentry; /* here */ > - tree = zswap_trees[swp_type(swpentry)]; > - offset = swp_offset(swpentry); > - zpool_unmap_handle(pool, handle); > - > - /* find and ref zswap entry */ > - spin_lock(&tree->lock); > - entry = zswap_entry_find_get(&tree->rbroot, offset); > - if (!entry) { > - /* entry was invalidated */ > - spin_unlock(&tree->lock); > - kfree(tmp); > - return 0; > - } > - spin_unlock(&tree->lock); > - BUG_ON(offset != entry->offset); > - > /* try to allocate swap cache page */ > switch (zswap_get_swap_cache_page(swpentry, &page)) { > case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */ > @@ -1115,12 +1101,12 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > dlen = PAGE_SIZE; > > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > + zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > src = (u8 *)zhdr + sizeof(struct zswap_header); > if (!zpool_can_sleep_mapped(pool)) { > memcpy(tmp, src, entry->length); > src = tmp; > - zpool_unmap_handle(pool, handle); > + zpool_unmap_handle(pool, entry->handle); > } > > mutex_lock(acomp_ctx->mutex); > @@ -1135,7 +1121,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > if (!zpool_can_sleep_mapped(pool)) > kfree(tmp); > else > - zpool_unmap_handle(pool, handle); > + zpool_unmap_handle(pool, entry->handle); > > BUG_ON(ret); > BUG_ON(dlen != PAGE_SIZE); > @@ -1152,23 +1138,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > put_page(page); > zswap_written_back_pages++; > > - spin_lock(&tree->lock); > - /* drop local reference */ > - zswap_entry_put(tree, entry); > - > - /* > - * There are two possible situations for entry here: > - * (1) refcount is 1(normal case), entry is valid and on the tree > - * (2) refcount is 0, entry is freed and not on the tree > - * because invalidate happened during writeback > - * search the tree and free the entry if find entry > - */ > - if (entry == zswap_rb_search(&tree->rbroot, offset)) > - zswap_entry_put(tree, entry); > - spin_unlock(&tree->lock); > - > return ret; > - > fail: > if (!zpool_can_sleep_mapped(pool)) > kfree(tmp); > @@ -1177,13 +1147,8 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > * if we get here due to ZSWAP_SWAPCACHE_EXIST > * a load may be happening concurrently. > * it is safe and okay to not free the entry. > - * if we free the entry in the following put > * it is also okay to return !0 > */ > - spin_lock(&tree->lock); > - zswap_entry_put(tree, entry); > - spin_unlock(&tree->lock); > - > return ret; > } > > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 28+ messages in thread
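For readers following the refcount discussion above: the put helper is what ties refcounts to tree membership -- when the count drops to zero it both unlinks the entry from the rbtree and frees it, which is why the last put doubles as tree removal. Roughly, the helpers look like this (a paraphrased sketch of the zswap refcount helpers as discussed in this thread, not a verbatim copy of mm/zswap.c):

	/* caller must hold tree->lock */
	static void zswap_entry_get(struct zswap_entry *entry)
	{
		entry->refcount++;
	}

	/* caller must hold tree->lock */
	static void zswap_entry_put(struct zswap_tree *tree,
				    struct zswap_entry *entry)
	{
		int refcount = --entry->refcount;

		BUG_ON(refcount < 0);
		if (refcount == 0) {
			/* last reference: unlink from the rbtree and free */
			zswap_rb_erase(&tree->rbroot, entry);
			zswap_free_entry(entry);
		}
	}

Because other paths (e.g. a concurrent load) can also hold a transient reference, a nonzero refcount alone does not prove the entry is still on the tree -- hence the explicit zswap_rb_search() before dropping the base reference.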
* Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function 2023-06-07 9:26 ` [RFC PATCH v2 6/7] mm: zswap: simplify writeback function Yosry Ahmed @ 2023-06-09 10:23 ` Domenico Cerasuolo 2023-06-09 11:01 ` Yosry Ahmed 0 siblings, 1 reply; 28+ messages in thread From: Domenico Cerasuolo @ 2023-06-09 10:23 UTC (permalink / raw) To: Yosry Ahmed Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 11:27 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > <cerasuolodomenico@gmail.com> wrote: > > > > Previously, in zswap, the writeback function was passed to zpool drivers > > for their usage in calling the writeback operation. However, since the > > drivers did not possess references to entries and the function was > > specifically designed to work with handles, the writeback process has > > been modified to occur directly within zswap. Consequently, this change > > allows for some simplification of the writeback function, taking into > > account the updated workflow. > > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > --- > > mm/zswap.c | 69 ++++++++++++++---------------------------------------- > > 1 file changed, 17 insertions(+), 52 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 2831bf56b168..ef8604812352 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -250,7 +250,8 @@ static bool zswap_has_pool; > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > zpool_get_type((p)->zpool)) > > > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > + struct zswap_tree *tree); > > static int zswap_pool_get(struct zswap_pool *pool); > > static void zswap_pool_put(struct zswap_pool *pool); > > > > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool) > > } > > spin_unlock(&tree->lock); > > > > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > + ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > > > spin_lock(&tree->lock); > > if (ret) { > > spin_lock(&pool->lru_lock); > > list_move(&lru_entry->lru, &pool->lru); > > spin_unlock(&pool->lru_lock); > > + zswap_entry_put(tree, tree_entry); > > + } else { > > + /* free the local reference */ > > + zswap_entry_put(tree, tree_entry); > > + /* free the entry if it's not been invalidated*/ > > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset)) > > + zswap_entry_put(tree, tree_entry); > > The comment that was here about the 2 possible cases was useful imo, > maybe keep it? I honestly found it brittle in that we're not interested in the refcount there, but rather in releasing the base reference that keeps the entry valid. There's not way to know which refcount value it should be though, consider that throughout this series the values can be 1 or 0, but also 2 or 1, depending on where this function is called (zpool_shrink or zswap_shrink). > > Also, I am not sure why we need to do a tree search vs. just reading > the refcount here before the first put. We can even make > zswap_entry_put() return the refcount after the put to know if we need > the additional put or not. > > Can anyone think of any reason why we need to explicitly search the tree here? I think that the reasoning here is that refcount > 0 doesn't necessarily mean that the entry is on the tree. 
I'm not sure if reading refcount directly here would cause an issue now, probably not, but I wonder if bugs could be introduced if the context in which this function is called changes. > > > } > > - zswap_entry_put(tree, tree_entry); > > spin_unlock(&tree->lock); > > > > return ret ? -EAGAIN : 0; > > @@ -1039,16 +1046,14 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > > * the swap cache, the compressed version stored by zswap can be > > * freed. > > */ > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > + struct zswap_tree *tree) > > { > > - struct zswap_header *zhdr; > > - swp_entry_t swpentry; > > - struct zswap_tree *tree; > > - pgoff_t offset; > > - struct zswap_entry *entry; > > + swp_entry_t swpentry = zhdr->swpentry; > > struct page *page; > > struct scatterlist input, output; > > struct crypto_acomp_ctx *acomp_ctx; > > + struct zpool *pool = entry->pool->zpool; > > > > u8 *src, *tmp = NULL; > > unsigned int dlen; > > @@ -1063,25 +1068,6 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > return -ENOMEM; > > } > > > > - /* extract swpentry from data */ > > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > > - swpentry = zhdr->swpentry; /* here */ > > - tree = zswap_trees[swp_type(swpentry)]; > > - offset = swp_offset(swpentry); > > - zpool_unmap_handle(pool, handle); > > - > > - /* find and ref zswap entry */ > > - spin_lock(&tree->lock); > > - entry = zswap_entry_find_get(&tree->rbroot, offset); > > - if (!entry) { > > - /* entry was invalidated */ > > - spin_unlock(&tree->lock); > > - kfree(tmp); > > - return 0; > > - } > > - spin_unlock(&tree->lock); > > - BUG_ON(offset != entry->offset); > > - > > /* try to allocate swap cache page */ > > switch (zswap_get_swap_cache_page(swpentry, &page)) { > > case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */ > > @@ -1115,12 +1101,12 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > dlen = PAGE_SIZE; > > > > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > > + zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > src = (u8 *)zhdr + sizeof(struct zswap_header); > > if (!zpool_can_sleep_mapped(pool)) { > > memcpy(tmp, src, entry->length); > > src = tmp; > > - zpool_unmap_handle(pool, handle); > > + zpool_unmap_handle(pool, entry->handle); > > } > > > > mutex_lock(acomp_ctx->mutex); > > @@ -1135,7 +1121,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > if (!zpool_can_sleep_mapped(pool)) > > kfree(tmp); > > else > > - zpool_unmap_handle(pool, handle); > > + zpool_unmap_handle(pool, entry->handle); > > > > BUG_ON(ret); > > BUG_ON(dlen != PAGE_SIZE); > > @@ -1152,23 +1138,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > put_page(page); > > zswap_written_back_pages++; > > > > - spin_lock(&tree->lock); > > - /* drop local reference */ > > - zswap_entry_put(tree, entry); > > - > > - /* > > - * There are two possible situations for entry here: > > - * (1) refcount is 1(normal case), entry is valid and on the tree > > - * (2) refcount is 0, entry is freed and not on the tree > > - * because invalidate happened during writeback > > - * search the tree and free the entry if find entry > > - */ > > - if (entry == zswap_rb_search(&tree->rbroot, offset)) > > - zswap_entry_put(tree, entry); > > - 
spin_unlock(&tree->lock); > > - > > return ret; > > - > > fail: > > if (!zpool_can_sleep_mapped(pool)) > > kfree(tmp); > > @@ -1177,13 +1147,8 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > * if we get here due to ZSWAP_SWAPCACHE_EXIST > > * a load may be happening concurrently. > > * it is safe and okay to not free the entry. > > - * if we free the entry in the following put > > * it is also okay to return !0 > > */ > > - spin_lock(&tree->lock); > > - zswap_entry_put(tree, entry); > > - spin_unlock(&tree->lock); > > - > > return ret; > > } > > > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function 2023-06-09 10:23 ` Domenico Cerasuolo @ 2023-06-09 11:01 ` Yosry Ahmed 0 siblings, 0 replies; 28+ messages in thread From: Yosry Ahmed @ 2023-06-09 11:01 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Fri, Jun 9, 2023 at 3:23 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > On Wed, Jun 7, 2023 at 11:27 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > > <cerasuolodomenico@gmail.com> wrote: > > > > > > Previously, in zswap, the writeback function was passed to zpool drivers > > > for their usage in calling the writeback operation. However, since the > > > drivers did not possess references to entries and the function was > > > specifically designed to work with handles, the writeback process has > > > been modified to occur directly within zswap. Consequently, this change > > > allows for some simplification of the writeback function, taking into > > > account the updated workflow. > > > > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > > --- > > > mm/zswap.c | 69 ++++++++++++++---------------------------------------- > > > 1 file changed, 17 insertions(+), 52 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index 2831bf56b168..ef8604812352 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -250,7 +250,8 @@ static bool zswap_has_pool; > > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > > zpool_get_type((p)->zpool)) > > > > > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > > > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > > + struct zswap_tree *tree); > > > static int zswap_pool_get(struct zswap_pool *pool); > > > static void zswap_pool_put(struct zswap_pool *pool); > > > > > > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool) > > > } > > > spin_unlock(&tree->lock); > > > > > > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > > + ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > > > > > spin_lock(&tree->lock); > > > if (ret) { > > > spin_lock(&pool->lru_lock); > > > list_move(&lru_entry->lru, &pool->lru); > > > spin_unlock(&pool->lru_lock); > > > + zswap_entry_put(tree, tree_entry); > > > + } else { > > > + /* free the local reference */ > > > + zswap_entry_put(tree, tree_entry); > > > + /* free the entry if it's not been invalidated*/ > > > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset)) > > > + zswap_entry_put(tree, tree_entry); > > > > The comment that was here about the 2 possible cases was useful imo, > > maybe keep it? > > I honestly found it brittle in that we're not interested in the refcount there, > but rather in releasing the base reference that keeps the entry valid. > There's not way to know which refcount value it should be though, consider > that throughout this series the values can be 1 or 0, but also 2 or 1, > depending on where this function is called (zpool_shrink or zswap_shrink). Yeah looking at it with fresh eyes makes sense, we want to invalidate the entry, not really caring about the refcount (see below). > > > > > Also, I am not sure why we need to do a tree search vs. just reading > > the refcount here before the first put. 
We can even make > > zswap_entry_put() return the refcount after the put to know if we need > > the additional put or not. > > > > Can anyone think of any reason why we need to explicitly search the tree here? > > I think that the reasoning here is that refcount > 0 doesn't necessarily mean > that the entry is on the tree. I'm not sure if reading refcount directly here > would cause an issue now, probably not, but I wonder if bugs could be > introduced if the context in which this function is called changes. Yeah I agree. As Johannes suggested, using zswap_invalidate_entry() (from my exclusive loads patch in mm-unstable) would be best here imo. > > > > > > } > > > - zswap_entry_put(tree, tree_entry); > > > spin_unlock(&tree->lock); > > > > > > return ret ? -EAGAIN : 0; > > > @@ -1039,16 +1046,14 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, > > > * the swap cache, the compressed version stored by zswap can be > > > * freed. > > > */ > > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > > + struct zswap_tree *tree) > > > { > > > - struct zswap_header *zhdr; > > > - swp_entry_t swpentry; > > > - struct zswap_tree *tree; > > > - pgoff_t offset; > > > - struct zswap_entry *entry; > > > + swp_entry_t swpentry = zhdr->swpentry; > > > struct page *page; > > > struct scatterlist input, output; > > > struct crypto_acomp_ctx *acomp_ctx; > > > + struct zpool *pool = entry->pool->zpool; > > > > > > u8 *src, *tmp = NULL; > > > unsigned int dlen; > > > @@ -1063,25 +1068,6 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > > return -ENOMEM; > > > } > > > > > > - /* extract swpentry from data */ > > > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > > > - swpentry = zhdr->swpentry; /* here */ > > > - tree = zswap_trees[swp_type(swpentry)]; > > > - offset = swp_offset(swpentry); > > > - zpool_unmap_handle(pool, handle); > > > - > > > - /* find and ref zswap entry */ > > > - spin_lock(&tree->lock); > > > - entry = zswap_entry_find_get(&tree->rbroot, offset); > > > - if (!entry) { > > > - /* entry was invalidated */ > > > - spin_unlock(&tree->lock); > > > - kfree(tmp); > > > - return 0; > > > - } > > > - spin_unlock(&tree->lock); > > > - BUG_ON(offset != entry->offset); > > > - > > > /* try to allocate swap cache page */ > > > switch (zswap_get_swap_cache_page(swpentry, &page)) { > > > case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */ > > > @@ -1115,12 +1101,12 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > > acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); > > > dlen = PAGE_SIZE; > > > > > > - zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO); > > > + zhdr = zpool_map_handle(pool, entry->handle, ZPOOL_MM_RO); > > > src = (u8 *)zhdr + sizeof(struct zswap_header); > > > if (!zpool_can_sleep_mapped(pool)) { > > > memcpy(tmp, src, entry->length); > > > src = tmp; > > > - zpool_unmap_handle(pool, handle); > > > + zpool_unmap_handle(pool, entry->handle); > > > } > > > > > > mutex_lock(acomp_ctx->mutex); > > > @@ -1135,7 +1121,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > > if (!zpool_can_sleep_mapped(pool)) > > > kfree(tmp); > > > else > > > - zpool_unmap_handle(pool, handle); > > > + zpool_unmap_handle(pool, entry->handle); > > > > > > BUG_ON(ret); > > > BUG_ON(dlen != PAGE_SIZE); > > > @@ -1152,23 +1138,7 @@ static int zswap_writeback_entry(struct 
zpool *pool, unsigned long handle) > > > put_page(page); > > > zswap_written_back_pages++; > > > > > > - spin_lock(&tree->lock); > > > - /* drop local reference */ > > > - zswap_entry_put(tree, entry); > > > - > > > - /* > > > - * There are two possible situations for entry here: > > > - * (1) refcount is 1(normal case), entry is valid and on the tree > > > - * (2) refcount is 0, entry is freed and not on the tree > > > - * because invalidate happened during writeback > > > - * search the tree and free the entry if find entry > > > - */ > > > - if (entry == zswap_rb_search(&tree->rbroot, offset)) > > > - zswap_entry_put(tree, entry); > > > - spin_unlock(&tree->lock); > > > - > > > return ret; > > > - > > > fail: > > > if (!zpool_can_sleep_mapped(pool)) > > > kfree(tmp); > > > @@ -1177,13 +1147,8 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) > > > * if we get here due to ZSWAP_SWAPCACHE_EXIST > > > * a load may be happening concurrently. > > > * it is safe and okay to not free the entry. > > > - * if we free the entry in the following put > > > * it is also okay to return !0 > > > */ > > > - spin_lock(&tree->lock); > > > - zswap_entry_put(tree, entry); > > > - spin_unlock(&tree->lock); > > > - > > > return ret; > > > } > > > > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function [not found] ` <20230606145611.704392-7-cerasuolodomenico@gmail.com> 2023-06-07 9:26 ` [RFC PATCH v2 6/7] mm: zswap: simplify writeback function Yosry Ahmed @ 2023-06-08 16:48 ` Johannes Weiner 2023-06-09 11:05 ` Domenico Cerasuolo 1 sibling, 1 reply; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 16:48 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team Hi Domenico, On Tue, Jun 06, 2023 at 04:56:10PM +0200, Domenico Cerasuolo wrote: > Previously, in zswap, the writeback function was passed to zpool drivers > for their usage in calling the writeback operation. However, since the > drivers did not possess references to entries and the function was > specifically designed to work with handles, the writeback process has > been modified to occur directly within zswap. I'm having trouble parsing this sentence. > Consequently, this change allows for some simplification of the > writeback function, taking into account the updated workflow. How about: zswap_writeback_entry() used to be a callback for the backends, which don't know about struct zswap_entry. Now that the only user is the generic zswap LRU reclaimer, it can be simplified: pass the pinned zswap_entry directly, and consolidate the refcount management in the shrink function. > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zswap.c | 69 ++++++++++++++---------------------------------------- > 1 file changed, 17 insertions(+), 52 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index 2831bf56b168..ef8604812352 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -250,7 +250,8 @@ static bool zswap_has_pool; > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > zpool_get_type((p)->zpool)) > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > + struct zswap_tree *tree); > static int zswap_pool_get(struct zswap_pool *pool); > static void zswap_pool_put(struct zswap_pool *pool); > > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool) > } > spin_unlock(&tree->lock); > > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > + ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > spin_lock(&tree->lock); > if (ret) { > spin_lock(&pool->lru_lock); > list_move(&lru_entry->lru, &pool->lru); > spin_unlock(&pool->lru_lock); This could use a comment. /* Writeback failed, put entry back on LRU */ > + zswap_entry_put(tree, tree_entry); This put is a common factor in both branches, so you can consolidate. > + } else { > + /* free the local reference */ > + zswap_entry_put(tree, tree_entry); > + /* free the entry if it's not been invalidated*/ Missing space between text and */ > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset)) > + zswap_entry_put(tree, tree_entry); > } > - zswap_entry_put(tree, tree_entry); > spin_unlock(&tree->lock); The success path freeing (hopefully the common path) is now unfortunately hidden in fairly deep indentation. I think this would be better with a goto for the error case. All together, something like this? if (ret) { /* Writeback failed, put entry back on LRU */ ... goto put_unlock; } /* * Writeback started successfully, the page now belongs to the * swapcache. 
Drop the base ref from the tree - unless invalidate * already took it out while we had the tree->lock released for IO. */ if (lru_entry == zswap_rb_search(&tree->rb_root, swpoffset)) zswap_entry_put(tree, entry); put_unlock: /* Drop local reference */ zswap_entry_put(tree, tree_entry); spin_unlock(&tree->lock); return ret ? -EAGAIN : 0; Btw, it's unsettling that we drop the tree reference without explicitly removing the entry from the tree. We rely on the final put that frees the entry to do tree removal, but this doesn't happen if somebody else is holding a reference; the entry can remain in the tree long after we've officially handed over ownership of the data to swapcache. TTBOMK there are currently no bugs because of that, but it's a data corruption waiting to happen. This should be: /* * Writeback started successfully, the page now belongs to the * swapcache. Drop the entry from zswap - unless invalidate already * took it out while we had the tree->lock released for IO. */ if (entry == zswap_rb_search(&tree->rb_root, swpoffset)) zswap_invalidate_entry(tree, entry); Would you care to send a follow-up patch? ^ permalink raw reply [flat|nested] 28+ messages in thread
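For reference, the zswap_invalidate_entry() helper suggested here is the one introduced by the exclusive-loads work mentioned elsewhere in the thread; its job is to unlink the entry from the tree and drop the base reference in one step. A rough sketch of its shape (hedged -- see the exclusive-loads patch in mm-unstable for the authoritative version):

	/* called with tree->lock held */
	static void zswap_invalidate_entry(struct zswap_tree *tree,
					   struct zswap_entry *entry)
	{
		/* unlink from the rbtree */
		zswap_rb_erase(&tree->rbroot, entry);

		/* drop the base reference taken at entry creation */
		zswap_entry_put(tree, entry);
	}

Using it in the writeback success path makes the hand-over to the swapcache explicit instead of relying on whichever put happens to be the last one.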
* Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function 2023-06-08 16:48 ` Johannes Weiner @ 2023-06-09 11:05 ` Domenico Cerasuolo 0 siblings, 0 replies; 28+ messages in thread From: Domenico Cerasuolo @ 2023-06-09 11:05 UTC (permalink / raw) To: Johannes Weiner Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Thu, Jun 8, 2023 at 6:48 PM Johannes Weiner <hannes@cmpxchg.org> wrote: > > Hi Domenico, > > On Tue, Jun 06, 2023 at 04:56:10PM +0200, Domenico Cerasuolo wrote: > > Previously, in zswap, the writeback function was passed to zpool drivers > > for their usage in calling the writeback operation. However, since the > > drivers did not possess references to entries and the function was > > specifically designed to work with handles, the writeback process has > > been modified to occur directly within zswap. > > I'm having trouble parsing this sentence. > > > Consequently, this change allows for some simplification of the > > writeback function, taking into account the updated workflow. > > How about: > > zswap_writeback_entry() used to be a callback for the backends, which > don't know about struct zswap_entry. > > Now that the only user is the generic zswap LRU reclaimer, it can be > simplified: pass the pinned zswap_entry directly, and consolidate the > refcount management in the shrink function. Sounds clearer, will update. > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > --- > > mm/zswap.c | 69 ++++++++++++++---------------------------------------- > > 1 file changed, 17 insertions(+), 52 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 2831bf56b168..ef8604812352 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -250,7 +250,8 @@ static bool zswap_has_pool; > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > zpool_get_type((p)->zpool)) > > > > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle); > > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr, > > + struct zswap_tree *tree); > > static int zswap_pool_get(struct zswap_pool *pool); > > static void zswap_pool_put(struct zswap_pool *pool); > > > > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool) > > } > > spin_unlock(&tree->lock); > > > > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > + ret = zswap_writeback_entry(lru_entry, zhdr, tree); > > > > spin_lock(&tree->lock); > > if (ret) { > > spin_lock(&pool->lru_lock); > > list_move(&lru_entry->lru, &pool->lru); > > spin_unlock(&pool->lru_lock); > > This could use a comment. > > /* Writeback failed, put entry back on LRU */ > > > + zswap_entry_put(tree, tree_entry); > > This put is a common factor in both branches, so you can consolidate. > > > + } else { > > + /* free the local reference */ > > + zswap_entry_put(tree, tree_entry); > > + /* free the entry if it's not been invalidated*/ > > Missing space between text and */ > > > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset)) > > + zswap_entry_put(tree, tree_entry); > > } > > - zswap_entry_put(tree, tree_entry); > > spin_unlock(&tree->lock); > > The success path freeing (hopefully the common path) is now > unfortunately hidden in fairly deep indentation. I think this would be > better with a goto for the error case. > > All together, something like this? > > if (ret) { > /* Writeback failed, put entry back on LRU */ > ... 
> goto put_unlock; > } > > /* > * Writeback started successfully, the page now belongs to the > * swapcache. Drop the base ref from the tree - unless invalidate > * already took it out while we had the tree->lock released for IO. > */ > if (lru_entry == zswap_rb_search(&tree->rb_root, swpoffset)) > zswap_entry_put(tree, entry); > > put_unlock: > /* Drop local reference */ > zswap_entry_put(tree, tree_entry); > spin_unlock(&tree->lock); > > return ret ? -EAGAIN : 0; > This feedback overlaps with the one in patch 1/7, I'm integrating them so that in patch #1, the invalidation check is done only with rb search and a first `goto unlock` for error. Then here the base reference-drop is added after the `ret` check, and errors go to `put_unlock`: if (ret) { /* Writeback failed, put entry back on LRU */ spin_lock(&pool->lru_lock); list_move(&entry->lru, &pool->lru); spin_unlock(&pool->lru_lock); goto put_unlock; } /* Check for invalidate() race */ if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) goto put_unlock; /* Drop base reference */ zswap_entry_put(tree, entry); put_unlock: /* Drop local reference */ zswap_entry_put(tree, entry); unlock: spin_unlock(&tree->lock); return ret ? -EAGAIN : 0; > Btw, it's unsettling that we drop the tree reference without > explicitly removing the entry from the tree. We rely on the final put > that frees the entry to do tree removal, but this doesn't happen if > somebody else is holding a reference; the entry can remain in the tree > long after we've officially handed over ownership of the data to > swapcache. TTBOMK there are currently no bugs because of that, but > it's a data corruption waiting to happen. > > This should be: > > /* > * Writeback started successfully, the page now belongs to the > * swapcache. Drop the entry from zswap - unless invalidate already > * took it out while we had the tree->lock released for IO. > */ > if (entry == zswap_rb_search(&tree->rb_root, swpoffset)) > zswap_invalidate_entry(tree, entry); > > Would you care to send a follow-up patch? Makes total sense, thanks will send a patch for this. ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <20230606145611.704392-2-cerasuolodomenico@gmail.com>]
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism [not found] ` <20230606145611.704392-2-cerasuolodomenico@gmail.com> @ 2023-06-07 8:14 ` Yosry Ahmed 2023-06-07 9:22 ` Domenico Cerasuolo 2023-06-07 21:39 ` Nhat Pham ` (2 subsequent siblings) 3 siblings, 1 reply; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 8:14 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink > function, which is called from zpool_shrink. However, with this commit, > a unified shrink function is added to zswap. The ultimate goal is to > eliminate the need for zpool_shrink once all zpool implementations have > dropped their shrink code. > > To ensure the functionality of each commit, this change focuses solely > on adding the mechanism itself. No modifications are made to > the backends, meaning that functionally, there are no immediate changes. > The zswap mechanism will only come into effect once the backends have > removed their shrink code. The subsequent commits will address the > modifications needed in the backends. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zswap.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 91 insertions(+), 5 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index bcb82e09eb64..c99bafcefecf 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -150,6 +150,12 @@ struct crypto_acomp_ctx { > struct mutex *mutex; > }; > > +/* > + * The lock ordering is zswap_tree.lock -> zswap_pool.lru_lock. > + * The only case where lru_lock is not acquired while holding tree.lock is > + * when a zswap_entry is taken off the lru for writeback, in that case it > + * needs to be verified that it's still valid in the tree. > + */ > struct zswap_pool { > struct zpool *zpool; > struct crypto_acomp_ctx __percpu *acomp_ctx; > @@ -159,6 +165,8 @@ struct zswap_pool { > struct work_struct shrink_work; > struct hlist_node node; > char tfm_name[CRYPTO_MAX_ALG_NAME]; > + struct list_head lru; > + spinlock_t lru_lock; > }; > > /* > @@ -176,10 +184,12 @@ struct zswap_pool { > * be held while changing the refcount. Since the lock must > * be held, there is no reason to also make refcount atomic. > * length - the length in bytes of the compressed page data. Needed during > - * decompression. For a same value filled page length is 0. > + * decompression. For a same value filled page length is 0, and both > + * pool and lru are invalid and must be ignored. > * pool - the zswap_pool the entry's data is in > * handle - zpool allocation handle that stores the compressed page data > * value - value of the same-value filled pages which have same content > + * lru - handle to the pool's lru used to evict pages. 
> */ > struct zswap_entry { > struct rb_node rbnode; > @@ -192,6 +202,7 @@ struct zswap_entry { > unsigned long value; > }; > struct obj_cgroup *objcg; > + struct list_head lru; > }; > > struct zswap_header { > @@ -364,6 +375,12 @@ static void zswap_free_entry(struct zswap_entry *entry) > if (!entry->length) > atomic_dec(&zswap_same_filled_pages); > else { > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (!zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_del(&entry->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > zpool_free(entry->pool->zpool, entry->handle); > zswap_pool_put(entry->pool); > } > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > return NULL; > } > > +static int zswap_shrink(struct zswap_pool *pool) Nit: rename to zswap_shrink_one() so that it's clear we always writeback one entry per call? > +{ > + struct zswap_entry *lru_entry, *tree_entry = NULL; > + struct zswap_header *zhdr; > + struct zswap_tree *tree; > + int swpoffset; > + int ret; > + > + /* get a reclaimable entry from LRU */ > + spin_lock(&pool->lru_lock); > + if (list_empty(&pool->lru)) { > + spin_unlock(&pool->lru_lock); > + return -EINVAL; > + } > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > + list_del_init(&lru_entry->lru); > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > + /* > + * Once the pool lock is dropped, the lru_entry might get freed. The Nit: lru lock* > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > + * until the entry is verified to still be alive in the tree. > + */ > + swpoffset = swp_offset(zhdr->swpentry); > + spin_unlock(&pool->lru_lock); > + > + /* hold a reference from tree so it won't be freed during writeback */ > + spin_lock(&tree->lock); > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > + if (tree_entry != lru_entry) { > + if (tree_entry) > + zswap_entry_put(tree, tree_entry); > + spin_unlock(&tree->lock); > + return -EAGAIN; > + } > + spin_unlock(&tree->lock); > + > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > + > + spin_lock(&tree->lock); > + if (ret) { > + spin_lock(&pool->lru_lock); > + list_move(&lru_entry->lru, &pool->lru); > + spin_unlock(&pool->lru_lock); > + } > + zswap_entry_put(tree, tree_entry); > + spin_unlock(&tree->lock); > + > + return ret ? -EAGAIN : 0; > +} > + > static void shrink_worker(struct work_struct *w) > { > struct zswap_pool *pool = container_of(w, typeof(*pool), > shrink_work); > int ret, failures = 0; > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > do { > - ret = zpool_shrink(pool->zpool, 1, NULL); > + if (zpool_evictable(pool->zpool)) > + ret = zpool_shrink(pool->zpool, 1, NULL); > + else > + ret = zswap_shrink(pool); > if (ret) { > zswap_reject_reclaim_fail++; > if (ret != -EAGAIN) > @@ -655,6 +728,8 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > */ > kref_init(&pool->kref); > INIT_LIST_HEAD(&pool->list); > + INIT_LIST_HEAD(&pool->lru); > + spin_lock_init(&pool->lru_lock); > INIT_WORK(&pool->shrink_work, shrink_worker); > > zswap_pool_debug("created", pool); > @@ -1270,7 +1345,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > } > > /* store */ > - hlen = zpool_evictable(entry->pool->zpool) ? 
sizeof(zhdr) : 0; > + hlen = sizeof(zhdr); > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > if (zpool_malloc_support_movable(entry->pool->zpool)) > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > @@ -1313,6 +1388,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > zswap_entry_put(tree, dupentry); > } > } while (ret == -EEXIST); > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_add(&entry->lru, &entry->pool->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > spin_unlock(&tree->lock); > > /* update stats */ > @@ -1384,8 +1465,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > /* decompress */ > dlen = PAGE_SIZE; > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > - if (zpool_evictable(entry->pool->zpool)) > - src += sizeof(struct zswap_header); > + src += sizeof(struct zswap_header); > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > memcpy(tmp, src, entry->length); > @@ -1415,6 +1495,12 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > freeentry: > spin_lock(&tree->lock); > zswap_entry_put(tree, entry); > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_move(&entry->lru, &entry->pool->lru); > + spin_unlock(&entry->pool->lru_lock); > + } It's not really this patch's fault, but when merged with commit fe1d1f7d0fb5 ("mm: zswap: support exclusive loads") from mm-unstable [1], and with CONFIG_ZSWAP_EXCLUSIVE_LOADS=y, this causes a crash. This happens because fe1d1f7d0fb5 makes the loads exclusive, so zswap_entry_put(tree, entry) above the added code causes the entry to be freed, then we go ahead and deference multiple fields within it in the added chunk. Moving the chunk above zswap_entry_put() (and consequently also above zswap_invalidate_entry() from fe1d1f7d0fb5) makes this work correctly. Perhaps it would be useful to rebase on top of fe1d1f7d0fb5 for your next version(s), if any. Maybe the outcome would be something like: zswap_entry_put(tree, entry); if (!ret && IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS)) { zswap_invalidate_entry(tree, entry); } else if (entry->length && !zpool_evictable(entry->pool->zpool)) { spin_lock(&entry->pool->lru_lock); list_move(&entry->lru, &entry->pool->lru); spin_unlock(&entry->pool->lru_lock); } I am assuming if we are going to invalidate the entry anyway there is no need to move it to the front of the lru -- but I didn't really think it through. [1]https://lore.kernel.org/lkml/20230530210251.493194-1-yosryahmed@google.com/ > spin_unlock(&tree->lock); > > return ret; > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 28+ messages in thread
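The crash mechanism is easier to see next to zswap's refcounting rule: the refcount is only changed under tree->lock, and the put that drops the last reference erases the entry from the tree and frees it on the spot. A rough paraphrase of the helpers involved, not the exact mm/zswap.c code:

    /* caller must hold tree->lock */
    static void zswap_entry_get(struct zswap_entry *entry)
    {
            entry->refcount++;
    }

    /* caller must hold tree->lock; the entry may be gone once this returns */
    static void zswap_entry_put(struct zswap_tree *tree,
                                struct zswap_entry *entry)
    {
            int refcount = --entry->refcount;

            BUG_ON(refcount < 0);
            if (refcount == 0) {
                    zswap_rb_erase(&tree->rbroot, entry);
                    zswap_free_entry(entry);
            }
    }

With exclusive loads, the load path has effectively dropped the last reference by the time the added LRU chunk runs, so touching entry->pool or entry->lru there is a use-after-free; moving the chunk in front of the put (and of zswap_invalidate_entry()) keeps every dereference inside the entry's lifetime.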
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism 2023-06-07 8:14 ` [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism Yosry Ahmed @ 2023-06-07 9:22 ` Domenico Cerasuolo 2023-06-07 9:31 ` Yosry Ahmed 0 siblings, 1 reply; 28+ messages in thread From: Domenico Cerasuolo @ 2023-06-07 9:22 UTC (permalink / raw) To: Yosry Ahmed Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 10:14 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > <cerasuolodomenico@gmail.com> wrote: > > > > Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink > > function, which is called from zpool_shrink. However, with this commit, > > a unified shrink function is added to zswap. The ultimate goal is to > > eliminate the need for zpool_shrink once all zpool implementations have > > dropped their shrink code. > > > > To ensure the functionality of each commit, this change focuses solely > > on adding the mechanism itself. No modifications are made to > > the backends, meaning that functionally, there are no immediate changes. > > The zswap mechanism will only come into effect once the backends have > > removed their shrink code. The subsequent commits will address the > > modifications needed in the backends. > > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > --- > > mm/zswap.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++--- > > 1 file changed, 91 insertions(+), 5 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index bcb82e09eb64..c99bafcefecf 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -150,6 +150,12 @@ struct crypto_acomp_ctx { > > struct mutex *mutex; > > }; > > > > +/* > > + * The lock ordering is zswap_tree.lock -> zswap_pool.lru_lock. > > + * The only case where lru_lock is not acquired while holding tree.lock is > > + * when a zswap_entry is taken off the lru for writeback, in that case it > > + * needs to be verified that it's still valid in the tree. > > + */ > > struct zswap_pool { > > struct zpool *zpool; > > struct crypto_acomp_ctx __percpu *acomp_ctx; > > @@ -159,6 +165,8 @@ struct zswap_pool { > > struct work_struct shrink_work; > > struct hlist_node node; > > char tfm_name[CRYPTO_MAX_ALG_NAME]; > > + struct list_head lru; > > + spinlock_t lru_lock; > > }; > > > > /* > > @@ -176,10 +184,12 @@ struct zswap_pool { > > * be held while changing the refcount. Since the lock must > > * be held, there is no reason to also make refcount atomic. > > * length - the length in bytes of the compressed page data. Needed during > > - * decompression. For a same value filled page length is 0. > > + * decompression. For a same value filled page length is 0, and both > > + * pool and lru are invalid and must be ignored. > > * pool - the zswap_pool the entry's data is in > > * handle - zpool allocation handle that stores the compressed page data > > * value - value of the same-value filled pages which have same content > > + * lru - handle to the pool's lru used to evict pages. 
> > */ > > struct zswap_entry { > > struct rb_node rbnode; > > @@ -192,6 +202,7 @@ struct zswap_entry { > > unsigned long value; > > }; > > struct obj_cgroup *objcg; > > + struct list_head lru; > > }; > > > > struct zswap_header { > > @@ -364,6 +375,12 @@ static void zswap_free_entry(struct zswap_entry *entry) > > if (!entry->length) > > atomic_dec(&zswap_same_filled_pages); > > else { > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > + if (!zpool_evictable(entry->pool->zpool)) { > > + spin_lock(&entry->pool->lru_lock); > > + list_del(&entry->lru); > > + spin_unlock(&entry->pool->lru_lock); > > + } > > zpool_free(entry->pool->zpool, entry->handle); > > zswap_pool_put(entry->pool); > > } > > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > return NULL; > > } > > > > +static int zswap_shrink(struct zswap_pool *pool) > > Nit: rename to zswap_shrink_one() so that it's clear we always > writeback one entry per call? I named it like that to mirror zpool_shrink but I think that you've got a point in that it might not be very clear that it is shrinking by one page only. What about zswap_reclaim_entry? I'm not a native speaker, but with zswap_shrink_one I wouldn't obviously intend that the "one" refers to an entry. > > > +{ > > + struct zswap_entry *lru_entry, *tree_entry = NULL; > > + struct zswap_header *zhdr; > > + struct zswap_tree *tree; > > + int swpoffset; > > + int ret; > > + > > + /* get a reclaimable entry from LRU */ > > + spin_lock(&pool->lru_lock); > > + if (list_empty(&pool->lru)) { > > + spin_unlock(&pool->lru_lock); > > + return -EINVAL; > > + } > > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > + list_del_init(&lru_entry->lru); > > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > > + /* > > + * Once the pool lock is dropped, the lru_entry might get freed. The > > Nit: lru lock* > > > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > + * until the entry is verified to still be alive in the tree. > > + */ > > + swpoffset = swp_offset(zhdr->swpentry); > > + spin_unlock(&pool->lru_lock); > > + > > + /* hold a reference from tree so it won't be freed during writeback */ > > + spin_lock(&tree->lock); > > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > > + if (tree_entry != lru_entry) { > > + if (tree_entry) > > + zswap_entry_put(tree, tree_entry); > > + spin_unlock(&tree->lock); > > + return -EAGAIN; > > + } > > + spin_unlock(&tree->lock); > > + > > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > + > > + spin_lock(&tree->lock); > > + if (ret) { > > + spin_lock(&pool->lru_lock); > > + list_move(&lru_entry->lru, &pool->lru); > > + spin_unlock(&pool->lru_lock); > > + } > > + zswap_entry_put(tree, tree_entry); > > + spin_unlock(&tree->lock); > > + > > + return ret ? 
-EAGAIN : 0; > > +} > > + > > static void shrink_worker(struct work_struct *w) > > { > > struct zswap_pool *pool = container_of(w, typeof(*pool), > > shrink_work); > > int ret, failures = 0; > > > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > do { > > - ret = zpool_shrink(pool->zpool, 1, NULL); > > + if (zpool_evictable(pool->zpool)) > > + ret = zpool_shrink(pool->zpool, 1, NULL); > > + else > > + ret = zswap_shrink(pool); > > if (ret) { > > zswap_reject_reclaim_fail++; > > if (ret != -EAGAIN) > > @@ -655,6 +728,8 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > > */ > > kref_init(&pool->kref); > > INIT_LIST_HEAD(&pool->list); > > + INIT_LIST_HEAD(&pool->lru); > > + spin_lock_init(&pool->lru_lock); > > INIT_WORK(&pool->shrink_work, shrink_worker); > > > > zswap_pool_debug("created", pool); > > @@ -1270,7 +1345,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > } > > > > /* store */ > > - hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0; > > + hlen = sizeof(zhdr); > > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > > if (zpool_malloc_support_movable(entry->pool->zpool)) > > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > > @@ -1313,6 +1388,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > zswap_entry_put(tree, dupentry); > > } > > } while (ret == -EEXIST); > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > > + spin_lock(&entry->pool->lru_lock); > > + list_add(&entry->lru, &entry->pool->lru); > > + spin_unlock(&entry->pool->lru_lock); > > + } > > spin_unlock(&tree->lock); > > > > /* update stats */ > > @@ -1384,8 +1465,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > /* decompress */ > > dlen = PAGE_SIZE; > > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > > - if (zpool_evictable(entry->pool->zpool)) > > - src += sizeof(struct zswap_header); > > + src += sizeof(struct zswap_header); > > > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > > memcpy(tmp, src, entry->length); > > @@ -1415,6 +1495,12 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > freeentry: > > spin_lock(&tree->lock); > > zswap_entry_put(tree, entry); > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > > + spin_lock(&entry->pool->lru_lock); > > + list_move(&entry->lru, &entry->pool->lru); > > + spin_unlock(&entry->pool->lru_lock); > > + } > > It's not really this patch's fault, but when merged with commit > fe1d1f7d0fb5 ("mm: zswap: support exclusive loads") from mm-unstable > [1], and with CONFIG_ZSWAP_EXCLUSIVE_LOADS=y, this causes a crash. > > This happens because fe1d1f7d0fb5 makes the loads exclusive, so > zswap_entry_put(tree, entry) above the added code causes the entry to > be freed, then we go ahead and deference multiple fields within it in > the added chunk. Moving the chunk above zswap_entry_put() (and > consequently also above zswap_invalidate_entry() from fe1d1f7d0fb5) > makes this work correctly. > > Perhaps it would be useful to rebase on top of fe1d1f7d0fb5 for your > next version(s), if any. Will definitely rebase, I just now saw that you tested the suggested resolution below, thanks, it does make sense. 
> > Maybe the outcome would be something like: > > zswap_entry_put(tree, entry); > if (!ret && IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS)) { > zswap_invalidate_entry(tree, entry); > } else if (entry->length && !zpool_evictable(entry->pool->zpool)) { > spin_lock(&entry->pool->lru_lock); > list_move(&entry->lru, &entry->pool->lru); > spin_unlock(&entry->pool->lru_lock); > } > > I am assuming if we are going to invalidate the entry anyway there is > no need to move it to the front of the lru -- but I didn't really > think it through. > > [1]https://lore.kernel.org/lkml/20230530210251.493194-1-yosryahmed@google.com/ > > > spin_unlock(&tree->lock); > > > > return ret; > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism 2023-06-07 9:22 ` Domenico Cerasuolo @ 2023-06-07 9:31 ` Yosry Ahmed 0 siblings, 0 replies; 28+ messages in thread From: Yosry Ahmed @ 2023-06-07 9:31 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, linux-mm, ddstreet, sjenning, nphamcs, hannes, akpm, linux-kernel, kernel-team On Wed, Jun 7, 2023 at 2:22 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > On Wed, Jun 7, 2023 at 10:14 AM Yosry Ahmed <yosryahmed@google.com> wrote: > > > > On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo > > <cerasuolodomenico@gmail.com> wrote: > > > > > > Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink > > > function, which is called from zpool_shrink. However, with this commit, > > > a unified shrink function is added to zswap. The ultimate goal is to > > > eliminate the need for zpool_shrink once all zpool implementations have > > > dropped their shrink code. > > > > > > To ensure the functionality of each commit, this change focuses solely > > > on adding the mechanism itself. No modifications are made to > > > the backends, meaning that functionally, there are no immediate changes. > > > The zswap mechanism will only come into effect once the backends have > > > removed their shrink code. The subsequent commits will address the > > > modifications needed in the backends. > > > > > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > > > --- > > > mm/zswap.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++--- > > > 1 file changed, 91 insertions(+), 5 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index bcb82e09eb64..c99bafcefecf 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -150,6 +150,12 @@ struct crypto_acomp_ctx { > > > struct mutex *mutex; > > > }; > > > > > > +/* > > > + * The lock ordering is zswap_tree.lock -> zswap_pool.lru_lock. > > > + * The only case where lru_lock is not acquired while holding tree.lock is > > > + * when a zswap_entry is taken off the lru for writeback, in that case it > > > + * needs to be verified that it's still valid in the tree. > > > + */ > > > struct zswap_pool { > > > struct zpool *zpool; > > > struct crypto_acomp_ctx __percpu *acomp_ctx; > > > @@ -159,6 +165,8 @@ struct zswap_pool { > > > struct work_struct shrink_work; > > > struct hlist_node node; > > > char tfm_name[CRYPTO_MAX_ALG_NAME]; > > > + struct list_head lru; > > > + spinlock_t lru_lock; > > > }; > > > > > > /* > > > @@ -176,10 +184,12 @@ struct zswap_pool { > > > * be held while changing the refcount. Since the lock must > > > * be held, there is no reason to also make refcount atomic. > > > * length - the length in bytes of the compressed page data. Needed during > > > - * decompression. For a same value filled page length is 0. > > > + * decompression. For a same value filled page length is 0, and both > > > + * pool and lru are invalid and must be ignored. > > > * pool - the zswap_pool the entry's data is in > > > * handle - zpool allocation handle that stores the compressed page data > > > * value - value of the same-value filled pages which have same content > > > + * lru - handle to the pool's lru used to evict pages. 
> > > */ > > > struct zswap_entry { > > > struct rb_node rbnode; > > > @@ -192,6 +202,7 @@ struct zswap_entry { > > > unsigned long value; > > > }; > > > struct obj_cgroup *objcg; > > > + struct list_head lru; > > > }; > > > > > > struct zswap_header { > > > @@ -364,6 +375,12 @@ static void zswap_free_entry(struct zswap_entry *entry) > > > if (!entry->length) > > > atomic_dec(&zswap_same_filled_pages); > > > else { > > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > > + if (!zpool_evictable(entry->pool->zpool)) { > > > + spin_lock(&entry->pool->lru_lock); > > > + list_del(&entry->lru); > > > + spin_unlock(&entry->pool->lru_lock); > > > + } > > > zpool_free(entry->pool->zpool, entry->handle); > > > zswap_pool_put(entry->pool); > > > } > > > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > > return NULL; > > > } > > > > > > +static int zswap_shrink(struct zswap_pool *pool) > > > > Nit: rename to zswap_shrink_one() so that it's clear we always > > writeback one entry per call? > > I named it like that to mirror zpool_shrink but I think that you've got a point > in that it might not be very clear that it is shrinking by one page only. > What about zswap_reclaim_entry? I'm not a native speaker, but with > zswap_shrink_one I wouldn't obviously intend that the "one" refers to an > entry. I am not a native speaker either, but zswap_reclaim_entry() sounds much better :) > > > > > > +{ > > > + struct zswap_entry *lru_entry, *tree_entry = NULL; > > > + struct zswap_header *zhdr; > > > + struct zswap_tree *tree; > > > + int swpoffset; > > > + int ret; > > > + > > > + /* get a reclaimable entry from LRU */ > > > + spin_lock(&pool->lru_lock); > > > + if (list_empty(&pool->lru)) { > > > + spin_unlock(&pool->lru_lock); > > > + return -EINVAL; > > > + } > > > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > > + list_del_init(&lru_entry->lru); > > > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > > > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > > > + /* > > > + * Once the pool lock is dropped, the lru_entry might get freed. The > > > > Nit: lru lock* > > > > > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > > + * until the entry is verified to still be alive in the tree. > > > + */ > > > + swpoffset = swp_offset(zhdr->swpentry); > > > + spin_unlock(&pool->lru_lock); > > > + > > > + /* hold a reference from tree so it won't be freed during writeback */ > > > + spin_lock(&tree->lock); > > > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > > > + if (tree_entry != lru_entry) { > > > + if (tree_entry) > > > + zswap_entry_put(tree, tree_entry); > > > + spin_unlock(&tree->lock); > > > + return -EAGAIN; > > > + } > > > + spin_unlock(&tree->lock); > > > + > > > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > > + > > > + spin_lock(&tree->lock); > > > + if (ret) { > > > + spin_lock(&pool->lru_lock); > > > + list_move(&lru_entry->lru, &pool->lru); > > > + spin_unlock(&pool->lru_lock); > > > + } > > > + zswap_entry_put(tree, tree_entry); > > > + spin_unlock(&tree->lock); > > > + > > > + return ret ? 
-EAGAIN : 0; > > > +} > > > + > > > static void shrink_worker(struct work_struct *w) > > > { > > > struct zswap_pool *pool = container_of(w, typeof(*pool), > > > shrink_work); > > > int ret, failures = 0; > > > > > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > > do { > > > - ret = zpool_shrink(pool->zpool, 1, NULL); > > > + if (zpool_evictable(pool->zpool)) > > > + ret = zpool_shrink(pool->zpool, 1, NULL); > > > + else > > > + ret = zswap_shrink(pool); > > > if (ret) { > > > zswap_reject_reclaim_fail++; > > > if (ret != -EAGAIN) > > > @@ -655,6 +728,8 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > > > */ > > > kref_init(&pool->kref); > > > INIT_LIST_HEAD(&pool->list); > > > + INIT_LIST_HEAD(&pool->lru); > > > + spin_lock_init(&pool->lru_lock); > > > INIT_WORK(&pool->shrink_work, shrink_worker); > > > > > > zswap_pool_debug("created", pool); > > > @@ -1270,7 +1345,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > } > > > > > > /* store */ > > > - hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0; > > > + hlen = sizeof(zhdr); > > > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > > > if (zpool_malloc_support_movable(entry->pool->zpool)) > > > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > > > @@ -1313,6 +1388,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > > > zswap_entry_put(tree, dupentry); > > > } > > > } while (ret == -EEXIST); > > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > > > + spin_lock(&entry->pool->lru_lock); > > > + list_add(&entry->lru, &entry->pool->lru); > > > + spin_unlock(&entry->pool->lru_lock); > > > + } > > > spin_unlock(&tree->lock); > > > > > > /* update stats */ > > > @@ -1384,8 +1465,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > > /* decompress */ > > > dlen = PAGE_SIZE; > > > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > > > - if (zpool_evictable(entry->pool->zpool)) > > > - src += sizeof(struct zswap_header); > > > + src += sizeof(struct zswap_header); > > > > > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > > > memcpy(tmp, src, entry->length); > > > @@ -1415,6 +1495,12 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > > > freeentry: > > > spin_lock(&tree->lock); > > > zswap_entry_put(tree, entry); > > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > > > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > > > + spin_lock(&entry->pool->lru_lock); > > > + list_move(&entry->lru, &entry->pool->lru); > > > + spin_unlock(&entry->pool->lru_lock); > > > + } > > > > It's not really this patch's fault, but when merged with commit > > fe1d1f7d0fb5 ("mm: zswap: support exclusive loads") from mm-unstable > > [1], and with CONFIG_ZSWAP_EXCLUSIVE_LOADS=y, this causes a crash. > > > > This happens because fe1d1f7d0fb5 makes the loads exclusive, so > > zswap_entry_put(tree, entry) above the added code causes the entry to > > be freed, then we go ahead and deference multiple fields within it in > > the added chunk. Moving the chunk above zswap_entry_put() (and > > consequently also above zswap_invalidate_entry() from fe1d1f7d0fb5) > > makes this work correctly. > > > > Perhaps it would be useful to rebase on top of fe1d1f7d0fb5 for your > > next version(s), if any. 
> > Will definitely rebase, I just now saw that you tested the suggested resolution > below, thanks, it does make sense. > > > > > Maybe the outcome would be something like: > > > > zswap_entry_put(tree, entry); > > if (!ret && IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS)) { > > zswap_invalidate_entry(tree, entry); > > } else if (entry->length && !zpool_evictable(entry->pool->zpool)) { > > spin_lock(&entry->pool->lru_lock); > > list_move(&entry->lru, &entry->pool->lru); > > spin_unlock(&entry->pool->lru_lock); > > } > > > > I am assuming if we are going to invalidate the entry anyway there is > > no need to move it to the front of the lru -- but I didn't really > > think it through. > > > > [1]https://lore.kernel.org/lkml/20230530210251.493194-1-yosryahmed@google.com/ > > > > > spin_unlock(&tree->lock); > > > > > > return ret; > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism [not found] ` <20230606145611.704392-2-cerasuolodomenico@gmail.com> 2023-06-07 8:14 ` [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism Yosry Ahmed @ 2023-06-07 21:39 ` Nhat Pham 2023-06-08 15:58 ` Johannes Weiner 2023-06-08 16:52 ` Johannes Weiner 3 siblings, 0 replies; 28+ messages in thread From: Nhat Pham @ 2023-06-07 21:39 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, hannes, akpm, linux-kernel, kernel-team On Tue, Jun 6, 2023 at 7:56 AM Domenico Cerasuolo <cerasuolodomenico@gmail.com> wrote: > > Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink > function, which is called from zpool_shrink. However, with this commit, > a unified shrink function is added to zswap. The ultimate goal is to > eliminate the need for zpool_shrink once all zpool implementations have > dropped their shrink code. > > To ensure the functionality of each commit, this change focuses solely > on adding the mechanism itself. No modifications are made to > the backends, meaning that functionally, there are no immediate changes. > The zswap mechanism will only come into effect once the backends have > removed their shrink code. The subsequent commits will address the > modifications needed in the backends. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > --- > mm/zswap.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 91 insertions(+), 5 deletions(-) > > diff --git a/mm/zswap.c b/mm/zswap.c > index bcb82e09eb64..c99bafcefecf 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -150,6 +150,12 @@ struct crypto_acomp_ctx { > struct mutex *mutex; > }; > > +/* > + * The lock ordering is zswap_tree.lock -> zswap_pool.lru_lock. > + * The only case where lru_lock is not acquired while holding tree.lock is > + * when a zswap_entry is taken off the lru for writeback, in that case it > + * needs to be verified that it's still valid in the tree. > + */ > struct zswap_pool { > struct zpool *zpool; > struct crypto_acomp_ctx __percpu *acomp_ctx; > @@ -159,6 +165,8 @@ struct zswap_pool { > struct work_struct shrink_work; > struct hlist_node node; > char tfm_name[CRYPTO_MAX_ALG_NAME]; > + struct list_head lru; > + spinlock_t lru_lock; > }; > > /* > @@ -176,10 +184,12 @@ struct zswap_pool { > * be held while changing the refcount. Since the lock must > * be held, there is no reason to also make refcount atomic. > * length - the length in bytes of the compressed page data. Needed during > - * decompression. For a same value filled page length is 0. > + * decompression. For a same value filled page length is 0, and both > + * pool and lru are invalid and must be ignored. > * pool - the zswap_pool the entry's data is in > * handle - zpool allocation handle that stores the compressed page data > * value - value of the same-value filled pages which have same content > + * lru - handle to the pool's lru used to evict pages. 
> */ > struct zswap_entry { > struct rb_node rbnode; > @@ -192,6 +202,7 @@ struct zswap_entry { > unsigned long value; > }; > struct obj_cgroup *objcg; > + struct list_head lru; > }; > > struct zswap_header { > @@ -364,6 +375,12 @@ static void zswap_free_entry(struct zswap_entry *entry) > if (!entry->length) > atomic_dec(&zswap_same_filled_pages); > else { > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (!zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_del(&entry->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > zpool_free(entry->pool->zpool, entry->handle); > zswap_pool_put(entry->pool); > } > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > return NULL; > } > > +static int zswap_shrink(struct zswap_pool *pool) > +{ > + struct zswap_entry *lru_entry, *tree_entry = NULL; > + struct zswap_header *zhdr; > + struct zswap_tree *tree; > + int swpoffset; > + int ret; > + > + /* get a reclaimable entry from LRU */ > + spin_lock(&pool->lru_lock); > + if (list_empty(&pool->lru)) { > + spin_unlock(&pool->lru_lock); > + return -EINVAL; > + } > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > + list_del_init(&lru_entry->lru); > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > + /* > + * Once the pool lock is dropped, the lru_entry might get freed. The > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > + * until the entry is verified to still be alive in the tree. > + */ > + swpoffset = swp_offset(zhdr->swpentry); > + spin_unlock(&pool->lru_lock); > + > + /* hold a reference from tree so it won't be freed during writeback */ > + spin_lock(&tree->lock); > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > + if (tree_entry != lru_entry) { > + if (tree_entry) > + zswap_entry_put(tree, tree_entry); > + spin_unlock(&tree->lock); > + return -EAGAIN; > + } > + spin_unlock(&tree->lock); > + > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > + > + spin_lock(&tree->lock); > + if (ret) { > + spin_lock(&pool->lru_lock); > + list_move(&lru_entry->lru, &pool->lru); > + spin_unlock(&pool->lru_lock); > + } > + zswap_entry_put(tree, tree_entry); > + spin_unlock(&tree->lock); > + > + return ret ? -EAGAIN : 0; > +} > + > static void shrink_worker(struct work_struct *w) > { > struct zswap_pool *pool = container_of(w, typeof(*pool), > shrink_work); > int ret, failures = 0; > > + /* zpool_evictable will be removed once all 3 backends have migrated */ > do { > - ret = zpool_shrink(pool->zpool, 1, NULL); > + if (zpool_evictable(pool->zpool)) > + ret = zpool_shrink(pool->zpool, 1, NULL); > + else > + ret = zswap_shrink(pool); > if (ret) { > zswap_reject_reclaim_fail++; > if (ret != -EAGAIN) > @@ -655,6 +728,8 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor) > */ > kref_init(&pool->kref); > INIT_LIST_HEAD(&pool->list); > + INIT_LIST_HEAD(&pool->lru); > + spin_lock_init(&pool->lru_lock); > INIT_WORK(&pool->shrink_work, shrink_worker); > > zswap_pool_debug("created", pool); > @@ -1270,7 +1345,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > } > > /* store */ > - hlen = zpool_evictable(entry->pool->zpool) ? 
sizeof(zhdr) : 0; > + hlen = sizeof(zhdr); > gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM; > if (zpool_malloc_support_movable(entry->pool->zpool)) > gfp |= __GFP_HIGHMEM | __GFP_MOVABLE; > @@ -1313,6 +1388,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > zswap_entry_put(tree, dupentry); > } > } while (ret == -EEXIST); > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_add(&entry->lru, &entry->pool->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > spin_unlock(&tree->lock); > > /* update stats */ > @@ -1384,8 +1465,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > /* decompress */ > dlen = PAGE_SIZE; > src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO); > - if (zpool_evictable(entry->pool->zpool)) > - src += sizeof(struct zswap_header); > + src += sizeof(struct zswap_header); > > if (!zpool_can_sleep_mapped(entry->pool->zpool)) { > memcpy(tmp, src, entry->length); > @@ -1415,6 +1495,12 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, > freeentry: > spin_lock(&tree->lock); > zswap_entry_put(tree, entry); > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (entry->length && !zpool_evictable(entry->pool->zpool)) { > + spin_lock(&entry->pool->lru_lock); > + list_move(&entry->lru, &entry->pool->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > spin_unlock(&tree->lock); > > return ret; > -- > 2.34.1 > Looks real solid to me! Thanks, Domenico. Acked-by: Nhat Pham <nphamcs@gmail.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
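One rule worth pulling out of the hunks quoted above, since it is easy to lose in the diff: in the store/load/free paths lru_lock nests inside tree->lock, and reclaim is the single exception where an entry is taken off the LRU under lru_lock alone and must then be re-validated under tree->lock before it is trusted again. Schematically, condensed from the quoted patch rather than adding anything new:

    /* store/load/free: lru_lock nests inside tree->lock */
    spin_lock(&tree->lock);
    spin_lock(&entry->pool->lru_lock);
    list_add(&entry->lru, &entry->pool->lru);
    spin_unlock(&entry->pool->lru_lock);
    spin_unlock(&tree->lock);

    /* reclaim: the LRU is walked without tree->lock, so the entry has to be
     * re-checked against the tree before being dereferenced again */
    spin_lock(&pool->lru_lock);
    entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
    list_del_init(&entry->lru);
    spin_unlock(&pool->lru_lock);

    spin_lock(&tree->lock);
    if (entry != zswap_rb_search(&tree->rbroot, swpoffset))
            ret = -EAGAIN;  /* raced with invalidate, don't touch the entry */
    spin_unlock(&tree->lock);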
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism [not found] ` <20230606145611.704392-2-cerasuolodomenico@gmail.com> 2023-06-07 8:14 ` [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism Yosry Ahmed 2023-06-07 21:39 ` Nhat Pham @ 2023-06-08 15:58 ` Johannes Weiner 2023-06-08 16:52 ` Johannes Weiner 3 siblings, 0 replies; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 15:58 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team Hi Domenico, Thanks for incorporating the feedback. Just two more nits: On Tue, Jun 06, 2023 at 04:56:05PM +0200, Domenico Cerasuolo wrote: > Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink > function, which is called from zpool_shrink. However, with this commit, > a unified shrink function is added to zswap. The ultimate goal is to > eliminate the need for zpool_shrink once all zpool implementations have > dropped their shrink code. > > To ensure the functionality of each commit, this change focuses solely > on adding the mechanism itself. No modifications are made to > the backends, meaning that functionally, there are no immediate changes. > The zswap mechanism will only come into effect once the backends have > removed their shrink code. The subsequent commits will address the > modifications needed in the backends. > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> > @@ -364,6 +375,12 @@ static void zswap_free_entry(struct zswap_entry *entry) > if (!entry->length) > atomic_dec(&zswap_same_filled_pages); > else { > + /* zpool_evictable will be removed once all 3 backends have migrated */ > + if (!zpool_evictable(entry->pool->zpool)) { Comment indentation is off. > + spin_lock(&entry->pool->lru_lock); > + list_del(&entry->lru); > + spin_unlock(&entry->pool->lru_lock); > + } > zpool_free(entry->pool->zpool, entry->handle); > zswap_pool_put(entry->pool); > } > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > return NULL; > } > > +static int zswap_shrink(struct zswap_pool *pool) > +{ > + struct zswap_entry *lru_entry, *tree_entry = NULL; > + struct zswap_header *zhdr; > + struct zswap_tree *tree; > + int swpoffset; pgoff_t With that and what Yosry pointed out fixed, please feel free to add Acked-by: Johannes Weiner <hannes@cmpxchg.org> to the next version. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism [not found] ` <20230606145611.704392-2-cerasuolodomenico@gmail.com> ` (2 preceding siblings ...) 2023-06-08 15:58 ` Johannes Weiner @ 2023-06-08 16:52 ` Johannes Weiner 2023-06-08 17:04 ` Johannes Weiner 3 siblings, 1 reply; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 16:52 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Tue, Jun 06, 2023 at 04:56:05PM +0200, Domenico Cerasuolo wrote: > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > return NULL; > } > > +static int zswap_shrink(struct zswap_pool *pool) > +{ > + struct zswap_entry *lru_entry, *tree_entry = NULL; > + struct zswap_header *zhdr; > + struct zswap_tree *tree; > + int swpoffset; > + int ret; > + > + /* get a reclaimable entry from LRU */ > + spin_lock(&pool->lru_lock); > + if (list_empty(&pool->lru)) { > + spin_unlock(&pool->lru_lock); > + return -EINVAL; > + } > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > + list_del_init(&lru_entry->lru); > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > + /* > + * Once the pool lock is dropped, the lru_entry might get freed. The > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > + * until the entry is verified to still be alive in the tree. > + */ > + swpoffset = swp_offset(zhdr->swpentry); > + spin_unlock(&pool->lru_lock); > + > + /* hold a reference from tree so it won't be freed during writeback */ > + spin_lock(&tree->lock); > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > + if (tree_entry != lru_entry) { > + if (tree_entry) > + zswap_entry_put(tree, tree_entry); > + spin_unlock(&tree->lock); > + return -EAGAIN; > + } > + spin_unlock(&tree->lock); > + > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > + > + spin_lock(&tree->lock); > + if (ret) { > + spin_lock(&pool->lru_lock); > + list_move(&lru_entry->lru, &pool->lru); > + spin_unlock(&pool->lru_lock); > + } > + zswap_entry_put(tree, tree_entry); On re-reading this, I find the lru_entry vs tree_entry distinction unnecessarily complicated. Once it's known that the thing coming off the LRU is the same thing as in the tree, there is only "the entry". How about 'entry' and 'tree_entry', and after validation use 'entry' throughout the rest of the function? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism 2023-06-08 16:52 ` Johannes Weiner @ 2023-06-08 17:04 ` Johannes Weiner 2023-06-08 18:45 ` Johannes Weiner 0 siblings, 1 reply; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 17:04 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Thu, Jun 08, 2023 at 12:52:51PM -0400, Johannes Weiner wrote: > On Tue, Jun 06, 2023 at 04:56:05PM +0200, Domenico Cerasuolo wrote: > > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > return NULL; > > } > > > > +static int zswap_shrink(struct zswap_pool *pool) > > +{ > > + struct zswap_entry *lru_entry, *tree_entry = NULL; > > + struct zswap_header *zhdr; > > + struct zswap_tree *tree; > > + int swpoffset; > > + int ret; > > + > > + /* get a reclaimable entry from LRU */ > > + spin_lock(&pool->lru_lock); > > + if (list_empty(&pool->lru)) { > > + spin_unlock(&pool->lru_lock); > > + return -EINVAL; > > + } > > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > + list_del_init(&lru_entry->lru); > > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > > + /* > > + * Once the pool lock is dropped, the lru_entry might get freed. The > > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > + * until the entry is verified to still be alive in the tree. > > + */ > > + swpoffset = swp_offset(zhdr->swpentry); > > + spin_unlock(&pool->lru_lock); > > + > > + /* hold a reference from tree so it won't be freed during writeback */ > > + spin_lock(&tree->lock); > > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > > + if (tree_entry != lru_entry) { > > + if (tree_entry) > > + zswap_entry_put(tree, tree_entry); > > + spin_unlock(&tree->lock); > > + return -EAGAIN; > > + } > > + spin_unlock(&tree->lock); > > + > > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > + > > + spin_lock(&tree->lock); > > + if (ret) { > > + spin_lock(&pool->lru_lock); > > + list_move(&lru_entry->lru, &pool->lru); > > + spin_unlock(&pool->lru_lock); > > + } > > + zswap_entry_put(tree, tree_entry); > > On re-reading this, I find the lru_entry vs tree_entry distinction > unnecessarily complicated. Once it's known that the thing coming off > the LRU is the same thing as in the tree, there is only "the entry". > > How about 'entry' and 'tree_entry', and after validation use 'entry' > throughout the rest of the function? Even better, safe the tree_entry entirely by getting the reference from the LRU already, and then just search the tree for a match: /* Get an entry off the LRU */ spin_lock(&pool->lru_lock); entry = list_last_entry(); list_del(&entry->lru); zswap_entry_get(entry); spin_unlock(&pool->lru_lock); /* Check for invalidate() race */ spin_lock(&tree->lock); if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) { ret = -EAGAIN; goto put_unlock; } spin_unlock(&tree->lock); ret = zswap_writeback_entry(); spin_lock(&tree->lock); if (ret) { put_back_on_lru(); goto put_unlock; } /* Check for invalidate() race */ if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) goto put_unlock; /* Drop base reference */ zswap_entry_put(tree, entry); put_unlock: /* Drop local reference */ zswap_entry_put(tree, entry); spin_unlock(&tree->lock); return ret ? 
-EAGAIN : 0; ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism 2023-06-08 17:04 ` Johannes Weiner @ 2023-06-08 18:45 ` Johannes Weiner 2023-06-09 8:39 ` Domenico Cerasuolo 0 siblings, 1 reply; 28+ messages in thread From: Johannes Weiner @ 2023-06-08 18:45 UTC (permalink / raw) To: Domenico Cerasuolo Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet, sjenning, nphamcs, akpm, linux-kernel, kernel-team On Thu, Jun 08, 2023 at 01:05:00PM -0400, Johannes Weiner wrote: > On Thu, Jun 08, 2023 at 12:52:51PM -0400, Johannes Weiner wrote: > > On Tue, Jun 06, 2023 at 04:56:05PM +0200, Domenico Cerasuolo wrote: > > > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor) > > > return NULL; > > > } > > > > > > +static int zswap_shrink(struct zswap_pool *pool) > > > +{ > > > + struct zswap_entry *lru_entry, *tree_entry = NULL; > > > + struct zswap_header *zhdr; > > > + struct zswap_tree *tree; > > > + int swpoffset; > > > + int ret; > > > + > > > + /* get a reclaimable entry from LRU */ > > > + spin_lock(&pool->lru_lock); > > > + if (list_empty(&pool->lru)) { > > > + spin_unlock(&pool->lru_lock); > > > + return -EINVAL; > > > + } > > > + lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru); > > > + list_del_init(&lru_entry->lru); > > > + zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO); > > > + tree = zswap_trees[swp_type(zhdr->swpentry)]; > > > + zpool_unmap_handle(pool->zpool, lru_entry->handle); > > > + /* > > > + * Once the pool lock is dropped, the lru_entry might get freed. The > > > + * swpoffset is copied to the stack, and lru_entry isn't deref'd again > > > + * until the entry is verified to still be alive in the tree. > > > + */ > > > + swpoffset = swp_offset(zhdr->swpentry); > > > + spin_unlock(&pool->lru_lock); > > > + > > > + /* hold a reference from tree so it won't be freed during writeback */ > > > + spin_lock(&tree->lock); > > > + tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset); > > > + if (tree_entry != lru_entry) { > > > + if (tree_entry) > > > + zswap_entry_put(tree, tree_entry); > > > + spin_unlock(&tree->lock); > > > + return -EAGAIN; > > > + } > > > + spin_unlock(&tree->lock); > > > + > > > + ret = zswap_writeback_entry(pool->zpool, lru_entry->handle); > > > + > > > + spin_lock(&tree->lock); > > > + if (ret) { > > > + spin_lock(&pool->lru_lock); > > > + list_move(&lru_entry->lru, &pool->lru); > > > + spin_unlock(&pool->lru_lock); > > > + } > > > + zswap_entry_put(tree, tree_entry); > > > > On re-reading this, I find the lru_entry vs tree_entry distinction > > unnecessarily complicated. Once it's known that the thing coming off > > the LRU is the same thing as in the tree, there is only "the entry". > > > > How about 'entry' and 'tree_entry', and after validation use 'entry' > > throughout the rest of the function? > > Even better, safe the tree_entry entirely by getting the reference > from the LRU already, and then just search the tree for a match: > > /* Get an entry off the LRU */ > spin_lock(&pool->lru_lock); > entry = list_last_entry(); > list_del(&entry->lru); > zswap_entry_get(entry); > spin_unlock(&pool->lru_lock); > > /* Check for invalidate() race */ > spin_lock(&tree->lock); > if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) { > ret = -EAGAIN; > goto put_unlock; > } > spin_unlock(&tree->lock); Eh, brainfart. It needs the tree lock to bump the ref, of course. But this should work, right? 
/* Check for invalidate() race */ spin_lock(&tree->lock); if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) { ret = -EAGAIN; goto unlock; } zswap_entry_get(entry); spin_unlock(&tree->lock); ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism
  2023-06-08 18:45         ` Johannes Weiner
@ 2023-06-09  8:39           ` Domenico Cerasuolo
  0 siblings, 0 replies; 28+ messages in thread
From: Domenico Cerasuolo @ 2023-06-09  8:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: vitaly.wool, minchan, senozhatsky, yosryahmed, linux-mm, ddstreet,
	sjenning, nphamcs, akpm, linux-kernel, kernel-team

On Thu, Jun 8, 2023 at 8:45 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Thu, Jun 08, 2023 at 01:05:00PM -0400, Johannes Weiner wrote:
> > On Thu, Jun 08, 2023 at 12:52:51PM -0400, Johannes Weiner wrote:
> > > On Tue, Jun 06, 2023 at 04:56:05PM +0200, Domenico Cerasuolo wrote:
> > > > @@ -584,14 +601,70 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> > > >  	return NULL;
> > > >  }
> > > >  
> > > > +static int zswap_shrink(struct zswap_pool *pool)
> > > > +{
> > > > +	struct zswap_entry *lru_entry, *tree_entry = NULL;
> > > > +	struct zswap_header *zhdr;
> > > > +	struct zswap_tree *tree;
> > > > +	int swpoffset;
> > > > +	int ret;
> > > > +
> > > > +	/* get a reclaimable entry from LRU */
> > > > +	spin_lock(&pool->lru_lock);
> > > > +	if (list_empty(&pool->lru)) {
> > > > +		spin_unlock(&pool->lru_lock);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	lru_entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
> > > > +	list_del_init(&lru_entry->lru);
> > > > +	zhdr = zpool_map_handle(pool->zpool, lru_entry->handle, ZPOOL_MM_RO);
> > > > +	tree = zswap_trees[swp_type(zhdr->swpentry)];
> > > > +	zpool_unmap_handle(pool->zpool, lru_entry->handle);
> > > > +	/*
> > > > +	 * Once the pool lock is dropped, the lru_entry might get freed. The
> > > > +	 * swpoffset is copied to the stack, and lru_entry isn't deref'd again
> > > > +	 * until the entry is verified to still be alive in the tree.
> > > > +	 */
> > > > +	swpoffset = swp_offset(zhdr->swpentry);
> > > > +	spin_unlock(&pool->lru_lock);
> > > > +
> > > > +	/* hold a reference from tree so it won't be freed during writeback */
> > > > +	spin_lock(&tree->lock);
> > > > +	tree_entry = zswap_entry_find_get(&tree->rbroot, swpoffset);
> > > > +	if (tree_entry != lru_entry) {
> > > > +		if (tree_entry)
> > > > +			zswap_entry_put(tree, tree_entry);
> > > > +		spin_unlock(&tree->lock);
> > > > +		return -EAGAIN;
> > > > +	}
> > > > +	spin_unlock(&tree->lock);
> > > > +
> > > > +	ret = zswap_writeback_entry(pool->zpool, lru_entry->handle);
> > > > +
> > > > +	spin_lock(&tree->lock);
> > > > +	if (ret) {
> > > > +		spin_lock(&pool->lru_lock);
> > > > +		list_move(&lru_entry->lru, &pool->lru);
> > > > +		spin_unlock(&pool->lru_lock);
> > > > +	}
> > > > +	zswap_entry_put(tree, tree_entry);
> > >
> > > On re-reading this, I find the lru_entry vs tree_entry distinction
> > > unnecessarily complicated. Once it's known that the thing coming off
> > > the LRU is the same thing as in the tree, there is only "the entry".
> > >
> > > How about 'entry' and 'tree_entry', and after validation use 'entry'
> > > throughout the rest of the function?
> >
> > Even better, save the tree_entry entirely by getting the reference
> > from the LRU already, and then just search the tree for a match:
> >
> > 	/* Get an entry off the LRU */
> > 	spin_lock(&pool->lru_lock);
> > 	entry = list_last_entry();
> > 	list_del(&entry->lru);
> > 	zswap_entry_get(entry);
> > 	spin_unlock(&pool->lru_lock);
> >
> > 	/* Check for invalidate() race */
> > 	spin_lock(&tree->lock);
> > 	if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) {
> > 		ret = -EAGAIN;
> > 		goto put_unlock;
> > 	}
> > 	spin_unlock(&tree->lock);
>
> Eh, brainfart. It needs the tree lock to bump the ref, of course.
>
> But this should work, right?
>
> 	/* Check for invalidate() race */
> 	spin_lock(&tree->lock);
> 	if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) {
> 		ret = -EAGAIN;
> 		goto unlock;
> 	}
> 	zswap_entry_get(entry);
> 	spin_unlock(&tree->lock);

This should work indeed; it's much cleaner with just one local
zswap_entry, will update!

^ permalink raw reply	[flat|nested] 28+ messages in thread
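[Editorial sketch] To summarize the outcome of this exchange, below is a rough sketch of
what zswap_shrink() could look like with the suggestion folded in: a single local entry,
the swap offset copied to the stack before the LRU lock is dropped, and the tree reference
taken under the tree lock only after the invalidate() race check. This is only an
illustration assembled from the hunk and snippets quoted above, not the code that was
picked up in the next revision; in particular the final return-value handling, and reading
swpoffset before unmapping the handle, are editorial choices here.

	static int zswap_shrink(struct zswap_pool *pool)
	{
		struct zswap_entry *entry;
		struct zswap_header *zhdr;
		struct zswap_tree *tree;
		int swpoffset;
		int ret;

		/* Get an entry off the LRU */
		spin_lock(&pool->lru_lock);
		if (list_empty(&pool->lru)) {
			spin_unlock(&pool->lru_lock);
			return -EINVAL;
		}
		entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
		list_del_init(&entry->lru);
		/*
		 * Once the lru lock is dropped, the entry might get freed. The
		 * swpoffset is copied to the stack, and the entry isn't deref'd
		 * again until it is verified to still be alive in the tree.
		 */
		zhdr = zpool_map_handle(pool->zpool, entry->handle, ZPOOL_MM_RO);
		tree = zswap_trees[swp_type(zhdr->swpentry)];
		swpoffset = swp_offset(zhdr->swpentry);
		zpool_unmap_handle(pool->zpool, entry->handle);
		spin_unlock(&pool->lru_lock);

		/* Check for invalidate() race; take the ref only if still alive */
		spin_lock(&tree->lock);
		if (entry != zswap_rb_search(&tree->rbroot, swpoffset)) {
			spin_unlock(&tree->lock);
			return -EAGAIN;
		}
		zswap_entry_get(entry);
		spin_unlock(&tree->lock);

		ret = zswap_writeback_entry(pool->zpool, entry->handle);

		spin_lock(&tree->lock);
		if (ret) {
			/* Writeback failed: put the entry back on the LRU */
			spin_lock(&pool->lru_lock);
			list_move(&entry->lru, &pool->lru);
			spin_unlock(&pool->lru_lock);
		}
		/* Drop the ref taken above; frees the entry if it was invalidated */
		zswap_entry_put(tree, entry);
		spin_unlock(&tree->lock);

		return ret ? -EAGAIN : 0;
	}

The ordering detail that the whole thread turns on is that zswap_entry_get() is called
only while tree->lock is held and only after the entry has been found in the tree again,
so a concurrent invalidation either removes the entry before the check (and the shrinker
bails out with -EAGAIN) or must wait for the writeback path to drop its reference.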
Thread overview: 28+ messages
[not found] <20230606145611.704392-1-cerasuolodomenico@gmail.com>
2023-06-07 9:16 ` [RFC PATCH v2 0/7] mm: zswap: move writeback LRU from zpool to zswap Yosry Ahmed
2023-06-07 9:23 ` Domenico Cerasuolo
2023-06-07 9:32 ` Yosry Ahmed
[not found] ` <20230606145611.704392-8-cerasuolodomenico@gmail.com>
2023-06-07 9:30 ` [RFC PATCH v2 7/7] mm: zswap: remove zswap_header Yosry Ahmed
2023-06-09 16:10 ` Domenico Cerasuolo
2023-06-09 17:13 ` Yosry Ahmed
[not found] ` <20230606145611.704392-3-cerasuolodomenico@gmail.com>
2023-06-08 16:02 ` [RFC PATCH v2 2/7] mm: zswap: remove page reclaim logic from zbud Johannes Weiner
[not found] ` <20230606145611.704392-4-cerasuolodomenico@gmail.com>
2023-06-08 16:05 ` [RFC PATCH v2 3/7] mm: zswap: remove page reclaim logic from z3fold Johannes Weiner
[not found] ` <20230606145611.704392-5-cerasuolodomenico@gmail.com>
2023-06-07 17:23 ` [RFC PATCH v2 4/7] mm: zswap: remove page reclaim logic from zsmalloc Nhat Pham
2023-06-07 17:45 ` Minchan Kim
2023-06-08 16:07 ` Johannes Weiner
[not found] ` <20230606145611.704392-6-cerasuolodomenico@gmail.com>
2023-06-07 9:19 ` [RFC PATCH v2 5/7] mm: zswap: remove shrink from zpool interface Yosry Ahmed
2023-06-08 16:10 ` Johannes Weiner
2023-06-08 17:51 ` Nhat Pham
[not found] ` <20230606145611.704392-7-cerasuolodomenico@gmail.com>
2023-06-07 9:26 ` [RFC PATCH v2 6/7] mm: zswap: simplify writeback function Yosry Ahmed
2023-06-09 10:23 ` Domenico Cerasuolo
2023-06-09 11:01 ` Yosry Ahmed
2023-06-08 16:48 ` Johannes Weiner
2023-06-09 11:05 ` Domenico Cerasuolo
[not found] ` <20230606145611.704392-2-cerasuolodomenico@gmail.com>
2023-06-07 8:14 ` [RFC PATCH v2 1/7] mm: zswap: add pool shrinking mechanism Yosry Ahmed
2023-06-07 9:22 ` Domenico Cerasuolo
2023-06-07 9:31 ` Yosry Ahmed
2023-06-07 21:39 ` Nhat Pham
2023-06-08 15:58 ` Johannes Weiner
2023-06-08 16:52 ` Johannes Weiner
2023-06-08 17:04 ` Johannes Weiner
2023-06-08 18:45 ` Johannes Weiner
2023-06-09 8:39 ` Domenico Cerasuolo