[PATCH v2 0/2] mm: zswap: simplify zswap_swapoff()
From: Yosry Ahmed @ 2024-01-24 4:51 UTC
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou, Huang Ying,
	linux-mm, linux-kernel, Yosry Ahmed

These patches aim to simplify zswap_swapoff() by removing the unnecessary
trees cleanup code. Patch 1 enforces the correct order of operations
during swapoff, so that the simplification in patch 2 is correct in a
future-proof manner.

This is based on mm-unstable and v2 of the "mm/zswap: optimize the
scalability of zswap rb-tree" series [1].

[1] https://lore.kernel.org/lkml/20240117-b4-zswap-lock-optimize-v2-0-b5cc55479090@bytedance.com/

Yosry Ahmed (2):
  mm: swap: enforce updating inuse_pages at the end of swap_range_free()
  mm: zswap: remove unnecessary trees cleanups in zswap_swapoff()

 mm/swapfile.c | 18 +++++++++++++++---
 mm/zswap.c    | 16 +++-------------
 2 files changed, 18 insertions(+), 16 deletions(-)

-- 
2.43.0.429.g432eaa2c6b-goog

[PATCH v2 1/2] mm: swap: enforce updating inuse_pages at the end of swap_range_free()
From: Yosry Ahmed @ 2024-01-24 4:51 UTC
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou, Huang Ying,
	linux-mm, linux-kernel, Yosry Ahmed

In swap_range_free(), we update inuse_pages then do some cleanups (arch
invalidation, zswap invalidation, swap cache cleanups, etc). During
swapoff, try_to_unuse() checks that inuse_pages is 0 to make sure all
swap entries are freed. Make sure we only update inuse_pages after we
are done with the cleanups in swap_range_free(), and use the proper
memory barriers to enforce it. This makes sure that code following
try_to_unuse() can safely assume that swap_range_free() ran for all
entries in the swapfile (e.g. swap cache cleanup, zswap_swapoff()).

In practice, this currently isn't a problem because swap_range_free()
is called with the swap info lock held, and the swapoff code happens to
spin for that after try_to_unuse(). However, this seems fragile and
unintentional, so make it more reliable and future-proof. This also
facilitates a following simplification of zswap_swapoff().

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/swapfile.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index b11b6057d8b5f..0580bb3e34d77 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 		if (was_full && (si->flags & SWP_WRITEOK))
 			add_to_avail_list(si);
 	}
-	atomic_long_add(nr_entries, &nr_swap_pages);
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
 			si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -752,6 +750,14 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 		offset++;
 	}
 	clear_shadow_from_swap_cache(si->type, begin, end);
+
+	/*
+	 * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
+	 * only after the above cleanups are done.
+	 */
+	smp_wmb();
+	atomic_long_add(nr_entries, &nr_swap_pages);
+	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 }
 
 static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
@@ -2049,7 +2055,7 @@ static int try_to_unuse(unsigned int type)
 	unsigned int i;
 
 	if (!READ_ONCE(si->inuse_pages))
-		return 0;
+		goto success;
 
 retry:
 	retval = shmem_unuse(type);
@@ -2130,6 +2136,12 @@ static int try_to_unuse(unsigned int type)
 		return -EINTR;
 	}
 
+success:
+	/*
+	 * Make sure that further cleanups after try_to_unuse() returns happen
+	 * after swap_range_free() reduces si->inuse_pages to 0.
+	 */
+	smp_mb();
 	return 0;
 }
 
-- 
2.43.0.429.g432eaa2c6b-goog

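[Illustration: the following is a minimal userspace sketch of the barrier
pairing the patch above relies on, not kernel code. C11 fences stand in for
smp_wmb()/smp_mb(), and the names range_free(), unuse(), cleanup_done and
inuse_pages are hypothetical stand-ins for swap_range_free(), try_to_unuse()
and si->inuse_pages. It shows why the waiter, once it sees the in-use count
hit zero, is guaranteed to also see the cleanups that were done before the
count was updated.]

/*
 * Minimal userspace sketch of the barrier pairing -- not kernel code.
 * All names here are hypothetical stand-ins.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int cleanup_done;               /* the "cleanups": swap cache, zswap, ... */
static atomic_long inuse_pages = 1;    /* stand-in for si->inuse_pages */

/* Analogue of swap_range_free(): do the cleanups, then publish the count. */
static void *range_free(void *arg)
{
	(void)arg;
	cleanup_done = 1;
	/* Pairs with the fence in unuse(); smp_wmb() in the patch. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&inuse_pages, 0, memory_order_relaxed);
	return NULL;
}

/* Analogue of try_to_unuse() plus the code that runs after it returns. */
static void *unuse(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&inuse_pages, memory_order_relaxed))
		;	/* wait until all entries are freed */
	/* Pairs with the fence in range_free(); smp_mb() in the patch. */
	atomic_thread_fence(memory_order_acquire);
	assert(cleanup_done);	/* the cleanups are guaranteed to be visible */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, range_free, NULL);
	pthread_create(&b, NULL, unuse, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

[In this sketch, dropping the two fences would allow the assert to fail (and
make the cleanup_done access a data race) on weakly ordered hardware; that is
the reordering the smp_wmb()/smp_mb() pair in the patch rules out.]
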
Re: [PATCH v2 1/2] mm: swap: enforce updating inuse_pages at the end of swap_range_free()
From: Huang, Ying @ 2024-01-24 5:20 UTC
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou,
	linux-mm, linux-kernel

Yosry Ahmed <yosryahmed@google.com> writes:

> In swap_range_free(), we update inuse_pages then do some cleanups (arch
> invalidation, zswap invalidation, swap cache cleanups, etc). During
> swapoff, try_to_unuse() checks that inuse_pages is 0 to make sure all
> swap entries are freed. Make sure we only update inuse_pages after we
> are done with the cleanups in swap_range_free(), and use the proper
> memory barriers to enforce it. This makes sure that code following
> try_to_unuse() can safely assume that swap_range_free() ran for all
> entries in the swapfile (e.g. swap cache cleanup, zswap_swapoff()).
>
> In practice, this currently isn't a problem because swap_range_free()
> is called with the swap info lock held, and the swapoff code happens to
> spin for that after try_to_unuse(). However, this seems fragile and
> unintentional, so make it more reliable and future-proof. This also
> facilitates a following simplification of zswap_swapoff().
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

LGTM, Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

> ---
>  mm/swapfile.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index b11b6057d8b5f..0580bb3e34d77 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>  		if (was_full && (si->flags & SWP_WRITEOK))
>  			add_to_avail_list(si);
>  	}
> -	atomic_long_add(nr_entries, &nr_swap_pages);
> -	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>  	if (si->flags & SWP_BLKDEV)
>  		swap_slot_free_notify =
>  			si->bdev->bd_disk->fops->swap_slot_free_notify;
> @@ -752,6 +750,14 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
>  		offset++;
>  	}
>  	clear_shadow_from_swap_cache(si->type, begin, end);
> +
> +	/*
> +	 * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
> +	 * only after the above cleanups are done.
> +	 */
> +	smp_wmb();
> +	atomic_long_add(nr_entries, &nr_swap_pages);
> +	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
>  }
>
>  static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
> @@ -2049,7 +2055,7 @@ static int try_to_unuse(unsigned int type)
>  	unsigned int i;
>
>  	if (!READ_ONCE(si->inuse_pages))
> -		return 0;
> +		goto success;
>
>  retry:
>  	retval = shmem_unuse(type);
> @@ -2130,6 +2136,12 @@ static int try_to_unuse(unsigned int type)
>  		return -EINTR;
>  	}
>
> +success:
> +	/*
> +	 * Make sure that further cleanups after try_to_unuse() returns happen
> +	 * after swap_range_free() reduces si->inuse_pages to 0.
> +	 */
> +	smp_mb();
>  	return 0;
>  }

[PATCH v2 2/2] mm: zswap: remove unnecessary trees cleanups in zswap_swapoff()
From: Yosry Ahmed @ 2024-01-24 4:51 UTC
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou, Huang Ying,
	linux-mm, linux-kernel, Yosry Ahmed

During swapoff, try_to_unuse() makes sure that zswap_invalidate() is
called for all swap entries before zswap_swapoff() is called. This means
that all zswap entries should already be removed from the tree. Simplify
zswap_swapoff() by removing the trees cleanup code, and leave an
assertion in its place.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/zswap.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index dcdd5ecfedb09..78df16d307aa8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1808,19 +1808,9 @@ void zswap_swapoff(int type)
 	if (!trees)
 		return;
 
-	for (i = 0; i < nr_zswap_trees[type]; i++) {
-		struct zswap_tree *tree = trees + i;
-		struct zswap_entry *entry, *n;
-
-		/* walk the tree and free everything */
-		spin_lock(&tree->lock);
-		rbtree_postorder_for_each_entry_safe(entry, n,
-						     &tree->rbroot,
-						     rbnode)
-			zswap_free_entry(entry);
-		tree->rbroot = RB_ROOT;
-		spin_unlock(&tree->lock);
-	}
+	/* try_to_unuse() invalidated all the entries already */
+	for (i = 0; i < nr_zswap_trees[type]; i++)
+		WARN_ON_ONCE(!RB_EMPTY_ROOT(&trees[i].rbroot));
 
 	kvfree(trees);
 	nr_zswap_trees[type] = 0;
-- 
2.43.0.429.g432eaa2c6b-goog

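[Illustration: a toy, self-contained model of what this change boils down to.
Every name and data structure below is a hypothetical simplification, not the
real zswap rb-tree code: teardown no longer walks and frees entries, it only
asserts that the earlier invalidations emptied every tree before freeing the
tree array -- the invariant that patch 1's barriers make safe to rely on.]

/*
 * Toy model of "assert empty, then free" teardown -- not the real zswap code.
 */
#include <assert.h>
#include <stdlib.h>

#define NR_TREES 4

struct toy_tree {
	int nr_entries;		/* stand-in for the rb-tree contents */
};

static struct toy_tree *toy_trees;	/* one swap "type" for simplicity */

static void toy_swapon(void)
{
	toy_trees = calloc(NR_TREES, sizeof(*toy_trees));
}

/* Analogue of zswap_invalidate(): try_to_unuse() reaches this for every entry. */
static void toy_invalidate(int tree)
{
	toy_trees[tree].nr_entries--;
}

/* Analogue of the simplified zswap_swapoff(): assert-and-free, no tree walk. */
static void toy_swapoff(void)
{
	for (int i = 0; i < NR_TREES; i++)
		assert(toy_trees[i].nr_entries == 0);	/* WARN_ON_ONCE() in the patch */
	free(toy_trees);
	toy_trees = NULL;
}

int main(void)
{
	toy_swapon();
	toy_trees[2].nr_entries = 1;	/* one stored entry */
	toy_invalidate(2);		/* freed before swapoff, as try_to_unuse() guarantees */
	toy_swapoff();			/* only checks the invariant and frees the array */
	return 0;
}

[If an entry were still present at toy_swapoff() time, the assert would fire;
this mirrors how the WARN_ON_ONCE() in the patch turns a violated ordering
assumption into a loud warning rather than silent cleanup work.]
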
Re: [PATCH v2 2/2] mm: zswap: remove unnecessary trees cleanups in zswap_swapoff()
From: Yosry Ahmed @ 2024-01-24 4:52 UTC
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chris Li, Chengming Zhou, Huang Ying,
	linux-mm, linux-kernel

On Tue, Jan 23, 2024 at 8:51 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> During swapoff, try_to_unuse() makes sure that zswap_invalidate() is
> called for all swap entries before zswap_swapoff() is called. This means
> that all zswap entries should already be removed from the tree. Simplify
> zswap_swapoff() by removing the trees cleanup code, and leave an
> assertion in its place.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Johannes, Nhat, I didn't carry the Acks forward after the rebase as the
code changed significantly.

Re: [PATCH v2 2/2] mm: zswap: remove unnecessary trees cleanups in zswap_swapoff()
From: Chengming Zhou @ 2024-01-24 7:34 UTC
To: Yosry Ahmed, Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chris Li, Huang Ying, linux-mm, linux-kernel

On 2024/1/24 12:51, Yosry Ahmed wrote:
> During swapoff, try_to_unuse() makes sure that zswap_invalidate() is
> called for all swap entries before zswap_swapoff() is called. This means
> that all zswap entries should already be removed from the tree. Simplify
> zswap_swapoff() by removing the trees cleanup code, and leave an
> assertion in its place.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Looks good to me, thanks!

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

> ---
>  mm/zswap.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index dcdd5ecfedb09..78df16d307aa8 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1808,19 +1808,9 @@ void zswap_swapoff(int type)
>  	if (!trees)
>  		return;
>
> -	for (i = 0; i < nr_zswap_trees[type]; i++) {
> -		struct zswap_tree *tree = trees + i;
> -		struct zswap_entry *entry, *n;
> -
> -		/* walk the tree and free everything */
> -		spin_lock(&tree->lock);
> -		rbtree_postorder_for_each_entry_safe(entry, n,
> -						     &tree->rbroot,
> -						     rbnode)
> -			zswap_free_entry(entry);
> -		tree->rbroot = RB_ROOT;
> -		spin_unlock(&tree->lock);
> -	}
> +	/* try_to_unuse() invalidated all the entries already */
> +	for (i = 0; i < nr_zswap_trees[type]; i++)
> +		WARN_ON_ONCE(!RB_EMPTY_ROOT(&trees[i].rbroot));
>
>  	kvfree(trees);
>  	nr_zswap_trees[type] = 0;
