From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: "Huang, Ying" <ying.huang@intel.com>,
Chris Li <chrisl@kernel.org>, Minchan Kim <minchan@kernel.org>,
Barry Song <v-songbaohua@oppo.com>,
Ryan Roberts <ryan.roberts@arm.com>, Yu Zhao <yuzhao@google.com>,
SeongJae Park <sj@kernel.org>,
David Hildenbrand <david@redhat.com>,
Yosry Ahmed <yosryahmed@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <zhouchengming@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [RFC PATCH 05/10] mm/swap: clean shadow only in unmap path
Date: Wed, 27 Mar 2024 02:50:27 +0800
Message-ID: <20240326185032.72159-6-ryncsn@gmail.com>
In-Reply-To: <20240326185032.72159-1-ryncsn@gmail.com>
From: Kairui Song <kasong@tencent.com>
After removing the cache bypass swapin, the first thing that can go
is the bulk of the clear_shadow_from_swap_cache calls. Currently,
clear_shadow_from_swap_cache is reached from many paths.
It is called by swap_range_free, which has two direct callers:
- swap_free_cluster, which is only called by put_swap_folio, to free
  the shadows of a slot cluster.
- swap_entry_free, which is only called by swapcache_free_entries, to
  free the shadow of a single slot.
And these two are used all over the swap code. Note that the shadow
is only written by __delete_from_swap_cache after a successful swap
out, so clearly we only want to clear the shadow after swap in (the
shadow has been consumed and is no longer needed) or after
unmap/MADV_FREE.
Now that all swapin goes through the cached swapin path,
clear_shadow_from_swap_cache is no longer needed for swapin: the
folio has to be inserted into the swap cache first, and that
insertion already removes the shadow. So we only need to clear the
shadow for unmap/MADV_FREE.
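To illustrate, here is a minimal sketch of the shadow lifecycle
described above (hypothetical helper names, using plain xa_* calls
instead of the xas_* operations, locking and workingset hooks of the
real code in mm/swap_state.c):

/*
 * Swap out: __delete_from_swap_cache() replaces the folio in the
 * swap address space with a shadow, stored as an XArray value entry.
 */
static void shadow_on_swapout(struct address_space *mapping,
			      pgoff_t offset, void *shadow)
{
	xa_store(&mapping->i_pages, offset, shadow, GFP_ATOMIC);
}

/*
 * Swap in: inserting the folio overwrites the shadow at the same
 * index, and xa_store() hands the old shadow back to the caller
 * (e.g. for workingset_refault()). So a swapin that goes through
 * the swap cache never needs clear_shadow_from_swap_cache().
 */
static void *shadow_on_swapin(struct address_space *mapping,
			      pgoff_t offset, struct folio *folio)
{
	return xa_store(&mapping->i_pages, offset, folio, GFP_KERNEL);
}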
All direct and indirect callers of swap_free_cluster and
swap_entry_free are listed below:
- swap_free_cluster:
-> put_swap_folio (Cleans the cache flag and tries to delete the shadow, after
removing the cache or during error handling)
-> delete_from_swap_cache
-> __remove_mapping
-> shmem_writepage
-> folio_alloc_swap
-> add_to_swap
-> __read_swap_cache_async
- swap_entry_free
-> swapcache_free_entries
-> drain_slots_cache_cpu
-> free_swap_slot
-> put_swap_folio (Already covered above)
-> __swap_entry_free / swap_free
-> free_swap_and_cache (Called by Unmap/Zap/MADV_FREE)
-> madvise_free_single_vma
-> unmap_page_range
-> shmem_undo_range
-> swap_free (Called by swapin path)
-> do_swap_page (Swapin path)
-> alloc_swapdev_block/free_all_swap_pages (Hibernation)
-> try_to_unmap_one (Error handling, no shadow)
-> shmem_set_folio_swapin_error (Shadow is already gone)
-> shmem_swapin_folio (Shmem's do_swap_page)
-> unuse_pte (Swapoff, which always uses the swap cache)
So now we only need to call clear_shadow_from_swap_cache in
free_swap_and_cache, because all swap in/out now goes through the
swap cache. Previously, all the functions above could invoke
clear_shadow_from_swap_cache in case a cache bypass swapin left an
entry with an uncleared shadow.
Also make clear_shadow_from_swap_cache clear only one entry, for
simplicity.
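Functionally, the new single-entry helper in the mm/swap_state.c hunk
below is equivalent to the following plain-xarray sketch (the function
name is made up for illustration, and it omits the
xas_set_update(&xas, workingset_update_node) hook the real code keeps
for node accounting):

void clear_one_shadow_sketch(swp_entry_t entry)
{
	struct address_space *mapping = swap_address_space(entry);
	pgoff_t offset = swp_offset(entry);

	xa_lock_irq(&mapping->i_pages);
	/* Only erase a shadow (value) entry, never a live folio. */
	if (xa_is_value(xa_load(&mapping->i_pages, offset)))
		__xa_erase(&mapping->i_pages, offset);
	xa_unlock_irq(&mapping->i_pages);
}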
Test result of sequential swapin/out:
                Before (us)    After (us)
Swapout:          33624641      33648529
Swapin:           41614858      40667696 (+2.3%)
Swapout (THP):     7795530       7658664
Swapin (THP):     41708471      40602278 (+2.7%)
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/swap.h | 6 ++----
mm/swap_state.c | 33 ++++++++-------------------------
mm/swapfile.c | 6 ++++--
3 files changed, 14 insertions(+), 31 deletions(-)
diff --git a/mm/swap.h b/mm/swap.h
index ac9573b03432..7721ddb3bdbc 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -39,8 +39,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
void __delete_from_swap_cache(struct folio *folio,
swp_entry_t entry, void *shadow);
void delete_from_swap_cache(struct folio *folio);
-void clear_shadow_from_swap_cache(int type, unsigned long begin,
- unsigned long end);
+void clear_shadow_from_swap_cache(swp_entry_t entry);
struct folio *swap_cache_get_folio(swp_entry_t entry,
struct vm_area_struct *vma, unsigned long addr);
struct folio *filemap_get_incore_folio(struct address_space *mapping,
@@ -148,8 +147,7 @@ static inline void delete_from_swap_cache(struct folio *folio)
{
}
-static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
- unsigned long end)
+static inline void clear_shadow_from_swap_cache(swp_entry_t entry)
{
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 49ef6250f676..b84e7b0ea4a5 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -245,34 +245,17 @@ void delete_from_swap_cache(struct folio *folio)
folio_ref_sub(folio, folio_nr_pages(folio));
}
-void clear_shadow_from_swap_cache(int type, unsigned long begin,
- unsigned long end)
+void clear_shadow_from_swap_cache(swp_entry_t entry)
{
- unsigned long curr = begin;
- void *old;
-
- for (;;) {
- swp_entry_t entry = swp_entry(type, curr);
- struct address_space *address_space = swap_address_space(entry);
- XA_STATE(xas, &address_space->i_pages, curr);
-
- xas_set_update(&xas, workingset_update_node);
+ struct address_space *address_space = swap_address_space(entry);
+ XA_STATE(xas, &address_space->i_pages, swp_offset(entry));
- xa_lock_irq(&address_space->i_pages);
- xas_for_each(&xas, old, end) {
- if (!xa_is_value(old))
- continue;
- xas_store(&xas, NULL);
- }
- xa_unlock_irq(&address_space->i_pages);
+ xas_set_update(&xas, workingset_update_node);
- /* search the next swapcache until we meet end */
- curr >>= SWAP_ADDRESS_SPACE_SHIFT;
- curr++;
- curr <<= SWAP_ADDRESS_SPACE_SHIFT;
- if (curr > end)
- break;
- }
+ xa_lock_irq(&address_space->i_pages);
+ if (xa_is_value(xas_load(&xas)))
+ xas_store(&xas, NULL);
+ xa_unlock_irq(&address_space->i_pages);
}
/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ae8d3aa05df7..bafae23c0f26 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -724,7 +724,6 @@ static void add_to_avail_list(struct swap_info_struct *p)
static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
unsigned int nr_entries)
{
- unsigned long begin = offset;
unsigned long end = offset + nr_entries - 1;
void (*swap_slot_free_notify)(struct block_device *, unsigned long);
@@ -748,7 +747,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
swap_slot_free_notify(si->bdev, offset);
offset++;
}
- clear_shadow_from_swap_cache(si->type, begin, end);
/*
* Make sure that try_to_unuse() observes si->inuse_pages reaching 0
@@ -1605,6 +1603,8 @@ bool folio_free_swap(struct folio *folio)
/*
* Free the swap entry like above, but also try to
* free the page cache entry if it is the last user.
+ * Useful when clearing the swap map and swap cache
+ * without reading the swap content (e.g. unmap, MADV_FREE)
*/
int free_swap_and_cache(swp_entry_t entry)
{
@@ -1626,6 +1626,8 @@ int free_swap_and_cache(swp_entry_t entry)
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ if (!count)
+ clear_shadow_from_swap_cache(entry);
put_swap_device(p);
}
return p != NULL;
--
2.43.0