linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Yang Shi <shy828301@gmail.com>
Cc: ying.huang@intel.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aaron Lu <aaron.lu@intel.com>
Subject: Re: [PATCH] mm: swap: determine swap device by using page nid
Date: Thu, 7 Apr 2022 09:52:11 +0200	[thread overview]
Message-ID: <Yk6YKwZR5JjBokOs@dhcp22.suse.cz> (raw)
In-Reply-To: <20220407020953.475626-1-shy828301@gmail.com>

[Cc Aaron who has introduced the per node swap changes]

On Wed 06-04-22 19:09:53, Yang Shi wrote:
> The swap devices are linked to per node priority lists, the swap device
> closer to the node has higher priority on that node's priority list.
> This is supposed to improve I/O latency, particularly for some fast
> devices.  But the current code gets nid by calling numa_node_id() which
> actually returns the nid that the reclaimer is running on instead of the
> nid that the page belongs to.
> 
> Pass the page's nid dow to get_swap_pages() in order to pick up the
> right swap device.  But it doesn't work for the swap slots cache which
> is per cpu.  We could skip swap slots cache if the current node is not
> the page's node, but it may be overkilling. So keep using the current
> node's swap slots cache.  The issue was found by visual code inspection
> so it is not sure how much improvement could be achieved due to lack of
> suitable testing device.  But anyway the current code does violate the
> design.

Do you have any perf numbers for this change?
 
> Cc: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  include/linux/swap.h | 3 ++-
>  mm/swap_slots.c      | 7 ++++---
>  mm/swapfile.c        | 5 ++---
>  3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 27093b477c5f..e442cf6b61ea 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -497,7 +497,8 @@ extern void si_swapinfo(struct sysinfo *);
>  extern swp_entry_t get_swap_page(struct page *page);
>  extern void put_swap_page(struct page *page, swp_entry_t entry);
>  extern swp_entry_t get_swap_page_of_type(int);
> -extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size);
> +extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size,
> +			  int node);
>  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>  extern void swap_shmem_alloc(swp_entry_t);
>  extern int swap_duplicate(swp_entry_t);
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 2b5531840583..a1c5cf6a4302 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -264,7 +264,7 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
>  	cache->cur = 0;
>  	if (swap_slot_cache_active)
>  		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE,
> -					   cache->slots, 1);
> +					   cache->slots, 1, numa_node_id());
>  
>  	return cache->nr;
>  }
> @@ -305,12 +305,13 @@ swp_entry_t get_swap_page(struct page *page)
>  {
>  	swp_entry_t entry;
>  	struct swap_slots_cache *cache;
> +	int nid = page_to_nid(page);
>  
>  	entry.val = 0;
>  
>  	if (PageTransHuge(page)) {
>  		if (IS_ENABLED(CONFIG_THP_SWAP))
> -			get_swap_pages(1, &entry, HPAGE_PMD_NR);
> +			get_swap_pages(1, &entry, HPAGE_PMD_NR, nid);
>  		goto out;
>  	}
>  
> @@ -342,7 +343,7 @@ swp_entry_t get_swap_page(struct page *page)
>  			goto out;
>  	}
>  
> -	get_swap_pages(1, &entry, 1);
> +	get_swap_pages(1, &entry, 1, nid);
>  out:
>  	if (mem_cgroup_try_charge_swap(page, entry)) {
>  		put_swap_page(page, entry);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 63c61f8b2611..151fffe0fd60 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1036,13 +1036,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
>  }
>  
> -int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> +int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size,
> +		   int node)
>  {
>  	unsigned long size = swap_entry_size(entry_size);
>  	struct swap_info_struct *si, *next;
>  	long avail_pgs;
>  	int n_ret = 0;
> -	int node;
>  
>  	/* Only single cluster request supported */
>  	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
> @@ -1060,7 +1060,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  	atomic_long_sub(n_goal * size, &nr_swap_pages);
>  
>  start_over:
> -	node = numa_node_id();
>  	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
>  		/* requeue si to after same-priority siblings */
>  		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
> -- 
> 2.26.3

-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2022-04-07  7:52 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07  2:09 Yang Shi
2022-04-07  7:52 ` Michal Hocko [this message]
2022-04-07 17:27   ` Yang Shi
2022-04-07  8:13 ` Aaron Lu
2022-04-07 17:36   ` Yang Shi
2022-04-20  8:33     ` Aaron Lu
2022-04-20 22:21       ` Yang Shi
2022-04-21  7:34         ` Aaron Lu
2022-04-21  7:49       ` ying.huang
2022-04-21  8:17         ` Aaron Lu
2022-04-21  8:30           ` Aaron Lu
2022-04-21  8:34           ` ying.huang
2022-04-22  6:24             ` Aaron Lu
2022-04-22  6:27               ` ying.huang
2022-04-22  6:43                 ` Aaron Lu
2022-04-22  7:26                   ` ying.huang
2022-04-22 17:00               ` Yang Shi
2022-04-23  3:22                 ` Aaron Lu
2022-04-29 10:26                 ` Aaron Lu
2022-04-29 19:07                   ` Yang Shi
2022-04-21 14:11         ` Aaron Lu
2022-04-21 17:19           ` Yang Shi
2022-04-21 23:57           ` ying.huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yk6YKwZR5JjBokOs@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shy828301@gmail.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox