From: "Huang\, Ying" <ying.huang@intel.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,  <linux-mm@kvack.org>,
	 <linux-kernel@vger.kernel.org>,  Michal Hocko <mhocko@suse.com>,
	 Minchan Kim <minchan@kernel.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	 Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH] swap: Add percpu cluster_next to reduce lock contention on swap cache
Date: Mon, 18 May 2020 14:37:15 +0800	[thread overview]
Message-ID: <87zha5kauc.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20200515235140.xkznql332xmqvuf2@ca-dmjordan1.us.oracle.com> (Daniel Jordan's message of "Fri, 15 May 2020 19:51:40 -0400")

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Thu, May 14, 2020 at 03:04:24PM +0800, Huang Ying wrote:
>> And the pmbench score increases 15.9%.
>
> What metric is that, and how long did you run the benchmark for?

I ran the benchmark for 1800 s.  The metric comes from the following
output of pmbench:

[1] Benchmark done - took 1800.088 sec for 122910000 page access

That is, the throughput is 122910000 / 1800.088 = 68280.0 accesses/s
for that process.  We then sum these values across the different
pmbench processes.

> Given that this thing is probabilistic, did you notice much variance from run
> to run?

The results look quite stable to me.  The run-to-run standard
deviation is less than 1%.

>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 35be7a7271f4..9f1343b066c1 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -746,7 +746,16 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>>  	 */
>>  
>>  	si->flags += SWP_SCANNING;
>> -	scan_base = offset = si->cluster_next;
>> +	/*
>> +	 * Use percpu scan base for SSD to reduce lock contention on
>> +	 * cluster and swap cache.  For HDD, sequential access is more
>> +	 * important.
>> +	 */
>> +	if (si->flags & SWP_SOLIDSTATE)
>> +		scan_base = this_cpu_read(*si->cluster_next_cpu);
>> +	else
>> +		scan_base = si->cluster_next;
>> +	offset = scan_base;
>>  
>>  	/* SSD algorithm */
>>  	if (si->cluster_info) {
>
> It's just a nit but SWP_SOLIDSTATE and 'if (si->cluster_info)' are two ways to
> check the same thing and I'd stick with the one that's already there.

Yes.  In effect, (si->flags & SWP_SOLIDSTATE) and (si->cluster_info)
always have the same value, at least for now.  But I don't think
their semantics are exactly the same, so I would rather use the check
whose exact semantics match the intent here.
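
Just to make the intended pattern concrete, here is a minimal sketch
(illustration only, not the patch itself), assuming cluster_next_cpu
is an unsigned int __percpu * field of struct swap_info_struct
allocated with alloc_percpu():

	/* Sketch only: allocate one scan position per possible CPU,
	 * e.g. on the swapon path; freed with free_percpu() at
	 * swapoff.
	 */
	si->cluster_next_cpu = alloc_percpu(unsigned int);
	if (!si->cluster_next_cpu)
		return -ENOMEM;

	/* In the slot scanning path: on SSD each CPU continues from
	 * its own position, while HDD keeps the shared
	 * si->cluster_next to preserve sequential access.
	 */
	if (si->flags & SWP_SOLIDSTATE)
		scan_base = this_cpu_read(*si->cluster_next_cpu);
	else
		scan_base = si->cluster_next;
	offset = scan_base;

	/* After a slot is allocated, record where this CPU should
	 * start scanning next time.
	 */
	if (si->flags & SWP_SOLIDSTATE)
		this_cpu_write(*si->cluster_next_cpu, offset + 1);
	else
		si->cluster_next = offset + 1;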

>> @@ -2962,6 +2979,8 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
>>  
>>  	p->lowest_bit  = 1;
>>  	p->cluster_next = 1;
>> +	for_each_possible_cpu(i)
>> +		per_cpu(*p->cluster_next_cpu, i) = 1;
>
> These are later overwritten if the device is an SSD which seems to be the only
> case where these are used, so why have this?

Yes, you are right.  I will remove this in future versions.

>>  	p->cluster_nr = 0;
>>  
>>  	maxpages = max_swapfile_size();
>> @@ -3204,6 +3223,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>  		 * SSD
>>  		 */
>>  		p->cluster_next = 1 + prandom_u32_max(p->highest_bit);
>> +		for_each_possible_cpu(cpu) {
>> +			per_cpu(*p->cluster_next_cpu, cpu) =
>> +				1 + prandom_u32_max(p->highest_bit);
>> +		}
>
> Is there a reason for adding one?  The history didn't enlighten me about why
> cluster_next does it.

The first swap slot holds the swap partition header; you can find the
corresponding code in the swapon syscall, below the comment "Read the
swap header".

Best Regards,
Huang, Ying

