linux-mm.kvack.org archive mirror
From: "Huang\, Ying" <ying.huang@intel.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -mm -V3 00/21] mm, THP, swap: Swapout/swapin THP in one piece
Date: Fri, 01 Jun 2018 15:03:50 +0800
Message-ID: <87efhryomh.fsf@yhuang-dev.intel.com>
In-Reply-To: <20180601061116.GA4813@hori1.linux.bs1.fc.nec.co.jp> (Naoya Horiguchi's message of "Fri, 1 Jun 2018 06:11:16 +0000")

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> writes:

> On Wed, May 23, 2018 at 04:26:04PM +0800, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>> 
>> Hi, Andrew, could you help me to check whether the overall design is
>> reasonable?
>> 
>> Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the
>> swap part of the patchset?  Especially [02/21], [03/21], [04/21],
>> [05/21], [06/21], [07/21], [08/21], [09/21], [10/21], [11/21],
>> [12/21], [20/21].
>> 
>> Hi, Andrea and Kirill, could you help me to review the THP part of the
>> patchset?  Especially [01/21], [07/21], [09/21], [11/21], [13/21],
>> [15/21], [16/21], [17/21], [18/21], [19/21], [20/21], [21/21].
>> 
>> Hi, Johannes and Michal, could you help me to review the cgroup part
>> of the patchset?  Especially [14/21].
>> 
>> And for all, Any comment is welcome!
>
> Hi Ying Huang,
> I've read through this series and find no issue.

Thanks a lot for your review!

> It seems that THP swapout never happens if the swap devices are backed
> by rotational storage.  I guess that's because this feature depends on
> the swap cluster searching algorithm, which only supports
> non-rotational storage.
>
> I think that this limitation is OK because non-rotational storage is
> a better fit for swap devices (most future users will use it).  But I
> think it's better to document the limitation somewhere, because the
> swap cluster is an in-kernel detail and we can't assume that end users
> know about it.

Yes.  I will try to document it somewhere.
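
Until that is documented, a rough user-space check can tell whether the
active swap devices are flagged as rotational.  This is only a sketch:
it assumes the usual /proc/swaps layout and the queue/rotational
attribute under /sys/class/block, the helper names are made up for
illustration, and it only looks at the rotational flag.  It does not
inspect whether the kernel actually uses swap clusters.  Swap files are
skipped because resolving their backing device takes more work.

```python
import os


def swap_devices(proc_swaps_text):
    """Return block-device swap areas listed in /proc/swaps-style text."""
    devices = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Only "partition" entries name a block device directly.
        if len(fields) >= 2 and fields[1] == "partition":
            devices.append(fields[0])
    return devices


def is_rotational(dev_path, sysfs_block_root="/sys/class/block"):
    """Return True/False from queue/rotational, or None if unreadable."""
    name = os.path.basename(dev_path)
    node = os.path.realpath(os.path.join(sysfs_block_root, name))
    # A partition (e.g. sda2) has no queue/ directory of its own; its
    # parent directory is the whole disk (sda), which does.
    for candidate in (node, os.path.dirname(node)):
        flag = os.path.join(candidate, "queue", "rotational")
        try:
            with open(flag) as f:
                return f.read().strip() == "1"
        except OSError:
            continue
    return None
```

Feeding the contents of /proc/swaps to swap_devices() and calling
is_rotational() on each result gives True for devices where, per the
discussion above, THP swapout would not be expected to happen.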

Best Regards,
Huang, Ying

> Thanks,
> Naoya Horiguchi
>
>> 
>> This patchset is based on the 2018-05-18 head of mmotm/master.
>> 
>> This is the final step of THP (Transparent Huge Page) swap
>> optimization.  After the first and second steps, the splitting of the
>> huge page is delayed from almost the first step of swapout to after
>> swapout has finished.  In this step, we avoid splitting the THP for
>> swapout and swapout/swapin the THP in one piece.
>> 
>> We tested the patchset with the vm-scalability benchmark swap-w-seq
>> test case, with 16 processes.  The test case forks 16 processes.  Each
>> process allocates a large anonymous memory range and writes it from
>> beginning to end for 8 rounds.  The first round will swapout, while
>> the remaining rounds will swapin and swapout.  The test is done on a
>> Xeon E5 v3 system; the swap device used is a RAM-simulated PMEM
>> (persistent memory) device.  The test results are as follows:
>> 
>>             base                  optimized
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>    1417897 ±  2%    +992.8%   15494673        vm-scalability.throughput
>>    1020489 ±  4%   +1091.2%   12156349        vmstat.swap.si
>>    1255093 ±  3%    +940.3%   13056114        vmstat.swap.so
>>    1259769 ±  7%   +1818.3%   24166779        meminfo.AnonHugePages
>>   28021761           -10.7%   25018848 ±  2%  meminfo.AnonPages
>>   64080064 ±  4%     -95.6%    2787565 ± 33%  interrupts.CAL:Function_call_interrupts
>>      13.91 ±  5%     -13.8        0.10 ± 27%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>> 
>> Here, the benchmark score (bytes written per second) improved by
>> 992.8%.  The swapout/swapin throughput improved by 1008% (from about
>> 2.17GB/s to 24.04GB/s).  The performance difference is huge.  In the
>> base kernel, during the first round of writing, the THP is swapped out
>> and split, so in the remaining rounds there is only normal page swapin
>> and swapout.  In the optimized kernel, the THP is kept after the first
>> swapout, so THP swapin and swapout are used in the remaining rounds.
>> This shows the key benefit of swapping a THP out and in as one piece:
>> the THP is kept instead of being split.  The meminfo information
>> confirms this: in the base kernel only 4.5% of anonymous pages are THP
>> during the test, while in the optimized kernel 96.6% are.  The TLB
>> flushing IPIs (represented as interrupts.CAL:Function_call_interrupts)
>> were reduced by 95.6%, while the cycles spent on the spinlock dropped
>> from 13.9% to 0.1%.  These are performance benefits of THP
>> swapout/swapin too.
>> 
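For anyone who wants to re-derive the percentages quoted above from the
table, here is a small sketch.  The numbers are copied from the table;
the assumption that vmstat.swap.si/so are in KiB/s is mine (it is what
reproduces the 2.17GB/s and 24.04GB/s figures), and the dictionary keys
and helper names are only illustrative.

```python
# Values copied from the quoted table.  vmstat.swap.* are assumed to be
# in KiB/s; meminfo.* values are used only as a ratio.
base = {"throughput": 1417897, "si": 1020489, "so": 1255093,
        "anon_huge": 1259769, "anon": 28021761, "ipi": 64080064}
opt = {"throughput": 15494673, "si": 12156349, "so": 13056114,
       "anon_huge": 24166779, "anon": 25018848, "ipi": 2787565}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def swap_gib_per_s(row):
    """Combined swapin + swapout rate, converted from KiB/s to GiB/s."""
    return (row["si"] + row["so"]) / (1024 * 1024)


def thp_share(row):
    """Percentage of anonymous memory that is backed by THP."""
    return row["anon_huge"] / row["anon"] * 100.0


if __name__ == "__main__":
    print("score change %:", pct_change(base["throughput"], opt["throughput"]))
    print("swap GiB/s:", swap_gib_per_s(base), "->", swap_gib_per_s(opt))
    print("swap change %:", pct_change(base["si"] + base["so"],
                                       opt["si"] + opt["so"]))
    print("THP share %:", thp_share(base), "->", thp_share(opt))
    print("IPI change %:", pct_change(base["ipi"], opt["ipi"]))
```

Note that the 1008% figure comes from the sum of si and so, not from
either column on its own.
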
>> Below is the description for all steps of THP swap optimization.
>> 
>> Recently, the performance of storage devices has improved so fast that
>> we cannot saturate the disk bandwidth with a single logical CPU when
>> doing page swapping, even on a high-end server machine, because the
>> performance of storage devices has improved faster than that of a
>> single logical CPU.  And it seems that the trend will not change in
>> the near future.  On the other hand, THP is becoming more and more
>> popular because of increased memory sizes.  So it becomes necessary to
>> optimize THP swap performance.
>> 
>> The advantages to swapout/swapin a THP in one piece include:
>> 
>> - Batch various swap operations for the THP.  Many operations need to
>>   be done once per THP instead of per normal page, for example,
>>   allocating/freeing the swap space, writing/reading the swap space,
>>   flushing TLB, page fault, etc.  This will improve the performance of
>>   the THP swap greatly.
>> 
>> - The THP swap space read/write will be large sequential IO (2M on
>>   x86_64).  This is particularly helpful for swapin, which is usually
>>   4k random IO.  This will improve the performance of THP swap too.
>> 
>> - It will help reduce memory fragmentation, especially when THP is
>>   heavily used by applications.  The THP-order pages will be freed
>>   up after THP swapout.
>> 
>> - It will improve THP utilization on systems with swap turned on,
>>   because khugepaged is quite slow at collapsing normal pages into a
>>   THP.  After a THP is split during swapout, it takes quite a long
>>   time for the normal pages to collapse back into a THP after being
>>   swapped in.  High THP utilization also helps the efficiency of
>>   page-based memory management.
>> 
>> There are some concerns regarding THP swapin, mainly because the
>> possibly enlarged read/write IO size (for swapout/swapin) may put more
>> overhead on the storage device.  To deal with that, THP swapin is
>> turned on only when necessary.  A new sysfs interface,
>> /sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to
>> configure it.  It uses "always/never/madvise" logic: turned on
>> globally, turned off globally, or turned on only for VMAs with
>> MADV_HUGEPAGE.
>> 
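To make the "always/never/madvise" logic concrete, here is a user-space
model of the decision.  It is a sketch, not the kernel code from the
series.  It assumes swapin_enabled is displayed the same way as the
existing /sys/kernel/mm/transparent_hugepage/enabled file, i.e. with
the active mode in brackets ("always [madvise] never"); the function
names are hypothetical.

```python
def active_mode(sysfs_text):
    """Return the bracketed token from text like 'always [madvise] never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_swapin_allowed(mode, vma_has_madv_hugepage):
    """Model of the described policy for one VMA.

    always  -> THP swapin for every VMA
    madvise -> THP swapin only for VMAs marked with MADV_HUGEPAGE
    never   -> no THP swapin (also used here for unknown modes)
    """
    if mode == "always":
        return True
    if mode == "madvise":
        return vma_has_madv_hugepage
    return False
```

For example, with "always [madvise] never" a fault in a VMA without
MADV_HUGEPAGE would fall back to normal page swapin in this model.
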
>> Changelog
>> ---------
>> 
>> v3:
>> 
>> - Rebased on 5/18 HEAD of mmotm/master
>> 
>> - Fixed a build bug, Thanks 0-Day!
>> 
>> v2:
>> 
>> - Fixed several build bugs, Thanks 0-Day!
>> 
>> - Improved documentation as suggested by Randy Dunlap.
>> 
>> - Fixed several bugs in reading huge swap cluster
>> 
