From: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com,
ryan.roberts@arm.com, ying.huang@intel.com, 21cnbao@gmail.com,
akpm@linux-foundation.org
Cc: nanhai.zou@intel.com, wajdi.k.feghali@intel.com,
vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [RFC PATCH v1 0/4] mm: ZSWAP swap-out of mTHP folios
Date: Tue, 13 Aug 2024 23:28:26 -0700
Message-ID: <20240814062830.26833-1-kanchana.p.sridhar@intel.com>

This RFC patch-series enables zswap_store() to accept and store mTHP
folios. The most significant contribution in this series is from the
earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
migrated to v6.10 in patch 3 of this series.
[1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
Additionally, some of the functionality in zswap_store() has been
modularized, to make it more amenable to supporting mTHPs of any
order.
For instance, the determination of whether a folio is same-filled now
takes an index into the folio to derive the page to examine. Likewise, a
new function, zswap_store_entry(), stores a zswap_entry in the
xarray.
For testing purposes, per-mTHP size vmstat zswap_store event counters are
added, and incremented upon successful zswap_store of an mTHP.
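The new counters appear in /proc/vmstat alongside the existing zswap
events. A small helper to pull them out of a vmstat-style dump might look
like this (a sketch, not part of the patches; the counter names are taken
from the result tables further down):

```shell
# Print the zswap store counters from a /proc/vmstat-style file,
# including the per-mTHP-size events added by this series.
vmstat_zswpout_counters() {
    grep -E '^(zswpout|zswpout_4kb_folio|zswpout_pmd_thp_folio|mthp_zswpout_[0-9]+kb) ' "$1"
}

# Typical usage on a live system:
# vmstat_zswpout_counters /proc/vmstat
```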
This patch-series is a precursor to ZSWAP compress batching of mTHP
swap-out and decompress batching of swap-ins based on swapin_readahead(),
using Intel IAA hardware acceleration. We would like to submit these in
subsequent RFC patch-series, along with performance improvement data.
Performance Testing:
====================
Testing of this patch-series was done with the v6.10 mainline, without
and with this RFC, on a dual-socket Intel Sapphire Rapids server with 56
cores per socket and 4 IAA devices per socket.
The system has 503 GiB RAM, with 176 GiB of ZRAM configured as the
backing swap device for ZSWAP. Core frequency was fixed at 2500 MHz.
The vm-scalability "usemem" test was run in a cgroup whose memory.high
was fixed at 40G. Following a similar methodology as in Ryan Roberts'
"Swap-out mTHP without splitting" series [2], 70 usemem processes were
run, each allocating and writing 1G of memory:
usemem --init-time -w -O -n 70 1g
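The cgroup setup and test invocation can be sketched as below (a minimal
sketch assuming cgroup v2 mounted at /sys/fs/cgroup; the cgroup name
"zswap-test" and the usemem binary being on PATH are assumptions):

```shell
# Create a cgroup with memory.high fixed at 40G (cgroup v2 assumed).
mkdir -p /sys/fs/cgroup/zswap-test
echo 40G > /sys/fs/cgroup/zswap-test/memory.high

# Move the current shell into the cgroup, then run the workload:
# 70 usemem processes, each allocating and writing 1G.
echo $$ > /sys/fs/cgroup/zswap-test/cgroup.procs
usemem --init-time -w -O -n 70 1g
```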
Other kernel configuration parameters:
ZSWAP Compressor : LZ4, DEFLATE-IAA
ZSWAP Allocator : ZSMALLOC
ZRAM Compressor : LZO-RLE
SWAP page-cluster : 2
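These parameters can be selected at run time along the following lines (a
sketch; the zram device name and sizes are assumptions, and "deflate-iaa"
requires the iaa_crypto driver to be loaded and IAA work-queues enabled):

```shell
# Select the ZSWAP compressor and allocator.
echo lz4 > /sys/module/zswap/parameters/compressor   # or deflate-iaa
echo zsmalloc > /sys/module/zswap/parameters/zpool

# Configure ZRAM as the backing swap device, with lzo-rle.
echo lzo-rle > /sys/block/zram0/comp_algorithm
echo 176G > /sys/block/zram0/disksize
mkswap /dev/zram0 && swapon /dev/zram0

# Swap out in clusters of 2^2 = 4 pages.
echo 2 > /proc/sys/vm/page-cluster
```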
In the experiments where "deflate-iaa" is used as the ZSWAP compressor,
IAA "compression verification" is enabled. Hence each IAA compression
is decompressed internally by the "iaa_crypto" driver, the CRCs
returned by the hardware are compared, and errors are reported in case
of mismatches. Thus "deflate-iaa" helps ensure better data integrity as
compared to the software compressors.
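To our understanding, the verification knob is exposed by the iaa_crypto
driver via sysfs and is on by default; it can be inspected (and, for
pure-performance runs, disabled) roughly as follows (path per the
iaa_crypto driver documentation; treat it as an assumption for other
kernel versions):

```shell
# Check whether iaa_crypto compression verification is enabled (1 = on).
cat /sys/bus/dsa/drivers/crypto/verify_compress

# Verification is on by default; it can be disabled for performance runs:
# echo 0 > /sys/bus/dsa/drivers/crypto/verify_compress
```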
Throughput reported by usemem and perf sys time for running the test were
measured and averaged across 3 runs:
64KB mTHP:
==========
----------------------------------------------------------
| | | | |
|Kernel | mTHP SWAP-OUT | Throughput | Improvement|
| | | KB/s | |
|----------------|---------------|------------|------------|
|v6.10 mainline | ZRAM lzo-rle | 111,180 | Baseline |
|zswap-mTHP-RFC | ZSWAP lz4 | 115,996 | 4% |
|zswap-mTHP-RFC | ZSWAP | | |
| | deflate-iaa | 166,048 | 49% |
|----------------------------------------------------------|
| | | | |
|Kernel | mTHP SWAP-OUT | Sys time | Improvement|
| | | sec | |
|----------------|---------------|------------|------------|
|v6.10 mainline | ZRAM lzo-rle | 1,049.69 | Baseline |
|zswap-mTHP RFC | ZSWAP lz4 | 1,178.20 | -12% |
|zswap-mTHP-RFC | ZSWAP | | |
| | deflate-iaa | 626.12 | 40% |
----------------------------------------------------------
-------------------------------------------------------
| VMSTATS, mTHP ZSWAP stats, | v6.10 | zswap-mTHP |
| mTHP ZRAM stats: | mainline | RFC |
|-------------------------------------------------------|
| pswpin | 16 | 0 |
| pswpout | 7,823,984 | 0 |
| zswpin | 551 | 647 |
| zswpout | 1,410 | 15,175,113 |
|-------------------------------------------------------|
| thp_swpout | 0 | 0 |
| thp_swpout_fallback | 0 | 0 |
| pgmajfault | 2,189 | 2,241 |
|-------------------------------------------------------|
| zswpout_4kb_folio | | 1,497 |
| mthp_zswpout_64kb | | 948,351 |
|-------------------------------------------------------|
| hugepages-64kB/stats/swpout| 488,999 | 0 |
-------------------------------------------------------
2MB PMD-THP/2048K mTHP:
=======================
----------------------------------------------------------
| | | | |
|Kernel | mTHP SWAP-OUT | Throughput | Improvement|
| | | KB/s | |
|----------------|---------------|------------|------------|
|v6.10 mainline | ZRAM lzo-rle | 136,617 | Baseline |
|zswap-mTHP-RFC | ZSWAP lz4 | 137,360 | 1% |
|zswap-mTHP-RFC | ZSWAP | | |
| | deflate-iaa | 179,097 | 31% |
|----------------------------------------------------------|
| | | | |
|Kernel | mTHP SWAP-OUT | Sys time | Improvement|
| | | sec | |
|----------------|---------------|------------|------------|
|v6.10 mainline | ZRAM lzo-rle | 1,044.40 | Baseline |
|zswap-mTHP RFC | ZSWAP lz4 | 1,035.79 | 1% |
|zswap-mTHP-RFC | ZSWAP | | |
| | deflate-iaa | 571.31 | 45% |
----------------------------------------------------------
---------------------------------------------------------
| VMSTATS, mTHP ZSWAP stats, | v6.10 | zswap-mTHP |
| mTHP ZRAM stats: | mainline | RFC |
|---------------------------------------------------------|
| pswpin | 0 | 0 |
| pswpout | 8,630,272 | 0 |
| zswpin | 565 | 6,901 |
| zswpout | 1,388 | 15,379,163 |
|---------------------------------------------------------|
| thp_swpout | 16,856 | 0 |
| thp_swpout_fallback | 0 | 0 |
| pgmajfault | 2,184 | 8,532 |
|---------------------------------------------------------|
| zswpout_4kb_folio | | 5,851 |
| mthp_zswpout_2048kb | | 30,026 |
| zswpout_pmd_thp_folio | | 30,026 |
|---------------------------------------------------------|
| hugepages-2048kB/stats/swpout| 16,856 | 0 |
---------------------------------------------------------
As expected, the before (v6.10 mainline) experiment shows relatively
fewer swapouts, because ZRAM utilization is not accounted in the cgroup.
With the introduction of mTHP zswap_store, the after (zswap-mTHP RFC)
data reflects the higher swapout activity, and the consequent sys time
degradation.
Our goal is to improve ZSWAP mTHP store performance using batching. With
Intel IAA compress/decompress batching used in ZSWAP (to be submitted as
additional RFC series), we are able to demonstrate significant
performance improvements with IAA as compared to software compressors.
For instance, with IAA-Canned compression [3] used with batching of
zswap_stores and zswap_loads, the usemem throughput, averaged over 3
runs, improves to 170,461 KB/s (64KB mTHP) and 188,325 KB/s (2MB THP).
[2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@arm.com/
[3] https://patchwork.kernel.org/project/linux-crypto/cover/cover.1710969449.git.andre.glover@linux.intel.com/
Kanchana P Sridhar (4):
mm: zswap: zswap_is_folio_same_filled() takes an index in the folio.
mm: vmstat: Per mTHP-size zswap_store vmstat event counters.
mm: zswap: zswap_store() extended to handle mTHP folios.
mm: page_io: Count successful mTHP zswap stores in vmstat.
include/linux/vm_event_item.h | 15 +++
mm/page_io.c | 44 +++++++
mm/vmstat.c | 15 +++
mm/zswap.c | 223 ++++++++++++++++++++++++----------
4 files changed, 233 insertions(+), 64 deletions(-)
--
2.27.0