From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"yosryahmed@google.com" <yosryahmed@google.com>,
"nphamcs@gmail.com" <nphamcs@gmail.com>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"21cnbao@gmail.com" <21cnbao@gmail.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"Zou, Nanhai" <nanhai.zou@intel.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
"Gopal, Vinodh" <vinodh.gopal@intel.com>,
"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
Date: Mon, 19 Aug 2024 05:12:53 +0000
Message-ID: <SJ0PR11MB5678BFAA984BEEBBFC2FC351C98C2@SJ0PR11MB5678.namprd11.prod.outlook.com>
In-Reply-To: <87msl9i4lw.fsf@yhuang6-desk2.ccr.corp.intel.com>
Hi Ying,
> -----Original Message-----
> From: Huang, Ying <ying.huang@intel.com>
> Sent: Sunday, August 18, 2024 8:17 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
>
> Kanchana P Sridhar <kanchana.p.sridhar@intel.com> writes:
>
> [snip]
>
> >
> > Performance Testing:
> > ====================
> > Testing of this patch-series was done with the v6.11-rc3 mainline, without
> > and with this patch-series, on an Intel Sapphire Rapids server,
> > dual-socket 56 cores per socket, 4 IAA devices per socket.
> >
> > The system has 503 GiB RAM, with a 4G SSD as the backing swap device for
> > ZSWAP. Core frequency was fixed at 2500MHz.
> >
> > The vm-scalability "usemem" test was run in a cgroup whose memory.high
> > was fixed. Following a similar methodology as in Ryan Roberts'
> > "Swap-out mTHP without splitting" series [2], 70 usemem processes were
> > run, each allocating and writing 1G of memory:
> >
> > usemem --init-time -w -O -n 70 1g
> >
> > Since I was constrained to get the 70 usemem processes to generate
> > swapout activity with the 4G SSD, I ended up using different cgroup
> > memory.high fixed limits for the experiments with 64K mTHP and 2M THP:
> >
> > 64K mTHP experiments: cgroup memory fixed at 60G
> > 2M THP experiments : cgroup memory fixed at 55G
> >
> > The vm/sysfs stats included after the performance data provide details
> > on the swapout activity to SSD/ZSWAP.
> >
> > Other kernel configuration parameters:
> >
> > ZSWAP Compressor : LZ4, DEFLATE-IAA
> > ZSWAP Allocator : ZSMALLOC
> > SWAP page-cluster : 2
> >
> > In the experiments where "deflate-iaa" is used as the ZSWAP compressor,
> > IAA "compression verification" is enabled. Hence each IAA compression
> > will be decompressed internally by the "iaa_crypto" driver, the crc-s
> > returned by the hardware will be compared and errors reported in case of
> > mismatches. Thus "deflate-iaa" helps ensure better data integrity as
> > compared to the software compressors.
> >
> > Throughput reported by usemem and perf sys time for running the test
> > are as follows, averaged across 3 runs:
> >
> > 64KB mTHP (cgroup memory.high set to 60G):
> > ==========================================
> > ------------------------------------------------------------------
> > | | | | |
> > |Kernel | mTHP SWAP-OUT | Throughput | Improvement|
> > | | | KB/s | |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline | SSD | 335,346 | Baseline |
> > |zswap-mTHP-Store | ZSWAP lz4 | 271,558 | -19% |
>
> zswap throughput is worse than ssd swap? This doesn't look right.
I realize it might look that way; however, this is not an apples-to-apples comparison,
as explained in the latter part of my analysis (after the 2M THP data tables).
The primary reason is that the test runs under a fixed cgroup memory limit.

In the "Before" scenario, mTHPs are swapped out to SSD. Disk swap usage is not
accounted towards the cgroup's memory limit, so there are relatively few
swap-outs, driven mainly by the 1G allocations from each of the 70 usemem
processes running against the 60G memory.high limit on the parent cgroup.

The picture changes in the "After" scenario: mTHPs are now stored in zswap,
and the compressed data in the zswap zpool is charged to the cgroup's
memory.current, so it counts towards the fixed memory limit of the parent
cgroup, in addition to the 1G allocations from each of the 70 processes.

This creates more memory pressure on the cgroup, and hence more swap-outs.
With lz4 as the zswap compressor, this results in lower throughput than
"Before". With IAA as the zswap compressor, however, the throughput with
zswap mTHP is better than "Before": the lower hardware compression latency
absorbs the higher swap-out activity without compromising throughput.
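
To illustrate the effect with some back-of-the-envelope arithmetic (a
simplified model, not measured data): with ~70G allocated against a 60G
memory.high, SSD swap only has to push out the ~10G overage, whereas with
zswap every swapped-out page leaves a compressed copy that remains charged
to the cgroup, so the reclaim target grows roughly by 1/(1 - r), where r is
the compression ratio. A small Python sketch of this model (the 2:1
compression ratio is an assumption purely for illustration):

def min_swapout_gib(alloc_gib, limit_gib, compressed_fraction):
    """Minimum GiB that must be swapped out so resident memory fits the limit.

    SSD swap: resident = alloc - S          <= limit  (swap not charged)
    zswap   : resident = alloc - S + r * S  <= limit  (compressed copy, r * S,
                                                       stays charged to the cgroup)
    """
    overage = alloc_gib - limit_gib
    if overage <= 0:
        return 0.0
    return overage / (1.0 - compressed_fraction)

alloc, limit = 70, 60

ssd = min_swapout_gib(alloc, limit, compressed_fraction=0.0)    # swap not charged
zswap = min_swapout_gib(alloc, limit, compressed_fraction=0.5)  # assumed 2:1 ratio

print(f"SSD swap : ~{ssd:.0f} GiB must be swapped out")
print(f"zswap    : ~{zswap:.0f} GiB must be swapped out")

Real reclaim behavior is of course far more dynamic than this, but it shows
the direction: charging the zswap zpool to the cgroup increases the amount
of memory that must be reclaimed, and therefore the number of swap-outs.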
>
> > |zswap-mTHP-Store | ZSWAP deflate-iaa | 388,154 | 16% |
> > |------------------------------------------------------------------|
> > | | | | |
> > |Kernel | mTHP SWAP-OUT | Sys time | Improvement|
> > | | | sec | |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline | SSD | 91.37 | Baseline |
> > |zswap-mTHP-Store    | ZSWAP lz4         |    265.43  |    -191%   |
> > |zswap-mTHP-Store | ZSWAP deflate-iaa | 235.60 | -158% |
> > ------------------------------------------------------------------
> >
> > -----------------------------------------------------------------------
> > | VMSTATS, mTHP ZSWAP/SSD stats| v6.11-rc3 | zswap-mTHP | zswap-mTHP  |
> > | | mainline | Store | Store |
> > | | | lz4 | deflate-iaa |
> > |-----------------------------------------------------------------------|
> > | pswpin | 0 | 0 | 0 |
> > | pswpout | 174,432 | 0 | 0 |
> > | zswpin | 703 | 534 | 721 |
> > | zswpout | 1,501 | 1,491,654 | 1,398,805 |
>
> It appears that the number of swapped pages for zswap is much larger
> than that of SSD swap. Why? I guess this is why zswap throughput is
> worse.
Your observation is correct. I hope the explanation above clarifies the
reasoning behind this.

Thanks,
Kanchana
>
> > |-----------------------------------------------------------------------|
> > | thp_swpout | 0 | 0 | 0 |
> > | thp_swpout_fallback | 0 | 0 | 0 |
> > | pgmajfault | 3,364 | 3,650 | 3,431 |
> > |-----------------------------------------------------------------------|
> > | hugepages-64kB/stats/zswpout | | 63,200 | 63,244 |
> > |-----------------------------------------------------------------------|
> > | hugepages-64kB/stats/swpout | 10,902 | 0 | 0 |
> > -----------------------------------------------------------------------
> >
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying