From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
"yosryahmed@google.com" <yosryahmed@google.com>,
"nphamcs@gmail.com" <nphamcs@gmail.com>,
"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
"21cnbao@gmail.com" <21cnbao@gmail.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"Zou, Nanhai" <nanhai.zou@intel.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
"Gopal, Vinodh" <vinodh.gopal@intel.com>,
"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
Date: Thu, 26 Sep 2024 21:44:38 +0000
Message-ID: <SJ0PR11MB56782FBA16D086D6264EFFD3C96A2@SJ0PR11MB5678.namprd11.prod.outlook.com>
In-Reply-To: <87msjurjwc.fsf@yhuang6-desk2.ccr.corp.intel.com>
> -----Original Message-----
> From: Huang, Ying <ying.huang@intel.com>
> Sent: Wednesday, September 25, 2024 11:48 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali,
> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
>
> "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com> writes:
>
> > Hi Ying,
> >
> >> -----Original Message-----
> >> From: Huang, Ying <ying.huang@intel.com>
> >> Sent: Wednesday, September 25, 2024 5:45 PM
> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> >> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> >> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> >> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> >> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali,
> >> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> >> <vinodh.gopal@intel.com>
> >> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
> >>
> >> "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Huang, Ying <ying.huang@intel.com>
> >> >> Sent: Tuesday, September 24, 2024 11:35 PM
> >> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> >> >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> >> >> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> >> >> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> >> >> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> >> >> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali,
> >> >> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> >> >> <vinodh.gopal@intel.com>
> >> >> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
> >> >>
> >> >> Kanchana P Sridhar <kanchana.p.sridhar@intel.com> writes:
> >> >>
> >> >> [snip]
> >> >>
> >> >> >
> >> >> > Case 1: Comparing zswap 4K vs. zswap mTHP
> >> >> > =========================================
> >> >> >
> >> >> > In this scenario, the "before" is CONFIG_THP_SWAP set to off, which
> >> >> > results in 64K/2M (m)THP being split into 4K folios that are processed
> >> >> > by zswap.
> >> >> >
> >> >> > The "after" is CONFIG_THP_SWAP set to on, plus this patch-series, which
> >> >> > results in 64K/2M (m)THP not being split, and being processed by zswap.
> >> >> >
> >> >> > 64KB mTHP (cgroup memory.high set to 40G):
> >> >> > ==========================================
> >> >> >
> >> >> > -------------------------------------------------------------------------------
> >> >> >                     mm-unstable 9-23-2024      zswap-mTHP             Change wrt
> >> >> >                     CONFIG_THP_SWAP=N          CONFIG_THP_SWAP=Y        Baseline
> >> >> >                     Baseline
> >> >> > -------------------------------------------------------------------------------
> >> >> > ZSWAP compressor        zstd  deflate-iaa        zstd  deflate-iaa  zstd  deflate-iaa
> >> >> > -------------------------------------------------------------------------------
> >> >> > Throughput (KB/s)    143,323      125,485     153,550      129,609    7%      3%
> >> >> > elapsed time (sec)     24.97        25.42       23.90        25.19    4%      1%
> >> >> > sys time (sec)        822.72       750.96      757.70       731.13    8%      3%
> >> >> > memcg_high           132,743      169,825     148,075      192,744
> >> >> > memcg_swap_fail      639,067      841,553       2,204        2,215
> >> >> > pswpin                     0            0           0            0
> >> >> > pswpout                    0            0           0            0
> >> >> > zswpin                   795          873         760          902
> >> >> > zswpout           10,011,266   13,195,137  10,010,017   13,193,554
> >> >> > thp_swpout                 0            0           0            0
> >> >> > thp_swpout_                0            0           0            0
> >> >> >  fallback
> >> >> > 64kB-mthp_           639,065      841,553       2,204        2,215
> >> >> >  swpout_fallback
> >> >> > pgmajfault             2,861        2,924       3,054        3,259
> >> >> > ZSWPOUT-64kB             n/a          n/a     623,451      822,268
> >> >> > SWPOUT-64kB                0            0           0            0
> >> >> > -------------------------------------------------------------------------------
> >> >> >
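> >> >> > The "Change wrt Baseline" columns above are the relative improvements of
> >> >> > zswap-mTHP over the CONFIG_THP_SWAP=N baseline; a minimal sketch of one
> >> >> > way to derive them (plain Python, using the zstd columns of the table) is:
> >> >> >
> >> >> > # Relative improvement over the CONFIG_THP_SWAP=N baseline.
> >> >> > # Throughput: higher is better; elapsed/sys time: lower is better.
> >> >> > def improvement(baseline, new, higher_is_better=True):
> >> >> >     delta = (new - baseline) if higher_is_better else (baseline - new)
> >> >> >     return 100.0 * delta / baseline
> >> >> >
> >> >> > # zstd columns from the table above:
> >> >> > print(f"throughput:   {improvement(143323, 153550):.0f}%")        # ~7%
> >> >> > print(f"elapsed time: {improvement(24.97, 23.90, False):.0f}%")   # ~4%
> >> >> > print(f"sys time:     {improvement(822.72, 757.70, False):.0f}%") # ~8%
> >> >> >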
> >> >>
> >> >> IIUC, the throughput is the sum of throughput of all usemem processes?
> >> >>
> >> >> One possible issue with the usemem test case is the "imbalance" issue.
> >> >> That is, some usemem processes may swap out/swap in less, so their score
> >> >> is very high, while other processes may swap out/swap in more, so their
> >> >> score is very low. Sometimes the total score decreases but the scores of
> >> >> the usemem processes are more balanced, so the performance should be
> >> >> considered better. In general, we should make the usemem scores balanced
> >> >> among processes, via, say, a longer test time. Can you check this in your
> >> >> test results?
> >> >
> >> > Actually, the throughput data listed in the cover letter is the average
> >> > across all the usemem processes. Your observation about the "imbalance"
> >> > issue is right. Some processes see a higher throughput than others. I
> >> > have noticed that the throughputs progressively decrease as the
> >> > individual processes exit and print their stats.
> >> >
> >> > Listed below are the stats from two runs of usemem70: sleep 10 and
> >> > sleep 30. Both are run with a cgroup memory limit of 40G. Data is with
> >> > v7, 64K folios are enabled, and zswap uses zstd.
> >> >
> >> >
> >> > -----------------------------------------------
> >> > sleep 10 sleep 30
> >> > Throughput (KB/s) Throughput (KB/s)
> >> > -----------------------------------------------
> >> > 181,540 191,686
> >> > 179,651 191,459
> >> > 179,068 188,834
> >> > 177,244 187,568
> >> > 177,215 186,703
> >> > 176,565 185,584
> >> > 176,546 185,370
> >> > 176,470 185,021
> >> > 176,214 184,303
> >> > 176,128 184,040
> >> > 175,279 183,932
> >> > 174,745 180,831
> >> > 173,935 179,418
> >> > 161,546 168,014
> >> > 160,332 167,540
> >> > 160,122 167,364
> >> > 159,613 167,020
> >> > 159,546 166,590
> >> > 159,021 166,483
> >> > 158,845 166,418
> >> > 158,426 166,264
> >> > 158,396 166,066
> >> > 158,371 165,944
> >> > 158,298 165,866
> >> > 158,250 165,884
> >> > 158,057 165,533
> >> > 158,011 165,532
> >> > 157,899 165,457
> >> > 157,894 165,424
> >> > 157,839 165,410
> >> > 157,731 165,407
> >> > 157,629 165,273
> >> > 157,626 164,867
> >> > 157,581 164,636
> >> > 157,471 164,266
> >> > 157,430 164,225
> >> > 157,287 163,290
> >> > 156,289 153,597
> >> > 153,970 147,494
> >> > 148,244 147,102
> >> > 142,907 146,111
> >> > 142,811 145,789
> >> > 139,171 141,168
> >> > 136,314 140,714
> >> > 133,616 140,111
> >> > 132,881 139,636
> >> > 132,729 136,943
> >> > 132,680 136,844
> >> > 132,248 135,726
> >> > 132,027 135,384
> >> > 131,929 135,270
> >> > 131,766 134,748
> >> > 131,667 134,733
> >> > 131,576 134,582
> >> > 131,396 134,302
> >> > 131,351 134,160
> >> > 131,135 134,102
> >> > 130,885 134,097
> >> > 130,854 134,058
> >> > 130,767 134,006
> >> > 130,666 133,960
> >> > 130,647 133,894
> >> > 130,152 133,837
> >> > 130,006 133,747
> >> > 129,921 133,679
> >> > 129,856 133,666
> >> > 129,377 133,564
> >> > 128,366 133,331
> >> > 127,988 132,938
> >> > 126,903 132,746
> >> > -----------------------------------------------
> >> > sum                 10,526,916        10,919,561
> >> > average                150,385           155,994
> >> > stddev                  17,551            19,633
> >> > -----------------------------------------------
> >> > elapsed time (sec)       24.40             43.66
> >> > sys time (sec)          806.25            766.05
> >> > zswpout             10,008,713        10,008,407
> >> > 64K folio swpout       623,463           623,629
> >> > -----------------------------------------------
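> >> >
> >> > One way to quantify the imbalance is the coefficient of variation
> >> > (stddev / average) of the per-process throughputs; a minimal sketch
> >> > (plain Python, using the summary rows above) is:
> >> >
> >> > # Coefficient of variation as a simple imbalance metric for the
> >> > # 70 usemem processes, from the average/stddev rows above.
> >> > def cv(stddev, mean):
> >> >     return 100.0 * stddev / mean
> >> >
> >> > print(f"sleep 10: CV = {cv(17551, 150385):.1f}%")  # ~11.7%
> >> > print(f"sleep 30: CV = {cv(19633, 155994):.1f}%")  # ~12.6%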
> >>
> >> Although there is some imbalance, I don't find it to be too much. So I
> >> think the test result is reasonable. Please pay attention to the
> >> imbalance issue in future tests.
> >
> > Sure, will do so.
> >
> >>
> >> > As we increase the time for which allocations are maintained,
> >> > there seems to be a slight improvement in throughput, but the
> >> > variance increases as well. The processes with lower throughput
> >> > could be the ones that handle the memcg being over its limit by
> >> > doing reclaim, possibly before they can allocate.
> >> >
> >> > Interestingly, the longer test time does seem to reduce the amount
> >> > of reclaim (hence lower sys time), but more 64K large folios seem to
> >> > be reclaimed. Could this mean that with a longer test time (sleep 30),
> >> > more cold memory residing in large folios is getting reclaimed, as
> >> > opposed to memory just relinquished by the exiting processes?
> >>
> >> I don't think a longer sleep time in the test helps much with balance.
> >> Can you try with fewer processes and a larger memory size per process?
> >> I guess that this will improve balance.
> >
> > I tried this, and the data is listed below:
> >
> > usemem options:
> > ---------------
> > 30 processes allocate 10G each
> > cgroup memory limit = 150G
> > sleep 10
> > 525Gi SSD disk swap partition
> > 64K large folios enabled
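> >
> > The setup above corresponds roughly to the following knobs; a minimal
> > sketch (Python, run as root, assuming cgroup v2 mounted at /sys/fs/cgroup
> > and a hypothetical cgroup name "usemem30"; CONFIG_THP_SWAP=y is a
> > build-time option and is not set here) is:
> >
> > from pathlib import Path
> >
> > def write(path, value):
> >     Path(path).write_text(value)
> >
> > # Enable zswap and select the compressor (zstd or deflate-iaa).
> > write("/sys/module/zswap/parameters/enabled", "Y")
> > write("/sys/module/zswap/parameters/compressor", "zstd")
> >
> > # Enable 64K mTHP allocations ("madvise" is the other common choice).
> > write("/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled", "always")
> >
> > # Create the test cgroup and set memory.high to 150G, i.e. ~50% of the
> > # 30 x 10G that the processes allocate.
> > cg = Path("/sys/fs/cgroup/usemem30")
> > cg.mkdir(exist_ok=True)
> > write(cg / "memory.high", str(150 * 2**30))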
> >
> > Throughput (KB/s) of each of the 30 processes:
> > ---------------------------------------------------------------
> >                      mm-unstable     zswap_store of large folios
> >                        9-25-2024                  v7
> > zswap compressor:           zstd          zstd          deflate-iaa
> > ---------------------------------------------------------------
> > 38,393 234,485 374,427
> > 37,283 215,528 314,225
> > 37,156 214,942 304,413
> > 37,143 213,073 304,146
> > 36,814 212,904 290,186
> > 36,277 212,304 288,212
> > 36,104 212,207 285,682
> > 36,000 210,173 270,661
> > 35,994 208,487 256,960
> > 35,979 207,788 248,313
> > 35,967 207,714 235,338
> > 35,966 207,703 229,335
> > 35,835 207,690 221,697
> > 35,793 207,418 221,600
> > 35,692 206,160 219,346
> > 35,682 206,128 219,162
> > 35,681 205,817 219,155
> > 35,678 205,546 214,862
> > 35,678 205,523 214,710
> > 35,677 204,951 214,282
> > 35,677 204,283 213,441
> > 35,677 203,348 213,011
> > 35,675 203,028 212,923
> > 35,673 201,922 212,492
> > 35,672 201,660 212,225
> > 35,672 200,724 211,808
> > 35,672 200,324 211,420
> > 35,671 199,686 211,413
> > 35,667 198,858 211,346
> > 35,667 197,590 211,209
> > ---------------------------------------------------------------
> > sum 1,081,515 6,217,964 7,268,000
> > average 36,051 207,265 242,267
> > stddev 655 7,010 42,234
> > elapsed time (sec) 343.70 107.40 84.34
> > sys time (sec) 269.30 2,520.13 1,696.20
> > memcg.high breaches 443,672 475,074 623,333
> > zswpout 22,605 48,931,249 54,777,100
> > pswpout 40,004,528 0 0
> > hugepages-64K zswpout 0 3,057,090 3,421,855
> > hugepages-64K swpout 2,500,283 0 0
> > ---------------------------------------------------------------
> >
> > As you can see, this is quite a memory-constrained scenario: the cgroup
> > in which the 30 processes run is given a memory limit of 50% of the total
> > memory required. This causes significantly more reclaim activity than the
> > setup I was using thus far (70 processes, 1G each, 40G limit).
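> >
> > For reference, the counters in the table above come from /proc/vmstat, the
> > per-size mTHP stats in sysfs (the per-size zswpout stat is added by patch
> > 7/8 of this series), and the "high" field of the cgroup's memory.events; a
> > minimal collection sketch (same hypothetical cgroup path as above) is:
> >
> > from pathlib import Path
> >
> > # /proc/vmstat is "name value" per line.
> > vm = dict(line.split() for line in Path("/proc/vmstat").read_text().splitlines())
> > for key in ("zswpout", "pswpout", "zswpin", "pswpin"):
> >     print(key, vm[key])
> >
> > # Per-size mTHP stats for 64K folios.
> > stats_64k = Path("/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats")
> > for name in ("zswpout", "swpout", "swpout_fallback"):
> >     print("hugepages-64K", name, (stats_64k / name).read_text().strip())
> >
> > # memcg.high breaches: "high" line of the cgroup's memory.events.
> > for line in Path("/sys/fs/cgroup/usemem30/memory.events").read_text().splitlines():
> >     if line.startswith("high "):
> >         print("memcg.high breaches", line.split()[1])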
> >
> > The variance or "imbalance" reduces somewhat for zstd, but not for IAA.
> >
> > With zswap_store of large folios, IAA shows significant improvements over
> > zstd: 17% in throughput, 21% in elapsed time, and 33% in sys time. These
> > are the memory-constrained scenarios in which IAA typically does really
> > well. IAA verify_compress is enabled, so we also get the added benefit of
> > data integrity checks with IAA.
> >
> > I would like to get your and the maintainers' feedback on whether
> > I should switch to this "usemem30-10G" setup for v8.
>
> The results look good to me. I suggest you use it.
Ok, sure, thanks Ying.
Thanks,
Kanchana
>
> --
> Best Regards,
> Huang, Ying