linux-mm.kvack.org archive mirror
From: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"yosryahmed@google.com" <yosryahmed@google.com>,
	"nphamcs@gmail.com" <nphamcs@gmail.com>,
	"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
	"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
	"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"21cnbao@gmail.com" <21cnbao@gmail.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Zou, Nanhai" <nanhai.zou@intel.com>,
	"Feghali, Wajdi K" <wajdi.k.feghali@intel.com>,
	"Gopal, Vinodh" <vinodh.gopal@intel.com>,
	"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Subject: RE: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
Date: Thu, 26 Sep 2024 21:44:38 +0000	[thread overview]
Message-ID: <SJ0PR11MB56782FBA16D086D6264EFFD3C96A2@SJ0PR11MB5678.namprd11.prod.outlook.com> (raw)
In-Reply-To: <87msjurjwc.fsf@yhuang6-desk2.ccr.corp.intel.com>

> -----Original Message-----
> From: Huang, Ying <ying.huang@intel.com>
> Sent: Wednesday, September 25, 2024 11:48 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali,
> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
> 
> "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com> writes:
> 
> > Hi Ying,
> >
> >> -----Original Message-----
> >> From: Huang, Ying <ying.huang@intel.com>
> >> Sent: Wednesday, September 25, 2024 5:45 PM
> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> >> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> >> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> >> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> >> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> Feghali,
> >> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> >> <vinodh.gopal@intel.com>
> >> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
> >>
> >> "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Huang, Ying <ying.huang@intel.com>
> >> >> Sent: Tuesday, September 24, 2024 11:35 PM
> >> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> >> >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> >> >> hannes@cmpxchg.org; yosryahmed@google.com;
> nphamcs@gmail.com;
> >> >> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> >> >> shakeel.butt@linux.dev; ryan.roberts@arm.com; 21cnbao@gmail.com;
> >> >> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> >> Feghali,
> >> >> Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> >> >> <vinodh.gopal@intel.com>
> >> >> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
> >> >>
> >> >> Kanchana P Sridhar <kanchana.p.sridhar@intel.com> writes:
> >> >>
> >> >> [snip]
> >> >>
> >> >> >
> >> >> > Case 1: Comparing zswap 4K vs. zswap mTHP
> >> >> > =========================================
> >> >> >
> >> >> > In this scenario, the "before" is CONFIG_THP_SWAP set to off, which
> >> >> > results in 64K/2M (m)THP being split into 4K folios that are then
> >> >> > processed by zswap.
> >> >> >
> >> >> > The "after" is CONFIG_THP_SWAP set to on, plus this patch-series, so
> >> >> > the 64K/2M (m)THP are not split and are processed by zswap as whole
> >> >> > folios.
> >> >> >
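(For context, here is a minimal sketch of the runtime knobs behind the
"64K mTHP, cgroup memory.high set to 40G" runs below. This is not the actual
test harness: the CONFIG_THP_SWAP=N/Y difference is a kernel build option,
not a runtime setting, and the cgroup name "test" is just a placeholder; the
sysfs paths are the standard mTHP/zswap ones, and "zstd"/"deflate-iaa" are
the compressor names from the table.)

  #!/usr/bin/env python3
  # Hypothetical setup helper (placeholder values); needs root.
  from pathlib import Path

  THP_BASE = Path("/sys/kernel/mm/transparent_hugepage")
  ZSWAP = Path("/sys/module/zswap/parameters")
  CGROUP = Path("/sys/fs/cgroup/test")          # placeholder cgroup name

  def setup(compressor="zstd", mthp_size="64kB", mem_high=40 * 2**30):
      # Enable the mTHP size under test (e.g. hugepages-64kB).
      (THP_BASE / f"hugepages-{mthp_size}" / "enabled").write_text("always")
      # Enable zswap and pick the compressor (zstd or deflate-iaa).
      (ZSWAP / "enabled").write_text("Y")
      (ZSWAP / "compressor").write_text(compressor)
      # Throttle the test cgroup so allocations trigger reclaim into zswap.
      (CGROUP / "memory.high").write_text(str(mem_high))

  if __name__ == "__main__":
      setup()
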
> >> >> >  64KB mTHP (cgroup memory.high set to 40G):
> >> >> >  ==========================================
> >> >> >
> >> >> >  -------------------------------------------------------------------------------
> >> >> >                     mm-unstable 9-23-2024              zswap-mTHP   Change wrt
> >> >> >                         CONFIG_THP_SWAP=N       CONFIG_THP_SWAP=Y     Baseline
> >> >> >                                  Baseline
> >> >> >  -------------------------------------------------------------------------------
> >> >> >  ZSWAP compressor       zstd     deflate-        zstd    deflate-  zstd deflate-
> >> >> >                                       iaa                     iaa            iaa
> >> >> >  -------------------------------------------------------------------------------
> >> >> >  Throughput (KB/s)   143,323      125,485     153,550     129,609    7%      3%
> >> >> >  elapsed time (sec)    24.97        25.42       23.90       25.19    4%      1%
> >> >> >  sys time (sec)       822.72       750.96      757.70      731.13    8%      3%
> >> >> >  memcg_high          132,743      169,825     148,075     192,744
> >> >> >  memcg_swap_fail     639,067      841,553       2,204       2,215
> >> >> >  pswpin                    0            0           0           0
> >> >> >  pswpout                   0            0           0           0
> >> >> >  zswpin                  795          873         760         902
> >> >> >  zswpout          10,011,266   13,195,137  10,010,017  13,193,554
> >> >> >  thp_swpout                0            0           0           0
> >> >> >  thp_swpout_               0            0           0           0
> >> >> >   fallback
> >> >> >  64kB-mthp_          639,065      841,553       2,204       2,215
> >> >> >   swpout_fallback
> >> >> >  pgmajfault            2,861        2,924       3,054       3,259
> >> >> >  ZSWPOUT-64kB            n/a          n/a     623,451     822,268
> >> >> >  SWPOUT-64kB               0            0           0           0
> >> >> >  -------------------------------------------------------------------------------
> >> >> >
> >> >>
> >> >> IIUC, the throughput is the sum of throughput of all usemem processes?
> >> >>
> >> >> One possible issue of the usemem test case is the "imbalance" issue.  That
> >> >> is, some usemem processes may swap-out/swap-in less, so their score is
> >> >> very high; while some other processes may swap-out/swap-in more, so their
> >> >> score is very low.  Sometimes the total score decreases, but the scores of
> >> >> the usemem processes are more balanced, so the performance should be
> >> >> considered better.  And, in general, we should make the usemem scores
> >> >> balanced among processes via, say, a longer test time.  Can you check this
> >> >> in your test results?
> >> >
> >> > Actually, the throughput data listed in the cover-letter is the average of
> >> > all the usemem processes. Your observation about the "imbalance" issue is
> >> > right. Some processes see a higher throughput than others. I have noticed
> >> > that the throughputs progressively reduce as the individual processes exit
> >> > and print their stats.
> >> >
> >> > Listed below are the stats from two runs of usemem70: sleep 10 and sleep 30.
> >> > Both are run with a cgroup mem-limit of 40G. Data is with v7, 64K folios are
> >> > enabled, zswap uses zstd.
> >> >
> >> >
> >> > -----------------------------------------------
> >> >                sleep 10           sleep 30
> >> >       Throughput (KB/s)  Throughput (KB/s)
> >> >  -----------------------------------------------
> >> >                 181,540            191,686
> >> >                 179,651            191,459
> >> >                 179,068            188,834
> >> >                 177,244            187,568
> >> >                 177,215            186,703
> >> >                 176,565            185,584
> >> >                 176,546            185,370
> >> >                 176,470            185,021
> >> >                 176,214            184,303
> >> >                 176,128            184,040
> >> >                 175,279            183,932
> >> >                 174,745            180,831
> >> >                 173,935            179,418
> >> >                 161,546            168,014
> >> >                 160,332            167,540
> >> >                 160,122            167,364
> >> >                 159,613            167,020
> >> >                 159,546            166,590
> >> >                 159,021            166,483
> >> >                 158,845            166,418
> >> >                 158,426            166,264
> >> >                 158,396            166,066
> >> >                 158,371            165,944
> >> >                 158,298            165,866
> >> >                 158,250            165,884
> >> >                 158,057            165,533
> >> >                 158,011            165,532
> >> >                 157,899            165,457
> >> >                 157,894            165,424
> >> >                 157,839            165,410
> >> >                 157,731            165,407
> >> >                 157,629            165,273
> >> >                 157,626            164,867
> >> >                 157,581            164,636
> >> >                 157,471            164,266
> >> >                 157,430            164,225
> >> >                 157,287            163,290
> >> >                 156,289            153,597
> >> >                 153,970            147,494
> >> >                 148,244            147,102
> >> >                 142,907            146,111
> >> >                 142,811            145,789
> >> >                 139,171            141,168
> >> >                 136,314            140,714
> >> >                 133,616            140,111
> >> >                 132,881            139,636
> >> >                 132,729            136,943
> >> >                 132,680            136,844
> >> >                 132,248            135,726
> >> >                 132,027            135,384
> >> >                 131,929            135,270
> >> >                 131,766            134,748
> >> >                 131,667            134,733
> >> >                 131,576            134,582
> >> >                 131,396            134,302
> >> >                 131,351            134,160
> >> >                 131,135            134,102
> >> >                 130,885            134,097
> >> >                 130,854            134,058
> >> >                 130,767            134,006
> >> >                 130,666            133,960
> >> >                 130,647            133,894
> >> >                 130,152            133,837
> >> >                 130,006            133,747
> >> >                 129,921            133,679
> >> >                 129,856            133,666
> >> >                 129,377            133,564
> >> >                 128,366            133,331
> >> >                 127,988            132,938
> >> >                 126,903            132,746
> >> >  -----------------------------------------------
> >> >       sum    10,526,916         10,919,561
> >> >   average       150,385            155,994
> >> >    stddev        17,551             19,633
> >> >  -----------------------------------------------
> >> >     elapsed       24.40              43.66
> >> >  time (sec)
> >> >    sys time      806.25             766.05
> >> >       (sec)
> >> >     zswpout  10,008,713         10,008,407
> >> >   64K folio     623,463            623,629
> >> >      swpout
> >> >  -----------------------------------------------
> >>
> >> Although there is some imbalance, I don't find it to be too much.  So, I
> >> think the test result is reasonable.  Please pay attention to the
> >> imbalance issue in future tests.
> >
> > Sure, will do so.
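(As a concrete way to keep an eye on this in future runs, here is a small
sketch, assuming Python 3 and a list of per-process throughputs like the
columns above, that summarizes the imbalance as a coefficient of variation;
for the sleep-10 column above that works out to roughly 17,551 / 150,385,
i.e. about 12%.)

  #!/usr/bin/env python3
  # Sketch: summarize per-process usemem throughputs (KB/s) read from stdin.
  import sys
  from statistics import mean, stdev

  scores = [float(line.replace(",", "")) for line in sys.stdin if line.strip()]

  avg = mean(scores)
  sd = stdev(scores)               # sample standard deviation
  print(f"processes: {len(scores)}")
  print(f"sum:       {sum(scores):,.0f} KB/s")
  print(f"average:   {avg:,.0f} KB/s")
  print(f"stddev:    {sd:,.0f} KB/s")
  # Coefficient of variation: lower means the processes are better balanced.
  print(f"imbalance (CV): {sd / avg:.1%}")

Feeding it the sleep-10 and sleep-30 columns would make the balance
comparison explicit alongside the raw throughput sums.
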
> >
> >>
> >> > As we increase the time for which allocations are maintained,
> >> > there seems to be a slight improvement in throughput, but the
> >> > variance increases as well. The processes with lower throughput
> >> > could be the ones that handle the memcg being over limit by
> >> > doing reclaim, possibly before they can allocate.
> >> >
> >> > Interestingly, the longer test time does seem to reduce the amount
> >> > of reclaim (hence lower sys time), but more 64K large folios seem to
> >> > be reclaimed. Could this mean that with longer test time (sleep 30),
> >> > more cold memory residing in large folios is getting reclaimed, as
> >> > opposed to memory just relinquished by the exiting processes?
> >>
> >> I don't think a longer sleep time in the test helps much with balance.  Can
> >> you try with fewer processes, and a larger memory size per process?  I guess
> >> that this will improve balance.
> >
> > I tried this, and the data is listed below:
> >
> >   usemem options:
> >   ---------------
> >   30 processes allocate 10G each
> >   cgroup memory limit = 150G
> >   sleep 10
> >   525Gi SSD disk swap partition
> >   64K large folios enabled
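(For readers unfamiliar with the workload shape: the rough Python sketch
below is not the usemem tool itself, just an illustration of what "N
processes allocate S bytes each, touch the memory, then sleep" does when run
under a cgroup memory limit. The sizes are placeholders matching the run
above; actually running it requires that much memory/swap.)

  #!/usr/bin/env python3
  # Rough illustration of the workload shape (not the actual usemem tool).
  import os, time
  from multiprocessing import Process

  NR_PROCS = 30
  ALLOC_BYTES = 10 * 2**30      # 10G per process, as in the run below
  SLEEP_SECS = 10
  PAGE = os.sysconf("SC_PAGE_SIZE")

  def worker():
      buf = bytearray(ALLOC_BYTES)
      # Touch every page so the memory is actually faulted in; with the
      # cgroup limit at 50% of the total, this forces reclaim into swap/zswap.
      for off in range(0, ALLOC_BYTES, PAGE):
          buf[off] = 1
      time.sleep(SLEEP_SECS)    # hold the allocation before exiting

  if __name__ == "__main__":
      procs = [Process(target=worker) for _ in range(NR_PROCS)]
      for p in procs:
          p.start()
      for p in procs:
          p.join()
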
> >
> >   Throughput (KB/s) of each of the 30 processes:
> >  ---------------------------------------------------------------
> >                       mm-unstable    zswap_store of large folios
> >                         9-25-2024                v7
> >  zswap compressor:           zstd         zstd  deflate-iaa
> >  ---------------------------------------------------------------
> >                            38,393      234,485      374,427
> >                            37,283      215,528      314,225
> >                            37,156      214,942      304,413
> >                            37,143      213,073      304,146
> >                            36,814      212,904      290,186
> >                            36,277      212,304      288,212
> >                            36,104      212,207      285,682
> >                            36,000      210,173      270,661
> >                            35,994      208,487      256,960
> >                            35,979      207,788      248,313
> >                            35,967      207,714      235,338
> >                            35,966      207,703      229,335
> >                            35,835      207,690      221,697
> >                            35,793      207,418      221,600
> >                            35,692      206,160      219,346
> >                            35,682      206,128      219,162
> >                            35,681      205,817      219,155
> >                            35,678      205,546      214,862
> >                            35,678      205,523      214,710
> >                            35,677      204,951      214,282
> >                            35,677      204,283      213,441
> >                            35,677      203,348      213,011
> >                            35,675      203,028      212,923
> >                            35,673      201,922      212,492
> >                            35,672      201,660      212,225
> >                            35,672      200,724      211,808
> >                            35,672      200,324      211,420
> >                            35,671      199,686      211,413
> >                            35,667      198,858      211,346
> >                            35,667      197,590      211,209
> >  ---------------------------------------------------------------
> >  sum                     1,081,515    6,217,964    7,268,000
> >  average                    36,051      207,265      242,267
> >  stddev                        655        7,010       42,234
> >  elapsed time (sec)         343.70       107.40        84.34
> >  sys time (sec)             269.30     2,520.13     1,696.20
> >  memcg.high breaches       443,672      475,074      623,333
> >  zswpout                    22,605   48,931,249   54,777,100
> >  pswpout                40,004,528            0            0
> >  hugepages-64K zswpout           0    3,057,090    3,421,855
> >  hugepages-64K swpout    2,500,283            0            0
> >  ---------------------------------------------------------------
> >
> > As you can see, this is quite a memory-constrained scenario: the cgroup in
> > which the 30 processes run is limited to 150G, i.e. 50% of the 300G of
> > memory the processes allocate in total. This causes significantly more
> > reclaim activity than the setup I was using thus far (70 processes, 1G
> > each, 40G limit).
> >
> > The variance or "imbalance" reduces somewhat for zstd, but not for IAA.
> >
> > With zswap_store of large folios, IAA shows really good improvements over
> > zstd: ~17% in throughput, ~21% in elapsed time, and ~33% in sys time.
> > These memory-constrained scenarios are where IAA typically does really
> > well. IAA verify_compress is enabled, so we also get the added
> > data-integrity-check benefit of IAA.
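(The percentages above follow directly from the summary rows of the table; a
quick check, assuming Python 3:)

  #!/usr/bin/env python3
  # Derive the IAA-vs-zstd improvements quoted above from the v7 summary rows.
  zstd = {"throughput": 207_265, "elapsed": 107.40, "sys": 2_520.13}
  iaa  = {"throughput": 242_267, "elapsed":  84.34, "sys": 1_696.20}

  # Higher is better for throughput; lower is better for elapsed/sys time.
  print(f"throughput: {(iaa['throughput'] / zstd['throughput'] - 1):.0%} higher")
  print(f"elapsed:    {(1 - iaa['elapsed'] / zstd['elapsed']):.0%} lower")
  print(f"sys time:   {(1 - iaa['sys'] / zstd['sys']):.0%} lower")
  # -> roughly 17%, 21% and 33%, matching the numbers quoted above.
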
> >
> > I would like to get your and the maintainers' feedback on whether
> > I should switch to this "usemem30-10G" setup for v8.
> 
> The results look good to me.  I suggest you use it.

Ok, sure, thanks Ying.

Thanks,
Kanchana

> 
> --
> Best Regards,
> Huang, Ying


Thread overview: 79+ messages
2024-09-24  1:17 Kanchana P Sridhar
2024-09-24  1:17 ` [PATCH v7 1/8] mm: Define obj_cgroup_get() if CONFIG_MEMCG is not defined Kanchana P Sridhar
2024-09-24 16:45   ` Nhat Pham
2024-09-24  1:17 ` [PATCH v7 2/8] mm: zswap: Modify zswap_compress() to accept a page instead of a folio Kanchana P Sridhar
2024-09-24 16:50   ` Nhat Pham
2024-09-24  1:17 ` [PATCH v7 3/8] mm: zswap: Refactor code to store an entry in zswap xarray Kanchana P Sridhar
2024-09-24 17:16   ` Nhat Pham
2024-09-24 20:40     ` Sridhar, Kanchana P
2024-09-24 19:14   ` Yosry Ahmed
2024-09-24 22:22     ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 4/8] mm: zswap: Refactor code to delete stored offsets in case of errors Kanchana P Sridhar
2024-09-24 17:25   ` Nhat Pham
2024-09-24 20:41     ` Sridhar, Kanchana P
2024-09-24 19:20   ` Yosry Ahmed
2024-09-24 22:32     ` Sridhar, Kanchana P
2024-09-25  0:43       ` Yosry Ahmed
2024-09-25  1:18         ` Sridhar, Kanchana P
2024-09-25 14:11         ` Johannes Weiner
2024-09-25 18:45           ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 5/8] mm: zswap: Compress and store a specific page in a folio Kanchana P Sridhar
2024-09-24 19:28   ` Yosry Ahmed
2024-09-24 22:45     ` Sridhar, Kanchana P
2024-09-25  0:47       ` Yosry Ahmed
2024-09-25  1:49         ` Sridhar, Kanchana P
2024-09-25 13:53           ` Johannes Weiner
2024-09-25 18:45             ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store() Kanchana P Sridhar
2024-09-24 17:33   ` Nhat Pham
2024-09-24 20:51     ` Sridhar, Kanchana P
2024-09-24 21:08       ` Nhat Pham
2024-09-24 21:34         ` Yosry Ahmed
2024-09-24 22:16           ` Nhat Pham
2024-09-24 22:18             ` Sridhar, Kanchana P
2024-09-24 22:28             ` Yosry Ahmed
2024-09-24 22:17           ` Sridhar, Kanchana P
2024-09-24 19:38   ` Yosry Ahmed
2024-09-24 20:51     ` Nhat Pham
2024-09-24 21:38       ` Yosry Ahmed
2024-09-24 23:11         ` Nhat Pham
2024-09-25  0:05           ` Sridhar, Kanchana P
2024-09-25  0:52           ` Yosry Ahmed
2024-09-24 23:21       ` Sridhar, Kanchana P
2024-09-24 23:02     ` Sridhar, Kanchana P
2024-09-25 13:40     ` Johannes Weiner
2024-09-25 18:30       ` Yosry Ahmed
2024-09-25 19:10         ` Sridhar, Kanchana P
2024-09-25 19:49           ` Yosry Ahmed
2024-09-25 20:49             ` Johannes Weiner
2024-09-25 19:20         ` Johannes Weiner
2024-09-25 19:39           ` Yosry Ahmed
2024-09-25 20:13             ` Johannes Weiner
2024-09-25 21:06               ` Yosry Ahmed
2024-09-25 22:29                 ` Sridhar, Kanchana P
2024-09-26  3:58                   ` Sridhar, Kanchana P
2024-09-26  4:52                     ` Yosry Ahmed
2024-09-26 16:40                       ` Sridhar, Kanchana P
2024-09-26 17:19                         ` Yosry Ahmed
2024-09-26 17:29                           ` Sridhar, Kanchana P
2024-09-26 17:34                             ` Yosry Ahmed
2024-09-26 19:36                               ` Sridhar, Kanchana P
2024-09-26 18:43                             ` Johannes Weiner
2024-09-26 18:45                               ` Yosry Ahmed
2024-09-26 19:40                                 ` Sridhar, Kanchana P
2024-09-26 19:39                               ` Sridhar, Kanchana P
2024-09-25 14:27   ` Johannes Weiner
2024-09-25 18:17     ` Yosry Ahmed
2024-09-25 18:48     ` Sridhar, Kanchana P
2024-09-24  1:17 ` [PATCH v7 7/8] mm: swap: Count successful mTHP ZSWAP stores in sysfs mTHP zswpout stats Kanchana P Sridhar
2024-09-24  1:17 ` [PATCH v7 8/8] mm: Document the newly added mTHP zswpout stats, clarify swpout semantics Kanchana P Sridhar
2024-09-24 17:36   ` Nhat Pham
2024-09-24 20:52     ` Sridhar, Kanchana P
2024-09-24 19:34 ` [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios Yosry Ahmed
2024-09-24 22:50   ` Sridhar, Kanchana P
2024-09-25  6:35 ` Huang, Ying
2024-09-25 18:39   ` Sridhar, Kanchana P
2024-09-26  0:44     ` Huang, Ying
2024-09-26  3:48       ` Sridhar, Kanchana P
2024-09-26  6:47         ` Huang, Ying
2024-09-26 21:44           ` Sridhar, Kanchana P [this message]
