d="scan'208";a="37766522" Received: from orviesa003.jf.intel.com ([10.64.159.143]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2024 23:51:09 -0700 X-CSE-ConnectionGUID: QGCYBa+8Q7Ot8BP7HCg5KA== X-CSE-MsgGUID: HrfVTZa/Qv6r1G2WEJItQw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,259,1719903600"; d="scan'208";a="76845787" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by ORVIESA003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2024 23:51:05 -0700 From: "Huang, Ying" To: "Sridhar, Kanchana P" Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "hannes@cmpxchg.org" , "yosryahmed@google.com" , "nphamcs@gmail.com" , "chengming.zhou@linux.dev" , "usamaarif642@gmail.com" , "shakeel.butt@linux.dev" , "ryan.roberts@arm.com" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Zou, Nanhai" , "Feghali, Wajdi K" , "Gopal, Vinodh" Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios In-Reply-To: (Kanchana P. Sridhar's message of "Thu, 26 Sep 2024 11:48:05 +0800") References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <87v7yks0kd.fsf@yhuang6-desk2.ccr.corp.intel.com> <877cazs0p7.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Thu, 26 Sep 2024 14:47:31 +0800 Message-ID: <87msjurjwc.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspam-User: X-Stat-Signature: 7jopeohkxtikwkuxim4jijpz9aipqfj9 X-Rspamd-Queue-Id: F39531C0014 X-Rspamd-Server: rspam11 X-HE-Tag: 1727333469-344229 X-HE-Meta: U2FsdGVkX19MM2XMQ6J+S7WqDax+SRSOZoVLigcPgfsMh81ht6X95I/cGMIvNIL8Of9xor9bz7O6wYKlfxoClrada2qZYCItiQRe1L2QVoke5ScqkAhvF4WeNHMEkuBWHZeycoXj4vHhtswGoe1Nl5xr2OOE+I+4CAMf86rXta5lWTJcDRC5jLq561/164S6avMk5wWIt3NELbrrpBVm4Ev3mLtLi4ZOX5YX9XLjtel3d5YJoBFNlzz/d45ceg0JTMUyHHeZ8iQxtRtucLQDitpjCnCLudRC/Kb+x6aJXSYiYPEjbEwiAiosHXNRopax5q3IBH0BcJY/uemeJIm6kXjqr+prW2GrM9a5Pci1AN24w55kw72KTZoUlQznfXxenzj3AweQGowZx17me7NiUWmkbw11Z3dSaQM6061y6+fukgQPFgYZ49eYlL62V8SV5vVTEEv+/Tkiie/hLVg1P5h6rAdvsBG2Mki6qyf/sINKXmgseOu9gkBt+wnLQnhkFgrhQgDboPok/jUqI6SWwoEvY2IOhxEcHXrLSDOthoWXs7BcgUt8JPffPXpGLZ2WeFpoi45fvVYfFhrbD9RqhFiVMlI9LMtamZd76irR5C3N93DAGMxSYGQHF1CIzHkAS9cVAeDS8jrMzNQ4wdvLn7QCFHlrhzGOFrRWpF480GDk6mPS0KfT6OTtHTKjYln8/8J6wkyPk2W6klLlTmOJCUKQ8xkgBJBvVfH40TTyR391Suda3R2fw0J3oL6/QKNCoLwFrus2RmcZ7ke2xtpTpwUaqu1U8BYrezEtn6lD7PvPFxQSxwUIJp05gDt/x5TKb4ueP54f6pNHhyJ4oFEOnh4xQYsq//3RIin1X6xW3L9Nhly8Je/bYMycif4c9s3aweP8oa+K0rUPQTDd4G5cXMW6oIhOqNUdUbCGWOdCLoatrEX2HPOAcMlO2SMi7a1wjvu+lX48x1FLX4XUqqE qP9RsmeP owCaS9UXwKCIqBZsdDLCVLvLfPvsqk+3yutBSQ4on4lUI0+oJ1XD7F+W6lXIgTPgX/I3c0Q8S/DhBTW5d8TrT0c4bMEbL1qZC5bZ95yg9i/JCP2Lm73voD+Npt9mE5ahDo52+ngBzSyqpy6aJ1BTcHspLmd/MtJy/vcV17h5u8tayZcjoJ6xkydXZyw/cEj5sOVRtJLFZi7xq3gsA7AjDYfyyneivoRxCpmo8hY/41HzEFloJmMq5ta570iiWKfeNm+2kXRFN2ok/HEre5YeiOClQtbKvkgZWepmtxRYhIlzk2/NTjWnu+7EZ4ofms+CbwgU/nexRgYtuABAmOz97ymZQLmocGUmw83kd/DZJgpzRUBLkhKCRfnC+qHzMCMTx+qy4rEJlxyVBIKyj4TfLka4MiJ1/V+sYxCRCpLgra2v3XlV5R1nbd0ZLg5d+v+2tWaQxDAaBuTblk4HzzBAQQmzCqa/GoAJAlMmKSgMUYJcX8CQqReJqeOzU/tJk2RyI1vPg3Rltppa/VfGDGCRj/6TB9LGyaKMlLodc X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: "Sridhar, Kanchana P" writes: > Hi Ying, > >> -----Original Message----- >> From: Huang, Ying >> Sent: Wednesday, September 25, 2024 5:45 PM >> To: Sridhar, Kanchana P >> Cc: 
>> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
>> yosryahmed@google.com; nphamcs@gmail.com; chengming.zhou@linux.dev;
>> usamaarif642@gmail.com; shakeel.butt@linux.dev; ryan.roberts@arm.com;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Zou, Nanhai; Feghali,
>> Wajdi K; Gopal, Vinodh
>> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
>>
>> "Sridhar, Kanchana P" writes:
>>
>> >> -----Original Message-----
>> >> From: Huang, Ying
>> >> Sent: Tuesday, September 24, 2024 11:35 PM
>> >> To: Sridhar, Kanchana P
>> >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
>> >> yosryahmed@google.com; nphamcs@gmail.com; chengming.zhou@linux.dev;
>> >> usamaarif642@gmail.com; shakeel.butt@linux.dev; ryan.roberts@arm.com;
>> >> 21cnbao@gmail.com; akpm@linux-foundation.org; Zou, Nanhai; Feghali,
>> >> Wajdi K; Gopal, Vinodh
>> >> Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
>> >>
>> >> Kanchana P Sridhar writes:
>> >>
>> >> [snip]
>> >>
>> >> >
>> >> > Case 1: Comparing zswap 4K vs. zswap mTHP
>> >> > =========================================
>> >> >
>> >> > In this scenario, the "before" is CONFIG_THP_SWAP set to off, which
>> >> > results in 64K/2M (m)THP being split into 4K folios that get processed
>> >> > by zswap.
>> >> >
>> >> > The "after" is CONFIG_THP_SWAP set to on, plus this patch-series, which
>> >> > results in 64K/2M (m)THP not being split, and instead being processed
>> >> > by zswap.
>> >> >
>> >> > 64KB mTHP (cgroup memory.high set to 40G):
>> >> > ==========================================
>> >> >
>> >> > -------------------------------------------------------------------------------
>> >> >                    mm-unstable 9-23-2024    zswap-mTHP               Change wrt
>> >> >                    CONFIG_THP_SWAP=N        CONFIG_THP_SWAP=Y        Baseline
>> >> >                    Baseline
>> >> > -------------------------------------------------------------------------------
>> >> > ZSWAP compressor   zstd       deflate-      zstd        deflate-     zstd  deflate-
>> >> >                               iaa                       iaa                iaa
>> >> > -------------------------------------------------------------------------------
>> >> > Throughput (KB/s)     143,323    125,485       153,550     129,609    7%    3%
>> >> > elapsed time (sec)      24.97      25.42         23.90       25.19    4%    1%
>> >> > sys time (sec)         822.72     750.96        757.70      731.13    8%    3%
>> >> > memcg_high            132,743    169,825       148,075     192,744
>> >> > memcg_swap_fail       639,067    841,553         2,204       2,215
>> >> > pswpin                      0          0             0           0
>> >> > pswpout                     0          0             0           0
>> >> > zswpin                    795        873           760         902
>> >> > zswpout            10,011,266 13,195,137    10,010,017  13,193,554
>> >> > thp_swpout                  0          0             0           0
>> >> > thp_swpout_                 0          0             0           0
>> >> >  fallback
>> >> > 64kB-mthp_            639,065    841,553         2,204       2,215
>> >> >  swpout_fallback
>> >> > pgmajfault              2,861      2,924         3,054       3,259
>> >> > ZSWPOUT-64kB              n/a        n/a       623,451     822,268
>> >> > SWPOUT-64kB                 0          0             0           0
>> >> > -------------------------------------------------------------------------------
>> >> >
>> >>
>> >> IIUC, the throughput is the sum of the throughput of all usemem processes?
>> >>
>> >> One possible issue of the usemem test case is the "imbalance" issue.
>> >> That is, some usemem processes may swap-out/swap-in less, so their
>> >> score is very high; while some other processes may swap-out/swap-in
>> >> more, so their score is very low.  Sometimes the total score decreases,
>> >> but the scores of the usemem processes are more balanced, so the
>> >> performance should be considered better.  In general, we should make
>> >> the usemem scores balanced among processes via, say, a longer test
>> >> time.  Can you check this in your test results?
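
[Aside: a minimal Python sketch, not part of the original thread, of one way
to quantify the per-process "imbalance" discussed above, assuming the
individual usemem throughput values (KB/s) have been copied into a list by
hand.  The function name and metrics are illustrative only.]

    from statistics import mean, stdev

    def imbalance_stats(throughputs_kbps):
        """Summarize how balanced the per-process usemem scores are."""
        avg = mean(throughputs_kbps)
        sd = stdev(throughputs_kbps)  # sample standard deviation
        return {
            "sum": sum(throughputs_kbps),
            "average": avg,
            "stddev": sd,
            # Relative spread: lower means a more balanced run.
            "cv_percent": 100.0 * sd / avg,
            "max_over_min": max(throughputs_kbps) / min(throughputs_kbps),
        }

[The sum/average/stddev rows reported in the tables below can be reproduced
this way; the coefficient of variation (cv_percent) makes runs with different
average throughput easier to compare.]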
>> >
>> > Actually, the throughput data listed in the cover-letter is the average of
>> > all the usemem processes.  Your observation about the "imbalance" issue is
>> > right.  Some processes see a higher throughput than others.  I have noticed
>> > that the throughputs progressively reduce as the individual processes exit
>> > and print their stats.
>> >
>> > Listed below are the stats from two runs of usemem70: sleep 10 and sleep 30.
>> > Both are run with a cgroup mem-limit of 40G.  Data is with v7, 64K folios
>> > are enabled, and zswap uses zstd.
>> >
>> >
>> > -----------------------------------------------
>> >     sleep 10            sleep 30
>> >     Throughput (KB/s)   Throughput (KB/s)
>> > -----------------------------------------------
>> >     181,540             191,686
>> >     179,651             191,459
>> >     179,068             188,834
>> >     177,244             187,568
>> >     177,215             186,703
>> >     176,565             185,584
>> >     176,546             185,370
>> >     176,470             185,021
>> >     176,214             184,303
>> >     176,128             184,040
>> >     175,279             183,932
>> >     174,745             180,831
>> >     173,935             179,418
>> >     161,546             168,014
>> >     160,332             167,540
>> >     160,122             167,364
>> >     159,613             167,020
>> >     159,546             166,590
>> >     159,021             166,483
>> >     158,845             166,418
>> >     158,426             166,264
>> >     158,396             166,066
>> >     158,371             165,944
>> >     158,298             165,866
>> >     158,250             165,884
>> >     158,057             165,533
>> >     158,011             165,532
>> >     157,899             165,457
>> >     157,894             165,424
>> >     157,839             165,410
>> >     157,731             165,407
>> >     157,629             165,273
>> >     157,626             164,867
>> >     157,581             164,636
>> >     157,471             164,266
>> >     157,430             164,225
>> >     157,287             163,290
>> >     156,289             153,597
>> >     153,970             147,494
>> >     148,244             147,102
>> >     142,907             146,111
>> >     142,811             145,789
>> >     139,171             141,168
>> >     136,314             140,714
>> >     133,616             140,111
>> >     132,881             139,636
>> >     132,729             136,943
>> >     132,680             136,844
>> >     132,248             135,726
>> >     132,027             135,384
>> >     131,929             135,270
>> >     131,766             134,748
>> >     131,667             134,733
>> >     131,576             134,582
>> >     131,396             134,302
>> >     131,351             134,160
>> >     131,135             134,102
>> >     130,885             134,097
>> >     130,854             134,058
>> >     130,767             134,006
>> >     130,666             133,960
>> >     130,647             133,894
>> >     130,152             133,837
>> >     130,006             133,747
>> >     129,921             133,679
>> >     129,856             133,666
>> >     129,377             133,564
>> >     128,366             133,331
>> >     127,988             132,938
>> >     126,903             132,746
>> > -----------------------------------------------
>> > sum         10,526,916          10,919,561
>> > average        150,385             155,994
>> > stddev          17,551              19,633
>> > -----------------------------------------------
>> > elapsed          24.40               43.66
>> >  time (sec)
>> > sys time        806.25              766.05
>> >  (sec)
>> > zswpout     10,008,713          10,008,407
>> > 64K folio      623,463             623,629
>> >  swpout
>> > -----------------------------------------------
>>
>> Although there is some imbalance, I don't find it too large.  So, I
>> think the test result is reasonable.  Please pay attention to the
>> imbalance issue in future tests.
>
> Sure, will do so.
>
>>
>> > As we increase the time for which allocations are maintained,
>> > there seems to be a slight improvement in throughput, but the
>> > variance increases as well.  The processes with lower throughput
>> > could be the ones that handle the memcg being over its limit by
>> > doing reclaim, possibly before they can allocate.
>> >
>> > Interestingly, the longer test time does seem to reduce the amount
>> > of reclaim (hence the lower sys time), but more 64K large folios seem
>> > to be reclaimed.  Could this mean that with the longer test time
>> > (sleep 30), more cold memory residing in large folios is getting
>> > reclaimed, as opposed to memory just relinquished by the exiting
>> > processes?
>>
>> I don't think a longer sleep time in the test helps much with balance.
>> Can you try with fewer processes, and a larger memory size per process?
>> I guess that this will improve the balance.
>
> I tried this, and the data is listed below:
>
> usemem options:
> ---------------
> 30 processes allocate 10G each
> cgroup memory limit = 150G
> sleep 10
> 525Gi SSD disk swap partition
> 64K large folios enabled
>
> Throughput (KB/s) of each of the 30 processes:
> ---------------------------------------------------------------
>                        mm-unstable    zswap_store of large folios
>                        9-25-2024      v7
> zswap compressor:      zstd           zstd        deflate-iaa
> ---------------------------------------------------------------
>                        38,393         234,485     374,427
>                        37,283         215,528     314,225
>                        37,156         214,942     304,413
>                        37,143         213,073     304,146
>                        36,814         212,904     290,186
>                        36,277         212,304     288,212
>                        36,104         212,207     285,682
>                        36,000         210,173     270,661
>                        35,994         208,487     256,960
>                        35,979         207,788     248,313
>                        35,967         207,714     235,338
>                        35,966         207,703     229,335
>                        35,835         207,690     221,697
>                        35,793         207,418     221,600
>                        35,692         206,160     219,346
>                        35,682         206,128     219,162
>                        35,681         205,817     219,155
>                        35,678         205,546     214,862
>                        35,678         205,523     214,710
>                        35,677         204,951     214,282
>                        35,677         204,283     213,441
>                        35,677         203,348     213,011
>                        35,675         203,028     212,923
>                        35,673         201,922     212,492
>                        35,672         201,660     212,225
>                        35,672         200,724     211,808
>                        35,672         200,324     211,420
>                        35,671         199,686     211,413
>                        35,667         198,858     211,346
>                        35,667         197,590     211,209
> ---------------------------------------------------------------
> sum                    1,081,515      6,217,964   7,268,000
> average                   36,051        207,265     242,267
> stddev                       655          7,010      42,234
> elapsed time (sec)        343.70         107.40       84.34
> sys time (sec)            269.30       2,520.13    1,696.20
> memcg.high breaches      443,672        475,074     623,333
> zswpout                   22,605     48,931,249  54,777,100
> pswpout               40,004,528              0           0
> hugepages-64K zswpout          0      3,057,090   3,421,855
> hugepages-64K swpout   2,500,283              0           0
> ---------------------------------------------------------------
>
> As you can see, this is quite a memory-constrained scenario, where we
> are giving 50% of the total memory required as the memory limit for the
> cgroup in which the 30 processes are run.  This causes significantly more
> reclaim activity than the setup I was using thus far (70 processes, 1G,
> 40G limit).
>
> The variance or "imbalance" reduces somewhat for zstd, but not for IAA.
>
> IAA shows really good throughput (17%), elapsed time (21%) and sys time
> (33%) improvements wrt zstd with zswap_store of large folios.  These are
> the memory-constrained scenarios in which IAA typically does really
> well.  IAA verify_compress is enabled, so we also get an added
> data-integrity check benefit with IAA.
>
> I would like to get your and the maintainers' feedback on whether
> I should switch to this "usemem30-10G" setup for v8.

The results look good to me.  I suggest you use it.

--
Best Regards,
Huang, Ying
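
[Aside: the relative improvements quoted above (roughly 17% throughput, 21%
elapsed time and 33% sys time for deflate-iaa over zstd) follow directly from
the summary rows of the usemem30-10G table.  A minimal Python sketch of that
arithmetic, not part of the original thread, using the numbers as reported:]

    # Summary rows from the usemem30-10G run (v7, zswap_store of large folios).
    zstd = {"throughput_kbps": 207_265, "elapsed_sec": 107.40, "sys_sec": 2_520.13}
    iaa  = {"throughput_kbps": 242_267, "elapsed_sec":  84.34, "sys_sec": 1_696.20}

    def pct_change(metric, higher_is_better):
        """Relative improvement of deflate-iaa over zstd for one metric."""
        if higher_is_better:
            return 100.0 * (iaa[metric] - zstd[metric]) / zstd[metric]
        return 100.0 * (zstd[metric] - iaa[metric]) / zstd[metric]

    print(f"throughput: +{pct_change('throughput_kbps', True):.0f}%")  # ~17%
    print(f"elapsed:    -{pct_change('elapsed_sec', False):.0f}%")     # ~21%
    print(f"sys time:   -{pct_change('sys_sec', False):.0f}%")         # ~33%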