d="scan'208";a="26567180" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2024 17:48:13 -0700 X-CSE-ConnectionGUID: LX4AOFhzTMOiw48P8geTBA== X-CSE-MsgGUID: GkjIn0nhTKaOGM1czQFY2w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,258,1719903600"; d="scan'208";a="71623483" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Sep 2024 17:48:09 -0700 From: "Huang, Ying" To: "Sridhar, Kanchana P" Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "hannes@cmpxchg.org" , "yosryahmed@google.com" , "nphamcs@gmail.com" , "chengming.zhou@linux.dev" , "usamaarif642@gmail.com" , "shakeel.butt@linux.dev" , "ryan.roberts@arm.com" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Zou, Nanhai" , "Feghali, Wajdi K" , "Gopal, Vinodh" Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios In-Reply-To: (Kanchana P. Sridhar's message of "Thu, 26 Sep 2024 02:39:25 +0800") References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <87v7yks0kd.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Thu, 26 Sep 2024 08:44:36 +0800 Message-ID: <877cazs0p7.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspam-User: X-Rspamd-Queue-Id: B121A18000A X-Rspamd-Server: rspam01 X-Stat-Signature: r9f7b6dchx66m3wa1m1xukzatn9j31ge X-HE-Tag: 1727311696-799325 X-HE-Meta: U2FsdGVkX193dUsGOQDz0ChGRhzyfw1P8/kbKzSkfVyOfP8uBXSk4zG4vKZIL12rPO9fVs4sDXLegY0EipGiy4ZpRBX83LZZh8Xbnv40sV1WrR9HDnSYPJak1gHiKd8CcAkday+gkVC3H4G3HaYAx8ICPzVx0WbxdETuU8FhC8J784dhtneBZZRx8nPcE65n6vkWDuthdls9+29CfD/LiMf2SHul2PAka+FEHV5amHCyCfhfRSdx1bmcPwdqwy+/QlhM+3iW6zNcA+AN6gu0+73TFONx+yGDJmsMvjcaOv7+p2K69Dcyzzxs1I9dr2npGA7IQQKekn9+y3sANAAYx+ObFFwXeRfKuoPvM+zHhFsMoG1HwR9oYmH6AYXD5nLe8uTxRzC+7eVx+9RqLupFZ5ajIQImO/S6VcdUhrlbpZUa/srgSmZqvenJUULMcLPJJetN5PrVq9dZHBpKL7gdIkGdI6pLgQV4o64SwL9QaenepmUbHmS+y36e6vqwjgSb9JykltKBxsGvixEx3EnBtHK47hEtq5nyt+hWe2d6H8cs0bAwsLUQEBntq+A1E0YI9SAwwCspE1Tn4o19lvFuLIv/z3Mm7M3u8wRuTTDcGaLdY2i1Yp1pc6U8mgmzV2RBrrLvj729PUZ1HP05NrHHCY2DI5XJdkY/t+EnFDHg0EK6sFUE/mt27IlxU0UJmdXgPJz/7+bgOGcl/ltZrwUhnSnowgNh2knS6pkEkfOQaNPYjclZ2IdcsW5G4XgqPpICz8mGLFoJmSPCZsIVQZ3hHXj5rDp1CzynulL2MenzYl6LnO2pRbnknjVWtiAJWYGDtbNmj33QDxWoGy+uZv0nHl93uFTJ4klO1+nF31/f7lMgl/FDi1Vsz0R05bkBJvZM/mimZYPBke3DbKaMnZhHGQGkm5i7IbWv4fG6iF6M1epzf9oBW3VKul4gn8s3cqmWKyBIJL69Gu0gA7lSQFq TlT9INMi UkBwW1LZeq282PAnBW5AX+7eBiwkiA596nfCEDPLnsNjNC4uYi8QxAwu3EyOramSCMD/Y+g2rueRHgOLQZS2up2UaNp490fJzrbUC5BwUzN1Z6kSLohLHJJlUKqydMZsLoiZKFOd7K/ILFEATbU7LVGrdjkCQYk80bsDSLgIqYz9ASESY7tLPEGJWY89BOQIi23ASbQvK5qfAMkLHU5E4xK3Mtost6ONyZLDrnK2WUSjT61WTID/ZE6h7Cpan7016Dz4Ug8dMuvRrWysI0cr4HFewHW3z4If9oi78wih7AEZFw2Nl1Z7/Mmqk+nmpgBoR4vXJd8yOHzT0RQ89LLeU+mNyTR7qirp1+cQQxRfYymstu8TS2F4BALbeYZ8DCpsM5puNhqJsh12ZcG5x3MlXMdeZJ1XYq5essvnlCxI5sJJSheLt7srqOy0AfsRKrJa/Hseisj/LOCrfxrePMNnMaiNSsscn89JvnJYvL6cqzM3V4KfC637XNoLMbjJ3iHEkzuhWBuxAjQrp2plMw09ZvIw7ST9QFuvP3eBq X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: "Sridhar, Kanchana P" writes: >> -----Original Message----- >> From: Huang, Ying >> Sent: Tuesday, September 24, 2024 11:35 PM >> To: Sridhar, Kanchana P >> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; >> hannes@cmpxchg.org; 
> Actually, the throughput data listed in the cover letter is the
> average over all the usemem processes. Your observation about the
> "imbalance" issue is right: some processes see a higher throughput
> than others. I have noticed that the throughputs progressively
> decrease as the individual processes exit and print their stats.
>
> Listed below are the stats from two runs of usemem70: sleep 10 and
> sleep 30. Both are run with a cgroup mem-limit of 40G. Data is with
> v7, 64K folios are enabled, and zswap uses zstd.
>
> -----------------------------------------------
>      sleep 10            sleep 30
>      Throughput (KB/s)   Throughput (KB/s)
> -----------------------------------------------
>      181,540             191,686
>      179,651             191,459
>      179,068             188,834
>      177,244             187,568
>      177,215             186,703
>      176,565             185,584
>      176,546             185,370
>      176,470             185,021
>      176,214             184,303
>      176,128             184,040
>      175,279             183,932
>      174,745             180,831
>      173,935             179,418
>      161,546             168,014
>      160,332             167,540
>      160,122             167,364
>      159,613             167,020
>      159,546             166,590
>      159,021             166,483
>      158,845             166,418
>      158,426             166,264
>      158,396             166,066
>      158,371             165,944
>      158,298             165,866
>      158,250             165,884
>      158,057             165,533
>      158,011             165,532
>      157,899             165,457
>      157,894             165,424
>      157,839             165,410
>      157,731             165,407
>      157,629             165,273
>      157,626             164,867
>      157,581             164,636
>      157,471             164,266
>      157,430             164,225
>      157,287             163,290
>      156,289             153,597
>      153,970             147,494
>      148,244             147,102
>      142,907             146,111
>      142,811             145,789
>      139,171             141,168
>      136,314             140,714
>      133,616             140,111
>      132,881             139,636
>      132,729             136,943
>      132,680             136,844
>      132,248             135,726
>      132,027             135,384
>      131,929             135,270
>      131,766             134,748
>      131,667             134,733
>      131,576             134,582
>      131,396             134,302
>      131,351             134,160
>      131,135             134,102
>      130,885             134,097
>      130,854             134,058
>      130,767             134,006
>      130,666             133,960
>      130,647             133,894
>      130,152             133,837
>      130,006             133,747
>      129,921             133,679
>      129,856             133,666
>      129,377             133,564
>      128,366             133,331
>      127,988             132,938
>      126,903             132,746
> -----------------------------------------------
> sum                10,526,916    10,919,561
> average               150,385       155,994
> stddev                 17,551        19,633
> -----------------------------------------------
> elapsed time (sec)      24.40         43.66
> sys time (sec)         806.25        766.05
> zswpout            10,008,713    10,008,407
> 64K folio swpout      623,463       623,629
> -----------------------------------------------
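Taking stddev / average from your summary as a rough imbalance metric:
17,551 / 150,385 ~= 11.7% for sleep 10, and 19,633 / 155,994 ~= 12.6%
for sleep 30, so the longer run does not actually tighten the relative
spread.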
Although there is some imbalance, I don't find it excessive, so I think
the test results are reasonable. Please pay attention to the imbalance
issue in future tests.

> As we increase the time for which allocations are maintained, there
> seems to be a slight improvement in throughput, but the variance
> increases as well. The processes with lower throughput could be the
> ones that handle the memcg being over its limit by doing reclaim,
> possibly before they can allocate.
>
> Interestingly, the longer test time does seem to reduce the amount of
> reclaim (hence the lower sys time), but more 64K large folios seem to
> be reclaimed. Could this mean that with the longer test time (sleep
> 30), more cold memory residing in large folios is getting reclaimed,
> as opposed to memory just relinquished by the exiting processes?

I don't think a longer sleep time in the test helps much with balance.
Can you try fewer processes with a larger memory size per process? I
guess that this will improve balance.

--
Best Regards,
Huang, Ying