d="scan'208";a="25969774" Received: from fmviesa005.fm.intel.com ([10.60.135.145]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Sep 2024 02:32:51 -0700 X-CSE-ConnectionGUID: Pf+3Vca1RvGkY5z60bkUNA== X-CSE-MsgGUID: lMRDawGyTFSQWm+ORH2FTg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,244,1719903600"; d="scan'208";a="74618221" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmviesa005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Sep 2024 02:32:48 -0700 From: "Huang, Ying" To: "Sridhar, Kanchana P" Cc: Yosry Ahmed , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "hannes@cmpxchg.org" , "nphamcs@gmail.com" , "chengming.zhou@linux.dev" , "usamaarif642@gmail.com" , "ryan.roberts@arm.com" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Zou, Nanhai" , "Feghali, Wajdi K" , "Gopal, Vinodh" Subject: Re: [PATCH v6 0/3] mm: ZSWAP swap-out of mTHP folios In-Reply-To: (Kanchana P. Sridhar's message of "Fri, 20 Sep 2024 09:41:02 +0800") References: <20240829212705.6714-1-kanchana.p.sridhar@intel.com> Date: Fri, 20 Sep 2024 17:29:14 +0800 Message-ID: <87ikuqvfkl.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: A4A754000B X-Stat-Signature: taind4zz3q5dth4bnokfai7enratcbz4 X-HE-Tag: 1726824772-127066 X-HE-Meta: U2FsdGVkX1/8/4NCjwEp4z31F7OpjHnaXhGpNTHd2BjwEeaho8M0PEOSj+2JzNeShvqRX8EIsUUcIM65ArJdLEgAUuCuJ1rjMj5rxvLzCDWKppwhMO7LFAWasplDy/n7iZAt+suZiqdscuNa0HmVVffgPlBki3GQVrUPbIWFLOmzC2OlbMUXVVoUQsonFtDM2/axJpOvp85MJtbogq7ebR+9RfVORP3Li6IzQLQejphXUW/Nl18AndjpMwTbp7s8OchVVHUag8z5n8EjETyXPnmOrorflCAK0GAEE2bJA/SeyAG62SbPkuRLIDmnP19gCkjhzTNnCoiNtO5/MmeZRRYUhEhnx7tWmpxv3ESyBfnQzlvgZD6MKSR3afOum1SWl8zAWKJYIf4wjBg0Ce6qoJKh+ak4ZUW62qUWSMhyWEIKVrFHL4IqWCq1SZn7rm5ziiFdulkiVGupFsS+dPKphR1xueRqq+3DxB6rd+4gQHnXttsJd14Shqrf48oR4IvurwGXd2ojKGceinpE9EkLKrcKAC4sa00ZptD5AdriApuNyBaP/v5QjOesOYUnXLq+0egZ8AcpLgn6cpf0KS7yLFGYFuP3dgWsleuBRpRFFKIr0WCytC8wPxK/VkMLxzWJOZSVf7O8TT2wZzc7+iJBVzYr7AKdAgZfb4iAOVH3YxKhzFmJqISw4bdIHqQZILORZcV97qZjTR4iLamNgPv9dGMlhNbODQLA4eaLd/4Lfe4CPv5FsxMkKSv/wbszHFkAJqCZmQUPyrkui12FG5u6oHPQ/6r9zNK/zW4EUGIzYksSq9xb8yMX1BW211g18pGBYpIMwhUJugtYUq0eriu1C2ZjdvITynxVY70DVUETcz1DDtveHgP6B0UYU+1dobJ4E9xPiocQX4TYJK3JfGt83tAm/PylBb7gk2PfzjFNxdx6wrRX1MmqIvXPft9Tz/+QL8c7aPurVqbUIN8KTHS 4BnIouyA GS6GTuCIzJ35aBFeOfoLO6UHlYDUtWLieMV8IoFWeXmfJQDDyOgJou5tcNq34MjiaeOIHmZNhf3lkcKTQ0DvXReffteFGsx45v8/ndpGbWBBB3EcQFodtZ4dEb+qUjxFTUJpB9FuYVukvXpNuLsB+GhgNd7ZDInlEKm5680zd9pAqEOCxf76iXthiyHH5SCLUB9owRwGUl+prqyIIkjF1f+aCj1Lf6lNlbjes3j1WtRoEgwU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: "Sridhar, Kanchana P" writes: [snip] > > Thanks, these are good points. I ran this experiment with mm-unstable 9-17-2024, > commit 248ba8004e76eb335d7e6079724c3ee89a011389. > > Data is based on average of 3 runs of the vm-scalability "usemem" test. > > 4G SSD backing zswap, each process sleeps before exiting > ======================================================== > > 64KB mTHP (cgroup memory.high set to 60G, no swap limit): > ========================================================= > CONFIG_THP_SWAP=Y > Sapphire Rapids server with 503 GiB RAM and 4G SSD swap backing device > for zswap. 
>
> Experiment 1: Each process sleeps for 0 sec after allocating memory
> (usemem --init-time -w -O --sleep 0 -n 70 1g):
>
> -------------------------------------------------------------------------------
>                        mm-unstable 9-17-2024     zswap-mTHP v6       Change wrt
>                        Baseline                                      Baseline
>                        "before"                  "after"             (sleep 0)
> -------------------------------------------------------------------------------
> ZSWAP compressor          zstd   deflate-        zstd   deflate-   zstd deflate-
>                                  iaa                    iaa             iaa
> -------------------------------------------------------------------------------
> Throughput (KB/s)      296,684    274,207     359,722    390,162    21%      42%
> sys time (sec)           92.67      93.33      251.06     237.56  -171%    -155%
> memcg_high               3,503      3,769      44,425     27,154
> memcg_swap_fail              0          0     115,814    141,936
> pswpin                      17          0           0          0
> pswpout                370,853    393,232           0          0
> zswpin                     693        123         666        667
> zswpout                  1,484        123   1,366,680  1,199,645
> thp_swpout                   0          0           0          0
> thp_swpout_fallback          0          0           0          0
> pgmajfault               3,384      2,951       3,656      3,468
> ZSWPOUT-64kB               n/a        n/a      82,940     73,121
> SWPOUT-64kB             23,178     24,577           0          0
> -------------------------------------------------------------------------------
>
> Experiment 2: Each process sleeps for 10 sec after allocating memory
> (usemem --init-time -w -O --sleep 10 -n 70 1g):
>
> -------------------------------------------------------------------------------
>                        mm-unstable 9-17-2024     zswap-mTHP v6       Change wrt
>                        Baseline                                      Baseline
>                        "before"                  "after"             (sleep 10)
> -------------------------------------------------------------------------------
> ZSWAP compressor          zstd   deflate-        zstd   deflate-   zstd deflate-
>                                  iaa                    iaa             iaa
> -------------------------------------------------------------------------------
> Throughput (KB/s)       86,744     93,730     157,528    113,110    82%      21%
> sys time (sec)          308.87     315.29      477.55     629.98   -55%    -100%

What is the elapsed time for all cases?

> memcg_high             169,450    188,700     143,691    177,887
> memcg_swap_fail     10,131,859  9,740,646  18,738,715 19,528,110
> pswpin                      17         16           0          0
> pswpout              1,154,779  1,210,485           0          0
> zswpin                     711        659       1,016        736
> zswpout                 70,212     50,128   1,235,560  1,275,917
> thp_swpout                   0          0           0          0
> thp_swpout_fallback          0          0           0          0
> pgmajfault               6,120      6,291       8,789      6,474
> ZSWPOUT-64kB               n/a        n/a      67,587     68,912
> SWPOUT-64kB             72,174     75,655           0          0
> -------------------------------------------------------------------------------
>
> Conclusions from the experiments:
> =================================
> 1) zswap-mTHP improves throughput as compared to the baseline, for both
>    zstd and deflate-iaa.
>
> 2) Yosry's theory is proven correct in the 4G constrained-swap setup.
>    When the processes are constrained to sleep for 10 sec after
>    allocating memory, thereby keeping the memory allocated longer, the
>    "Baseline" or "before" with mTHP getting stored on the SSD shows a
>    degradation of 71% in throughput and 238% in sys time, as compared
>    to the "Baseline" with

Might the higher sys time come from compressing on the CPU vs. writing to
disk?

>    sleep 0, which benefits from the serialization of disk IO not
>    allowing all processes to allocate memory at the same time.
>
> 3) In the 4G SSD "sleep 0" case, zswap-mTHP shows an increase in sys
>    time due to cgroup charging and the consequently higher memory.high
>    breaches and swap-out activity.
>
>    However, the "sleep 10" case's sys time degrades less, and the
>    memory.high breaches and swap-out activity are roughly similar
>    between before and after (confirming Yosry's hypothesis). Further,
>    the memcg_swap_fail activity in the "after" scenario is almost 2X
>    that of the "before". This indicates failures to obtain swap
>    offsets, which result in the folios remaining active in memory.
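As an aside, most of the counters compared above can be read directly from
sysfs and /proc; a sketch, assuming a kernel with per-order mTHP stats and
the hypothetical cgroup name from the setup sketch earlier (memcg_swap_fail
comes from this series, so where it is reported may differ):

  # Per-order stats for 64KB folios.
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback

  # Global swap and zswap event counters, including thp_swpout and
  # thp_swpout_fallback.
  grep -E 'pswp(in|out)|zswp(in|out)|thp_swpout' /proc/vmstat

  # memory.high breach count for the test cgroup.
  grep high /sys/fs/cgroup/usemem-test/memory.events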
>
> I tried to better understand this through the 64k mTHP swpout_fallback
> stats in the "sleep 10" zstd experiments:
>
> --------------------------------------------------------------
>                                         "before"      "after"
> --------------------------------------------------------------
> 64k mTHP swpout_fallback                 627,308      897,407
> 64k folio swapouts                        72,174       67,587
> [p|z]swpout events due to 64k mTHP     1,154,779    1,081,397
> 4k folio swapouts                         70,212      154,163
> --------------------------------------------------------------
>
> The data indicates a higher number of 64k folio swpout_fallback events
> with zswap-mTHP, which correlates with the higher memcg_swap_fail
> counts and 4k folio swapouts with zswap-mTHP. Could the root cause be
> fragmentation of the swap space, due to zswap swap-out being faster
> than SSD swap-out?
>

[snip]

--
Best Regards,
Huang, Ying