From: "Huang, Ying" <ying.huang@intel.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
 yosryahmed@google.com, nphamcs@gmail.com, ryan.roberts@arm.com,
 21cnbao@gmail.com, akpm@linux-foundation.org, "Zou, Nanhai",
 "Feghali, Wajdi K", "Gopal, Vinodh"
Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
In-Reply-To: (Kanchana P. Sridhar's message of "Mon, 19 Aug 2024 13:12:53 +0800")
References: <20240819021621.29125-1-kanchana.p.sridhar@intel.com>
 <87msl9i4lw.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 19 Aug 2024 13:51:57 +0800
Message-ID: <87ikvxhxfm.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Sridhar, Kanchana P" <kanchana.p.sridhar@intel.com> writes:

> Hi Ying,
>
>> -----Original Message-----
>> From: Huang, Ying <ying.huang@intel.com>
>> Sent: Sunday, August 18, 2024 8:17 PM
>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; hannes@cmpxchg.org;
>> yosryahmed@google.com; nphamcs@gmail.com; ryan.roberts@arm.com;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Zou, Nanhai;
>> Feghali, Wajdi K; Gopal, Vinodh
>> Subject: Re: [PATCH v4 0/4] mm: ZSWAP swap-out of mTHP folios
>>
>> Kanchana P Sridhar <kanchana.p.sridhar@intel.com> writes:
>>
>> [snip]
>>
>> >
>> > Performance Testing:
>> > ====================
>> > Testing of this patch-series was done with the v6.11-rc3 mainline,
>> > without and with this patch-series, on an Intel Sapphire Rapids
>> > server: dual-socket, 56 cores per socket, 4 IAA devices per socket.
>> >
>> > The system has 503 GiB RAM, with a 4G SSD as the backing swap device
>> > for ZSWAP. Core frequency was fixed at 2500 MHz.
>> >
>> > The vm-scalability "usemem" test was run in a cgroup whose memory.high
>> > was fixed. Following a similar methodology as in Ryan Roberts'
>> > "Swap-out mTHP without splitting" series [2], 70 usemem processes were
>> > run, each allocating and writing 1G of memory:
>> >
>> >     usemem --init-time -w -O -n 70 1g
>> >
>> > Since I was constrained to get the 70 usemem processes to generate
>> > swapout activity with the 4G SSD, I ended up using different cgroup
>> > memory.high fixed limits for the experiments with 64K mTHP and 2M THP:
>> >
>> >     64K mTHP experiments: cgroup memory fixed at 60G
>> >     2M THP experiments  : cgroup memory fixed at 55G
>> >
>> > The vm/sysfs stats included after the performance data provide details
>> > on the swapout activity to SSD/ZSWAP.
>> >
>> > Other kernel configuration parameters:
>> >
>> >     ZSWAP Compressor  : LZ4, DEFLATE-IAA
>> >     ZSWAP Allocator   : ZSMALLOC
>> >     SWAP page-cluster : 2
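Put concretely, the quoted setup amounts to something like the following
shell sketch. The cgroup name "test", the cgroup v2 mount at
/sys/fs/cgroup, and an already-activated swap device are illustrative
assumptions; the knobs themselves are the standard zswap/mTHP
sysfs/procfs interfaces rather than anything specific to this series:

    # Create a cgroup and pin its memory limit at 60G (64K mTHP runs):
    mkdir /sys/fs/cgroup/test
    echo 60G > /sys/fs/cgroup/test/memory.high

    # zswap configuration as quoted above (lz4 or deflate-iaa, zsmalloc):
    echo 1        > /sys/module/zswap/parameters/enabled
    echo lz4      > /sys/module/zswap/parameters/compressor
    echo zsmalloc > /sys/module/zswap/parameters/zpool
    echo 2        > /proc/sys/vm/page-cluster

    # Enable 64K mTHP:
    echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

    # Run the workload from inside the cgroup (usemem is part of
    # vm-scalability):
    echo $$ > /sys/fs/cgroup/test/cgroup.procs
    usemem --init-time -w -O -n 70 1g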
>> >
>> > In the experiments where "deflate-iaa" is used as the ZSWAP
>> > compressor, IAA "compression verification" is enabled. Hence each IAA
>> > compression will be decompressed internally by the "iaa_crypto"
>> > driver, the CRCs returned by the hardware will be compared, and errors
>> > reported in case of mismatches. Thus "deflate-iaa" helps ensure better
>> > data integrity as compared to the software compressors.
>> >
>> > Throughput reported by usemem and perf sys time for running the test
>> > are as follows, averaged across 3 runs:
>> >
>> > 64KB mTHP (cgroup memory.high set to 60G):
>> > ==========================================
>> > ------------------------------------------------------------------
>> > |Kernel              | mTHP SWAP-OUT     | Throughput | Improvement|
>> > |                    |                   | KB/s       |            |
>> > |--------------------|-------------------|------------|------------|
>> > |v6.11-rc3 mainline  | SSD               |    335,346 | Baseline   |
>> > |zswap-mTHP-Store    | ZSWAP lz4         |    271,558 | -19%       |
>>
>> zswap throughput is worse than SSD swap? This doesn't look right.
>
> I realize it might look that way; however, this is not an
> apples-to-apples comparison, as explained in the latter part of my
> analysis (after the 2M THP data tables). The primary reason is that the
> test runs under a fixed cgroup memory limit.
>
> In the "Before" scenario, mTHPs get swapped out to SSD. However, the
> disk swap usage is not accounted towards checking whether the cgroup's
> memory limit has been exceeded. Hence there are relatively few
> swap-outs, resulting mainly from the 1G allocations by each of the 70
> usemem processes working against a 60G memory limit on the parent
> cgroup.
>
> The picture changes in the "After" scenario. mTHPs now get stored in
> zswap, which is accounted for in the cgroup's memory.current and counts
> towards the fixed memory limit in effect for the parent cgroup. As a
> result, when mTHPs are stored in zswap, their compressed data in the
> zswap zpool counts towards the cgroup's active memory and memory limit,
> in addition to the 1G allocations from each of the 70 processes.
>
> As you can see, this creates more memory pressure on the cgroup,
> resulting in more swap-outs. With lz4 as the zswap compressor, this
> results in lower throughput relative to "Before".
>
> However, with IAA as the zswap compressor, the throughput with zswap
> mTHP is better than "Before" because of better hardware compression
> latencies, which absorb the higher swap-out activity without
> compromising throughput.
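The memory.current charging described above can be observed directly
through the cgroup v2 files; a minimal sketch, again assuming a cgroup
named "test" and a kernel built with CONFIG_ZSWAP (which provides the
zswap and zswapped fields in memory.stat):

    # memory.current includes the compressed-pool pages, so it climbs
    # toward memory.high as folios are stored in zswap:
    cat /sys/fs/cgroup/test/memory.current

    # memory.stat breaks the zswap charge out explicitly:
    #   zswap    - bytes used by the compressed pool for this cgroup
    #   zswapped - uncompressed bytes currently swapped out into zswap
    grep -E '^(zswap|zswapped) ' /sys/fs/cgroup/test/memory.stat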
>>
>> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |    388,154 | 16%        |
>> > |------------------------------------------------------------------|
>> > |Kernel              | mTHP SWAP-OUT     | Sys time   | Improvement|
>> > |                    |                   | sec        |            |
>> > |--------------------|-------------------|------------|------------|
>> > |v6.11-rc3 mainline  | SSD               |      91.37 | Baseline   |
>> > |zswap-mTHP-Store    | ZSWAP lz4         |     265.43 | -191%      |
>> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |     235.60 | -158%      |
>> > ------------------------------------------------------------------
>> >
>> > -----------------------------------------------------------------------
>> > | VMSTATS, mTHP ZSWAP/SSD stats| v6.11-rc3 | zswap-mTHP | zswap-mTHP  |
>> > |                              | mainline  | Store      | Store       |
>> > |                              |           | lz4        | deflate-iaa |
>> > |-----------------------------------------------------------------------|
>> > | pswpin                       | 0         | 0          | 0           |
>> > | pswpout                      | 174,432   | 0          | 0           |
>> > | zswpin                       | 703       | 534        | 721         |
>> > | zswpout                      | 1,501     | 1,491,654  | 1,398,805   |
>>
>> It appears that the number of swapped pages for zswap is much larger
>> than that of SSD swap. Why? I guess this is why zswap throughput is
>> worse.
>
> Your observation is correct. I hope the above explanation helps as to
> the reasoning behind this.

Before: (174,432 + 1,501) * 4 / 1024 = 687.2 MB
After:  1,491,654 * 4 / 1024 = 5,826.8 MB

From your previous words, about 10GB of memory should be swapped out (70
processes x 1G against a 60G limit). Even if the average compression
ratio were 0, the swap-out count of zswap should be about 100% more than
that of SSD. However, the ratio here appears unreasonable.

--
Best Regards,
Huang, Ying

> Thanks,
> Kanchana
>
>>
>> > |-----------------------------------------------------------------------|
>> > | thp_swpout                   | 0         | 0          | 0           |
>> > | thp_swpout_fallback          | 0         | 0          | 0           |
>> > | pgmajfault                   | 3,364     | 3,650      | 3,431       |
>> > |-----------------------------------------------------------------------|
>> > | hugepages-64kB/stats/zswpout |           | 63,200     | 63,244      |
>> > |-----------------------------------------------------------------------|
>> > | hugepages-64kB/stats/swpout  | 10,902    | 0          | 0           |
>> > -----------------------------------------------------------------------
>>
>> [snip]
>>
>> --
>> Best Regards,
>> Huang, Ying
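For reference, the page-count arithmetic above can be reproduced
straight from /proc/vmstat; a minimal sketch (both counters are
cumulative since boot, so a real comparison would take the difference
between readings before and after a run):

    # Sum the 4 KiB pswpout + zswpout page counts and report MiB:
    awk '/^(pswpout|zswpout) /{ pages += $2 }
         END { printf "%.1f MiB swapped out\n", pages * 4 / 1024 }' \
        /proc/vmstat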