From: "Huang, Ying"
To: "Sridhar, Kanchana P"
Cc: Nhat Pham, Yosry Ahmed, "linux-kernel@vger.kernel.org",
 "linux-mm@kvack.org", "hannes@cmpxchg.org", "chengming.zhou@linux.dev",
 "usamaarif642@gmail.com", "ryan.roberts@arm.com",
 "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org",
 "Zou, Nanhai", "Feghali, Wajdi K", "Gopal, Vinodh"
Subject: Re: [PATCH v6 0/3] mm: ZSWAP swap-out of mTHP folios
In-Reply-To: (Kanchana P. Sridhar's message of "Fri, 20 Sep 2024 10:16:53 +0800")
References: <20240829212705.6714-1-kanchana.p.sridhar@intel.com>
Date: Fri, 20 Sep 2024 17:12:07 +0800
Message-ID: <87msk2vgd4.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

"Sridhar, Kanchana P" writes:

> Hi Nhat,
>
>> -----Original Message-----
>> From: Nhat Pham
>> Sent: Thursday, August 29, 2024 4:46 PM
>> To: Yosry Ahmed
>> Cc: Sridhar, Kanchana P; linux-kernel@vger.kernel.org;
>> linux-mm@kvack.org; hannes@cmpxchg.org;
>> chengming.zhou@linux.dev; usamaarif642@gmail.com;
>> ryan.roberts@arm.com; Huang, Ying;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Zou, Nanhai;
>> Feghali, Wajdi K; Gopal, Vinodh
>> Subject: Re: [PATCH v6 0/3] mm: ZSWAP swap-out of mTHP folios
>>
>> On Thu, Aug 29, 2024 at 3:49 PM Yosry Ahmed wrote:
>> >
>> > On Thu, Aug 29, 2024 at 2:27 PM Kanchana P Sridhar wrote:
>> >
>> > We are basically comparing zram with zswap in this case, and it's not
>> > fair because, as you mentioned, the zswap compressed data is being
>> > accounted for while the zram compressed data isn't. I am not really
>> > sure how valuable these test results are. Even if we remove the cgroup
>> > accounting from zswap, we won't see an improvement, we should expect a
>> > similar performance to zram.
>> >
>> > I think the test results that are really valuable are case 1, where
>> > zswap users are currently disabling CONFIG_THP_SWAP, and get to enable
>> > it after this series.
>>
>> Ah, this is a good point.
>>
>> I think the point of comparing mTHP zswap vs. mTHP (SSD) swap is more
>> of a sanity check. IOW, if mTHP swap outperforms mTHP zswap, then
>> something is wrong (otherwise why would we enable zswap - might as well
>> just use swap, since SSD swap with mTHP >>> zswap with mTHP >>> zswap
>> without mTHP).
>>
>> That said, I don't think this benchmark can show it anyway. The access
>> pattern here is such that all the allocated memory is really cold,
>> so swap to disk (or to zram, which does not account memory usage
>> towards cgroup) is better by definition... And Kanchana does not seem
>> to have access to a setup with larger SSD swapfiles? :)
>
> As a follow-up, I created a swapfile on disk to increase the SSD swap to 179G.

Are you sure that you used a swapfile rather than a swap partition?
According to the following code in scan_swap_map_slots(),

        if (order > 0) {
                /*
                 * Should not even be attempting large allocations when huge
                 * page swap is disabled.  Warn and fail the allocation.
                 */
                if (!IS_ENABLED(CONFIG_THP_SWAP) ||
                    nr_pages > SWAPFILE_CLUSTER) {
                        VM_WARN_ON_ONCE(1);
                        return 0;
                }

                /*
                 * Swapfile is not block device or not using clusters so unable
                 * to allocate large entries.
                 */
                if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
                        return 0;
        }

a large folio will be split for a swapfile, because the large allocation
fails when SWP_BLKDEV is not set.

--
Best Regards,
Huang, Ying

> 64KB mTHP (cgroup memory.high set to 40G, no swap limit):
> =========================================================
> CONFIG_THP_SWAP=Y
> Sapphire Rapids server with 503 GiB RAM and 179G SSD swap backing device
> for zswap.
>
> usemem --init-time -w -O --sleep 0 -n 70 1g:
>
> --------------------------------------------------------------------------------
>                    mm-unstable 9-17-2024         zswap-mTHP v6        Change wrt
>                                 Baseline                                Baseline
>                                 "before"               "after"         (sleep 0)
> --------------------------------------------------------------------------------
> ZSWAP compressor         zstd   deflate-       zstd   deflate-     zstd deflate-
>                                      iaa                   iaa               iaa
> --------------------------------------------------------------------------------
> Throughput (KB/s)      93,273     88,496    143,117    134,131      53%      52%
> sys time (sec)         316.68     349.00     917.88     877.74    -190%    -152%
> memcg_high             73,836     83,522    126,120    133,013
> memcg_swap_fail       261,136    324,533    494,191    578,824
> pswpin                     16         11          0          0
> pswpout             1,242,187  1,263,493          0          0
> zswpin                    694        668        712        702
> zswpout             3,991,403  4,933,901  9,289,092 10,461,948
> thp_swpout                  0          0          0          0
> thp_swpout_                 0          0          0          0
>  fallback
> pgmajfault              3,488      3,353      3,377      3,499
> ZSWPOUT-64kB              n/a        n/a    110,067    103,957
> SWPOUT-64kB            77,637     78,968          0          0
> --------------------------------------------------------------------------------
>
> We do see 50% throughput improvement with mTHP-zswap wrt mTHP-SSD.
> The sys time increase can be attributed to higher swapout activity
> occurring with zswap-mTHP.
>
> I hope this quantifies the benefit of mTHP-zswap wrt mTHP-SSD in a
> non-swap-constrained setup. The 4G SSD swap setup data I shared
> in my response to Yosry also indicates better throughput with mTHP-zswap
> as compared to mTHP-SSD.
>
> Please do let me know if you have any other questions/suggestions.
>
> Thanks,
> Kanchana
>
>>
>> >
>> > If we really want to compare CONFIG_THP_SWAP on before and after, it
>> > should be with SSD because that's a more conventional setup. In this
>> > case the users that have CONFIG_THP_SWAP=y only experience the
>> > benefits of zswap with this series. You mentioned experimenting with
>> > usemem to keep the memory allocated longer so that you're able to have
>> > a fair test with the small SSD swap setup. Did that work?
>> >
>> > I am hoping Nhat or Johannes would shed some light on whether they
>> > usually have CONFIG_THP_SWAP enabled or not with zswap. I am trying to
>> > figure out if any reasonable setups enable CONFIG_THP_SWAP with zswap.
>> > Otherwise the testing results from case 1 should be sufficient.
>> >
>> > >
>> > > In my opinion, even though the test setup does not provide an
>> > > accurate way for a direct before/after comparison (because of zswap
>> > > usage being counted in cgroup, hence towards the memory.high), it
>> > > still seems reasonable for zswap_store to support (m)THP, so that
>> > > further performance improvements can be implemented.
>> >
>> > This is only referring to the results of case 2, right?
>> >
>> > Honestly, I wouldn't want to merge mTHP swapout support on its own
>> > just because it enables further performance improvements without
>> > having actual patches for them. But I don't think this captures the
>> > results accurately as it dismisses case 1 results (which I think are
>> > more reasonable).
>> >
>> > Thanks
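
For readers following the thread, the two checks quoted from
scan_swap_map_slots() can be modelled in user space. The sketch below is
a simplified, hypothetical model and not kernel code: struct
swap_dev_model, SWP_BLKDEV_MODEL, SWAPFILE_CLUSTER_MODEL (assumed here to
be 512 pages) and can_alloc_large_entry() are names invented for
illustration. It only shows why a 64KB folio (order 4 with 4KB pages)
would get a large swap entry on a block-device swap area with cluster
info, but not on a regular swapfile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define SWP_BLKDEV_MODEL       (1u << 0)
#define SWAPFILE_CLUSTER_MODEL 512UL    /* assumed cluster size, in pages */

/* Minimal model of the swap device state that the quoted checks inspect. */
struct swap_dev_model {
        unsigned int flags;        /* SWP_BLKDEV_MODEL set for a block device */
        bool has_cluster_info;     /* models si->cluster_info != NULL */
};

/*
 * Return true if a folio of 2^order pages could be given one large swap
 * entry, following the order of the checks in the quoted snippet.
 * A false result models "return 0", after which the folio is split.
 */
static bool can_alloc_large_entry(const struct swap_dev_model *si,
                                  int order, bool thp_swap_enabled)
{
        unsigned long nr_pages = 1UL << order;

        if (order == 0)
                return true;    /* single pages skip both checks */

        /* Models: !IS_ENABLED(CONFIG_THP_SWAP) || nr_pages > SWAPFILE_CLUSTER */
        if (!thp_swap_enabled || nr_pages > SWAPFILE_CLUSTER_MODEL)
                return false;

        /* Models: !(si->flags & SWP_BLKDEV) || !si->cluster_info */
        if (!(si->flags & SWP_BLKDEV_MODEL) || !si->has_cluster_info)
                return false;

        return true;
}
```

With a device modelled as { .flags = 0, .has_cluster_info = true } (a
regular swapfile), can_alloc_large_entry(&dev, 4, true) is false, whereas
setting SWP_BLKDEV_MODEL in flags (a swap partition) makes it true.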