From: "Huang, Ying" <ying.huang@intel.com>
To: Chris Li
Cc: Andrew Morton, Kairui Song, Ryan Roberts, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Barry Song
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
In-Reply-To: (Chris Li's message of "Tue, 28 May 2024 14:04:34 -0700")
References: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>
Date: Wed, 29 May 2024 16:55:21 +0800
Message-ID: <87cyp5575y.fsf@yhuang6-desk2.ccr.corp.intel.com>
Chris Li writes:

> I am spinning a new version for this series to address two issues
> found in this series:
>
> 1) Oppo discovered a bug in the following line:
>
> +		ci = si->cluster_info + tmp;
>
> Should be "tmp / SWAPFILE_CLUSTER" instead of "tmp".
> That is a serious bug, but trivial to fix.
>
> 2) order 0 allocation currently blindly scans swap_map, disregarding
> the cluster->order.

IIUC, we now scan swap_map[] only if list_empty(&si->free_clusters) &&
list_empty(&si->nonfull_clusters[order]).  That is, if you don't run
low on free swap space, you will not do that.

> Given enough order 0 swap allocations (close to the
> swap file size), the order 0 allocation head will eventually sweep
> across the whole swapfile and destroy the allocations of other
> cluster orders.
>
> The short term fix is just skipping clusters that are already assigned
> to higher orders.

Better to do any further optimization on top of this simpler one.  We
need to evaluate whether it's necessary to add more complexity.

> In the long term, I want to unify the non-SSD path to use clusters for
> locking and allocation as well, and just try to follow the last
> allocation (less seeking) as much as possible.

I have thought about that too.  Personally, I think that it's good to
remove the swap_map[] scanning.  The implementation can be simplified
too.  I don't know whether we still need to consider the performance of
HDD swap now.
--
Best Regards,
Huang, Ying

> On Fri, May 24, 2024 at 10:17 AM Chris Li wrote:
>>
>> This is the short term solution for the "swap cluster order" item
>> listed in my "Swap Abstraction" discussion, slide 8, at the recent
>> LSF/MM conference.
>>
>> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
>> orders" was introduced, it only allocated mTHP swap entries from the
>> new empty cluster list.  That works well for PMD size THP, but it has
>> a serious fragmentation issue reported by Barry.
>>
>> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
>>
>> The mTHP allocation failure rate rises to almost 100% after a few
>> hours in Barry's test run.
>>
>> The reason is that all the empty clusters have been exhausted while
>> there are plenty of free swap entries in clusters that are not
>> 100% free.
>>
>> Address this by remembering the swap allocation order in the cluster
>> and keeping a per-order list of non-full clusters for later
>> allocation.
>>
>> This greatly improves the success rate of mTHP swap allocation.
>> I am still waiting for Barry's test result.
>> Meanwhile, I paste Kairui's test result here:
>>
>> I'm able to reproduce such an issue with a simple script (enabling
>> all orders of mTHP):
>>
>> modprobe brd rd_nr=1 rd_size=$(( 10 * 1024 * 1024 ))
>> swapoff -a
>> mkswap /dev/ram0
>> swapon /dev/ram0
>>
>> rmdir /sys/fs/cgroup/benchmark
>> mkdir -p /sys/fs/cgroup/benchmark
>> cd /sys/fs/cgroup/benchmark
>> echo 8G > memory.max
>> echo $$ > cgroup.procs
>>
>> memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 -t 32 \
>>     -B binary &
>>
>> /usr/local/bin/memtier_benchmark -S /tmp/memcached.socket \
>>     -P memcache_binary -n allkeys --key-minimum=1 \
>>     --key-maximum=18000000 --key-pattern=P:P -c 1 -t 32 \
>>     --ratio 1:0 --pipeline 8 -d 1024
>>
>> Before:
>> Totals  48805.63  0.00  0.00  5.26045  1.19100  38.91100  59.64700  51063.98
>> After:
>> Totals  71098.84  0.00  0.00  3.60585  0.71100  26.36700  39.16700  74388.74
>>
>> And the fallback ratio dropped by a lot:
>>
>> Before:
>> hugepages-32kB/stats/anon_swpout_fallback:15997
>> hugepages-32kB/stats/anon_swpout:18712
>> hugepages-512kB/stats/anon_swpout_fallback:192
>> hugepages-512kB/stats/anon_swpout:0
>> hugepages-2048kB/stats/anon_swpout_fallback:2
>> hugepages-2048kB/stats/anon_swpout:0
>> hugepages-1024kB/stats/anon_swpout_fallback:0
>> hugepages-1024kB/stats/anon_swpout:0
>> hugepages-64kB/stats/anon_swpout_fallback:18246
>> hugepages-64kB/stats/anon_swpout:17644
>> hugepages-16kB/stats/anon_swpout_fallback:13701
>> hugepages-16kB/stats/anon_swpout:18234
>> hugepages-256kB/stats/anon_swpout_fallback:8642
>> hugepages-256kB/stats/anon_swpout:93
>> hugepages-128kB/stats/anon_swpout_fallback:21497
>> hugepages-128kB/stats/anon_swpout:7596
>>
>> (Still collecting more data; the successful swpouts mostly happened
>> early, then the fallbacks began to increase, approaching a 100%
>> failure rate.)
>>
>> After:
>> hugepages-32kB/stats/swpout:34445
>> hugepages-32kB/stats/swpout_fallback:0
>> hugepages-512kB/stats/swpout:1
>> hugepages-512kB/stats/swpout_fallback:134
>> hugepages-2048kB/stats/swpout:1
>> hugepages-2048kB/stats/swpout_fallback:1
>> hugepages-1024kB/stats/swpout:6
>> hugepages-1024kB/stats/swpout_fallback:0
>> hugepages-64kB/stats/swpout:35495
>> hugepages-64kB/stats/swpout_fallback:0
>> hugepages-16kB/stats/swpout:32441
>> hugepages-16kB/stats/swpout_fallback:0
>> hugepages-256kB/stats/swpout:2223
>> hugepages-256kB/stats/swpout_fallback:6278
>> hugepages-128kB/stats/swpout:29136
>> hugepages-128kB/stats/swpout_fallback:52
>>
>> Reported-by: Barry Song <21cnbao@gmail.com>
>> Tested-by: Kairui Song
>> Signed-off-by: Chris Li
>> ---
>> Chris Li (2):
>>       mm: swap: swap cluster switch to double link list
>>       mm: swap: mTHP allocate swap entries from nonfull list
>>
>>  include/linux/swap.h |  18 ++--
>>  mm/swapfile.c        | 252 +++++++++++++++++---------------------------------
>>  2 files changed, 93 insertions(+), 177 deletions(-)
>> ---
>> base-commit: c65920c76a977c2b73c3a8b03b4c0c00cc1285ed
>> change-id: 20240523-swap-allocator-1534c480ece4
>>
>> Best regards,
>> --
>> Chris Li
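[Editor's note: to make the cover letter's allocation policy concrete,
here is a hedged user-space sketch of per-order non-full cluster
bookkeeping.  All names here (alloc_cluster, free_slots, NR_ORDERS) and
the singly linked lists are hypothetical simplifications; the actual
series uses the kernel's swap cluster structures with double-linked
lists, and falls back to swap_map[] scanning rather than failing.]

```c
#include <assert.h>
#include <stddef.h>

#define SWAPFILE_CLUSTER 512 /* swap entries per cluster (stand-in value) */
#define NR_ORDERS 10         /* hypothetical number of tracked mTHP orders */

struct cluster {
	struct cluster *next;
	int order;      /* mTHP order this cluster is assigned to */
	int free_slots; /* remaining allocations of that order in the cluster */
};

struct swap_info {
	struct cluster *free_clusters;               /* fully empty clusters */
	struct cluster *nonfull_clusters[NR_ORDERS]; /* partially used, per order */
};

/* Prefer a non-full cluster already assigned to this order; otherwise
 * claim an empty cluster and assign it the order.  Returning NULL here
 * stands in for the fallback path in the real code. */
static struct cluster *alloc_cluster(struct swap_info *si, int order)
{
	struct cluster *ci = si->nonfull_clusters[order];

	if (ci) {
		if (--ci->free_slots == 0) /* cluster is now full: unlink it */
			si->nonfull_clusters[order] = ci->next;
		return ci;
	}
	ci = si->free_clusters;
	if (!ci)
		return NULL;
	si->free_clusters = ci->next;
	ci->order = order;
	/* 2^order entries per allocation; one allocation consumed now. */
	ci->free_slots = SWAPFILE_CLUSTER / (1 << order) - 1;
	ci->next = NULL;
	si->nonfull_clusters[order] = ci;
	return ci;
}
```

The key point the sketch captures is that a cluster, once assigned an
order, stays reachable through nonfull_clusters[order] until it is
completely full, so later same-order allocations reuse it instead of
consuming another empty cluster.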