From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: Chris Li, Hugh Dickins, Andrew Morton, Ryan Roberts, Kalesh Singh,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
Subject: Re: [PATCH v5 0/9] mm: swap: mTHP swap allocator base on swap cluster order
In-Reply-To: (Kairui Song's message of "Mon, 19 Aug 2024 00:59:41 +0800")
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org>
	<87h6bw3gxl.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87sevfza3w.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 19 Aug 2024 16:27:55 +0800
Message-ID: <87ttfghq7o.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

Kairui Song writes:

> On Fri, Aug 16, 2024 at 3:53 PM Chris Li wrote:
>>
>> On Thu, Aug 8, 2024 at 1:38 AM Huang, Ying wrote:
>> >
>> > Chris Li writes:
>> >
>> > > On Wed, Aug 7, 2024 at 12:59 AM Huang, Ying wrote:
>> > >>
>> > >> Hi, Chris,
>> > >>
>> > >> Chris Li writes:
>> > >>
>> > >> > This is the short term solutions "swap cluster order" listed
>> > >> > in my "Swap Abstraction" discussion slice 8 in the recent
>> > >> > LSF/MM conference.
>> > >> >
>> > >> > When commit 845982eb264bc "mm: swap: allow storage of all mTHP
>> > >> > orders" is introduced, it only allocates the mTHP swap entries
>> > >> > from the new empty cluster list. It has a fragmentation issue
>> > >> > reported by Barry.
>> > >> >
>> > >> > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
>> > >> >
>> > >> > The reason is that all the empty clusters have been exhausted while
>> > >> > there are plenty of free swap entries in the cluster that are
>> > >> > not 100% free.
>> > >> >
>> > >> > Remember the swap allocation order in the cluster.
>> > >> > Keep track of the per order non full cluster list for later allocation.
>> > >> >
>> > >> > This series gives the swap SSD allocation a new separate code path
>> > >> > from the HDD allocation. The new allocator use cluster list only
>> > >> > and do not global scan swap_map[] without lock any more.
>> > >>
>> > >> This sounds good. Can we use SSD allocation method for HDD too?
>> > >> We may not need a swap entry allocator optimized for HDD.
>> > >
>> > > Yes, that is the plan as well. That way we can completely get rid of
>> > > the old scan_swap_map_slots() code.
>> >
>> > Good!
>> >
>> > > However, considering the size of the series, let's focus on the
>> > > cluster allocation path first, get it tested and reviewed.
>> >
>> > OK.
>> >
>> > > For HDD optimization, mostly just the new block allocations portion
>> > > need some separate code path from the new cluster allocator to not do
>> > > the per cpu allocation. Allocating from the non free list doesn't
>> > > need to change too
>> >
>> > I suggest not consider HDD optimization at all. Just use SSD algorithm
>> > to simplify.
>>
>> Adding a global next allocating CI rather than the per CPU next CI
>> pointer is pretty trivial as well. It is just a different way to fetch
>> the next cluster pointer.
>
> Yes, if we enable the new cluster based allocator for HDD, we can
> enable THP and mTHP for HDD too, and use a global cluster_next instead
> of Per-CPU for it.
> It's easy to do with minimal changes, and should actually boost
> performance for HDD SWAP. Currently testing this locally.

I think that it's better to start with the SSD algorithm. Then, you can
add HDD-specific optimizations on top of it with supporting data.

BTW, I don't see why HDD shouldn't use the per-CPU cluster. Sequential
writing is more important for HDD.

>> > >>
>> > >> Hi, Hugh,
>> > >>
>> > >> What do you think about this?
>> > >>
>> > >> > This streamline the swap allocation for SSD. The code matches the
>> > >> > execution flow much better.
>> > >> >
>> > >> > User impact: For users that allocate and free mix order mTHP swapping,
>> > >> > It greatly improves the success rate of the mTHP swap allocation after the
>> > >> > initial phase.
>> > >> >
>> > >> > It also performs faster when the swapfile is close to full, because the
>> > >> > allocator can get the non full cluster from a list rather than scanning
>> > >> > a lot of swap_map entries.
>> > >>
>> > >> Do you have some test results to prove this? Or which test below can
>> > >> prove this?
>> > >
>> > > The two zram tests are already proving this. The system time
>> > > improvement is about 2% on my low CPU count machine.
>> > > Kairui has a higher core count machine and the difference is higher
>> > > there. The theory is that higher CPU count has higher contentions.
>> >
>> > I will interpret this as the performance is better in theory. But
>> > there's almost no measurable results so far.
>>
>> I am trying to understand why don't see the performance improvement in
>> the zram setup in my cover letter as a measurable result?
>
> Hi Ying, you can check the test with the 32 cores AMD machine in the
> cover letter, as Chris pointed out the performance gain is higher as
> core number grows. The performance gain is still not much (*yet, based
> on this design thing can go much faster after HDD codes are
> dropped which enables many other optimizations, this series
> is mainly focusing on the fragmentation issue), but I think a
> stable ~4 - 8% improvement with a build linux kernel test
> could be considered measurable?

Is this the test result for "when the swapfile is close to full"?

--
Best Regards,
Huang, Ying
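
P.S. For readers skimming the thread, the policy discussed in the quoted
cover letter above (remember each cluster's allocation order and keep a
per-order list of non-full clusters in front of the empty-cluster list)
can be pictured with a small, self-contained C sketch. The names and
structures below (struct cluster, struct swap_lists, MAX_ORDER,
alloc_cluster(), cluster_partially_freed()) are illustrative assumptions
only, not the code in the series; the locking and list-membership
bookkeeping that real swap code needs are omitted.

/*
 * Minimal illustrative sketch only -- not the code in this series.
 * An mTHP allocation of a given order first reuses a partially freed
 * cluster of the same order, and only then takes a fresh empty cluster.
 */
#define MAX_ORDER 9	/* assumed largest mTHP swap order */

struct cluster {
	struct cluster *next;		/* singly linked list for the sketch */
	int order;			/* order this cluster is serving */
	unsigned int used;		/* entries currently allocated */
	unsigned int capacity;		/* total entries in the cluster */
};

struct swap_lists {
	struct cluster *free_clusters;		/* fully empty clusters */
	struct cluster *nonfull[MAX_ORDER + 1];	/* per-order non-full clusters */
};

/* Prefer a non-full cluster of the requested order, then an empty one. */
static struct cluster *alloc_cluster(struct swap_lists *s, int order)
{
	struct cluster *ci = s->nonfull[order];

	if (ci) {
		s->nonfull[order] = ci->next;	/* reuse a partially freed cluster */
		return ci;
	}

	ci = s->free_clusters;			/* fall back to an empty cluster */
	if (ci) {
		s->free_clusters = ci->next;
		ci->order = order;		/* remember the cluster's order */
	}
	return ci;				/* NULL: no cluster available */
}

/*
 * When entries are freed, a cluster that is neither empty nor full goes
 * back on the non-full list for the order it was allocated under.
 */
static void cluster_partially_freed(struct swap_lists *s, struct cluster *ci)
{
	if (ci->used > 0 && ci->used < ci->capacity) {
		ci->next = s->nonfull[ci->order];
		s->nonfull[ci->order] = ci;
	}
}

With this fallback order, an order-2 allocation first reuses an order-2
cluster that has regained free space and only consumes a new empty
cluster when none is available, which is the behavior the cover letter
credits for the improved mTHP allocation success rate after the initial
phase.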