From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Kairui Song
Cc: Chris Li, Hugh Dickins, Andrew Morton, Ryan Roberts, Kalesh Singh,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song
Subject: Re: [PATCH v5 0/9] mm: swap: mTHP swap allocator base on swap cluster order
In-Reply-To: (Kairui Song's message of "Mon, 19 Aug 2024 00:59:41 +0800")
References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org>
	<87h6bw3gxl.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87sevfza3w.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 19 Aug 2024 16:27:55 +0800
Message-ID: <87ttfghq7o.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

Kairui Song writes:

> On Fri, Aug 16, 2024 at 3:53 PM Chris Li wrote:
>>
>> On Thu, Aug 8, 2024 at 1:38 AM Huang, Ying wrote:
>> >
>> > Chris Li writes:
>> >
>> > > On Wed, Aug 7, 2024 at 12:59 AM Huang, Ying wrote:
>> > >>
>> > >> Hi, Chris,
>> > >>
>> > >> Chris Li writes:
>> > >>
>> > >> > This is the short term solutions "swap cluster order" listed
>> > >> > in my "Swap Abstraction" discussion slice 8 in the recent
>> > >> > LSF/MM conference.
>> > >> >
>> > >> > When commit 845982eb264bc "mm: swap: allow storage of all mTHP
>> > >> > orders" is introduced, it only allocates the mTHP swap entries
>> > >> > from the new empty cluster list. It has a fragmentation issue
>> > >> > reported by Barry.
>> > >> >
>> > >> > https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
>> > >> >
>> > >> > The reason is that all the empty clusters have been exhausted while
>> > >> > there are plenty of free swap entries in the cluster that are
>> > >> > not 100% free.
>> > >> >
>> > >> > Remember the swap allocation order in the cluster.
>> > >> > Keep track of the per order non full cluster list for later allocation.
>> > >> >
>> > >> > This series gives the swap SSD allocation a new separate code path
>> > >> > from the HDD allocation. The new allocator use cluster list only
>> > >> > and do not global scan swap_map[] without lock any more.
>> > >>
>> > >> This sounds good. Can we use SSD allocation method for HDD too?
>> > >> We may not need a swap entry allocator optimized for HDD.
>> > >
>> > > Yes, that is the plan as well. That way we can completely get rid of
>> > > the old scan_swap_map_slots() code.
>> >
>> > Good!
>> >
>> > > However, considering the size of the series, let's focus on the
>> > > cluster allocation path first, get it tested and reviewed.
>> >
>> > OK.
>> >
>> > > For HDD optimization, mostly just the new block allocations portion
>> > > need some separate code path from the new cluster allocator to not do
>> > > the per cpu allocation. Allocating from the non free list doesn't
>> > > need to change too
>> >
>> > I suggest not consider HDD optimization at all. Just use SSD algorithm
>> > to simplify.
>>
>> Adding a global next allocating CI rather than the per CPU next CI
>> pointer is pretty trivial as well. It is just a different way to fetch
>> the next cluster pointer.
>
> Yes, if we enable the new cluster based allocator for HDD, we can
> enable THP and mTHP for HDD too, and use a global cluster_next instead
> of Per-CPU for it.
> It's easy to do with minimal changes, and should actually boost
> performance for HDD SWAP. Currently testing this locally.

I think that it's better to start with the SSD algorithm. Then, you can
add HDD-specific optimizations on top of it with supporting data.

BTW, I don't see why HDD shouldn't use the per-CPU cluster. Sequential
writing is more important for HDD.

>> > >>
>> > >> Hi, Hugh,
>> > >>
>> > >> What do you think about this?
>> > >>
>> > >> > This streamline the swap allocation for SSD. The code matches the
>> > >> > execution flow much better.
>> > >> >
>> > >> > User impact: For users that allocate and free mix order mTHP swapping,
>> > >> > It greatly improves the success rate of the mTHP swap allocation after the
>> > >> > initial phase.
>> > >> >
>> > >> > It also performs faster when the swapfile is close to full, because the
>> > >> > allocator can get the non full cluster from a list rather than scanning
>> > >> > a lot of swap_map entries.
>> > >>
>> > >> Do you have some test results to prove this? Or which test below can
>> > >> prove this?
>> > >
>> > > The two zram tests are already proving this. The system time
>> > > improvement is about 2% on my low CPU count machine.
>> > > Kairui has a higher core count machine and the difference is higher
>> > > there. The theory is that higher CPU count has higher contentions.
>> >
>> > I will interpret this as the performance is better in theory. But
>> > there's almost no measurable results so far.
>>
>> I am trying to understand why don't see the performance improvement in
>> the zram setup in my cover letter as a measurable result?
>
> Hi Ying, you can check the test with the 32 cores AMD machine in the
> cover letter, as Chris pointed out the performance gain is higher as
> core number grows. The performance gain is still not much (*yet, based
> on this design thing can go much faster after HDD codes are
> dropped which enables many other optimizations, this series
> is mainly focusing on the fragmentation issue), but I think a
> stable ~4 - 8% improvement with a build linux kernel test
> could be considered measurable?

Is this the test result for "when the swapfile is close to full"?

--
Best Regards,
Huang, Ying
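
P.S. For readers skimming the thread, the policy discussed in the quoted
cover letter above (remember each cluster's allocation order and keep a
per-order list of non-full clusters in front of the empty-cluster list)
can be pictured with a small, self-contained C sketch. The names and
structures below (struct cluster, struct swap_lists, MAX_ORDER,
alloc_cluster(), cluster_partially_freed()) are illustrative assumptions
only, not the code in the series; the locking and list-membership
bookkeeping that real swap code needs are omitted.

/*
 * Minimal illustrative sketch only -- not the code in this series.
 * An mTHP allocation of a given order first reuses a partially freed
 * cluster of the same order, and only then takes a fresh empty cluster.
 */
#define MAX_ORDER 9	/* assumed largest mTHP swap order */

struct cluster {
	struct cluster *next;		/* singly linked list for the sketch */
	int order;			/* order this cluster is serving */
	unsigned int used;		/* entries currently allocated */
	unsigned int capacity;		/* total entries in the cluster */
};

struct swap_lists {
	struct cluster *free_clusters;		/* fully empty clusters */
	struct cluster *nonfull[MAX_ORDER + 1];	/* per-order non-full clusters */
};

/* Prefer a non-full cluster of the requested order, then an empty one. */
static struct cluster *alloc_cluster(struct swap_lists *s, int order)
{
	struct cluster *ci = s->nonfull[order];

	if (ci) {
		s->nonfull[order] = ci->next;	/* reuse a partially freed cluster */
		return ci;
	}

	ci = s->free_clusters;			/* fall back to an empty cluster */
	if (ci) {
		s->free_clusters = ci->next;
		ci->order = order;		/* remember the cluster's order */
	}
	return ci;				/* NULL: no cluster available */
}

/*
 * When entries are freed, a cluster that is neither empty nor full goes
 * back on the non-full list for the order it was allocated under.
 */
static void cluster_partially_freed(struct swap_lists *s, struct cluster *ci)
{
	if (ci->used > 0 && ci->used < ci->capacity) {
		ci->next = s->nonfull[ci->order];
		s->nonfull[ci->order] = ci;
	}
}

With this fallback order, an order-2 allocation first reuses an order-2
cluster that has regained free space and only consumes a new empty
cluster when none is available, which is the behavior the cover letter
credits for the improved mTHP allocation success rate after the initial
phase.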