From: Barry Song <21cnbao@gmail.com>
Date: Wed, 23 Oct 2024 11:07:00 +1300
Subject: Re: [RFC 0/4] mm: zswap: add support for zswapin of large folios
To: Usama Arif
Cc: senozhatsky@chromium.org, minchan@kernel.org, hanchuanhua@oppo.com,
 v-songbaohua@oppo.com, akpm@linux-foundation.org, linux-mm@kvack.org,
 hannes@cmpxchg.org, david@redhat.com, willy@infradead.org,
 kanchana.p.sridhar@intel.com, yosryahmed@google.com, nphamcs@gmail.com,
 chengming.zhou@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com,
 riel@surriel.com, shakeel.butt@linux.dev, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
References: <20241018105026.2521366-1-usamaarif642@gmail.com> <5313c721-9cf1-4ecd-ac23-1eeddabd691f@gmail.com>

On Wed, Oct 23, 2024 at 10:17 AM Usama Arif wrote:
>
>
>
> On 22/10/2024 21:46, Barry Song wrote:
> > On Wed, Oct 23, 2024 at 4:26 AM Usama Arif wrote:
> >>
> >>
> >>
> >> On 21/10/2024 11:40, Usama Arif wrote:
> >>>
> >>>
> >>> On 21/10/2024 06:09, Barry Song wrote:
> >>>> On Fri, Oct 18, 2024 at 11:50 PM Usama Arif wrote:
> >>>>>
> >>>>> After large folio zswapout support added in [1], this patch adds
> >>>>> support for zswapin of large folios to bring it on par with zram.
> >>>>> This series makes sure that the benefits of large folios (fewer
> >>>>> page faults, batched PTE and rmap manipulation, reduced lru list,
> >>>>> TLB coalescing (for arm64 and amd)) are not lost at swap out when
> >>>>> using zswap.
> >>>>>
> >>>>> It builds on top of [2] which added large folio swapin support for
> >>>>> zram and provides the same level of large folio swapin support as
> >>>>> zram, i.e. only supporting swap count == 1.
> >>>>>
> >>>>> Patch 1 skips swapcache for swapping in zswap pages, this should improve
> >>>>> no readahead swapin performance [3], and also allows us to build on large
> >>>>> folio swapin support added in [2], hence is a prerequisite for patch 3.
> >>>>>
> >>>>> Patch 3 adds support for large folio zswapin. This patch does not add
> >>>>> support for hybrid backends (i.e. folios partly present in swap and zswap).
> >>>>>
> >>>>> The main performance benefit comes from maintaining large folios *after*
> >>>>> swapin, large folio performance improvements have been mentioned in previous
> >>>>> series posted on it [2],[4], so have not added those. Below is a simple
> >>>>> microbenchmark to measure the time needed for zswpin of 1G memory (along
> >>>>> with memory integrity check).
> >>>>>
> >>>>>                                  | no mTHP (ms) | 1M mTHP enabled (ms)
> >>>>> Base kernel                      | 1165         | 1163
> >>>>> Kernel with mTHP zswpin series   | 1203         | 738
> >>>>
> >>>> Hi Usama,
> >>>> Do you know where this minor regression for non-mTHP comes from?
> >>>> As you even have skipped swapcache for small folios in zswap in patch 1,
> >>>> that part should have some gain? Is it because of zswap_present_test()?
> >>>>
> >>>
> >>> Hi Barry,
> >>>
> >>> The microbenchmark does a sequential read of 1G of memory, so it probably
> >>> isn't very representative of real world usecases. This also means that
> >>> swap_vma_readahead is able to readahead accurately all pages in its window.
> >>> With this patch series, if doing 4K swapin, you get 1G/4K calls of fast
> >>> do_swap_page. Without this patch, you get 1G/(4K*readahead window) of slow
> >>> do_swap_page calls. I had added some prints and I was seeing 8 pages being
> >>> readahead in 1 do_swap_page. The larger number of calls causes the slight
> >>> regression (even though they are quite fast).
> >>> I think in a realistic scenario,
> >>> where the readahead window won't be as large, there won't be a regression.
> >>> The cost of zswap_present_test in the whole call stack of swapping a page is
> >>> very low and I think can be ignored.
> >>>
> >>> I think the more interesting thing is what Kanchana pointed out in
> >>> https://lore.kernel.org/all/f2f2053f-ec5f-46a4-800d-50a3d2e61bff@gmail.com/
> >>> I am curious, did you see this when testing large folio swapin and compression
> >>> at 4K granularity? It looks like swap thrashing so I think it would be common
> >>> between zswap and zram. I don't have larger granularity zswap compression done,
> >>> which is why I think there is a regression in time taken. (It could be because
> >>> it's tested on intel as well).
> >>>
> >>> Thanks,
> >>> Usama
> >>>
> >>
> >> Hi,
> >>
> >> So I have been doing some benchmarking after Kanchana pointed out a performance
> >> regression in [1] of swapping in large folios. I would love to get thoughts from
> >> zram folks on this, as that's where large folio swapin was first added [2].
> >> As far as I can see, the current support in zram is doing large folio swapin
> >> at 4K granularity. The large granularity compression in [3] which was posted
> >> in March is not merged, so I am currently comparing upstream zram with this series.
> >>
> >> With the microbenchmark below of timing 1G swapin, there was a very large improvement
> >> in performance by using this series. I think similar numbers would be seen in zram.
> >
> > Imagine running several apps on a phone and switching
> > between them: A → B → C → D → E … → A → B … The app
> > currently on the screen retains its memory, while the ones
> > sent to the background are swapped out. When we bring
> > those apps back to the foreground, their memory is restored.
> > This behavior is quite similar to what you're seeing with
> > your microbenchmark.
> >
>
> Hi Barry,
>
> Thanks for explaining this! Do you know if there is some open source benchmark
> we could use to show an improvement in app switching with large folios?
>

I'm fairly certain the Android team has this benchmark, but it's not
open source. A straightforward way to simulate this is to use a script
that cyclically launches multiple applications, such as Chrome, Firefox,
Office, PDF, and others.

for example:
launch chrome;
launch firefox;
launch youtube;
....
launch chrome;
launch firefox;
....

On Android, we have the "Android activity manager 'am' command" to do that.
https://gist.github.com/tsohr/5711945

Not quite sure if other window managers have similar tools.

> Also I guess swap thrashing can happen when apps are brought back to foreground?
>

Typically, the foreground app doesn't experience much swapping, as it is
the most recently or frequently used. However, this may not hold for very
low-end phones, where memory is significantly less than the app's working
set. For instance, we can't expect a good user experience when playing a
large game that requires 8GB of memory on a 4GB phone! :-)
And for low-end phones, we never even enable mTHP.

> >>
> >> But when doing the kernel build test, Kanchana saw a regression in [1]. I believe
> >> it's because of swap thrashing (causing large zswap activity), due to larger page swapin.
> >> The part of the code that decides large folio swapin is the same between zswap and zram,
> >> so I believe this would be observed in zram as well.
> >
> > Is this an extreme case where the workload's working set far
> > exceeds the available memory by memcg limitation? I doubt mTHP
> > would provide any real benefit from the start if the workload is bound to
> > experience swap thrashing. What if we disable mTHP entirely?
> >
>
> I would agree, this is an extreme case. I wanted (z)swap activity to happen so limited
> memory.max to 4G.
>
> mTHP is beneficial in kernel build benchmarking going from no mTHP to 16K:
>
> ARM make defconfig; time make -j$(nproc) Image, cgroup memory.max=4G
> metric         no mTHP        16K mTHP=always
> real           1m0.613s       0m52.008s
> user           25m23.028s     25m19.488s
> sys            25m45.466s     18m11.640s
> zswpin         1911194        3108438
> zswpout        6880815        9374628
> pgfault        120430166      48976658
> pgmajfault     1580674        2327086
>
>

Interesting! We never use a phone to build the Linux kernel, but let me
see if I can find some other machines to reproduce your data.

>
>
> >>
> >> My initial thought was this might be because it's intel, where you don't have the advantage
> >> of TLB coalescing, so tested on AMD and ARM, but the regression is there on AMD
> >> and ARM as well, though a bit less (have added the numbers below).
> >>
> >> The numbers show that the zswap activity increases and page faults decrease.
> >> Overall this does result in sys time increasing and real time slightly increasing,
> >> likely because the cost of increased zswap activity is more than the benefit of
> >> lower page faults.
> >> I can see in [3] that pagefaults reduced in zram as well.
> >>
> >> Large folio swapin shows good numbers in microbenchmarks that just target reduced page
> >> faults and sequential swapin only, but not in the kernel build test. Is a similar regression
> >> observed with zram when enabling large folio swapin on the kernel build test? Maybe large
> >> folio swapin makes more sense on workloads where mappings are kept for a longer time?
> >>
>
> > I suspect this is because mTHP doesn't always benefit workloads
> > when available memory is quite limited compared to the working set.
> > In that case, mTHP swap-in might introduce more features that
> > exacerbate the problem. We used to have an extra control "swapin_enabled"
> > for swap-in, but it never gained much traction:
> > https://lore.kernel.org/linux-mm/20240726094618.401593-5-21cnbao@gmail.com/
> > We can reconsider whether to include the knob, but if it's better
> > to disable mTHP entirely for these cases, we can still adhere to
> > the policy of "enabled".
> >
> Yes I think this makes sense to have. The only thing is, it's too many knobs!
> I personally think it's already difficult to decide up to which mTHP size we
> should enable (and I think this changes per workload). But if we add swapin_enabled
> on top of that it can make things more difficult.
>
> > Using large block compression and decompression in zRAM will
> > significantly reduce CPU usage, likely making the issue unnoticeable.
> > However, the default minimum size for large block support is currently
> > set to 64KB (ZSMALLOC_MULTI_PAGES_ORDER = 4).
> >
>
> I saw that the patch was sent in March, and there weren't any updates after?
> Maybe I can try and cherry-pick that and see if we can develop large
> granularity compression for zswap.

will provide an updated version next week.
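
For reference, the kernel build runs quoted below were done in a memory-limited
cgroup with a couple of mTHP sizes enabled. A minimal sketch of that kind of
setup is below; it uses the standard cgroup v2 and transparent_hugepage sysfs
paths, but the cgroup name "buildcg" and the chosen sizes are only placeholders,
not the exact harness used for the numbers in this thread:

  #!/bin/sh
  # Enable a couple of mTHP sizes, cap a cgroup at 4G, time a kernel build,
  # then read the per-cgroup zswap/fault counters from memory.stat.
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled
  mkdir -p /sys/fs/cgroup/buildcg
  echo 4G > /sys/fs/cgroup/buildcg/memory.max
  echo $$ > /sys/fs/cgroup/buildcg/cgroup.procs    # move this shell (and children) in
  make defconfig
  time make -j"$(nproc)" Image                     # bzImage on x86
  grep -E '^(zswpin|zswpout|pgfault|pgmajfault) ' /sys/fs/cgroup/buildcg/memory.stat

The per-size "enabled" toggles are what the "16K+32K THP=always" labels in the
tables below refer to.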
>
> >>
> >> Kernel build numbers in cgroup with memory.max=4G to trigger zswap
> >> Command for AMD: make defconfig; time make -j$(nproc) bzImage
> >> Command for ARM: make defconfig; time make -j$(nproc) Image
> >>
> >>
> >> AMD 16K+32K THP=always
> >> metric         mm-unstable      mm-unstable + large folio zswapin series
> >> real           1m23.038s        1m23.050s
> >> user           53m57.210s       53m53.437s
> >> sys            7m24.592s        7m48.843s
> >> zswpin         612070           999244
> >> zswpout        2226403          2347979
> >> pgfault        20667366         20481728
> >> pgmajfault     385887           269117
> >>
> >> AMD 16K+32K+64K THP=always
> >> metric         mm-unstable      mm-unstable + large folio zswapin series
> >> real           1m22.975s        1m23.266s
> >> user           53m51.302s       53m51.069s
> >> sys            7m40.168s        7m57.104s
> >> zswpin         676492           1258573
> >> zswpout        2449839          2714767
> >> pgfault        17540746         17296555
> >> pgmajfault     429629           307495
> >> --------------------------
> >> ARM 16K+32K THP=always
> >> metric         mm-unstable      mm-unstable + large folio zswapin series
> >> real           0m51.168s        0m52.086s
> >> user           25m14.715s       25m15.765s
> >> sys            17m18.856s       18m8.031s
> >> zswpin         3904129          7339245
> >> zswpout        11171295         13473461
> >> pgfault        37313345         36011338
> >> pgmajfault     2726253          1932642
> >>
> >>
> >> ARM 16K+32K+64K THP=always
> >> metric         mm-unstable      mm-unstable + large folio zswapin series
> >> real           0m52.017s        0m53.828s
> >> user           25m2.742s        25m0.046s
> >> sys            18m24.525s       20m26.207s
> >> zswpin         4853571          8908664
> >> zswpout        12297199         15768764
> >> pgfault        32158152         30425519
> >> pgmajfault     3320717          2237015
> >>
> >>
> >> Thanks!
> >> Usama
> >>
> >>
> >> [1] https://lore.kernel.org/all/f2f2053f-ec5f-46a4-800d-50a3d2e61bff@gmail.com/
> >> [2] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/
> >> [3] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
> >>
> >>>
> >>>>>
> >>>>> The time measured was pretty consistent between runs (~1-2% variation).
> >>>>> There is 36% improvement in zswapin time with 1M folios. The percentage
> >>>>> improvement is likely to be more if the memcmp is removed.
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> >>>>> index 40de679248b8..77068c577c86 100644
> >>>>> --- a/tools/testing/selftests/cgroup/test_zswap.c
> >>>>> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> >>>>> @@ -9,6 +9,8 @@
> >>>>>  #include
> >>>>>  #include
> >>>>>  #include
> >>>>> +#include
> >>>>> +#include
> >>>>>
> >>>>>  #include "../kselftest.h"
> >>>>>  #include "cgroup_util.h"
> >>>>> @@ -407,6 +409,74 @@ static int test_zswap_writeback_disabled(const char *root)
> >>>>>         return test_zswap_writeback(root, false);
> >>>>>  }
> >>>>>
> >>>>> +static int zswapin_perf(const char *cgroup, void *arg)
> >>>>> +{
> >>>>> +       long pagesize = sysconf(_SC_PAGESIZE);
> >>>>> +       size_t memsize = MB(1*1024);
> >>>>> +       char buf[pagesize];
> >>>>> +       int ret = -1;
> >>>>> +       char *mem;
> >>>>> +       struct timeval start, end;
> >>>>> +
> >>>>> +       mem = (char *)memalign(2*1024*1024, memsize);
> >>>>> +       if (!mem)
> >>>>> +               return ret;
> >>>>> +
> >>>>> +       /*
> >>>>> +        * Fill half of each page with increasing data, and keep other
> >>>>> +        * half empty, this will result in data that is still compressible
> >>>>> +        * and ends up in zswap, with material zswap usage.
> >>>>> +        */
> >>>>> +       for (int i = 0; i < pagesize; i++)
> >>>>> +               buf[i] = i < pagesize/2 ? (char) i : 0;
> >>>>> +
> >>>>> +       for (int i = 0; i < memsize; i += pagesize)
> >>>>> +               memcpy(&mem[i], buf, pagesize);
> >>>>> +
> >>>>> +       /* Try and reclaim allocated memory */
> >>>>> +       if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) {
> >>>>> +               ksft_print_msg("Failed to reclaim all of the requested memory\n");
> >>>>> +               goto out;
> >>>>> +       }
> >>>>> +
> >>>>> +       gettimeofday(&start, NULL);
> >>>>> +       /* zswpin */
> >>>>> +       for (int i = 0; i < memsize; i += pagesize) {
> >>>>> +               if (memcmp(&mem[i], buf, pagesize)) {
> >>>>> +                       ksft_print_msg("invalid memory\n");
> >>>>> +                       goto out;
> >>>>> +               }
> >>>>> +       }
> >>>>> +       gettimeofday(&end, NULL);
> >>>>> +       printf ("zswapin took %fms to run.\n", (end.tv_sec - start.tv_sec)*1000 + (double)(end.tv_usec - start.tv_usec) / 1000);
> >>>>> +       ret = 0;
> >>>>> +out:
> >>>>> +       free(mem);
> >>>>> +       return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static int test_zswapin_perf(const char *root)
> >>>>> +{
> >>>>> +       int ret = KSFT_FAIL;
> >>>>> +       char *test_group;
> >>>>> +
> >>>>> +       test_group = cg_name(root, "zswapin_perf_test");
> >>>>> +       if (!test_group)
> >>>>> +               goto out;
> >>>>> +       if (cg_create(test_group))
> >>>>> +               goto out;
> >>>>> +
> >>>>> +       if (cg_run(test_group, zswapin_perf, NULL))
> >>>>> +               goto out;
> >>>>> +
> >>>>> +       ret = KSFT_PASS;
> >>>>> +out:
> >>>>> +       cg_destroy(test_group);
> >>>>> +       free(test_group);
> >>>>> +       return ret;
> >>>>> +}
> >>>>> +
> >>>>>  /*
> >>>>>   * When trying to store a memcg page in zswap, if the memcg hits its memory
> >>>>>   * limit in zswap, writeback should affect only the zswapped pages of that
> >>>>> @@ -584,6 +654,7 @@ struct zswap_test {
> >>>>>         T(test_zswapin),
> >>>>>         T(test_zswap_writeback_enabled),
> >>>>>         T(test_zswap_writeback_disabled),
> >>>>> +       T(test_zswapin_perf),
> >>>>>         T(test_no_kmem_bypass),
> >>>>>         T(test_no_invasive_cgroup_shrink),
> >>>>>  };
> >>>>>
> >>>>> [1] https://lore.kernel.org/all/20241001053222.6944-1-kanchana.p.sridhar@intel.com/
> >>>>> [2] https://lore.kernel.org/all/20240821074541.516249-1-hanchuanhua@oppo.com/
> >>>>> [3] https://lore.kernel.org/all/1505886205-9671-5-git-send-email-minchan@kernel.org/T/#u
> >>>>> [4] https://lwn.net/Articles/955575/
> >>>>>
> >>>>> Usama Arif (4):
> >>>>>   mm/zswap: skip swapcache for swapping in zswap pages
> >>>>>   mm/zswap: modify zswap_decompress to accept page instead of folio
> >>>>>   mm/zswap: add support for large folio zswapin
> >>>>>   mm/zswap: count successful large folio zswap loads
> >>>>>
> >>>>>  Documentation/admin-guide/mm/transhuge.rst |   3 +
> >>>>>  include/linux/huge_mm.h                    |   1 +
> >>>>>  include/linux/zswap.h                      |   6 ++
> >>>>>  mm/huge_memory.c                           |   3 +
> >>>>>  mm/memory.c                                |  16 +--
> >>>>>  mm/page_io.c                               |   2 +-
> >>>>>  mm/zswap.c                                 | 120 ++++++++++++++-------
> >>>>>  7 files changed, 99 insertions(+), 52 deletions(-)
> >>>>>
> >>>>> --
> >>>>> 2.43.5
> >>>>>
> >>>>

Thanks
Barry