From: Barry Song <21cnbao@gmail.com>
Date: Wed, 23 Oct 2024 23:26:47 +1300
Subject: Re: [RFC 0/4] mm: zswap: add support for zswapin of large folios
To: Usama Arif
Cc: senozhatsky@chromium.org, minchan@kernel.org, hanchuanhua@oppo.com,
 v-songbaohua@oppo.com, akpm@linux-foundation.org, linux-mm@kvack.org,
 hannes@cmpxchg.org, david@redhat.com, willy@infradead.org,
 kanchana.p.sridhar@intel.com, yosryahmed@google.com, nphamcs@gmail.com,
 chengming.zhou@linux.dev, ryan.roberts@arm.com, ying.huang@intel.com,
 riel@surriel.com, shakeel.butt@linux.dev, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
References: <20241018105026.2521366-1-usamaarif642@gmail.com>
 <5313c721-9cf1-4ecd-ac23-1eeddabd691f@gmail.com>

On Wed, Oct 23, 2024 at 11:07 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Oct 23, 2024 at 10:17 AM Usama Arif wrote:
> >
> >
> > On 22/10/2024 21:46, Barry Song wrote:
> > > On Wed, Oct 23, 2024 at 4:26 AM Usama Arif wrote:
> > >>
> > >>
> > >> On 21/10/2024 11:40, Usama Arif wrote:
> > >>>
> > >>>
> > >>> On 21/10/2024 06:09, Barry Song wrote:
> > >>>> On Fri, Oct 18, 2024 at 11:50 PM Usama Arif wrote:
> > >>>>>
> > >>>>> After large folio zswapout support added in [1], this patch adds
> > >>>>> support for zswapin of large folios to bring it on par with zram.
> > >>>>> This series makes sure that the benefits of large folios (fewer
> > >>>>> page faults, batched PTE and rmap manipulation, reduced lru list,
> > >>>>> TLB coalescing (for arm64 and amd)) are not lost at swap out when
> > >>>>> using zswap.
> > >>>>>
> > >>>>> It builds on top of [2] which added large folio swapin support for
> > >>>>> zram and provides the same level of large folio swapin support as
> > >>>>> zram, i.e. only supporting swap count == 1.
> > >>>>>
> > >>>>> Patch 1 skips swapcache for swapping in zswap pages, this should improve
> > >>>>> no-readahead swapin performance [3], and also allows us to build on large
> > >>>>> folio swapin support added in [2], hence is a prerequisite for patch 3.
> > >>>>>
> > >>>>> Patch 3 adds support for large folio zswapin. This patch does not add
> > >>>>> support for hybrid backends (i.e. folios partly present in swap and zswap).
> > >>>>>
> > >>>>> The main performance benefit comes from maintaining large folios *after*
> > >>>>> swapin, large folio performance improvements have been mentioned in previous
> > >>>>> series posted on it [2],[4], so have not added those. Below is a simple
> > >>>>> microbenchmark to measure the time needed *for* zswpin of 1G memory (along
> > >>>>> with memory integrity check).
> > >>>>>
> > >>>>>                                 | no mTHP (ms) | 1M mTHP enabled (ms)
> > >>>>> Base kernel                     | 1165         | 1163
> > >>>>> Kernel with mTHP zswpin series  | 1203         | 738
> > >>>>
> > >>>> Hi Usama,
> > >>>> Do you know where this minor regression for non-mTHP comes from?
> > >>>> As you even have skipped swapcache for small folios in zswap in patch 1,
> > >>>> that part should have some gain? Is it because of zswap_present_test()?
> > >>>>
> > >>>
> > >>> Hi Barry,
> > >>>
> > >>> The microbenchmark does a sequential read of 1G of memory, so it probably
> > >>> isn't very representative of real-world use cases. This also means that
> > >>> swap_vma_readahead is able to accurately read ahead all pages in its window.
> > >>> With this patch series, if doing 4K swapin, you get 1G/4K calls of fast
> > >>> do_swap_page. Without this patch, you get 1G/(4K*readahead window) of slow
> > >>> do_swap_page calls. I had added some prints and was seeing 8 pages being
> > >>> read ahead in 1 do_swap_page. The larger number of calls causes the slight
> > >>> regression (even though they are quite fast). I think in a realistic scenario,
> > >>> where the readahead window won't be as large, there won't be a regression.
> > >>> The cost of zswap_present_test in the whole call stack of swapping in a page
> > >>> is very low and I think can be ignored.
> > >>>
> > >>> I think the more interesting thing is what Kanchana pointed out in
> > >>> https://lore.kernel.org/all/f2f2053f-ec5f-46a4-800d-50a3d2e61bff@gmail.com/
> > >>> I am curious, did you see this when testing large folio swapin and compression
> > >>> at 4K granularity? It looks like swap thrashing, so I think it would be common
> > >>> between zswap and zram. I don't have larger granularity zswap compression done,
> > >>> which is why I think there is a regression in time taken. (It could be because
> > >>> it's tested on Intel as well.)
> > >>>
> > >>> Thanks,
> > >>> Usama
> > >>>
> > >>
> > >> Hi,
> > >>
> > >> So I have been doing some benchmarking after Kanchana pointed out a performance
> > >> regression in [1] of swapping in large folios. I would love to get thoughts from
> > >> zram folks on this, as that's where large folio swapin was first added [2].
> > >> As far as I can see, the current support in zram is doing large folio swapin
> > >> at 4K granularity. The large granularity compression in [3] which was posted
> > >> in March is not merged, so I am currently comparing upstream zram with this series.
> > >>
> > >> With the microbenchmark below of timing 1G swapin, there was a very large improvement
> > >> in performance by using this series. I think similar numbers would be seen in zram.
> > >
> > > Imagine running several apps on a phone and switching
> > > between them: A → B → C → D → E … → A → B … The app
> > > currently on the screen retains its memory, while the ones
> > > sent to the background are swapped out. When we bring
> > > those apps back to the foreground, their memory is restored.
> > > This behavior is quite similar to what you're seeing with
> > > your microbenchmark.
> > >
> >
> > Hi Barry,
> >
> > Thanks for explaining this! Do you know if there is some open-source benchmark
> > we could use to show an improvement in app switching with large folios?
> >
>
> I'm fairly certain the Android team has this benchmark, but it's not
> open source.
>
> A straightforward way to simulate this is to use a script that
> cyclically launches multiple applications, such as Chrome, Firefox,
> Office, PDF, and others.
>
> For example:
>
> launch chrome;
> launch firefox;
> launch youtube;
> ....
> launch chrome;
> launch firefox;
> ....
>
> On Android, we have the activity manager's "am" command to do that.
> https://gist.github.com/tsohr/5711945
>
> Not quite sure if other window managers have similar tools.
>
> > Also I guess swap thrashing can happen when apps are brought back to foreground?
> >
>
> Typically, the foreground app doesn't experience much swapping,
> as it is the most recently or frequently used. However, this may
> not hold for very low-end phones, where memory is significantly
> less than the app's working set. For instance, we can't expect a
> good user experience when playing a large game that requires 8GB
> of memory on a 4GB phone! :-)
> And for low-end phones, we never even enable mTHP.
>
> > >>
> > >> But when doing the kernel build test, Kanchana saw a regression in [1]. I believe
> > >> it's because of swap thrashing (causing large zswap activity), due to larger page swapin.
> > >> The part of the code that decides large folio swapin is the same between zswap and zram,
> > >> so I believe this would be observed in zram as well.
> > >
> > > Is this an extreme case where the workload's working set far
> > > exceeds the available memory by memcg limitation? I doubt mTHP
> > > would provide any real benefit from the start if the workload is bound to
> > > experience swap thrashing. What if we disable mTHP entirely?
> > >
> >
> > I would agree, this is an extreme case. I wanted (z)swap activity to happen, so I limited
> > memory.max to 4G.
> >
> > mTHP is beneficial in kernel test benchmarking going from no mTHP to 16K:
> >
> > ARM make defconfig; time make -j$(nproc) Image, cgroup memory.max=4G
> > metric          no mTHP         16K mTHP=always
> > real            1m0.613s        0m52.008s
> > user            25m23.028s      25m19.488s
> > sys             25m45.466s      18m11.640s
> > zswpin          1911194         3108438
> > zswpout         6880815         9374628
> > pgfault         120430166       48976658
> > pgmajfault      1580674         2327086
> >
>
> Interesting! We never use a phone to build the Linux kernel, but
> let me see if I can find some other machines to reproduce your data.

Hi Usama,

I suspect the regression occurs because you're running an edge
case where the memory cgroup stays nearly full most of the time
(this isn't an inherent issue with large folio swap-in). As a result,
swapping in mTHP quickly triggers a memcg overflow, causing a
swap-out. The next swap-in then recreates the overflow, leading
to a repeating cycle.
We need a way to stop the cup from repeatedly filling to the brim
and overflowing. While not a definitive fix, the following change
might help improve the situation:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 17af08367c68..f2fa0eeb2d9a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4559,7 +4559,10 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 	memcg = get_mem_cgroup_from_mm(mm);
 	rcu_read_unlock();
 
-	ret = charge_memcg(folio, memcg, gfp);
+	if (folio_test_large(folio) && mem_cgroup_margin(memcg) < MEMCG_CHARGE_BATCH)
+		ret = -ENOMEM;
+	else
+		ret = charge_memcg(folio, memcg, gfp);
 
 	css_put(&memcg->css);
 	return ret;
 }

Please confirm if it makes the kernel build with the memcg limitation
faster. If so, let's work together to figure out an official patch :-)
The above code hasn't considered the parent memcg's overflow, so it
is not an ideal fix.

> >
> >
> > >
> > >>
> > >> My initial thought was this might be because it's Intel, where you don't have the advantage
> > >> of TLB coalescing, so I tested on AMD and ARM, but the regression is there on AMD
> > >> and ARM as well, though a bit less (have added the numbers below).
> > >>
> > >> The numbers show that the zswap activity increases and page faults decrease.
> > >> Overall this does result in sys time increasing and real time slightly increasing,
> > >> likely because the cost of increased zswap activity is more than the benefit of
> > >> lower page faults.
> > >> I can see in [3] that page faults reduced in zram as well.
> > >>
> > >> Large folio swapin shows good numbers in microbenchmarks that just target reduced page
> > >> faults and sequential swapin only, but not in the kernel build test. Is a similar regression
> > >> observed with zram when enabling large folio swapin on the kernel build test? Maybe large
> > >> folio swapin makes more sense on workloads where mappings are kept for a longer time?
> > >>
> > >
> > > I suspect this is because mTHP doesn't always benefit workloads
> > > when available memory is quite limited compared to the working set.
> > > In that case, mTHP swap-in might introduce more features that
> > > exacerbate the problem. We used to have an extra control "swapin_enabled"
> > > for swap-in, but it never gained much traction:
> > > https://lore.kernel.org/linux-mm/20240726094618.401593-5-21cnbao@gmail.com/
> > > We can reconsider whether to include the knob, but if it's better
> > > to disable mTHP entirely for these cases, we can still adhere to
> > > the policy of "enabled".
> > >
> > Yes, I think this makes sense to have. The only thing is, it's too many knobs!
> > I personally think it's already difficult to decide up to which mTHP size we
> > should enable (and I think this changes per workload). But if we add swapin_enabled
> > on top of that, it can make things more difficult.
> >
> > > Using large block compression and decompression in zRAM will
> > > significantly reduce CPU usage, likely making the issue unnoticeable.
> > > However, the default minimum size for large block support is currently
> > > set to 64KB (ZSMALLOC_MULTI_PAGES_ORDER = 4).
> > >
> >
> > I saw that the patch was sent in March, and there weren't any updates after?
> > Maybe I can try and cherry-pick that and see if we can develop large
> > granularity compression for zswap.
>
> will provide an updated version next week.
> >
> > >>
> > >> Kernel build numbers in cgroup with memory.max=4G to trigger zswap
> > >> Command for AMD: make defconfig; time make -j$(nproc) bzImage
> > >> Command for ARM: make defconfig; time make -j$(nproc) Image
> > >>
> > >>
> > >> AMD 16K+32K THP=always
> > >> metric         mm-unstable      mm-unstable + large folio zswapin series
> > >> real           1m23.038s        1m23.050s
> > >> user           53m57.210s       53m53.437s
> > >> sys            7m24.592s        7m48.843s
> > >> zswpin         612070           999244
> > >> zswpout        2226403          2347979
> > >> pgfault        20667366         20481728
> > >> pgmajfault     385887           269117
> > >>
> > >> AMD 16K+32K+64K THP=always
> > >> metric         mm-unstable      mm-unstable + large folio zswapin series
> > >> real           1m22.975s        1m23.266s
> > >> user           53m51.302s       53m51.069s
> > >> sys            7m40.168s        7m57.104s
> > >> zswpin         676492           1258573
> > >> zswpout        2449839          2714767
> > >> pgfault        17540746         17296555
> > >> pgmajfault     429629           307495
> > >> --------------------------
> > >> ARM 16K+32K THP=always
> > >> metric         mm-unstable      mm-unstable + large folio zswapin series
> > >> real           0m51.168s        0m52.086s
> > >> user           25m14.715s       25m15.765s
> > >> sys            17m18.856s       18m8.031s
> > >> zswpin         3904129          7339245
> > >> zswpout        11171295         13473461
> > >> pgfault        37313345         36011338
> > >> pgmajfault     2726253          1932642
> > >>
> > >> ARM 16K+32K+64K THP=always
> > >> metric         mm-unstable      mm-unstable + large folio zswapin series
> > >> real           0m52.017s        0m53.828s
> > >> user           25m2.742s        25m0.046s
> > >> sys            18m24.525s       20m26.207s
> > >> zswpin         4853571          8908664
> > >> zswpout        12297199         15768764
> > >> pgfault        32158152         30425519
> > >> pgmajfault     3320717          2237015
> > >>
> > >> Thanks!
> > >> Usama
> > >>
> > >>
> > >> [1] https://lore.kernel.org/all/f2f2053f-ec5f-46a4-800d-50a3d2e61bff@gmail.com/
> > >> [2] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/
> > >> [3] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
> > >>
> > >>>
> > >>>>>
> > >>>>> The time measured was pretty consistent between runs (~1-2% variation).
> > >>>>> There is a 36% improvement in zswapin time with 1M folios. The percentage
> > >>>>> improvement is likely to be more if the memcmp is removed.
> > >>>>>
> > >>>>> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> > >>>>> index 40de679248b8..77068c577c86 100644
> > >>>>> --- a/tools/testing/selftests/cgroup/test_zswap.c
> > >>>>> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> > >>>>> @@ -9,6 +9,8 @@
> > >>>>>  #include
> > >>>>>  #include
> > >>>>>  #include
> > >>>>> +#include
> > >>>>> +#include
> > >>>>>
> > >>>>>  #include "../kselftest.h"
> > >>>>>  #include "cgroup_util.h"
> > >>>>> @@ -407,6 +409,74 @@ static int test_zswap_writeback_disabled(const char *root)
> > >>>>>         return test_zswap_writeback(root, false);
> > >>>>>  }
> > >>>>>
> > >>>>> +static int zswapin_perf(const char *cgroup, void *arg)
> > >>>>> +{
> > >>>>> +       long pagesize = sysconf(_SC_PAGESIZE);
> > >>>>> +       size_t memsize = MB(1*1024);
> > >>>>> +       char buf[pagesize];
> > >>>>> +       int ret = -1;
> > >>>>> +       char *mem;
> > >>>>> +       struct timeval start, end;
> > >>>>> +
> > >>>>> +       mem = (char *)memalign(2*1024*1024, memsize);
> > >>>>> +       if (!mem)
> > >>>>> +               return ret;
> > >>>>> +
> > >>>>> +       /*
> > >>>>> +        * Fill half of each page with increasing data, and keep other
> > >>>>> +        * half empty, this will result in data that is still compressible
> > >>>>> +        * and ends up in zswap, with material zswap usage.
> > >>>>> +        */
> > >>>>> +       for (int i = 0; i < pagesize; i++)
> > >>>>> +               buf[i] = i < pagesize/2 ? (char) i : 0;
> > >>>>> +
> > >>>>> +       for (int i = 0; i < memsize; i += pagesize)
> > >>>>> +               memcpy(&mem[i], buf, pagesize);
> > >>>>> +
> > >>>>> +       /* Try and reclaim allocated memory */
> > >>>>> +       if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) {
> > >>>>> +               ksft_print_msg("Failed to reclaim all of the requested memory\n");
> > >>>>> +               goto out;
> > >>>>> +       }
> > >>>>> +
> > >>>>> +       gettimeofday(&start, NULL);
> > >>>>> +       /* zswpin */
> > >>>>> +       for (int i = 0; i < memsize; i += pagesize) {
> > >>>>> +               if (memcmp(&mem[i], buf, pagesize)) {
> > >>>>> +                       ksft_print_msg("invalid memory\n");
> > >>>>> +                       goto out;
> > >>>>> +               }
> > >>>>> +       }
> > >>>>> +       gettimeofday(&end, NULL);
> > >>>>> +       printf("zswapin took %fms to run.\n", (end.tv_sec - start.tv_sec)*1000 + (double)(end.tv_usec - start.tv_usec) / 1000);
> > >>>>> +       ret = 0;
> > >>>>> +out:
> > >>>>> +       free(mem);
> > >>>>> +       return ret;
> > >>>>> +}
> > >>>>> +
> > >>>>> +static int test_zswapin_perf(const char *root)
> > >>>>> +{
> > >>>>> +       int ret = KSFT_FAIL;
> > >>>>> +       char *test_group;
> > >>>>> +
> > >>>>> +       test_group = cg_name(root, "zswapin_perf_test");
> > >>>>> +       if (!test_group)
> > >>>>> +               goto out;
> > >>>>> +       if (cg_create(test_group))
> > >>>>> +               goto out;
> > >>>>> +
> > >>>>> +       if (cg_run(test_group, zswapin_perf, NULL))
> > >>>>> +               goto out;
> > >>>>> +
> > >>>>> +       ret = KSFT_PASS;
> > >>>>> +out:
> > >>>>> +       cg_destroy(test_group);
> > >>>>> +       free(test_group);
> > >>>>> +       return ret;
> > >>>>> +}
> > >>>>> +
> > >>>>>  /*
> > >>>>>   * When trying to store a memcg page in zswap, if the memcg hits its memory
> > >>>>>   * limit in zswap, writeback should affect only the zswapped pages of that
> > >>>>> @@ -584,6 +654,7 @@ struct zswap_test {
> > >>>>>         T(test_zswapin),
> > >>>>>         T(test_zswap_writeback_enabled),
> > >>>>>         T(test_zswap_writeback_disabled),
> > >>>>> +       T(test_zswapin_perf),
> > >>>>>         T(test_no_kmem_bypass),
> > >>>>>         T(test_no_invasive_cgroup_shrink),
> > >>>>>  };
> > >>>>>
> > >>>>> [1] https://lore.kernel.org/all/20241001053222.6944-1-kanchana.p.sridhar@intel.com/
> > >>>>> [2] https://lore.kernel.org/all/20240821074541.516249-1-hanchuanhua@oppo.com/
> > >>>>> [3] https://lore.kernel.org/all/1505886205-9671-5-git-send-email-minchan@kernel.org/T/#u
> > >>>>> [4] https://lwn.net/Articles/955575/
> > >>>>>
> > >>>>> Usama Arif (4):
> > >>>>>   mm/zswap: skip swapcache for swapping in zswap pages
> > >>>>>   mm/zswap: modify zswap_decompress to accept page instead of folio
> > >>>>>   mm/zswap: add support for large folio zswapin
> > >>>>>   mm/zswap: count successful large folio zswap loads
> > >>>>>
> > >>>>>  Documentation/admin-guide/mm/transhuge.rst |   3 +
> > >>>>>  include/linux/huge_mm.h                    |   1 +
> > >>>>>  include/linux/zswap.h                      |   6 ++
> > >>>>>  mm/huge_memory.c                           |   3 +
> > >>>>>  mm/memory.c                                |  16 +--
> > >>>>>  mm/page_io.c                               |   2 +-
> > >>>>>  mm/zswap.c                                 | 120 ++++++++++++++-------
> > >>>>>  7 files changed, 99 insertions(+), 52 deletions(-)
> > >>>>>
> > >>>>> --
> > >>>>> 2.43.5
> > >>>>>
> > >>>>
> >

Thanks
Barry
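
The thrashing pattern debated in this thread (zswpin roughly doubling while major faults barely drop, as in the kernel-build tables above) can be sketched with a toy LRU model. This is an illustrative simulation only, not kernel code: the `simulate` function, the page counts, and the uniform random access pattern are all invented assumptions, and real memcg reclaim behaves very differently.

```python
# Toy model: random touches over a working set larger than the "memcg"
# limit, comparing per-page (4K-like) swap-in against folio-sized
# (mTHP-like) swap-in that charges a whole aligned chunk per fault.
from collections import OrderedDict
import random

def simulate(limit, hot, folio, accesses, seed=7):
    """Return (major_faults, pages_swapped_in). `limit`, `hot`, and
    `folio` are page counts; all values here are made up."""
    rng = random.Random(seed)
    lru = OrderedDict()              # resident pages, coldest first
    major = swapin = 0
    for _ in range(accesses):
        p = rng.randrange(hot)
        if p in lru:
            lru.move_to_end(p)       # hit: refresh recency
            continue
        major += 1                   # miss: one major fault...
        base = (p // folio) * folio
        for q in range(base, base + folio):
            if q not in lru:
                swapin += 1          # ...but `folio` pages of swap-in work
            lru[q] = True
            lru.move_to_end(q)
            while len(lru) > limit:
                lru.popitem(last=False)  # evict coldest page (swap-out)
    return major, swapin

small = simulate(limit=512, hot=2048, folio=1, accesses=5000)
large = simulate(limit=512, hot=2048, folio=256, accesses=5000)
print("4K swap-in: faults=%d swapin=%d" % small)
print("1M swap-in: faults=%d swapin=%d" % large)
```

Under this model the fault count stays in the same ballpark for both folio sizes, but the folio-sized swap-in moves far more pages through (z)swap, which is the shape of the sys-time regression reported above; with sequential access instead of random, the large-folio case wins, matching the 1G microbenchmark.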