From: Barry Song <21cnbao@gmail.com>
Date: Mon, 21 Oct 2024 18:09:06 +1300
Subject: Re: [RFC 0/4] mm: zswap: add support for zswapin of large folios
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    david@redhat.com, willy@infradead.org, kanchana.p.sridhar@intel.com,
    yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
    ryan.roberts@arm.com, ying.huang@intel.com, riel@surriel.com,
    shakeel.butt@linux.dev, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
References: <20241018105026.2521366-1-usamaarif642@gmail.com>
In-Reply-To: <20241018105026.2521366-1-usamaarif642@gmail.com>

On Fri, Oct 18, 2024 at 11:50 PM Usama Arif wrote:
>
> After large folio zswapout support added in [1], this patch adds
> support for zswapin of large folios to bring it on par with zram.
> This series makes sure that the benefits of large folios (fewer
> page faults, batched PTE and rmap manipulation, reduced lru list,
> TLB coalescing (for arm64 and amd)) are not lost at swap out when
> using zswap.
>
> It builds on top of [2] which added large folio swapin support for
> zram and provides the same level of large folio swapin support as
> zram, i.e. only supporting swap count == 1.
>
> Patch 1 skips swapcache for swapping in zswap pages, this should improve
> no readahead swapin performance [3], and also allows us to build on large
> folio swapin support added in [2], hence is a prerequisite for patch 3.
>
> Patch 3 adds support for large folio zswapin. This patch does not add
> support for hybrid backends (i.e. folios partly present swap and zswap).
>
> The main performance benefit comes from maintaining large folios *after*
> swapin, large folio performance improvements have been mentioned in previous
> series posted on it [2],[4], so have not added those. Below is a simple
> microbenchmark to measure the time needed *for* zswpin of 1G memory (along
> with memory integrity check).
>
>                                 | no mTHP (ms) | 1M mTHP enabled (ms)
> Base kernel                     |   1165       |    1163
> Kernel with mTHP zswpin series  |   1203       |     738

Hi Usama,

Do you know where this minor regression in the non-mTHP case comes from?
Since patch 1 already skips the swapcache for small folios in zswap, I
would have expected that path to show some gain. Is it because of
zswap_present_test()?

>
> The time measured was pretty consistent between runs (~1-2% variation).
> There is 36% improvement in zswapin time with 1M folios. The percentage
> improvement is likely to be more if the memcmp is removed.
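For scale, here is a minimal sketch of the arithmetic behind the table
quoted above. The helper name pct_change() is hypothetical and not part
of the series; the only inputs are the four timings from the table.

```c
#include <assert.h>

/*
 * Relative change of new_ms against base_ms, in percent.
 * Positive means slower (a regression), negative means faster.
 * pct_change() is a hypothetical helper for illustration only.
 */
double pct_change(double base_ms, double new_ms)
{
	assert(base_ms > 0.0);
	return (new_ms - base_ms) / base_ms * 100.0;
}
```

With the quoted numbers, pct_change(1165, 1203) comes to roughly +3.3%
for the no-mTHP column and pct_change(1163, 738) to roughly -36.5% for
the 1M mTHP column, which agrees with the 36% figure. The no-mTHP delta
sits above the ~1-2% run-to-run variation reported, so it is probably
not just noise.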
>
> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> index 40de679248b8..77068c577c86 100644
> --- a/tools/testing/selftests/cgroup/test_zswap.c
> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> @@ -9,6 +9,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include "../kselftest.h"
>  #include "cgroup_util.h"
> @@ -407,6 +409,74 @@ static int test_zswap_writeback_disabled(const char *root)
>  	return test_zswap_writeback(root, false);
>  }
>
> +static int zswapin_perf(const char *cgroup, void *arg)
> +{
> +	long pagesize = sysconf(_SC_PAGESIZE);
> +	size_t memsize = MB(1*1024);
> +	char buf[pagesize];
> +	int ret = -1;
> +	char *mem;
> +	struct timeval start, end;
> +
> +	mem = (char *)memalign(2*1024*1024, memsize);
> +	if (!mem)
> +		return ret;
> +
> +	/*
> +	 * Fill half of each page with increasing data, and keep other
> +	 * half empty, this will result in data that is still compressible
> +	 * and ends up in zswap, with material zswap usage.
> +	 */
> +	for (int i = 0; i < pagesize; i++)
> +		buf[i] = i < pagesize/2 ? (char) i : 0;
> +
> +	for (int i = 0; i < memsize; i += pagesize)
> +		memcpy(&mem[i], buf, pagesize);
> +
> +	/* Try and reclaim allocated memory */
> +	if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) {
> +		ksft_print_msg("Failed to reclaim all of the requested memory\n");
> +		goto out;
> +	}
> +
> +	gettimeofday(&start, NULL);
> +	/* zswpin */
> +	for (int i = 0; i < memsize; i += pagesize) {
> +		if (memcmp(&mem[i], buf, pagesize)) {
> +			ksft_print_msg("invalid memory\n");
> +			goto out;
> +		}
> +	}
> +	gettimeofday(&end, NULL);
> +	printf ("zswapin took %fms to run.\n", (end.tv_sec - start.tv_sec)*1000 + (double)(end.tv_usec - start.tv_usec) / 1000);
> +	ret = 0;
> +out:
> +	free(mem);
> +	return ret;
> +}
> +
> +static int test_zswapin_perf(const char *root)
> +{
> +	int ret = KSFT_FAIL;
> +	char *test_group;
> +
> +	test_group = cg_name(root, "zswapin_perf_test");
> +	if (!test_group)
> +		goto out;
> +	if (cg_create(test_group))
> +		goto out;
> +
> +	if (cg_run(test_group, zswapin_perf, NULL))
> +		goto out;
> +
> +	ret = KSFT_PASS;
> +out:
> +	cg_destroy(test_group);
> +	free(test_group);
> +	return ret;
> +}
> +
>  /*
>   * When trying to store a memcg page in zswap, if the memcg hits its memory
>   * limit in zswap, writeback should affect only the zswapped pages of that
> @@ -584,6 +654,7 @@ struct zswap_test {
>  	T(test_zswapin),
>  	T(test_zswap_writeback_enabled),
>  	T(test_zswap_writeback_disabled),
> +	T(test_zswapin_perf),
>  	T(test_no_kmem_bypass),
>  	T(test_no_invasive_cgroup_shrink),
>  };
>
> [1] https://lore.kernel.org/all/20241001053222.6944-1-kanchana.p.sridhar@intel.com/
> [2] https://lore.kernel.org/all/20240821074541.516249-1-hanchuanhua@oppo.com/
> [3] https://lore.kernel.org/all/1505886205-9671-5-git-send-email-minchan@kernel.org/T/#u
> [4] https://lwn.net/Articles/955575/
>
> Usama Arif (4):
>   mm/zswap: skip swapcache for swapping in zswap pages
>   mm/zswap: modify zswap_decompress to accept page instead of folio
>   mm/zswap: add support for large folio zswapin
>   mm/zswap: count successful large folio zswap loads
>
>  Documentation/admin-guide/mm/transhuge.rst |   3 +
>  include/linux/huge_mm.h                    |   1 +
>  include/linux/zswap.h                      |   6 ++
>  mm/huge_memory.c                           |   3 +
>  mm/memory.c                                |  16 +--
>  mm/page_io.c                               |   2 +-
>  mm/zswap.c                                 | 120 ++++++++++++++-------
>  7 files changed, 99 insertions(+), 52 deletions(-)
>
> --
> 2.43.5
>

Thanks
barry
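
A note on the memcmp remark in the quoted cover letter: one possible way
to keep the compare out of the timed region is to read a single byte
from each page, which should still force the swapin fault. The sketch
below is only an illustration under that assumption. touch_pages() and
elapsed_ms() are hypothetical names that do not appear in the posted
patch; elapsed_ms() uses the same conversion as the printf in the quoted
selftest.

```c
#include <stddef.h>
#include <sys/time.h>

/* Milliseconds between two gettimeofday() samples (hypothetical helper). */
double elapsed_ms(const struct timeval *start, const struct timeval *end)
{
	return (end->tv_sec - start->tv_sec) * 1000 +
	       (double)(end->tv_usec - start->tv_usec) / 1000;
}

/*
 * Read the first byte of every page in [mem, mem + memsize).
 * The volatile access keeps the compiler from dropping the loads, so each
 * page is still touched, but no per-page memcmp() is part of the loop.
 * Returns the sum of the bytes read so the caller can sanity check the
 * data; returns 0 if pagesize is 0 to avoid looping forever.
 */
unsigned long touch_pages(const char *mem, size_t memsize, size_t pagesize)
{
	const volatile unsigned char *p = (const volatile unsigned char *)mem;
	unsigned long sum = 0;

	if (pagesize == 0)
		return 0;

	for (size_t i = 0; i < memsize; i += pagesize)
		sum += p[i];
	return sum;
}
```

In the selftest, the timed loop would then be a touch_pages() call
between the two gettimeofday() samples, with the memcmp integrity check
run afterwards and untimed. This measures fault and decompression cost
only, so the result is not directly comparable with the numbers in the
table above.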