References: <20240620002648.75204-1-21cnbao@gmail.com> <87zfrg2xce.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o77w2nrw.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87o77w2nrw.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Barry Song <21cnbao@gmail.com>
Date: Thu, 20 Jun 2024 18:09:45 +1200
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com, chrisl@kernel.org, david@redhat.com, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Barry Song

On Thu, Jun 20, 2024 at 5:22 PM Huang, Ying wrote:
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > On Thu, Jun 20, 2024 at 1:55 PM Huang, Ying wrote:
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >> > From: Barry Song
> >> >
> >> > Both Ryan and Chris have been utilizing the small test program to aid
> >> > in debugging and identifying issues with swap entry allocation. While
> >> > a real or intricate workload might be more suitable for assessing the
> >> > correctness and effectiveness of the swap allocation policy, a small
> >> > test program presents a simpler means of understanding the problem and
> >> > initially verifying the improvements being made.
> >> >
> >> > Let's endeavor to integrate it into the self-test suite. Although it
> >> > presently only accommodates 64KB and 4KB, I'm optimistic that we can
> >> > expand its capabilities to support multiple sizes and simulate more
> >> > complex systems in the future as required.
> >>
> >> IIUC, this is a performance test program instead of functionality test
> >> program. Does it match the purpose of the kernel selftest?
> >
> > I have a differing perspective. I maintain that the functionality is
> > not functioning
> > as expected. Despite having all the necessary resources for allocation, failure
> > persists, indicating a lack of functionality.
>
> Is there any user visual functionality issue?

Definitely not. If a plane can't take off, taking a train and pretending
there's no functionality issue isn't a solution.

I have never assigned blame for any mistakes here. On the contrary, I have
100% appreciation for Ryan's work in at least initiating mTHP swapout w/o
being split. It took countless experiments for humans to make airplanes
commercially viable, but the person who created the first flying airplane
remains the greatest. Similarly, Ryan's efforts, combined with your review
of his patch, have enabled us to achieve a better goal here. Without your
work, we can't get here at all.

However, this is never a reason to refuse to acknowledge that this feature
is not actually working.
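
To make "not working" measurable, here is a minimal sketch of how the
fallback ratio named in the test's header comment could be derived from
the two per-size counters (the SWPOUT_PATH and SWPOUT_FALLBACK_PATH files
in the quoted patch). It is not the code from the patch: the helper names
read_counter() and fallback_ratio_percent() are hypothetical, and it
assumes the ratio is the number of fallbacks divided by the sum of
whole-folio swap-outs and fallbacks.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned counter from a sysfs-style
 * text file. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with a number.
 */
static int read_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: percentage of swap-out attempts that had to
 * split the folio first. Assumes "attempts" is swpout plus
 * swpout_fallback. Returns 0 when there were no attempts at all.
 */
static double fallback_ratio_percent(unsigned long swpout,
				     unsigned long swpout_fallback)
{
	unsigned long total = swpout + swpout_fallback;

	if (total == 0)
		return 0.0;
	return 100.0 * (double)swpout_fallback / (double)total;
}
```

In practice one would probably sample both counters before and after the
swap-out phase and pass the differences, so that earlier activity on the
system does not skew the percentage.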
>
> >> >
> >> > Signed-off-by: Barry Song
> >> > ---
> >> >  tools/testing/selftests/mm/Makefile           |   1 +
> >> >  .../selftests/mm/thp_swap_allocator_test.c    | 192 ++++++++++++++++++
> >> >  2 files changed, 193 insertions(+)
> >> >  create mode 100644 tools/testing/selftests/mm/thp_swap_allocator_test.c
> >> >
> >> > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> >> > index e1aa09ddaa3d..64164ad66835 100644
> >> > --- a/tools/testing/selftests/mm/Makefile
> >> > +++ b/tools/testing/selftests/mm/Makefile
> >> > @@ -65,6 +65,7 @@ TEST_GEN_FILES += mseal_test
> >> >  TEST_GEN_FILES += seal_elf
> >> >  TEST_GEN_FILES += on-fault-limit
> >> >  TEST_GEN_FILES += pagemap_ioctl
> >> > +TEST_GEN_FILES += thp_swap_allocator_test
> >> >  TEST_GEN_FILES += thuge-gen
> >> >  TEST_GEN_FILES += transhuge-stress
> >> >  TEST_GEN_FILES += uffd-stress
> >> > diff --git a/tools/testing/selftests/mm/thp_swap_allocator_test.c b/tools/testing/selftests/mm/thp_swap_allocator_test.c
> >> > new file mode 100644
> >> > index 000000000000..4443a906d0f8
> >> > --- /dev/null
> >> > +++ b/tools/testing/selftests/mm/thp_swap_allocator_test.c
> >> > @@ -0,0 +1,192 @@
> >> > +// SPDX-License-Identifier: GPL-2.0-or-later
> >> > +/*
> >> > + * thp_swap_allocator_test
> >> > + *
> >> > + * The purpose of this test program is helping check if THP swpout
> >> > + * can correctly get swap slots to swap out as a whole instead of
> >> > + * being split. It randomly releases swap entries through madvise
> >> > + * DONTNEED and do swapout on two memory areas: a memory area for
> >> > + * 64KB THP and the other area for small folios. The second memory
> >> > + * can be enabled by "-s".
> >> > + * Before running the program, we need to setup a zRAM or similar
> >> > + * swap device by:
> >> > + *  echo lzo > /sys/block/zram0/comp_algorithm
> >> > + *  echo 64M > /sys/block/zram0/disksize
> >> > + *  echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> >> > + *  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
> >> > + *  mkswap /dev/zram0
> >> > + *  swapon /dev/zram0
> >> > + * The expected result should be 0% anon swpout fallback ratio w/ or
> >> > + * w/o "-s".
> >> > + *
> >> > + * Author(s): Barry Song
> >> > + */
> >> > +
> >> > +#define _GNU_SOURCE
> >> > +#include
> >> > +#include
> >> > +#include
> >> > +#include
> >> > +#include
> >> > +#include
> >> > +#include
> >> > +
> >> > +#define MEMSIZE_MTHP (60 * 1024 * 1024)
> >> > +#define MEMSIZE_SMALLFOLIO (1 * 1024 * 1024)
> >> > +#define ALIGNMENT_MTHP (64 * 1024)
> >> > +#define ALIGNMENT_SMALLFOLIO (4 * 1024)
> >> > +#define TOTAL_DONTNEED_MTHP (16 * 1024 * 1024)
> >> > +#define TOTAL_DONTNEED_SMALLFOLIO (768 * 1024)
> >> > +#define MTHP_FOLIO_SIZE (64 * 1024)
> >> > +
> >> > +#define SWPOUT_PATH \
> >> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
> >> > +#define SWPOUT_FALLBACK_PATH \
> >> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"
> >> > +
> >> > +static void *aligned_alloc_mem(size_t size, size_t alignment)
> >> > +{
> >> > +	void *mem = NULL;
> >> > +
> >> > +	if (posix_memalign(&mem, alignment, size) != 0) {
> >> > +		perror("posix_memalign");
> >> > +		return NULL;
> >> > +	}
> >> > +	return mem;
> >> > +}
> >> > +
> >> > +static void random_madvise_dontneed(void *mem, size_t mem_size,
> >> > +		size_t align_size, size_t total_dontneed_size)
> >> > +{
> >> > +	size_t num_pages = total_dontneed_size / align_size;
> >> > +	size_t i;
> >> > +	size_t offset;
> >> > +	void *addr;
> >> > +
> >> > +	for (i = 0; i < num_pages; ++i) {
> >> > +		offset = (rand() % (mem_size / align_size)) * align_size;
> >> > +		addr = (char *)mem + offset;
> >> > +		if (madvise(addr, align_size, MADV_DONTNEED) != 0)
> >> > +			perror("madvise dontneed");
> >>
> >> IIUC, this simulates align_size (generally 64KB) swap-in. That is, it
> >> simulate the effect of large size swap-in when it's not available in
> >> kernel. If we have large size swap-in in kernel in the future, this
> >> becomes unnecessary.
> >>
> >> Additionally, we have not reached the consensus that we should always
> >> swap-in with swapped-out size. So, I suspect that this test may not
> >> reflect real situation in the future. Although it doesn't reflect
> >> current situation too.
> >
> > Disagree again. releasing the whole mTHP swaps is the best case. Even in
> > the best-case scenario, if we fail, it raises concerns for handling potentially
> > more challenging situations.
>
> Repeating sequential anonymous pages writing is the best case.

I define the best case as the scenario with the least chance of creating
fragments within swapfiles for mTHP to swap out. There is no real difference
whether this is done through swapin or madv_dontneed.

>
> > I don't find it hard to incorporate additional features into this test
> > program to simulate more intricate scenarios.
>
> IMHO, we don't really need this special purpose test. We can have some
> more general basic tests, for example, sequential anonymous pages
> writing/reading, random anonymous pages writing/reading, and combination
> of them.

I understand that not all things will be loved by all people. However, before
I sent this patch, Chris mentioned that it has been very helpful for him and
strongly suggested that I contribute it to the self-test suite.

By the way, adding sequential and random anonymous pages for read/write
operations is definitely in my plan. The absence of this feature isn't a
convincing reason to disregard it.
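
As a rough illustration of the sequential and random write patterns
mentioned above, the two access loops could look like the sketch below.
This is only an assumption about the shape such an extension might take,
not code from the patch: write_pages_sequential() and
write_pages_random() are hypothetical names, and the sketch only dirties
memory. Triggering the actual swap-out (for example with madvise() and
MADV_PAGEOUT) is left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: dirty every page_size chunk of the buffer in
 * address order. Returns the number of chunks written.
 */
static size_t write_pages_sequential(char *mem, size_t mem_size,
				     size_t page_size, int fill)
{
	size_t off, touched = 0;

	for (off = 0; off + page_size <= mem_size; off += page_size) {
		memset(mem + off, fill, page_size);
		touched++;
	}
	return touched;
}

/*
 * Hypothetical helper: dirty nr_writes chunks chosen with rand().
 * The same chunk may be picked more than once. Returns the number
 * of writes performed.
 */
static size_t write_pages_random(char *mem, size_t mem_size,
				 size_t page_size, size_t nr_writes, int fill)
{
	size_t nr_pages = mem_size / page_size;
	size_t i;

	if (nr_pages == 0)
		return 0;
	for (i = 0; i < nr_writes; i++) {
		size_t off = ((size_t)rand() % nr_pages) * page_size;

		memset(mem + off, fill, page_size);
	}
	return nr_writes;
}
```

Passing the folio size (64KB) or the base page size (4KB) as page_size
would mirror the two alignments the existing test already uses.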
>
> --
> Best Regards,
> Huang, Ying
>
> >>
> >> > +
> >> > +		memset(addr, 0x11, align_size);
> >> > +	}
> >> > +}
> >> > +
> >>
> >> [snip]
> >>
> >> --
> >> Best Regards,
> >> Huang, Ying

Thanks
Barry