From: Chris Li
Date: Thu, 20 Jun 2024 16:34:16 -0700
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com, david@redhat.com, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, ying.huang@intel.com, linux-kselftest@vger.kernel.org, Barry Song
In-Reply-To: <20240620002648.75204-1-21cnbao@gmail.com>

Hi Barry,

Thanks for the wonderful test program. I have used other swap test
programs as well, but a lot of those tests are harder to set up and
run. This test is very quick and simple to run, and it can exercise
some hard-to-hit corner cases for me. I am able to reproduce the
warning and the kernel oops with this test program. So for me, I am
using it as a functional test to check that my allocator does not
produce a crash.
In that regard, it definitely provides value as a functional test.
Having a fallback percentage output is fine, as long as we don't fail
the test based on the performance numbers.

I am also fine with moving the test under tools/mm or similar. Either
way, I see good value in including the test in the tree.

On Wed, Jun 19, 2024 at 5:27 PM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Barry Song
>
> Both Ryan and Chris have been utilizing the small test program to aid
> in debugging and identifying issues with swap entry allocation. While
> a real or intricate workload might be more suitable for assessing the
> correctness and effectiveness of the swap allocation policy, a small
> test program presents a simpler means of understanding the problem and
> initially verifying the improvements being made.
>
> Let's endeavor to integrate it into the self-test suite. Although it
> presently only accommodates 64KB and 4KB, I'm optimistic that we can
> expand its capabilities to support multiple sizes and simulate more
> complex systems in the future as required.
>
> Signed-off-by: Barry Song
> ---
>  tools/testing/selftests/mm/Makefile           |   1 +
>  .../selftests/mm/thp_swap_allocator_test.c    | 192 ++++++++++++++++++
>  2 files changed, 193 insertions(+)

Assuming we want to keep it as a selftest, note that you did not add
your test to run_vmtests.sh. You might need something like this:

--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -418,6 +418,14 @@ CATEGORY="thp" run_test ./khugepaged -s 2

 CATEGORY="thp" run_test ./transhuge-stress -d 20

+# config and swapon zram here.
+
+CATEGORY="thp" run_test ./thp_swap_allocator_test
+
+CATEGORY="thp" run_test ./thp_swap_allocator_test -s
+
+# swapoff zram here.
+
 # Try to create XFS if not provided
 if [ -z "${SPLIT_HUGE_PAGE_TEST_XFS_PATH}" ]; then
     if test_selected "thp"; then

You can use the following XFS test as an example of how to set up the
zram swap.
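For illustration, the zram part could look roughly like the sketch below. It is only a sketch, not tested in run_vmtests.sh. The helper names setup_zram_swap, teardown_zram_swap and thp_enabled_value are made up here. It assumes that it runs as root, that zram is available (module or built in), that /dev/zram0 is not already in use (for example by a distro zram swap service), and that the lzo compressor exists. The sysfs writes are the ones from the test's header comment. The teardown additionally swaps off, resets the device, and restores the previous THP settings.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for run_vmtests.sh. Must run as root;
# assumes /dev/zram0 is free and the "lzo" compressor is available.

THP_SYSFS=/sys/kernel/mm/transparent_hugepage
ZRAM_DEV=zram0

# Print the selected value of a THP "enabled" file, for example
# "always inherit [madvise] never" -> "madvise".
thp_enabled_value() {
	sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

setup_zram_swap() {
	# Save the current THP policy so teardown_zram_swap can restore it.
	saved_thp_2m=$(thp_enabled_value "$THP_SYSFS/hugepages-2048kB/enabled")
	saved_thp_64k=$(thp_enabled_value "$THP_SYSFS/hugepages-64kB/enabled")

	modprobe zram || return 1
	# The compressor must be chosen before disksize initializes the device.
	echo lzo > "/sys/block/$ZRAM_DEV/comp_algorithm" || return 1
	echo 64M > "/sys/block/$ZRAM_DEV/disksize" || return 1
	echo never > "$THP_SYSFS/hugepages-2048kB/enabled"
	echo always > "$THP_SYSFS/hugepages-64kB/enabled"
	mkswap "/dev/$ZRAM_DEV" > /dev/null && swapon "/dev/$ZRAM_DEV"
}

teardown_zram_swap() {
	swapoff "/dev/$ZRAM_DEV"
	# Writing 1 to "reset" releases the device's memory and disksize.
	echo 1 > "/sys/block/$ZRAM_DEV/reset"
	echo "$saved_thp_2m" > "$THP_SYSFS/hugepages-2048kB/enabled"
	echo "$saved_thp_64k" > "$THP_SYSFS/hugepages-64kB/enabled"
}
```

The two thp_swap_allocator_test runs would then sit between setup_zram_swap and teardown_zram_swap. That block is probably best guarded by test_selected "thp", so that other categories never touch swap.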
XFS uses a file system mount, whereas you would use swapon. Also, you
need to update the usage string in run_vmtests.sh.

BTW, here is how I invoke the test runs:

kselftest_override_timeout=500 make -C tools/testing/selftests TARGETS=mm run_tests

The timeout is not for this test. It is for some other test that runs
before thp_swap and would otherwise time out, making run_vmtests.sh
exit before it reaches thp_swap. I am running in a VM, so it is slower
than a native machine.

>  create mode 100644 tools/testing/selftests/mm/thp_swap_allocator_test.c
>
> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> index e1aa09ddaa3d..64164ad66835 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -65,6 +65,7 @@ TEST_GEN_FILES += mseal_test
>  TEST_GEN_FILES += seal_elf
>  TEST_GEN_FILES += on-fault-limit
>  TEST_GEN_FILES += pagemap_ioctl
> +TEST_GEN_FILES += thp_swap_allocator_test
>  TEST_GEN_FILES += thuge-gen
>  TEST_GEN_FILES += transhuge-stress
>  TEST_GEN_FILES += uffd-stress
> diff --git a/tools/testing/selftests/mm/thp_swap_allocator_test.c b/tools/testing/selftests/mm/thp_swap_allocator_test.c
> new file mode 100644
> index 000000000000..4443a906d0f8
> --- /dev/null
> +++ b/tools/testing/selftests/mm/thp_swap_allocator_test.c
> @@ -0,0 +1,192 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * thp_swap_allocator_test
> + *
> + * The purpose of this test program is helping check if THP swpout
> + * can correctly get swap slots to swap out as a whole instead of
> + * being split. It randomly releases swap entries through madvise
> + * DONTNEED and do swapout on two memory areas: a memory area for
> + * 64KB THP and the other area for small folios. The second memory
> + * can be enabled by "-s".
> + * Before running the program, we need to setup a zRAM or similar
> + * swap device by:
> + * echo lzo > /sys/block/zram0/comp_algorithm
> + * echo 64M > /sys/block/zram0/disksize
> + * echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> + * echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
> + * mkswap /dev/zram0
> + * swapon /dev/zram0

This setup needs to go into run_vmtests.sh as well. Also, tear it down
after the test.

Chris

> + * The expected result should be 0% anon swpout fallback ratio w/ or
> + * w/o "-s".
> + *
> + * Author(s): Barry Song
> + */
> +
> +#define _GNU_SOURCE
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define MEMSIZE_MTHP (60 * 1024 * 1024)
> +#define MEMSIZE_SMALLFOLIO (1 * 1024 * 1024)
> +#define ALIGNMENT_MTHP (64 * 1024)
> +#define ALIGNMENT_SMALLFOLIO (4 * 1024)
> +#define TOTAL_DONTNEED_MTHP (16 * 1024 * 1024)
> +#define TOTAL_DONTNEED_SMALLFOLIO (768 * 1024)
> +#define MTHP_FOLIO_SIZE (64 * 1024)
> +
> +#define SWPOUT_PATH \
> +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
> +#define SWPOUT_FALLBACK_PATH \
> +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"
> +
> +static void *aligned_alloc_mem(size_t size, size_t alignment)
> +{
> +	void *mem = NULL;
> +
> +	if (posix_memalign(&mem, alignment, size) != 0) {
> +		perror("posix_memalign");
> +		return NULL;
> +	}
> +	return mem;
> +}
> +
> +static void random_madvise_dontneed(void *mem, size_t mem_size,
> +		size_t align_size, size_t total_dontneed_size)
> +{
> +	size_t num_pages = total_dontneed_size / align_size;
> +	size_t i;
> +	size_t offset;
> +	void *addr;
> +
> +	for (i = 0; i < num_pages; ++i) {
> +		offset = (rand() % (mem_size / align_size)) * align_size;
> +		addr = (char *)mem + offset;
> +		if (madvise(addr, align_size, MADV_DONTNEED) != 0)
> +			perror("madvise dontneed");
> +
> +		memset(addr, 0x11, align_size);
> +	}
> +}
> +
> +static unsigned long read_stat(const char *path)
> +{
> +	FILE *file;
> +	unsigned long value;
> +
> +	file = fopen(path, "r");
> +	if (!file) {
> +		perror("fopen");
> +		return 0;
> +	}
> +
> +	if (fscanf(file, "%lu", &value) != 1) {
> +		perror("fscanf");
> +		fclose(file);
> +		return 0;
> +	}
> +
> +	fclose(file);
> +	return value;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int use_small_folio = 0;
> +	int i;
> +	void *mem1 = aligned_alloc_mem(MEMSIZE_MTHP, ALIGNMENT_MTHP);
> +	void *mem2 = NULL;
> +
> +	if (mem1 == NULL) {
> +		fprintf(stderr, "Failed to allocate 60MB memory\n");
> +		return EXIT_FAILURE;
> +	}
> +
> +	if (madvise(mem1, MEMSIZE_MTHP, MADV_HUGEPAGE) != 0) {
> +		perror("madvise hugepage for mem1");
> +		free(mem1);
> +		return EXIT_FAILURE;
> +	}
> +
> +	for (i = 1; i < argc; ++i) {
> +		if (strcmp(argv[i], "-s") == 0)
> +			use_small_folio = 1;
> +	}
> +
> +	if (use_small_folio) {
> +		mem2 = aligned_alloc_mem(MEMSIZE_SMALLFOLIO, ALIGNMENT_MTHP);
> +		if (mem2 == NULL) {
> +			fprintf(stderr, "Failed to allocate 1MB memory\n");
> +			free(mem1);
> +			return EXIT_FAILURE;
> +		}
> +
> +		if (madvise(mem2, MEMSIZE_SMALLFOLIO, MADV_NOHUGEPAGE) != 0) {
> +			perror("madvise nohugepage for mem2");
> +			free(mem1);
> +			free(mem2);
> +			return EXIT_FAILURE;
> +		}
> +	}
> +
> +	for (i = 0; i < 100; ++i) {
> +		unsigned long initial_swpout;
> +		unsigned long initial_swpout_fallback;
> +		unsigned long final_swpout;
> +		unsigned long final_swpout_fallback;
> +		unsigned long swpout_inc;
> +		unsigned long swpout_fallback_inc;
> +		double fallback_percentage;
> +
> +		initial_swpout = read_stat(SWPOUT_PATH);
> +		initial_swpout_fallback = read_stat(SWPOUT_FALLBACK_PATH);
> +
> +		random_madvise_dontneed(mem1, MEMSIZE_MTHP, ALIGNMENT_MTHP,
> +				TOTAL_DONTNEED_MTHP);
> +
> +		if (use_small_folio) {
> +			random_madvise_dontneed(mem2, MEMSIZE_SMALLFOLIO,
> +					ALIGNMENT_SMALLFOLIO,
> +					TOTAL_DONTNEED_SMALLFOLIO);
> +		}
> +
> +		if (madvise(mem1, MEMSIZE_MTHP, MADV_PAGEOUT) != 0) {
> +			perror("madvise pageout for mem1");
> +			free(mem1);
> +			if (mem2 != NULL)
> +				free(mem2);
> +			return EXIT_FAILURE;
> +		}
> +
> +		if (use_small_folio) {
> +			if (madvise(mem2, MEMSIZE_SMALLFOLIO, MADV_PAGEOUT) != 0) {
> +				perror("madvise pageout for mem2");
> +				free(mem1);
> +				free(mem2);
> +				return EXIT_FAILURE;
> +			}
> +		}
> +
> +		final_swpout = read_stat(SWPOUT_PATH);
> +		final_swpout_fallback = read_stat(SWPOUT_FALLBACK_PATH);
> +
> +		swpout_inc = final_swpout - initial_swpout;
> +		swpout_fallback_inc = final_swpout_fallback - initial_swpout_fallback;
> +
> +		fallback_percentage = (double)swpout_fallback_inc /
> +			(swpout_fallback_inc + swpout_inc) * 100;
> +
> +		printf("Iteration %d: swpout inc: %lu, swpout fallback inc: %lu, Fallback percentage: %.2f%%\n",
> +			i + 1, swpout_inc, swpout_fallback_inc, fallback_percentage);
> +	}
> +
> +	free(mem1);
> +	if (mem2 != NULL)
> +		free(mem2);
> +
> +	return EXIT_SUCCESS;
> +}
> --
> 2.34.1
>
>