Message-ID: <0c183228-44d0-4f77-842a-d6c0bcf37fb1@arm.com>
Date: Fri, 5 Jul 2024 10:31:08 +0100
Subject: Re: [PATCH v2 1/1] tools/mm: Introduce a tool to assess swap entry allocation for thp_swapout
To: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, chrisl@kernel.org, linux-mm@kvack.org
Cc: david@redhat.com, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, v-songbaohua@oppo.com, ying.huang@intel.com
References: <20240622071231.576056-1-21cnbao@gmail.com> <20240622071231.576056-2-21cnbao@gmail.com>
From: Ryan Roberts
In-Reply-To: <20240622071231.576056-2-21cnbao@gmail.com>

On 22/06/2024 08:12, Barry Song wrote:
> From: Barry Song
>
> Both Ryan and Chris have been utilizing the small test program to aid
> in debugging and identifying issues with swap entry allocation. While
> a real or intricate workload might be more suitable for assessing the
> correctness and effectiveness of the swap allocation policy, a small
> test program presents a simpler means of understanding the problem and
> initially verifying the improvements being made.
>
> Let's endeavor to integrate it into tools/mm. Although it presently
> only accommodates 64KB and 4KB, I'm optimistic that we can expand
> its capabilities to support multiple sizes and simulate more
> complex systems in the future as required.
>
> Basically, we have
> 1. Use MADV_PAGEOUT for rapid swap-out, putting the swap allocation code
>    under high exercise in a short time.
> 2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in
>    freeing memory, as well as for munmap, app exits, or OOM killer scenarios.
>    This ensures new mTHP is always generated, released or swapped out, similar
>    to the behavior on a PC or Android phone where many applications are
>    frequently started and terminated.
> 3. Swap in with or without the "-a" option to observe how fragments
>    due to swap-in and the incoming swap-in of large folios will impact
>    swap-out fallback.
>
> Due to 2, we ensure a certain proportion of mTHP. Similarly, because
> of 3, we maintain a certain proportion of small folios, as we don't
> support large folios swap-in, meaning any swap-in will immediately
> result in small folios. Therefore, with both 2 and 3, we automatically
> achieve a system containing both mTHP and small folios. Additionally,
> 1 provides the ability to continuously swap them out.
>
> We can also use "-s" to add a dedicated small folios memory area.
>
> Signed-off-by: Barry Song

I note there is an open thread about a compilation failure due to a missing
header include with a specific toolchain. But once cleared up:

Reviewed-by: Ryan Roberts

I didn't hit the compile issue so:

Tested-by: Ryan Roberts

> ---
>  tools/mm/Makefile                  |   2 +-
>  tools/mm/thp_swap_allocator_test.c | 233 +++++++++++++++++++++++++++++
>  2 files changed, 234 insertions(+), 1 deletion(-)
>  create mode 100644 tools/mm/thp_swap_allocator_test.c
>
> diff --git a/tools/mm/Makefile b/tools/mm/Makefile
> index 7bb03606b9ea..15791c1c5b28 100644
> --- a/tools/mm/Makefile
> +++ b/tools/mm/Makefile
> @@ -3,7 +3,7 @@
>  #
>  include ../scripts/Makefile.include
>
> -BUILD_TARGETS=page-types slabinfo page_owner_sort
> +BUILD_TARGETS=page-types slabinfo page_owner_sort thp_swap_allocator_test
>  INSTALL_TARGETS = $(BUILD_TARGETS) thpmaps
>
>  LIB_DIR = ../lib/api
> diff --git a/tools/mm/thp_swap_allocator_test.c b/tools/mm/thp_swap_allocator_test.c
> new file mode 100644
> index 000000000000..a363bdde55f0
> --- /dev/null
> +++ b/tools/mm/thp_swap_allocator_test.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * thp_swap_allocator_test
> + *
> + * The purpose of this test program is helping check if THP swpout
> + * can correctly get swap slots to swap out as a whole instead of
> + * being split. It randomly releases swap entries through madvise
> + * DONTNEED and swapin/out on two memory areas: a memory area for
> + * 64KB THP and the other area for small folios. The second memory
> + * can be enabled by "-s".
> + * Before running the program, we need to setup a zRAM or similar
> + * swap device by:
> + *  echo lzo > /sys/block/zram0/comp_algorithm
> + *  echo 64M > /sys/block/zram0/disksize
> + *  echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> + *  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
> + *  mkswap /dev/zram0
> + *  swapon /dev/zram0
> + * The expected result should be 0% anon swpout fallback ratio w/ or
> + * w/o "-s".
> + *
> + * Author(s): Barry Song
> + */
> +
> +#define _GNU_SOURCE
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define MEMSIZE_MTHP (60 * 1024 * 1024)
> +#define MEMSIZE_SMALLFOLIO (4 * 1024 * 1024)
> +#define ALIGNMENT_MTHP (64 * 1024)
> +#define ALIGNMENT_SMALLFOLIO (4 * 1024)
> +#define TOTAL_DONTNEED_MTHP (16 * 1024 * 1024)
> +#define TOTAL_DONTNEED_SMALLFOLIO (1 * 1024 * 1024)
> +#define MTHP_FOLIO_SIZE (64 * 1024)
> +
> +#define SWPOUT_PATH \
> +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
> +#define SWPOUT_FALLBACK_PATH \
> +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"
> +
> +static void *aligned_alloc_mem(size_t size, size_t alignment)
> +{
> +	void *mem = NULL;
> +
> +	if (posix_memalign(&mem, alignment, size) != 0) {
> +		perror("posix_memalign");
> +		return NULL;
> +	}
> +	return mem;
> +}
> +
> +/*
> + * This emulates the behavior of native libc and Java heap,
> + * as well as process exit and munmap. It helps generate mTHP
> + * and ensures that iterations can proceed with mTHP, as we
> + * currently don't support large folios swap-in.
> + */
> +static void random_madvise_dontneed(void *mem, size_t mem_size,
> +		size_t align_size, size_t total_dontneed_size)
> +{
> +	size_t num_pages = total_dontneed_size / align_size;
> +	size_t i;
> +	size_t offset;
> +	void *addr;
> +
> +	for (i = 0; i < num_pages; ++i) {
> +		offset = (rand() % (mem_size / align_size)) * align_size;
> +		addr = (char *)mem + offset;
> +		if (madvise(addr, align_size, MADV_DONTNEED) != 0)
> +			perror("madvise dontneed");
> +
> +		memset(addr, 0x11, align_size);
> +	}
> +}
> +
> +static void random_swapin(void *mem, size_t mem_size,
> +		size_t align_size, size_t total_swapin_size)
> +{
> +	size_t num_pages = total_swapin_size / align_size;
> +	size_t i;
> +	size_t offset;
> +	void *addr;
> +
> +	for (i = 0; i < num_pages; ++i) {
> +		offset = (rand() % (mem_size / align_size)) * align_size;
> +		addr = (char *)mem + offset;
> +		memset(addr, 0x11, align_size);
> +	}
> +}
> +
> +static unsigned long read_stat(const char *path)
> +{
> +	FILE *file;
> +	unsigned long value;
> +
> +	file = fopen(path, "r");
> +	if (!file) {
> +		perror("fopen");
> +		return 0;
> +	}
> +
> +	if (fscanf(file, "%lu", &value) != 1) {
> +		perror("fscanf");
> +		fclose(file);
> +		return 0;
> +	}
> +
> +	fclose(file);
> +	return value;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int use_small_folio = 0, aligned_swapin = 0;
> +	void *mem1 = NULL, *mem2 = NULL;
> +	int i;
> +
> +	for (i = 1; i < argc; ++i) {
> +		if (strcmp(argv[i], "-s") == 0)
> +			use_small_folio = 1;
> +		else if (strcmp(argv[i], "-a") == 0)
> +			aligned_swapin = 1;
> +	}
> +
> +	mem1 = aligned_alloc_mem(MEMSIZE_MTHP, ALIGNMENT_MTHP);
> +	if (mem1 == NULL) {
> +		fprintf(stderr, "Failed to allocate large folios memory\n");
> +		return EXIT_FAILURE;
> +	}
> +
> +	if (madvise(mem1, MEMSIZE_MTHP, MADV_HUGEPAGE) != 0) {
> +		perror("madvise hugepage for mem1");
> +		free(mem1);
> +		return EXIT_FAILURE;
> +	}
> +
> +	if (use_small_folio) {
> +		mem2 = aligned_alloc_mem(MEMSIZE_SMALLFOLIO, ALIGNMENT_MTHP);
> +		if (mem2 == NULL) {
> +			fprintf(stderr, "Failed to allocate small folios memory\n");
> +			free(mem1);
> +			return EXIT_FAILURE;
> +		}
> +
> +		if (madvise(mem2, MEMSIZE_SMALLFOLIO, MADV_NOHUGEPAGE) != 0) {
> +			perror("madvise nohugepage for mem2");
> +			free(mem1);
> +			free(mem2);
> +			return EXIT_FAILURE;
> +		}
> +	}
> +
> +	/* warm-up phase to occupy the swapfile */
> +	memset(mem1, 0x11, MEMSIZE_MTHP);
> +	madvise(mem1, MEMSIZE_MTHP, MADV_PAGEOUT);
> +	if (use_small_folio) {
> +		memset(mem2, 0x11, MEMSIZE_SMALLFOLIO);
> +		madvise(mem2, MEMSIZE_SMALLFOLIO, MADV_PAGEOUT);
> +	}
> +
> +	/* iterations with newly created mTHP, swap-in, and swap-out */
> +	for (i = 0; i < 100; ++i) {
> +		unsigned long initial_swpout;
> +		unsigned long initial_swpout_fallback;
> +		unsigned long final_swpout;
> +		unsigned long final_swpout_fallback;
> +		unsigned long swpout_inc;
> +		unsigned long swpout_fallback_inc;
> +		double fallback_percentage;
> +
> +		initial_swpout = read_stat(SWPOUT_PATH);
> +		initial_swpout_fallback = read_stat(SWPOUT_FALLBACK_PATH);
> +
> +		/*
> +		 * The following setup creates a 1:1 ratio of mTHP to small folios
> +		 * since large folio swap-in isn't supported yet. Once we support
> +		 * mTHP swap-in, we'll likely need to reduce MEMSIZE_MTHP and
> +		 * increase MEMSIZE_SMALLFOLIO to maintain the ratio.
> +		 */
> +		random_swapin(mem1, MEMSIZE_MTHP,
> +				aligned_swapin ? ALIGNMENT_MTHP : ALIGNMENT_SMALLFOLIO,
> +				TOTAL_DONTNEED_MTHP);
> +		random_madvise_dontneed(mem1, MEMSIZE_MTHP, ALIGNMENT_MTHP,
> +				TOTAL_DONTNEED_MTHP);
> +
> +		if (use_small_folio) {
> +			random_swapin(mem2, MEMSIZE_SMALLFOLIO,
> +					ALIGNMENT_SMALLFOLIO,
> +					TOTAL_DONTNEED_SMALLFOLIO);
> +		}
> +
> +		if (madvise(mem1, MEMSIZE_MTHP, MADV_PAGEOUT) != 0) {
> +			perror("madvise pageout for mem1");
> +			free(mem1);
> +			if (mem2 != NULL)
> +				free(mem2);
> +			return EXIT_FAILURE;
> +		}
> +
> +		if (use_small_folio) {
> +			if (madvise(mem2, MEMSIZE_SMALLFOLIO, MADV_PAGEOUT) != 0) {
> +				perror("madvise pageout for mem2");
> +				free(mem1);
> +				free(mem2);
> +				return EXIT_FAILURE;
> +			}
> +		}
> +
> +		final_swpout = read_stat(SWPOUT_PATH);
> +		final_swpout_fallback = read_stat(SWPOUT_FALLBACK_PATH);
> +
> +		swpout_inc = final_swpout - initial_swpout;
> +		swpout_fallback_inc = final_swpout_fallback - initial_swpout_fallback;
> +
> +		fallback_percentage = (double)swpout_fallback_inc /
> +			(swpout_fallback_inc + swpout_inc) * 100;
> +
> +		printf("Iteration %d: swpout inc: %lu, swpout fallback inc: %lu, Fallback percentage: %.2f%%\n",
> +				i + 1, swpout_inc, swpout_fallback_inc, fallback_percentage);
> +	}
> +
> +	free(mem1);
> +	if (mem2 != NULL)
> +		free(mem2);
> +
> +	return EXIT_SUCCESS;
> +}
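
One possible hardening of the per-iteration report, sketched here as an
assumption and not something raised in this thread: when neither counter
advances during an iteration (for example because no swap device is enabled,
or because read_stat() returned 0 for both files), the percentage expression
divides 0 by 0 and the tool prints NaN. The helper name
calc_fallback_percentage() is hypothetical; it only guards the zero
denominator and otherwise computes the same ratio as the patch.

```c
/*
 * Hypothetical helper, not part of the posted patch.
 *
 * Returns the share of 64kB swap-out attempts that fell back to small
 * folios, in percent. Returns 0.0 when neither counter moved, so the
 * caller never prints NaN.
 */
double calc_fallback_percentage(unsigned long swpout_inc,
				unsigned long swpout_fallback_inc)
{
	/* Sum in floating point so very large counters cannot wrap. */
	double total = (double)swpout_inc + (double)swpout_fallback_inc;

	if (total == 0.0)
		return 0.0;

	return (double)swpout_fallback_inc / total * 100.0;
}
```

In main() this would replace the inline division, for instance
fallback_percentage = calc_fallback_percentage(swpout_inc, swpout_fallback_inc);
whether reporting 0% or flagging the iteration as invalid is the better
choice for an idle iteration is a judgment call.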