From: "Huang, Ying"
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org,
    ryan.roberts@arm.com, chrisl@kernel.org, david@redhat.com,
    hughd@google.com, kaleshsingh@google.com, kasong@tencent.com,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Barry Song
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
In-Reply-To: (Barry Song's message of "Thu, 20 Jun 2024 18:09:45 +1200")
References: <20240620002648.75204-1-21cnbao@gmail.com>
    <87zfrg2xce.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87o77w2nrw.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 20 Jun 2024 14:34:29 +0800
Message-ID: <87jzik2kcq.fsf@yhuang6-desk2.ccr.corp.intel.com>

Barry Song <21cnbao@gmail.com> writes:

> On Thu, Jun 20, 2024 at 5:22 PM Huang, Ying wrote:
>>
>> Barry Song <21cnbao@gmail.com> writes:
>>
>> > On Thu, Jun 20, 2024 at 1:55 PM Huang, Ying wrote:
>> >>
>> >> Barry Song <21cnbao@gmail.com> writes:
>> >>
>> >> > From: Barry Song
>> >> >
>> >> > Both Ryan and Chris have been utilizing the small test program to aid
>> >> > in debugging and identifying issues with swap entry allocation.
>> >> > While a real or intricate workload might be more suitable for assessing
>> >> > the correctness and effectiveness of the swap allocation policy, a small
>> >> > test program presents a simpler means of understanding the problem and
>> >> > initially verifying the improvements being made.
>> >> >
>> >> > Let's endeavor to integrate it into the self-test suite. Although it
>> >> > presently only accommodates 64KB and 4KB, I'm optimistic that we can
>> >> > expand its capabilities to support multiple sizes and simulate more
>> >> > complex systems in the future as required.
>> >>
>> >> IIUC, this is a performance test program instead of functionality test
>> >> program.  Does it match the purpose of the kernel selftest?
>> >
>> > I have a differing perspective. I maintain that the functionality is
>> > not functioning as expected. Despite having all the necessary resources
>> > for allocation, failure persists, indicating a lack of functionality.
>>
>> Is there any user visual functionality issue?
>
> Definitely not. If a plane can't take off, taking a train and pretending
> there's no functionality issue isn't a solution.

I always think that performance optimization is great work.  However, it
is not functionality work.

> I have never assigned blame for any mistakes here. On the contrary,
> I have 100% appreciation for Ryan's work in at least initiating mTHP
> swapout w/o being split.
>
> It took countless experiments for humans to make airplanes commercially
> viable, but the person who created the first flying airplane remains the
> greatest. Similarly, Ryan's efforts, combined with your review of his patch,
> have enabled us to achieve a better goal here. Without your work, we can't
> get here at all.

Thanks!

> However, this is never a reason to refuse to acknowledge that this feature
> is not actually working.

It just works for some workloads, not for some others.

>> >>
>> >>
>> >> > Signed-off-by: Barry Song
>> >> > ---
>> >> >  tools/testing/selftests/mm/Makefile           |   1 +
>> >> >  .../selftests/mm/thp_swap_allocator_test.c    | 192 ++++++++++++++++++
>> >> >  2 files changed, 193 insertions(+)
>> >> >  create mode 100644 tools/testing/selftests/mm/thp_swap_allocator_test.c
>> >> >
>> >> > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
>> >> > index e1aa09ddaa3d..64164ad66835 100644
>> >> > --- a/tools/testing/selftests/mm/Makefile
>> >> > +++ b/tools/testing/selftests/mm/Makefile
>> >> > @@ -65,6 +65,7 @@ TEST_GEN_FILES += mseal_test
>> >> >  TEST_GEN_FILES += seal_elf
>> >> >  TEST_GEN_FILES += on-fault-limit
>> >> >  TEST_GEN_FILES += pagemap_ioctl
>> >> > +TEST_GEN_FILES += thp_swap_allocator_test
>> >> >  TEST_GEN_FILES += thuge-gen
>> >> >  TEST_GEN_FILES += transhuge-stress
>> >> >  TEST_GEN_FILES += uffd-stress
>> >> > diff --git a/tools/testing/selftests/mm/thp_swap_allocator_test.c b/tools/testing/selftests/mm/thp_swap_allocator_test.c
>> >> > new file mode 100644
>> >> > index 000000000000..4443a906d0f8
>> >> > --- /dev/null
>> >> > +++ b/tools/testing/selftests/mm/thp_swap_allocator_test.c
>> >> > @@ -0,0 +1,192 @@
>> >> > +// SPDX-License-Identifier: GPL-2.0-or-later
>> >> > +/*
>> >> > + * thp_swap_allocator_test
>> >> > + *
>> >> > + * The purpose of this test program is helping check if THP swpout
>> >> > + * can correctly get swap slots to swap out as a whole instead of
>> >> > + * being split. It randomly releases swap entries through madvise
>> >> > + * DONTNEED and do swapout on two memory areas: a memory area for
>> >> > + * 64KB THP and the other area for small folios. The second memory
>> >> > + * can be enabled by "-s".
>> >> > + * Before running the program, we need to setup a zRAM or similar
>> >> > + * swap device by:
>> >> > + *  echo lzo > /sys/block/zram0/comp_algorithm
>> >> > + *  echo 64M > /sys/block/zram0/disksize
>> >> > + *  echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>> >> > + *  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>> >> > + *  mkswap /dev/zram0
>> >> > + *  swapon /dev/zram0
>> >> > + * The expected result should be 0% anon swpout fallback ratio w/ or
>> >> > + * w/o "-s".
>> >> > + *
>> >> > + * Author(s): Barry Song
>> >> > + */
>> >> > +
>> >> > +#define _GNU_SOURCE
>> >> > +#include
>> >> > +#include
>> >> > +#include
>> >> > +#include
>> >> > +#include
>> >> > +#include
>> >> > +#include
>> >> > +
>> >> > +#define MEMSIZE_MTHP (60 * 1024 * 1024)
>> >> > +#define MEMSIZE_SMALLFOLIO (1 * 1024 * 1024)
>> >> > +#define ALIGNMENT_MTHP (64 * 1024)
>> >> > +#define ALIGNMENT_SMALLFOLIO (4 * 1024)
>> >> > +#define TOTAL_DONTNEED_MTHP (16 * 1024 * 1024)
>> >> > +#define TOTAL_DONTNEED_SMALLFOLIO (768 * 1024)
>> >> > +#define MTHP_FOLIO_SIZE (64 * 1024)
>> >> > +
>> >> > +#define SWPOUT_PATH \
>> >> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
>> >> > +#define SWPOUT_FALLBACK_PATH \
>> >> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"
>> >> > +
>> >> > +static void *aligned_alloc_mem(size_t size, size_t alignment)
>> >> > +{
>> >> > +	void *mem = NULL;
>> >> > +
>> >> > +	if (posix_memalign(&mem, alignment, size) != 0) {
>> >> > +		perror("posix_memalign");
>> >> > +		return NULL;
>> >> > +	}
>> >> > +	return mem;
>> >> > +}
>> >> > +
>> >> > +static void random_madvise_dontneed(void *mem, size_t mem_size,
>> >> > +		size_t align_size, size_t total_dontneed_size)
>> >> > +{
>> >> > +	size_t num_pages = total_dontneed_size / align_size;
>> >> > +	size_t i;
>> >> > +	size_t offset;
>> >> > +	void *addr;
>> >> > +
>> >> > +	for (i = 0; i < num_pages; ++i) {
>> >> > +		offset = (rand() % (mem_size / align_size)) * align_size;
>> >> > +		addr = (char *)mem + offset;
>> >> > +		if (madvise(addr, align_size, MADV_DONTNEED) != 0)
>> >> > +			perror("madvise dontneed");
>> >>
>> >> IIUC, this simulates align_size (generally 64KB) swap-in.  That is, it
>> >> simulate the effect of large size swap-in when it's not available in
>> >> kernel.  If we have large size swap-in in kernel in the future, this
>> >> becomes unnecessary.
>> >>
>> >> Additionally, we have not reached the consensus that we should always
>> >> swap-in with swapped-out size.  So, I suspect that this test may not
>> >> reflect real situation in the future.  Although it doesn't reflect
>> >> current situation too.
>> >
>> > Disagree again. releasing the whole mTHP swaps is the best case. Even in
>> > the best-case scenario, if we fail, it raises concerns for handling
>> > potentially more challenging situations.
>>
>> Repeating sequential anonymous pages writing is the best case.
>
> I define the best case as the scenario with the least chance of creating
> fragments within swapfiles for mTHP to swap out. There is no real
> difference whether this is done through swapin or madv_dontneed.

IMO, swapin is much more important than madv_dontneed, because most
users use swapin automatically, while few call madv_dontneed by hand.
So, I think a swapin/swapout test is much more important than a
madv_dontneed one.  I don't like this test case because madv_dontneed
isn't typical or basic.

>>
>> > I don't find it hard to incorporate additional features into this test
>> > program to simulate more intricate scenarios.
>>
>> IMHO, we don't really need this special purpose test.  We can have some
>> more general basic tests, for example, sequential anonymous pages
>> writing/reading, random anonymous pages writing/reading, and combination
>> of them.
>
> I understand that not all things will be loved by all people.
> However, before I sent this patch, Chris mentioned that it has been very
> helpful for him and strongly suggested that I contribute it to the
> self-test suite.
>
> By the way, adding sequential and random anonymous pages for
> read/write operations is definitely in my plan. The absence of this feature
> isn't a convincing reason to disregard it.
>

[snip]

--
Best Regards,
Huang, Ying
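
For reference, the "anon swpout fallback ratio" named in the quoted patch header could be computed roughly as sketched below. The two sysfs paths are the ones defined in the quoted patch. The helper names `read_counter()` and `fallback_ratio_percent()` are hypothetical, and the formula (fallbacks divided by the sum of whole-folio swapouts and fallbacks) is an assumption, since the part of the patch that computes the ratio is snipped in this message.

```c
#include <stdio.h>

/* Paths as defined in the quoted patch. */
#define SWPOUT_PATH \
	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
#define SWPOUT_FALLBACK_PATH \
	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"

/*
 * Hypothetical helper: read a single unsigned counter from a sysfs file.
 * Returns 0 on success and -1 if the file is missing or unparsable.
 */
static int read_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = fscanf(f, "%lu", value);
	fclose(f);
	return parsed == 1 ? 0 : -1;
}

/*
 * Assumed definition of the ratio: the share of 64KB swapout attempts
 * that fell back to splitting, in percent. Both arguments are counter
 * deltas (value after the run minus value before the run).
 */
static double fallback_ratio_percent(unsigned long swpout_delta,
				     unsigned long fallback_delta)
{
	unsigned long total = swpout_delta + fallback_delta;

	if (total == 0)
		return 0.0;	/* nothing was swapped out as mTHP */
	return 100.0 * (double)fallback_delta / (double)total;
}
```

A caller would sample both counters with `read_counter(SWPOUT_PATH, ...)` and `read_counter(SWPOUT_FALLBACK_PATH, ...)` before and after the swapout loop and pass the differences to `fallback_ratio_percent()`. Under the expectation stated in the patch header, the result would be 0 with or without "-s".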