From: "Huang, Ying"
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org,
	ryan.roberts@arm.com, chrisl@kernel.org, david@redhat.com,
	hughd@google.com, kaleshsingh@google.com, kasong@tencent.com,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Barry Song
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
In-Reply-To: (Barry Song's message of "Thu, 20 Jun 2024 14:04:03 +1200")
References: <20240620002648.75204-1-21cnbao@gmail.com>
	<87zfrg2xce.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 20 Jun 2024 13:20:35 +0800
Message-ID: <87o77w2nrw.fsf@yhuang6-desk2.ccr.corp.intel.com>

Barry Song <21cnbao@gmail.com> writes:

> On Thu, Jun 20, 2024 at 1:55 PM Huang, Ying wrote:
>>
>> Barry Song <21cnbao@gmail.com> writes:
>>
>> > From: Barry Song
>> >
>> > Both Ryan and Chris have been utilizing the small test program to aid
>> > in debugging and identifying issues with swap entry allocation. While
>> > a real or intricate workload might be more suitable for assessing the
>> > correctness and effectiveness of the swap allocation policy, a small
>> > test program presents a simpler means of understanding the problem and
>> > initially verifying the improvements being made.
>> >
>> > Let's endeavor to integrate it into the self-test suite. Although it
>> > presently only accommodates 64KB and 4KB, I'm optimistic that we can
>> > expand its capabilities to support multiple sizes and simulate more
>> > complex systems in the future as required.
>>
>> IIUC, this is a performance test program instead of functionality test
>> program. Does it match the purpose of the kernel selftest?
>
> I have a differing perspective. I maintain that the functionality is
> not functioning as expected. Despite having all the necessary resources
> for allocation, failure persists, indicating a lack of functionality.

Is there any user-visible functionality issue?

>>
>> > Signed-off-by: Barry Song
>> > ---
>> >  tools/testing/selftests/mm/Makefile                |   1 +
>> >  .../selftests/mm/thp_swap_allocator_test.c         | 192 ++++++++++++++++++
>> >  2 files changed, 193 insertions(+)
>> >  create mode 100644 tools/testing/selftests/mm/thp_swap_allocator_test.c
>> >
>> > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
>> > index e1aa09ddaa3d..64164ad66835 100644
>> > --- a/tools/testing/selftests/mm/Makefile
>> > +++ b/tools/testing/selftests/mm/Makefile
>> > @@ -65,6 +65,7 @@ TEST_GEN_FILES += mseal_test
>> >  TEST_GEN_FILES += seal_elf
>> >  TEST_GEN_FILES += on-fault-limit
>> >  TEST_GEN_FILES += pagemap_ioctl
>> > +TEST_GEN_FILES += thp_swap_allocator_test
>> >  TEST_GEN_FILES += thuge-gen
>> >  TEST_GEN_FILES += transhuge-stress
>> >  TEST_GEN_FILES += uffd-stress
>> > diff --git a/tools/testing/selftests/mm/thp_swap_allocator_test.c b/tools/testing/selftests/mm/thp_swap_allocator_test.c
>> > new file mode 100644
>> > index 000000000000..4443a906d0f8
>> > --- /dev/null
>> > +++ b/tools/testing/selftests/mm/thp_swap_allocator_test.c
>> > @@ -0,0 +1,192 @@
>> > +// SPDX-License-Identifier: GPL-2.0-or-later
>> > +/*
>> > + * thp_swap_allocator_test
>> > + *
>> > + * The purpose of this test program is helping check if THP swpout
>> > + * can correctly get swap slots to swap out as a whole instead of
>> > + * being split. It randomly releases swap entries through madvise
>> > + * DONTNEED and do swapout on two memory areas: a memory area for
>> > + * 64KB THP and the other area for small folios. The second memory
>> > + * can be enabled by "-s".
>> > + * Before running the program, we need to setup a zRAM or similar
>> > + * swap device by:
>> > + *  echo lzo > /sys/block/zram0/comp_algorithm
>> > + *  echo 64M > /sys/block/zram0/disksize
>> > + *  echo never > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>> > + *  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>> > + *  mkswap /dev/zram0
>> > + *  swapon /dev/zram0
>> > + * The expected result should be 0% anon swpout fallback ratio w/ or
>> > + * w/o "-s".
>> > + *
>> > + * Author(s): Barry Song
>> > + */
>> > +
>> > +#define _GNU_SOURCE
>> > +#include
>> > +#include
>> > +#include
>> > +#include
>> > +#include
>> > +#include
>> > +#include
>> > +
>> > +#define MEMSIZE_MTHP (60 * 1024 * 1024)
>> > +#define MEMSIZE_SMALLFOLIO (1 * 1024 * 1024)
>> > +#define ALIGNMENT_MTHP (64 * 1024)
>> > +#define ALIGNMENT_SMALLFOLIO (4 * 1024)
>> > +#define TOTAL_DONTNEED_MTHP (16 * 1024 * 1024)
>> > +#define TOTAL_DONTNEED_SMALLFOLIO (768 * 1024)
>> > +#define MTHP_FOLIO_SIZE (64 * 1024)
>> > +
>> > +#define SWPOUT_PATH \
>> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout"
>> > +#define SWPOUT_FALLBACK_PATH \
>> > +	"/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback"
>> > +
>> > +static void *aligned_alloc_mem(size_t size, size_t alignment)
>> > +{
>> > +	void *mem = NULL;
>> > +
>> > +	if (posix_memalign(&mem, alignment, size) != 0) {
>> > +		perror("posix_memalign");
>> > +		return NULL;
>> > +	}
>> > +	return mem;
>> > +}
>> > +
>> > +static void random_madvise_dontneed(void *mem, size_t mem_size,
>> > +		size_t align_size, size_t total_dontneed_size)
>> > +{
>> > +	size_t num_pages = total_dontneed_size / align_size;
>> > +	size_t i;
>> > +	size_t offset;
>> > +	void *addr;
>> > +
>> > +	for (i = 0; i < num_pages; ++i) {
>> > +		offset = (rand() % (mem_size / align_size)) * align_size;
>> > +		addr = (char *)mem + offset;
>> > +		if (madvise(addr, align_size, MADV_DONTNEED) != 0)
>> > +			perror("madvise dontneed");
>>
>> IIUC, this simulates align_size (generally 64KB) swap-in. That is, it
>> simulate the effect of large size swap-in when it's not available in
>> kernel. If we have large size swap-in in kernel in the future, this
>> becomes unnecessary.
>>
>> Additionally, we have not reached the consensus that we should always
>> swap-in with swapped-out size. So, I suspect that this test may not
>> reflect real situation in the future. Although it doesn't reflect
>> current situation too.
>
> Disagree again. releasing the whole mTHP swaps is the best case. Even in
> the best-case scenario, if we fail, it raises concerns for handling
> potentially more challenging situations.

Repeated sequential writing of anonymous pages is the best case.

> I don't find it hard to incorporate additional features into this test
> program to simulate more intricate scenarios.

IMHO, we don't really need this special-purpose test. We can have some
more general basic tests instead, for example, sequential anonymous page
writing/reading, random anonymous page writing/reading, and combinations
of them.

--
Best Regards,
Huang, Ying

>>
>> > +
>> > +		memset(addr, 0x11, align_size);
>> > +	}
>> > +}
>> > +
>>
>> [snip]
>>
>> --
>> Best Regards,
>> Huang, Ying
>
> Thanks
> Barry
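
For illustration only, the more general access patterns suggested in the
reply above (sequential and random writing/reading of anonymous pages)
could be sketched roughly as follows. This is a minimal sketch under
stated assumptions, not code from the patch or from any existing
selftest: the helper names anon_map(), request_pageout(),
write_sequential(), write_random() and count_mismatches() are
hypothetical, and the use of MADV_PAGEOUT to ask for reclaim is an
assumption about how such a test might trigger swap-out.

```c
/* Hypothetical sketch: not part of the patch or of any existing selftest. */
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux 5.4+; older libc headers may lack it */
#endif

/* Map an anonymous, private region; returns NULL on failure. */
static unsigned char *anon_map(size_t size)
{
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return mem == MAP_FAILED ? NULL : mem;
}

/* Ask the kernel to reclaim the range (assumed swap-out trigger). */
static int request_pageout(void *mem, size_t size)
{
	return madvise(mem, size, MADV_PAGEOUT);
}

/* Sequential pattern: write every page in address order. */
static void write_sequential(unsigned char *mem, size_t size, size_t page_size)
{
	size_t i, nr_pages;

	assert(page_size > 0 && size >= page_size);
	nr_pages = size / page_size;
	for (i = 0; i < nr_pages; i++)
		memset(mem + i * page_size, (int)(i & 0xff), page_size);
}

/* Random pattern: write nr_writes pages picked with rand(). */
static void write_random(unsigned char *mem, size_t size, size_t page_size,
			 size_t nr_writes, unsigned int seed)
{
	size_t i, idx, nr_pages;

	assert(page_size > 0 && size >= page_size);
	nr_pages = size / page_size;
	srand(seed);
	for (i = 0; i < nr_writes; i++) {
		idx = (size_t)rand() % nr_pages;
		memset(mem + idx * page_size, (int)(idx & 0xff), page_size);
	}
}

/* Read back every page; count pages whose bytes differ from their tag. */
static size_t count_mismatches(const unsigned char *mem, size_t size,
			       size_t page_size)
{
	size_t i, j, bad = 0, nr_pages;

	assert(page_size > 0 && size >= page_size);
	nr_pages = size / page_size;
	for (i = 0; i < nr_pages; i++) {
		for (j = 0; j < page_size; j++) {
			if (mem[i * page_size + j] != (unsigned char)(i & 0xff)) {
				bad++;
				break;
			}
		}
	}
	return bad;
}
```

A caller would map a region with anon_map(), run write_sequential() or
write_random(), optionally call request_pageout() on the range, and then
expect count_mismatches() to return 0 once the pages are read back.
Each page is tagged with the low byte of its index, so the read-back
check stays valid for any mix of the two write patterns.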