From: "Huang, Ying" <ying.huang@intel.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Ryan Roberts, David Hildenbrand, akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org, chrisl@kernel.org, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Barry Song
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
In-Reply-To: (Barry Song's message of "Fri, 21 Jun 2024 21:43:20 +1200")
References: <20240620002648.75204-1-21cnbao@gmail.com> <87cyoa1wgm.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 24 Jun 2024 11:42:51 +0800
Message-ID: <878qyv0zwk.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

Barry Song <21cnbao@gmail.com> writes:

> On Fri, Jun 21, 2024 at 9:24 PM Huang, Ying wrote:
>>
>> Barry Song <21cnbao@gmail.com> writes:
>>
>> > On Fri, Jun 21, 2024 at 7:25 PM Ryan Roberts wrote:
>> >>
>> >> On 20/06/2024 12:34, David Hildenbrand wrote:
>> >> > On 20.06.24 11:04, Ryan Roberts wrote:
>> >> >> On 20/06/2024 01:26, Barry Song wrote:
>> >> >>> From: Barry Song
>> >> >>>
>> >> >>> Both Ryan and Chris have been
>> >> >>> utilizing the small test program to aid
>> >> >>> in debugging and identifying issues with swap entry allocation. While
>> >> >>> a real or intricate workload might be more suitable for assessing the
>> >> >>> correctness and effectiveness of the swap allocation policy, a small
>> >> >>> test program presents a simpler means of understanding the problem and
>> >> >>> initially verifying the improvements being made.
>> >> >>>
>> >> >>> Let's endeavor to integrate it into the self-test suite. Although it
>> >> >>> presently only accommodates 64KB and 4KB, I'm optimistic that we can
>> >> >>> expand its capabilities to support multiple sizes and simulate more
>> >> >>> complex systems in the future as required.
>> >> >>
>> >> >> I'll try to summarize the thread with Huang Ying by suggesting this test program
>> >> >> is "necessary but not sufficient" to exhaustively test the mTHP swap-out path.
>> >> >> I've certainly found it useful and think it would be a valuable addition to the
>> >> >> tree.
>> >> >>
>> >> >> That said, I'm not convinced it is a selftest; IMO a selftest should provide a
>> >> >> clear pass/fail result against some criteria and must be able to be run
>> >> >> automatically by (e.g.) a CI system.
>> >> >
>> >> > Likely we should then consider moving other such performance-related thingies
>> >> > out of the selftests?
>> >>
>> >> Yes, that would get my vote. But of the 4 tests you mentioned that use
>> >> clock_gettime(), it looks like transhuge-stress is the only one that doesn't
>> >> have a pass/fail result, so is probably the only candidate for moving.
>> >>
>> >> The others either use the times as a timeout and determine failure if the
>> >> action didn't occur within the timeout (e.g. ksm_tests.c) or use it to add some
>> >> supplemental performance information to an otherwise functionality-oriented test.
>> >
>> > Thank you very much, Ryan. I think you've found a better home for this
>> > tool.
>> > I will send v2, relocating it to tools/mm and adding a function to
>> > swap in either the whole mTHPs or a portion of mTHPs by "-a"
>> > (aligned swapin).
>> >
>> > So basically, we will have
>> >
>> > 1. Use MADV_PAGEOUT for rapid swap-out, putting the swap allocation code under
>> > high exercise in a short time.
>> >
>> > 2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in freeing
>> > memory, as well as for munmap, app exits, or OOM killer scenarios. This ensures
>> > new mTHP is always generated, released or swapped out, similar to the behavior
>> > on a PC or Android phone where many applications are frequently started and
>> > terminated.
>>
>> MADV_DONTNEED 64KB memory, then memset() it, this just simulates the
>> large folio swap-in exactly, which hasn't been merged by upstream. I
>> don't think that it's a good idea to make such kind of trick.
>
> I disagree. This is how userspace heaps can manage memory
> deallocation.

Sorry, I don't understand how. Can you show some examples, such as an
strace log with 64KB-aligned MADV_DONTNEED?

> Additionally, in the event of an application exit, munmap, or OOM killer, the
> amount of freed memory can be much larger than 64KB. The primary purpose
> of using MADV_DONTNEED is to release anonymous memory and generate
> new mTHP so that the iteration can continue. Otherwise, the test program
> becomes entirely pointless, as we only have large folios at the beginning.
> That is exactly why Chris has failed to find his bugs by using other small
> programs.

I still don't understand how 64KB-aligned MADV_DONTNEED is used by the
libc/Java heap or munmap in practice. After more thought, though, I
think 64KB-aligned MADV_DONTNEED can simulate, to some degree, the
fragmentation effect of process exit, provided the 64KB folios in those
processes are swapped out without splitting. If you have no other
practical use cases, I suggest making this explicit with comments in
the program.
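
For illustration only, here is a minimal sketch of the pattern under
discussion: fault in a region, push it to swap with MADV_PAGEOUT, then
free and re-touch it in 64KB-aligned chunks with MADV_DONTNEED, with the
intent spelled out in comments. It is not the code from the patch. The
names map_aligned(), cycle_region() and CHUNK_SIZE are hypothetical, and
whether 64KB folios are actually allocated depends on the mTHP settings
of the running kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* assumption: asm-generic value, for old libc headers */
#endif

#define CHUNK_SIZE (64UL * 1024)	/* assumed mTHP size under test */

/* Map len bytes of anonymous memory starting on a CHUNK_SIZE boundary. */
static void *map_aligned(size_t len)
{
	size_t total = len + CHUNK_SIZE;	/* over-allocate, then trim */
	char *raw;
	uintptr_t start;
	size_t head, tail;

	assert(len % CHUNK_SIZE == 0);
	raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	start = ((uintptr_t)raw + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1);
	head = start - (uintptr_t)raw;
	tail = total - head - len;
	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)start + len, tail);
	return (void *)start;
}

/* One iteration over the region. Returns 0 on success, -1 on failure. */
static int cycle_region(char *buf, size_t len)
{
	size_t off;

	assert(len % CHUNK_SIZE == 0);

	/* Fault the region in; with 64KB mTHP enabled this may use large folios. */
	memset(buf, 0x5a, len);

	/* Best effort: ask the kernel to swap the region out right away. */
	(void)madvise(buf, len, MADV_PAGEOUT);

	for (off = 0; off < len; off += CHUNK_SIZE) {
		/*
		 * Not a claim about how libc or Java heaps free memory:
		 * the 64KB-aligned MADV_DONTNEED only stands in for the
		 * fragmentation left behind when processes exit while
		 * their 64KB folios sit in swap unsplit.
		 */
		if (madvise(buf + off, CHUNK_SIZE, MADV_DONTNEED))
			return -1;

		/* Re-touch so that fresh folios exist for the next round. */
		memset(buf + off, 0xa5, CHUNK_SIZE);
	}
	return 0;
}
```

A caller would map a region once with map_aligned() and then call
cycle_region() in a loop. The MADV_PAGEOUT return value is ignored on
purpose, since the hint can fail or do nothing on systems without swap.
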
> On the other hand, we definitely want large folios swap-in, otherwise, mTHP
> is just a toy to Android or similar system where more than 2/3 memory could
> be in swap. We do NOT want single-use mTHP.

I agree that large folio swap-in has value, at least in some
situations. Whether it should be the default behavior is another topic;
we can discuss it further in the future.

>>
>> > 3. Swap in with or without the "-a" option to observe how fragments
>> > due to swap-in
>> > and the incoming swap-in of large folios will impact swap-out fallback.
>>
>> It's good to create fragmentation with swap-in. Which is more practical
>> and future-proof. And, I believe that we can reduce large folio
>> swap-out fallback rate without the large folio swap-in trick.
>>
>> > And many thanks to Chris for the suggestion on improving it within
>> > selftest, though I
>> > prefer to place it in tools/mm.

--
Best Regards,
Huang, Ying