From: Barry Song <21cnbao@gmail.com>
Date: Mon, 24 Jun 2024 19:55:04 +1200
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap entry allocation for thp_swapout
To: "Huang, Ying"
Cc: Ryan Roberts, David Hildenbrand, akpm@linux-foundation.org, shuah@kernel.org, linux-mm@kvack.org, chrisl@kernel.org, hughd@google.com, kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Barry Song
References: <20240620002648.75204-1-21cnbao@gmail.com> <87cyoa1wgm.fsf@yhuang6-desk2.ccr.corp.intel.com> <878qyv0zwk.fsf@yhuang6-desk2.ccr.corp.intel.com> <871q4m25du.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <871q4m25du.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Mon, Jun 24, 2024 at 7:01 PM Huang, Ying wrote:
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > On Mon, Jun 24, 2024 at 3:44 PM Huang, Ying wrote:
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >> > On Fri, Jun 21, 2024 at 9:24 PM Huang, Ying wrote:
> >> >>
> >> >> Barry Song <21cnbao@gmail.com> writes:
> >> >>
> >> >> > On Fri, Jun 21, 2024 at 7:25 PM Ryan Roberts wrote:
> >> >> >>
> >> >> >> On 20/06/2024 12:34, David Hildenbrand wrote:
> >> >> >> > On 20.06.24 11:04, Ryan Roberts wrote:
> >> >> >> >> On 20/06/2024 01:26, Barry Song wrote:
> >> >> >> >>> From: Barry Song
> >> >> >> >>>
> >> >> >> >>> Both Ryan and Chris have been utilizing the small test program to aid
> >> >> >> >>> in debugging and identifying issues with swap entry allocation. While
> >> >> >> >>> a real or intricate workload might be more suitable for assessing the
> >> >> >> >>> correctness and effectiveness of the swap allocation policy, a small
> >> >> >> >>> test program presents a simpler means of understanding the problem and
> >> >> >> >>> initially verifying the improvements being made.
> >> >> >> >>>
> >> >> >> >>> Let's endeavor to integrate it into the self-test suite. Although it
> >> >> >> >>> presently only accommodates 64KB and 4KB, I'm optimistic that we can
> >> >> >> >>> expand its capabilities to support multiple sizes and simulate more
> >> >> >> >>> complex systems in the future as required.
> >> >> >> >>
> >> >> >> >> I'll try to summarize the thread with Huang Ying by suggesting this test program
> >> >> >> >> is "necessary but not sufficient" to exhaustively test the mTHP swap-out path.
> >> >> >> >> I've certainly found it useful and think it would be a valuable addition to the
> >> >> >> >> tree.
> >> >> >> >>
> >> >> >> >> That said, I'm not convinced it is a selftest; IMO a selftest should provide a
> >> >> >> >> clear pass/fail result against some criteria and must be able to be run
> >> >> >> >> automatically by (e.g.) a CI system.
> >> >> >> >
> >> >> >> > Likely we should then consider moving other such performance-related thingies
> >> >> >> > out of the selftests?
> >> >> >>
> >> >> >> Yes, that would get my vote. But of the 4 tests you mentioned that use
> >> >> >> clock_gettime(), it looks like transhuge-stress is the only one that doesn't
> >> >> >> have a pass/fail result, so is probably the only candidate for moving.
> >> >> >>
> >> >> >> The others either use the times as a timeout and determine failure if the
> >> >> >> action didn't occur within the timeout (e.g. ksm_tests.c) or use it to add some
> >> >> >> supplemental performance information to an otherwise functionality-oriented test.
> >> >> >
> >> >> > Thank you very much, Ryan. I think you've found a better home for this
> >> >> > tool. I will send v2, relocating it to tools/mm and adding a function to
> >> >> > swap in either the whole mTHPs or a portion of mTHPs by "-a" (aligned swapin).
> >> >> >
> >> >> > So basically, we will have
> >> >> >
> >> >> > 1. Use MADV_PAGEOUT for rapid swap-out, putting the swap allocation code under
> >> >> > high exercise in a short time.
> >> >> >
> >> >> > 2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in freeing
> >> >> > memory, as well as for munmap, app exits, or OOM killer scenarios. This ensures
> >> >> > new mTHP is always generated, released or swapped out, similar to the behavior
> >> >> > on a PC or Android phone where many applications are frequently started and
> >> >> > terminated.
> >> >>
> >> >> MADV_DONTNEED on 64KB of memory followed by memset() just simulates
> >> >> large folio swap-in exactly, which hasn't been merged upstream. I
> >> >> don't think that it's a good idea to use this kind of trick.
> >> >
> >> > I disagree. This is how userspace heaps can manage memory
> >> > deallocation.
> >>
> >> Sorry, I don't understand how. Can you show some examples? Such as
> >> strace log with 64KB aligned MADV_DONTNEED?
> >
> > In Java heap and memory allocators such as jemalloc and Scudo, memory is freed
> > using the MADV_DONTNEED flag when either free() is called or garbage collection
> > occurs. In Android, the Java heap is freed in chunks aligned to 64KB
> > or larger.
>
> Originally, I had heard that MADV_FREE is used by jemalloc. Now, I
> know that they use MADV_DONTNEED too. Thanks!
>
> Although I still doubt that libc/Java allocators free pages in
> exact 64KB sizes (IIUC, they should free pages in much larger chunks), I
> agree that MADV_DONTNEED is a way to create fragmentation in swap
> devices.

Right. They don't always free memory in exact 64KB sizes or mTHP size,
but we need to define a minimum granularity. Typically, when many objects
are freed, they combine into a larger free block, which is then released to
the kernel all at once.

As an example, libc might map lots of 4MB VMAs and classify them into
different size categories: some for small objects and others for larger
ones. While attempts are made to consolidate adjacent free blocks to
reduce system calls, MADV_DONTNEED is often used at the minimum
granularity for small objects when merging is temporarily impractical,
since we don't always encounter two or more adjacent memory blocks in
which all the objects have been released :-)

>
> > In Scudo and jemalloc, there is a configuration option to set the
> > management granularity. This granularity is set to match the mTHP size
> > (though the default value is 16KB in the latest Android if we don't run
> > mTHP). Otherwise, you could end up with millions of partial unmap
> > operations, which would severely degrade the performance of mTHP.
> >
> > Imagine libc/Java functioning like a slab allocator. When kfree() is
> > called, some pages may become completely unoccupied and can be returned
> > to the buddy allocator. In userspace, memory is given back to the kernel
> > in a similar manner, typically using MADV_DONTNEED. Therefore,
> > MADV_DONTNEED is the most common memory reclamation behavior in Android,
> > coming with free(), delete or GC.
> >
> > Imagine a system with extensive malloc, free, new, and delete
> > operations, where objects are constantly being created and destroyed.
> >
> > On the other hand, whether libc/Java use MADV_DONTNEED to free memory is not
> > crucial, although they do. We need a method to simulate the lifecycle
> > of applications (exiting and starting anew) on PCs or Android phones. It
> > doesn't matter if you use MADV_DONTNEED or munmap to achieve this.
> >
> > It is important to note that mTHP currently operates on a one-shot
> > basis (after swap-out, you never get them back as mTHP, as we don't
> > support large folio swap-in). For the test program, we need a method to
> > generate new mTHPs continuously. Without this, after the initial
> > iterations, we would be left with only small folios, rendering the entire
> > test program *pointless*.
>
> I understand the requirements for new mTHPs.
>
> >>
> >> > Additionally, in the event of an application exit, munmap, or OOM killer, the
> >> > amount of freed memory can be much larger than 64KB. The primary purpose
> >> > of using MADV_DONTNEED is to release anonymous memory and generate
> >> > new mTHP so that the iteration can continue. Otherwise, the test program
> >> > becomes entirely pointless, as we only have large folios at the beginning.
> >> > That is exactly why Chris has failed to find his bugs by using other small
> >> > programs.
> >>
> >> I still don't understand how 64KB-aligned MADV_DONTNEED is used
> >> for libc/Java heap or munmap in a practical way. After more thought, I
> >> think 64KB-aligned MADV_DONTNEED can simulate the fragmentation effect
> >> of process exit to some degree if 64KB folios in these processes are
> >> swapped out without splitting. If you have no other practical use
> >> cases, I suggest making that explicit with comments in the program.
> >>
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
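
To make the release policy I described above a bit more concrete, here is a rough userspace model in Python. It is only a sketch: it is not the test program and not code from any real allocator. CHUNK and releasable_ranges() are hypothetical names made up for illustration, and the 64KB minimum granularity is an assumption taken from this discussion. A chunk is considered releasable only when no live object overlaps it, and adjacent releasable chunks are merged so that one madvise(MADV_DONTNEED) call could cover the whole run:

```python
CHUNK = 64 * 1024  # assumed minimum release granularity (one 64KB mTHP)


def releasable_ranges(region_size, live_objects, chunk=CHUNK):
    """Return (offset, length) ranges a hypothetical allocator could
    pass to madvise(MADV_DONTNEED).

    live_objects is an iterable of (offset, size) pairs for objects that
    are still in use. No system call is made here; this only computes
    which chunk-aligned ranges are completely free.
    """
    nr_chunks = region_size // chunk
    busy = [False] * nr_chunks

    # Mark every chunk touched by a live object as busy.
    for off, size in live_objects:
        first = off // chunk
        last = (off + size - 1) // chunk
        for i in range(first, min(last, nr_chunks - 1) + 1):
            busy[i] = True

    # Merge runs of adjacent free chunks into single ranges.
    ranges = []
    run_start = None
    for i in range(nr_chunks + 1):
        free = i < nr_chunks and not busy[i]
        if free and run_start is None:
            run_start = i
        elif not free and run_start is not None:
            ranges.append((run_start * chunk, (i - run_start) * chunk))
            run_start = None
    return ranges
```

For a 256KB region with a single live 100-byte object at offset 70KB, this returns one 64KB range at offset 0 and one merged 128KB range at offset 128KB. The small object pins its whole 64KB chunk, which is the case where releasing at the minimum granularity is all the allocator can do.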