From: Nico Pache
Date: Fri, 16 May 2025 05:54:27 -0600
Subject: Re: [PATCH v6 00/12] khugepaged: mTHP support
To: Dev Jain
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org
References: <20250515030312.125567-1-npache@redhat.com> <55e5169b-2cba-47e0-8e16-ced29ad4d879@arm.com>
In-Reply-To: <55e5169b-2cba-47e0-8e16-ced29ad4d879@arm.com>

On Thu, May 15, 2025 at 12:45 AM Dev Jain wrote:
>
> On 15/05/25 8:51 am, Nico Pache wrote:
> > Ugh... So sorry, I forgot to turn off the chain-reply-to.
> >
> > resending V7 *facepalm*
>
> In the future you can just send the same version again with [RESEND]
> prefixed in the subject, that prevents confusion.

Thanks, I'll do that next time.
>
> >
> > On Wed, May 14, 2025 at 9:03 PM Nico Pache wrote:
> >>
> >> The following series provides khugepaged and madvise collapse with the
> >> capability to collapse anonymous memory regions to mTHPs.
> >>
> >> To achieve this we generalize the khugepaged functions to no longer depend
> >> on PMD_ORDER. Then during the PMD scan, we keep track of chunks of pages
> >> (defined by KHUGEPAGED_MTHP_MIN_ORDER) that are utilized. This info is
> >> tracked using a bitmap. After the PMD scan is done, we do binary recursion
> >> on the bitmap to find the optimal mTHP sizes for the PMD range. The
> >> restriction on max_ptes_none is removed during the scan, to make sure we
> >> account for the whole PMD range. When no mTHP size is enabled, the legacy
> >> behavior of khugepaged is maintained. max_ptes_none will be scaled by the
> >> attempted collapse order to determine how full a THP must be to be
> >> eligible. If an mTHP collapse is attempted but the range contains
> >> swapped-out or shared pages, we don't perform the collapse.
> >>
> >> With the default max_ptes_none=511, the code should keep most of its
> >> original behavior. To exercise mTHP collapse we need to set
> >> max_ptes_none<=255. With max_ptes_none > HPAGE_PMD_NR/2 you will
> >> experience collapse "creep" and constantly promote mTHPs to the next
> >> available size. This is due to the fact that it will introduce at least 2x
> >> the number of pages, and on a future scan will satisfy that condition once
> >> again.
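
For anyone following the thread, here is a minimal userspace sketch of the
max_ptes_none scaling and the "creep" condition described above. It is not
the code from the series. It assumes 4K base pages (PMD order 9, 512 PTEs)
and assumes the scaling is a right shift by the order difference; the cover
letter only says the value is "scaled by the attempted collapse order". The
helper names scaled_max_ptes_none(), collapse_eligible() and
creeps_to_next_order() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4K base pages, so one PMD maps 2^9 = 512 PTEs. */
#define PMD_ORDER 9u

/*
 * ASSUMPTION: scale the PMD-level max_ptes_none down to a smaller order
 * with a right shift. The exact formula is illustrative only.
 */
static unsigned int scaled_max_ptes_none(unsigned int max_ptes_none,
					 unsigned int order)
{
	assert(order <= PMD_ORDER);
	return max_ptes_none >> (PMD_ORDER - order);
}

/* A range of 2^order PTEs is eligible when its empty PTEs fit the scaled limit. */
static bool collapse_eligible(unsigned int max_ptes_none, unsigned int order,
			      unsigned int present_ptes)
{
	unsigned int nr_ptes = 1u << order;

	assert(present_ptes <= nr_ptes);
	return nr_ptes - present_ptes <=
	       scaled_max_ptes_none(max_ptes_none, order);
}

/*
 * "Creep" check: after a collapse at @order the range is fully populated.
 * If the rest of the enclosing range one order up is empty, does it become
 * eligible purely because of the pages the collapse just introduced?
 */
static bool creeps_to_next_order(unsigned int max_ptes_none,
				 unsigned int order)
{
	assert(order < PMD_ORDER);
	return collapse_eligible(max_ptes_none, order + 1, 1u << order);
}
```

Under these assumptions, creeps_to_next_order(511, 4) is true: a freshly
collapsed 16-page range leaves 16 empty PTEs in the surrounding 32-page
range, within the scaled limit of 31. With max_ptes_none=255 the scaled
limit is 15, so the same range is not promoted again.
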
> >>
> >> Patch 1:     Refactor/rename hpage_collapse
> >> Patch 2:     Some refactoring to combine madvise_collapse and khugepaged
> >> Patch 3-5:   Generalize khugepaged functions for arbitrary orders
> >> Patch 6-9:   The mTHP patches
> >> Patch 10-11: Tracing/stats
> >> Patch 12:    Documentation
> >>
> >> ---------
> >>  Testing
> >> ---------
> >> - Built for x86_64, aarch64, ppc64le, and s390x
> >> - selftests mm
> >> - I created a test script that I used to push khugepaged to its limits
> >>   while monitoring a number of stats and tracepoints. The code is
> >>   available here[1] (Run in legacy mode for these changes and set mthp
> >>   sizes to inherit)
> >>   The summary from my testing was that there was no significant
> >>   regression noticed through this test. In some cases my changes had
> >>   better collapse latencies, and were able to scan more pages in the same
> >>   amount of time/work, but for the most part the results were consistent.
> >> - redis testing. I tested these changes along with my defer changes
> >>   (see followup post for more details).
> >> - some basic testing on 64k page size.
> >> - lots of general use.
> >>
> >> V6 Changes:
> >> - Don't release the anon_vma_lock early (like in the PMD case), as not all
> >>   pages are isolated.
> >> - Define the PTE as null to avoid an uninitialized condition
> >> - minor nits and newline cleanup
> >> - make sure to unmap and unlock the pte for the swapin case
> >> - change the revalidation to always check the PMD order (as this will make
> >>   sure that no other VMA spans it)
> >>
> >> V5 Changes [2]:
> >> - switched the order of patches 1 and 2
> >> - fixed some edge cases on the unified madvise_collapse and khugepaged
> >> - Explained the "creep" some more in the docs
> >> - fix EXCEED_SHARED vs EXCEED_SWAP accounting issue
> >> - fix potential highmem issue caused by an early unmap of the PTE
> >>
> >> V4 Changes:
> >> - Rebased onto mm-unstable
> >> - small changes to Documentation
> >>
> >> V3 Changes:
> >> - corrected legacy behavior for khugepaged and madvise_collapse
> >> - added proper mTHP stat tracking
> >> - Minor changes to prevent a nested lock on non-split-lock arches
> >> - Took Dev's version of alloc_charge_folio as it has the proper stats
> >> - Skip cases where trying to collapse to a lower order would still fail
> >> - Fixed cases where the bitmap was not being updated properly
> >> - Moved Documentation update to this series instead of the defer set
> >> - Minor bugs discovered during testing and review
> >> - Minor "nit" cleanup
> >>
> >> V2 Changes:
> >> - Minor bug fixes discovered during review and testing
> >> - removed dynamic allocations for bitmaps, and made them stack based
> >> - Adjusted bitmap offset from u8 to u16 to support 64k pagesize.
> >> - Updated trace events to include collapsing order info.
> >> - Scaled max_ptes_none by order rather than scaling to a 0-100 scale.
> >> - No longer require a chunk to be fully utilized before setting the bit.
> >>   Use the same max_ptes_none scaling principle to achieve this.
> >> - Skip mTHP collapse that requires swapin or shared handling. This helps
> >>   prevent some of the "creep" that was discovered in v1.
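
The "binary recursion on the bitmap" from the cover letter can be pictured
with the rough userspace sketch below. It is not the implementation in the
series. It assumes 128 chunk bits per PMD (512 PTEs in 4-PTE chunks), uses
plain recursion for brevity, and uses a hypothetical proportional threshold
for how many empty chunks a span may contain. struct span, count_set() and
pick_spans() are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: 512 PTEs tracked in 4-PTE chunks gives 128 bitmap bits. */
#define NR_CHUNKS 128u

struct span {
	unsigned int start;	/* first chunk index */
	unsigned int len;	/* number of chunks, a power of two */
};

/* Count utilized chunks in [start, start + len). */
static unsigned int count_set(const bool *chunks, unsigned int start,
			      unsigned int len)
{
	unsigned int i, set = 0;

	for (i = start; i < start + len; i++)
		set += chunks[i];
	return set;
}

/*
 * Try the whole span first; if it is too sparse, split it in half and
 * retry each half. @max_none is the number of empty chunks tolerated
 * across all NR_CHUNKS, scaled down proportionally for smaller spans
 * (hypothetical policy). @out needs room for NR_CHUNKS entries.
 * Returns the updated number of entries in @out.
 */
static size_t pick_spans(const bool *chunks, unsigned int start,
			 unsigned int len, unsigned int max_none,
			 struct span *out, size_t n)
{
	unsigned int set, allowed_none;

	assert(len > 0 && start + len <= NR_CHUNKS);
	set = count_set(chunks, start, len);
	allowed_none = max_none * len / NR_CHUNKS;

	if (set > 0 && len - set <= allowed_none) {
		out[n].start = start;
		out[n].len = len;
		return n + 1;
	}
	if (len == 1 || set == 0)
		return n;	/* nothing usable in this span */

	n = pick_spans(chunks, start, len / 2, max_none, out, n);
	return pick_spans(chunks, start + len / 2, len / 2, max_none, out, n);
}
```

With only the first 64 chunks utilized and max_none=0, the sketch picks a
single 64-chunk span at offset 0. Raising max_none to 64 makes the whole
128-chunk range eligible in one go.
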
> >>
> >> [1] - https://gitlab.com/npache/khugepaged_mthp_test
> >> [2] - https://lore.kernel.org/all/20250428181218.85925-1-npache@redhat.com/
> >>
> >> Dev Jain (1):
> >>   khugepaged: generalize alloc_charge_folio()
> >>
> >> Nico Pache (11):
> >>   khugepaged: rename hpage_collapse_* to khugepaged_*
> >>   introduce khugepaged_collapse_single_pmd to unify khugepaged and
> >>     madvise_collapse
> >>   khugepaged: generalize hugepage_vma_revalidate for mTHP support
> >>   khugepaged: generalize __collapse_huge_page_* for mTHP support
> >>   khugepaged: introduce khugepaged_scan_bitmap for mTHP support
> >>   khugepaged: add mTHP support
> >>   khugepaged: skip collapsing mTHP to smaller orders
> >>   khugepaged: avoid unnecessary mTHP collapse attempts
> >>   khugepaged: improve tracepoints for mTHP orders
> >>   khugepaged: add per-order mTHP khugepaged stats
> >>   Documentation: mm: update the admin guide for mTHP collapse
> >>
> >>  Documentation/admin-guide/mm/transhuge.rst |  14 +-
> >>  include/linux/huge_mm.h                    |   5 +
> >>  include/linux/khugepaged.h                 |   4 +
> >>  include/trace/events/huge_memory.h         |  34 +-
> >>  mm/huge_memory.c                           |  11 +
> >>  mm/khugepaged.c                            | 472 ++++++++++++++-------
> >>  6 files changed, 382 insertions(+), 158 deletions(-)
> >>
> >> --
> >> 2.49.0
> >>
> >
>