Message-ID: <3e3252c2-65fc-45aa-99a1-ed66c31aba12@arm.com>
Date: Thu, 9 Jan 2025 11:52:22 +0530
Subject: Re: [RFC 00/11] khugepaged: mTHP support
From: Dev Jain
To: Nico Pache, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz,
 srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com,
 aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com,
 ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
 jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
 zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com,
 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, raquini@redhat.com,
 sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com,
 akpm@linux-foundation.org
References: <20250108233128.14484-1-npache@redhat.com>
In-Reply-To: <20250108233128.14484-1-npache@redhat.com>

On 09/01/25 5:01 am, Nico Pache wrote:
> The following series provides khugepaged and
> madvise collapse with the
> capability to collapse regions to mTHPs.
>
> To achieve this we generalize the khugepaged functions to no longer depend
> on PMD_ORDER. Then during the PMD scan, we keep track of chunks of pages
> (defined by MTHP_MIN_ORDER) that are fully utilized. This info is tracked
> using a bitmap. After the PMD scan is done, we do binary recursion on the
> bitmap to find the optimal mTHP sizes for the PMD range. The restriction
> on max_ptes_none is removed during the scan, to make sure we account for
> the whole PMD range. max_ptes_none is mapped to a 0-100 range to
> determine how full a mTHP order needs to be before collapsing it.
>
> Some design choices to note:
> - bitmap structures are allocated dynamically because on some arches
>   (like PowerPC) the value of MTHP_BITMAP_SIZE cannot be computed at
>   compile time, leading to warnings.
> - The recursion is masked through a stack structure.
> - A MTHP_MIN_ORDER was added to compress the bitmap, and ensure it was
>   64bit on x86. This provides some optimization on the bitmap operations.
>   If other arches/configs that have larger than 512 PTEs per PMD want to
>   compress their bitmap further we can change this value per arch.
>
> Patch 1-2: Some refactoring to combine madvise_collapse and khugepaged
> Patch 3: A minor "fix"/optimization
> Patch 4: Refactor/rename hpage_collapse
> Patch 5-7: Generalize khugepaged functions for arbitrary orders
> Patch 8-11: The mTHP patches
>
> This series acts as an alternative to Dev Jain's approach [1]. The two
> series differ in a few ways:
> - My approach uses a bitmap to store the state of the linear scan_pmd to
>   then determine potential mTHP batches. Dev incorporates his directly
>   into the scan, and will try each available order.
> - Dev is attempting to optimize the locking, while my approach keeps the
>   locking changes to a minimum. I believe his changes are not safe for
>   uffd.
> - Dev's changes only work for khugepaged not madvise_collapse (although
>   I think that was by choice and it could easily support madvise)
> - Dev scales all khugepaged sysfs tunables by order, while I'm removing
>   the restriction of max_ptes_none and converting it to a scale to
>   determine a (m)THP threshold.
> - Dev turns on khugepaged if any order is available while mine still
>   only runs if PMDs are enabled. I like Dev's approach and will most
>   likely do the same in my PATCH posting.
> - mTHPs need their ref count updated to 1<
>
> Patch 11 was inspired by one of Dev's changes.
>
> [1] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/
>
> Nico Pache (11):
>   introduce khugepaged_collapse_single_pmd to collapse a single pmd
>   khugepaged: refactor madvise_collapse and khugepaged_scan_mm_slot
>   khugepaged: Don't allocate khugepaged mm_slot early
>   khugepaged: rename hpage_collapse_* to khugepaged_*
>   khugepaged: generalize hugepage_vma_revalidate for mTHP support
>   khugepaged: generalize alloc_charge_folio for mTHP support
>   khugepaged: generalize __collapse_huge_page_* for mTHP support
>   khugepaged: introduce khugepaged_scan_bitmap for mTHP support
>   khugepaged: add mTHP support
>   khugepaged: remove max_ptes_none restriction on the pmd scan
>   khugepaged: skip collapsing mTHP to smaller orders
>
>  include/linux/khugepaged.h |   4 +-
>  mm/huge_memory.c           |   3 +-
>  mm/khugepaged.c            | 436 +++++++++++++++++++++++++------------
>  3 files changed, 306 insertions(+), 137 deletions(-)

Before I take a proper look at your series, can you please include any
testing you may have done?
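
For readers following the quoted cover letter, the bitmap pass it describes
(one bit per MTHP_MIN_ORDER chunk, binary recursion unrolled onto a stack, a
0-100 fullness threshold) might look roughly like the userspace sketch below.
This is not code from the series. The values PMD_ORDER = 9 and
MTHP_MIN_ORDER = 3, the names pick_orders(), bits_set() and
struct collapse_range, and the exact threshold comparison are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 512 PTEs per PMD, one bitmap bit per 8-PTE chunk. */
#define PMD_ORDER      9
#define MTHP_MIN_ORDER 3
#define MAX_RANGES     (1u << (PMD_ORDER - MTHP_MIN_ORDER)) /* 64 bits */

/* Hypothetical result type: one candidate collapse per entry. */
struct collapse_range {
	unsigned order;      /* order of the candidate (m)THP */
	unsigned pte_offset; /* first PTE index within the PMD range */
};

/* Count the set bits in bitmap[start, start + nbits). */
static unsigned bits_set(uint64_t bitmap, unsigned start, unsigned nbits)
{
	unsigned n = 0;

	for (unsigned i = start; i < start + nbits; i++)
		n += (unsigned)((bitmap >> i) & 1);
	return n;
}

/*
 * Walk the bitmap top-down. A region is accepted at its current order when
 * at least threshold_pct percent of its chunks are marked utilized;
 * otherwise it is split in half and both halves are retried one order lower.
 * The recursion is replaced by an explicit stack.
 */
static size_t pick_orders(uint64_t bitmap, unsigned threshold_pct,
			  struct collapse_range out[MAX_RANGES])
{
	struct { unsigned order, bit; } stack[16];
	size_t top = 0, n = 0;

	assert(threshold_pct <= 100);

	stack[top].order = PMD_ORDER;
	stack[top].bit = 0;
	top++;

	while (top > 0) {
		top--;
		unsigned order = stack[top].order;
		unsigned bit = stack[top].bit;
		unsigned nbits = 1u << (order - MTHP_MIN_ORDER);
		unsigned set = bits_set(bitmap, bit, nbits);

		if (set == 0)
			continue; /* nothing utilized in this region */

		if (set * 100 >= threshold_pct * nbits) {
			out[n].order = order;
			out[n].pte_offset = bit << MTHP_MIN_ORDER;
			n++;
		} else if (order > MTHP_MIN_ORDER) {
			/* Push the upper half first so the lower half pops first. */
			stack[top].order = order - 1;
			stack[top].bit = bit + nbits / 2;
			top++;
			stack[top].order = order - 1;
			stack[top].bit = bit;
			top++;
		}
	}
	return n;
}
```

With these assumptions, a bitmap whose low 32 bits are set yields a single
order-8 candidate at PTE offset 0 when the threshold is 100, and a single
order-9 candidate when the threshold is 50.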