From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nico Pache <npache@redhat.com>
Date: Mon, 13 Jan 2025 15:25:30 -0700
Subject: Re: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
To: Dev Jain
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, ryan.roberts@arm.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org,
 vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
 hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
 peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
 ziy@nvidia.com, jglisse@google.com, surenb@google.com,
 vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
 jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com, david@redhat.com, aarcange@redhat.com,
 raquini@redhat.com, sunnanyong@huawei.com, usamaarif642@gmail.com,
 audra@redhat.com, akpm@linux-foundation.org
In-Reply-To: <5d073d55-5495-4b42-b8f1-f8b4a2382a65@arm.com>
References: <20250108233128.14484-1-npache@redhat.com>
 <20250108233128.14484-9-npache@redhat.com>
 <5d073d55-5495-4b42-b8f1-f8b4a2382a65@arm.com>
Content-Type: text/plain; charset="UTF-8"

On Sun, Jan 12, 2025 at 4:23 AM Dev Jain wrote:
>
>
> On 11/01/25 3:18 am, Nico Pache wrote:
> > On Fri, Jan 10, 2025 at 2:06 AM Dev Jain wrote:
> >>
> >>
> >> On 09/01/25 5:01 am, Nico Pache wrote:
> >>> khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> >>> mTHP support we use this scan to instead record chunks of fully utilized
> >>> sections of the PMD.
> >>>
> >>> create a bitmap to represent a PMD in order MTHP_MIN_ORDER chunks.
> >>> by default we will set this to order 3. The reasoning is that for 4K 512
> >>> PMD size this results in a 64 bit bitmap which has some optimizations.
> >>> For other arches like ARM64 64K, we can set a larger order if needed.
> >>>
> >>> khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
> >>> that represents chunks of fully utilized regions. We can then determine
> >>> what mTHP size fits best and in the following patch, we set this bitmap
> >>> while scanning the PMD.
> >>>
> >>> max_ptes_none is used as a scale to determine how "full" an order must
> >>> be before being considered for collapse.
> >>>
> >>> Signed-off-by: Nico Pache <npache@redhat.com>
> >>> ---
> >>>  include/linux/khugepaged.h |   4 +-
> >>>  mm/khugepaged.c            | 129 +++++++++++++++++++++++++++++++++++--
> >>>  2 files changed, 126 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> >>> index 1f46046080f5..31cff8aeec4a 100644
> >>> --- a/include/linux/khugepaged.h
> >>> +++ b/include/linux/khugepaged.h
> >>> @@ -1,7 +1,9 @@
> >>>  /* SPDX-License-Identifier: GPL-2.0 */
> >>>  #ifndef _LINUX_KHUGEPAGED_H
> >>>  #define _LINUX_KHUGEPAGED_H
> >>> -
> >>
> >> Nit: I don't think this line needs to be deleted.
> >>
> >>> +#define MIN_MTHP_ORDER 3
> >>> +#define MIN_MTHP_NR (1<<MIN_MTHP_ORDER)
> >>
> >> Nit: Insert a space: (1 << MIN_MTHP_ORDER)
> >>
> >>> +#define MTHP_BITMAP_SIZE (1<<(HPAGE_PMD_ORDER - MIN_MTHP_ORDER))
> >>>  extern unsigned int khugepaged_max_ptes_none __read_mostly;
> >>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>  extern struct attribute_group khugepaged_attr_group;
> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index 9eb161b04ee4..de1dc6ea3c71 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -94,6 +94,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
> >>>
> >>>  static struct kmem_cache *mm_slot_cache __ro_after_init;
> >>>
> >>> +struct scan_bit_state {
> >>> +	u8 order;
> >>> +	u8 offset;
> >>> +};
> >>> +
> >>>  struct collapse_control {
> >>>  	bool is_khugepaged;
> >>>
> >>> @@ -102,6 +107,15 @@ struct collapse_control {
> >>>
> >>>  	/* nodemask for allocation fallback */
> >>>  	nodemask_t alloc_nmask;
> >>> +
> >>> +	/* bitmap used to collapse mTHP sizes. 1bit = order MIN_MTHP_ORDER mTHP */
> >>> +	unsigned long *mthp_bitmap;
> >>> +	unsigned long *mthp_bitmap_temp;
> >>> +	struct scan_bit_state *mthp_bitmap_stack;
> >>> +};
> >>> +
> >>> +struct collapse_control khugepaged_collapse_control = {
> >>> +	.is_khugepaged = true,
> >>> +};
> >>>
> >>>  /**
> >>> @@ -389,6 +403,25 @@ int __init khugepaged_init(void)
> >>>  	if (!mm_slot_cache)
> >>>  		return -ENOMEM;
> >>>
> >>> +	/*
> >>> +	 * allocate the bitmaps dynamically since MTHP_BITMAP_SIZE is not known at
> >>> +	 * compile time for some architectures.
> >>> +	 */
> >>> +	khugepaged_collapse_control.mthp_bitmap = kmalloc_array(
> >>> +		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
> >>> +	if (!khugepaged_collapse_control.mthp_bitmap)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	khugepaged_collapse_control.mthp_bitmap_temp = kmalloc_array(
> >>> +		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
> >>> +	if (!khugepaged_collapse_control.mthp_bitmap_temp)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	khugepaged_collapse_control.mthp_bitmap_stack = kmalloc_array(
> >>> +		MTHP_BITMAP_SIZE, sizeof(struct scan_bit_state), GFP_KERNEL);
> >>> +	if (!khugepaged_collapse_control.mthp_bitmap_stack)
> >>> +		return -ENOMEM;
> >>> +
> >>>  	khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
> >>>  	khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
> >>>  	khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
> >>> @@ -400,6 +433,9 @@ int __init khugepaged_init(void)
> >>>  void __init khugepaged_destroy(void)
> >>>  {
> >>>  	kmem_cache_destroy(mm_slot_cache);
> >>> +	kfree(khugepaged_collapse_control.mthp_bitmap);
> >>> +	kfree(khugepaged_collapse_control.mthp_bitmap_temp);
> >>> +	kfree(khugepaged_collapse_control.mthp_bitmap_stack);
> >>>  }
> >>>
> >>>  static inline int khugepaged_test_exit(struct mm_struct *mm)
> >>> @@ -850,10 +886,6 @@ static void khugepaged_alloc_sleep(void)
> >>>  	remove_wait_queue(&khugepaged_wait, &wait);
> >>>  }
> >>>
> >>> -struct collapse_control khugepaged_collapse_control = {
> >>> -	.is_khugepaged = true,
> >>> -};
> >>> -
> >>>  static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
> >>>  {
> >>>  	int i;
> >>> @@ -1102,7 +1134,8 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
> >>>
> >>>  static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >>>  			      int referenced, int unmapped,
> >>> -			      struct collapse_control *cc)
> >>> +			      struct collapse_control *cc, bool *mmap_locked,
> >>> +			      int order, int offset)
> >>>  {
> >>>  	LIST_HEAD(compound_pagelist);
> >>>  	pmd_t *pmd, _pmd;
> >>> @@ -1115,6 +1148,11 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >>>  	struct mmu_notifier_range range;
> >>>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >>>
> >>> +	/* if collapsing mTHPs we may have already released the read_lock, and
> >>> +	 * need to reaquire it to keep the proper locking order.
> >>> +	 */
> >>> +	if (!*mmap_locked)
> >>> +		mmap_read_lock(mm);
> >>
> >> There is no need to take the read lock again, because we drop it just
> >> after this.
> >
> > collapse_huge_page expects the mmap_lock to already be taken, and it
> > returns with it unlocked. If we are collapsing multiple mTHPs under
> > the same PMD, then I think we need to reacquire the lock before
> > calling unlock on it.
>
> I cannot figure out a potential place where we drop the lock before
> entering collapse_huge_page(). In any case, wouldn't this be better:

Let's say we are collapsing two 1024kB mTHPs in a single PMD region. We
call collapse_huge_page on the first mTHP, and during that collapse the
lock is dropped. When the second mTHP collapse is attempted, the lock
has already been dropped.

> if (*mmap_locked)
>         mmap_read_unlock(mm);
>
> Basically, instead of putting the if condition around the lock, you do
> it around the unlock?

Yeah, that seems much cleaner. I'll give it a try, thanks!

> >
> >>
> >>>  	/*
> >>>  	 * Before allocating the hugepage, release the mmap_lock read lock.
> >>>  	 * The allocation can take potentially a long time if it involves
> >>> @@ -1122,6 +1160,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >>>  	 * that. We will recheck the vma after taking it again in write mode.
> >>>  	 */
> >>>  	mmap_read_unlock(mm);
> >>> +	*mmap_locked = false;
> >>>
> >>>  	result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
> >>>  	if (result != SCAN_SUCCEED)
> >>> @@ -1256,12 +1295,71 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >>>  out_up_write:
> >>>  	mmap_write_unlock(mm);
> >>>  out_nolock:
> >>> +	*mmap_locked = false;
> >>>  	if (folio)
> >>>  		folio_put(folio);
> >>>  	trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
> >>>  	return result;
> >>>  }
> >>>
> >>> +// Recursive function to consume the bitmap
> >>> +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> >>> +			int referenced, int unmapped, struct collapse_control *cc,
> >>> +			bool *mmap_locked, unsigned long enabled_orders)
> >>> +{
> >>> +	u8 order, offset;
> >>> +	int num_chunks;
> >>> +	int bits_set, max_percent, threshold_bits;
> >>> +	int next_order, mid_offset;
> >>> +	int top = -1;
> >>> +	int collapsed = 0;
> >>> +	int ret;
> >>> +	struct scan_bit_state state;
> >>> +
> >>> +	cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> >>> +		{ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> >>> +
> >>> +	while (top >= 0) {
> >>> +		state = cc->mthp_bitmap_stack[top--];
> >>> +		order = state.order;
> >>> +		offset = state.offset;
> >>> +		num_chunks = 1 << order;
> >>> +		// Skip mTHP orders that are not enabled
> >>> +		if (!(enabled_orders >> (order + MIN_MTHP_ORDER)) & 1)
> >>> +			goto next;
> >>> +
> >>> +		// copy the relavant section to a new bitmap
> >>> +		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> >>> +				   MTHP_BITMAP_SIZE);
> >>> +
> >>> +		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> >>> +
> >>> +		// Check if the region is "almost full" based on the threshold
> >>> +		max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> >>> +				/ (HPAGE_PMD_NR - 1);
> >>> +		threshold_bits = (max_percent * num_chunks) / 100;
> >>> +
> >>> +		if (bits_set >= threshold_bits) {
> >>> +			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> >>> +					mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> >>> +			if (ret == SCAN_SUCCEED)
> >>> +				collapsed += (1 << (order + MIN_MTHP_ORDER));
> >>> +			continue;
> >>> +		}
> >>> +
> >>> +next:
> >>> +		if (order > 0) {
> >>> +			next_order = order - 1;
> >>> +			mid_offset = offset + (num_chunks / 2);
> >>> +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> >>> +				{ next_order, mid_offset };
> >>> +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> >>> +				{ next_order, offset };
> >>> +		}
> >>> +	}
> >>> +	return collapsed;
> >>> +}
> >>> +
> >>>  static int khugepaged_scan_pmd(struct mm_struct *mm,
> >>>  			       struct vm_area_struct *vma,
> >>>  			       unsigned long address, bool *mmap_locked,
> >>> @@ -1430,7 +1528,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
> >>>  	pte_unmap_unlock(pte, ptl);
> >>>  	if (result == SCAN_SUCCEED) {
> >>>  		result = collapse_huge_page(mm, address, referenced,
> >>> -					    unmapped, cc);
> >>> +					    unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
> >>>  		/* collapse_huge_page will return with the mmap_lock released */
> >>>  		*mmap_locked = false;
> >>>  	}
> >>> @@ -2767,6 +2865,21 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> >>>  		return -ENOMEM;
> >>>  	cc->is_khugepaged = false;
> >>>
> >>> +	cc->mthp_bitmap = kmalloc_array(
> >>> +		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
> >>> +	if (!cc->mthp_bitmap)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	cc->mthp_bitmap_temp = kmalloc_array(
> >>> +		BITS_TO_LONGS(MTHP_BITMAP_SIZE), sizeof(unsigned long), GFP_KERNEL);
> >>> +	if (!cc->mthp_bitmap_temp)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	cc->mthp_bitmap_stack = kmalloc_array(
> >>> +		MTHP_BITMAP_SIZE, sizeof(struct scan_bit_state), GFP_KERNEL);
> >>> +	if (!cc->mthp_bitmap_stack)
> >>> +		return -ENOMEM;
> >>> +
> >>>  	mmgrab(mm);
> >>>  	lru_add_drain_all();
> >>>
> >>> @@ -2831,8 +2944,12 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> >>>  out_nolock:
> >>>  	mmap_assert_locked(mm);
> >>>  	mmdrop(mm);
> >>> +	kfree(cc->mthp_bitmap);
> >>> +	kfree(cc->mthp_bitmap_temp);
> >>> +	kfree(cc->mthp_bitmap_stack);
> >>>  	kfree(cc);
> >>>
> >>> +
> >>>  	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
> >>>  	       : madvise_collapse_errno(last_fail);
> >>>  }
> >>
> >
>