Message-ID: <27ae4d80-38cd-4d6b-a49c-dad3f0ffbde3@arm.com>
Date: Fri, 10 Jan 2025 20:24:25 +0530
Subject: Re: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
From: Dev Jain
To: Nico Pache, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz,
 srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com,
 aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com,
 ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
 jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
 zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com,
 21cnbao@gmail.com, willy@infradead.org, kirill.shutemov@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, raquini@redhat.com,
 sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com,
 akpm@linux-foundation.org
References: <20250108233128.14484-1-npache@redhat.com> <20250108233128.14484-9-npache@redhat.com>
In-Reply-To: <20250108233128.14484-9-npache@redhat.com>

On 09/01/25
5:01 am, Nico Pache wrote:
> khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> mTHP support we use this scan to instead record chunks of fully utilized
> sections of the PMD.
>
> create a bitmap to represent a PMD in order MTHP_MIN_ORDER chunks.
> by default we will set this to order 3. The reasoning is that for 4K 512
> PMD size this results in a 64 bit bitmap which has some optimizations.
> For other arches like ARM64 64K, we can set a larger order if needed.
>
> khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
> that represents chunks of fully utilized regions. We can then determine
> what mTHP size fits best and in the following patch, we set this bitmap
> while scanning the PMD.
>
> max_ptes_none is used as a scale to determine how "full" an order must
> be before being considered for collapse.
>
> Signed-off-by: Nico Pache
> ---
>  include/linux/khugepaged.h |   4 +-
>  mm/khugepaged.c            | 129 +++++++++++++++++++++++++++++++++++--
>  2 files changed, 126 insertions(+), 7 deletions(-)
>
[--snip--]
>
> +// Recursive function to consume the bitmap
> +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> +			int referenced, int unmapped, struct collapse_control *cc,
> +			bool *mmap_locked, unsigned long enabled_orders)
> +{
> +	u8 order, offset;
> +	int num_chunks;
> +	int bits_set, max_percent, threshold_bits;
> +	int next_order, mid_offset;
> +	int top = -1;
> +	int collapsed = 0;
> +	int ret;
> +	struct scan_bit_state state;
> +
> +	cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> +		{ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> +
> +	while (top >= 0) {
> +		state = cc->mthp_bitmap_stack[top--];
> +		order = state.order;
> +		offset = state.offset;
> +		num_chunks = 1 << order;
> +		// Skip mTHP orders that are not enabled
> +		if (!(enabled_orders >> (order + MIN_MTHP_ORDER)) & 1)
> +			goto next;
> +
> +		// copy the relavant section to a new bitmap
> +		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> +				MTHP_BITMAP_SIZE);
> +
> +		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> +
> +		// Check if the region is "almost full" based on the threshold
> +		max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> +				/ (HPAGE_PMD_NR - 1);
> +		threshold_bits = (max_percent * num_chunks) / 100;
> +
> +		if (bits_set >= threshold_bits) {
> +			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> +					mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> +			if (ret == SCAN_SUCCEED)
> +				collapsed += (1 << (order + MIN_MTHP_ORDER));
> +			continue;
> +		}

We descend to the lower order only when the current order is not in the
allowed mask of orders, or when we are below the threshold. What should
happen when neither condition holds and the collapse fails inside
collapse_huge_page()? For example, if you start with a PMD-order scan and
collapse_huge_page() fails, you hit "continue" and then exit the loop
because there is nothing else on the stack, so we exit without trying
any mTHP orders.

> +
> +next:
> +		if (order > 0) {
> +			next_order = order - 1;
> +			mid_offset = offset + (num_chunks / 2);
> +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> +				{ next_order, mid_offset };
> +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> +				{ next_order, offset };
> +		}
> +	}
> +	return collapsed;
> +}
> +
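
To make the question about the "continue" concrete, here is a small
userspace model of the loop. It is only a sketch under stated
assumptions, not a proposed fix: scan_bitmap(), try_collapse(),
fail_mask and threshold_percent are hypothetical names, the bitmap is a
plain 64-bit word (4K pages, MIN_MTHP_ORDER of 3), and a failed collapse
simply falls through to the lower orders. Whether retrying at a smaller
order is actually desirable probably depends on why
collapse_huge_page() failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; constants mirror the 4K-page case from the patch. */
#define PMD_ORDER      9                            /* 512 PTEs per PMD */
#define MIN_MTHP_ORDER 3                            /* one bit covers 8 PTEs */
#define TOP_ORDER      (PMD_ORDER - MIN_MTHP_ORDER) /* 2^6 = 64 bitmap bits */
#define STACK_MAX      16

struct scan_state {
	int order;   /* order in units of bitmap bits */
	int offset;  /* first bitmap bit of the region */
};

/* Hypothetical stand-in for collapse_huge_page(): any order whose bit is
 * set in fail_mask is made to fail. */
static bool try_collapse(int real_order, unsigned long fail_mask)
{
	return !((fail_mask >> real_order) & 1);
}

/* Portable population count of a 64-bit word. */
static int popcount64(uint64_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Returns the number of PTEs covered by successful collapses. */
static int scan_bitmap(uint64_t bitmap, unsigned long enabled_orders,
		       int threshold_percent, unsigned long fail_mask)
{
	struct scan_state stack[STACK_MAX];
	int top = -1, collapsed = 0;

	stack[++top] = (struct scan_state){ TOP_ORDER, 0 };

	while (top >= 0) {
		struct scan_state s = stack[top--];
		int num_chunks = 1 << s.order;
		int real_order = s.order + MIN_MTHP_ORDER;

		if ((enabled_orders >> real_order) & 1) {
			uint64_t mask = num_chunks == 64 ?
					~0ULL : ((1ULL << num_chunks) - 1);
			int bits_set = popcount64((bitmap >> s.offset) & mask);
			int threshold_bits = threshold_percent * num_chunks / 100;

			if (bits_set >= threshold_bits &&
			    try_collapse(real_order, fail_mask)) {
				collapsed += 1 << real_order;
				continue;
			}
			/* Assumption: a failed collapse also falls through,
			 * so the two halves are still examined below. */
		}

		if (s.order > 0) {
			assert(top + 2 < STACK_MAX);
			stack[++top] = (struct scan_state)
				{ s.order - 1, s.offset + num_chunks / 2 };
			stack[++top] = (struct scan_state)
				{ s.order - 1, s.offset };
		}
	}
	return collapsed;
}
```

With a fully set bitmap, orders 9 and 4 enabled, and the PMD-order
collapse forced to fail, this model still collapses all 512 PTEs as 32
order-4 regions. With an unconditional "continue" after the failed
attempt, the same input would leave the loop with nothing collapsed.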