From: Nico Pache <npache@redhat.com>
To: Dev Jain <dev.jain@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
ryan.roberts@arm.com, anshuman.khandual@arm.com,
catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz,
mhocko@suse.com, apopple@nvidia.com,
dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
hughd@google.com, aneesh.kumar@kernel.org,
yang@os.amperecomputing.com, peterx@redhat.com,
ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
zokeefe@google.com, zhengqi.arch@bytedance.com,
jhubbard@nvidia.com, 21cnbao@gmail.com, willy@infradead.org,
kirill.shutemov@linux.intel.com, david@redhat.com,
aarcange@redhat.com, raquini@redhat.com, sunnanyong@huawei.com,
usamaarif642@gmail.com, audra@redhat.com,
akpm@linux-foundation.org
Subject: Re: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
Date: Fri, 10 Jan 2025 14:48:38 -0700
Message-ID: <CAA1CXcD94JSfG+NZFEwrd+8rtwYW7=ANHjKT2WPpaBv4AuCabA@mail.gmail.com>
In-Reply-To: <27ae4d80-38cd-4d6b-a49c-dad3f0ffbde3@arm.com>
On Fri, Jan 10, 2025 at 7:54 AM Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 09/01/25 5:01 am, Nico Pache wrote:
> > khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> > mTHP support we use this scan to instead record chunks of fully utilized
> > sections of the PMD.
> >
> > Create a bitmap that represents a PMD in chunks of order MIN_MTHP_ORDER.
> > By default this is set to order 3. The reasoning is that with 4K pages
> > and a 512-entry PMD, this results in a 64-bit bitmap, which fits in a
> > single word on 64-bit architectures. For other configurations, such as
> > ARM64 with 64K pages, we can set a larger order if needed.
> >
> > khugepaged_scan_bitmap walks a bitmap that represents chunks of fully
> > utilized regions, using an explicit stack of scan_bit_state entries
> > instead of recursion. From this we can determine which mTHP size fits
> > best; the following patch sets the bitmap while scanning the PMD.
> >
> > max_ptes_none is used as a scale to determine how "full" an order must
> > be before being considered for collapse.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/khugepaged.h | 4 +-
> > mm/khugepaged.c | 129 +++++++++++++++++++++++++++++++++++--
> > 2 files changed, 126 insertions(+), 7 deletions(-)
> >
>
> [--snip--]
>
> >
> > +/* Consume the bitmap iteratively, using an explicit stack */
> > +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> > + int referenced, int unmapped, struct collapse_control *cc,
> > + bool *mmap_locked, unsigned long enabled_orders)
> > +{
> > + u8 order, offset;
> > + int num_chunks;
> > + int bits_set, max_percent, threshold_bits;
> > + int next_order, mid_offset;
> > + int top = -1;
> > + int collapsed = 0;
> > + int ret;
> > + struct scan_bit_state state;
> > +
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> > +
> > + while (top >= 0) {
> > + state = cc->mthp_bitmap_stack[top--];
> > + order = state.order;
> > + offset = state.offset;
> > + num_chunks = 1 << order;
> > + /* Skip mTHP orders that are not enabled */
> > + if (!((enabled_orders >> (order + MIN_MTHP_ORDER)) & 1))
> > + goto next;
> > +
> > + /* Copy the relevant section to a new bitmap */
> > + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> > + MTHP_BITMAP_SIZE);
> > +
> > + bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> > +
> > + // Check if the region is "almost full" based on the threshold
> > + max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> > + / (HPAGE_PMD_NR - 1);
> > + threshold_bits = (max_percent * num_chunks) / 100;
> > +
> > + if (bits_set >= threshold_bits) {
> > + ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> > + mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> > + if (ret == SCAN_SUCCEED)
> > + collapsed += (1 << (order + MIN_MTHP_ORDER));
> > + continue;
> > + }
>
> We are going to the lower order when it is not in the allowed mask of
> orders, or when we are below the threshold. What to do when these
> conditions do not happen, and the reason for collapse failure is
> collapse_huge_page()? For example, if you start with a PMD order scan,
> and collapse_huge_page() fails, then you hit "continue", and then exit
> the loop because there is nothing else in the stack, so we exit without
> trying mTHPs.
Thanks for catching that; I introduced that bug when I went from the
recursive to the stack-based approach.
This should only continue on SCAN_SUCCEED; otherwise it needs to fall
through to the next: label. I think I also need to handle the case where
nothing succeeds in khugepaged_scan_pmd.
>
> > +
> > +next:
> > + if (order > 0) {
> > + next_order = order - 1;
> > + mid_offset = offset + (num_chunks / 2);
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { next_order, mid_offset };
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { next_order, offset };
> > + }
> > + }
> > + return collapsed;
> > +}
> > +
>