linux-mm.kvack.org archive mirror
From: Nico Pache <npache@redhat.com>
To: Dev Jain <dev.jain@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	ryan.roberts@arm.com,  anshuman.khandual@arm.com,
	catalin.marinas@arm.com, cl@gentwo.org,  vbabka@suse.cz,
	mhocko@suse.com, apopple@nvidia.com,
	 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
	jack@suse.cz,  srivatsa@csail.mit.edu, haowenchao22@gmail.com,
	hughd@google.com,  aneesh.kumar@kernel.org,
	yang@os.amperecomputing.com, peterx@redhat.com,
	 ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
	 jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
	 zokeefe@google.com, zhengqi.arch@bytedance.com,
	jhubbard@nvidia.com,  21cnbao@gmail.com, willy@infradead.org,
	kirill.shutemov@linux.intel.com,  david@redhat.com,
	aarcange@redhat.com, raquini@redhat.com,  sunnanyong@huawei.com,
	usamaarif642@gmail.com, audra@redhat.com,
	 akpm@linux-foundation.org
Subject: Re: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
Date: Fri, 10 Jan 2025 14:48:38 -0700
Message-ID: <CAA1CXcD94JSfG+NZFEwrd+8rtwYW7=ANHjKT2WPpaBv4AuCabA@mail.gmail.com>
In-Reply-To: <27ae4d80-38cd-4d6b-a49c-dad3f0ffbde3@arm.com>

On Fri, Jan 10, 2025 at 7:54 AM Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 09/01/25 5:01 am, Nico Pache wrote:
> > khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> > mTHP support we use this scan to instead record chunks of fully utilized
> > sections of the PMD.
> >
> > Create a bitmap to represent a PMD in order MIN_MTHP_ORDER chunks.
> > By default we set this to order 3. The reasoning is that with 4K pages
> > a PMD covers 512 PTEs, so this results in a 64-bit bitmap, which allows
> > some optimizations. For other arches like ARM64 64K, we can set a larger
> > order if needed.
> >
> > khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
> > that represents chunks of fully utilized regions. We can then determine
> > what mTHP size fits best and in the following patch, we set this bitmap
> > while scanning the PMD.
> >
> > max_ptes_none is used as a scale to determine how "full" an order must
> > be before being considered for collapse.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >   include/linux/khugepaged.h |   4 +-
> >   mm/khugepaged.c            | 129 +++++++++++++++++++++++++++++++++++--
> >   2 files changed, 126 insertions(+), 7 deletions(-)
> >
>
> [--snip--]
>
> >
> > +// Recursive function to consume the bitmap
> > +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> > +                     int referenced, int unmapped, struct collapse_control *cc,
> > +                     bool *mmap_locked, unsigned long enabled_orders)
> > +{
> > +     u8 order, offset;
> > +     int num_chunks;
> > +     int bits_set, max_percent, threshold_bits;
> > +     int next_order, mid_offset;
> > +     int top = -1;
> > +     int collapsed = 0;
> > +     int ret;
> > +     struct scan_bit_state state;
> > +
> > +     cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +             { HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> > +
> > +     while (top >= 0) {
> > +             state = cc->mthp_bitmap_stack[top--];
> > +             order = state.order;
> > +             offset = state.offset;
> > +             num_chunks = 1 << order;
> > +             // Skip mTHP orders that are not enabled
> > +             if (!((enabled_orders >> (order + MIN_MTHP_ORDER)) & 1))
> > +                     goto next;
> > +
> > +             // copy the relevant section to a new bitmap
> > +             bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> > +                               MTHP_BITMAP_SIZE);
> > +
> > +             bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> > +
> > +             // Check if the region is "almost full" based on the threshold
> > +             max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> > +                                             / (HPAGE_PMD_NR - 1);
> > +             threshold_bits = (max_percent * num_chunks) / 100;
> > +
> > +             if (bits_set >= threshold_bits) {
> > +                     ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> > +                                     mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> > +                     if (ret == SCAN_SUCCEED)
> > +                             collapsed += (1 << (order + MIN_MTHP_ORDER));
> > +                     continue;
> > +             }
>
> We are going to the lower order when it is not in the allowed mask of
> orders, or when we are below the threshold. What to do when these
> conditions do not happen, and the reason for collapse failure is
> collapse_huge_page()? For example, if you start with a PMD order scan,
> and collapse_huge_page() fails, then you hit "continue", and then exit
> the loop because there is nothing else in the stack, so we exit without
> trying mTHPs.

Thanks for catching that. I introduced this bug when I converted the
recursive version to the stack-based approach.
The loop should only "continue" when collapse_huge_page() returns
SCAN_SUCCEED; on any other result it needs to fall through to the next:
label so that the smaller orders are still tried.

I think I also need to handle the case in khugepaged_scan_pmd where
nothing succeeds.
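
To make the intended control flow concrete, here is a rough user-space
sketch of the loop with the "continue" moved under the SCAN_SUCCEED check.
It is not the kernel code: mock_collapse(), scan_bitmap_sketch(), MAX_STACK
and the int-based struct layout are hypothetical stand-ins, the constants
assume 4K pages, and the enabled-order and threshold checks are left out so
that only the failure path is exercised.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel definitions; values assume 4K pages. */
#define HPAGE_PMD_ORDER 9
#define MIN_MTHP_ORDER  3
#define MAX_STACK       32
enum { SCAN_FAIL = 0, SCAN_SUCCEED = 1 };

/* Assumed layout; the real struct scan_bit_state may differ. */
struct scan_bit_state { int order; int offset; };

/* Mock of collapse_huge_page(): fails for any order above max_ok_order. */
static int mock_collapse(int order, int max_ok_order)
{
	return order <= max_ok_order ? SCAN_SUCCEED : SCAN_FAIL;
}

/* Returns the number of base pages collapsed, like the quoted function. */
static int scan_bitmap_sketch(int max_ok_order)
{
	struct scan_bit_state stack[MAX_STACK], state;
	int top = -1, collapsed = 0;

	stack[++top] = (struct scan_bit_state){ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };

	while (top >= 0) {
		state = stack[top--];
		int order = state.order, offset = state.offset;
		int num_chunks = 1 << order;

		/* Assume every region is enabled and above the threshold. */
		if (mock_collapse(order + MIN_MTHP_ORDER, max_ok_order) == SCAN_SUCCEED) {
			collapsed += 1 << (order + MIN_MTHP_ORDER);
			continue;	/* skip the split only when the collapse worked */
		}

		/* On failure, fall through and retry both halves one order lower. */
		if (order > 0) {
			assert(top + 2 < MAX_STACK);
			stack[++top] = (struct scan_bit_state){ order - 1, offset + num_chunks / 2 };
			stack[++top] = (struct scan_bit_state){ order - 1, offset };
		}
	}
	return collapsed;
}
```

With this shape, a failed PMD-order attempt (max_ok_order below 9 in the
mock) still descends to the smaller orders, and the function returns 0 only
when no order could be collapsed at all, which is the case the caller would
then have to handle.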


>
> > +
> > +next:
> > +             if (order > 0) {
> > +                     next_order = order - 1;
> > +                     mid_offset = offset + (num_chunks / 2);
> > +                     cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +                             { next_order, mid_offset };
> > +                     cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +                             { next_order, offset };
> > +             }
> > +     }
> > +     return collapsed;
> > +}
> > +
>
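
For reference, the bitmap sizing and the max_ptes_none scaling from the
quoted hunk can be worked through in user space. This is only an
illustrative sketch: HPAGE_PMD_NR = 512 assumes 4K pages, and the helper
names bitmap_bits() and threshold_bits() are made up for the example. The
arithmetic itself is copied from the patch.

```c
#include <assert.h>

/* Assumed values for 4K pages: a PMD maps 512 PTEs, chunks are order 3. */
#define HPAGE_PMD_NR   512
#define MIN_MTHP_ORDER 3

/* Number of bits needed to describe one PMD in MIN_MTHP_ORDER chunks. */
static int bitmap_bits(void)
{
	return HPAGE_PMD_NR >> MIN_MTHP_ORDER;
}

/*
 * Same integer arithmetic as the quoted hunk: scale max_ptes_none to a
 * percentage of the PMD, then apply it to the chunks in the region.
 */
static int threshold_bits(int max_ptes_none, int num_chunks)
{
	int max_percent;

	assert(max_ptes_none >= 0 && max_ptes_none < HPAGE_PMD_NR);
	max_percent = ((HPAGE_PMD_NR - max_ptes_none - 1) * 100) / (HPAGE_PMD_NR - 1);
	return (max_percent * num_chunks) / 100;
}
```

Under these assumptions bitmap_bits() is 64. With max_ptes_none = 511 the
threshold is 0 for every region size, so any region qualifies; with
max_ptes_none = 0 every chunk in the region must be set. Because of the
integer division, a single-chunk region (num_chunks = 1) gets a threshold
of 0 for any max_ptes_none other than 0, for example
threshold_bits(255, 1) == 0 while threshold_bits(255, 64) == 32.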



