From: Yang Shi <shy828301@gmail.com>
To: Yu Zhao <yuzhao@google.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
Jonathan Corbet <corbet@lwn.net>
Subject: Re: [Chapter Three] THP HVO: bring the hugeTLB feature to THP
Date: Thu, 29 Feb 2024 14:54:01 -0800
Message-ID: <CAHbLzkrmT7=HYimU8f0BcvsjQ=GM2bQdLGRohNeXcnCJoNzrCQ@mail.gmail.com>
In-Reply-To: <20240229183436.4110845-4-yuzhao@google.com>
On Thu, Feb 29, 2024 at 10:34 AM Yu Zhao <yuzhao@google.com> wrote:
>
> HVO can be one of the perks for heavy THP users like it is for hugeTLB
> users. For example, if such a user uses 60% of physical memory for 2MB
> THPs, THP HVO can reduce the struct page overhead by half (60% * 7/8
> ~= 50%).
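
(To spell out the arithmetic as I read it: with the usual 64-byte
struct page and 4KB base pages, struct pages cost 64/4096 ~= 1.6% of
memory. For a 2MB THP, HVO keeps 1 of its 8 vmemmap pages and frees
the other 7/8. So with 60% of memory in such THPs, the freed share of
the total struct page overhead is 0.6 * 7/8 = 52.5%, i.e., roughly
half.)
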
>
> ZONE_NOMERGE considerably simplifies the implementation of HVO for
> THPs, since THPs from it cannot be split or merged and thus do not
> require any correctness-related operations on tail pages beyond the
> second one.
>
> If a THP is mapped by PTEs, two optimization-related operations on its
> tail pages, i.e., _mapcount and PG_anon_exclusive, can be binned to
> track a group of pages, e.g., eight pages per group for 2MB THPs. The
> resulting estimation error, like the copying cost incurred during
> shattering, is also by design, since mapping by PTEs is another
> discouraged behavior.

I'm confused by this. Could you please elaborate a bit on how
_mapcount and PG_anon_exclusive are binned?

For _mapcount, IIUC, when inc'ing a subpage's mapcount you actually
inc the _mapcount of struct page (i / 8) (assuming the THP size is 2MB
and the base page size is 4KB, so 64 groups with 8 pages per group),
right? But then how can you tell whether each of the 8 pages in a
group has a mapcount of 1 or one page is mapped 8 times? Or does this
actually not matter, i.e., we don't even care to distinguish the two
cases?

For PG_anon_exclusive, if one page has it set, does that mean the
other 7 pages in the same group have it set too?
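
To make the question concrete, here is a minimal userspace sketch of
how I read the binning in hvo_map_range() (the names and the
standalone setup are mine, not from the patch; assuming a 2MB THP,
4KB base pages and a 64-byte struct page, so stride = 8):

#include <stdio.h>

#define NR_PAGES 512                  /* 2MB THP / 4KB base pages */
#define STRIDE   8                    /* hvo_order_size(9) / PAGE_SIZE */

/* one shared counter per group of STRIDE pages, as I read the patch */
static int bin_mapcount[NR_PAGES / STRIDE];

static void map_one(int i)
{
	/* page i contributes to bin i / STRIDE, like folio_page(folio, i / stride) */
	bin_mapcount[i / STRIDE]++;
}

int main(void)
{
	int i;

	/* case 1: pages 0..7 each PTE-mapped once */
	for (i = 0; i < 8; i++)
		map_one(i);
	printf("case 1: bin 0 = %d\n", bin_mapcount[0]);

	bin_mapcount[0] = 0;

	/* case 2: page 0 PTE-mapped 8 times */
	for (i = 0; i < 8; i++)
		map_one(0);
	printf("case 2: bin 0 = %d\n", bin_mapcount[0]);

	return 0;
}

Both cases leave bin 0 at 8, which is why I'm asking whether
distinguishing them simply doesn't matter for HVO THPs.
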
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
> include/linux/mm.h | 140 ++++++++++++++++++++++++++++++++++++++
> include/linux/mmzone.h | 1 +
> include/linux/rmap.h | 4 ++
> init/main.c | 1 +
> mm/gup.c | 3 +-
> mm/huge_memory.c | 2 +
> mm/hugetlb_vmemmap.c | 2 +-
> mm/internal.h | 9 ---
> mm/memory.c | 11 +--
> mm/page_alloc.c | 151 ++++++++++++++++++++++++++++++++++++++++-
> mm/rmap.c | 17 ++++-
> mm/vmstat.c | 2 +
> 12 files changed, 323 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f5a97dec5169..d7014fc35cca 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1196,6 +1196,138 @@ static inline void page_mapcount_reset(struct page *page)
> atomic_set(&(page)->_mapcount, -1);
> }
>
> +#define HVO_MOD (PAGE_SIZE / sizeof(struct page))
> +
> +static inline int hvo_order_size(int order)
> +{
> + if (PAGE_SIZE % sizeof(struct page) || !is_power_of_2(HVO_MOD))
> + return 0;
> +
> + return (1 << order) * sizeof(struct page);
> +}
> +
> +static inline bool page_hvo_suitable(struct page *head, int order)
> +{
> + VM_WARN_ON_ONCE_PAGE(!test_bit(PG_head, &head->flags), head);
> +
> + if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
> + return false;
> +
> + return page_zonenum(head) == ZONE_NOMERGE &&
> + IS_ALIGNED((unsigned long)head, PAGE_SIZE) &&
> + hvo_order_size(order) > PAGE_SIZE;
> +}
> +
> +static inline bool folio_hvo_suitable(struct folio *folio)
> +{
> + return folio_test_large(folio) && page_hvo_suitable(&folio->page, folio_order(folio));
> +}
> +
> +static inline bool page_is_hvo(struct page *head, int order)
> +{
> + return page_hvo_suitable(head, order) && test_bit(PG_head, &head[HVO_MOD].flags);
> +}
> +
> +static inline bool folio_is_hvo(struct folio *folio)
> +{
> + return folio_test_large(folio) && page_is_hvo(&folio->page, folio_order(folio));
> +}
> +
> +/*
> + * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
> + * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
> + * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
> + * leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
> + */
> +#define ENTIRELY_MAPPED 0x800000
> +#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)
> +
> +static inline bool hvo_range_mapcount(struct folio *folio, struct page *page, int nr_pages, int *ret)
> +{
> + int i, next, end;
> + int stride = hvo_order_size(folio_order(folio)) / PAGE_SIZE;
> +
> + if (!folio_is_hvo(folio))
> + return false;
> +
> + *ret = folio_entire_mapcount(folio);
> +
> + for (i = folio_page_idx(folio, page), end = i + nr_pages; i != end; i = next) {
> + next = min(end, round_down(i + stride, stride));
> +
> + page = folio_page(folio, i / stride);
> + *ret += atomic_read(&page->_mapcount) + 1;
> + }
> +
> + return true;
> +}
> +
> +static inline bool hvo_map_range(struct folio *folio, struct page *page, int nr_pages, int *ret)
> +{
> + int i, next, end;
> + int stride = hvo_order_size(folio_order(folio)) / PAGE_SIZE;
> +
> + if (!folio_is_hvo(folio))
> + return false;
> +
> + *ret = 0;
> +
> + for (i = folio_page_idx(folio, page), end = i + nr_pages; i != end; i = next) {
> + next = min(end, round_down(i + stride, stride));
> +
> + page = folio_page(folio, i / stride);
> + if (atomic_add_return(next - i, &page->_mapcount) == next - i - 1)
> + *ret += stride;
> + }
> +
> + if (atomic_add_return(*ret, &folio->_nr_pages_mapped) >= ENTIRELY_MAPPED)
> + *ret = 0;
> +
> + return true;
> +}
> +
> +static inline bool hvo_unmap_range(struct folio *folio, struct page *page, int nr_pages, int *ret)
> +{
> + int i, next, end;
> + int stride = hvo_order_size(folio_order(folio)) / PAGE_SIZE;
> +
> + if (!folio_is_hvo(folio))
> + return false;
> +
> + *ret = 0;
> +
> + for (i = folio_page_idx(folio, page), end = i + nr_pages; i != end; i = next) {
> + next = min(end, round_down(i + stride, stride));
> +
> + page = folio_page(folio, i / stride);
> + if (atomic_sub_return(next - i, &page->_mapcount) == -1)
> + *ret += stride;
> + }
> +
> + if (atomic_sub_return(*ret, &folio->_nr_pages_mapped) >= ENTIRELY_MAPPED)
> + *ret = 0;
> +
> + return true;
> +}
> +
> +static inline bool hvo_dup_range(struct folio *folio, struct page *page, int nr_pages)
> +{
> + int i, next, end;
> + int stride = hvo_order_size(folio_order(folio)) / PAGE_SIZE;
> +
> + if (!folio_is_hvo(folio))
> + return false;
> +
> + for (i = folio_page_idx(folio, page), end = i + nr_pages; i != end; i = next) {
> + next = min(end, round_down(i + stride, stride));
> +
> + page = folio_page(folio, i / stride);
> + atomic_add(next - i, &page->_mapcount);
> + }
> +
> + return true;
> +}
> +
> /**
> * page_mapcount() - Number of times this precise page is mapped.
> * @page: The page.
> @@ -1212,6 +1344,9 @@ static inline int page_mapcount(struct page *page)
> {
> int mapcount = atomic_read(&page->_mapcount) + 1;
>
> + if (hvo_range_mapcount(page_folio(page), page, 1, &mapcount))
> + return mapcount;
> +
> if (unlikely(PageCompound(page)))
> mapcount += folio_entire_mapcount(page_folio(page));
>
> @@ -3094,6 +3229,11 @@ static inline void pagetable_pud_dtor(struct ptdesc *ptdesc)
>
> extern void __init pagecache_init(void);
> extern void free_initmem(void);
> +extern void free_vmemmap(void);
> +extern int vmemmap_remap_free(unsigned long start, unsigned long end,
> + unsigned long reuse,
> + struct list_head *vmemmap_pages,
> + unsigned long flags);
>
> /*
> * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 532218167bba..00e4bb6c8533 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -916,6 +916,7 @@ struct zone {
> #ifdef CONFIG_CMA
> unsigned long cma_pages;
> #endif
> + atomic_long_t hvo_freed;
>
> const char *name;
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b7944a833668..d058c4cb3c96 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -322,6 +322,8 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
>
> switch (level) {
> case RMAP_LEVEL_PTE:
> + if (hvo_dup_range(folio, page, nr_pages))
> + break;
> do {
> atomic_inc(&page->_mapcount);
> } while (page++, --nr_pages > 0);
> @@ -401,6 +403,8 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
> if (PageAnonExclusive(page + i))
> return -EBUSY;
> }
> + if (hvo_dup_range(folio, page, nr_pages))
> + break;
> do {
> if (PageAnonExclusive(page))
> ClearPageAnonExclusive(page);
> diff --git a/init/main.c b/init/main.c
> index e24b0780fdff..74003495db32 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1448,6 +1448,7 @@ static int __ref kernel_init(void *unused)
> kgdb_free_init_mem();
> exit_boot_config();
> free_initmem();
> + free_vmemmap();
> mark_readonly();
>
> /*
> diff --git a/mm/gup.c b/mm/gup.c
> index df83182ec72d..f3df0078505b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -57,7 +57,7 @@ static inline void sanity_check_pinned_pages(struct page **pages,
> continue;
> if (!folio_test_large(folio) || folio_test_hugetlb(folio))
> VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
> - else
> + else if (!folio_is_hvo(folio) || !folio_nr_pages_mapped(folio))
> /* Either a PTE-mapped or a PMD-mapped THP. */
> VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page) &&
> !PageAnonExclusive(page), page);
> @@ -645,6 +645,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> }
>
> VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
> + !folio_is_hvo(page_folio(page)) &&
> !PageAnonExclusive(page), page);
>
> /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 62d2254bc51c..9e7e5d587a5c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2535,6 +2535,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> *
> * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
> */
> + if (folio_is_hvo(folio))
> + ClearPageAnonExclusive(page);
> anon_exclusive = PageAnonExclusive(page);
> if (freeze && anon_exclusive &&
> folio_try_share_anon_rmap_pmd(folio, page))
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index da177e49d956..9f43d900e83c 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -310,7 +310,7 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
> *
> * Return: %0 on success, negative error code otherwise.
> */
> -static int vmemmap_remap_free(unsigned long start, unsigned long end,
> +int vmemmap_remap_free(unsigned long start, unsigned long end,
> unsigned long reuse,
> struct list_head *vmemmap_pages,
> unsigned long flags)
> diff --git a/mm/internal.h b/mm/internal.h
> index ac1d27468899..871c6eeb78b8 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -52,15 +52,6 @@ struct folio_batch;
>
> void page_writeback_init(void);
>
> -/*
> - * If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
> - * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
> - * above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
> - * leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
> - */
> -#define ENTIRELY_MAPPED 0x800000
> -#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)
> -
> /*
> * Flags passed to __show_mem() and show_free_areas() to suppress output in
> * various contexts.
> diff --git a/mm/memory.c b/mm/memory.c
> index 0bfc8b007c01..db389f1d776d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3047,8 +3047,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf, struct folio *folio)
> VM_BUG_ON(!(vmf->flags & FAULT_FLAG_WRITE));
>
> if (folio) {
> - VM_BUG_ON(folio_test_anon(folio) &&
> - !PageAnonExclusive(vmf->page));
> + VM_BUG_ON_PAGE(folio_test_anon(folio) && !folio_is_hvo(folio) &&
> + !PageAnonExclusive(vmf->page), vmf->page);
> /*
> * Clear the folio's cpupid information as the existing
> * information potentially belongs to a now completely
> @@ -3502,7 +3502,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> */
> if (folio && folio_test_anon(folio) &&
> (PageAnonExclusive(vmf->page) || wp_can_reuse_anon_folio(folio, vma))) {
> - if (!PageAnonExclusive(vmf->page))
> + if (!folio_is_hvo(folio) && !PageAnonExclusive(vmf->page))
> SetPageAnonExclusive(vmf->page);
> if (unlikely(unshare)) {
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> @@ -4100,8 +4100,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> rmap_flags);
> }
>
> - VM_BUG_ON(!folio_test_anon(folio) ||
> - (pte_write(pte) && !PageAnonExclusive(page)));
> + VM_BUG_ON_PAGE(!folio_test_anon(folio) ||
> + (pte_write(pte) && !folio_is_hvo(folio) && !PageAnonExclusive(page)),
> + page);
> set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
> arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dd843fb04f78..5f8c6583a191 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -53,6 +53,7 @@
> #include <linux/khugepaged.h>
> #include <linux/delayacct.h>
> #include <linux/cacheinfo.h>
> +#include <linux/bootmem_info.h>
> #include <asm/div64.h>
> #include "internal.h"
> #include "shuffle.h"
> @@ -585,6 +586,10 @@ void prep_compound_page(struct page *page, unsigned int order)
> int nr_pages = 1 << order;
>
> __SetPageHead(page);
> +
> + if (page_is_hvo(page, order))
> + nr_pages = HVO_MOD;
> +
> for (i = 1; i < nr_pages; i++)
> prep_compound_tail(page, i);
>
> @@ -1124,10 +1129,15 @@ static __always_inline bool free_pages_prepare(struct page *page,
> */
> if (unlikely(order)) {
> int i;
> + int nr_pages = 1 << order;
>
> - if (compound)
> + if (compound) {
> + if (page_is_hvo(page, order))
> + nr_pages = HVO_MOD;
> page[1].flags &= ~PAGE_FLAGS_SECOND;
> - for (i = 1; i < (1 << order); i++) {
> + }
> +
> + for (i = 1; i < nr_pages; i++) {
> if (compound)
> bad += free_tail_page_prepare(page, page + i);
> if (is_check_pages_enabled()) {
> @@ -1547,6 +1557,141 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> page_table_check_alloc(page, order);
> }
>
> +static void prep_hvo_page(struct page *head, int order)
> +{
> + LIST_HEAD(list);
> + struct page *page, *next;
> + int freed = 0;
> + unsigned long start = (unsigned long)head;
> + unsigned long end = start + hvo_order_size(order);
> +
> + if (page_zonenum(head) != ZONE_NOMERGE)
> + return;
> +
> + if (WARN_ON_ONCE(order != page_zone(head)->order)) {
> + bad_page(head, "invalid page order");
> + return;
> + }
> +
> + if (!page_hvo_suitable(head, order) || page_is_hvo(head, order))
> + return;
> +
> + vmemmap_remap_free(start + PAGE_SIZE, end, start, &list, 0);
> +
> + list_for_each_entry_safe(page, next, &list, lru) {
> + if (PageReserved(page))
> + free_bootmem_page(page);
> + else
> + __free_page(page);
> + freed++;
> + }
> +
> + atomic_long_add(freed, &page_zone(head)->hvo_freed);
> +}
> +
> +static void prep_nomerge_zone(struct zone *zone, enum migratetype type)
> +{
> + int order;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&zone->lock, flags);
> +
> + for (order = MAX_PAGE_ORDER; order > zone->order; order--) {
> + struct page *page;
> + int split = 0;
> + struct free_area *area = zone->free_area + order;
> +
> + while ((page = get_page_from_free_area(area, type))) {
> + del_page_from_free_list(page, zone, order);
> + expand(zone, page, zone->order, order, type);
> + set_buddy_order(page, zone->order);
> + add_to_free_list(page, zone, zone->order, type);
> + split++;
> + }
> +
> + pr_info(" HVO: order %d split %d\n", order, split);
> + }
> +
> + spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
> +static void hvo_nomerge_zone(struct zone *zone, enum migratetype type)
> +{
> + LIST_HEAD(old);
> + LIST_HEAD(new);
> + int nomem, freed;
> + unsigned long flags;
> + struct list_head list;
> + struct page *page, *next;
> + struct free_area *area = zone->free_area + zone->order;
> +again:
> + nomem = freed = 0;
> + INIT_LIST_HEAD(&list);
> +
> + spin_lock_irqsave(&zone->lock, flags);
> + list_splice_init(area->free_list + type, &old);
> + spin_unlock_irqrestore(&zone->lock, flags);
> +
> + list_for_each_entry_safe(page, next, &old, buddy_list) {
> + unsigned long start = (unsigned long)page;
> + unsigned long end = start + hvo_order_size(zone->order);
> +
> + if (WARN_ON_ONCE(!IS_ALIGNED(start, PAGE_SIZE)))
> + continue;
> +
> + if (vmemmap_remap_free(start + PAGE_SIZE, end, start, &list, 0))
> + nomem++;
> + }
> +
> + list_for_each_entry_safe(page, next, &list, lru) {
> + if (PageReserved(page))
> + free_bootmem_page(page);
> + else
> + __free_page(page);
> + freed++;
> + }
> +
> + list_splice_init(&old, &new);
> + atomic_long_add(freed, &zone->hvo_freed);
> +
> + pr_info(" HVO: nomem %d freed %d\n", nomem, freed);
> +
> + if (!list_empty(area->free_list + type))
> + goto again;
> +
> + spin_lock_irqsave(&zone->lock, flags);
> + list_splice(&new, area->free_list + type);
> + spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
> +static bool zone_hvo_suitable(struct zone *zone)
> +{
> + if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
> + return false;
> +
> + return zone_idx(zone) == ZONE_NOMERGE && hvo_order_size(zone->order) > PAGE_SIZE;
> +}
> +
> +void free_vmemmap(void)
> +{
> + struct zone *zone;
> +
> + static_branch_inc(&hugetlb_optimize_vmemmap_key);
> +
> + for_each_populated_zone(zone) {
> + if (!zone_hvo_suitable(zone))
> + continue;
> +
> + pr_info("Freeing vmemmap of node %d zone %s\n",
> + zone_to_nid(zone), zone->name);
> +
> + prep_nomerge_zone(zone, MIGRATE_MOVABLE);
> + hvo_nomerge_zone(zone, MIGRATE_MOVABLE);
> +
> + cond_resched();
> + }
> +}
> +
> static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
> unsigned int alloc_flags)
> {
> @@ -1565,6 +1710,8 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
> set_page_pfmemalloc(page);
> else
> clear_page_pfmemalloc(page);
> +
> + prep_hvo_page(page, order);
> }
>
> /*
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0ddb28c52961..d339bf489230 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1143,6 +1143,10 @@ int folio_total_mapcount(struct folio *folio)
> /* In the common case, avoid the loop when no pages mapped by PTE */
> if (folio_nr_pages_mapped(folio) == 0)
> return mapcount;
> +
> + if (hvo_range_mapcount(folio, &folio->page, folio_nr_pages(folio), &mapcount))
> + return mapcount;
> +
> /*
> * Add all the PTE mappings of those pages mapped by PTE.
> * Limit the loop to folio_nr_pages_mapped()?
> @@ -1168,6 +1172,8 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
>
> switch (level) {
> case RMAP_LEVEL_PTE:
> + if (hvo_map_range(folio, page, nr_pages, &nr))
> + break;
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1314,6 +1320,8 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> if (flags & RMAP_EXCLUSIVE) {
> switch (level) {
> case RMAP_LEVEL_PTE:
> + if (folio_is_hvo(folio))
> + break;
> for (i = 0; i < nr_pages; i++)
> SetPageAnonExclusive(page + i);
> break;
> @@ -1421,6 +1429,9 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> } else if (!folio_test_pmd_mappable(folio)) {
> int i;
>
> + if (hvo_map_range(folio, &folio->page, nr, &nr))
> + goto done;
> +
> for (i = 0; i < nr; i++) {
> struct page *page = folio_page(folio, i);
>
> @@ -1437,7 +1448,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> SetPageAnonExclusive(&folio->page);
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
> }
> -
> +done:
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> @@ -1510,6 +1521,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>
> switch (level) {
> case RMAP_LEVEL_PTE:
> + if (hvo_unmap_range(folio, page, nr_pages, &nr))
> + break;
> do {
> last = atomic_add_negative(-1, &page->_mapcount);
> if (last && folio_test_large(folio)) {
> @@ -2212,7 +2225,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> break;
> }
> VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
> - !anon_exclusive, subpage);
> + !folio_is_hvo(folio) && !anon_exclusive, subpage);
>
> /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
> if (folio_test_hugetlb(folio)) {
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index ff2114452334..f51f3b872270 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1704,6 +1704,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> "\n present %lu"
> "\n managed %lu"
> "\n cma %lu"
> + "\n hvo freed %lu"
> "\n order %u",
> zone_page_state(zone, NR_FREE_PAGES),
> zone->watermark_boost,
> @@ -1714,6 +1715,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
> zone->present_pages,
> zone_managed_pages(zone),
> zone_cma_pages(zone),
> + atomic_long_read(&zone->hvo_freed),
> zone->order);
>
> seq_printf(m,
> --
> 2.44.0.rc1.240.g4c46232300-goog
>
>