From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <muchun.song@linux.dev>
Cc: Usama Arif <usama.arif@bytedance.com>,
Linux-MM <linux-mm@kvack.org>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Muchun Song <songmuchun@bytedance.com>,
fam.zheng@bytedance.com, liangma@liangbit.com,
punit.agrawal@bytedance.com
Subject: Re: [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
Date: Fri, 8 Sep 2023 11:29:50 -0700 [thread overview]
Message-ID: <20230908182950.GA6564@monkey> (raw)
In-Reply-To: <C51EF081-ABC9-4770-B329-2CAC2CE979FA@linux.dev>
On 09/08/23 10:39, Muchun Song wrote:
>
>
> > On Sep 8, 2023, at 02:37, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 09/06/23 12:26, Usama Arif wrote:
> >> The new boot flow when it comes to initialization of gigantic pages
> >> is as follows:
> >> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
> >> the region after the first struct page is marked as noinit.
> >> - This results in only the first struct page to be
> >> initialized in reserve_bootmem_region. As the tail struct pages are
> >> not initialized at this point, there can be a significant saving
> >> in boot time if HVO succeeds later on.
> >> - Later on in the boot, the head page is prepped and the first
> >> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
> >> are initialized.
> >> - HVO is attempted. If it is not successful, then the rest of the
> >> tail struct pages are initialized. If it is successful, no more
> >> tail struct pages need to be initialized saving significant boot time.
> >>
> >> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> >> ---
> >> mm/hugetlb.c | 61 +++++++++++++++++++++++++++++++++++++-------
> >> mm/hugetlb_vmemmap.c | 2 +-
> >> mm/hugetlb_vmemmap.h | 9 ++++---
> >> mm/internal.h | 3 +++
> >> mm/mm_init.c | 2 +-
> >> 5 files changed, 62 insertions(+), 15 deletions(-)
> >
> > As mentioned, in general this looks good. One small point below.
> >
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index c32ca241df4b..540e0386514e 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -3169,6 +3169,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
> >> }
> >>
> >> found:
> >> +
> >> + /*
> >> + * Only initialize the head struct page in memmap_init_reserved_pages,
> >> + * rest of the struct pages will be initialized by the HugeTLB subsystem itself.
> >> + * The head struct page is used to get folio information by the HugeTLB
> >> + * subsystem like zone id and node id.
> >> + */
> >> + memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
> >> + huge_page_size(h) - PAGE_SIZE);
> >> /* Put them into a private list first because mem_map is not up yet */
> >> INIT_LIST_HEAD(&m->list);
> >> list_add(&m->list, &huge_boot_pages);
> >> @@ -3176,6 +3185,40 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
> >> return 1;
> >> }
> >>
> >> +/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
> >> +static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> >> + unsigned long start_page_number,
> >> + unsigned long end_page_number)
> >> +{
> >> + enum zone_type zone = zone_idx(folio_zone(folio));
> >> + int nid = folio_nid(folio);
> >> + unsigned long head_pfn = folio_pfn(folio);
> >> + unsigned long pfn, end_pfn = head_pfn + end_page_number;
> >> +
> >> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> >> + struct page *page = pfn_to_page(pfn);
> >> +
> >> + __init_single_page(page, pfn, zone, nid);
> >> + prep_compound_tail((struct page *)folio, pfn - head_pfn);
> >> + set_page_count(page, 0);
> >> + }
> >> +}
> >> +
> >> +static void __init hugetlb_folio_init_vmemmap(struct folio *folio, struct hstate *h,
> >> + unsigned long nr_pages)
> >> +{
> >> + int ret;
> >> +
> >> + /* Prepare folio head */
> >> + __folio_clear_reserved(folio);
> >> + __folio_set_head(folio);
> >> + ret = page_ref_freeze(&folio->page, 1);
> >> + VM_BUG_ON(!ret);
> >
> > In the current code, we print a warning and free the associated pages to
> > buddy if we ever experience an increased ref count. The routine
> > hugetlb_folio_init_tail_vmemmap does not check for this.
> >
> > I do not believe speculative/temporary ref counts this early in the boot
> > process are possible. It would be great to get input from someone else.
>
> Yes, it is a very early stage and other tail struct pages haven't been
> initialized yet, anyone should not reference them. It it the same case
> as CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.
>
> >
> > When I wrote the existing code, it was fairly easy to WARN and continue
> > if we encountered an increased ref count. Things would be bit more
>
> In your case, I think it is not in the boot process, right?
They were calls in the same routine: gather_bootmem_prealloc().
> > complicated here. So, it may not be worth the effort.
>
> Agree. Note that tail struct pages are not initialized here, if we want to
> handle head page, how to handle tail pages? It really cannot resolved.
> We should make the same assumption as CONFIG_DEFERRED_STRUCT_PAGE_INIT
> that anyone should not reference those pages.
Agree that speculative refs should not happen this early. How about making
the following changes?
- Instead of set_page_count() in hugetlb_folio_init_tail_vmemmap, do a
page_ref_freeze and VM_BUG_ON if not ref_count != 1.
- In the commit message, mention 'The WARN_ON for increased ref count in
gather_bootmem_prealloc was changed to a VM_BUG_ON. This is OK as
there should be no speculative references this early in boot process.
The VM_BUG_ON's are there just in case such code is introduced.'
--
Mike Kravetz
next prev parent reply other threads:[~2023-09-08 18:54 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-06 11:26 [v4 0/4] " Usama Arif
2023-09-06 11:26 ` [v4 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-09-06 11:26 ` [v4 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
2023-09-06 11:26 ` [v4 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT flag Usama Arif
2023-09-06 11:35 ` Muchun Song
2023-09-06 12:01 ` Mike Rapoport
2023-09-06 11:26 ` [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
2023-09-06 18:10 ` Mike Kravetz
2023-09-06 21:27 ` [External] " Usama Arif
2023-09-06 21:59 ` Mike Kravetz
2023-09-07 10:14 ` Usama Arif
2023-09-07 18:24 ` Mike Kravetz
2023-09-07 18:37 ` Mike Kravetz
2023-09-08 2:39 ` Muchun Song
2023-09-08 18:29 ` Mike Kravetz [this message]
2023-09-08 20:48 ` [External] " Usama Arif
2023-09-22 14:42 ` [v4 0/4] " Pasha Tatashin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230908182950.GA6564@monkey \
--to=mike.kravetz@oracle.com \
--cc=fam.zheng@bytedance.com \
--cc=liangma@liangbit.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=muchun.song@linux.dev \
--cc=punit.agrawal@bytedance.com \
--cc=rppt@kernel.org \
--cc=songmuchun@bytedance.com \
--cc=usama.arif@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox