From: Usama Arif <usama.arif@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <muchun.song@linux.dev>
Cc: Linux-MM <linux-mm@kvack.org>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Muchun Song <songmuchun@bytedance.com>,
fam.zheng@bytedance.com, liangma@liangbit.com,
punit.agrawal@bytedance.com
Subject: Re: [External] Re: [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
Date: Fri, 8 Sep 2023 21:48:06 +0100 [thread overview]
Message-ID: <63b72487-6220-7c44-70fd-822681746737@bytedance.com> (raw)
In-Reply-To: <20230908182950.GA6564@monkey>
On 08/09/2023 19:29, Mike Kravetz wrote:
> On 09/08/23 10:39, Muchun Song wrote:
>>
>>
>>> On Sep 8, 2023, at 02:37, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>> On 09/06/23 12:26, Usama Arif wrote:
>>>> The new boot flow when it comes to initialization of gigantic pages
>>>> is as follows:
>>>> - At boot time, for a gigantic page during __alloc_bootmem_huge_page,
>>>> the region after the first struct page is marked as noinit.
>>>> - This results in only the first struct page being
>>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>>> not initialized at this point, there can be a significant saving
>>>> in boot time if HVO succeeds later on.
>>>> - Later on in the boot, the head page is prepped and the first
>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>> are initialized.
>>>> - HVO is attempted. If it is not successful, then the rest of the
>>>> tail struct pages are initialized. If it is successful, no more
>>>> tail struct pages need to be initialized, saving significant boot time.
>>>>
>>>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>>>> ---
>>>> mm/hugetlb.c | 61 +++++++++++++++++++++++++++++++++++++-------
>>>> mm/hugetlb_vmemmap.c | 2 +-
>>>> mm/hugetlb_vmemmap.h | 9 ++++---
>>>> mm/internal.h | 3 +++
>>>> mm/mm_init.c | 2 +-
>>>> 5 files changed, 62 insertions(+), 15 deletions(-)
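
For anyone skimming the thread, here is a quick, standalone illustration of
the savings the commit message above describes. The numbers are assumptions
for a typical x86-64 configuration (1 GiB gigantic page, 4 KiB base pages,
64-byte struct page, and HUGETLB_VMEMMAP_RESERVE_SIZE equal to one base page
of vmemmap); this is not part of the patch itself:

#include <stdio.h>

int main(void)
{
	unsigned long base_page = 4096;		/* assumed base page size */
	unsigned long struct_page_sz = 64;	/* assumed sizeof(struct page) */
	unsigned long gigantic = 1UL << 30;	/* 1 GiB gigantic page */

	unsigned long total_tails = gigantic / base_page - 1;		/* 262143 */
	unsigned long early_tails = base_page / struct_page_sz - 1;	/* 63 */

	printf("tail struct pages initialized before HVO: %lu\n", early_tails);
	printf("tail struct pages skipped if HVO succeeds: %lu\n",
	       total_tails - early_tails);
	return 0;
}

If HVO fails, the remaining ~262080 tail struct pages are initialized at that
point instead, so correctness is unchanged either way.
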
>>>
>>> As mentioned, in general this looks good. One small point below.
>>>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index c32ca241df4b..540e0386514e 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -3169,6 +3169,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>>> }
>>>>
>>>> found:
>>>> +
>>>> + /*
>>>> + * Only initialize the head struct page in memmap_init_reserved_pages,
>>>> + * rest of the struct pages will be initialized by the HugeTLB subsystem itself.
>>>> + * The head struct page is used to get folio information by the HugeTLB
>>>> + * subsystem like zone id and node id.
>>>> + */
>>>> + memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
>>>> + huge_page_size(h) - PAGE_SIZE);
>>>> /* Put them into a private list first because mem_map is not up yet */
>>>> INIT_LIST_HEAD(&m->list);
>>>> list_add(&m->list, &huge_boot_pages);
>>>> @@ -3176,6 +3185,40 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>>> return 1;
>>>> }
>>>>
>>>> +/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
>>>> +static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>>>> + unsigned long start_page_number,
>>>> + unsigned long end_page_number)
>>>> +{
>>>> + enum zone_type zone = zone_idx(folio_zone(folio));
>>>> + int nid = folio_nid(folio);
>>>> + unsigned long head_pfn = folio_pfn(folio);
>>>> + unsigned long pfn, end_pfn = head_pfn + end_page_number;
>>>> +
>>>> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>>>> + struct page *page = pfn_to_page(pfn);
>>>> +
>>>> + __init_single_page(page, pfn, zone, nid);
>>>> + prep_compound_tail((struct page *)folio, pfn - head_pfn);
>>>> + set_page_count(page, 0);
>>>> + }
>>>> +}
>>>> +
>>>> +static void __init hugetlb_folio_init_vmemmap(struct folio *folio, struct hstate *h,
>>>> + unsigned long nr_pages)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + /* Prepare folio head */
>>>> + __folio_clear_reserved(folio);
>>>> + __folio_set_head(folio);
>>>> + ret = page_ref_freeze(&folio->page, 1);
>>>> + VM_BUG_ON(!ret);
>>>
>>> In the current code, we print a warning and free the associated pages to
>>> buddy if we ever experience an increased ref count. The routine
>>> hugetlb_folio_init_tail_vmemmap does not check for this.
>>>
>>> I do not believe speculative/temporary ref counts this early in the boot
>>> process are possible. It would be great to get input from someone else.
>>
>> Yes, it is a very early stage and the other tail struct pages haven't been
>> initialized yet, so nobody should reference them. It is the same case
>> as when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
>>
>>>
>>> When I wrote the existing code, it was fairly easy to WARN and continue
>>> if we encountered an increased ref count. Things would be a bit more
>>
>> In your case, I think it is not in the boot process, right?
>
> The existing calls were in the same routine: gather_bootmem_prealloc().
>
>>> complicated here. So, it may not be worth the effort.
>>
>> Agree. Note that the tail struct pages are not initialized here; if we want
>> to handle the head page, how would we handle the tail pages? It really
>> cannot be resolved. We should make the same assumption as
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT: that nobody should reference those pages.
>
> Agree that speculative refs should not happen this early. How about making
> the following changes?
> - Instead of set_page_count() in hugetlb_folio_init_tail_vmemmap, do a
> page_ref_freeze() and VM_BUG_ON if the ref count is not 1.
> - In the commit message, mention 'The WARN_ON for increased ref count in
> gather_bootmem_prealloc was changed to a VM_BUG_ON. This is OK as
> there should be no speculative references this early in the boot process.
> The VM_BUG_ON's are there just in case such code is introduced.'
Sounds good, although it's not possible for the refcount to be anything other
than 1, as nothing happens between __init_single_page() and freezing the
refcount to 0. I will include the diff below in the next revision, with the
explanation in the commit message as suggested.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 540e0386514e..ed37c6e4e952 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3194,13 +3194,15 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
int nid = folio_nid(folio);
unsigned long head_pfn = folio_pfn(folio);
unsigned long pfn, end_pfn = head_pfn + end_page_number;
+ int ret;
for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_page(pfn);
__init_single_page(page, pfn, zone, nid);
prep_compound_tail((struct page *)folio, pfn - head_pfn);
- set_page_count(page, 0);
+ ret = page_ref_freeze(page, 1);
+ VM_BUG_ON(!ret);
}
}
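
For reference, page_ref_freeze() only succeeds when the refcount is exactly
the expected value, atomically dropping it to 0 at the same time. Stripped of
the tracepoint hook, its core in include/linux/page_ref.h is roughly:

static inline int page_ref_freeze(struct page *page, int count)
{
	/* cmpxchg _refcount from 'count' to 0; non-zero only on success */
	return likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
}

Since __init_single_page() sets the refcount to 1 and nothing else can touch
these pages this early in boot, the freeze cannot fail and the VM_BUG_ON
should never fire.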
Thread overview: 17+ messages
2023-09-06 11:26 [v4 0/4] " Usama Arif
2023-09-06 11:26 ` [v4 1/4] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-09-06 11:26 ` [v4 2/4] memblock: pass memblock_type to memblock_setclr_flag Usama Arif
2023-09-06 11:26 ` [v4 3/4] memblock: introduce MEMBLOCK_RSRV_NOINIT flag Usama Arif
2023-09-06 11:35 ` Muchun Song
2023-09-06 12:01 ` Mike Rapoport
2023-09-06 11:26 ` [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO Usama Arif
2023-09-06 18:10 ` Mike Kravetz
2023-09-06 21:27 ` [External] " Usama Arif
2023-09-06 21:59 ` Mike Kravetz
2023-09-07 10:14 ` Usama Arif
2023-09-07 18:24 ` Mike Kravetz
2023-09-07 18:37 ` Mike Kravetz
2023-09-08 2:39 ` Muchun Song
2023-09-08 18:29 ` Mike Kravetz
2023-09-08 20:48 ` Usama Arif [this message]
2023-09-22 14:42 ` [v4 0/4] " Pasha Tatashin