Subject: Re: [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
From: Muchun Song <muchun.song@linux.dev>
Date: Fri, 8 Sep 2023 10:39:56 +0800
In-Reply-To: <20230907183712.GB3640@monkey>
References: <20230906112605.2286994-1-usama.arif@bytedance.com> <20230906112605.2286994-5-usama.arif@bytedance.com> <20230907183712.GB3640@monkey>
To: Mike Kravetz
Cc: Usama Arif, Linux-MM, "Mike Rapoport (IBM)", LKML, Muchun Song, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com
> On Sep 8, 2023, at 02:37, Mike Kravetz wrote:
>
> On 09/06/23 12:26, Usama Arif wrote:
>> The new boot flow when it comes to initialization of gigantic pages
>> is as follows:
>> - At boot time, for a gigantic page during __alloc_bootmem_huge_page,
>>   the region after the first struct page is marked as noinit.
>> - This results in only the first struct page being initialized in
>>   reserve_bootmem_region. As the tail struct pages are not
>>   initialized at this point, there can be a significant saving
>>   in boot time if HVO succeeds later on.
>> - Later in the boot, the head page is prepped and the first
>>   HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail
>>   struct pages are initialized.
>> - HVO is attempted. If it is not successful, the rest of the tail
>>   struct pages are initialized. If it is successful, no more tail
>>   struct pages need to be initialized, saving significant boot time.
>>
>> Signed-off-by: Usama Arif
>> ---
>>  mm/hugetlb.c         | 61 +++++++++++++++++++++++++++++++++++++-------
>>  mm/hugetlb_vmemmap.c |  2 +-
>>  mm/hugetlb_vmemmap.h |  9 ++++---
>>  mm/internal.h        |  3 +++
>>  mm/mm_init.c         |  2 +-
>>  5 files changed, 62 insertions(+), 15 deletions(-)
>
> As mentioned, in general this looks good. One small point below.
>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c32ca241df4b..540e0386514e 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3169,6 +3169,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>  	}
>>
>>  found:
>> +
>> +	/*
>> +	 * Only initialize the head struct page in memmap_init_reserved_pages;
>> +	 * the rest of the struct pages will be initialized by the HugeTLB
>> +	 * subsystem itself. The head struct page is used by the HugeTLB
>> +	 * subsystem to get folio information such as zone id and node id.
>> +	 */
>> +	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
>> +				      huge_page_size(h) - PAGE_SIZE);
>>  	/* Put them into a private list first because mem_map is not up yet */
>>  	INIT_LIST_HEAD(&m->list);
>>  	list_add(&m->list, &huge_boot_pages);
>> @@ -3176,6 +3185,40 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>  	return 1;
>>  }
>>
>> +/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
>> +static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>> +					unsigned long start_page_number,
>> +					unsigned long end_page_number)
>> +{
>> +	enum zone_type zone = zone_idx(folio_zone(folio));
>> +	int nid = folio_nid(folio);
>> +	unsigned long head_pfn = folio_pfn(folio);
>> +	unsigned long pfn, end_pfn = head_pfn + end_page_number;
>> +
>> +	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>> +		struct page *page = pfn_to_page(pfn);
>> +
>> +		__init_single_page(page, pfn, zone, nid);
>> +		prep_compound_tail((struct page *)folio, pfn - head_pfn);
>> +		set_page_count(page, 0);
>> +	}
>> +}
>> +
>> +static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
>> +					      struct hstate *h,
>> +					      unsigned long nr_pages)
>> +{
>> +	int ret;
>> +
>> +	/* Prepare folio head */
>> +	__folio_clear_reserved(folio);
>> +	__folio_set_head(folio);
>> +	ret = page_ref_freeze(&folio->page, 1);
>> +	VM_BUG_ON(!ret);
>
> In the current code, we print a warning and free the associated pages to
> buddy if we ever experience an increased ref count. The routine
> hugetlb_folio_init_tail_vmemmap does not check for this.
>
> I do not believe speculative/temporary ref counts this early in the boot
> process are possible. It would be great to get input from someone else.

Yes, it is a very early stage of boot, and the other tail struct pages
have not been initialized yet, so no one should be referencing them. It
is the same situation as when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
enabled.
>
> When I wrote the existing code, it was fairly easy to WARN and continue
> if we encountered an increased ref count. Things would be a bit more

In your case, I think it was not during the boot process, right?

> complicated here. So, it may not be worth the effort.

Agree. Note that the tail struct pages are not initialized at this
point; even if we wanted to handle a raised ref count on the head page,
how would we handle the tail pages? It really cannot be resolved. We
should make the same assumption as CONFIG_DEFERRED_STRUCT_PAGE_INIT:
no one should be referencing those pages.

Thanks.