Subject: Re: [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
From: Muchun Song <muchun.song@linux.dev>
Date: Fri, 8 Sep 2023 10:39:56 +0800
In-Reply-To: <20230907183712.GB3640@monkey>
References: <20230906112605.2286994-1-usama.arif@bytedance.com> <20230906112605.2286994-5-usama.arif@bytedance.com> <20230907183712.GB3640@monkey>
To: Mike Kravetz
Cc: Usama Arif, Linux-MM, "Mike Rapoport (IBM)", LKML, Muchun Song, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com
> On Sep 8, 2023, at 02:37, Mike Kravetz wrote:
>
> On 09/06/23 12:26, Usama Arif wrote:
>> The new boot flow when it comes to initialization of gigantic pages
>> is as follows:
>> - At boot time, for a gigantic page during __alloc_bootmem_huge_page,
>>   the region after the first struct page is marked as noinit.
>> - This results in only the first struct page being initialized in
>>   reserve_bootmem_region. As the tail struct pages are not
>>   initialized at this point, there can be a significant saving
>>   in boot time if HVO succeeds later on.
>> - Later in the boot, the head page is prepped and the first
>>   HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail
>>   struct pages are initialized.
>> - HVO is attempted. If it is not successful, the rest of the tail
>>   struct pages are initialized. If it is successful, no more tail
>>   struct pages need to be initialized, saving significant boot time.
>>
>> Signed-off-by: Usama Arif
>> ---
>>  mm/hugetlb.c         | 61 +++++++++++++++++++++++++++++++++++++-------
>>  mm/hugetlb_vmemmap.c |  2 +-
>>  mm/hugetlb_vmemmap.h |  9 ++++---
>>  mm/internal.h        |  3 +++
>>  mm/mm_init.c         |  2 +-
>>  5 files changed, 62 insertions(+), 15 deletions(-)
>
> As mentioned, in general this looks good. One small point below.
>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c32ca241df4b..540e0386514e 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3169,6 +3169,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>  	}
>>
>>  found:
>> +
>> +	/*
>> +	 * Only initialize the head struct page in memmap_init_reserved_pages;
>> +	 * the rest of the struct pages will be initialized by the HugeTLB
>> +	 * subsystem itself. The head struct page is used by the HugeTLB
>> +	 * subsystem to get folio information such as zone id and node id.
>> +	 */
>> +	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
>> +				      huge_page_size(h) - PAGE_SIZE);
>>  	/* Put them into a private list first because mem_map is not up yet */
>>  	INIT_LIST_HEAD(&m->list);
>>  	list_add(&m->list, &huge_boot_pages);
>> @@ -3176,6 +3185,40 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>>  	return 1;
>>  }
>>
>> +/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
>> +static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>> +					unsigned long start_page_number,
>> +					unsigned long end_page_number)
>> +{
>> +	enum zone_type zone = zone_idx(folio_zone(folio));
>> +	int nid = folio_nid(folio);
>> +	unsigned long head_pfn = folio_pfn(folio);
>> +	unsigned long pfn, end_pfn = head_pfn + end_page_number;
>> +
>> +	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>> +		struct page *page = pfn_to_page(pfn);
>> +
>> +		__init_single_page(page, pfn, zone, nid);
>> +		prep_compound_tail((struct page *)folio, pfn - head_pfn);
>> +		set_page_count(page, 0);
>> +	}
>> +}
>> +
>> +static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
>> +					      struct hstate *h,
>> +					      unsigned long nr_pages)
>> +{
>> +	int ret;
>> +
>> +	/* Prepare folio head */
>> +	__folio_clear_reserved(folio);
>> +	__folio_set_head(folio);
>> +	ret = page_ref_freeze(&folio->page, 1);
>> +	VM_BUG_ON(!ret);
>
> In the current code, we print a warning and free the associated pages to
> buddy if we ever experience an increased ref count. The routine
> hugetlb_folio_init_tail_vmemmap does not check for this.
>
> I do not believe speculative/temporary ref counts this early in the boot
> process are possible. It would be great to get input from someone else.

Yes, it is a very early stage of boot, and the other tail struct pages
have not been initialized yet, so no one should be referencing them. It
is the same situation as when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
enabled.
>
> When I wrote the existing code, it was fairly easy to WARN and continue
> if we encountered an increased ref count. Things would be a bit more

In your case, I think it was not during the boot process, right?

> complicated here. So, it may not be worth the effort.

Agree. Note that the tail struct pages are not initialized at this
point; even if we wanted to handle a raised ref count on the head page,
how would we handle the tail pages? It really cannot be resolved. We
should make the same assumption as CONFIG_DEFERRED_STRUCT_PAGE_INIT:
no one should be referencing those pages.

Thanks.