Subject: Re: [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
From: Muchun Song
Date: Tue, 29 Aug 2023 11:33:10 +0800
To: Mike Kravetz
Cc: Usama Arif, Linux-MM, Mike Rapoport, LKML, Muchun Song, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com
In-Reply-To: <20230828210418.GD3290@monkey>
References: <20230825111836.1715308-1-usama.arif@bytedance.com> <20230825111836.1715308-5-usama.arif@bytedance.com> <486CFF93-3BB1-44CD-B0A0-A47F560F2CAE@linux.dev> <20230828210418.GD3290@monkey>

> On Aug 29, 2023, at 05:04, Mike Kravetz wrote:
>
> On 08/28/23 19:33, Muchun Song wrote:
>>
>>
>>> On Aug 25, 2023, at 19:18, Usama Arif wrote:
>>>
>>> The new boot flow when it comes to initialization of gigantic pages
>>> is as follows:
>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>> the region after the first struct page is marked as noinit.
>>> - This results in only the first struct page to be
>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>> not initialized at this point, there can be a significant saving
>>> in boot time if HVO succeeds later on.
>>> - Later on in the boot, HVO is attempted. If it's successful, only the first
>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>> after the head struct page are initialized. If it is not successful,
>>> then all of the tail struct pages are initialized.
>>>
>>> Signed-off-by: Usama Arif
>>
>> This edition is simpler than before ever, thanks for your work.
>>
>> There is premise that other subsystems do not access vmemmap pages
>> before the initialization of vmemmap pages associated with HugeTLB
>> pages allocated from bootmem for your optimization. However, IIUC, the
>> compacting path could access arbitrary struct page when memory fails
>> to be allocated via buddy allocator. So we should make sure that
>> those struct pages are not referenced in this routine. And I know
>> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
>> the same issue, but I don't find any code to prevent this from
>> happening. I need more time to confirm this, if someone already knows,
>> please let me know, thanks. So I think HugeTLB should adopt the similar
>> way to prevent this.
>
> In this patch, the call to hugetlb_vmemmap_optimize() is moved BEFORE
> __prep_new_hugetlb_folio or prep_new_hugetlb_folio in all code paths.
> The prep_new_hugetlb_folio routine(s) are what set the destructor (soon
> to be a flag) that identifies the set of pages as a hugetlb page. So,
> there is now a window where a set of pages not identified as hugetlb
> will not have vmemmap pages.

Thanks for pointing it out.

It seems this issue is not related to this change?
hugetlb_vmemmap_optimize() has been called before the destructor is set
ever since the initial commit f41f2ed43ca5. Right?

>
> Recently, I closed the same window in the hugetlb freeing code paths with
> commit 32c877191e02 'hugetlb: do not clear hugetlb dtor until allocating'.

Yes, I saw it.

> This patch needs to be reworked so that this window is not opened in the
> allocation paths.

So I think the fix should be a separate series.

Thanks.