From: Muchun Song
Date: Thu, 31 Aug 2023 14:21:06 +0800
Subject: Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
To: Mel Gorman
Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com, Andrew Morton, Usama Arif
References: <20230825111836.1715308-1-usama.arif@bytedance.com> <20230825111836.1715308-5-usama.arif@bytedance.com> <486CFF93-3BB1-44CD-B0A0-A47F560F2CAE@linux.dev>

> On Aug 30, 2023, at 18:27, Usama Arif wrote:
>
> On 28/08/2023 12:33, Muchun Song wrote:
>>> On Aug 25, 2023, at 19:18, Usama Arif wrote:
>>>
>>> The new boot flow when it comes to initialization of gigantic pages
>>> is as follows:
>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>>   the region after the first struct page is marked as noinit.
>>> - This results in only the first struct page being
>>>   initialized in reserve_bootmem_region. As the tail struct pages are
>>>   not initialized at this point, there can be a significant saving
>>>   in boot time if HVO succeeds later on.
>>> - Later on in the boot, HVO is attempted. If it is successful, only the first
>>>   HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>   after the head struct page are initialized. If it is not successful,
>>>   then all of the tail struct pages are initialized.
>>>
>>> Signed-off-by: Usama Arif
>>
>> This edition is simpler than ever before, thanks for your work.
>>
>> Your optimization relies on the premise that other subsystems do not
>> access the vmemmap pages associated with HugeTLB pages allocated from
>> bootmem before those vmemmap pages are initialized. However, IIUC, the
>> compaction path could access an arbitrary struct page when memory fails
>> to be allocated via the buddy allocator. So we should make sure that
>> those struct pages are not referenced in this routine. And I know
>> that if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
>> the same issue, but I cannot find any code that prevents this from
>> happening. I need more time to confirm this; if someone already knows,
>> please let me know, thanks. So I think HugeTLB should adopt a similar
>> way to prevent this.
>>
>> Thanks.
>
> Thanks for the reviews.
>
> So if I understand it correctly, the uninitialized pages due to the
> optimization in this patch and due to DEFERRED_STRUCT_PAGE_INIT should
> be treated in the same way during compaction. I see that in
> isolate_freepages during compaction there is a check to see if the
> PageBuddy flag is set, and there are also calls like
> __pageblock_pfn_to_page to check if the pageblock is valid.
>
> But if the struct pages are uninitialized, then they would contain random
> data and these checks could pass if certain bits were set?
>
> Compaction is done on the free list. I think the uninitialized struct
> pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of
> the free list, so I think their pfns would be considered for compaction.
>
> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and
> compaction confirm how the uninitialized struct pages are handled when
> compaction happens? Thanks!

Hi Mel,

Could you help us answer this question? I think you must be the expert
on CONFIG_DEFERRED_STRUCT_PAGE_INIT. Let me summarize the context here.
As we all know, some struct pages are uninitialized when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled. If someone requests a
larger block of memory (e.g. order 4) from the buddy allocator and the
allocation fails, we go into the compaction routine, which traverses
all pfns and uses pfn_to_page to access each struct page. However, those
struct pages may be uninitialized (so they contain arbitrary data).
Our question is: how is the compaction routine prevented from accessing
those uninitialized struct pages? We would appreciate it if you know
the answer.

Thanks.
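
For a sense of scale, here is a rough worked example of the count given in the quoted patch description, HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1. The numbers are assumptions for illustration and are not taken from the patch: a 4 KiB base page, a 64-byte struct page, a 1 GiB gigantic page, and a reserve size equal to one base page. The EX_* macros and both helper names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>

/* Illustrative values only; none of these come from the patch itself. */
#define EX_BASE_PAGE_SIZE     4096UL            /* assumed 4 KiB base page */
#define EX_STRUCT_PAGE_SIZE   64UL              /* assumed sizeof(struct page) */
#define EX_GIGANTIC_PAGE_SIZE (1UL << 30)       /* assumed 1 GiB gigantic page */
#define EX_VMEMMAP_RESERVE    EX_BASE_PAGE_SIZE /* assumed HUGETLB_VMEMMAP_RESERVE_SIZE */

/*
 * Tail struct pages initialized when HVO succeeds, following the formula
 * in the patch description: reserve_size / sizeof(struct page) - 1.
 */
static unsigned long tails_initialized_with_hvo(unsigned long reserve_size,
                                                unsigned long struct_page_size)
{
    assert(struct_page_size > 0 && reserve_size >= struct_page_size);
    return reserve_size / struct_page_size - 1;
}

/*
 * Tail struct pages initialized when HVO fails: one struct page per base
 * page of the gigantic page, minus the head struct page.
 */
static unsigned long tails_initialized_without_hvo(unsigned long huge_size,
                                                   unsigned long base_size)
{
    assert(base_size > 0 && huge_size >= base_size);
    return huge_size / base_size - 1;
}
```

Under these assumptions, tails_initialized_with_hvo(EX_VMEMMAP_RESERVE, EX_STRUCT_PAGE_SIZE) gives 63, while tails_initialized_without_hvo(EX_GIGANTIC_PAGE_SIZE, EX_BASE_PAGE_SIZE) gives 262143. The remaining tail struct pages of each gigantic page are the ones that stay uninitialized and that the compaction question above is concerned with.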