Date: Thu, 31 Aug 2023 10:58:01 +0100
From: Mel Gorman
To: Muchun Song
Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song,
 fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com,
 Andrew Morton, Usama Arif
Subject: Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic
 tail struct pages if freed by HVO
Message-ID: <20230831095801.76rtpgdsvdijbw5t@techsingularity.net>
References: <20230825111836.1715308-1-usama.arif@bytedance.com>
 <20230825111836.1715308-5-usama.arif@bytedance.com>
 <486CFF93-3BB1-44CD-B0A0-A47F560F2CAE@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To:

On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
> 
> 
> > On Aug 30, 2023, at 18:27, Usama Arif wrote:
> > 
> > On 28/08/2023 12:33, Muchun Song wrote:
> >>> On Aug 25, 2023, at 19:18, Usama Arif wrote:
> >>>
> >>> The new boot flow when it comes to initialization of gigantic pages
> >>> is as follows:
> >>> - At boot time, for a gigantic page during __alloc_bootmem_huge_page,
> >>>   the region after the first struct page is marked as noinit.
> >>> - This results in only the first struct page being initialized in
> >>>   reserve_bootmem_region. As the tail struct pages are not initialized
> >>>   at this point, there can be a significant saving in boot time if HVO
> >>>   succeeds later on.
> >>> - Later on in the boot, HVO is attempted. If it is successful, only
> >>>   the first HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1
> >>>   tail struct pages after the head struct page are initialized. If it
> >>>   is not successful, then all of the tail struct pages are initialized.
> >>>
> >>> Signed-off-by: Usama Arif
> >>
> >> This version is simpler than ever, thanks for your work.
> >> Your optimization rests on the premise that other subsystems do not
> >> access the vmemmap pages associated with HugeTLB pages allocated from
> >> bootmem before those vmemmap pages are initialized. However, IIUC, the
> >> compaction path could access arbitrary struct pages when memory fails
> >> to be allocated via the buddy allocator, so we should make sure that
> >> those struct pages are not referenced in that routine. I know that if
> >> CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled it will encounter the same
> >> issue, but I don't find any code that prevents this from happening.
> >> I need more time to confirm this; if someone already knows, please
> >> let me know, thanks. So I think HugeTLB should adopt a similar way to
> >> prevent this.
> >>
> >> Thanks.
> > 
> > Thanks for the reviews.
> > 
> > So if I understand it correctly, the uninitialized pages due to the
> > optimization in this patch and due to DEFERRED_STRUCT_PAGE_INIT should
> > be treated in the same way during compaction. I see that in
> > isolate_freepages during compaction there is a check to see if the
> > PageBuddy flag is set, and there are also calls like
> > __pageblock_pfn_to_page to check if the pageblock is valid.
> > 
> > But if the struct pages are uninitialized then they would contain
> > random data, and these checks could pass if certain bits were set?
> > 
> > Compaction is done on the free list. I think the uninitialized struct
> > pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of
> > the freelist, so I think their pfns would be considered for compaction.
> > 
> > Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and
> > compaction confirm how the uninitialized struct pages are handled when
> > compaction happens? Thanks!
> 
> Hi Mel,
> 
> Could you help us answer this question? I think you must be the expert
> on CONFIG_DEFERRED_STRUCT_PAGE_INIT. Let me summarize the context here.
> As we all know, some struct pages are uninitialized when
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled. If someone allocates a
> larger chunk of memory (e.g. order 4) via the buddy allocator and the
> allocation fails, then we go into the compaction routine, which
> traverses all pfns and uses pfn_to_page to access their struct pages;
> however, those struct pages may be uninitialized (so they contain
> arbitrary data). Our question is how to prevent the compaction routine
> from accessing those uninitialized struct pages? We would appreciate it
> if you know the answer.
> 

I didn't check the code but IIRC, the struct pages should be at least
valid and not contain arbitrary data once page_alloc_init_late finishes.

-- 
Mel Gorman
SUSE Labs
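
A minimal user-space sketch of the arithmetic behind the patch description
quoted above, assuming 4 KiB base pages, a 64-byte struct page, a 1 GiB
gigantic page, and HUGETLB_VMEMMAP_RESERVE_SIZE equal to one base page
(typical x86-64 values; the constants are illustrative assumptions, not
values taken from the thread or a running kernel):

	/*
	 * Sketch only: counts how many tail struct pages must be initialized
	 * for one gigantic page with and without a successful HVO pass.
	 * All constants below are assumptions for illustration.
	 */
	#include <stdio.h>

	int main(void)
	{
		const unsigned long page_size = 4096;            /* assumed base page size */
		const unsigned long struct_page_size = 64;       /* assumed sizeof(struct page) */
		const unsigned long gigantic_size = 1UL << 30;   /* 1 GiB gigantic page */
		const unsigned long vmemmap_reserve = page_size; /* assumed HUGETLB_VMEMMAP_RESERVE_SIZE */

		/* Without HVO, every tail struct page must be initialized. */
		unsigned long all_tails = gigantic_size / page_size - 1;

		/* With HVO, only the tails in the retained vmemmap area are. */
		unsigned long hvo_tails = vmemmap_reserve / struct_page_size - 1;

		printf("tail struct pages without HVO: %lu\n", all_tails);
		printf("tail struct pages with HVO:    %lu\n", hvo_tails);
		return 0;
	}

Under these assumptions the program prints 262143 tail struct pages without
HVO versus 63 with HVO, which is the initialization work the patch defers
until it is known whether HVO succeeded.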