Subject: Re: [External] [v3 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
From: Muchun Song
To: Mel Gorman
Cc: Linux-MM, Mike Kravetz, Mike Rapoport, LKML, Muchun Song, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com, Andrew Morton, Usama Arif
Date: Thu, 31 Aug 2023 18:01:08 +0800
Message-Id: <07E9202B-CA8B-4E1E-93FC-7BF84CB8E988@linux.dev>
In-Reply-To: <20230831095801.76rtpgdsvdijbw5t@techsingularity.net>
References: <20230825111836.1715308-1-usama.arif@bytedance.com> <20230825111836.1715308-5-usama.arif@bytedance.com> <486CFF93-3BB1-44CD-B0A0-A47F560F2CAE@linux.dev> <20230831095801.76rtpgdsvdijbw5t@techsingularity.net>

> On Aug 31, 2023, at 17:58, Mel Gorman wrote:
>
> On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
>>
>>
>>> On Aug 30, 2023, at 18:27, Usama Arif wrote:
>>> On 28/08/2023 12:33, Muchun Song wrote:
>>>>> On Aug 25, 2023, at 19:18, Usama Arif wrote:
>>>>>
>>>>> The new boot flow when it comes to initialization of gigantic pages
>>>>> is as follows:
>>>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>>>> the region after the first struct page is marked as noinit.
>>>>> - This results in only the first struct page being
>>>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>>>> not initialized at this point, there can be a significant saving
>>>>> in boot time if HVO succeeds later on.
>>>>> - Later on in the boot, HVO is attempted. If it is successful, only the first
>>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>>> after the head struct page are initialized. If it is not successful,
>>>>> then all of the tail struct pages are initialized.
>>>>>
>>>>> Signed-off-by: Usama Arif
>>>> This version is simpler than ever before, thanks for your work.
>>>> There is a premise that other subsystems do not access vmemmap pages
>>>> before the initialization of the vmemmap pages associated with HugeTLB
>>>> pages allocated from bootmem for your optimization. However, IIUC, the
>>>> compaction path could access arbitrary struct pages when memory fails
>>>> to be allocated via the buddy allocator.
>>>> So we should make sure that
>>>> those struct pages are not referenced in this routine. And I know
>>>> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
>>>> the same issue, but I don't find any code to prevent this from
>>>> happening. I need more time to confirm this; if someone already knows,
>>>> please let me know, thanks. So I think HugeTLB should adopt a similar
>>>> way to prevent this.
>>>> Thanks.
>>>
>>> Thanks for the reviews.
>>>
>>> So if I understand it correctly, the uninitialized pages due to the optimization in this patch and due to DEFERRED_STRUCT_PAGE_INIT should be treated in the same way during compaction. I see that in isolate_freepages during compaction there is a check to see if the PageBuddy flag is set, and there are also calls like __pageblock_pfn_to_page to check if the pageblock is valid.
>>>
>>> But if the struct pages are uninitialized, they would contain random data, and these checks could pass if certain bits were set?
>>>
>>> Compaction is done on the free list. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of the free list, so I think their pfns would be considered for compaction.
>>>
>>> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how the uninitialized struct pages are handled when compaction happens? Thanks!
>>
>> Hi Mel,
>>
>> Could you help us answer this question? I think you must be the expert on
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT. I summarize the context here. As we all know,
>> some struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> enabled. If someone allocates a larger block of memory (e.g. order 4) via the
>> buddy allocator and the allocation fails, then we go into the compaction
>> routine, which traverses all pfns and uses pfn_to_page to access their struct
>> pages. However, those struct pages may be uninitialized (so they contain arbitrary data).
>> Our question is: how can we prevent the compaction routine from accessing
>> those uninitialized struct pages? We would appreciate it if you know the answer.
>>
>
> I didn't check the code but IIRC, the struct pages should be at least
> valid and not contain arbitrary data once page_alloc_init_late finishes.

However, the buddy allocator is ready before page_alloc_init_late(), so
the compaction routine may still access arbitrary data before then, right?

>
> --
> Mel Gorman
> SUSE Labs