Message-ID: <92fc88ba-3e3a-2648-4232-1d3f9bed5bb6@bytedance.com>
Date: Thu, 7 Sep 2023 11:14:38 +0100
Subject: Re: [External] Re: [v4 4/4] mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO
From: Usama Arif
To: Mike Kravetz
Cc: linux-mm@kvack.org, muchun.song@linux.dev, rppt@kernel.org, linux-kernel@vger.kernel.org, songmuchun@bytedance.com, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com
References: <20230906112605.2286994-1-usama.arif@bytedance.com> <20230906112605.2286994-5-usama.arif@bytedance.com> <20230906181016.GA3612@monkey> <57c8dd7f-d1a0-37c4-1d3b-d6374e92ffa1@bytedance.com> <20230906215927.GE3612@monkey>
In-Reply-To: <20230906215927.GE3612@monkey>

On 06/09/2023 22:59, Mike Kravetz wrote:
> On 09/06/23 22:27, Usama Arif wrote:
>>
>>
>> On 06/09/2023 19:10, Mike Kravetz wrote:
>>> On 09/06/23 12:26, Usama Arif wrote:
>>>> The new boot flow when it comes to initialization of gigantic pages
>>>> is as follows:
>>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>>> the region after the first struct page is marked as noinit.
>>>> - This results in only the first struct page to be
>>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>>> not initialized at this point, there can be a significant saving
>>>> in boot time if HVO succeeds later on.
>>>> - Later on in the boot, the head page is prepped and the first
>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>> are initialized.
>>>> - HVO is attempted. If it is not successful, then the rest of the
>>>> tail struct pages are initialized. If it is successful, no more
>>>> tail struct pages need to be initialized saving significant boot time.
>>>
>>> Code looks reasonable. Quick question.
>>>
>>> On systems where HVO is disabled, we will still go through this new boot
>>> flow and init hugetlb tail pages later in boot (gather_bootmem_prealloc).
>>> Correct?
>>> If yes, will there be a noticeable change in performance from the current
>>> flow with HVO disabled? My concern would be allocating a large number of
>>> gigantic pages at boot (TB or more).
>>>
>>
>> Thanks for the review.
>>
>> The patch moves the initialization of struct pages backing hugepage from
>> reserve_bootmem_region to a bit later on in the boot to
>> gather_bootmem_prealloc. When HVO is disabled, there will be no difference
>> in time taken to boot with or without this patch series, as 262144 struct
>> pages per gigantic page (for x86) are still going to be initialized, just in
>> a different place.
>
> I seem to recall that 'normal' deferred struct page initialization was
> done in parallel as the result of these series:
> https://lore.kernel.org/linux-mm/20171013173214.27300-1-pasha.tatashin@oracle.com/
> https://lore.kernel.org/linux-mm/20200527173608.2885243-1-daniel.m.jordan@oracle.com/#t
> and perhaps others.
>
> My thought is that we lose that parallel initialization when it is being
> done as part of hugetlb fall back initialization.
>
> Does that make sense? Or am I missing something? I do not have any proof
> that things will be slower. That is just something I was thinking about.

The patches for deferring struct page initialization did not cover the
struct pages for gigantic pages.

With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, the call path taken during
boot without these patches is:

[A1] mm_core_init-> mem_init-> memblock_free_all->
     free_low_memory_core_early-> memmap_init_reserved_pages->
     reserve_bootmem_region-> initialize *all* struct pages of a gigantic
     page serially (DEFERRED_STRUCT_PAGE_INIT is enabled). The pfn of these
     struct pages is > NODE_DATA(nid)->first_deferred_pfn, which means
     their initialization cannot be deferred.

then later on in the boot:

[A2] hugetlb_init-> gather_bootmem_prealloc->
     prep_compound_gigantic_folio-> prepare *all* the struct pages to be
     part of a gigantic page (freezing page ref count, setting compound
     head, etc. for all struct pages).

With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, the call path taken during
boot with these patches is:

[B1] mm_core_init->...reserve_bootmem_region-> initialize the head struct
     page only.

then later on in the boot:

[B2] hugetlb_init-> gather_bootmem_prealloc->
     [B21] initialize only 64 tail struct pages if HVO passes.
     [B22] if HVO fails, initialize all tail struct pages.

A1, A2 and B22 are each a for loop over 262144 struct pages per hugepage.

So without these patches, the work done is 262144*2 (A1+A2) per hugepage
during boot, even with CONFIG_DEFERRED_STRUCT_PAGE_INIT, as it is not
deferred.

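
As a quick sanity check on the per-hugepage figures above, the sketch
below reproduces them. It assumes x86-64 defaults that this thread does not
spell out: 4 KiB base pages, a 64-byte struct page, and
HUGETLB_VMEMMAP_RESERVE_SIZE equal to one base page. The constant and
function names are illustrative only, not kernel symbols.

```python
# Back-of-envelope check of the struct page counts quoted above.
# Assumptions (not stated in the thread): 4 KiB base pages, a 64-byte
# struct page, and a vmemmap reserve of one base page per hugepage.
PAGE_SIZE = 4096
GIGANTIC_PAGE_SIZE = 1 << 30
STRUCT_PAGE_SIZE = 64
VMEMMAP_RESERVE_SIZE = PAGE_SIZE


def struct_pages_per_hugepage() -> int:
    """Number of struct pages backing one 1G hugepage."""
    return GIGANTIC_PAGE_SIZE // PAGE_SIZE


def struct_pages_in_reserve() -> int:
    """struct pages that fit in the reserved vmemmap area (head included)."""
    return VMEMMAP_RESERVE_SIZE // STRUCT_PAGE_SIZE


def visits_without_patches() -> int:
    """A1 + A2: two full passes over every struct page of the hugepage."""
    return 2 * struct_pages_per_hugepage()


print(struct_pages_per_hugepage())  # 262144
print(struct_pages_in_reserve())    # 64
print(visits_without_patches())     # 524288
```

Under these assumptions the reserved area holds 64 struct pages in total,
which is where the "64" in B21 comes from.
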
With these patches, the work done is either 1 + 64 (B1+B21) if HVO is
enabled or 1 + 262144 (B1+B22) if HVO is disabled.

With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, the times taken to boot till
the init process when allocating 500 1G hugepages are:

- with these patches, HVO enabled: 1.32 seconds [B1 + B21]
- with these patches, HVO disabled: 2.15 seconds [B1 + B22]
- without these patches, HVO enabled: 3.90 seconds [A1 + A2 + HVO]
- without these patches, HVO disabled: 3.58 seconds [A1 + A2]
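
To put the four cases side by side, here is a rough model of how many
struct pages get visited during boot for 500 hugepages, using the
A1/A2/B1/B21/B22 labels from above. It is only a sketch: it counts loop
iterations rather than time, ignores the cost of HVO itself, and the
function name visits() is made up for illustration.

```python
# Rough model of struct page visits during boot for N gigantic pages.
# Illustrative only: counts loop iterations, not time, and ignores the
# cost of HVO itself.
TAILS_ALL = 262144  # struct pages per 1G hugepage (x86, from this thread)
TAILS_HVO = 64      # struct pages initialized when HVO succeeds (B21)


def visits(with_patches: bool, hvo: bool, hugepages: int) -> int:
    """Total struct page visits for the given configuration."""
    if not with_patches:
        per_page = 2 * TAILS_ALL   # A1 + A2, regardless of HVO
    elif hvo:
        per_page = 1 + TAILS_HVO   # B1 + B21
    else:
        per_page = 1 + TAILS_ALL   # B1 + B22
    return per_page * hugepages


# Boot times reported in this mail for 500 1G hugepages, in seconds.
measured = {
    (True, True): 1.32,
    (True, False): 2.15,
    (False, True): 3.90,
    (False, False): 3.58,
}

for (patched, hvo), seconds in measured.items():
    count = visits(patched, hvo, 500)
    print(f"patches={patched!s:5} hvo={hvo!s:5} "
          f"visits={count:>11,} boot={seconds:.2f}s")
```

The reported boot times cover everything up to the init process, so they
would be expected to shrink far less than the iteration counts do.
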