From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <540a0f2e-31e0-6396-be14-d9baec608b87@bytedance.com>
Date: Wed, 2 Aug 2023 11:05:35 +0100
Subject: Re: [External] Re: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled
From: Usama Arif
To: Mike Kravetz
Cc: linux-mm@kvack.org, muchun.song@linux.dev, rppt@kernel.org,
 linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com,
 simon.evans@bytedance.com, punit.agrawal@bytedance.com, Muchun Song
References: <20230730151606.2871391-1-usama.arif@bytedance.com>
 <20230730151606.2871391-2-usama.arif@bytedance.com>
 <20230731231841.GA39768@monkey>
In-Reply-To: <20230731231841.GA39768@monkey>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 01/08/2023 00:18, Mike Kravetz wrote:
> On 07/30/23 16:16, Usama Arif wrote:
>> When vmemmap is optimizable, it will free all the duplicated tail
>> pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
>> Hence, there is no need to prepare them.
>>
>> For 1G x86 hugepages, it avoids preparing
>> 262144 - 64 = 262080 struct pages per hugepage.
>>
>> The indirection of using __prep_compound_gigantic_folio is also removed,
>> as it just creates extra functions to indicate demote which can be done
>> with the argument.
>>
>> Signed-off-by: Usama Arif
>> ---
>>  mm/hugetlb.c         | 32 ++++++++++++++------------------
>>  mm/hugetlb_vmemmap.c |  2 +-
>>  mm/hugetlb_vmemmap.h | 15 +++++++++++----
>>  3 files changed, 26 insertions(+), 23 deletions(-)
>
> Thanks,
>
> I just started looking at this series. Adding Muchun on Cc:
>
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 64a3239b6407..541c07b6d60f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
>>  	spin_unlock_irq(&hugetlb_lock);
>>  }
>>  
>> -static bool __prep_compound_gigantic_folio(struct folio *folio,
>> -				unsigned int order, bool demote)
>> +static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
>>  {
>>  	int i, j;
>> +	int order = huge_page_order(h);
>>  	int nr_pages = 1 << order;
>>  	struct page *p;
>>  
>>  	__folio_clear_reserved(folio);
>> +
>> +	/*
>> +	 * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
>> +	 * Hence, reduce nr_pages to the pages that will be kept.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
>> +	    vmemmap_should_optimize(h, &folio->page))
>
> IIUC, vmemmap_optimize_enabled (checked in vmemmap_should_optimize) can be
> modified at runtime via sysctl. If so, what prevents it from being changed
> after this check and the later call to hugetlb_vmemmap_optimize()?

Hi,

Thanks for the review.

Yes, that's a good catch. The solution for this issue would be to turn
hugetlb_free_vmemmap into a core_param with a callback, and to take a lock
around the write and around its uses in gather_bootmem_prealloc, etc.
But the bigger issue at runtime is what Muchun pointed out: the struct page
refcount is not frozen to 0. My main use case (and maybe for others as
well?) is reserving these gigantic pages at boot time. I thought a runtime
improvement might come for free with it, but it doesn't look like it.

Both issues could be solved by limiting this to boot time:
vmemmap_optimize_enabled cannot be changed during boot, and nothing else
can take a reference to those pages either (they aren't even initialized by
memblock after patch 6). So I will include the below diff to solve both
issues in the next revision.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8434100f60ae..790842a6f978 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1956,7 +1956,8 @@ static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h,
 	 * Hence, reduce nr_pages to the pages that will be kept.
 	 */
 	if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
-	    vmemmap_should_optimize(h, &folio->page))
+	    vmemmap_should_optimize(h, &folio->page) &&
+	    system_state == SYSTEM_BOOTING)
 		nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
 
 	for (i = 0; i < nr_pages; i++) {

Thanks,
Usama