Subject: Re: [PATCH v4 2/2] mm: Init page count in reserve_bootmem_region when MEMINIT_EARLY
From: Yajun Deng <yajun.deng@linux.dev>
Date: Mon, 2 Oct 2023 15:03:56 +0800
Message-ID: <90342474-432a-9fe3-2f11-915a04f0053f@linux.dev>
In-Reply-To: <20231001185934.GX3303@kernel.org>
To: Mike Rapoport
Cc: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
 willy@infradead.org, david@redhat.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20230928083302.386202-1-yajun.deng@linux.dev>
 <20230928083302.386202-3-yajun.deng@linux.dev>
 <20230929083018.GU3303@kernel.org>
 <20230929100252.GW3303@kernel.org>
 <15233624-f32e-172e-b2f6-7ca7bffbc96d@linux.dev>
 <20231001185934.GX3303@kernel.org>

On 2023/10/2 02:59, Mike Rapoport wrote:
> On Fri, Sep 29, 2023 at 06:27:25PM +0800, Yajun Deng wrote:
>> On 2023/9/29 18:02, Mike Rapoport wrote:
>>> On Fri, Sep 29, 2023 at 05:50:40PM +0800, Yajun Deng wrote:
>>>> On 2023/9/29 16:30, Mike Rapoport wrote:
>>>>> On Thu, Sep 28, 2023 at 04:33:02PM +0800, Yajun Deng wrote:
>>>>>> memmap_init_range() would init the page count of all pages, but the
>>>>>> page count of free pages would be reset again in __free_pages_core().
>>>>>> These are opposite operations. It's unnecessary and time-consuming in
>>>>>> the MEMINIT_EARLY context.
>>>>>>
>>>>>> Init the page count in reserve_bootmem_region when in the MEMINIT_EARLY
>>>>>> context, and check the page count before resetting it.
>>>>>>
>>>>>> At the same time, the INIT_LIST_HEAD in reserve_bootmem_region isn't
>>>>>> needed, as it is already done in __init_single_page.
>>>>>>
>>>>>> The following data was tested on an x86 machine with 190GB of RAM.
>>>>>>
>>>>>> before:
>>>>>> free_low_memory_core_early()    341ms
>>>>>>
>>>>>> after:
>>>>>> free_low_memory_core_early()    285ms
>>>>>>
>>>>>> Signed-off-by: Yajun Deng
>>>>>> ---
>>>>>> v4: same as v2.
>>>>>> v3: same as v2.
>>>>>> v2: check the page count instead of the context before resetting it.
>>>>>> v1: https://lore.kernel.org/all/20230922070923.355656-1-yajun.deng@linux.dev/
>>>>>> ---
>>>>>>  mm/mm_init.c    | 18 +++++++++++++-----
>>>>>>  mm/page_alloc.c | 20 ++++++++++++--------
>>>>>>  2 files changed, 25 insertions(+), 13 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>>>>> index 9716c8a7ade9..3ab8861e1ef3 100644
>>>>>> --- a/mm/mm_init.c
>>>>>> +++ b/mm/mm_init.c
>>>>>> @@ -718,7 +718,7 @@ static void __meminit init_reserved_page(unsigned long pfn, int nid)
>>>>>>          if (zone_spans_pfn(zone, pfn))
>>>>>>              break;
>>>>>>      }
>>>>>> -    __init_single_page(pfn_to_page(pfn), pfn, zid, nid, INIT_PAGE_COUNT);
>>>>>> +    __init_single_page(pfn_to_page(pfn), pfn, zid, nid, 0);
>>>>>>  }
>>>>>>  #else
>>>>>>  static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
>>>>>> @@ -756,8 +756,8 @@ void __meminit reserve_bootmem_region(phys_addr_t start,
>>>>>>          init_reserved_page(start_pfn, nid);
>>>>>> -        /* Avoid false-positive PageTail() */
>>>>>> -        INIT_LIST_HEAD(&page->lru);
>>>>>> +        /* Init page count for reserved region */
>>>>> Please add a comment that describes _why_ we initialize the page count here.
>>>> Okay.
>>>>>> +        init_page_count(page);
>>>>>>          /*
>>>>>>           * no need for atomic set_bit because the struct
>>>>>> @@ -888,9 +888,17 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
>>>>>>          }
>>>>>>          page = pfn_to_page(pfn);
>>>>>> -        __init_single_page(page, pfn, zone, nid, INIT_PAGE_COUNT);
>>>>>> -        if (context == MEMINIT_HOTPLUG)
>>>>>> +
>>>>>> +        /* If the context is MEMINIT_EARLY, we will init page count and
>>>>>> +         * mark page reserved in reserve_bootmem_region, the free region
>>>>>> +         * wouldn't have page count and we will check the pages count
>>>>>> +         * in __free_pages_core.
>>>>>> +         */
>>>>>> +        __init_single_page(page, pfn, zone, nid, 0);
>>>>>> +        if (context == MEMINIT_HOTPLUG) {
>>>>>> +            init_page_count(page);
>>>>>>              __SetPageReserved(page);
>>>>> Rather than calling init_page_count() and __SetPageReserved() for
>>>>> MEMINIT_HOTPLUG you can set flags to INIT_PAGE_COUNT | INIT_PAGE_RESERVED
>>>>> and call __init_single_page() after the check for MEMINIT_HOTPLUG.
>>>> No, the following code would cost more time than the current code in
>>>> memmap_init().
>>>>
>>>>     if (context == MEMINIT_HOTPLUG)
>>>>         __init_single_page(page, pfn, zone, nid, INIT_PAGE_COUNT | INIT_PAGE_RESERVED);
>>>>     else
>>>>         __init_single_page(page, pfn, zone, nid, 0);
>>> Sorry if I wasn't clear. What I meant was to have something along these lines:
>>>
>>>     enum page_init_flags flags = 0;
>>>
>>>     if (context == MEMINIT_HOTPLUG)
>>>         flags = INIT_PAGE_COUNT | INIT_PAGE_RESERVED;
>>>     __init_single_page(page, pfn, zone, nid, flags);
>>>
>> Okay, I'll test the time consumed in memmap_init().
>>>>> But more generally, I wonder if we have to differentiate HOTPLUG here at all.
>>>>> @David, can you comment please?
>>>>>
>>>>>> +        }
>>>>>>          /*
>>>>>>           * Usually, we want to mark the pageblock MIGRATE_MOVABLE,
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index 06be8821d833..b868caabe8dc 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -1285,18 +1285,22 @@ void __free_pages_core(struct page *page, unsigned int order)
>>>>>>      unsigned int loop;
>>>>>>
>>>>>>      /*
>>>>>> -     * When initializing the memmap, __init_single_page() sets the refcount
>>>>>> -     * of all pages to 1 ("allocated"/"not free"). We have to set the
>>>>>> -     * refcount of all involved pages to 0.
>>>>>> +     * When initializing the memmap, memmap_init_range sets the refcount
>>>>>> +     * of all pages to 1 ("reserved" and "free") in hotplug context. We
>>>>>> +     * have to set the refcount of all involved pages to 0. Otherwise,
>>>>>> +     * we don't do it, as reserve_bootmem_region only set the refcount on
>>>>>> +     * reserve region ("reserved") in early context.
>>>>>>       */
>>>>> Again, why should hotplug and early init be different?
>>>> I will add a comment that describes that it will save boot time.
>>> But why do we need to initialize struct pages differently at boot time vs
>>> memory hotplug?
>>> Is there a reason memory hotplug cannot have the page count set to 0 just
>>> like for pages reserved at boot time?
>> This patch just saves boot time in MEMINIT_EARLY. If someone finds out that
>> it can save time in MEMINIT_HOTPLUG, I think it can be done in another patch
>> later. I just kept them the same.
> But it's not the same. It becomes slower after your patch and the code that
> frees the pages for MEMINIT_EARLY and MEMINIT_HOTPLUG becomes non-uniform
> for no apparent reason.

__free_pages_core() will also be called by others, such as deferred_free_range,
do_collection and memblock_free_late. We couldn't remove the
'if (page_count(page))' check even if we set the page count to 0 for
MEMINIT_HOTPLUG.

The following data is the time consumed in online_pages_range():

start_pfn   nr_pages    before      after
0x680000    0x80000     4.315ms     4.461ms
0x700000    0x80000     4.468ms     4.440ms
0x780000    0x80000     4.295ms     3.835ms
0x800000    0x80000     3.928ms     4.377ms
0x880000    0x80000     4.321ms     4.378ms
0x900000    0x80000     4.448ms     3.964ms
0x980000    0x80000     4.000ms     4.381ms
0xa00000    0x80000     4.459ms     4.255ms
sum                     34.234ms    34.091ms

As we can see, it doesn't become slower with the 'if (page_count(page))' check
when MEMINIT_HOTPLUG.

> The if (page_count(page)) is really non-obvious...
>
>>>>>> -    prefetchw(p);
>>>>>> -    for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>>>>>> -        prefetchw(p + 1);
>>>>>> +    if (page_count(page)) {
>>>>>> +        prefetchw(p);
>>>>>> +        for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>>>>>> +            prefetchw(p + 1);
>>>>>> +            __ClearPageReserved(p);
>>>>>> +            set_page_count(p, 0);
>>>>>> +        }
>>>>>>          __ClearPageReserved(p);
>>>>>>          set_page_count(p, 0);
>>>>>>      }
>>>>>> -    __ClearPageReserved(p);
>>>>>> -    set_page_count(p, 0);
>>>>>>
>>>>>>      atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
>>>>>> --
>>>>>> 2.25.1
>>>>>>
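For reference, a minimal sketch of what the loop body in memmap_init_range()
would look like with the flag-based variant suggested in the review above. It
assumes an INIT_PAGE_RESERVED flag exists next to INIT_PAGE_COUNT, as Mike
proposes; this is not the patch as posted:

        page = pfn_to_page(pfn);

        /*
         * In the MEMINIT_EARLY case the refcount stays 0 here: reserved
         * regions get their refcount from reserve_bootmem_region(), and
         * free regions are handled later in __free_pages_core().
         */
        enum page_init_flags flags = 0;

        if (context == MEMINIT_HOTPLUG)
            flags = INIT_PAGE_COUNT | INIT_PAGE_RESERVED;

        __init_single_page(page, pfn, zone, nid, flags);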
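And a simplified sketch of __free_pages_core() with the page_count() guard from
this patch applied, assuming the surrounding declarations match the upstream
function; the later steps that actually free the pages are omitted:

    void __free_pages_core(struct page *page, unsigned int order)
    {
        unsigned int nr_pages = 1 << order;
        struct page *p = page;
        unsigned int loop;

        /*
         * Only pages whose refcount was initialized to 1 (hotplugged memory
         * and boot-time reserved regions) need it reset here; pages left at
         * refcount 0 by memmap_init_range() skip the loop entirely.
         */
        if (page_count(page)) {
            prefetchw(p);
            for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
                prefetchw(p + 1);
                __ClearPageReserved(p);
                set_page_count(p, 0);
            }
            __ClearPageReserved(p);
            set_page_count(p, 0);
        }

        atomic_long_add(nr_pages, &page_zone(page)->managed_pages);

        /* ... the rest of the function then hands the pages to the allocator */
    }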