From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2ed29e83-3aab-712c-5290-a20faabccc0a@redhat.com>
Date: Fri, 28 Jul 2023 18:51:32 +0300
Subject: Re: [External] Re: [v1 4/6] memblock: introduce MEMBLOCK_RSRV_NOINIT flag
To: Usama Arif, linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, rppt@kernel.org
Cc: linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com
References: <20230727204624.1942372-1-usama.arif@bytedance.com> <20230727204624.1942372-5-usama.arif@bytedance.com> <55750855-0029-b10f-3317-e6ae4d89d492@redhat.com>
From: Mika Penttilä

On 7/28/23 16:47, Usama Arif wrote:
>
>
> On 28/07/2023 05:30, Mika Penttilä wrote:
>> Hi,
>>
>> On 7/27/23 23:46, Usama Arif wrote:
>>
>>> For reserved memory regions marked with this flag,
>>> reserve_bootmem_region is not called during memmap_init_reserved_pages.
>>> This can be used to avoid struct page initialization for
>>> regions which won't need them, for e.g. hugepages with
>>> HVO enabled.
>>>
>>> Signed-off-by: Usama Arif
>>> ---
>>>   include/linux/memblock.h |  7 +++++++
>>>   mm/memblock.c            | 32 ++++++++++++++++++++++++++------
>>>   2 files changed, 33 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>> index f71ff9f0ec81..7f9d06c08592 100644
>>> --- a/include/linux/memblock.h
>>> +++ b/include/linux/memblock.h
>>> @@ -47,6 +47,7 @@ enum memblock_flags {
>>>       MEMBLOCK_MIRROR        = 0x2,    /* mirrored region */
>>>       MEMBLOCK_NOMAP        = 0x4,    /* don't add to kernel direct
>>> mapping */
>>>       MEMBLOCK_DRIVER_MANAGED = 0x8,    /* always detected via a
>>> driver */
>>> +    MEMBLOCK_RSRV_NOINIT    = 0x10,    /* don't call
>>> reserve_bootmem_region for this region */
>>>   };
>>>   /**
>>> @@ -125,6 +126,7 @@ int memblock_clear_hotplug(phys_addr_t base,
>>> phys_addr_t size);
>>>   int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
>>>   int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
>>>   int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
>>> +int memblock_rsrv_mark_noinit(phys_addr_t base, phys_addr_t size);
>>>   void memblock_free_all(void);
>>>   void memblock_free(void *ptr, size_t size);
>>> @@ -259,6 +261,11 @@ static inline bool memblock_is_nomap(struct
>>> memblock_region *m)
>>>       return m->flags & MEMBLOCK_NOMAP;
>>>   }
>>> +static inline bool memblock_is_noinit(struct memblock_region *m)
>>> +{
>>> +    return m->flags & MEMBLOCK_RSRV_NOINIT;
>>> +}
>>> +
>>>   static inline bool memblock_is_driver_managed(struct
>>> memblock_region *m)
>>>   {
>>>       return m->flags & MEMBLOCK_DRIVER_MANAGED;
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index 4fd431d16ef2..3a15708af3b6 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -997,6 +997,22 @@ int __init_memblock
>>> memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
>>>       return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP, 0);
>>>   }
>>> +/**
>>> + * memblock_rsrv_mark_noinit - Mark a reserved memory region with
>>> flag MEMBLOCK_RSRV_NOINIT.
>>> + * @base: the base phys addr of the region
>>> + * @size: the size of the region
>>> + *
>>> + * For memory regions marked with %MEMBLOCK_RSRV_NOINIT,
>>> reserve_bootmem_region
>>> + * is not called during memmap_init_reserved_pages, hence struct
>>> pages are not
>>> + * initialized for this region.
>>> + *
>>> + * Return: 0 on success, -errno on failure.
>>> + */
>>> +int __init_memblock memblock_rsrv_mark_noinit(phys_addr_t base,
>>> phys_addr_t size)
>>> +{
>>> +    return memblock_setclr_flag(base, size, 1,
>>> MEMBLOCK_RSRV_NOINIT, 1);
>>> +}
>>> +
>>>   static bool should_skip_region(struct memblock_type *type,
>>>                      struct memblock_region *m,
>>>                      int nid, int flags)
>>> @@ -2113,13 +2129,17 @@ static void __init
>>> memmap_init_reserved_pages(void)
>>>           memblock_set_node(start, end, &memblock.reserved, nid);
>>>       }
>>> -    /* initialize struct pages for the reserved regions */
>>> +    /*
>>> +     * initialize struct pages for reserved regions that don't have
>>> +     * the MEMBLOCK_RSRV_NOINIT flag set
>>> +     */
>>>       for_each_reserved_mem_region(region) {
>>> -        nid = memblock_get_region_node(region);
>>> -        start = region->base;
>>> -        end = start + region->size;
>>> -
>>> -        reserve_bootmem_region(start, end, nid);
>>> +        if (!memblock_is_noinit(region)) {
>>> +            nid = memblock_get_region_node(region);
>>> +            start = region->base;
>>> +            end = start + region->size;
>>> +            reserve_bootmem_region(start, end, nid);
>>> +        }
>>>       }
>>>   }
>>
>> There's code like:
>>
>> static inline void free_vmemmap_page(struct page *page)
>> {
>>          if (PageReserved(page))
>>                  free_bootmem_page(page);
>>          else
>>                  __free_page(page);
>> }
>>
>> which depends on PageReserved being set in vmemmap pages, so I
>> think you can't skip that part?
>>
>
> free_vmemmap_page_list (free_vmemmap_page) is called on struct pages
> (referred to as [1]) that point to memory *which contains* the struct
> pages (referred to as [2]) for the hugepage. The
> if (!memblock_is_noinit(region)) check above, which skips
> reserve_bootmem_region, applies to the struct pages [2] for the
> hugepage. struct pages [1] are not changed with my patch.
>
> As an experiment, if I run the diff at the bottom with and without
> these patches I get the same log "HugeTLB: reserved pages 4096, normal
> pages 0", which means those struct pages are treated the same with
> and without these patches. (It's 4096 as 262144 struct pages [2] per
> hugepage * 64 bytes per struct page / PAGE_SIZE = 4096 struct pages [1].)
>
> Also should have mentioned in cover letter, I used cat /proc/meminfo
> to make sure it was working as expected. Reserving 500 1G hugepages
> with and without these patches when hugetlb_free_vmemmap=on
> MemTotal:       536207112 kB (511.4G)
>
> when hugetlb_free_vmemmap=off
> MemTotal:       528015112 kB (503G)
>
>
> The expectation is that for 500 1G hugepages, when using HVO we have a
> saving of 16380K*500=~8GB which is what we see with and without those
> patches (511.4G - 503G). These patches didn't affect these numbers.
>
>

You are right, thanks for the explanation!

--Mika

>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index b5b7834e0f42..bc0ec90552b7 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -208,6 +208,8 @@ static int vmemmap_remap_range(unsigned long
> start, unsigned long end,
>         return 0;
>  }
>
> +static int i = 0, j = 0;
> +
>  /*
>   * Free a vmemmap page. A vmemmap page can be allocated from the
> memblock
>   * allocator or buddy allocator. If the PG_reserved flag is set, it
> means
> @@ -216,10 +218,14 @@ static int vmemmap_remap_range(unsigned long
> start, unsigned long end,
>   */
>  static inline void free_vmemmap_page(struct page *page)
>  {
> -       if (PageReserved(page))
> +       if (PageReserved(page)) {
> +               i++;
>                 free_bootmem_page(page);
> -       else
> +       }
> +       else {
> +               j++;
>                 __free_page(page);
> +       }
>  }
>
>  /* Free a list of the vmemmap pages */
> @@ -380,6 +386,7 @@ static int vmemmap_remap_free(unsigned long start,
> unsigned long end,
>
>         free_vmemmap_page_list(&vmemmap_pages);
>
> +       pr_err("reserved pages %u, normal pages %u", i, j);
>         return ret;
>  }
>
>
>
>
>
>> --Mika
>>
>>
>