Message-ID: <440d4a0e-c1ea-864b-54cb-aab74858319a@bytedance.com>
Date: Wed, 26 Jul 2023 16:02:21 +0100
Subject: Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
From: Usama Arif <usama.arif@bytedance.com>
To: Mike Rapoport
Cc: linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, linux-kernel@vger.kernel.org, fam.zheng@bytedance.com, liangma@liangbit.com, simon.evans@bytedance.com, punit.agrawal@bytedance.com
References: <20230724134644.1299963-1-usama.arif@bytedance.com> <20230724134644.1299963-3-usama.arif@bytedance.com> <20230726110113.GT1901145@kernel.org>
In-Reply-To: <20230726110113.GT1901145@kernel.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 26/07/2023 12:01, Mike Rapoport wrote:
> On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
>> This propagates the hugepage size from the memblock APIs
>> (memblock_alloc_try_nid_raw and
>> memblock_alloc_range_nid)
>> so that it can be stored in struct memblock region. This does not
>> introduce any functional change and hugepage_size is not used in
>> this commit. It is just a setup for the next commit where hugepage_size
>> is used to skip initialization of struct pages that will be freed later
>> when HVO is enabled.
>>
>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> ---
>>  arch/arm64/mm/kasan_init.c                   |  2 +-
>>  arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>>  arch/powerpc/platforms/pseries/setup.c       |  4 +-
>>  arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>>  include/linux/memblock.h                     |  8 ++-
>>  mm/cma.c                                     |  4 +-
>>  mm/hugetlb.c                                 |  6 +-
>>  mm/memblock.c                                | 60 ++++++++++++--------
>>  mm/mm_init.c                                 |  2 +-
>>  mm/sparse-vmemmap.c                          |  2 +-
>>  tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>>  11 files changed, 56 insertions(+), 38 deletions(-)
>>
>
> [ snip ]
>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f71ff9f0ec81..bb8019540d73 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -63,6 +63,7 @@ struct memblock_region {
>>  #ifdef CONFIG_NUMA
>>  	int nid;
>>  #endif
>> +	phys_addr_t hugepage_size;
>>  };
>>
>>  /**
>> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>>  				      phys_addr_t start, phys_addr_t end);
>>  phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>>  				     phys_addr_t align, phys_addr_t start,
>> -				     phys_addr_t end, int nid, bool exact_nid);
>> +				     phys_addr_t end, int nid, bool exact_nid,
>> +				     phys_addr_t hugepage_size);
>
> Rather than adding yet another parameter to memblock_phys_alloc_range() we
> can have an API that sets a flag on the reserved regions.
> With this the hugetlb reservation code can set a flag when HVO is
> enabled and memmap_init_reserved_pages() will skip regions with this flag
> set.

Hi,

Thanks for the review. I think you meant
memblock_alloc_range_nid/memblock_alloc_try_nid_raw and not
memblock_phys_alloc_range?
My initial approach was to use flags, but I think it looks worse than what
I have done in this RFC. (I have pushed the flags prototype at
https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO,
top 4 commits for reference; the main difference is in patches 2 and 4 of
the RFC.) The major points are below (the bigger issue is in patch 4):

- (RFC vs flags, patch 2 comparison) In the RFC, hugepage_size is
  propagated from memblock_alloc_try_nid_raw through function calls. When
  using flags, a "no_init" boolean is propagated from
  memblock_alloc_try_nid_raw through function calls until the region flags
  are available in memblock_add_range, where the new MEMBLOCK_NOINIT flag
  is set. I think it is trickier to introduce a new function that sets the
  flag on the region AFTER the call to memblock_alloc_try_nid_raw has
  finished, as the memblock_region cannot be found at that point. So
  something (the hugepage_size/flag information) still has to be
  propagated through function calls, and a new argument needs to be added
  either way.

- (RFC vs flags, patch 4 comparison) We can't skip initialization of the
  whole region, only the tail pages: we still need to initialize
  HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) worth of struct pages for each
  gigantic page. In the RFC, the hugepage_size from patch 2 is used in the
  for loop in memmap_init_reserved_pages in patch 4 to reserve
  HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This
  is simple and not hacky. If we use a flag, there are 2 ways to
  initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:

  1. (implemented in patch 4 of the github link) memmap_init_reserved_pages
     skips the region for initialization, as you suggested, and we then
     initialize HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage
     somewhere later (I did it in gather_bootmem_prealloc). When calling
     reserve_bootmem_region in gather_bootmem_prealloc, we need to skip
     early_page_uninitialised, which makes it look a bit hacky.

  2.
     We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per
     hugepage in memmap_init_reserved_pages itself. As we have used a flag
     and haven't passed hugepage_size, we need to get the gigantic page
     size somehow. There doesn't seem to be a nice way to determine the
     gigantic page size in that function, which is architecture dependent.
     I think the gigantic page size can be given by
     PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), but I am not sure if this is
     ok for all architectures? If we can use
     PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), it will look much better than
     point 1.

Both the RFC patches and the github flags implementation work, but I think
the RFC patches look much cleaner. If there is a strong preference for the
github patches, I can send them to the mailing list?

Thanks,
Usama

>>  phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
>>
>>  static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
>> @@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>>  				   int nid);
>>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>>  				 phys_addr_t min_addr, phys_addr_t max_addr,
>> -				 int nid);
>> +				 int nid, phys_addr_t hugepage_size);
>>  void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>>  			     phys_addr_t min_addr, phys_addr_t max_addr,
>>  			     int nid);
>> @@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
>>  {
>>  	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
>>  					  MEMBLOCK_ALLOC_ACCESSIBLE,
>> -					  NUMA_NO_NODE);
>> +					  NUMA_NO_NODE, 0);
>>  }
>>
>>  static inline void *memblock_alloc_from(phys_addr_t size,
>