Date: Tue, 26 Jan 2021 19:47:22 +1000
From: Nicholas Piggin
Subject: Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
To: Andrew Morton, Ding Tianhong, linux-mm@kvack.org
Cc: Christophe Leroy, Christoph Hellwig, Jonathan Cameron, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Rick Edgecombe
References: <20210126044510.2491820-1-npiggin@gmail.com> <20210126044510.2491820-13-npiggin@gmail.com> <0f360e6e-6d34-19ce-6c76-a17a5f4f7fc3@huawei.com>
In-Reply-To: <0f360e6e-6d34-19ce-6c76-a17a5f4f7fc3@huawei.com>
Message-Id: <1611653945.t3oot63nwn.astroid@bobo.none>

Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
> On 2021/1/26 12:45, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> support PMD-sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Architectures must ensure that any arch-specific vmalloc allocations
>> that require PAGE_SIZE mappings (e.g., module allocations vs. strict
>> module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>
>> When hugepage vmalloc mappings are enabled in the next patch, this
>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>>
>> This can result in more internal fragmentation and memory overhead for
>> a given allocation, so a boot option, nohugevmalloc, is added to
>> disable it.
>>
>> Signed-off-by: Nicholas Piggin
>> ---
>>  arch/Kconfig            |  11 ++
>>  include/linux/vmalloc.h |  21 ++++
>>  mm/page_alloc.c         |   5 +-
>>  mm/vmalloc.c            | 215 +++++++++++++++++++++++++++++++---------
>>  4 files changed, 205 insertions(+), 47 deletions(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..eef170e0c9b8 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  config HAVE_ARCH_HUGE_VMAP
>>  	bool
>>  
>> +#
>> +# Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +# arch_vmap_pmd_supported() returns true), and they must make no assumptions
>> +# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>> +# can be used to prohibit arch-specific allocations from using hugepages to
>> +# help with this (e.g., modules may require it).
>> +#
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +	depends on HAVE_ARCH_HUGE_VMAP
>> +	bool
>> +
>>  config ARCH_WANT_HUGE_PMD_SHARE
>>  	bool
>>  
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 99ea72d547dc..93270adf5db5 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -25,6 +25,7 @@ struct notifier_block;	/* in notifier.h */
>>  #define VM_NO_GUARD		0x00000040	/* don't add guard page */
>>  #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
>>  #define VM_MAP_PUT_PAGES	0x00000100	/* put pages and free array in vfree */
>> +#define VM_NO_HUGE_VMAP		0x00000200	/* force PAGE_SIZE pte mapping */
>>  
>>  /*
>>   * VM_KASAN is used slightly differently depending on CONFIG_KASAN_VMALLOC.
>> @@ -59,6 +60,9 @@ struct vm_struct {
>>  	unsigned long size;
>>  	unsigned long flags;
>>  	struct page **pages;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> +	unsigned int page_order;
>> +#endif
>>  	unsigned int nr_pages;
>>  	phys_addr_t phys_addr;
>>  	const void *caller;
> 
> Hi Nicholas:
> 
> A suggestion :)
> 
> The page order is only used to indicate a huge-page mapping for the vm
> area, and it is only valid when the size is bigger than PMD_SIZE, so
> could we use the vm flags instead, e.g., define a new flag named
> VM_HUGEPAGE? That would not break struct vm_struct, and it would make it
> easier for me to backport the series to our own branches (based on the
> LTS version).

Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any
time soon (or maybe they will be, for edge-case configurations? It would
be trivial to add support for them).
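A minimal sketch of the flag-based variant, assuming a spare vm flag bit
is available (the bit value and helper below are hypothetical, not from
this patch):

#define VM_HUGEPAGE	0x00000400	/* hypothetical: mapped with PMD-sized pages */

static inline unsigned int vm_area_page_order(struct vm_struct *vm)
{
	/* A single flag can only encode the PMD order, nothing larger. */
	return (vm->flags & VM_HUGEPAGE) ? PMD_SHIFT - PAGE_SHIFT : 0;
}

That keeps struct vm_struct unchanged, at the cost of representing only
one huge mapping size.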
The other concern I have is that Christophe, IIRC, was asking about
implementing a mapping for PPC that used TLB mappings of a different size
than the kernel page table tree. Although I guess we could deal with that
when it comes.

I like the flexibility of page_order, though. How hard would it be for
you to do the backport with VM_HUGEPAGE yourself?

I should also say: thanks for all the review and testing from the Huawei
team. Do you have an x86 patch?

Thanks,
Nick
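For reference, the fallback behaviour the changelog describes (try a
PMD-order allocation first, retry with small pages if it fails) amounts
to roughly the following. This is an illustrative sketch only;
__vmalloc_area_pages() is a made-up helper, not the patch's actual code:

static void *alloc_area(unsigned long size, unsigned long vm_flags, gfp_t gfp)
{
	void *addr = NULL;

	/* Try PMD-sized pages when permitted and the request is big enough. */
	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) &&
	    !(vm_flags & VM_NO_HUGE_VMAP) && size >= PMD_SIZE)
		addr = __vmalloc_area_pages(size, PMD_SHIFT - PAGE_SHIFT, gfp);

	/* Fall back to PAGE_SIZE mappings if the huge attempt failed. */
	if (!addr)
		addr = __vmalloc_area_pages(size, 0, gfp);

	return addr;
}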