Message-ID: <3858de1f-cdbc-ff52-2890-4254d0f48b0a@huawei.com>
Date: Tue, 28 Dec 2021 18:26:39 +0800
From: Kefeng Wang
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings
To: Dave Hansen, Jonathan Corbet, Andrew Morton
CC: Nicholas Piggin, Catalin Marinas, Will Deacon, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Michael Ellerman, "Benjamin Herrenschmidt", Paul Mackerras,
 Christophe Leroy, Matthew Wilcox
References: <20211227145903.187152-1-wangkefeng.wang@huawei.com>
 <20211227145903.187152-4-wangkefeng.wang@huawei.com>
 <70ff58bc-3a92-55c2-2da8-c5877af72e44@intel.com>
In-Reply-To: <70ff58bc-3a92-55c2-2da8-c5877af72e44@intel.com>

On 2021/12/27 23:56, Dave Hansen wrote:
> On 12/27/21 6:59 AM, Kefeng Wang wrote:
>> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
>> support huge vmalloc mappings.
> In general, this seems interesting and the diff is simple.  But, I don't
> see _any_ x86-specific data.
> I think the bare minimum here would be a few kernel compiles and some
> 'perf stat' data for some TLB events.

When the feature was enabled on ppc, the commit reported the following:

commit 8abddd968a303db75e4debe77a3df484164f1f33
Author: Nicholas Piggin
Date:   Mon May 3 19:17:55 2021 +1000

    powerpc/64s/radix: Enable huge vmalloc mappings

    This reduces TLB misses by nearly 30x on a `git diff` workload on a
    2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
    to vfs hashes being allocated with 2MB pages.

But the data could be different on a different machine or arch.

>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>> index 95fa745e310a..6bf5cb7d876a 100644
>> --- a/arch/x86/kernel/module.c
>> +++ b/arch/x86/kernel/module.c
>> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>>
>>      p = __vmalloc_node_range(size, MODULE_ALIGN,
>>                      MODULES_VADDR + get_module_load_offset(),
>> -                    MODULES_END, gfp_mask,
>> -                    PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>> +                    MODULES_END, gfp_mask, PAGE_KERNEL,
>> +                    VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>>                      __builtin_return_address(0));
>>      if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>>          vfree(p);
> To figure out what's going on in this hunk, I had to look at the cover
> letter (which I wasn't cc'd on).  That's not great and it means that
> somebody who stumbles upon this in the code is going to have a really
> hard time figuring out what is going on.  Cover letters don't make it
> into git history.

Sorry about that, I will add more detail to the arch patch changelog.

> This desperately needs a comment and some changelog material in *this*
> patch.
>
> But, even the description from the cover letter is sparse:
>
>> There are some disadvantages about this feature[2], one of the main
>> concerns is the possible memory fragmentation/waste in some scenarios,
>> also archs must ensure that any arch specific vmalloc allocations that
>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
> subsequent permission changes just fragment the 2M mapping?

Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.

With STRICT_MODULE_RWX on x86, module allocation goes through
set_memory_ro/rw/nx, which call __change_page_attr() and split the large
page, so there is no need for module alloc to use HUGE_VMALLOC.
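
To make the splitting argument concrete, here is a small userspace model
of one 2M region. It is only a sketch and not kernel code: every sketch_*
name is hypothetical, and it assumes 4K base pages and a 2M huge mapping
(512 entries). It shows that changing the protection of part of a huge
mapping forces a split into 512 small entries, so starting from a huge
mapping gains nothing for module text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for the model: 4K pages, 2M huge mapping. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_PMD_SIZE     (2UL * 1024 * 1024)
#define SKETCH_PTES_PER_PMD (SKETCH_PMD_SIZE / SKETCH_PAGE_SIZE) /* 512 */

enum sketch_prot { SKETCH_RW, SKETCH_RO, SKETCH_RX };

/* One 2M-aligned region, mapped either by one huge entry or by 512 PTEs. */
struct sketch_pmd_region {
	bool huge;                                      /* one 2M entry? */
	enum sketch_prot huge_prot;                     /* used when huge */
	enum sketch_prot pte_prot[SKETCH_PTES_PER_PMD]; /* used when !huge */
};

/* allow_huge == false models an allocation made with VM_NO_HUGE_VMAP. */
void sketch_region_init(struct sketch_pmd_region *r, bool allow_huge)
{
	size_t i;

	r->huge = allow_huge;
	r->huge_prot = SKETCH_RW;
	for (i = 0; i < SKETCH_PTES_PER_PMD; i++)
		r->pte_prot[i] = SKETCH_RW;
}

/*
 * Change protection of pages [first_page, first_page + nr_pages).
 * Returns true if a huge mapping had to be split to do so.
 */
bool sketch_set_prot(struct sketch_pmd_region *r, size_t first_page,
		     size_t nr_pages, enum sketch_prot prot)
{
	bool split = false;
	size_t i;

	assert(first_page + nr_pages <= SKETCH_PTES_PER_PMD);

	if (r->huge) {
		/* Whole region changes at once: the huge entry can stay. */
		if (nr_pages == SKETCH_PTES_PER_PMD) {
			r->huge_prot = prot;
			return false;
		}
		if (prot == r->huge_prot)
			return false;

		/* Partial change: fall back to 512 small entries. */
		for (i = 0; i < SKETCH_PTES_PER_PMD; i++)
			r->pte_prot[i] = r->huge_prot;
		r->huge = false;
		split = true;
	}

	for (i = first_page; i < first_page + nr_pages; i++)
		r->pte_prot[i] = prot;

	return split;
}

/* Number of page-table entries currently mapping the region. */
size_t sketch_entries(const struct sketch_pmd_region *r)
{
	return r->huge ? 1 : SKETCH_PTES_PER_PMD;
}
```

In this model, marking the first 16 pages executable in a region that
started out huge returns true and leaves 512 entries, which is the same
end state as a region that was never huge in the first place.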