From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v2 3/3] x86: Support huge vmalloc mappings
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Wed, 19 Jan 2022 21:32:13 +0800
To: Nicholas Piggin, Andrew Morton, Jonathan Corbet, Dave Hansen
Cc: Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dave Hansen, H. Peter Anvin, Ingo Molnar, Michael Ellerman, Paul Mackerras, Thomas Gleixner, Will Deacon, Matthew Wilcox
In-Reply-To: <1642565468.c0jax91tvn.astroid@bobo.none>
Peter Anvin" , Ingo Molnar , "Michael Ellerman" , Paul Mackerras , "Thomas Gleixner" , Will Deacon , Matthew Wilcox References: <20211227145903.187152-1-wangkefeng.wang@huawei.com> <20211227145903.187152-4-wangkefeng.wang@huawei.com> <70ff58bc-3a92-55c2-2da8-c5877af72e44@intel.com> <3858de1f-cdbc-ff52-2890-4254d0f48b0a@huawei.com> <31a75f95-6e6e-b640-2d95-08a95ea8cf51@intel.com> <1642472965.lgfksp6krp.astroid@bobo.none> <4488d39f-0698-7bfd-b81c-1e609821818f@intel.com> <1642565468.c0jax91tvn.astroid@bobo.none> From: Kefeng Wang In-Reply-To: <1642565468.c0jax91tvn.astroid@bobo.none> Content-Type: text/plain; charset="UTF-8"; format=flowed X-Originating-IP: [10.174.177.243] X-ClientProxiedBy: dggeme705-chm.china.huawei.com (10.1.199.101) To dggpemm500001.china.huawei.com (7.185.36.107) X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: 74009A0055 X-Stat-Signature: ihtswce9h8zw86wk3eortijeku8sk3gc Authentication-Results: imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of wangkefeng.wang@huawei.com designates 45.249.212.187 as permitted sender) smtp.mailfrom=wangkefeng.wang@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com X-Rspamd-Server: rspam03 X-HE-Tag: 1642599138-203550 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2022/1/19 12:17, Nicholas Piggin wrote: > Excerpts from Dave Hansen's message of January 19, 2022 3:28 am: >> On 1/17/22 6:46 PM, Nicholas Piggin wrote: >>>> This all sounds very fragile to me. Every time a new architecture w= ould >>>> get added for huge vmalloc() support, the developer needs to know to= go >>>> find that architecture's module_alloc() and add this flag. >>> This is documented in the Kconfig. >>> >>> # >>> # Archs that select this would be capable of PMD-sized vmaps (i.e.= , >>> # arch_vmap_pmd_supported() returns true), and they must make no a= ssumptions >>> # that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUG= E_VMAP flag >>> # can be used to prohibit arch-specific allocations from using hug= epages to >>> # help with this (e.g., modules may require it). >>> # >>> config HAVE_ARCH_HUGE_VMALLOC >>> depends on HAVE_ARCH_HUGE_VMAP >>> bool >>> >>> Is it really fair to say it's *very* fragile? Surely it's reasonable = to >>> read the (not very long) documentation ad understand the consequences= for >>> the arch code before enabling it. >> Very fragile or not, I think folks are likely to get it wrong. It wou= ld >> be nice to have it default *everyone* to safe and slow and make *sure* > It's not safe to enable though. That's the problem. If it was just > modules then you'd have a point but it could be anything. > >> they go look at the architecture modules code itself before enabling >> this for modules. > This is required not just for modules for the whole arch code, it > has to be looked at and decided this will work. > >> Just from that Kconfig text, I don't think I'd know off the top of my >> head what do do for x86, or what code I needed to go touch. > You have to make sure arch/x86 makes no assumptions that vmalloc memory > is backed by PAGE_SIZE ptes. If you can't do that then you shouldn't > enable the option. The option can not explain it any more because any > arch could do anything with its mappings. The module code is an example= , > not the recipe. 
Hi Nick, Dave and Christophe, thanks for your review. I am a little
confused, so let me sum up what I think:

1) For ppc/arm64 module_alloc(), VM_NO_HUGE_VMAP must be set, because
those architectures' set_memory_* functions can only handle PAGE_SIZE
mappings, due to the limit of apply_to_page_range().

2) But for x86's module_alloc(), adding VM_NO_HUGE_VMAP only avoids
fragmentation, since x86's __change_page_attr functions will split the
huge mapping; the flag is not a must there.

The behavior above occurs when STRICT_MODULE_RWX is enabled, so either

1) adding a unified function to set the vm flags (suggested by Dave), or
2) adding the vm flags with some comments to each arch's module_alloc()

is acceptable. For the unified-function way, we could make this a
default recipe tied to STRICT_MODULE_RWX and fold two more vm flags
into it, e.g.:

+unsigned long module_alloc_vm_flags(bool need_flush_reset_perms)
+{
+	unsigned long vm_flags = VM_DEFER_KMEMLEAK;
+
+	if (need_flush_reset_perms)
+		vm_flags |= VM_FLUSH_RESET_PERMS;
+	/*
+	 * Modules use a single, large vmalloc(). Different permissions
+	 * are applied later and will fragment huge mappings or even
+	 * fail in set_memory_* on some architectures. Avoid using
+	 * huge pages for modules.
+	 */
+	if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
+		vm_flags |= VM_NO_HUGE_VMAP;
+
+	return vm_flags;
+}

which would then be called from each arch's module_alloc() (a usage
sketch follows at the end of this mail). Any suggestions? Many thanks.

>
> Thanks,
> Nick
> .
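For illustration, a minimal sketch of how an arch's module_alloc()
could consume the helper, modeled on x86's existing
__vmalloc_node_range() call (the x86-specific get_module_load_offset()
randomization is omitted here):

void *module_alloc(unsigned long size)
{
	if (PAGE_ALIGN(size) > MODULES_LEN)
		return NULL;

	/*
	 * Pass true so VM_FLUSH_RESET_PERMS is included: module text
	 * gets its direct-map permissions reset when it is freed.
	 */
	return __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR,
				    MODULES_END, GFP_KERNEL, PAGE_KERNEL,
				    module_alloc_vm_flags(true),
				    NUMA_NO_NODE,
				    __builtin_return_address(0));
}

With STRICT_MODULE_RWX=y the helper resolves to VM_DEFER_KMEMLEAK |
VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP, i.e. what x86 open-codes today
plus the no-huge-mapping restriction.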