From: Linus Torvalds
Date: Fri, 22 Apr 2022 10:08:00 -0700
Subject: Re: [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"
To: Nicholas Piggin
Cc: Paul Menzel, "the arch/x86 maintainers", Song Liu, "Edgecombe, Rick P", Andrew Morton, Linux Kernel Mailing List, Linux-MM
References: <20220422060107.781512-1-npiggin@gmail.com> <20220422060107.781512-3-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-3-npiggin@gmail.com>

On Thu, Apr 21, 2022 at 11:01 PM Nicholas Piggin wrote:
>
> This reverts commit 559089e0a93d44280ec3ab478830af319c56dbe3
>
> The previous commit fixes huge vmalloc for drivers that use the
> vmalloc_to_page() struct pages.

Yeah, no. The very revert shows the problem:

> --- a/arch/powerpc/kernel/module.c
> +++ b/arch/powerpc/kernel/module.c
> @@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
>          * too.
>          */
>         return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> -                                   VM_FLUSH_RESET_PERMS,
> +                                   VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
>                                     NUMA_NO_NODE, __builtin_return_address(0));

This VM_NO_HUGE_VMAP is a sign of the fact that using hugepages for
mapping still isn't a transparent operation.

Now, in some cases that would be perfectly fine, ie the s390 case has
a nice clear comment about how it's a very special case:

> +       /*
> +        * The Create Secure Configuration Ultravisor Call does not support
> +        * using large pages for the virtual memory area.
> +        * This is a hardware limitation.
> +        */
> +       kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);

but as long as it is "anything that plays permission games with the
mapping is broken" we are not reverting that opt-in thing.

And no, it's not just that powerpc module code that is somehow
magical. This is the exact same issue that the bpf people hit.
It's also elsewhere, although it might well be hidden by "small
allocations will never trigger this" (eg the arm64 kprobes case only
does a single page).

I also wonder how this affects any use of 'set_memory_xyz()' with
partial mappings (I can point to "frob_text()" and friends for
modules, but I can easily imagine drivers doing odd things).

In particular, x86 does support pmd splitting for pmd's in
set_memory_xyz(), but I *really* couldn't tell you that it's ok with a
largepage that has already had its page counts split. It only used to
hit the big IO mappings traditionally.

Now I *think* it JustWorks(tm) - I don't actually see any obvious
problems there - and I also really hope that nobody actually even does
that "partial set_memory" on some vmalloc allocation in the first
place, but no, that kind of "let's hope" is not ok. And we already
know it happens at least for modules.

And no, don't even start about that "it's x86". It *still* isn't about
x86 as shown by this very patch. The issue is generic, and x86 just
tends to hit more odd cases and drivers.

In fact, I think x86 probably does *better* than powerpc. Because it
looks like 'set_memory_xyz()' just returns an error for vmalloc
addresses on powerpc.

Sounds strange. Doesn't powerpc do STRICT_MODULE_RWX? Does it work
only because 'frob_text()' doesn't actually check the return value? Or
maybe set_memory_xyz() is ok and it is *only* VM_FLUSH_RESET_PERMS
that doesn't work?

I don't know. But I do know bpf was affected, and I'm looking at that
module thing, and so I suspect it's elsewhere too.

Just opt-in with the mappings that matter.

                Linus
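
The opt-in versus opt-out argument above can be sketched as a small userspace model. This is not kernel code: the `MODEL_*` flag values, the 2 MiB `MODEL_PMD_SIZE`, and `model_may_map_huge()` are all hypothetical names invented for illustration, and the real decision inside `__vmalloc_node_range()` depends on more than a size check and one flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_VM_FLUSH_RESET_PERMS 0x1u
#define MODEL_VM_NO_HUGE_VMAP      0x2u /* opt-out scheme: "never huge" */
#define MODEL_VM_ALLOW_HUGE_VMAP   0x4u /* opt-in scheme: "huge is ok" */

/* Assumed huge mapping size (2 MiB) for the model. */
#define MODEL_PMD_SIZE ((size_t)2 << 20)

enum model_policy { MODEL_OPT_OUT, MODEL_OPT_IN };

/*
 * Decide whether an allocation could end up with a huge mapping.
 * Under opt-out, every caller gets one unless it passed the NO_HUGE
 * flag. Under opt-in, only callers that asked for it get one.
 */
bool model_may_map_huge(enum model_policy policy, size_t size,
                        unsigned int flags)
{
        /* Small allocations never trigger the huge path at all. */
        if (size < MODEL_PMD_SIZE)
                return false;

        switch (policy) {
        case MODEL_OPT_OUT:
                return !(flags & MODEL_VM_NO_HUGE_VMAP);
        case MODEL_OPT_IN:
                return (flags & MODEL_VM_ALLOW_HUGE_VMAP) != 0;
        }

        assert(!"unknown policy");
        return false;
}
```

In this model, a caller that passes only `MODEL_VM_FLUSH_RESET_PERMS` for a large allocation gets a huge mapping under `MODEL_OPT_OUT` but not under `MODEL_OPT_IN`. That is the point being made: with opt-out, every user that plays permission games has to be found and patched, while with opt-in an unaudited caller keeps its old small-page behaviour.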