From: Linus Torvalds
Date: Thu, 21 Apr 2022 19:47:52 -0700
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
To: "Edgecombe, Rick P"
Cc: "npiggin@gmail.com", "songliubraving@fb.com",
 "linux-kernel@vger.kernel.org", "daniel@iogearbox.net",
 "bpf@vger.kernel.org", "hch@infradead.org", "ast@kernel.org",
 "Kernel-team@fb.com", "linux-mm@kvack.org", "rppt@kernel.org",
 "song@kernel.org", "pmladek@suse.com", "akpm@linux-foundation.org",
 "hpa@zytor.com", "dborkman@redhat.com", "edumazet@google.com",
 "bp@alien8.de", "mcgrof@kernel.org", "mbenes@suse.cz",
 "imbrenda@linux.ibm.com"

On Thu, Apr 21, 2022 at 7:29 PM Edgecombe, Rick P wrote:
>
> FWIW, I like this direction. I think it needs to free them differently
> though?

Very much so.

> Besides fixing the bisected issue (hopefully), it also more cleanly
> separates the mapping from the backing allocation logic. And then since
> all the pages are 4k (from the page allocator perspective), it would be
> easier to support non-huge page aligned sizes. i.e. not use up a whole
> additional 2MB page if you only need 4k more of allocation size.

I don't disagree, but I think the real problem is that the whole "one
page_order per vmalloc() area" itself is a bit broken.

For example, AMD already does this "automatic TLB size" thing for when
you have multiple contiguous PTE entries (shades of the old alpha
"page size hint" thing, except it's automatic and doesn't have
explicit hints).

And I'm hoping Intel will do something similar in the future.

End result? It would actually be really good to just map contiguous
pages, but it doesn't have anything to do with the 2MB PMD size.

And there's no "fixed order" needed either. If you have a mapping that
is 17 pages in size, it would still be good to allocate them as a
block of 16 pages ("page_order = 4") and as a single page, because
just laying them out in the page tables that way will already allow
AMD to use a 64kB TLB entry for that 16-page block.

But it would also work to just do the allocations as a set of 8, 4, 4
and 1.

But the whole "one page order for one vmalloc" means that doesn't work
very well.

Where I disagree (violently) with Nick is his contention that (a) this
is x86-specific and (b) this is somehow trivial to fix.

Let's face it - the current code is broken. I think the sub-page issue
is not entirely trivial, and the current design isn't even very good
for it.

But the *easy* cases are the ones that simply don't care - the ones
that powerpc has actually been testing.

So for 5.18, I think it's quite likely reasonable to re-enable
large-page vmalloc for the easy case (ie those big hash tables).

Re-enabling it *all*, considering how broken it has been, and how
little testing it has clearly gotten? And potentially not enabling it
on x86 because x86 is so much better at showing issues? That's not
what I want to do.

If the code is so broken that it can't be used on x86, then it's too
broken to be enabled on powerpc and s390 too. Never mind that those
architectures might have so limited use that they never realized how
broken they were..

                 Linus
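
To make the 17-page example above concrete, here is a minimal userspace
sketch of splitting a page count into power-of-two blocks, largest
first. It is not kernel code and not how vmalloc() works today:
split_into_orders() and its parameters are made-up names for
illustration, and the greedy largest-first policy is just one possible
choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): split nr_pages into
 * power-of-two blocks, largest first, with each block capped at
 * max_order. The order of each block is written to orders[].
 * Returns the number of blocks written; it stops early if orders[]
 * fills up before all pages are covered.
 */
size_t split_into_orders(unsigned long nr_pages, unsigned int max_order,
                         unsigned int *orders, size_t cap)
{
    size_t n = 0;

    assert(orders != NULL);

    while (nr_pages && n < cap) {
        unsigned int order = 0;

        /* Grow the order while the next larger block still fits. */
        while (order < max_order && (2UL << order) <= nr_pages)
            order++;

        orders[n++] = order;
        nr_pages -= 1UL << order;
    }
    return n;
}
```

With nr_pages = 17 and max_order = 4 this yields orders 4 and 0, i.e.
the 16 + 1 split. With max_order = 3 it yields 8 + 8 + 1. Other splits
such as 8, 4, 4 and 1 are equally valid ways to cover the same 17
pages; the point is only that the block sizes need not all share one
order.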