Date: Tue, 19 Apr 2022 19:03:11 -0700
From: Alexei Starovoitov
To: Linus Torvalds
Cc: Mike Rapoport, Song Liu, "Edgecombe, Rick P", "mcgrof@kernel.org",
 "linux-kernel@vger.kernel.org", "bpf@vger.kernel.org", "hch@infradead.org",
 "ast@kernel.org", "daniel@iogearbox.net", "linux-mm@kvack.org",
 "song@kernel.org", Kernel Team, "pmladek@suse.com",
 "akpm@linux-foundation.org", "hpa@zytor.com", "dborkman@redhat.com",
 "edumazet@google.com", "bp@alien8.de", "mbenes@suse.cz",
 "imbrenda@linux.ibm.com"
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
Message-ID: <20220420020311.6ojfhcooumflnbbk@MacBook-Pro.local.dhcp.thefacebook.com>
References: <4AD023F9-FBCE-4C7C-A049-9292491408AA@fb.com>
 <88eafc9220d134d72db9eb381114432e71903022.camel@intel.com>
Content-Type: text/plain; charset=us-ascii

On Tue, Apr 19, 2022 at 12:20:39PM -0700, Linus Torvalds wrote:
> On Tue, Apr 19, 2022 at 11:42 AM Mike Rapoport wrote:
> >
> > I'd say that bpf_prog_pack was a cure for symptoms and this project tries
> > to address more general problem.
> > But you are right, it'll take some time and won't land in 5.19.
>
> Just to update people: I've just applied Song's [1/4] patch, which
> means that the whole current hugepage vmalloc thing is effectively
> disabled (because nothing opts in).
>
> And I suspect that will be the status for 5.18, unless somebody comes
> up with some very strong arguments for (re-)starting using huge pages.

Here is the quote from Song's cover letter for the bpf_prog_pack series:

  Most BPF programs are small, but they consume a page each. For systems
  with busy traffic and many BPF programs, this could also add significant
  pressure to instruction TLB. High iTLB pressure usually causes slow down
  for the whole system, which includes visible performance degradation for
  production workloads.

The last sentence is the key. We've added this feature not because of bpf
programs themselves, so calling this feature an optimization is not quite
correct. The number of bpf programs on a production server doesn't matter;
the programs come and go all the time. That is the key here. The per-program
4k module_alloc() plus the set_memory_ro()/set_memory_x() calls done by the
JIT break down huge pages and increase TLB pressure on the kernel code.
That creates visible performance degradation for normal user space
workloads that are not doing anything bpf related. mm folks can fill in the
details here; my understanding is that it has to do with the identity
(direct) mapping. A rough sketch of this per-program pattern is at the end
of this mail.

So we're not trying to improve bpf performance. We're trying to make sure
that bpf program load/unload doesn't affect the speed of the kernel.

Generalizing bpf_prog_alloc to modules would be nice, but it's not clear
what benefits such an optimization might have. It's orthogonal here.

So I argue that all 4 of Song's fixes are necessary in 5.18. We need an
additional zeroing patch too, of course, to make sure the huge page doesn't
contain garbage at alloc time and is cleaned after a prog is unloaded.

Regarding JIT spraying and other concerns, the short answer is: nothing
changed. JIT spraying was mitigated with start address randomization and
invalid instruction padding.
Both features are still present, and constant blinding is also fully
functional. (The two mitigations are sketched at the end of this mail.)

Any kind of generalization of bpf_prog_pack into a general mm feature would
be nice, but it cannot be done as an opportunistic cache. We need a
guarantee that bpf prog load/unload won't recreate the kernel performance
degradation described above. I suspect we will need bpf_prog_pack in its
current form for the foreseeable future.
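
First, a rough sketch of the per-program allocation pattern mentioned
above. This is simplified and not the actual JIT code; jit_alloc_image()
is a made-up wrapper name, but module_alloc() and
set_memory_ro()/set_memory_x() are the real interfaces involved:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/moduleloader.h>
#include <linux/set_memory.h>

/*
 * Sketch only: every JIT-ed program gets its own small module_alloc()
 * area, and the permission changes on those few pages force the huge
 * mappings backing them to be split into 4k entries (this is where the
 * identity/direct mapping damage mentioned above comes from).
 */
static void *jit_alloc_image(unsigned int size)
{
	unsigned int npages = DIV_ROUND_UP(size, PAGE_SIZE);
	void *image = module_alloc(size);	/* separate area per program */

	if (!image)
		return NULL;

	set_memory_ro((unsigned long)image, npages);
	set_memory_x((unsigned long)image, npages);

	return image;
}

bpf_prog_pack avoids this by allocating one large executable region,
setting its permissions once, and packing many programs into it, so
loading or unloading an individual program no longer changes page
permissions.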
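
Second, a conceptual sketch of the two JIT spraying mitigations, invalid
instruction padding and start address randomization. The wrapper name is
made up and the details are simplified; the real logic lives in
bpf_jit_binary_alloc(), and on x86 the filler byte is 0xcc (int3):

#include <linux/types.h>
#include <linux/random.h>
#include <linux/string.h>

/*
 * Sketch only: fill the whole image with a trapping instruction and
 * place the program at a random offset inside it, so sprayed constants
 * end up at an unpredictable address and any jump into the padding
 * traps immediately. Alignment handling is omitted.
 */
static u8 *jit_place_prog(u8 *image, unsigned int image_size,
			  const u8 *prog, unsigned int prog_len)
{
	unsigned int hole, start;

	if (prog_len > image_size)
		return NULL;

	hole = image_size - prog_len;
	memset(image, 0xcc, image_size);		/* invalid insn padding */
	start = get_random_u32() % (hole + 1);		/* randomized start */
	memcpy(image + start, prog, prog_len);

	return image + start;
}

Constant blinding is orthogonal to the allocator: it rewrites immediates
in the BPF instructions before they are JITed, so it is unaffected by
whether the image comes from module_alloc() or a shared huge page.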