From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6C02C46467 for ; Wed, 30 Nov 2022 16:18:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 255BC6B0072; Wed, 30 Nov 2022 11:18:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2054E6B0073; Wed, 30 Nov 2022 11:18:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F59D6B0074; Wed, 30 Nov 2022 11:18:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 01AB26B0072 for ; Wed, 30 Nov 2022 11:18:49 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id B2EB8C0FE0 for ; Wed, 30 Nov 2022 16:18:49 +0000 (UTC) X-FDA: 80190617178.15.3065D70 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf18.hostedemail.com (Postfix) with ESMTP id 072C01C000A for ; Wed, 30 Nov 2022 16:18:47 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1F43161CE3 for ; Wed, 30 Nov 2022 16:18:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 06D35C43470 for ; Wed, 30 Nov 2022 16:18:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669825126; bh=KZLVjRYAEWDOIkOwlUQk3o7idcubztUPk/V9lAeQDug=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=ms7XRKU/tG97oZpE9gYVFLz43UAXWfmGdpnhiQdGsauyfLQkHBQakIkaGGRb7LWUl dpwR8DpnBZ/ginytQwd4uHBcQZQpa8BkVlHFaZ4K4bB9+lyQP6i0PyGkNFf1F/UGQv KMZXuDrXIBIU2uyo1jfJ7Adsupfs2ptTYs2LzYtCkSO0NDfKjOoTGKOq+xyU2ko404 fdZLTJjSXJVaGOpOCoIHrRJ8Sp1ZswdfDAw8lA4jkUqjvOv4+QTgOs4LIyvt6q7vtE lxTnNBRjLArqM1FTxYhrmSA0PgWGMmintGLtWX4WJke2eafHoj11glZHmu8Bl4gJDg 33HcLUrZuP8rw== Received: by mail-ua1-f44.google.com with SMTP id m5so6359327uah.3 for ; Wed, 30 Nov 2022 08:18:45 -0800 (PST) X-Gm-Message-State: ANoB5plbk/J1P5H3bDo9cLyOC3MsJVeUDCFwQ+pd0pwviewmoOgYjMmS puNcISyuXPr7vjSNO0kdbiv17FvpHB0F3kwkejc= X-Google-Smtp-Source: AA0mqf4Xq2P2GMVmisZULrIbu8N7BwNZlMRiUfy9qur8wRsLMGvA3e8V7T4IjdHgfLQR/8CW7Q5eHwApmhIzh47Wt4M= X-Received: by 2002:a9f:22a7:0:b0:418:87ba:c43a with SMTP id 36-20020a9f22a7000000b0041887bac43amr25783128uan.114.1669825124875; Wed, 30 Nov 2022 08:18:44 -0800 (PST) MIME-Version: 1.0 References: <20221107223921.3451913-1-song@kernel.org> <87lenuukj0.ffs@tglx> <871qpluxfu.ffs@tglx> In-Reply-To: <871qpluxfu.ffs@tglx> From: Song Liu Date: Wed, 30 Nov 2022 08:18:31 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs To: Thomas Gleixner Cc: bpf@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org, akpm@linux-foundation.org, x86@kernel.org, hch@lst.de, rick.p.edgecombe@intel.com, aaron.lu@intel.com, rppt@kernel.org, mcgrof@kernel.org Content-Type: text/plain; charset="UTF-8" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1669825128; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=f5TGKo1fXZVF0Qv9TsuNWZpK/ycuGTFJSCnST7G0sMs=; b=CxIkpNNfOIu3pPHLV04qA94VlJv9rVcD15/gmb7Tx+uKQmATGRJLG0zwKZSS38CgOMEjg2 1hQK9WjGM0sQ1ZalkX/+AHPBfyfLdouS4kpMdPHBFKCk8or9csEWA/53joWpgNLSr2O1Nf /c5EMkhDa2v9ot+ZeZgx/ncYlIzcBpQ= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="ms7XRKU/"; spf=pass (imf18.hostedemail.com: domain of song@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=song@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1669825128; a=rsa-sha256; cv=none; b=cdWH4RXOyEWmh8mtG7SQv9lnRU7JjvH7QVJjb7NIykRtjNCjNdmvW5GmPltltV6MmHLYqd l0LeR00QiHf4bfX9XurxOqFd/kfLZiKxiunk8duljXMjpurdeRMCqLwQGTw+j3xLh3OGMO sSuFh8l4bkPrX9lDJn2/gpVZnWz2DcA= X-Stat-Signature: 6bz3b9fexdnogy89ku45pntpwizupmgd X-Rspam-User: X-Rspamd-Queue-Id: 072C01C000A X-Rspamd-Server: rspam11 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="ms7XRKU/"; spf=pass (imf18.hostedemail.com: domain of song@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=song@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-HE-Tag: 1669825127-661760 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Thomas, On Tue, Nov 29, 2022 at 3:56 PM Thomas Gleixner wrote: > > Song, > > On Tue, Nov 29 2022 at 09:26, Song Liu wrote: > > On Tue, Nov 29, 2022 at 2:23 AM Thomas Gleixner wrote: > >> Modules are the obvious starting point. Once that is solved pretty much > >> everything else falls into place including BPF. > >> > >> Without modules support this whole exercise is pointless and not going > >> anywhere near x86. > > > > I am not sure I fully understand your point here. Do you mean > > > > 1) There is something wrong with this solution, that makes it not suitable > > for modules; > > or > > 2) The solution is in the right direction and it will very likely work > > for modules. > > But we haven't finished module support. ? > > As I'm obviously unable to express myself coherently, let me try again: > > A solution which solves the BPF problem, but does not solve the > underlying problem of module_alloc() is not acceptable. > > Is that clear enough? While I sincerely want to provide a solution not just for BPF but also for modules and others, I don't think I fully understand the underlying problem of module_alloc(). I sincerely would like to learn more about it. > > > If it is 1), I would like to understand what are the issues that make it not > > suitable for modules. If it is 2), I think a solid, mostly like working small > > step toward the right direction is the better way as it makes code reviews > > a lot easier and has much lower risks. Does this make sense? > > No. Because all you are interested in is to get your BPF itch scratched > instead of actually sitting down and solving the underlying problem and > thereby creating a benefit for everyone. TBH, until your reply, I thought I was working on something that would benefit everyone. It is indeed not just for BPF itch, as bpf_prog_pack already scratched it for BPF. > > You are not making anything easier. You are violating the basic > engineering principle of "Fix the root cause, not the symptom". > I am not sure what is the root cause and the symptom here. I understand ideas referred in this lwn article: https://lwn.net/Articles/894557/ But I don't know which one of them (if any) would fix the root cause. > By doing that you are actually creating more problems than you > solve. Why? > > Clearly your "solution" does not cover the full requirements of the > module space because you solely focus on executable memory allocations > which somehow magically go into the module address space. > > Can you coherently explain how this results in a consistent solution > for the rest of the module requirements? > > Can you coherently explain why this wont create problems down the road > for anyone who actually would be willing to solve the root cause? > > No, you can't answer any of these questions simply because you never > explored the problem space sufficiently. I was thinking, for modules, we only need something new for module text, and module data will just use vmalloc(). I guess this is probably not the right solution? > > I'm not the first one to point this out. Quite some people in the > various threads regarding this issue have been pointing that out to you > before. They even provided you hints on how this can be solved properly > once and forever and for everyones benefits. I tried to review various threads. Unfortunately, I am not able to identify the proper hints and construct a solution. > > > I would also highlight that part of the benefit of this work comes from > > reducing direct map fragmentations. While BPF programs consume less > > memory, they are more dynamic and can cause more direct map > > fragmentations. bpf_prog_pack in upstream kernel already covers this > > part, but this set is a better solution than bpf_prog_pack. > > > > Finally, I would like to point out that 5/6 and 6/6 of (v5) the set let BPF > > programs share a 2MB page with static kernel text. Therefore, even > > for systems without many BPF programs, we should already see some > > reduction in iTLB misses. > > Can you please stop this marketing nonsense? As I pointed out to you in > the very mail which your are replying to, the influence of BPF on the > system I picked randomly out of the pool is pretty close to ZERO. > > Ergo, the reduction of iTLB misses is going to be equally close to > ZERO. What is the benefit you are trying to sell me? > > I'm happy to run perf on this machine and provide the numbers which put > your 'we should already see some reduction' handwaving into perspective. > > But the above is just a distraction. The real point is: > > You can highlight and point out the benefits of your BPF specific > solution as much as you want, it does not make the fact that you are > "fixing" the symptom instead of the root cause magically go away. > > Again for the record: > > The iTLB pressure problem, which affects modules, kprobes, tracing and > BPF, is caused by the way how module_alloc() is implemented. TBH, I don't think I understand this... Do you mean the problem with module_alloc() is that it is not aware of desired permissions (W or X or neither)? If so, is permission vmalloc [1] the right direction for this? [1] https://lwn.net/ml/linux-mm/20201120202426.18009-1-rick.p.edgecombe@intel.com/ > > That's the root cause and this needs to be solved for _ALL_ of the users > of this infrastructure and not worked around by adding something which > makes BPF shiny and handwaves about that it solves the underlying > problem. While I did plan to enable 2MB pages for module text, I didn't plan to solve it in the first set. However, since you think it is possible and would like to provide directions, I am up for the challenge and will give it a try. Please share more details about the right direction. Otherwise, I am still lost... Thanks, Song