From: Song Liu
Date: Thu, 3 Nov 2022 14:41:42 -0700
Subject: Re: [PATCH bpf-next v1 RESEND 1/5] vmalloc: introduce vmalloc_exec, vfree_exec, and vcopy_exec
To: "Edgecombe, Rick P"
Cc: "rppt@kernel.org", "mcgrof@kernel.org", "p.raghav@samsung.com", "peterz@infradead.org", "bpf@vger.kernel.org", "dave@stgolabs.net", "willy@infradead.org", "linux-mm@kvack.org", "hch@lst.de", "vbabka@suse.cz", "zhengjun.xing@linux.intel.com", "x86@kernel.org", "akpm@linux-foundation.org", "Torvalds, Linus", "Hansen, Dave", "kbusch@kernel.org", "mgorman@suse.de", "a.manzanares@samsung.com"
References: <20221031222541.1773452-1-song@kernel.org> <20221031222541.1773452-2-song@kernel.org>

On Thu, Nov 3, 2022 at 2:19 PM Edgecombe, Rick P wrote:
>
> On Thu, 2022-11-03 at 11:59 -0700, Luis Chamberlain wrote:
> > > > Mike Rapoport presented the direct map fragmentation problem at
> > > > Plumbers 2021 [0], and clearly mentioned modules / BPF / ftrace /
> > > > kprobes as possible sources for this. Then Xing Zhengjun's 2021
> > > > performance evaluation on whether using 2M/1G pages aggressively
> > > > for the kernel direct map helps performance [1] ends up generally
> > > > recommending huge pages. The work by Xing, though, was about using
> > > > huge pages *alone*, not about a strategy such as the "bpf prog
> > > > pack", which shares one 2 MiB huge page among *all* small eBPF
> > > > programs, and that, I think, is the real golden nugget here.
> > > >
> > > > I contend, therefore, that the theoretical reduction of iTLB misses
> > > > from using huge pages for "bpf prog pack" is not what makes your
> > > > systems perform better. It should simply be that it reduces
> > > > fragmentation, and *this* can generally help performance in the
> > > > long term. If this is accurate, then let's please separate these
> > > > two aspects.
> > >
> > > The direct map fragmentation is the reason for the higher TLB miss
> > > rate, for both iTLB and dTLB.
> >
> > OK, so whatever benchmark runs in tandem while the eBPF JIT is
> > hammered should *also* be measured with perf for iTLB and dTLB misses,
> > i.e., the patch can provide such results as justification.
>
> Song had done some tests on the old prog pack version that, to me,
> seemed to indicate most (or possibly all) of the benefit was direct map
> fragmentation reduction. This surprised me, since 2MB kernel text has
> been shown to be beneficial.
>
> Otherwise +1 to all these comments. This should be clear about what the
> benefits are. I would add that this is also much nicer about TLB
> shootdowns than the existing way of loading text, and it saves some
> memory.
>
> So I think there are sort of four areas of improvement:
> 1. Direct map fragmentation reduction (dTLB miss improvements). This
>    series sort of does it as a side effect, and the solution Mike is
>    talking about is a more general, probably better one.
> 2. 2MB mapped JITs. This is the iTLB side. I think this is a decent
>    solution for it, but surprisingly it doesn't seem to be useful for
>    JITs. (Modules testing TBD.)
> 3. Loading text into a reused allocation with per-CPU mappings. This
>    reduces TLB shootdowns, which are a short-term performance drag at
>    load and teardown time. My understanding is that this is more of a
>    problem on bigger systems with many CPUs. This series does a decent
>    job at this, but the solution is not compatible with modules. Maybe
>    OK, since modules don't load as often as JITs.
> 4. Having BPF progs share pages. This saves memory. This series could
>    probably easily get a number for how much.
>

Hi Luis, Rick, and Mike,

Thanks a lot for helping me organize this information. I totally agree
with all these comments. I will add more data to the next revision.

Besides the motivation improvement, could you please also share your
comments on:
1. The logic/design of the vmalloc_exec() et al. APIs;
2. The naming of these functions. Does execmem_[alloc|free|fill|cpy]
   (as suggested by Christoph) sound good?

Thanks,
Song