References: <20220826024430.84565-1-alexei.starovoitov@gmail.com> <20220826024430.84565-16-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
Date: Mon, 29 Aug 2022 15:27:45 -0700
Subject: Re: [PATCH v4 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc.
To: Daniel Borkmann
Cc: "David S. Miller", Andrii Nakryiko, Tejun Heo, Kumar Kartikeya Dwivedi, Delyan Kratunov, linux-mm, bpf, Kernel Team

On Mon, Aug 29, 2022 at 3:02 PM Daniel Borkmann
wrote:
>
> On 8/26/22 4:44 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > Introduce sysctl kernel.bpf_force_dyn_alloc to force dynamic allocation in bpf
> > hash map. All selftests/bpf should pass with bpf_force_dyn_alloc 0 or 1 and all
> > bpf programs (both sleepable and not) should not see any functional difference.
> > The sysctl's observable behavior should only be improved memory usage.
> >
> > Acked-by: Kumar Kartikeya Dwivedi
> > Signed-off-by: Alexei Starovoitov
> > ---
> >  include/linux/filter.h | 2 ++
> >  kernel/bpf/core.c      | 2 ++
> >  kernel/bpf/hashtab.c   | 5 +++++
> >  kernel/bpf/syscall.c   | 9 +++++++++
> >  4 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index a5f21dc3c432..eb4d4a0c0bde 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1009,6 +1009,8 @@ bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
> >  }
> >  #endif
> >
> > +extern int bpf_force_dyn_alloc;
> > +
> >  #ifdef CONFIG_BPF_JIT
> >  extern int bpf_jit_enable;
> >  extern int bpf_jit_harden;
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 639437f36928..a13e78ea4b90 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -533,6 +533,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >  	bpf_prog_kallsyms_del(fp);
> >  }
> >
> > +int bpf_force_dyn_alloc __read_mostly;
> > +
> >  #ifdef CONFIG_BPF_JIT
> >  /* All BPF JIT sysctl knobs here. */
> >  int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 89f26cbddef5..f68a3400939e 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -505,6 +505,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> >
> >  	bpf_map_init_from_attr(&htab->map, attr);
> >
> > +	if (!lru && bpf_force_dyn_alloc) {
> > +		prealloc = false;
> > +		htab->map.map_flags |= BPF_F_NO_PREALLOC;
> > +	}
> > +
>
> The rationale is essentially for testing, right? Would be nice to avoid
> making this patch uapi. It will just confuse users with implementation
> details, imho, and then it's hard to remove it again.

Not for testing, but for production.
The plan is to roll this sysctl out gradually across the fleet and
hopefully observe memory savings without negative side effects,
but map usage patterns are wild. It will take a long time to gain
confidence that the prealloc code in htab can be completely removed.
At scale, usage might surface all kinds of unforeseen issues.
New alloc heuristics would probably need to be developed.
If 'git rm kernel/bpf/percpu_freelist.*' ever happens
(would be great, but who knows), then this sysctl will become a nop.
This patch is trivial enough that we could keep it internal,
but everybody else with a large fleet of servers would probably
apply the same patch and repeat the same steps.
bpf usage in hyperscalers varies a lot.
Before 'git rm freelist' we will probably flip the default
for this sysctl to get even broader coverage.
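
For readers following the hashtab.c hunk quoted above, here is a minimal
userspace sketch of the override it adds. It is a model under stated
assumptions, not the kernel code: struct htab_model, enum htab_kind and
htab_model_alloc() are hypothetical names made up for illustration, and
the global below only stands in for the variable the sysctl would set.
The BPF_F_NO_PREALLOC value matches include/uapi/linux/bpf.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as BPF_F_NO_PREALLOC in include/uapi/linux/bpf.h. */
#define BPF_F_NO_PREALLOC (1U << 0)

/* Hypothetical stand-in for the two cases the hunk distinguishes. */
enum htab_kind { HTAB_PLAIN, HTAB_LRU };

/* Stand-in for the kernel global that the proposed sysctl would set. */
static int bpf_force_dyn_alloc;

/* Hypothetical model of the two fields the hunk touches. */
struct htab_model {
	uint32_t map_flags;
	bool prealloc;
};

/* Models only the prealloc decision, nothing else from htab_map_alloc(). */
static struct htab_model htab_model_alloc(enum htab_kind kind, uint32_t map_flags)
{
	struct htab_model h = { .map_flags = map_flags };
	bool lru = (kind == HTAB_LRU);

	assert(kind == HTAB_PLAIN || kind == HTAB_LRU);

	/* Default: preallocate unless the creator asked not to. */
	h.prealloc = !(map_flags & BPF_F_NO_PREALLOC);

	/* The override from the patch: non-LRU maps are forced dynamic. */
	if (!lru && bpf_force_dyn_alloc) {
		h.prealloc = false;
		h.map_flags |= BPF_F_NO_PREALLOC;
	}
	return h;
}
```

With bpf_force_dyn_alloc set to 1, a plain hash map created with
map_flags 0 comes back with prealloc false and BPF_F_NO_PREALLOC set,
while the LRU case is left as requested. Since the override also rewrites
map_flags, the forced flag would presumably be visible to anything that
reads the map's flags back, which is relevant to the uapi concern raised
above.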