From: Andrii Nakryiko
Date: Sat, 27 Aug 2022 09:57:47 -0700
Subject: Re: [PATCH v4 bpf-next 00/15] bpf: BPF specific memory allocator.
To: Alexei Starovoitov
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
In-Reply-To: <20220826024430.84565-1-alexei.starovoitov@gmail.com>

On Thu, Aug 25, 2022 at 7:44 PM Alexei Starovoitov wrote:
>
> From: Alexei Starovoitov
>
> Introduce any context BPF specific memory allocator.
>
> Tracing BPF programs can attach to kprobe and fentry. Hence they
> run in unknown context where calling plain kmalloc() might not be safe.
> Front-end kmalloc() with per-cpu cache of free elements.
> Refill this cache asynchronously from irq_work.
>
> Major achievements enabled by bpf_mem_alloc:
> - Dynamically allocated hash maps used to be 10 times slower than fully preallocated.
>   With bpf_mem_alloc and subsequent optimizations the speed of dynamic maps is equal to full prealloc.
> - Tracing bpf programs can use dynamically allocated hash maps.
>   Potentially saving lots of memory. Typical hash map is sparsely populated.
> - Sleepable bpf programs can use dynamically allocated hash maps.
>
> v3->v4:
> - fix build issue due to missing local.h on 32-bit arch
> - add Kumar's ack
> - proposal for next steps from Delyan:
>   https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com/
>
> v2->v3:
> - Rewrote the free_list algorithm based on discussions with Kumar. Patch 1.
> - Allowed sleepable bpf progs to use dynamically allocated maps. Patches 13 and 14.
> - Added sysctl to force bpf_mem_alloc in hash map even if pre-alloc is
>   requested to reduce memory consumption. Patch 15.
> - Fix: zero-fill percpu allocation
> - Single rcu_barrier at the end instead of each cpu during bpf_mem_alloc destruction
>
> v2 thread:
> https://lore.kernel.org/bpf/20220817210419.95560-1-alexei.starovoitov@gmail.com/
>
> v1->v2:
> - Moved unsafe direct call_rcu() from hash map into safe place inside bpf_mem_alloc. Patches 7 and 9.
> - Optimized atomic_inc/dec in hash map with percpu_counter. Patch 6.
> - Tuned watermarks per allocation size. Patch 8.
> - Adopted this approach to per-cpu allocation. Patch 10.
> - Fully converted hash map to bpf_mem_alloc. Patch 11.
> - Removed tracing prog restriction on map types. Combination of all patches and final patch 12.
>
> v1 thread:
> https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/
>
> LWN article:
> https://lwn.net/Articles/899274/
>
> Future work:
> - expose bpf_mem_alloc as uapi FD to be used in dynptr_alloc, kptr_alloc
> - convert lru map to bpf_mem_alloc
>
> Alexei Starovoitov (15):
>   bpf: Introduce any context BPF specific memory allocator.
>   bpf: Convert hash map to bpf_mem_alloc.
>   selftests/bpf: Improve test coverage of test_maps
>   samples/bpf: Reduce syscall overhead in map_perf_test.
>   bpf: Relax the requirement to use preallocated hash maps in tracing
>     progs.
>   bpf: Optimize element count in non-preallocated hash map.
>   bpf: Optimize call_rcu in non-preallocated hash map.
>   bpf: Adjust low/high watermarks in bpf_mem_cache
>   bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.
>   bpf: Add percpu allocation support to bpf_mem_alloc.
>   bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.
>   bpf: Remove tracing program restriction on map types
>   bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
>   bpf: Remove prealloc-only restriction for sleepable bpf programs.
>   bpf: Introduce sysctl kernel.bpf_force_dyn_alloc.
>
>  include/linux/bpf_mem_alloc.h             |  26 +
>  include/linux/filter.h                    |   2 +
>  kernel/bpf/Makefile                       |   2 +-
>  kernel/bpf/core.c                         |   2 +
>  kernel/bpf/hashtab.c                      | 132 +++--
>  kernel/bpf/memalloc.c                     | 602 ++++++++++++++++++++++
>  kernel/bpf/syscall.c                      |  14 +-
>  kernel/bpf/verifier.c                     |  52 --
>  samples/bpf/map_perf_test_kern.c          |  44 +-
>  samples/bpf/map_perf_test_user.c          |   2 +-
>  tools/testing/selftests/bpf/progs/timer.c |  11 -
>  tools/testing/selftests/bpf/test_maps.c   |  38 +-
>  12 files changed, 796 insertions(+), 131 deletions(-)
>  create mode 100644 include/linux/bpf_mem_alloc.h
>  create mode 100644 kernel/bpf/memalloc.c
>
> --
> 2.30.2
>

It's great to lift all those NMI restrictions on non-prealloc hashmap!
This should also open up new maps that can't be pre-sized (like qp-trie)
to the NMI world.

But just to clarify: in NMI mode we can exhaust the memory in the caches
(and thus, if we do a lot of allocations in a single BPF program
execution, some operations can fail). That's unavoidable. But it's not
100% clear what the behavior is in IRQ mode and, separately, in the
"usual", less restrictive mode. Is my understanding correct that we
shouldn't run out of memory (assuming there is memory available, of
course) because replenishing the caches will interrupt BPF program
execution? Or am I wrong, and we can still run out of memory if we don't
have enough pre-cached memory?

I think it would be good to state such things clearly (unless I missed
them somewhere in the patches). I'm trying to understand whether, in
non-restrictive mode, we can still fail to allocate a bunch of hashmap
elements in a loop just because of the design of bpf_mem_alloc.

But it looks great otherwise. For the series:

Acked-by: Andrii Nakryiko