From: Alexei Starovoitov
Date: Mon, 29 Aug 2022 15:04:39 -0700
Subject: Re: [PATCH v4 bpf-next 01/15] bpf: Introduce any context BPF specific memory allocator.
To: Daniel Borkmann
Cc: "David S. Miller", Andrii Nakryiko, Tejun Heo, Kumar Kartikeya Dwivedi, Delyan Kratunov, linux-mm, bpf, Kernel Team
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com> <20220826024430.84565-2-alexei.starovoitov@gmail.com> <181cb6ae-9d98-8986-4419-5013662b0189@iogearbox.net>
In-Reply-To: <181cb6ae-9d98-8986-4419-5013662b0189@iogearbox.net>
Content-Type: text/plain; charset="UTF-8"

On Mon, Aug 29, 2022 at 2:59 PM Daniel Borkmann
wrote:
>
> On 8/26/22 4:44 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > Tracing BPF programs can attach to kprobe and fentry. Hence they
> > run in unknown context where calling plain kmalloc() might not be safe.
> >
> > Front-end kmalloc() with minimal per-cpu cache of free elements.
> > Refill this cache asynchronously from irq_work.
> >
> > BPF programs always run with migration disabled.
> > It's safe to allocate from cache of the current cpu with irqs disabled.
> > Free-ing is always done into bucket of the current cpu as well.
> > irq_work trims extra free elements from buckets with kfree
> > and refills them with kmalloc, so global kmalloc logic takes care
> > of freeing objects allocated by one cpu and freed on another.
> >
> > struct bpf_mem_alloc supports two modes:
> > - When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
> >   This is typical bpf hash map use case when all elements have equal size.
> > - When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
> >   kmalloc/kfree. Max allocation size is 4096 in this case.
> >   This is bpf_dynptr and bpf_kptr use case.
> >
> > bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
> > bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of kmem_cache_alloc/kmem_cache_free.
> >
> > The allocators are NMI-safe from bpf programs only. They are not NMI-safe in general.
> >
> > Acked-by: Kumar Kartikeya Dwivedi
> > Signed-off-by: Alexei Starovoitov
> > ---
> >  include/linux/bpf_mem_alloc.h |  26 ++
> >  kernel/bpf/Makefile           |   2 +-
> >  kernel/bpf/memalloc.c         | 476 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 503 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/bpf_mem_alloc.h
> >  create mode 100644 kernel/bpf/memalloc.c
> >
> [...]
> > +#define NUM_CACHES 11
> > +
> > +struct bpf_mem_cache {
> > +	/* per-cpu list of free objects of size 'unit_size'.
> > +	 * All accesses are done with interrupts disabled and 'active' counter
> > +	 * protection with __llist_add() and __llist_del_first().
> > +	 */
> > +	struct llist_head free_llist;
> > +	local_t active;
> > +
> > +	/* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill
> > +	 * are sequenced by per-cpu 'active' counter. But unit_free() cannot
> > +	 * fail. When 'active' is busy the unit_free() will add an object to
> > +	 * free_llist_extra.
> > +	 */
> > +	struct llist_head free_llist_extra;
> > +
> > +	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
> > +	 * element size.
> > +	 */
> > +	struct kmem_cache *kmem_cache;
> > +	struct irq_work refill_work;
> > +	struct obj_cgroup *objcg;
> > +	int unit_size;
> > +	/* count of objects in free_llist */
> > +	int free_cnt;
> > +};
> > +
> > +struct bpf_mem_caches {
> > +	struct bpf_mem_cache cache[NUM_CACHES];
> > +};
> > +
>
> Could we now also completely get rid of the current map prealloc infra (pcpu_freelist*
> I mean), and replace it with above variant altogether? Would be nice to make it work
> for this case, too, and then get rid of percpu_freelist.{h,c} .. it's essentially a
> superset wrt functionality iiuc?

Eventually it would be possible to get rid of prealloc logic completely,
but not so fast. LRU map needs to be converted first.
Then a lot of production testing is necessary to gain confidence
and make sure we didn't miss any corner cases.
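
For readers following the thread, the scheme in the quoted commit message
(a per-cpu free list in front of kmalloc(), refilled and trimmed from
deferred work) can be sketched as a user-space toy model in C. This is
not the code from the patch. The toy_* names, the watermark values, and
the use of malloc()/free() in place of kmalloc()/kfree() and irq_work are
all assumptions made for illustration. It models a single cpu and leaves
out the 'active' counter and free_llist_extra entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical watermarks; the real thresholds are not in the quoted text. */
#define LOW_WATERMARK  2
#define HIGH_WATERMARK 6

struct toy_node {
	struct toy_node *next;
};

/* Toy stand-in for one per-cpu cache of fixed-size objects. */
struct toy_cache {
	struct toy_node *free_list; /* plays the role of free_llist */
	int free_cnt;               /* count of objects on free_list */
	size_t unit_size;
	int refill_pending;         /* plays the role of a queued irq_work */
};

static void toy_cache_init(struct toy_cache *c, size_t unit_size)
{
	c->free_list = NULL;
	c->free_cnt = 0;
	/* Each free object must be able to hold the list pointer. */
	c->unit_size = unit_size < sizeof(struct toy_node) ?
		       sizeof(struct toy_node) : unit_size;
	c->refill_pending = 0;
}

/* Fast path: pop from the local list only, never call malloc() here. */
static void *toy_alloc(struct toy_cache *c)
{
	struct toy_node *n = c->free_list;

	if (n) {
		c->free_list = n->next;
		c->free_cnt--;
	}
	if (c->free_cnt < LOW_WATERMARK)
		c->refill_pending = 1; /* ask the deferred work to refill */
	return n;
}

/* Fast path: push onto the local list only, never call free() here. */
static void toy_free(struct toy_cache *c, void *obj)
{
	struct toy_node *n = obj;

	assert(obj != NULL);
	n->next = c->free_list;
	c->free_list = n;
	c->free_cnt++;
	if (c->free_cnt > HIGH_WATERMARK)
		c->refill_pending = 1; /* ask the deferred work to trim */
}

/* Deferred work: the only place that touches the global allocator. */
static void toy_refill_work(struct toy_cache *c)
{
	int target = (LOW_WATERMARK + HIGH_WATERMARK) / 2;

	c->refill_pending = 0;
	while (c->free_cnt < target) {
		struct toy_node *n = malloc(c->unit_size);

		if (!n)
			break;
		n->next = c->free_list;
		c->free_list = n;
		c->free_cnt++;
	}
	while (c->free_cnt > target) {
		struct toy_node *n = c->free_list;

		c->free_list = n->next;
		c->free_cnt--;
		free(n);
	}
}
```

The point of the split is that toy_alloc() and toy_free() only touch the
local list, so they stay usable in contexts where the global allocator
is not; an allocation from an empty cache simply returns NULL until
toy_refill_work() has run.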