Subject: Re: [PATCH v4 bpf-next 01/15] bpf: Introduce any context BPF specific memory allocator.
To: Alexei Starovoitov , davem@davemloft.net
Cc: andrii@kernel.org, tj@kernel.org, memxor@gmail.com, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
References: <20220826024430.84565-1-alexei.starovoitov@gmail.com> <20220826024430.84565-2-alexei.starovoitov@gmail.com>
From: Daniel Borkmann
Message-ID: <181cb6ae-9d98-8986-4419-5013662b0189@iogearbox.net>
Date: Mon, 29 Aug 2022 23:59:29 +0200
In-Reply-To: <20220826024430.84565-2-alexei.starovoitov@gmail.com>
On 8/26/22 4:44 AM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov
>
> Tracing BPF programs can attach to kprobe and fentry. Hence they
> run in unknown context where calling plain kmalloc() might not be safe.
>
> Front-end kmalloc() with minimal per-cpu cache of free elements.
> Refill this cache asynchronously from irq_work.
>
> BPF programs always run with migration disabled.
> It's safe to allocate from cache of the current cpu with irqs disabled.
> Free-ing is always done into bucket of the current cpu as well.
> irq_work trims extra free elements from buckets with kfree
> and refills them with kmalloc, so global kmalloc logic takes care
> of freeing objects allocated by one cpu and freed on another.
>
> struct bpf_mem_alloc supports two modes:
> - When size != 0 create kmem_cache and bpf_mem_cache for each cpu.
>   This is typical bpf hash map use case when all elements have equal size.
> - When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on
>   kmalloc/kfree. Max allocation size is 4096 in this case.
>   This is bpf_dynptr and bpf_kptr use case.
>
> bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree.
> bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of
> kmem_cache_alloc/kmem_cache_free.
>
> The allocators are NMI-safe from bpf programs only. They are not
> NMI-safe in general.
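[ Editorial aside: the size == 0 mode described above can be sketched in plain
  userspace C. The class sizes, watermark, and all names below are illustrative
  assumptions for exposition only, not the kernel's exact values or code; the
  real refill runs asynchronously from irq_work, not inline as here. ]

```c
/* Sketch of the size == 0 mode: per-CPU (here: one) caches of free
 * objects, one per size class up to 4096 bytes, refilled in batches
 * when a cache runs low.  Class sizes and the watermark are
 * hypothetical, chosen only to illustrate the mechanism. */
#include <assert.h>
#include <stdlib.h>

#define NUM_CACHES 11

/* Hypothetical size classes covering requests up to 4096 bytes. */
static const int unit_sizes[NUM_CACHES] = {
	16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

struct mem_cache {
	void *free_list[64];	/* stand-in for free_llist */
	int free_cnt;		/* count of objects in free_list */
	int unit_size;
};

static struct mem_cache caches[NUM_CACHES];

/* Map a request size to the smallest fitting class, or -1 if too big. */
static int size_to_cache_idx(size_t size)
{
	for (int i = 0; i < NUM_CACHES; i++)
		if (size <= (size_t)unit_sizes[i])
			return i;
	return -1;
}

/* Refill stand-in: in the kernel this runs from irq_work and uses
 * kmalloc; here we simply top the list up to a low watermark. */
static void cache_refill(struct mem_cache *c, int low_watermark)
{
	while (c->free_cnt < low_watermark)
		c->free_list[c->free_cnt++] = malloc(c->unit_size);
}

static void *cache_alloc(size_t size)
{
	int idx = size_to_cache_idx(size);

	if (idx < 0)
		return NULL;	/* larger than the biggest class */
	struct mem_cache *c = &caches[idx];

	c->unit_size = unit_sizes[idx];
	if (!c->free_cnt)
		cache_refill(c, 4);	/* synchronous here; async in the kernel */
	return c->free_list[--c->free_cnt];
}

/* Free always goes back into the current CPU's list; the irq_work
 * later trims any excess with kfree. */
static void cache_free(size_t size, void *obj)
{
	struct mem_cache *c = &caches[size_to_cache_idx(size)];

	c->free_list[c->free_cnt++] = obj;
}
```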
>
> Acked-by: Kumar Kartikeya Dwivedi
> Signed-off-by: Alexei Starovoitov
> ---
>  include/linux/bpf_mem_alloc.h |  26 ++
>  kernel/bpf/Makefile           |   2 +-
>  kernel/bpf/memalloc.c         | 476 ++++++++++++++++++++++++++++++++++
>  3 files changed, 503 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/bpf_mem_alloc.h
>  create mode 100644 kernel/bpf/memalloc.c
[...]
> +#define NUM_CACHES 11
> +
> +struct bpf_mem_cache {
> +	/* per-cpu list of free objects of size 'unit_size'.
> +	 * All accesses are done with interrupts disabled and 'active' counter
> +	 * protection with __llist_add() and __llist_del_first().
> +	 */
> +	struct llist_head free_llist;
> +	local_t active;
> +
> +	/* Operations on the free_list from unit_alloc/unit_free/bpf_mem_refill
> +	 * are sequenced by per-cpu 'active' counter. But unit_free() cannot
> +	 * fail. When 'active' is busy the unit_free() will add an object to
> +	 * free_llist_extra.
> +	 */
> +	struct llist_head free_llist_extra;
> +
> +	/* kmem_cache != NULL when bpf_mem_alloc was created for specific
> +	 * element size.
> +	 */
> +	struct kmem_cache *kmem_cache;
> +	struct irq_work refill_work;
> +	struct obj_cgroup *objcg;
> +	int unit_size;
> +	/* count of objects in free_llist */
> +	int free_cnt;
> +};
> +
> +struct bpf_mem_caches {
> +	struct bpf_mem_cache cache[NUM_CACHES];
> +};
> +

Could we now also completely get rid of the current map prealloc infra
(pcpu_freelist* I mean), and replace it with the above variant altogether?
Would be nice to make it work for this case, too, and then get rid of
percpu_freelist.{h,c} .. it's essentially a superset wrt functionality iiuc?

Thanks,
Daniel
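[ Editorial aside: the 'active' counter protocol described in the struct
  comments above can be illustrated with a single-threaded userspace
  simulation. All names are simplified stand-ins, not the kernel's llist or
  local_t implementation; in the kernel the "busy" case arises from
  re-entrancy such as an NMI-context bpf program interrupting unit_alloc. ]

```c
/* Sketch: free must not fail, so when the per-CPU free list is busy
 * (the 'active' counter is nonzero on entry), the object is diverted
 * to an overflow list instead of being dropped. */
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct cache {
	int active;			/* stand-in for local_t active */
	struct node *free_llist;	/* protected by 'active' */
	struct node *free_llist_extra;	/* fallback when 'active' is busy */
};

static void llist_push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

static void unit_free(struct cache *c, struct node *obj)
{
	if (c->active++ == 0) {
		/* We own the list: free directly into it. */
		llist_push(&c->free_llist, obj);
	} else {
		/* List busy (re-entrant caller owns it): divert, since
		 * unit_free() cannot fail. */
		llist_push(&c->free_llist_extra, obj);
	}
	c->active--;
}
```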