Date: Wed, 6 Jul 2022 10:43:28 -0700
From: Alexei Starovoitov
To: Vlastimil Babka
Cc: Christoph Lameter, Christoph Hellwig, David Miller, Daniel Borkmann, Andrii Nakryiko, Tejun Heo, Martin KaFai Lau, bpf, Kernel Team, linux-mm, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Matthew Wilcox, "Liam R. Howlett"
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
Message-ID: <20220706174328.xqfyu4ikjvutnpr4@MacBook-Pro-3.local>
References: <20220628170343.ng46xfwi32vefiyp@MacBook-Pro-3.local> <8a160205-99fe-a632-aeed-6b59eadc2aa2@suse.cz>
In-Reply-To: <8a160205-99fe-a632-aeed-6b59eadc2aa2@suse.cz>

On Mon, Jul 04, 2022 at 06:13:17PM +0200, Vlastimil Babka wrote:
> On 6/29/22 04:49, Alexei Starovoitov wrote:
> > On Tue, Jun 28, 2022 at 7:35 PM Christoph Lameter wrote:
> >>
> >> On Tue, 28 Jun 2022, Alexei Starovoitov wrote:
> >>
> >> > > That is a relatively new feature due to RT logic support. Without RT this
> >> > > would be a simple irq disable.
> >> >
> >> > Not just RT.
> >> > It's a slow path:
> >> >         if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
> >> >             unlikely(!object || !slab || !node_match(slab, node))) {
> >> >                 local_unlock_irqrestore(&s->cpu_slab->lock,...);
> >> > and that's not the only lock in there.
> >> > new_slab->allocate_slab... alloc_pages grabbing more locks.
> >>
> >>
> >> It's not a lock for !RT.
> >>
> >> The fastpath is lockless if hardware allows that but then we go into more
> >> and more serialization needs as the allocation gets more into the page
> >> allocator logic.
>
> Yeah I don't think the recent RT-related changes made this much worse than
> it already was. In alloc side you could perhaps try the really lockless
> fastpaths only and fail if e.g. the per-cpu slabs were empty (but would BPF
> be happy with that?). On the free side though you could end up having to
> move a slab from partial to free list as a result, and now a spin lock is
> needed (even before the RT changes), and you can't really fail a free...
>
> > On RT fast path == slow path with a lock.
> > On !RT fast path is lockless.
> > That's all correct.
> > bpf side has to make sure safety in all possible paths
> > therefore RT or !RT makes no difference.
>
> So AFAIK we don't right now have what BPF needs - an extra-constrained kind
> of GFP_ATOMIC. I don't object to you adding it privately. But it's another
> reason to think about if these things can be generalized.
> For example we had a discussion about the Maple tree having kinda similar
> kinds of requirements to avoid its tree node preallocations always for the
> worst possible case.

What kind of requirements does the maple tree have? Does it need to be fully
reentrant and NMI-safe? Not really. The caller knows the context and can
choose appropriate flags, while bpf alloc doesn't know the context.
A bpf prog can be called from places where slab/page/kasan specific locks
are held, which makes all of these pieces non-reentrant. Full prealloc of
bpf maps (read: wasting a lot of memory) has been our solution until now.
This is specific to tracing bpf programs, of course. bpf networking,
bpf security and sleepable bpf are completely different.

> I'm not sure we can sanely implement this within each of SLAB/SLUB/SLOB, or
> rather provide a generic cache on top...

Notice that all of the bpf cache functions are notrace/nokprobe and take no
locks. The main difference from all other allocators is that bpf_mem_alloc
from the cache and the refill of that cache are two asynchronous operations.
That allows the former to be reentrant and NMI-safe. All in-tree allocators
sooner or later synchronously call into page_alloc, kasan, memleak and
other debugging facilities that grab locks.
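A minimal userspace sketch of the alloc/refill split described above, assuming a single CPU and a simulated deferred-work hook. Every name here (struct obj_cache, cache_alloc(), cache_free(), cache_do_work(), the low/high watermarks, the side list for nested frees) is hypothetical and not taken from the patch set; a kernel version would use per-CPU data, irq_work and an atomic llist instead. The point is only that the alloc and free fast paths touch a local free list and never call into malloc(), while refilling and trimming happen later in a context where taking locks is allowed. A nested caller that finds the cache busy gets NULL from alloc, but since a free cannot fail, a nested free parks the object on a side list instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-CPU userspace model; not the code from the patch set. */
struct node {
	struct node *next;	/* link stored inside each free object */
};

struct obj_cache {
	struct node *free_list;	/* objects ready to hand out */
	struct node *extra;	/* objects freed from a nested context */
	int free_cnt;		/* length of free_list */
	int active;		/* >1 means the cache was re-entered */
	int low_wm, high_wm;	/* rebalance below low_wm / above high_wm */
	bool work_pending;	/* stands in for a queued irq_work */
	size_t obj_size;
};

static void push(struct node **list, struct node *n)
{
	n->next = *list;
	*list = n;
}

static struct node *pop(struct node **list)
{
	struct node *n = *list;
	if (n)
		*list = n->next;
	return n;
}

/* Fast path: no locks, no malloc(). A nested caller sees active > 1 and
 * fails instead of touching a list that is in the middle of an update. */
void *cache_alloc(struct obj_cache *c)
{
	struct node *n = NULL;

	if (++c->active == 1 && (n = pop(&c->free_list)))
		c->free_cnt--;
	c->active--;
	if (c->free_cnt < c->low_wm)
		c->work_pending = true;	/* refill later, asynchronously */
	return n;
}

/* A free cannot fail, so a nested caller parks the (non-NULL) object on the
 * side list (an atomic llist in a kernel version) for the deferred work. */
void cache_free(struct obj_cache *c, void *p)
{
	if (++c->active == 1) {
		push(&c->free_list, p);
		c->free_cnt++;
	} else {
		push(&c->extra, p);
		c->work_pending = true;
	}
	c->active--;
	if (c->free_cnt > c->high_wm)
		c->work_pending = true;	/* trim later, asynchronously */
}

/* Deferred path: runs where malloc()/free(), and therefore taking locks,
 * is allowed. Brings the cache back to the midpoint of the watermarks. */
void cache_do_work(struct obj_cache *c)
{
	int target = (c->low_wm + c->high_wm) / 2;
	struct node *n;

	while ((n = pop(&c->extra))) {
		push(&c->free_list, n);
		c->free_cnt++;
	}
	while (c->free_cnt < target && (n = malloc(c->obj_size))) {
		push(&c->free_list, n);
		c->free_cnt++;
	}
	while (c->free_cnt > target) {
		free(pop(&c->free_list));
		c->free_cnt--;
	}
	c->work_pending = false;
}

void cache_init(struct obj_cache *c, size_t obj_size, int low_wm, int high_wm)
{
	assert(0 <= low_wm && low_wm < high_wm);
	*c = (struct obj_cache){ .low_wm = low_wm, .high_wm = high_wm };
	c->obj_size = obj_size < sizeof(struct node) ? sizeof(struct node) : obj_size;
	cache_do_work(c);	/* prefill from an unrestricted context */
}

void cache_destroy(struct obj_cache *c)
{
	c->low_wm = c->high_wm = 0;	/* target 0: the deferred work frees everything */
	cache_do_work(c);
}
```

For example, with cache_init(&c, 64, 2, 8) the cache is prefilled to 5 objects; the fourth cache_alloc() drops it to 1, below low_wm, so work_pending is set and the next cache_do_work() refills it to 5. In this model the caller of cache_alloc() has to cope with a NULL return, much like the "fail if the per-cpu slabs were empty" option mentioned above; the difference is that running low also queues the asynchronous refill rather than leaving the cache empty.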