Date: Tue, 28 Jun 2022 10:03:43 -0700
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Christoph Lameter
Cc: Christoph Hellwig, David Miller, Daniel Borkmann, Andrii Nakryiko,
 Tejun Heo, Martin KaFai Lau, bpf, Kernel Team, linux-mm, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
Message-ID: <20220628170343.ng46xfwi32vefiyp@MacBook-Pro-3.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Tue, Jun 28, 2022 at
03:57:54PM +0200, Christoph Lameter wrote:
> On Mon, 27 Jun 2022, Alexei Starovoitov wrote:
>
> > On Mon, Jun 27, 2022 at 5:17 PM Christoph Lameter wrote:
> > >
> > > > From: Alexei Starovoitov
> > > >
> > > > Introduce any context BPF specific memory allocator.
> > > >
> > > > Tracing BPF programs can attach to kprobe and fentry. Hence they
> > > > run in unknown context where calling plain kmalloc() might not be safe.
> > > > Front-end kmalloc() with per-cpu per-bucket cache of free elements.
> > > > Refill this cache asynchronously from irq_work.
> > >
> > > GFP_ATOMIC etc is not going to work for you?
> >
> > slab_alloc_node->slab_alloc->local_lock_irqsave
> > kprobe -> bpf prog -> slab_alloc_node -> deadlock.
> > In other words, the slow path of slab allocator takes locks.
>
> That is a relatively new feature due to RT logic support. Without RT this
> would be a simple irq disable.

Not just RT. It's a slow path:

  if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
      unlikely(!object || !slab || !node_match(slab, node))) {
          local_unlock_irqrestore(&s->cpu_slab->lock, ...);

and that's not the only lock in there.
new_slab->allocate_slab... alloc_pages grabbing more locks.

> Generally doing slab allocation while debugging slab allocation is not
> something that can work. Can we exempt RT locks/irqsave or slab alloc from
> BPF tracing?

People started doing lock profiling with bpf back in 2017.
People do rcu profiling now and attaching bpf progs to all kinds of
low level kernel internals: page alloc, etc.

> I would assume that other key items of kernel logic will have similar
> issues.

We're _not_ asking for any changes from mm/slab side.
Things were working all these years. We're making them more efficient
now by getting rid of the 'lets prealloc everything' approach.

> > Which makes it unsafe to use from tracing bpf progs.
> > That's why we preallocated all elements in bpf maps,
> > so there are no calls to mm or rcu logic.
> > bpf specific allocator cannot use locks at all.
> > try_lock approach could have been used in alloc path,
> > but free path cannot fail with try_lock.
> > Hence the algorithm in this patch is purely lockless.
> > bpf prog can attach to spin_unlock_irqrestore and
> > safely do bpf_mem_alloc.
>
> That is generally safe unless you get into reentrance issues with memory
> allocation.

Right. Generic slab/mm/page_alloc/rcu are not ready for reentrance
and are not safe from NMI either.
That's why we added all kinds of safety mechanisms in bpf layers.

> Which begs the question:
>
> What happens if I try to use BPF to trace *your* shiny new memory
> allocation functions in the BPF logic like bpf_mem_alloc? How do you stop
> that from happening?

'shiny and new' is an overstatement. It's a trivial lockless freelist
layer on top of kmalloc. Please read the patch.

Here is the comment in the patch:

  /* notrace is necessary here and in other functions to make sure
   * bpf programs cannot attach to them and cause llist corruptions.
   */