Date: Wed, 17 Aug 2022 17:39:57 -0700
From: Alexei Starovoitov
To: Kumar Kartikeya Dwivedi
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 bpf-next 01/12] bpf: Introduce any context BPF specific memory allocator.
Message-ID: <20220818003957.t5lcp636n7we37hk@MacBook-Pro-3.local>
References: <20220817210419.95560-1-alexei.starovoitov@gmail.com> <20220817210419.95560-2-alexei.starovoitov@gmail.com>
On Thu, Aug 18, 2022 at 01:51:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > +
> > +/* notrace is necessary here and in other functions to make sure
> > + * bpf programs cannot attach to them and cause llist corruptions.
> > + */
> > +static void notrace *unit_alloc(struct bpf_mem_cache *c)
> > +{
> > +	bool in_nmi = bpf_in_nmi();
> > +	struct llist_node *llnode;
> > +	unsigned long flags;
> > +	int cnt = 0;
> > +
> > +	if (unlikely(in_nmi)) {
> > +		llnode = llist_del_first(&c->free_llist_nmi);
> > +		if (llnode)
> > +			cnt = atomic_dec_return(&c->free_cnt_nmi);
>
> I am trying to understand which case this
> atomic_dec_return/atomic_inc_return protects against in the
> unit_alloc/unit_free in_nmi branch. Is it protecting against a nested
> NMI BPF prog interrupting an NMI prog?
>
> In case of perf it seems we use bpf_prog_active,

Yes, but bpf_prog_active has plenty of downsides and hopefully will be
replaced eventually with a cleaner mechanism. Separate topic.

> so a nested NMI prog
> won't be invoked while we are interrupted inside a BPF program in NMI
> context. Which are the other cases that might cause reentrancy in this
> branch such that we need atomics instead of c->free_cnt_nmi--? Or are
> you anticipating you might allow this in the future even if it is
> disallowed for now?
>
> If programs are allowed to stack like this, and we try to reason about
> the safety of the llist_del_first operation, the code is:
>
> struct llist_node *llist_del_first(struct llist_head *head)
> {
> 	struct llist_node *entry, *old_entry, *next;
>
> 	entry = smp_load_acquire(&head->first);
> 	for (;;) {
> 		if (entry == NULL)
> 			return NULL;
> 		old_entry = entry;
> 		next = READ_ONCE(entry->next);
>
> >>>>>>>> Suppose a nested NMI comes at this point and a BPF prog is invoked.

llist_del_first is notrace. unit_alloc() above is also notrace; see the
comment before it. A perf event overflow NMI can happen here, but only for
some other llist.
Hence we're talking about NMI issues only here. fentry progs do not apply.

> Assume the current nmi free llist is HEAD -> A -> B -> C -> D -> ...
> For our cmpxchg, the parameters are going to be cmpxchg(&head->first, A, B);
>
> Now, the nested NMI prog does unit_alloc thrice. This does
> llist_del_first thrice.

Even a double llist_del_first on the same llist is bad. That's a known fact.

> This makes the nmi free llist HEAD -> D -> ...
> A, B, C are allocated in the prog.
> Now it does unit_free of all three, but in the order B, C, A.
> unit_free does llist_add, so the nmi free llist becomes
> HEAD -> A -> C -> B -> D -> ...
>
> The nested NMI prog exits.
> We continue with our cmpxchg(&head->first, A, B); It succeeds, A is
> returned, but C will be leaked.

This exact scenario cannot happen for bpf_mem_cache's freelist.
unit_alloc is doing llist_del_first on a per-cpu freelist.
We can have two perf_event bpf progs. Both progs would share the same
hash map and use the same struct bpf_mem_alloc, and both would call
unit_alloc() on the same cpu freelist, but as you noticed bpf_prog_active
covers that case.
bpf_prog_active is too coarse, as we discussed in the other thread a month
or so ago. It prevents valid and safe execution of bpf progs, causes lost
events, etc. We will surely come up with a better mechanism.

Going back to your earlier question:

> Which are the other cases that might cause reentrancy in this
> branch such that we need atomics instead of c->free_cnt_nmi--?

It's the case where a perf_event bpf prog happens inside bpf_mem_refill
in irq_work. bpf_mem_refill manipulates free_cnt_nmi, and an nmi bpf prog
does too through unit_alloc. Which got me thinking that there is indeed
a missing check here.
We need to protect free_bulk_nmi's llist_del_first from unit_alloc's
llist_del_first. bpf_prog_active could be used for that, but let's come up
with a cleaner way. Probably going to add an atomic_t flag to bpf_mem_cache
and cmpxchg it, or a lock and spin_trylock it. tbd.