From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Date: Sat, 20 Aug 2022 00:21:46 +0200
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
In-Reply-To: <20220819214232.18784-14-alexei.starovoitov@gmail.com>
To: Alexei Starovoitov
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org,
    tj@kernel.org, delyank@fb.com, linux-mm@kvack.org,
    bpf@vger.kernel.org, kernel-team@fb.com

On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov wrote:
>
> From: Alexei Starovoitov
>
> Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> Then use call_rcu() to wait for normal progs to finish
> and finally do free_one() on each element when freeing objects
> into global memory pool.
>
> Signed-off-by: Alexei Starovoitov
> ---

I fear this makes OOM issues very easy to run into: a single sleepable
prog that sleeps for a long time can hold up the freeing of elements
from another sleepable prog that rarely sleeps (or sleeps only briefly)
but has a high update frequency. I'm mostly worried that unrelated
sleepable programs, not even using the same map, will begin to affect
each other.

Have you considered other options? E.g. we could directly expose
bpf_rcu_read_lock/bpf_rcu_read_unlock to the program (they can be
kfuncs) and enforce that access to RCU-protected map lookups only
happens inside such read sections, with the unlock invalidating all
RCU-protected pointers. Sleepable helpers would then be disallowed
inside the BPF RCU read section: the program takes the read section
while accessing such maps and only sleeps after bpf_rcu_read_unlock.

That might also be useful in general for accessing RCU-protected data
from sleepable programs (i.e. making some sections of the program RCU
protected and non-sleepable at runtime). It would allow elements of
dynamically allocated maps backed by bpf_mem_alloc to be used without
waiting for an RCU tasks trace grace period, which can extend into
minutes (or even longer if we're unlucky).
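To make the idea concrete, here is a rough sketch of what a sleepable
program could look like under that scheme. The two kfuncs, their
verifier semantics, and the hook/map below are all hypothetical; none
of this exists today:

```c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Hypothetical kfuncs -- names and semantics made up for illustration. */
extern void bpf_rcu_read_lock(void) __ksym;
extern void bpf_rcu_read_unlock(void) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);	/* bpf_mem_alloc backed */
	__uint(max_entries, 1024);
	__type(key, u32);
	__type(value, u64);
} dyn_map SEC(".maps");

SEC("fentry.s/do_unlinkat")	/* sleepable prog */
int BPF_PROG(example, int dfd, struct filename *name)
{
	u32 key = 0;
	u64 val = 0;
	u64 *e;

	bpf_rcu_read_lock();	/* begin RCU section: non-sleepable from here */
	e = bpf_map_lookup_elem(&dyn_map, &key);
	if (e)
		val = *e;	/* element only valid inside the section */
	bpf_rcu_read_unlock();	/* verifier invalidates 'e' here */

	/* Sleepable helpers are allowed again, but 'e' must not be used. */
	bpf_printk("val=%llu", val);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

With that, element reuse only has to wait for a normal RCU grace
period, same as for non-sleepable progs today.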
One difference would be that the call_rcu_tasks_trace() approach lets
you pin a lookup across a sleep cycle, which neither preallocated maps
nor the explicit RCU section above would allow; but that isn't possible
today anyway, so I'm not sure it's worth much.

>  kernel/bpf/memalloc.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 9e5ad7dc4dc7..d34383dc12d9 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -224,6 +224,13 @@ static void __free_rcu(struct rcu_head *head)
>  	atomic_set(&c->call_rcu_in_progress, 0);
>  }
>
> +static void __free_rcu_tasks_trace(struct rcu_head *head)
> +{
> +	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> +
> +	call_rcu(&c->rcu, __free_rcu);
> +}
> +
>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>  {
>  	struct llist_node *llnode = obj;
> @@ -249,7 +256,11 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>  	 * from __free_rcu() and from drain_mem_cache().
>  	 */
>  	__llist_add(llnode, &c->waiting_for_gp);
> -	call_rcu(&c->rcu, __free_rcu);
> +	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> +	 * Then use call_rcu() to wait for normal progs to finish
> +	 * and finally do free_one() on each element.
> +	 */
> +	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>  }
>
>  static void free_bulk(struct bpf_mem_cache *c)
> @@ -452,6 +463,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>  	/* c->waiting_for_gp list was drained, but __free_rcu might
>  	 * still execute. Wait for it now before we free 'c'.
>  	 */
> +	rcu_barrier_tasks_trace();
>  	rcu_barrier();
>  	free_percpu(ma->cache);
>  	ma->cache = NULL;
> --
> 2.30.2
>
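To spell out the OOM concern with the hunks above: every element pushed
to waiting_for_gp now waits for a tasks-trace grace period *and then* a
normal RCU grace period before free_one() runs. Reduced to a standalone
pattern, the chaining is roughly this (a sketch with a made-up struct,
not the actual memalloc.c code):

```c
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/rcupdate_trace.h>
#include <linux/slab.h>

struct obj {
	struct rcu_head rcu;
	/* payload ... */
};

/* Stage 2: a normal RCU grace period has also elapsed, so both
 * sleepable (tasks-trace) and non-sleepable (normal RCU) readers are
 * guaranteed to be done; the memory can finally be freed.
 */
static void obj_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct obj, rcu));
}

/* Stage 1: the tasks-trace grace period has elapsed; chain a normal
 * RCU grace period before actually freeing.
 */
static void obj_free_tasks_trace(struct rcu_head *head)
{
	call_rcu(head, obj_free_rcu);
}

static void obj_release(struct obj *o)
{
	/* The object stays allocated until BOTH grace periods elapse. */
	call_rcu_tasks_trace(&o->rcu, obj_free_tasks_trace);
}
```

Since sleepable progs run under rcu_read_lock_trace(), one prog
sleeping for a long time anywhere in the system stretches stage 1 for
everyone, and every bpf_mem_alloc free stalls behind the slowest
sleeper.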