References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> <20220819214232.18784-14-alexei.starovoitov@gmail.com> <20220819224317.i3mwmr5atdztudtt@MacBook-Pro-3.local.dhcp.thefacebook.com>
In-Reply-To: <20220819224317.i3mwmr5atdztudtt@MacBook-Pro-3.local.dhcp.thefacebook.com>
From: Kumar Kartikeya Dwivedi
Date: Sat, 20 Aug 2022 00:56:12 +0200
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
To: Alexei Starovoitov
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org, tj@kernel.org, delyank@fb.com, linux-mm@kvack.org, bpf@vger.kernel.org, kernel-team@fb.com

On Sat, 20 Aug 2022 at 00:43,
Alexei Starovoitov wrote:
>
> On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > wrote:
> > >
> > > From: Alexei Starovoitov
> > >
> > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > Then use call_rcu() to wait for normal progs to finish
> > > and finally do free_one() on each element when freeing objects
> > > into global memory pool.
> > >
> > > Signed-off-by: Alexei Starovoitov
> > > ---
> >
> > I fear this can make OOM issues very easy to run into, because one
> > sleepable prog that sleeps for a long period of time can hold the
> > freeing of elements from another sleepable prog which either does not
> > sleep often or sleeps for a very short period of time, and has a high
> > update frequency. I'm mostly worried that unrelated sleepable programs
> > not even using the same map will begin to affect each other.
>
> 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep.
> sleepable progs can copy_from_user, but they're not allowed to waste time.

It is certainly possible to waste time, but indirectly, not through
the BPF program itself.

If you have userfaultfd enabled (for unpriv users), an unprivileged
user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
for as long as it wants. A similar case can be done using FUSE, IIRC.

You can then say it's a problem about unprivileged users being able to
use userfaultfd or FUSE, or we could think about fixing
bpf_copy_from_user to return -EFAULT for this case, but it is totally
possible right now for malicious userspace to extend the tasks trace
gp like this for minutes (or even longer) on a system where sleepable
BPF programs are using e.g. bpf_copy_from_user.

> I don't share OOM concerns at all.
> max_entries and memcg limits are still there and enforced.
> dynamic map is strictly better and memory efficient than full prealloc.
>
> > Have you considered other options? E.g. we could directly expose
> > bpf_rcu_read_lock/bpf_rcu_read_unlock to the program and enforce that
> > access to RCU protected map lookups only happens in such read
> > sections, and unlock invalidates all RCU protected pointers? Sleepable
> > helpers can then not be invoked inside the BPF RCU read section. The
> > program uses RCU read section while accessing such maps, and sleeps
> > after doing bpf_rcu_read_unlock. They can be kfuncs.
>
> Yes. We can add explicit bpf_rcu_read_lock and teach verifier about RCU CS,
> but I don't see the value specifically for sleepable progs.
> Current sleepable progs can do map lookup without extra kfuncs.
> Explicit CS would force progs to be rewritten which is not great.
>
> > It might also be useful in general, to access RCU protected data from
> > sleepable programs (i.e. make some sections of the program RCU
> > protected and non-sleepable at runtime). It will allow use of elements
>
> For other cases, sure. We can introduce RCU protected objects and
> explicit bpf_rcu_read_lock.
>
> > from dynamically allocated maps with bpf_mem_alloc while not having to
> > wait for RCU tasks trace grace period, which can extend into minutes
> > (or even longer if unlucky).
>
> sleepable bpf prog that lasts minutes? In what kind of situation?
> We don't have bpf_sleep() helper and not going to add one any time soon.