From: Alexei Starovoitov
Date: Wed, 24 Aug 2022 17:08:48 -0700
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
To: Kumar Kartikeya Dwivedi
Cc: "David S. Miller", Daniel Borkmann, Andrii Nakryiko, Tejun Heo, Delyan Kratunov, linux-mm, bpf, Kernel Team
References: <20220819214232.18784-1-alexei.starovoitov@gmail.com> <20220819214232.18784-14-alexei.starovoitov@gmail.com> <20220819224317.i3mwmr5atdztudtt@MacBook-Pro-3.local.dhcp.thefacebook.com>

On Wed, Aug 24, 2022 at 12:50 PM Kumar Kartikeya Dwivedi
wrote:
>
> On Sat, 20 Aug 2022 at 01:01, Alexei Starovoitov
> wrote:
> >
> > On Fri, Aug 19, 2022 at 3:56 PM Kumar Kartikeya Dwivedi
> > wrote:
> > >
> > > On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov
> > > wrote:
> > > >
> > > > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > > > > wrote:
> > > > > >
> > > > > > From: Alexei Starovoitov
> > > > > >
> > > > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > > > > Then use call_rcu() to wait for normal progs to finish
> > > > > > and finally do free_one() on each element when freeing objects
> > > > > > into global memory pool.
> > > > > >
> > > > > > Signed-off-by: Alexei Starovoitov
> > > > > > ---
> > > > >
> > > > > I fear this can make OOM issues very easy to run into, because one
> > > > > sleepable prog that sleeps for a long period of time can hold the
> > > > > freeing of elements from another sleepable prog which either does not
> > > > > sleep often or sleeps for a very short period of time, and has a high
> > > > > update frequency. I'm mostly worried that unrelated sleepable programs
> > > > > not even using the same map will begin to affect each other.
> > > >
> > > > 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep.
> > > > sleepable progs can copy_from_user, but they're not allowed to waste time.
> > >
> > > It is certainly possible to waste time, but indirectly, not through
> > > the BPF program itself.
> > >
> > > If you have userfaultfd enabled (for unpriv users), an unprivileged
> > > user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
> > > for as long as it wants. A similar case can be done using FUSE, IIRC.
> > >
> > > You can then say it's a problem about unprivileged users being able to
> > > use userfaultfd or FUSE, or we could think about fixing
> > > bpf_copy_from_user to return -EFAULT for this case, but it is totally
> > > possible right now for malicious userspace to extend the tasks trace
> > > gp like this for minutes (or even longer) on a system where sleepable
> > > BPF programs are using e.g. bpf_copy_from_user.
> >
> > Well in that sense userfaultfd can keep all sorts of things
> > in the kernel from making progress.
> > But nothing to do with OOM.
> > There is still the max_entries limit.
> > The amount of objects in waiting_for_gp is guaranteed to be less
> > than full prealloc.
>
> My thinking was that once you hold the GP using uffd, we can assume
> you will eventually hit a case where all such maps on the system have
> their max_entries exhausted. So yes, it probably won't OOM, but it
> would be bad regardless.
>
> I think this just begs instead that uffd (and even FUSE) should not be
> available to untrusted processes on the system by default. Both are
> used regularly to widen hard to hit race conditions in the kernel.
>
> But anyway, there's no easy way currently to guarantee the lifetime of
> elements for the sleepable case while being as low overhead as trace
> RCU, so it makes sense to go ahead with this.

Right. We evaluated SRCU for sleepable and it had too much overhead.
That's the reason rcu_tasks_trace was added and sleepable bpf progs
is the only user so far.
The point I'm arguing is that call_rcu_tasks_trace in this patch
doesn't add mm concerns more than the existing call_rcu.
There is CONFIG_PREEMPT_RCU and RT. uffd will cause similar
issues in such configs too.
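
For readers following the thread, the two-stage wait described in the
commit message (call_rcu_tasks_trace(), then call_rcu(), then the actual
free) can be sketched roughly as below. This is a userspace toy, not the
patch: the queues, end_grace_period(), struct batch, retire_batch() and
run_demo() are all made-up stand-ins, and a "grace period" here is just
an explicit function call. It only illustrates that objects stay
allocated until both kinds of readers have been waited for.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for illustration only; none of this is kernel API. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Pending callbacks for each simulated grace-period flavor. */
static struct cb_head *tasks_trace_q; /* models call_rcu_tasks_trace() */
static struct cb_head *rcu_q;         /* models call_rcu() */
static int freed_objs;

static void enqueue(struct cb_head **q, struct cb_head *head,
		    void (*func)(struct cb_head *head))
{
	head->func = func;
	head->next = *q;
	*q = head;
}

/* Pretend one grace period of this flavor ended: run what was queued. */
static int end_grace_period(struct cb_head **q)
{
	struct cb_head *list = *q;
	int ran = 0;

	*q = NULL;
	while (list) {
		struct cb_head *next = list->next; /* callback may requeue or free */

		list->func(list);
		list = next;
		ran++;
	}
	return ran;
}

struct batch {
	struct cb_head head; /* first member, so a cast recovers the batch */
	int cnt;
	void *objs[8];
};

/* Stage 2: no reader of either kind can still see the objects. */
static void free_batch(struct cb_head *head)
{
	struct batch *b = (struct batch *)head;

	for (int i = 0; i < b->cnt; i++) {
		free(b->objs[i]);
		freed_objs++;
	}
	free(b);
}

/* Stage 1: sleepable readers are done; now wait for normal readers. */
static void after_tasks_trace_gp(struct cb_head *head)
{
	enqueue(&rcu_q, head, free_batch);
}

static struct batch *new_batch(int cnt)
{
	struct batch *b = calloc(1, sizeof(*b));

	assert(b && cnt >= 0 && cnt <= 8);
	for (int i = 0; i < cnt; i++) {
		b->objs[i] = malloc(16);
		assert(b->objs[i]);
	}
	b->cnt = cnt;
	return b;
}

static void retire_batch(struct batch *b)
{
	enqueue(&tasks_trace_q, &b->head, after_tasks_trace_gp);
}

/* Returns the number of objects freed once both stages have run. */
int run_demo(void)
{
	retire_batch(new_batch(3));
	assert(freed_objs == 0);          /* nothing freed yet */
	end_grace_period(&tasks_trace_q);
	assert(freed_objs == 0);          /* still waiting for stage 2 */
	end_grace_period(&rcu_q);
	return freed_objs;
}
```

In this toy, a stalled first stage (the uffd case discussed above)
simply means end_grace_period(&tasks_trace_q) is not called for a long
time, so retired batches pile up on that queue. It does not model the
max_entries bound on how much can pile up.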