From: Kumar Kartikeya Dwivedi
Date: Wed, 24 Aug 2022 21:49:30 +0200
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
To: Alexei Starovoitov
Cc: "David S. Miller", Daniel Borkmann, Andrii Nakryiko, Tejun Heo, Delyan Kratunov, linux-mm, bpf, Kernel Team

On Sat, 20 Aug 2022 at 01:01, Alexei Starovoitov wrote:
>
> On Fri, Aug 19, 2022 at 3:56 PM Kumar Kartikeya Dwivedi wrote:
> >
> > On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov wrote:
> > >
> > > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov wrote:
> > > > >
> > > > > From: Alexei Starovoitov
> > > > >
> > > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > > > Then use call_rcu() to wait for normal progs to finish
> > > > > and finally do free_one() on each element when freeing objects
> > > > > into global memory pool.
> > > > >
> > > > > Signed-off-by: Alexei Starovoitov
> > > > > ---
> > > >
> > > > I fear this can make OOM issues very easy to run into, because one
> > > > sleepable prog that sleeps for a long period of time can hold the
> > > > freeing of elements from another sleepable prog which either does not
> > > > sleep often or sleeps for a very short period of time, and has a high
> > > > update frequency. I'm mostly worried that unrelated sleepable programs
> > > > not even using the same map will begin to affect each other.
> > >
> > > 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep.
> > > sleepable progs can copy_from_user, but they're not allowed to waste time.
> >
> > It is certainly possible to waste time, but indirectly, not through
> > the BPF program itself.
> >
> > If you have userfaultfd enabled (for unpriv users), an unprivileged
> > user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
> > for as long as it wants. A similar case can be done using FUSE, IIRC.
> >
> > You can then say it's a problem about unprivileged users being able to
> > use userfaultfd or FUSE, or we could think about fixing
> > bpf_copy_from_user to return -EFAULT for this case, but it is totally
> > possible right now for malicious userspace to extend the tasks trace
> > gp like this for minutes (or even longer) on a system where sleepable
> > BPF programs are using e.g. bpf_copy_from_user.
>
> Well in that sense userfaultfd can keep all sorts of things
> in the kernel from making progress.
> But nothing to do with OOM.
> There is still the max_entries limit.
> The amount of objects in waiting_for_gp is guaranteed to be less
> than full prealloc.

My thinking was that once you hold up the tasks trace GP using uffd, we can assume you will eventually hit a case where all such maps on the system have their max_entries exhausted. So yes, it probably won't OOM, but it would be bad regardless.

I think this instead suggests that uffd (and even FUSE) should not be available to untrusted processes on the system by default. Both are regularly used to widen hard-to-hit race conditions in the kernel.

But anyway, there's currently no easy way to guarantee the lifetime of elements for the sleepable case while keeping overhead as low as tasks trace RCU, so it makes sense to go ahead with this.
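For illustration, here is a minimal userspace model of the max_entries argument above (not kernel code). Freed elements first wait for a tasks trace grace period and then a regular RCU grace period, mirroring the call_rcu_tasks_trace() then call_rcu() then free_one() chain in the patch description. It assumes, as the argument in this thread does, that elements still waiting for a grace period occupy map slots; the real bpf_mem_alloc and htab accounting may differ. All names (model_map, model_alloc, tasks_trace_gp_end, rcu_gp_end, simulate_stalled_gp) are hypothetical, and each *_gp_end() call stands in for a complete grace period that began after the elements were queued.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not kernel code. Assumption: elements
 * waiting for a grace period (GP) still occupy one of max_entries slots.
 * Each *_gp_end() call stands for a full GP that began after queuing.
 */
enum elem_stage { STAGE_FREE = 0, STAGE_LIVE, STAGE_WAIT_TRACE, STAGE_WAIT_RCU };

struct model_map {
	int max_entries;
	enum elem_stage *elems;
};

/* Grab a free slot; -1 means every slot is live or still waiting for a GP. */
static int model_alloc(struct model_map *m)
{
	for (int i = 0; i < m->max_entries; i++) {
		if (m->elems[i] == STAGE_FREE) {
			m->elems[i] = STAGE_LIVE;
			return i;
		}
	}
	return -1;
}

/* Deleting only queues the element, in the role of call_rcu_tasks_trace(). */
static void model_delete(struct model_map *m, int i)
{
	assert(i >= 0 && i < m->max_entries && m->elems[i] == STAGE_LIVE);
	m->elems[i] = STAGE_WAIT_TRACE;
}

/* Tasks trace GP done: chain into a regular GP, in the role of call_rcu(). */
static void tasks_trace_gp_end(struct model_map *m)
{
	for (int i = 0; i < m->max_entries; i++)
		if (m->elems[i] == STAGE_WAIT_TRACE)
			m->elems[i] = STAGE_WAIT_RCU;
}

/* Regular GP done: return elements to the pool, in the role of free_one(). */
static void rcu_gp_end(struct model_map *m)
{
	for (int i = 0; i < m->max_entries; i++)
		if (m->elems[i] == STAGE_WAIT_RCU)
			m->elems[i] = STAGE_FREE;
}

static int count_stage(const struct model_map *m, enum elem_stage s)
{
	int n = 0;
	for (int i = 0; i < m->max_entries; i++)
		n += m->elems[i] == s;
	return n;
}

struct stall_result {
	int ok, failed, pending, free_after_gp;
};

/*
 * A sleepable reader is stuck (e.g. in bpf_copy_from_user() on a
 * uffd-backed page), so no tasks trace GP ends while another program
 * performs `updates` insert+delete cycles. Then the reader unblocks.
 */
static struct stall_result simulate_stalled_gp(int max_entries, int updates)
{
	struct stall_result r = {0, 0, 0, 0};
	struct model_map m = { max_entries,
			       calloc((size_t)max_entries, sizeof(enum elem_stage)) };

	if (!m.elems)
		return r;
	for (int u = 0; u < updates; u++) {
		int i = model_alloc(&m);
		if (i < 0) {
			r.failed++; /* map exhausted: update fails, memory stays bounded */
			continue;
		}
		r.ok++;
		model_delete(&m, i);
	}
	r.pending = count_stage(&m, STAGE_WAIT_TRACE);

	tasks_trace_gp_end(&m);
	rcu_gp_end(&m);
	r.free_after_gp = count_stage(&m, STAGE_FREE);
	free(m.elems);
	return r;
}
```

With simulate_stalled_gp(4, 10), the first 4 updates succeed, the remaining 6 fail because all 4 slots are pending, and all 4 slots are free again once both grace periods end. Under the stated assumption, a stalled reader turns into failed map updates rather than unbounded memory growth: not an OOM, but still bad for every map waiting on the same tasks trace GP.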