From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>, Tejun Heo <tj@kernel.org>,
	Delyan Kratunov <delyank@fb.com>,  linux-mm <linux-mm@kvack.org>,
	bpf <bpf@vger.kernel.org>,  Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Wed, 24 Aug 2022 17:08:48 -0700	[thread overview]
Message-ID: <CAADnVQKvctFmmWQuEdb3L0KfVgwXfLO2PCnzzZXfJN46iJyDXg@mail.gmail.com> (raw)
In-Reply-To: <CAP01T76YDGtF0tV6cDvzANW2oyJTGvSVGsT-F-YDozWjg9HnMw@mail.gmail.com>

On Wed, Aug 24, 2022 at 12:50 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, 20 Aug 2022 at 01:01, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Aug 19, 2022 at 3:56 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > > >
> > > > > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > > > > Then use call_rcu() to wait for normal progs to finish,
> > > > > > and finally do free_one() on each element when freeing objects
> > > > > > into the global memory pool.
> > > > > >
> > > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > ---
> > > > >
> > > > > I fear this can make OOM issues very easy to run into, because one
> > > > > sleepable prog that sleeps for a long period of time can hold up the
> > > > > freeing of elements from another sleepable prog which rarely sleeps,
> > > > > or sleeps only briefly, but has a high update frequency. I'm mostly
> > > > > worried that unrelated sleepable programs not even using the same
> > > > > map will begin to affect each other.
> > > >
> > > > 'sleep for a long time'? 'sleepable' doesn't mean such progs are free
> > > > to sleep arbitrarily. sleepable progs can copy_from_user, but they're
> > > > not allowed to waste time.
> > >
> > > It is certainly possible to waste time, but indirectly, not through
> > > the BPF program itself.
> > >
> > > If you have userfaultfd enabled (for unpriv users), an unprivileged
> > > user can trap a sleepable BPF prog (say, an LSM) in bpf_copy_from_user
> > > for as long as it wants. A similar trap can be set up using FUSE, IIRC.
> > >
> > > You can then say it's a problem of unprivileged users being able to
> > > use userfaultfd or FUSE, or we could think about making
> > > bpf_copy_from_user return -EFAULT in this case, but right now it is
> > > entirely possible for malicious userspace to extend the tasks trace
> > > grace period like this for minutes (or even longer) on a system where
> > > sleepable BPF programs use e.g. bpf_copy_from_user.
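
Roughly, the userspace side of such a stall looks like the sketch
below (a sketch under assumptions, not code from any patch: the step
that routes the pinned page into the sleepable prog's
bpf_copy_from_user() is hypothetical and depends on which prog is
loaded; error handling is trimmed):

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
	struct uffdio_api api = { .api = UFFD_API };
	size_t len = sysconf(_SC_PAGESIZE);
	void *area;

	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api))
		return 1;

	/* Anonymous page that has never been faulted in. */
	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return 1;

	/* Take ownership of missing-page faults in 'area'. */
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)area, .len = len },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
		return 1;

	/* Hypothetical step: hand 'area' to whatever operation ends up
	 * in the prog's bpf_copy_from_user(), then never service the
	 * fault. The copy blocks inside the tasks-trace read section
	 * until we decide otherwise.
	 */
	pause();
	return 0;
}
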
> >
> > Well, in that sense userfaultfd can keep all sorts of things
> > in the kernel from making progress.
> > But that has nothing to do with OOM.
> > There is still the max_entries limit.
> > The number of objects in waiting_for_gp is guaranteed to be less
> > than a full prealloc.
>
> My thinking was that once you hold the grace period open using uffd,
> we can assume you will eventually hit a case where all such maps on
> the system have their max_entries exhausted. So yes, it probably
> won't OOM, but it would be bad regardless.
>
> I think this instead argues that uffd (and even FUSE) should not be
> available to untrusted processes on the system by default. Both are
> regularly used to widen hard-to-hit race conditions in the kernel.
>
> But anyway, there's currently no easy way to guarantee the lifetime
> of elements for the sleepable case with overhead as low as trace
> RCU's, so it makes sense to go ahead with this.

Right. We evaluated SRCU for the sleepable case and it had too much
overhead. That's the reason rcu_tasks_trace was added, and sleepable
bpf progs are the only user so far.
The point I'm arguing is that call_rcu_tasks_trace in this patch
doesn't add more mm concerns than the existing call_rcu.
There are also CONFIG_PREEMPT_RCU and RT; uffd will cause similar
issues in such configs too.
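
To be concrete, the free path described in the patch has roughly the
shape below (a sketch only: free_one() and waiting_for_gp come from
this thread; the abridged struct and the remaining names are
assumptions):

struct bpf_mem_cache {			/* abridged; real layout differs */
	struct llist_head waiting_for_gp;
	struct rcu_head rcu;
};

static void __free_rcu(struct rcu_head *head)
{
	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
	struct llist_node *llnode, *t;

	/* Both grace periods have elapsed: neither sleepable progs
	 * (tasks-trace RCU) nor normal progs (regular RCU) can still
	 * be touching these objects.
	 */
	llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp))
		free_one(c, llnode);	/* free_one() per the commit log */
}

static void __free_rcu_tasks_trace(struct rcu_head *head)
{
	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);

	/* Sleepable progs are done; reuse the rcu_head to chain a
	 * regular RCU grace period for normal progs.
	 */
	call_rcu(&c->rcu, __free_rcu);
}

static void free_waiting_objs(struct bpf_mem_cache *c)
{
	/* Sleepable progs first, then normal progs, then free_one(). */
	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
}

If that's the shape, the extra wait is per batch of objects, not per
element, so the bound above still holds: waiting_for_gp never holds
more than a full prealloc's worth of objects; a stretched grace
period only delays when that batch is freed.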

