linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org,
	 tj@kernel.org, delyank@fb.com, linux-mm@kvack.org,
	bpf@vger.kernel.org,  kernel-team@fb.com
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Sat, 20 Aug 2022 00:56:12 +0200	[thread overview]
Message-ID: <CAP01T77A1pqYQKeECDSCoxH1pQ1Vxcm84B8_D_r0xoZv_bbq_A@mail.gmail.com> (raw)
In-Reply-To: <20220819224317.i3mwmr5atdztudtt@MacBook-Pro-3.local.dhcp.thefacebook.com>

On Sat, 20 Aug 2022 at 00:43, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Aug 20, 2022 at 12:21:46AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> > > Then use call_rcu() to wait for normal progs to finish
> > > and finally do free_one() on each element when freeing objects
> > > into global memory pool.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> >
> > I fear this can make OOM issues very easy to run into, because one
> > sleepable prog that sleeps for a long period of time can hold the
> > freeing of elements from another sleepable prog which either does not
> > sleep often or sleeps for a very short period of time, and has a high
> > update frequency. I'm mostly worried that unrelated sleepable programs
> > not even using the same map will begin to affect each other.
>
> 'sleep for long time'? sleepable bpf prog doesn't mean that they can sleep.
> sleepable progs can copy_from_user, but they're not allowed to waste time.

It is certainly possible to waste time, but indirectly, not through
the BPF program itself.

If you have userfaultfd enabled (for unpriv users), an unprivileged
user can trap a sleepable BPF prog (say LSM) using bpf_copy_from_user
for as long as it wants. A similar case can be done using FUSE, IIRC.

You can then say it's a problem about unprivileged users being able to
use userfaultfd or FUSE, or we could think about fixing
bpf_copy_from_user to return -EFAULT for this case, but it is totally
possible right now for malicious userspace to extend the tasks trace
gp like this for minutes (or even longer) on a system where sleepable
BPF programs are using e.g. bpf_copy_from_user.

> I don't share OOM concerns at all.
> max_entries and memcg limits are still there and enforced.
> dynamic map is strictly better and memory efficient than full prealloc.
>
> > Have you considered other options? E.g. we could directly expose
> > bpf_rcu_read_lock/bpf_rcu_read_unlock to the program and enforce that
> > access to RCU protected map lookups only happens in such read
> > sections, and unlock invalidates all RCU protected pointers? Sleepable
> > helpers can then not be invoked inside the BPF RCU read section. The
> > program uses RCU read section while accessing such maps, and sleeps
> > after doing bpf_rcu_read_unlock. They can be kfuncs.
>
> Yes. We can add explicit bpf_rcu_read_lock and teach verifier about RCU CS,
> but I don't see the value specifically for sleepable progs.
> Current sleepable progs can do map lookup without extra kfuncs.
> Explicit CS would force progs to be rewritten which is not great.
>
> > It might also be useful in general, to access RCU protected data from
> > sleepable programs (i.e. make some sections of the program RCU
> > protected and non-sleepable at runtime). It will allow use of elements
>
> For other cases, sure. We can introduce RCU protected objects and
> explicit bpf_rcu_read_lock.
>
> > from dynamically allocated maps with bpf_mem_alloc while not having to
> > wait for RCU tasks trace grace period, which can extend into minutes
> > (or even longer if unlucky).
>
> sleepable bpf prog that lasts minutes? In what kind of situation?
> We don't have bpf_sleep() helper and not going to add one any time soon.


  reply	other threads:[~2022-08-19 22:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 21:42 [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 01/15] bpf: Introduce any context " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 03/15] selftests/bpf: Improve test coverage of test_maps Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 04/15] samples/bpf: Reduce syscall overhead in map_perf_test Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 05/15] bpf: Relax the requirement to use preallocated hash maps in tracing progs Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 06/15] bpf: Optimize element count in non-preallocated hash map Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 07/15] bpf: Optimize call_rcu " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU Alexei Starovoitov
2022-08-24 19:58   ` Kumar Kartikeya Dwivedi
2022-08-25  0:13     ` Alexei Starovoitov
2022-08-25  0:35       ` Joel Fernandes
2022-08-25  0:49         ` Joel Fernandes
2022-08-19 21:42 ` [PATCH v3 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 12/15] bpf: Remove tracing program restriction on map types Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs Alexei Starovoitov
2022-08-19 22:21   ` Kumar Kartikeya Dwivedi
2022-08-19 22:43     ` Alexei Starovoitov
2022-08-19 22:56       ` Kumar Kartikeya Dwivedi [this message]
2022-08-19 23:01         ` Alexei Starovoitov
2022-08-24 19:49           ` Kumar Kartikeya Dwivedi
2022-08-25  0:08             ` Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 14/15] bpf: Remove prealloc-only restriction for " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc Alexei Starovoitov
2022-08-24 20:03 ` [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Kumar Kartikeya Dwivedi
2022-08-25  0:16   ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP01T77A1pqYQKeECDSCoxH1pQ1Vxcm84B8_D_r0xoZv_bbq_A@mail.gmail.com \
    --to=memxor@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=delyank@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox