linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: davem@davemloft.net, daniel@iogearbox.net, andrii@kernel.org,
	 tj@kernel.org, delyank@fb.com, linux-mm@kvack.org,
	bpf@vger.kernel.org,  kernel-team@fb.com
Subject: Re: [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.
Date: Sat, 20 Aug 2022 00:21:46 +0200	[thread overview]
Message-ID: <CAP01T75MUMKzacdE+AcKqgXy1jA5FyMwKXxiibD0ML3OFSqvsw@mail.gmail.com> (raw)
In-Reply-To: <20220819214232.18784-14-alexei.starovoitov@gmail.com>

On Fri, 19 Aug 2022 at 23:43, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> Then use call_rcu() to wait for normal progs to finish
> and finally do free_one() on each element when freeing objects
> into global memory pool.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

I fear this can make OOM issues very easy to run into, because one
sleepable prog that sleeps for a long period of time can hold the
freeing of elements from another sleepable prog which either does not
sleep often or sleeps for a very short period of time, and has a high
update frequency. I'm mostly worried that unrelated sleepable programs
not even using the same map will begin to affect each other.

Have you considered other options? E.g. we could directly expose
bpf_rcu_read_lock/bpf_rcu_read_unlock to the program and enforce that
access to RCU protected map lookups only happens in such read
sections, and unlock invalidates all RCU protected pointers? Sleepable
helpers can then not be invoked inside the BPF RCU read section. The
program uses RCU read section while accessing such maps, and sleeps
after doing bpf_rcu_read_unlock. They can be kfuncs.

It might also be useful in general, to access RCU protected data from
sleepable programs (i.e. make some sections of the program RCU
protected and non-sleepable at runtime). It will allow use of elements
from dynamically allocated maps with bpf_mem_alloc while not having to
wait for RCU tasks trace grace period, which can extend into minutes
(or even longer if unlucky).

One difference would be that you can pin a lookup across a sleep cycle
with this approach, but not with preallocated maps or the explicit RCU
section above, but I'm not sure it's worth it. It isn't possible now.

>  kernel/bpf/memalloc.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 9e5ad7dc4dc7..d34383dc12d9 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -224,6 +224,13 @@ static void __free_rcu(struct rcu_head *head)
>         atomic_set(&c->call_rcu_in_progress, 0);
>  }
>
> +static void __free_rcu_tasks_trace(struct rcu_head *head)
> +{
> +       struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> +
> +       call_rcu(&c->rcu, __free_rcu);
> +}
> +
>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>  {
>         struct llist_node *llnode = obj;
> @@ -249,7 +256,11 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>                  * from __free_rcu() and from drain_mem_cache().
>                  */
>                 __llist_add(llnode, &c->waiting_for_gp);
> -       call_rcu(&c->rcu, __free_rcu);
> +       /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> +        * Then use call_rcu() to wait for normal progs to finish
> +        * and finally do free_one() on each element.
> +        */
> +       call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>  }
>
>  static void free_bulk(struct bpf_mem_cache *c)
> @@ -452,6 +463,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
>                 /* c->waiting_for_gp list was drained, but __free_rcu might
>                  * still execute. Wait for it now before we free 'c'.
>                  */
> +               rcu_barrier_tasks_trace();
>                 rcu_barrier();
>                 free_percpu(ma->cache);
>                 ma->cache = NULL;
> --
> 2.30.2
>


  reply	other threads:[~2022-08-19 22:23 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 21:42 [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 01/15] bpf: Introduce any context " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 02/15] bpf: Convert hash map to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 03/15] selftests/bpf: Improve test coverage of test_maps Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 04/15] samples/bpf: Reduce syscall overhead in map_perf_test Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 05/15] bpf: Relax the requirement to use preallocated hash maps in tracing progs Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 06/15] bpf: Optimize element count in non-preallocated hash map Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 07/15] bpf: Optimize call_rcu " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 08/15] bpf: Adjust low/high watermarks in bpf_mem_cache Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 09/15] bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU Alexei Starovoitov
2022-08-24 19:58   ` Kumar Kartikeya Dwivedi
2022-08-25  0:13     ` Alexei Starovoitov
2022-08-25  0:35       ` Joel Fernandes
2022-08-25  0:49         ` Joel Fernandes
2022-08-19 21:42 ` [PATCH v3 bpf-next 10/15] bpf: Add percpu allocation support to bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 11/15] bpf: Convert percpu hash map to per-cpu bpf_mem_alloc Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 12/15] bpf: Remove tracing program restriction on map types Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 13/15] bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs Alexei Starovoitov
2022-08-19 22:21   ` Kumar Kartikeya Dwivedi [this message]
2022-08-19 22:43     ` Alexei Starovoitov
2022-08-19 22:56       ` Kumar Kartikeya Dwivedi
2022-08-19 23:01         ` Alexei Starovoitov
2022-08-24 19:49           ` Kumar Kartikeya Dwivedi
2022-08-25  0:08             ` Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 14/15] bpf: Remove prealloc-only restriction for " Alexei Starovoitov
2022-08-19 21:42 ` [PATCH v3 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc Alexei Starovoitov
2022-08-24 20:03 ` [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator Kumar Kartikeya Dwivedi
2022-08-25  0:16   ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP01T75MUMKzacdE+AcKqgXy1jA5FyMwKXxiibD0ML3OFSqvsw@mail.gmail.com \
    --to=memxor@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=delyank@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox