From: Yafang Shao <laoar.shao@gmail.com>
To: 42.hyeyoo@gmail.com, vbabka@suse.cz, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com,
songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com,
kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
jolsa@kernel.org, tj@kernel.org, dennis@kernel.org, cl@linux.com,
akpm@linux-foundation.org, penberg@kernel.org,
rientjes@google.com, iamjoonsoo.kim@lge.com,
roman.gushchin@linux.dev
Cc: linux-mm@kvack.org, bpf@vger.kernel.org,
Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
Date: Thu, 12 Jan 2023 15:53:15 +0000
Message-ID: <20230112155326.26902-1-laoar.shao@gmail.com>
Currently there's no way to get the BPF memory usage; we can only
estimate it with bpftool or memcg, neither of which is reliable.
- bpftool
  `bpftool {map,prog} show` can show us the memlock of each map and
  prog, but the memlock can differ greatly from the real memory size.
  The memlock of a bpf object is approximately
  `round_up(key_size + value_size, 8) * max_entries`,
  so 1) it doesn't apply to non-preallocated bpf maps, whose real
  memory size may grow or shrink dynamically, and 2) the element size
  of some bpf maps is not `key_size + value_size`; for example, the
  element size of htab is
  `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`
  That said, the difference between these two values can be very large
  when key_size and value_size are small. For example, in my
  verification, the memlock and the real memory size of a preallocated
  hash map are (see the worked numbers after this list):
$ grep BPF /proc/meminfo
BPF: 350 kB <<< the size of preallocated memalloc pool
(create hash map)
$ bpftool map show
41549: hash name count_map flags 0x0
key 4B value 4B max_entries 1048576 memlock 8388608B
$ grep BPF /proc/meminfo
BPF: 82284 kB
So the real memory size is $((82284 - 350)) = 81934 kB,
while the memlock is only 8192 kB.
- memcg
  With memcg we only know that the BPF memory usage is less than
  memory.kmem.usage_in_bytes (or memory.current in cgroup v2).
  Furthermore, if the BPF objects are charged to the root memcg, all
  we know is that the usage is less than $MemTotal :)
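For reference, here is the back-of-the-envelope arithmetic behind the
gap in the hash-map example above. The 48-byte sizeof(struct htab_elem)
and the 16-byte per-bucket overhead are assumptions for illustration;
the exact values depend on the kernel configuration.

  memlock estimate = round_up(4 + 4, 8) * 1048576
                   = 8 * 1048576  = 8388608 B = 8192 kB
  per-element size = sizeof(struct htab_elem)
                     + round_up(4, 8) + round_up(4, 8)
                   ~ 48 + 8 + 8   = 64 B
  all elements     = 64 * 1048576             = 65536 kB
  bucket array     ~ 16 * 1048576             = 16384 kB
  total            ~ 65536 + 16384            = 81920 kB

That is in line with the 81934 kB observed above, and roughly ten
times the reported memlock.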
So we need a way to get the BPF memory usage, especially as more and
more bpf programs are running in production environments. The BPF
memory usage is not trivial, and it deserves a new item in
/proc/meminfo.
There are several ways to calculate the BPF memory usage, each with
its own pros and cons.
- Option 1: Annotate BPF memory allocation only
  This is how I implemented it in RFC v1; see the details and the
  discussion via the link below [1].
  - pros
    We only need to annotate the BPF memory allocation; the allocated
    memory is then found automatically in the free path. So it is very
    easy to use, and we don't need to worry about stat leaks.
  - cons
    We must store information about the allocated memory, in
    particular the allocated slab objects, so it takes extra memory.
    If we introduce a new member into struct page or add one to
    page_ext, it will take at least 0.2% of the total memory on a
    64bit system (an 8-byte pointer per 4KB page), which is not
    acceptable.
    One way to reduce this memory overhead is to introduce a dynamic
    page extension, but that would take great effort and may not be
    worth it.
- Option 2: Annotate both allocation and free
  This is similar to how I implemented it in an earlier version [2].
  - pros
    There's almost no memory overhead.
  - cons
    All BPF memory allocations and frees must go through the BPF
    helpers and can't use the generic helpers like
    kfree/vfree/free_percpu. So if a user forgets to use the helpers
    we introduced to allocate or free BPF memory, there will be a
    stat leak.
    It is also not easy to annotate some deferred free paths, in
    particular kfree_rcu(), so the user has to use call_rcu() instead
    of kfree_rcu(). Another risk is that if we introduce other
    deferred-free helpers in the future, this BPF statistic may break
    easily.
- Option 3: Calculate the memory size via the pointer
  This is how I implemented it in this patchset (see the sketch after
  this list).
  After allocating some BPF memory, we get the full size from the
  pointer and add it to the statistic; before freeing the BPF memory,
  we get the full size from the pointer and subtract it.
  - pros
    No memory overhead.
    No code churn in the MM core allocation and free paths.
    The implementation is quite clear and easy to maintain.
  - cons
    The calculation is not embedded in the MM allocation/free paths,
    so some extra code has to run to get the size from the pointer.
    BPF memory allocations and frees must use the helpers we
    introduced, otherwise there will be a stat leak.
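As a minimal sketch of the option-3 scheme, using the in-tree ksize()
for illustration (the series itself introduces ksize_full(), kvsize(),
vsize() and percpu_size() and wires them into the bpf_map_* helpers;
the global counter and function names below are hypothetical):

  #include <linux/atomic.h>
  #include <linux/slab.h>

  static atomic64_t bpf_mem_total;	/* hypothetical global stat */

  static void *bpf_stat_kzalloc(size_t size, gfp_t flags)
  {
  	void *p = kzalloc(size, flags);

  	/* Account the full slab object size, not just 'size'. */
  	if (p)
  		atomic64_add(ksize(p), &bpf_mem_total);
  	return p;
  }

  static void bpf_stat_kfree(void *p)
  {
  	/* The size is derived from the pointer at free time. */
  	if (p)
  		atomic64_sub(ksize(p), &bpf_mem_total);
  	kfree(p);
  }

No extra bookkeeping is needed on either side; the only cost is the
additional size lookup at allocation and free time.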
I prefer option 3. Its cons can be justified:
- bpf_map_free should be paired with bpf_map_alloc; that's reasonable.
- Regarding the possible extra cpu cycles, users should not allocate
  and free memory in the critical path anyway if it is latency
  sensitive.
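The same pointer-to-size trick works for the other allocators. For
vmalloc memory, for example, the size can be recovered from the pointer
alone; a sketch in the spirit of the vsize() helper added in patch 05
(the function name is hypothetical and the in-tree implementation may
differ):

  #include <linux/vmalloc.h>

  static unsigned long example_vsize(const void *addr)
  {
  	struct vm_struct *area;

  	if (!addr)
  		return 0;
  	/* addr must be a vmalloc address; the returned size
  	 * includes the trailing guard page.
  	 */
  	area = find_vm_area(addr);
  	return area ? area->size : 0;
  }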
[1]. https://lwn.net/Articles/917647/
[2]. https://lore.kernel.org/linux-mm/20220921170002.29557-1-laoar.shao@gmail.com/
v1->v2: don't use page_ext (Vlastimil, Hyeonggon)
Yafang Shao (11):
mm: percpu: count memcg relevant memory only when kmemcg is enabled
mm: percpu: introduce percpu_size()
mm: slab: rename obj_full_size()
mm: slab: introduce ksize_full()
mm: vmalloc: introduce vsize()
mm: util: introduce kvsize()
bpf: introduce new helpers bpf_ringbuf_pages_{alloc,free}
bpf: use bpf_map_kzalloc in arraymap
bpf: use bpf_map_kvcalloc in bpf_local_storage
bpf: add and use bpf map free helpers
bpf: introduce bpf memory statistics
fs/proc/meminfo.c | 4 ++
include/linux/bpf.h | 115 +++++++++++++++++++++++++++++++++++++++--
include/linux/percpu.h | 1 +
include/linux/slab.h | 10 ++++
include/linux/vmalloc.h | 15 ++++++
kernel/bpf/arraymap.c | 20 +++----
kernel/bpf/bpf_cgrp_storage.c | 2 +-
kernel/bpf/bpf_inode_storage.c | 2 +-
kernel/bpf/bpf_local_storage.c | 24 ++++-----
kernel/bpf/bpf_task_storage.c | 2 +-
kernel/bpf/cpumap.c | 13 +++--
kernel/bpf/devmap.c | 10 ++--
kernel/bpf/hashtab.c | 8 +--
kernel/bpf/helpers.c | 2 +-
kernel/bpf/local_storage.c | 12 ++---
kernel/bpf/lpm_trie.c | 14 ++---
kernel/bpf/memalloc.c | 19 ++++++-
kernel/bpf/ringbuf.c | 75 ++++++++++++++++++---------
kernel/bpf/syscall.c | 54 ++++++++++++++++++-
mm/percpu-internal.h | 4 +-
mm/percpu.c | 35 +++++++++++++
mm/slab.h | 19 ++++---
mm/slab_common.c | 52 +++++++++++++------
mm/slob.c | 2 +-
mm/util.c | 15 ++++++
net/core/bpf_sk_storage.c | 4 +-
net/core/sock_map.c | 2 +-
net/xdp/xskmap.c | 2 +-
28 files changed, 422 insertions(+), 115 deletions(-)
--
1.8.3.1