* [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
@ 2023-02-05 6:58 ` Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05 6:58 UTC (permalink / raw)
To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm
Cc: bpf, cgroups, linux-mm, Yafang Shao
Add a new kernel parameter, cgroup.memory=nobpf, to allow the user to disable
bpf memory accounting. This is a preparation for the follow-up patches.
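A minimal usage sketch; the options of cgroup.memory= are comma-separated, and
the combination with the existing nokmem option below is purely illustrative:

    cgroup.memory=nobpf
    cgroup.memory=nokmem,nobpf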
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
Documentation/admin-guide/kernel-parameters.txt | 1 +
include/linux/memcontrol.h | 11 +++++++++++
mm/memcontrol.c | 18 ++++++++++++++++++
3 files changed, 30 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3..29fb41e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -557,6 +557,7 @@
Format: <string>
nosocket -- Disable socket memory accounting.
nokmem -- Disable kernel memory accounting.
+ nobpf -- Disable BPF memory accounting.
checkreqprot= [SELINUX] Set initial checkreqprot flag value.
Format: { "0" | "1" }
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203..26d4bf2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1751,6 +1751,12 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
+extern struct static_key_false memcg_bpf_enabled_key;
+static inline bool memcg_bpf_enabled(void)
+{
+ return static_branch_likely(&memcg_bpf_enabled_key);
+}
+
extern struct static_key_false memcg_kmem_enabled_key;
static inline bool memcg_kmem_enabled(void)
@@ -1829,6 +1835,11 @@ static inline struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
return NULL;
}
+static inline bool memcg_bpf_enabled(void)
+{
+ return false;
+}
+
static inline bool memcg_kmem_enabled(void)
{
return false;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0..590526f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -89,6 +89,9 @@
/* Kernel memory accounting disabled? */
static bool cgroup_memory_nokmem __ro_after_init;
+/* BPF memory accounting disabled? */
+static bool cgroup_memory_nobpf __ro_after_init;
+
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
#endif
@@ -348,6 +351,9 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
*/
DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key);
EXPORT_SYMBOL(memcg_kmem_enabled_key);
+
+DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
+EXPORT_SYMBOL(memcg_bpf_enabled_key);
#endif
/**
@@ -5362,6 +5368,11 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
static_branch_inc(&memcg_sockets_enabled_key);
+#if defined(CONFIG_MEMCG_KMEM)
+ if (!cgroup_memory_nobpf)
+ static_branch_inc(&memcg_bpf_enabled_key);
+#endif
+
return &memcg->css;
}
@@ -5446,6 +5457,11 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_active)
static_branch_dec(&memcg_sockets_enabled_key);
+#if defined(CONFIG_MEMCG_KMEM)
+ if (!cgroup_memory_nobpf)
+ static_branch_dec(&memcg_bpf_enabled_key);
+#endif
+
vmpressure_cleanup(&memcg->vmpressure);
cancel_work_sync(&memcg->high_work);
mem_cgroup_remove_from_trees(memcg);
@@ -7310,6 +7326,8 @@ static int __init cgroup_memory(char *s)
cgroup_memory_nosocket = true;
if (!strcmp(token, "nokmem"))
cgroup_memory_nokmem = true;
+ if (!strcmp(token, "nobpf"))
+ cgroup_memory_nobpf = true;
}
return 1;
}
--
1.8.3.1
* [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
@ 2023-02-05 6:58 ` Yafang Shao
2023-02-08 19:25 ` Johannes Weiner
2023-02-05 6:58 ` [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags() Yafang Shao
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2023-02-05 6:58 UTC (permalink / raw)
To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm
Cc: bpf, cgroups, linux-mm, Yafang Shao
Introduce a new helper, bpf_map_kvcalloc(), for this memory allocation, so
that bpf_local_storage map creation becomes consistent with that of the
other maps.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/bpf.h | 8 ++++++++
kernel/bpf/bpf_local_storage.c | 4 ++--
kernel/bpf/syscall.c | 15 +++++++++++++++
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35c18a9..fe0bf48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1886,6 +1886,8 @@ int generic_map_delete_batch(struct bpf_map *map,
void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
int node);
void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags);
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+ gfp_t flags);
void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
size_t align, gfp_t flags);
#else
@@ -1902,6 +1904,12 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
return kzalloc(size, flags);
}
+static inline void *
+bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size, gfp_t flags)
+{
+ return kvcalloc(n, size, flags);
+}
+
static inline void __percpu *
bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, size_t align,
gfp_t flags)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 373c3c2..35f4138 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -568,8 +568,8 @@ static struct bpf_local_storage_map *__bpf_local_storage_map_alloc(union bpf_att
nbuckets = max_t(u32, 2, nbuckets);
smap->bucket_log = ilog2(nbuckets);
- smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
- GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+ smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
+ nbuckets, GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
bpf_map_area_free(smap);
return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bcc9761..9d94a35 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -464,6 +464,21 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
return ptr;
}
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+ gfp_t flags)
+{
+ struct mem_cgroup *memcg, *old_memcg;
+ void *ptr;
+
+ memcg = bpf_map_get_memcg(map);
+ old_memcg = set_active_memcg(memcg);
+ ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+ set_active_memcg(old_memcg);
+ mem_cgroup_put(memcg);
+
+ return ptr;
+}
+
void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
size_t align, gfp_t flags)
{
--
1.8.3.1
* Re: [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
2023-02-05 6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2023-02-08 19:25 ` Johannes Weiner
2023-02-09 11:27 ` Yafang Shao
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2023-02-08 19:25 UTC (permalink / raw)
To: Yafang Shao
Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
linux-mm
On Sun, Feb 05, 2023 at 06:58:02AM +0000, Yafang Shao wrote:
> Introduce a new helper, bpf_map_kvcalloc(), for this memory allocation, so
> that bpf_local_storage map creation becomes consistent with that of the
> other maps.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
This looks good to me, but it could be helpful to explain the
user-visible part of the bug somewhat, i.e. who is being charged right
now for the allocation if it's not the map owner.
* Re: [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage
2023-02-08 19:25 ` Johannes Weiner
@ 2023-02-09 11:27 ` Yafang Shao
0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-09 11:27 UTC (permalink / raw)
To: Johannes Weiner
Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
linux-mm
On Thu, Feb 9, 2023 at 3:25 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Sun, Feb 05, 2023 at 06:58:02AM +0000, Yafang Shao wrote:
> > Introduce a new helper, bpf_map_kvcalloc(), for this memory allocation, so
> > that bpf_local_storage map creation becomes consistent with that of the
> > other maps.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> This looks good to me, but it could be helpful to explain the
> user-visible part of the bug somewhat, i.e. who is being charged right
> now for the allocation if it's not the map owner.
Sure. Will improve the commit log.
--
Regards
Yafang
* [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags()
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 1/5] mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 2/5] bpf: use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
@ 2023-02-05 6:58 ` Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting Yafang Shao
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05 6:58 UTC (permalink / raw)
To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm
Cc: bpf, cgroups, linux-mm, Yafang Shao
This new helper will be used for memory allocations in both bpf prog and bpf map.
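The intended call pattern, mirroring how the later patches in this series use
it; the helper below is only an illustration, not part of the series:

    /* Illustrative helper: request memcg accounting for the allocation only
     * when BPF memcg accounting is enabled, i.e. when cgroup.memory=nobpf
     * was not passed at boot.
     */
    static void *bpf_example_alloc(size_t size)
    {
            return kvmalloc(size, bpf_memcg_flags(GFP_KERNEL | __GFP_NOWARN));
    }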
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/bpf.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fe0bf48..4385418 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -28,6 +28,7 @@
#include <linux/btf.h>
#include <linux/rcupdate_trace.h>
#include <linux/static_call.h>
+#include <linux/memcontrol.h>
struct bpf_verifier_env;
struct bpf_verifier_log;
@@ -2933,4 +2934,11 @@ static inline bool type_is_alloc(u32 type)
return type & MEM_ALLOC;
}
+static inline gfp_t bpf_memcg_flags(gfp_t flags)
+{
+ if (memcg_bpf_enabled())
+ return flags | __GFP_ACCOUNT;
+ return flags;
+}
+
#endif /* _LINUX_BPF_H */
--
1.8.3.1
* [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
` (2 preceding siblings ...)
2023-02-05 6:58 ` [PATCH bpf-next 3/5] bpf: introduce bpf_memcg_flags() Yafang Shao
@ 2023-02-05 6:58 ` Yafang Shao
2023-02-05 6:58 ` [PATCH bpf-next 5/5] bpf: allow to disable bpf prog " Yafang Shao
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05 6:58 UTC (permalink / raw)
To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm
Cc: bpf, cgroups, linux-mm, Yafang Shao
We can simply set the root memcg as the map's memcg to disable bpf memory
accounting. bpf_map_area_alloc() is a little special, as it gets the memcg
from current rather than from the map, so we need to drop __GFP_ACCOUNT
specifically for it.
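For context, a simplified sketch of the existing bpf_map_get_memcg() fallback
in kernel/bpf/syscall.c that this change relies on: with accounting disabled,
map->objcg is simply left NULL, so per-map allocations are charged to the root
memcg, i.e. effectively unaccounted:

    static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
    {
            if (map->objcg)
                    return get_mem_cgroup_from_objcg(map->objcg);

            return root_mem_cgroup;
    }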
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/bpf/memalloc.c | 3 ++-
kernel/bpf/syscall.c | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index ebcc3dd..6da9051 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -395,7 +395,8 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
unit_size = size;
#ifdef CONFIG_MEMCG_KMEM
- objcg = get_obj_cgroup_from_current();
+ if (memcg_bpf_enabled())
+ objcg = get_obj_cgroup_from_current();
#endif
for_each_possible_cpu(cpu) {
c = per_cpu_ptr(pc, cpu);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9d94a35..cda8d00 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -309,7 +309,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
* __GFP_RETRY_MAYFAIL to avoid such situations.
*/
- const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
+ gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO);
unsigned int flags = 0;
unsigned long align = 1;
void *area;
@@ -418,7 +418,8 @@ static void bpf_map_save_memcg(struct bpf_map *map)
* So we have to check map->objcg for being NULL each time it's
* being used.
*/
- map->objcg = get_obj_cgroup_from_current();
+ if (memcg_bpf_enabled())
+ map->objcg = get_obj_cgroup_from_current();
}
static void bpf_map_release_memcg(struct bpf_map *map)
--
1.8.3.1
* [PATCH bpf-next 5/5] bpf: allow to disable bpf prog memory accounting
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
` (3 preceding siblings ...)
2023-02-05 6:58 ` [PATCH bpf-next 4/5] bpf: allow to disable bpf map memory accounting Yafang Shao
@ 2023-02-05 6:58 ` Yafang Shao
2023-02-08 19:29 ` [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Johannes Weiner
2023-02-08 20:54 ` Roman Gushchin
6 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-05 6:58 UTC (permalink / raw)
To: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm
Cc: bpf, cgroups, linux-mm, Yafang Shao
We can simply disable bpf prog memory accounting by not setting
__GFP_ACCOUNT.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/bpf/core.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 16da510..3390961 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -35,6 +35,7 @@
#include <linux/bpf_verifier.h>
#include <linux/nodemask.h>
#include <linux/bpf_mem_alloc.h>
+#include <linux/memcontrol.h>
#include <asm/barrier.h>
#include <asm/unaligned.h>
@@ -87,7 +88,7 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags)
{
- gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+ gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
struct bpf_prog_aux *aux;
struct bpf_prog *fp;
@@ -96,12 +97,12 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
if (fp == NULL)
return NULL;
- aux = kzalloc(sizeof(*aux), GFP_KERNEL_ACCOUNT | gfp_extra_flags);
+ aux = kzalloc(sizeof(*aux), bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
if (aux == NULL) {
vfree(fp);
return NULL;
}
- fp->active = alloc_percpu_gfp(int, GFP_KERNEL_ACCOUNT | gfp_extra_flags);
+ fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
if (!fp->active) {
vfree(fp);
kfree(aux);
@@ -126,7 +127,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
{
- gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+ gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
struct bpf_prog *prog;
int cpu;
@@ -159,7 +160,7 @@ int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog)
prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
sizeof(*prog->aux->jited_linfo),
- GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
+ bpf_memcg_flags(GFP_KERNEL | __GFP_NOWARN));
if (!prog->aux->jited_linfo)
return -ENOMEM;
@@ -234,7 +235,7 @@ void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
gfp_t gfp_extra_flags)
{
- gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
+ gfp_t gfp_flags = bpf_memcg_flags(GFP_KERNEL | __GFP_ZERO | gfp_extra_flags);
struct bpf_prog *fp;
u32 pages;
--
1.8.3.1
* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
` (4 preceding siblings ...)
2023-02-05 6:58 ` [PATCH bpf-next 5/5] bpf: allow to disable bpf prog " Yafang Shao
@ 2023-02-08 19:29 ` Johannes Weiner
2023-02-08 20:54 ` Roman Gushchin
6 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2023-02-08 19:29 UTC (permalink / raw)
To: Yafang Shao
Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, mhocko,
roman.gushchin, shakeelb, muchun.song, akpm, bpf, cgroups,
linux-mm
On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accounting has some known problems in container
> environments,
>
> - The container memory usage is not consistent if there's pinned bpf
> program
> After the container restart, the leftover bpf programs won't be accounted
> to the new generation, so the memory usage of the container is not
> consistent. This issue can be resolved by introducing selectable
> memcg, but we don't have an agreement on the solution yet. See also
> the discussions at https://lwn.net/Articles/905150/ .
>
> - The leftover non-preallocated bpf map can't be limited
> The leftover bpf map will be reparented, and thus it will be limited by
> the parent, rather than the container itself. Furthermore, if the
> parent is destroyed, it will be limited by its parent's parent, and so
> on. It can also be resolved by introducing selectable memcg.
>
> - The memory dynamically allocated in bpf prog is charged into root memcg
> only
> Nowadays the bpf prog can dynamically allocate memory, for example via
> bpf_obj_new(), but it only allocates from the global bpf_mem_alloc
> pool, so it is charged to the root memcg only. That needs to be
> addressed by a new proposal.
>
> So let's give the user an option to disable bpf memory accounting.
>
> The idea of "cgroup.memory=nobpf" was originally proposed by Tejun[1].
I'm not the most familiar with bpf internals, but the memcg bits and
adding the boot time flag look good to me:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
2023-02-05 6:58 [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Yafang Shao
` (5 preceding siblings ...)
2023-02-08 19:29 ` [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf Johannes Weiner
@ 2023-02-08 20:54 ` Roman Gushchin
2023-02-09 11:28 ` Yafang Shao
6 siblings, 1 reply; 11+ messages in thread
From: Roman Gushchin @ 2023-02-08 20:54 UTC (permalink / raw)
To: Yafang Shao
Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
shakeelb, muchun.song, akpm, bpf, cgroups, linux-mm
On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accounting has some known problems in container
> environments,
>
> - The container memory usage is not consistent if there's pinned bpf
> program
> After the container restart, the leftover bpf programs won't be accounted
> to the new generation, so the memory usage of the container is not
> consistent. This issue can be resolved by introducing selectable
> memcg, but we don't have an agreement on the solution yet. See also
> the discussions at https://lwn.net/Articles/905150/ .
>
> - The leftover non-preallocated bpf map can't be limited
> The leftover bpf map will be reparented, and thus it will be limited by
> the parent, rather than the container itself. Furthermore, if the
> parent is destroyed, it will be limited by its parent's parent, and so
> on. It can also be resolved by introducing selectable memcg.
>
> - The memory dynamically allocated in bpf prog is charged into root memcg
> only
> Nowadays the bpf prog can dynamically allocate memory, for example via
> bpf_obj_new(), but it only allocates from the global bpf_mem_alloc
> pool, so it is charged to the root memcg only. That needs to be
> addressed by a new proposal.
>
> So let's give the user an option to disable bpf memory accounting.
>
> The idea of "cgroup.memory=nobpf" was originally proposed by Tejun[1].
>
> [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
>
> Yafang Shao (5):
> mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
> bpf: use bpf_map_kvcalloc in bpf_local_storage
> bpf: introduce bpf_memcg_flags()
> bpf: allow to disable bpf map memory accounting
> bpf: allow to disable bpf prog memory accounting
Hello Yafang!
Overall the patch looks good to me, please, feel free to add
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
I'd squash patch (3) into (4), but up to you.
Thanks!
* Re: [PATCH bpf-next 0/5] bpf, mm: introduce cgroup.memory=nobpf
2023-02-08 20:54 ` Roman Gushchin
@ 2023-02-09 11:28 ` Yafang Shao
0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2023-02-09 11:28 UTC (permalink / raw)
To: Roman Gushchin
Cc: tj, ast, daniel, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh, sdf, haoluo, jolsa, hannes, mhocko,
shakeelb, muchun.song, akpm, bpf, cgroups, linux-mm
On Thu, Feb 9, 2023 at 4:54 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> > The bpf memory accounting has some known problems in container
> > environments,
> >
> > - The container memory usage is not consistent if there's pinned bpf
> > program
> > After the container restart, the leftover bpf programs won't be accounted
> > to the new generation, so the memory usage of the container is not
> > consistent. This issue can be resolved by introducing selectable
> > memcg, but we don't have an agreement on the solution yet. See also
> > the discussions at https://lwn.net/Articles/905150/ .
> >
> > - The leftover non-preallocated bpf map can't be limited
> > The leftover bpf map will be reparented, and thus it will be limited by
> > the parent, rather than the container itself. Furthermore, if the
> > parent is destroyed, it will be limited by its parent's parent, and so
> > on. It can also be resolved by introducing selectable memcg.
> >
> > - The memory dynamically allocated in bpf prog is charged into root memcg
> > only
> > Nowadays the bpf prog can dynamically allocate memory, for example via
> > bpf_obj_new(), but it only allocates from the global bpf_mem_alloc
> > pool, so it is charged to the root memcg only. That needs to be
> > addressed by a new proposal.
> >
> > So let's give the user an option to disable bpf memory accounting.
> >
> > The idea of "cgroup.memory=nobpf" was originally proposed by Tejun[1].
> >
> > [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
> >
> > Yafang Shao (5):
> > mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
> > bpf: use bpf_map_kvcalloc in bpf_local_storage
> > bpf: introduce bpf_memcg_flags()
> > bpf: allow to disable bpf map memory accounting
> > bpf: allow to disable bpf prog memory accounting
>
> Hello Yafang!
>
> Overall the patch looks good to me, please, feel free to add
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> I'd squash patch (3) into (4), but up to you.
>
Sure. Will do it.
--
Regards
Yafang