* [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
@ 2023-02-10 15:49 Yafang Shao
2023-02-10 22:05 ` Andrew Morton
2023-02-10 22:41 ` Roman Gushchin
0 siblings, 2 replies; 9+ messages in thread
From: Yafang Shao @ 2023-02-10 15:49 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Yafang Shao, Dennis Zhou, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
The extra space which is used to store the obj_cgroup membership is only
valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
parameter "cgroup.memory=nokmem" at runtime.
This helper is also used in non-memcg code, for example the tracepoint,
so we should fix it.
It was found by code review when I was implementing bpf memory usage[1].
No real issue happens in production environment.
[1]. https://lwn.net/Articles/921991/
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vasily Averin <vvs@openvz.org>
---
mm/percpu-internal.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 70b1ea2..2a95b1f 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -4,6 +4,7 @@
#include <linux/types.h>
#include <linux/percpu.h>
+#include <linux/memcontrol.h>
/*
* pcpu_block_md is the metadata block struct.
@@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
size_t extra_size = 0;
#ifdef CONFIG_MEMCG_KMEM
- extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
+ if (!mem_cgroup_kmem_disabled())
+ extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
#endif
return size * num_possible_cpus() + extra_size;
--
1.8.3.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-10 15:49 [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size() Yafang Shao
@ 2023-02-10 22:05 ` Andrew Morton
2023-02-10 22:39 ` Dennis Zhou
2023-02-12 14:05 ` Yafang Shao
2023-02-10 22:41 ` Roman Gushchin
1 sibling, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2023-02-10 22:05 UTC (permalink / raw)
To: Yafang Shao
Cc: linux-mm, Dennis Zhou, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
> The extra space which is used to store the obj_cgroup membership is only
> valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> parameter "cgroup.memory=nokmem" at runtime.
> This helper is also used in non-memcg code, for example the tracepoint,
> so we should fix it.
>
> It was found by code review when I was implementing bpf memory usage[1].
> No real issue happens in production environment.
>
> ...
>
> --- a/mm/percpu-internal.h
> +++ b/mm/percpu-internal.h
> @@ -4,6 +4,7 @@
>
> #include <linux/types.h>
> #include <linux/percpu.h>
> +#include <linux/memcontrol.h>
>
> /*
> * pcpu_block_md is the metadata block struct.
> @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> size_t extra_size = 0;
>
> #ifdef CONFIG_MEMCG_KMEM
> - extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> + if (!mem_cgroup_kmem_disabled())
> + extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> #endif
>
> return size * num_possible_cpus() + extra_size;
Seems risky at the first look - enabling kmemcg at runtime will make
prior calculations based on pcpu_obj_full_size) incorrect. But as long
as this is only used for accounting I guess that's OK.
What happens if we do a bunch of allocations with kmemcg enabled, then
disable kmemcg then free those allocations, or some such thing. Does
the accounting end up wrong?
The final sentence in the pcpu_obj_full_size() kerneldoc could do with
an update - it still implies that the extra_size accounting is
unconditional.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-10 22:05 ` Andrew Morton
@ 2023-02-10 22:39 ` Dennis Zhou
2023-02-12 14:12 ` Yafang Shao
2023-02-12 14:05 ` Yafang Shao
1 sibling, 1 reply; 9+ messages in thread
From: Dennis Zhou @ 2023-02-10 22:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Yafang Shao, linux-mm, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
Hello,
On Fri, Feb 10, 2023 at 02:05:08PM -0800, Andrew Morton wrote:
> On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > The extra space which is used to store the obj_cgroup membership is only
> > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > parameter "cgroup.memory=nokmem" at runtime.
> > This helper is also used in non-memcg code, for example the tracepoint,
> > so we should fix it.
> >
> > It was found by code review when I was implementing bpf memory usage[1].
> > No real issue happens in production environment.
> >
> > ...
> >
> > --- a/mm/percpu-internal.h
> > +++ b/mm/percpu-internal.h
> > @@ -4,6 +4,7 @@
> >
> > #include <linux/types.h>
> > #include <linux/percpu.h>
> > +#include <linux/memcontrol.h>
> >
> > /*
> > * pcpu_block_md is the metadata block struct.
> > @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> > size_t extra_size = 0;
> >
> > #ifdef CONFIG_MEMCG_KMEM
> > - extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > + if (!mem_cgroup_kmem_disabled())
> > + extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > #endif
> >
> > return size * num_possible_cpus() + extra_size;
>
Sorry I've been a bit mia...
> Seems risky at the first look - enabling kmemcg at runtime will make
> prior calculations based on pcpu_obj_full_size) incorrect. But as long
> as this is only used for accounting I guess that's OK.
>
> What happens if we do a bunch of allocations with kmemcg enabled, then
> disable kmemcg then free those allocations, or some such thing. Does
> the accounting end up wrong?
>
For now it works correctly because of 2 things. 1 - the function is only
called by accounting. 2 - the free path doesn't consult
mem_cgroup_kmem_disabled() but consults if a memcg exists for a percpu
allocation. If accounting is enabled, we'd always account the additional
memory for the memcg accounting. If it's not enabled, then percpu is
well unaccounted for.
This function probably needs to be renamed a bit more carefully so it
doesn't bleed outside of mm/percpu.c.
In short, I don't think this change is correct.
> The final sentence in the pcpu_obj_full_size() kerneldoc could do with
> an update - it still implies that the extra_size accounting is
> unconditional.
>
Thanks,
Dennis
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-10 15:49 [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size() Yafang Shao
2023-02-10 22:05 ` Andrew Morton
@ 2023-02-10 22:41 ` Roman Gushchin
1 sibling, 0 replies; 9+ messages in thread
From: Roman Gushchin @ 2023-02-10 22:41 UTC (permalink / raw)
To: Yafang Shao
Cc: akpm, linux-mm, Dennis Zhou, Tejun Heo, Christoph Lameter, Vasily Averin
On Fri, Feb 10, 2023 at 03:49:47PM +0000, Yafang Shao wrote:
> The extra space which is used to store the obj_cgroup membership is only
> valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> parameter "cgroup.memory=nokmem" at runtime.
> This helper is also used in non-memcg code, for example the tracepoint,
> so we should fix it.
>
> It was found by code review when I was implementing bpf memory usage[1].
> No real issue happens in production environment.
Probably not a stable material, but nice to have it fixed. Thanks!
>
> [1]. https://lwn.net/Articles/921991/
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Vasily Averin <vvs@openvz.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-10 22:05 ` Andrew Morton
2023-02-10 22:39 ` Dennis Zhou
@ 2023-02-12 14:05 ` Yafang Shao
2023-02-13 18:50 ` Andrew Morton
1 sibling, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2023-02-12 14:05 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Dennis Zhou, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Sat, Feb 11, 2023 at 6:05 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > The extra space which is used to store the obj_cgroup membership is only
> > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > parameter "cgroup.memory=nokmem" at runtime.
> > This helper is also used in non-memcg code, for example the tracepoint,
> > so we should fix it.
> >
> > It was found by code review when I was implementing bpf memory usage[1].
> > No real issue happens in production environment.
> >
> > ...
> >
> > --- a/mm/percpu-internal.h
> > +++ b/mm/percpu-internal.h
> > @@ -4,6 +4,7 @@
> >
> > #include <linux/types.h>
> > #include <linux/percpu.h>
> > +#include <linux/memcontrol.h>
> >
> > /*
> > * pcpu_block_md is the metadata block struct.
> > @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> > size_t extra_size = 0;
> >
> > #ifdef CONFIG_MEMCG_KMEM
> > - extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > + if (!mem_cgroup_kmem_disabled())
> > + extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > #endif
> >
> > return size * num_possible_cpus() + extra_size;
>
> Seems risky at the first look - enabling kmemcg at runtime will make
> prior calculations based on pcpu_obj_full_size) incorrect. But as long
> as this is only used for accounting I guess that's OK.
>
> What happens if we do a bunch of allocations with kmemcg enabled, then
> disable kmemcg then free those allocations, or some such thing. Does
> the accounting end up wrong?
>
That won't happen, because we can only enable and disable kmemcg at boot time.
> The final sentence in the pcpu_obj_full_size() kerneldoc could do with
> an update - it still implies that the extra_size accounting is
> unconditional.
>
Sure.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-10 22:39 ` Dennis Zhou
@ 2023-02-12 14:12 ` Yafang Shao
2023-02-13 20:12 ` Dennis Zhou
0 siblings, 1 reply; 9+ messages in thread
From: Yafang Shao @ 2023-02-12 14:12 UTC (permalink / raw)
To: Dennis Zhou
Cc: Andrew Morton, linux-mm, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Sat, Feb 11, 2023 at 6:39 AM Dennis Zhou <dennis@kernel.org> wrote:
>
> Hello,
>
> On Fri, Feb 10, 2023 at 02:05:08PM -0800, Andrew Morton wrote:
> > On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > > The extra space which is used to store the obj_cgroup membership is only
> > > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > > parameter "cgroup.memory=nokmem" at runtime.
> > > This helper is also used in non-memcg code, for example the tracepoint,
> > > so we should fix it.
> > >
> > > It was found by code review when I was implementing bpf memory usage[1].
> > > No real issue happens in production environment.
> > >
> > > ...
> > >
> > > --- a/mm/percpu-internal.h
> > > +++ b/mm/percpu-internal.h
> > > @@ -4,6 +4,7 @@
> > >
> > > #include <linux/types.h>
> > > #include <linux/percpu.h>
> > > +#include <linux/memcontrol.h>
> > >
> > > /*
> > > * pcpu_block_md is the metadata block struct.
> > > @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> > > size_t extra_size = 0;
> > >
> > > #ifdef CONFIG_MEMCG_KMEM
> > > - extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > + if (!mem_cgroup_kmem_disabled())
> > > + extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > #endif
> > >
> > > return size * num_possible_cpus() + extra_size;
> >
>
> Sorry I've been a bit mia...
>
> > Seems risky at the first look - enabling kmemcg at runtime will make
> > prior calculations based on pcpu_obj_full_size) incorrect. But as long
> > as this is only used for accounting I guess that's OK.
> >
> > What happens if we do a bunch of allocations with kmemcg enabled, then
> > disable kmemcg then free those allocations, or some such thing. Does
> > the accounting end up wrong?
> >
>
> For now it works correctly because of 2 things. 1 - the function is only
> called by accounting. 2 - the free path doesn't consult
> mem_cgroup_kmem_disabled() but consults if a memcg exists for a percpu
> allocation. If accounting is enabled, we'd always account the additional
> memory for the memcg accounting. If it's not enabled, then percpu is
> well unaccounted for.
>
> This function probably needs to be renamed a bit more carefully so it
> doesn't bleed outside of mm/percpu.c.
>
Do you have any suggestions on the new name ?
> In short, I don't think this change is correct.
Could you pls be more specific ?
>
> > The final sentence in the pcpu_obj_full_size() kerneldoc could do with
> > an update - it still implies that the extra_size accounting is
> > unconditional.
> >
>
> Thanks,
> Dennis
--
Regards
Yafang
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-12 14:05 ` Yafang Shao
@ 2023-02-13 18:50 ` Andrew Morton
2023-02-14 1:57 ` Yafang Shao
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2023-02-13 18:50 UTC (permalink / raw)
To: Yafang Shao
Cc: linux-mm, Dennis Zhou, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Sun, 12 Feb 2023 22:05:42 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
> On Sat, Feb 11, 2023 at 6:05 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > > The extra space which is used to store the obj_cgroup membership is only
> > > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > > parameter "cgroup.memory=nokmem" at runtime.
>
> ...
>
> >
> > What happens if we do a bunch of allocations with kmemcg enabled, then
> > disable kmemcg then free those allocations, or some such thing. Does
> > the accounting end up wrong?
> >
>
> That won't happen, because we can only enable and disable kmemcg at boot time.
Ah, I misread the changelog. I suggest that "disabled via the kernel
parameter ... at runtime" be "... at boot time".
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-12 14:12 ` Yafang Shao
@ 2023-02-13 20:12 ` Dennis Zhou
0 siblings, 0 replies; 9+ messages in thread
From: Dennis Zhou @ 2023-02-13 20:12 UTC (permalink / raw)
To: Yafang Shao
Cc: Andrew Morton, linux-mm, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Sun, Feb 12, 2023 at 10:12:12PM +0800, Yafang Shao wrote:
> On Sat, Feb 11, 2023 at 6:39 AM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > Hello,
> >
> > On Fri, Feb 10, 2023 at 02:05:08PM -0800, Andrew Morton wrote:
> > > On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > > The extra space which is used to store the obj_cgroup membership is only
> > > > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > > > parameter "cgroup.memory=nokmem" at runtime.
> > > > This helper is also used in non-memcg code, for example the tracepoint,
> > > > so we should fix it.
> > > >
> > > > It was found by code review when I was implementing bpf memory usage[1].
> > > > No real issue happens in production environment.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/percpu-internal.h
> > > > +++ b/mm/percpu-internal.h
> > > > @@ -4,6 +4,7 @@
> > > >
> > > > #include <linux/types.h>
> > > > #include <linux/percpu.h>
> > > > +#include <linux/memcontrol.h>
> > > >
> > > > /*
> > > > * pcpu_block_md is the metadata block struct.
> > > > @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> > > > size_t extra_size = 0;
> > > >
> > > > #ifdef CONFIG_MEMCG_KMEM
> > > > - extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > > + if (!mem_cgroup_kmem_disabled())
> > > > + extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > > #endif
> > > >
> > > > return size * num_possible_cpus() + extra_size;
> > >
> >
> > Sorry I've been a bit mia...
> >
> > > Seems risky at the first look - enabling kmemcg at runtime will make
> > > prior calculations based on pcpu_obj_full_size) incorrect. But as long
> > > as this is only used for accounting I guess that's OK.
> > >
> > > What happens if we do a bunch of allocations with kmemcg enabled, then
> > > disable kmemcg then free those allocations, or some such thing. Does
> > > the accounting end up wrong?
> > >
> >
> > For now it works correctly because of 2 things. 1 - the function is only
> > called by accounting. 2 - the free path doesn't consult
> > mem_cgroup_kmem_disabled() but consults if a memcg exists for a percpu
> > allocation. If accounting is enabled, we'd always account the additional
> > memory for the memcg accounting. If it's not enabled, then percpu is
> > well unaccounted for.
> >
> > This function probably needs to be renamed a bit more carefully so it
> > doesn't bleed outside of mm/percpu.c.
> >
>
> Do you have any suggestions on the new name ?
>
> > In short, I don't think this change is correct.
>
> Could you pls be more specific ?
>
Hmmm I got ahead of myself. I misunderstood memcg_*_enabled() vs
memcg_*_disabled(). Roman clarified it just now in [1]. I was imagining
a world where we add disabled here and then eventually enabled would
propagate here too.
Anothing that was on my mind is, should a percpu object be charged for
the memcg space even if it's not in use. I now think it's yes and then
for general accounting outside of memcg, this function is correct.
Acked-by: Dennis Zhou <dennis@kernel.org>
Andrew, I have nothing queued. Do you mind picking this up?
[1] https://lore.kernel.org/linux-mm/20230213192922.1146370-1-roman.gushchin@linux.dev/T/#u
Thanks,
Dennis
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size()
2023-02-13 18:50 ` Andrew Morton
@ 2023-02-14 1:57 ` Yafang Shao
0 siblings, 0 replies; 9+ messages in thread
From: Yafang Shao @ 2023-02-14 1:57 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Dennis Zhou, Tejun Heo, Christoph Lameter,
Roman Gushchin, Vasily Averin
On Tue, Feb 14, 2023 at 2:50 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun, 12 Feb 2023 22:05:42 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > On Sat, Feb 11, 2023 at 6:05 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > > The extra space which is used to store the obj_cgroup membership is only
> > > > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > > > parameter "cgroup.memory=nokmem" at runtime.
> >
> > ...
> >
> > >
> > > What happens if we do a bunch of allocations with kmemcg enabled, then
> > > disable kmemcg then free those allocations, or some such thing. Does
> > > the accounting end up wrong?
> > >
> >
> > That won't happen, because we can only enable and disable kmemcg at boot time.
>
> Ah, I misread the changelog. I suggest that "disabled via the kernel
> parameter ... at runtime" be "... at boot time".
>
Thanks for the suggestion. I will change it.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-02-14 1:58 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10 15:49 [PATCH -mm] mm: percpu: fix incorrect size in pcpu_obj_full_size() Yafang Shao
2023-02-10 22:05 ` Andrew Morton
2023-02-10 22:39 ` Dennis Zhou
2023-02-12 14:12 ` Yafang Shao
2023-02-13 20:12 ` Dennis Zhou
2023-02-12 14:05 ` Yafang Shao
2023-02-13 18:50 ` Andrew Morton
2023-02-14 1:57 ` Yafang Shao
2023-02-10 22:41 ` Roman Gushchin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox