From: Roman Gushchin <guro@fb.com>
To: Xie Xun <xiexun162534@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"shenwenbosmile@gmail.com" <shenwenbosmile@gmail.com>
Subject: Re: memcg missing charge when setting BPF
Date: Tue, 23 Jun 2020 21:30:21 -0700
Message-ID: <20200624043021.GA3669@carbon.dhcp.thefacebook.com>
In-Reply-To: <1139555701.2821292.1592970418462@mail.yahoo.com>
Hello Xie!
It's actually not a surprise: it's a known limitation/exception.
It's partly historical: there used to be no way to account
percpu memory, and some bpf maps use it quite extensively.
Fortunately, this has changed recently, and 5.9 will likely gain the ability
to account percpu memory. I sent the latest version of the patchset
just today:
https://lore.kernel.org/linux-mm/20200623184515.4132564-1-guro@fb.com/T/#m0be45dd71e6a238985181c213d9934731949c089
I also have a patchset in the works which adds memcg accounting of bpf memory
(programs and maps). I plan to send it upstream next week. If everything
goes smoothly, it might appear in 5.9 as well.
Unfortunately, the magnitude of the required changes makes it impractical
to backport them to older kernels.
Thanks!
PS I'll be completely offline till the end of the week. I'll respond to all e-mail
on Monday, Jun 29th. Thanks!
On Wed, Jun 24, 2020 at 03:46:58AM +0000, Xie Xun wrote:
> Hello,
>
> I found that programs can consume much more memory than the memcg limit by setting BPF many times. This is because allocations made while setting BPF are not charged to the memcg.
>
>
> Below is how I did it:
>
> 1. Run Linux kernel in a QEMU virtual machine (x86_64) with 1GB physical memory.
> The kernel is built with memcg and memcg kmem accounting enabled.
>
> 2. Create a docker (runC) container, with memory limit 100MB.
>
> docker run --name debian --memory 100000000 --kernel-memory 50000000 \
> debian:slim /bin/bash
>
> 3. In the container, run a program that sets BPF many times. I use prctl to set BPF.
>
> while (1)
> {
>     prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf);
> }
>
> 4. Physical memory usage (as reported by `free` or `top`) increases by around 40MB,
> but the memory usage of the container's memcg barely increases (by around 100KB).
>
> 5. Run several processes that set BPF, and almost all physical memory is consumed.
> Sometimes processes outside the container are also killed by the OOM killer.
>
> I also tried this with user namespaces on, and I can still kill host processes from inside the container this way. So this problem may be dangerous for containers that are based on cgroups.
>
>
> kernel version: 5.3.6
> kernel configuration: in attachment (CONFIG_MEMCG_KMEM is on)
>
>
> This blog post also shows this problem: https://blog.xiexun.tech/break-memcg.html
>
>
> Cause of this problem:
>
> Memory allocations made while setting BPF are not charged to the memcg. For example,
> in kernel/bpf/core.c:bpf_prog_alloc, bpf_prog_alloc_no_stats and alloc_percpu_gfp
> are called to allocate memory. However, neither allocation is charged to the memcg.
> So if we trigger this path many times, we can consume lots of memory without
> increasing our memcg usage.
>
> /* ------------ */
> struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
> {
>     gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags;
>     struct bpf_prog *prog;
>     int cpu;
>
>     prog = bpf_prog_alloc_no_stats(size, gfp_extra_flags);
>     if (!prog)
>         return NULL;
>
>     prog->aux->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags);
>
>     /* ... */
>
> }
> /* ------------ */
>
>
> My program that sets BPF:
>
> /* ------------ */
> #include <unistd.h>
> #include <sys/prctl.h>
> #include <linux/prctl.h>
> #include <linux/seccomp.h>
> #include <linux/filter.h>
> #include <linux/audit.h>
> #include <linux/signal.h>
> #include <sys/ptrace.h>
> #include <stdio.h>
> #include <errno.h>
>
> int main()
> {
>     struct sock_filter insns[] =
>     {
>         {
>             .code = 0x6,
>             .jt = 0,
>             .jf = 0,
>             .k = SECCOMP_RET_ALLOW
>         }
>     };
>     struct sock_fprog bpf =
>     {
>         .len = 1,
>         .filter = insns
>     };
>     int ret;
>
>     ret = prctl(PR_SET_NO_NEW_PRIVS, 1, NULL, 0, 0);
>     if (ret)
>     {
>         printf("error1 %d\n", errno);
>         return 1;
>     }
>     int count = 0;
>     while (1)
>     {
>         ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf);
>         if (ret)
>         {
>             sleep(1);
>             printf("error %d\n", errno);
>         }
>         else
>         {
>             count++;
>             printf("ok %d\n", count);
>         }
>     }
>     return 0;
> }
> /* ------------ */
>
>
> Thanks,
> Xie Xun