From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org,
quentin@isovalent.com, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
songmuchun@bytedance.com, akpm@linux-foundation.org,
cl@linux.com, penberg@kernel.org, rientjes@google.com,
iamjoonsoo.kim@lge.com, vbabka@suse.cz, linux-mm@kvack.org,
bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 00/10] bpf, mm: Recharge pages when reuse bpf map
Date: Tue, 21 Jun 2022 16:28:31 -0700 [thread overview]
Message-ID: <20220621232831.nkw2e7ezfy55p6hg@macbook-pro-3.dhcp.thefacebook.com> (raw)
In-Reply-To: <20220619155032.32515-1-laoar.shao@gmail.com>
On Sun, Jun 19, 2022 at 03:50:22PM +0000, Yafang Shao wrote:
> After switching to memcg-based bpf memory accounting, the bpf memory is
> charged to the loader's memcg by default, that causes unexpected issues for
> us. For instance, the container of the loader may be restarted after
> pinning progs and maps, but the bpf memcg will be left and pinned on the
> system. Once the loader's new generation container is started, the leftover
> pages won't be charged to it. That inconsistent behavior will make trouble
> for the memory resource management for this container.
>
> In the past few days, I have proposed two patchsets[1][2] to try to resolve
> this issue, but in both of these two proposals the user code has to be
> changed to adapt to it, that is a pain for us. This patchset relieves the
> pain by triggering the recharge in libbpf. It also addresses Roman's
> critical comments.
>
> The key point we can avoid changing the user code is that there's a resue
> path in libbpf. Once the bpf container is restarted again, it will try
> to re-run the required bpf programs, if the bpf programs are the same with
> the already pinned one, it will reuse them.
>
> To make sure we either recharge all of them successfully or don't recharge
> any of them. The recharge prograss is divided into three steps:
> - Pre charge to the new generation
> To make sure once we uncharge from the old generation, we can always
> charge to the new generation succeesfully. If we can't pre charge to
> the new generation, we won't allow it to be uncharged from the old
> generation.
> - Uncharge from the old generation
> After pre charge to the new generation, we can uncharge from the old
> generation.
> - Post charge to the new generation
> Finnaly we can set pages' memcg_data to the new generation.
> In the pre charge step, we may succeed to charge some addresses, but fail
> to charge a new address, then we should uncharge the already charged
> addresses, so another recharge-err step is instroduced.
>
> This pachset has finished recharging bpf hash map. which is mostly used
> by our bpf services. The other maps hasn't been implemented yet. The bpf
> progs hasn't been implemented neither.
... and the implementation in patch 10 is missing recharge of htab->elems
which consumes the most memory.
That begs the question how the whole set was tested.
Even if that bug is fixed this recharge approach works only with preallocated
maps. Their use will be reduced in the future.
Maps with kmalloc won't work with this multi step approach.
There is no place where bpf_map_release_memcg can be done without racing
with concurrent kmallocs from bpf program side.
Despite being painful for user space the user space has to deal with it.
It created a container, charged its memcg, then destroyed the container,
but didn't free the bpf map. It's a user bug. It has to free the map.
The user space can use map-in-map solution. In the new container the new bpf map
can be allocated (and charged to new memcg), the data copied from old map,
and then inner map replaced. At this point old map can be freed and memcg
uncharged.
To make things easier we can consider introducing an anon FD that points to a memcg.
Then user can pick a memcg at map creation time instead of get_mem_cgroup_from_current.
next prev parent reply other threads:[~2022-06-21 23:28 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-19 15:50 Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 01/10] mm, memcg: Add a new helper memcg_should_recharge() Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 02/10] bpftool: Show memcg info of bpf map Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 03/10] mm, memcg: Add new helper obj_cgroup_from_current() Yafang Shao
2022-06-23 3:01 ` Roman Gushchin
2022-06-25 13:54 ` Yafang Shao
2022-06-26 1:52 ` Roman Gushchin
2022-06-19 15:50 ` [RFC PATCH bpf-next 04/10] mm, memcg: Make obj_cgroup_{charge, uncharge}_pages public Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 05/10] mm: Add helper to recharge kmalloc'ed address Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 06/10] mm: Add helper to recharge vmalloc'ed address Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 07/10] mm: Add helper to recharge percpu address Yafang Shao
2022-06-23 5:25 ` Dennis Zhou
2022-06-25 14:18 ` Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 08/10] bpf: Recharge memory when reuse bpf map Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 09/10] bpf: Make bpf_map_{save, release}_memcg public Yafang Shao
2022-06-19 15:50 ` [RFC PATCH bpf-next 10/10] bpf: Support recharge for hash map Yafang Shao
2022-06-21 23:28 ` Alexei Starovoitov [this message]
2022-06-22 14:03 ` [RFC PATCH bpf-next 00/10] bpf, mm: Recharge pages when reuse bpf map Yafang Shao
2022-06-23 3:29 ` Roman Gushchin
2022-06-25 3:26 ` Yafang Shao
2022-06-26 3:28 ` Roman Gushchin
2022-06-26 3:32 ` Roman Gushchin
2022-06-26 6:38 ` Yafang Shao
2022-06-26 6:25 ` Yafang Shao
2022-07-02 4:23 ` Roman Gushchin
2022-07-02 15:24 ` Yafang Shao
2022-07-02 15:33 ` Roman Gushchin
2022-06-27 0:40 ` Alexei Starovoitov
2022-06-27 15:02 ` Yafang Shao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220621232831.nkw2e7ezfy55p6hg@macbook-pro-3.dhcp.thefacebook.com \
--to=alexei.starovoitov@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=cl@linux.com \
--cc=daniel@iogearbox.net \
--cc=hannes@cmpxchg.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=laoar.shao@gmail.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penberg@kernel.org \
--cc=quentin@isovalent.com \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=songliubraving@fb.com \
--cc=songmuchun@bytedance.com \
--cc=vbabka@suse.cz \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox