Date: Wed, 22 Jun 2022 20:29:43 -0700
From: Roman Gushchin
To: Yafang Shao
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, quentin@isovalent.com, hannes@cmpxchg.org, mhocko@kernel.org, shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org, cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, vbabka@suse.cz, linux-mm@kvack.org, bpf@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 00/10] bpf, mm: Recharge pages when reuse bpf map
In-Reply-To: <20220619155032.32515-1-laoar.shao@gmail.com>
On Sun, Jun 19, 2022 at 03:50:22PM +0000, Yafang Shao wrote:
> After switching to memcg-based bpf memory accounting, the bpf memory is
> charged to the loader's memcg by default, which causes unexpected issues for
> us. For instance, the container of the loader may be restarted after
> pinning progs and maps, but the bpf memcg will be left behind and pinned on the
> system. Once the loader's new generation container is started, the leftover
> pages won't be charged to it. That inconsistent behavior causes trouble
> for the memory resource management of this container.
>
> In the past few days, I have proposed two patchsets[1][2] to try to resolve
> this issue, but in both proposals the user code has to be
> changed to adapt to it, which is a pain for us. This patchset relieves the
> pain by triggering the recharge in libbpf. It also addresses Roman's
> critical comments.
>
> The key point that lets us avoid changing the user code is that there's a reuse
> path in libbpf. Once the bpf container is restarted, it will try
> to re-run the required bpf programs; if the bpf programs are the same as
> the already pinned ones, it will reuse them.
>
> We want to either recharge all of them successfully or recharge none
> of them.
> The recharge process is divided into three steps:
> - Pre-charge to the new generation
>   This ensures that once we uncharge from the old generation, we can always
>   charge to the new generation successfully. If we can't pre-charge to
>   the new generation, we won't allow the pages to be uncharged from the old
>   generation.
> - Uncharge from the old generation
>   After pre-charging to the new generation, we can uncharge from the old
>   generation.
> - Post-charge to the new generation
>   Finally, we can set the pages' memcg_data to the new generation.
> In the pre-charge step, we may succeed in charging some addresses but fail
> to charge a new address; in that case we should uncharge the already charged
> addresses, so another recharge-err step is introduced.
>
> This patchset has finished recharging the bpf hash map, which is mostly used
> by our bpf services. The other maps haven't been implemented yet, and the bpf
> progs haven't been implemented either.

Without going into the implementation details, the overall approach looks ok to me.
But it adds complexity and code into several different subsystems, and I'm 100% sure
it's not worth it if we're talking about partial support of a single map type.
Are you committed to implementing the recharging for all/most map types and progs,
and to supporting this code in the future?

I still feel you're trying to solve a userspace problem in the kernel. Not saying
it can't be solved this way, but it seems like there are easier options.

Thanks!
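The all-or-nothing recharge described in the quoted cover letter (pre-charge, uncharge, post-charge, plus a recharge-err rollback) can be modeled in a few lines of user-space C. This is a toy sketch under stated assumptions, not the patchset's code: `struct toy_memcg`, `struct toy_page`, `toy_try_charge()`, `toy_uncharge()` and `toy_recharge()` are hypothetical names, a memcg is reduced to a usage/limit pair, and each page's owner pointer stands in for `memcg_data`. It shows why the pre-charge has to come first: if the new generation cannot absorb every page, the pages already pre-charged are rolled back and the old generation is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy memcg: only tracks page usage against a limit. */
struct toy_memcg {
    const char *name;
    size_t usage;   /* pages currently charged */
    size_t limit;   /* maximum pages allowed */
};

/* Hypothetical toy page: the owner pointer stands in for memcg_data. */
struct toy_page {
    struct toy_memcg *memcg;
};

/* Charge @nr pages to @cg unless that would exceed its limit. */
static bool toy_try_charge(struct toy_memcg *cg, size_t nr)
{
    if (cg->usage + nr > cg->limit)
        return false;
    cg->usage += nr;
    return true;
}

static void toy_uncharge(struct toy_memcg *cg, size_t nr)
{
    assert(cg->usage >= nr);   /* never uncharge more than was charged */
    cg->usage -= nr;
}

/*
 * Move every page from its current memcg to @to, or move none of them.
 * Returns 0 on success, -1 if the pre-charge step failed.
 */
static int toy_recharge(struct toy_page *pages, size_t n, struct toy_memcg *to)
{
    size_t i;

    /* Step 1: pre-charge the new generation, one page at a time. */
    for (i = 0; i < n; i++) {
        if (!toy_try_charge(to, 1))
            goto recharge_err;
    }

    /* Step 2: the new generation holds the charge, so uncharge the old one. */
    for (i = 0; i < n; i++)
        toy_uncharge(pages[i].memcg, 1);

    /* Step 3: post-charge, i.e. point each page at the new generation. */
    for (i = 0; i < n; i++)
        pages[i].memcg = to;

    return 0;

recharge_err:
    /* Roll back the i pages already pre-charged; the old memcg is untouched. */
    toy_uncharge(to, i);
    return -1;
}
```

For example, with three pages owned by a memcg and a destination whose limit is two pages, `toy_recharge()` pre-charges two pages, fails on the third, uncharges the two, and returns -1 with both counters unchanged; with a destination that has enough headroom, all three pages move and the old memcg's usage drops to zero.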