From: Yafang Shao <laoar.shao@gmail.com>
Date: Tue, 12 Jul 2022 16:39:48 +0800
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
To: Michal Hocko
Cc: Alexei Starovoitov, Shakeel Butt, Matthew Wilcox, Christoph Hellwig,
    "David S. Miller", Daniel Borkmann, Andrii Nakryiko, Tejun Heo,
    Martin KaFai Lau, bpf, Kernel Team, linux-mm, Christoph Lameter,
    Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
    Vlastimil Babka

On Tue, Jul 12, 2022 at 3:40 PM Michal Hocko wrote:
>
> On Mon 11-07-22 21:39:14, Alexei Starovoitov wrote:
> > On Mon, Jul 11, 2022 at 02:15:07PM +0200, Michal Hocko wrote:
> > > On Sun 10-07-22 07:32:13, Shakeel Butt wrote:
> > > > On Sat, Jul 09, 2022 at 10:26:23PM -0700, Alexei Starovoitov wrote:
> > > > > On Fri, Jul 8, 2022 at 2:55 PM Shakeel Butt wrote:
> > > > [...]
> > > > > >
> > > > > > Most probably Michal's comment was on free objects sitting in
> > > > > > the caches (also pointed out by Yosry). Should we drain them
> > > > > > on memory pressure / OOM, or should we ignore them as the
> > > > > > amount of memory is not significant?
> > > > >
> > > > > Are you suggesting to design a shrinker for 0.01% of the memory
> > > > > consumed by bpf?
> > > >
> > > > No, just claim that the memory sitting on such caches is
> > > > insignificant.
> > >
> > > Yes, but that is not really clear from the patch description.
> > > Earlier you said that the memory consumed might go into GBs. If
> > > that is memory that is actively used and not really reclaimable,
> > > then bad luck. There are other users like that in the kernel and
> > > this is not a new problem. I think it would really help to add a
> > > counter describing both the overall memory claimed by the bpf
> > > allocator and the actively used portion of it. If you use our
> > > standard vmstat infrastructure, then we can easily show that
> > > information in the OOM report.
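
To make the vmstat suggestion concrete, here is a minimal sketch of the
kind of integration Michal describes above. NR_BPF_PAGES and
bpf_account_pages() are made-up names used purely for illustration;
neither exists in the kernel today:

/* include/linux/mmzone.h: hypothetical new item in enum node_stat_item */
enum node_stat_item {
	...
	NR_BPF_PAGES,		/* pages claimed by the bpf allocator */
	NR_VM_NODE_STAT_ITEMS
};

/* mm/vmstat.c: matching entry in vmstat_text[], in the same order */
	"nr_bpf_pages",

/* called by the bpf allocator wherever it gets or puts pages:
 * positive nr when allocating, negative nr when freeing */
static void bpf_account_pages(struct page *page, long nr)
{
	mod_node_page_state(page_pgdat(page), NR_BPF_PAGES, nr);
}

With that in place the counter shows up in /proc/vmstat automatically;
printing it in the OOM report would still need one explicit line in the
report code.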
> >
> > OOM report can potentially be extended with info about bpf consumed
> > memory, but it's not clear whether it will help OOM analysis.
>
> If GBs of memory can be sitting there then it is surely interesting
> information to have when seeing an OOM. One of the big shortcomings of
> OOM analysis is unaccounted memory.
>
> > bpftool map show
> > prints all map data already.
> > Some devs use bpf to inspect bpf maps for finer details at run-time.
> > drgn scripts pull that data from crash dumps.
> > There is no need for new counters.
> > The idea of bpf-specific counters/limits was rejected by memcg folks.
>
> I would argue that integration into vmstat is useful not only for OOM
> analysis but also for regular health-check scripts watching
> /proc/vmstat content. I do not think most of those generic tools are
> BPF aware. So unless there is a good reason not to account this memory
> there, I would vote for adding the counters. They are cheap and easy
> to integrate.
>
> > > OK, thanks for the clarification. There is still one thing that is
> > > not really clear to me. Without proper ownership bound to any
> > > process, why is it desired/helpful to account the memory to a
> > > memcg?
> >
> > The first step is to have a limit. memcg provides it.
>
> I am sorry but this doesn't really explain it. Could you elaborate,
> please? Is the limit supposed to protect against adversaries? Or is it
> just to prevent accidental runaways? Is it purely for accounting
> purposes?
>
> > > We have discussed something similar in a different email thread
> > > and I still didn't manage to find time to put all the parts
> > > together. But if the initiator (or however you call the process
> > > which loads the program) exits, it might be the last process in
> > > the specific cgroup, so the cgroup can be offlined and become
> > > mostly invisible to an admin.
> >
> > Roman already sent a reparenting fix:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20220711162827.184743-1-roman.gushchin@linux.dev/
>
> Reparenting is nice but not a silver bullet. Consider a shallow
> hierarchy where the charging happens in the first level under the root
> memcg. Reparenting to the root just pushes everything under the
> "system resources" category.
>

Agreed. That's why I don't like reparenting. Reparenting moves the
already-charged pages to the parent and redirects new charges there,
but it can't carry over the limit of the original memcg. So there is a
risk if memory is still being charged against the original memcg after
its removal; we would have to forbid destroying the original memcg in
the first place. (A toy model of this limitation is sketched in the
P.S. below.)

> > > As you have explained, there is nothing really actionable on this
> > > memory by the OOM killer either. So does it actually buy us much
> > > to account it?
> >
> > It will be actionable. One step at a time.
> > In the other thread we've discussed an idea to make the memcg
> > selectable when bpf objects are created. The user might create a
> > special memcg and use it for all things bpf. This might be the way
> > to provide bpf-specific accounting and limits.
>
> Do you have a reference for those discussions?
>

I think it is
https://lore.kernel.org/bpf/CALOAHbCM=ZxwutQOPmJx2LKY3Pd_hs+8v8r4-ybwPbBNBuNjXA@mail.gmail.com/ .
Introducing an independent memcg to manage pinned bpf programs and
maps, and forbidding the user from destroying it while the bpf programs
are still pinned, is the best workaround so far, per my analysis. (See
the P.P.S. for a rough sketch of what a selectable-memcg interface
could look like.)

--
Regards
Yafang
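
P.S. To spell out why reparenting cannot preserve the limit: below is a
toy model, not the kernel's actual memcontrol code, of what reparenting
moves and what it drops.

/* Toy model of memcg reparenting, for illustration only. */
struct memcg {
	long charged;		/* pages charged to this group */
	long limit;		/* the group's memory.max */
	struct memcg *parent;
};

static void reparent(struct memcg *dying)
{
	struct memcg *parent = dying->parent;

	/* Past charges move to the parent, and future charges will be
	 * redirected there as well... */
	parent->charged += dying->charged;
	dying->charged = 0;

	/* ...but dying->limit is simply dropped with the group. Only the
	 * parent's own limit still constrains the bpf memory, so in a
	 * shallow hierarchy the memory ends up effectively unlimited. */
}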
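
P.P.S. For the record, a rough sketch of what the "selectable memcg"
interface discussed above could look like. The memcg_fd attribute and
the mem_cgroup_from_cgroup_fd() helper are hypothetical;
get_mem_cgroup_from_mm() is the only existing function used here.

/* include/uapi/linux/bpf.h: hypothetical extension of BPF_MAP_CREATE */
union bpf_attr {
	struct {			/* used by BPF_MAP_CREATE */
		__u32	map_type;
		/* ... existing fields ... */
		__u32	memcg_fd;	/* hypothetical: fd of a cgroup
					 * directory whose memcg is charged
					 * for this map; 0 keeps today's
					 * behavior of charging the
					 * creating task */
	};
	/* ... */
};

/* kernel/bpf/syscall.c: sketch of how map creation could resolve it */
static struct mem_cgroup *bpf_map_get_memcg(const union bpf_attr *attr)
{
	if (attr->memcg_fd)
		/* hypothetical helper: cgroup fd -> its mem_cgroup */
		return mem_cgroup_from_cgroup_fd(attr->memcg_fd);

	/* today's behavior: charge whoever creates the map */
	return get_mem_cgroup_from_mm(current->mm);
}

An admin could then create a dedicated memcg, set its memory.max, and
pass its fd at map creation time. That gives the "special memcg for all
things bpf" described above, with a limit that survives the creating
process.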