Date: Mon, 18 Jul 2022 14:27:12 +0200
From: Michal Hocko <mhocko@suse.com>
To: Alexei Starovoitov
Cc: Shakeel Butt, Matthew Wilcox, Christoph Hellwig, "David S. Miller",
 Daniel Borkmann, Andrii Nakryiko, Tejun Heo, Martin KaFai Lau, bpf,
 Kernel Team, linux-mm, Christoph Lameter, Pekka Enberg, David Rientjes,
 Joonsoo Kim, Andrew Morton, Vlastimil Babka
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
In-Reply-To: <20220712184018.i3cisffxr7k3aei7@MacBook-Pro-3.local.dhcp.thefacebook.com>
References: <20220706180525.ozkxnbifgd4vzxym@MacBook-Pro-3.local.dhcp.thefacebook.com>
 <20220708174858.6gl2ag3asmoimpoe@macbook-pro-3.dhcp.thefacebook.com>
 <20220708215536.pqclxdqvtrfll2y4@google.com>
 <20220710073213.bkkdweiqrlnr35sv@google.com>
 <20220712043914.pxmbm7vockuvpmmh@macbook-pro-3.dhcp.thefacebook.com>
 <20220712184018.i3cisffxr7k3aei7@MacBook-Pro-3.local.dhcp.thefacebook.com>
On Tue 12-07-22 11:40:18, Alexei Starovoitov wrote:
> On Tue, Jul 12, 2022 at 09:40:13AM +0200, Michal Hocko wrote:
> > On Mon 11-07-22 21:39:14, Alexei Starovoitov wrote:
> > > On Mon, Jul 11, 2022 at 02:15:07PM +0200, Michal Hocko wrote:
> > > > On Sun 10-07-22 07:32:13, Shakeel Butt wrote:
> > > > > On Sat, Jul 09, 2022 at 10:26:23PM -0700, Alexei Starovoitov wrote:
> > > > > > On Fri, Jul 8, 2022 at 2:55 PM Shakeel Butt wrote:
> > > > > [...]
> > > > > > > Most probably Michal's comment was on free objects sitting in the caches
> > > > > > > (also pointed out by Yosry). Should we drain them on memory pressure /
> > > > > > > OOM or should we ignore them as the amount of memory is not significant?
> > > > > >
> > > > > > Are you suggesting to design a shrinker for 0.01% of the memory
> > > > > > consumed by bpf?
> > > > >
> > > > > No, just claim that the memory sitting on such caches is insignificant.
> > > >
> > > > yes, that is not really clear from the patch description. Earlier you
> > > > have said that the memory consumed might go into GBs. If that is a
> > > > memory that is actively used and not really reclaimable then bad luck.
> > > > There are other users like that in the kernel and this is not a new
> > > > problem. I think it would really help to add a counter to describe both
> > > > the overall memory claimed by the bpf allocator and actively used
> > > > portion of it. If you use our standard vmstat infrastructure then we can
> > > > easily show that information in the OOM report.
> > >
> > > OOM report can potentially be extended with info about bpf consumed
> > > memory, but it's not clear whether it will help OOM analysis.
> >
> > If GBs of memory can be sitting there then it is surely an interesting
> > information to have when seeing OOM. One of the big shortcomings of the
> > OOM analysis is unaccounted memory.
> >
> > > bpftool map show
> > > prints all map data already.
> > > Some devs use bpf to inspect bpf maps for finer details in run-time.
> > > drgn scripts pull that data from crash dumps.
> > > There is no need for new counters.
> > > The idea of bpf specific counters/limits was rejected by memcg folks.
> >
> > I would argue that integration into vmstat is useful not only for oom
> > analysis but also for regular health check scripts watching /proc/vmstat
> > content. I do not think most of those generic tools are BPF aware. So
> > unless there is a good reason to not account this memory there then I
> > would vote for adding them. They are cheap and easy to integrate.
>
> We've seen enough performance issues with such counters.

Not sure we are talking about the same thing. These counters are used by
the page allocator as well (e.g. PGALLOC, PGFREE) without a noticeable
overhead.

> So, no, they are not cheap.
> Remember bpf has to be optimized for all cases.
> Some of them process millions of packets per second.
> Others do millions of map update/delete per second which means
> millions of alloc/free.
I thought the whole point is to allocate from a different context than
the one where the memory is used.

In any case, these were my few cents to help with "usual pains when OOM
is hit". I can see you are not much into discussing further details, so
I won't burn much more of our time here. Let me just reiterate that OOM
reports with a large part of the consumption outside of the usual
counters are a PITA and essentially undebuggable without a local
reproducer, which can get pretty tricky with custom eBPF programs
running on an affected system you do not have access to. So I believe
that large in-kernel memory consumers should be accounted somewhere, so
that it is at least clear where one should look. If numbers tell that
the accounting is prohibitively expensive, then it would be great to
have at least an estimate.

Thanks
-- 
Michal Hocko
SUSE Labs