From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 19 Aug 2022 07:06:51 -1000
From: Tejun Heo
To: Yafang Shao
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, jolsa@kernel.org, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	lizefan.x@bytedance.com, Cgroups, netdev, bpf, Linux MM
Subject: Re: [PATCH bpf-next v2 00/12] bpf: Introduce selectable memcg for bpf map
References: <20220818143118.17733-1-laoar.shao@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Hello,

On Fri, Aug 19, 2022 at 09:09:25AM +0800, Yafang Shao wrote:
> On Fri, Aug 19, 2022 at 6:33 AM Tejun Heo wrote:
> >
> > On Thu, Aug 18, 2022 at 12:20:33PM -1000, Tejun Heo wrote:
> > > We have the exact same problem for any resources which span multiple
> > > instances of a service including page cache, tmpfs instances and any other
> > > thing which can persist longer than process life time. My current opinion is
> >
> > To expand a bit more on this point, once we start including page cache and
> > tmpfs, we now get entangled with memory reclaim which then brings in IO and
> > not-yet-but-eventually CPU usage.
>
> Introduce-a-new-layer vs introduce-a-new-cgroup, which one is more overhead?

Introducing a new layer in cgroup2 doesn't mean that any specific resource
controller is enabled, so there is no runtime overhead difference. In terms
of logical complexity, introducing a localized layer seems a lot more
straightforward than building a whole separate tree.

Note that the same applies to cgroup1 where a collapsed controller tree is
represented by simply not creating those layers in that particular
controller tree.

No matter how we cut the problem here, if we want to track these persistent
resources, we have to create a cgroup to host them somewhere. The discussion
we're having is mostly around where to put them. With your proposal, it can
be anywhere and you draw out an example where the persistent cgroups form
their own separate tree. What I'm saying is that the logical place to put it
is where the current resource consumption is and we just need to put the
persistent entity as the parent of the instances.

Flexibility, just like anything else, isn't free. Here, if we extrapolate
this approach, the cost is evidently hefty in that it doesn't generically
work with the basic resource control structure.

> > Once you start splitting the tree like
> > you're suggesting here, all those will break down and now we have to worry
> > about how to split resource accounting and control for the same entities
> > across two split branches of the tree, which doesn't really make any sense.
>
> The k8s has already been broken thanks to the memcg accounting on bpf memory.
> If you ignored it, I paste it below.
> [0]"1. The memory usage is not consistent between the first generation and
> new generations."
>
> This issue will persist even if you introduce a new layer.

Please watch your tone.

Again, this isn't a problem specific to k8s. We have the same problem with
e.g. persistent tmpfs. One idea which I'm not against is allowing specific
resources to be charged to an ancestor. We gotta think carefully about how
such charges should be granted / denied but an approach like that jibes well
with the existing hierarchical control structure and because introducing a
persistent layer does too, the combination of the two works well.

> > So, we *really* don't wanna paint ourselves into that kind of a corner. This
> > is a dead-end. Please ditch it.
>
> It makes no sense to ditch it.
> Because, the hierarchy I described in the commit log is *one* use case
> of the selectable memcg, but not *the only one* use case of it. If you
> dislike that hierarchy, I will remove it to avoid misleading you.

But if you drop that, what'd be the rationale for adding what you're
proposing? Why would we want bpf memory charges to be attached to any part
of the hierarchy?

> Even if you introduce a new layer, you still need the selectable memcg.
> For example, to avoid the issue I described in [0], you still need to
> charge to the parent cgroup instead of the current cgroup.

As I wrote above, that's what we've been discussing. Again, I'd be a lot
more amenable to such an approach because it fits with how everything is
structured.

> That's why I described in the commit log that the selectable memcg is flexible.

Hopefully, my point on this is clear by now.

Thanks.

-- 
tejun
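
To make the two ideas in this message a bit more concrete (a persistent
layer as the parent of the instances, plus charging specific resources to
an ancestor), here is a toy Python model. It is only a sketch under stated
assumptions: Cgroup, try_charge, uncharge and charge_to_ancestor are
hypothetical names for illustration, not kernel interfaces, and the grant
rule used here (the target must be the charging cgroup itself or one of its
ancestors) is an assumption, since how such charges should be granted or
denied is still an open question.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class Cgroup:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel structure)."""

    name: str
    parent: Optional[Cgroup] = None
    limit: Optional[int] = None  # None: no limit configured at this level
    usage: int = 0               # hierarchical: own charges plus descendants'

    def path_to_root(self) -> Iterator[Cgroup]:
        """Yield this cgroup, then each ancestor up to the root."""
        node: Optional[Cgroup] = self
        while node is not None:
            yield node
            node = node.parent

    def try_charge(self, nbytes: int) -> bool:
        """Charge here and at every ancestor, or nowhere if a limit is hit."""
        path = list(self.path_to_root())
        if any(n.limit is not None and n.usage + nbytes > n.limit for n in path):
            return False
        for n in path:
            n.usage += nbytes
        return True

    def uncharge(self, nbytes: int) -> None:
        """Undo an earlier successful charge made at this cgroup."""
        for n in self.path_to_root():
            n.usage -= nbytes


def charge_to_ancestor(current: Cgroup, target: Cgroup, nbytes: int) -> bool:
    """Assumed grant rule: target must be current or one of its ancestors."""
    if not any(n is target for n in current.path_to_root()):
        return False
    return target.try_charge(nbytes)


# "service" is the persistent layer; each generation is a child instance.
root = Cgroup("root")
service = Cgroup("service", parent=root, limit=100)
gen1 = Cgroup("service/gen1", parent=service)

assert charge_to_ancestor(gen1, service, 30)  # long-lived resource, e.g. a map
assert gen1.try_charge(20)                    # memory owned by the instance
first_generation_total = service.usage

gen1.uncharge(20)                             # first generation exits
gen2 = Cgroup("service/gen2", parent=service)
assert gen2.try_charge(20)
second_generation_total = service.usage

# Usage seen at the persistent layer is the same across generations.
assert first_generation_total == second_generation_total

# A cgroup in an unrelated branch is not an acceptable charge target here.
other = Cgroup("other", parent=root)
assert not charge_to_ancestor(gen1, other, 10)
```

In this model the persistent charge stays on the layer when an instance
goes away, and the layer's limit still covers both the persistent resource
and whatever the current instance uses, because every charge walks the same
path to the root.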