Date: Mon, 10 Feb 2025 10:34:54 -0800
From: Shakeel Butt
To: Michal Koutný
Cc: "T.J. Mercier", Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: Re: [PATCH] memcg: add hierarchical effective limits for v2

On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt wrote:
> > Oh I totally forgot about your series. In my use-case, it is not about
> > dynamically knowing how much they can expand and adjust themselves but
> > rather knowing statically upfront what resources they have been given.
>
> From the memcg PoV, the effective value doesn't tell how much they were
> given (because of sharing).
>
> > More concretely, these are workloads which used to completely occupy a
> > single machine, though within containers but without limits. These
> > workloads used to look at machine level metrics at startup on how much
> > resources are available.
>
> I've been there but haven't found convincing mapping of global to memcg
> limits.
>
> The issue is that such a value won't guarantee no OOM when below because
> it can be (generally) effectively shared.
>
> (Alas, apps typically don't express their memory needs in units of
> PSI. So it boils down to a system wide monitor like systemd-oomd and
> cooperation with it.)
>

I think you missed the static partitioning of resources use-case I
mentioned. The issue you are pointing out exists for system-level
metrics as well, i.e. a workload looking at system metrics can't tell
how much it has been given. In my specific case, though, the workloads
know they occupy the full machine. Now we want to move such workloads to
a multi-tenant environment where the resources are still statically
partitioned and not overcommitted, so the effective limit will tell them
how much they have been given.

> > Now these workloads are being moved to multi-tenant environment but
> > still the machine is partitioned statically between the workloads. So,
> > these workloads need to know upfront how much resources are allocated to
> > them upfront and the way the cgroup hierarchy is setup, that information
> > is a bit above the tree.
>
> FTR, e.g. in systemd setups, this can be partially overcome by exposed
> EffectiveMemoryMax= (the service manager who configures the resources
> also can do the ancestry traversal).
> kubernetes has downward API where generic resource info is shared into
> containers and I recall that lxcfs could mangle procfs
> memory info wrt memory limits for legacy apps.
>
>
> As I think about it, the cgroupns (in)visibility should be resolved by
> assigning the proper limit to namespace's root group memory.max (read
> only for contained user) and the traversal...
>

I think your point here is: why not have a userspace-based solution? I
think that is possible, but it is not convenient and it adds an external
dependency to the workload.

>
> On Thu, Feb 06, 2025 at 11:37:31AM -0800, "T.J. Mercier" wrote:
> > but having a single file to read instead of walking up the
> > tree with multiple reads to calculate an effective limit would be
> > nice.
>
> ...in kernel is nice but possible performance gain isn't worth hiding
> the shareability of the effective limit.
>
>
> So I wonder what is the current PoV of more MM people...

Yup, let's see more opinions on this. Thanks, Michal, for your feedback.
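
For reference, the userspace traversal discussed above ("walking up the
tree with multiple reads") can be sketched roughly as below. This is only
an illustration and not part of the patch: the function names and the
/sys/fs/cgroup mount point are assumptions. It takes the minimum
memory.max over a cgroup and its visible ancestors. Two caveats from the
thread still apply: inside a cgroup namespace the ancestors above the
namespace root are not visible, and the result is only an upper bound,
because an ancestor's limit may be shared with sibling cgroups.

```python
import math
from pathlib import Path

# Assumed cgroup v2 mount point; adjust if the hierarchy is mounted elsewhere.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def read_memory_max(cgroup_dir: Path) -> float:
    """Return memory.max in bytes for one cgroup, or math.inf if unlimited or absent."""
    try:
        raw = (cgroup_dir / "memory.max").read_text().strip()
    except FileNotFoundError:
        # The root cgroup does not expose memory.max.
        return math.inf
    return math.inf if raw == "max" else int(raw)


def effective_memory_max(cgroup_dir: Path, root: Path = CGROUP_ROOT) -> float:
    """Minimum memory.max over cgroup_dir and every ancestor up to and including root."""
    limit = math.inf
    current = cgroup_dir
    while True:
        limit = min(limit, read_memory_max(current))
        # Stop at the hierarchy root, or at the filesystem root as a safety net.
        if current == root or current == current.parent:
            break
        current = current.parent
    return limit
```

A workload would typically pass its own cgroup directory, which on
cgroup v2 can be derived from the "0::" line in /proc/self/cgroup. The
result is math.inf when no visible ancestor sets a limit.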