From: Michal Koutný
To: Shakeel Butt, "T.J. Mercier"
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Date: Mon, 10 Feb 2025 17:24:17 +0100
Subject: Re: [PATCH] memcg: add hierarchical effective limits for v2
Message-ID: <5jwdklebrnbym6c7ynd5y53t3wq453lg2iup6rj4yux5i72own@ay52cqthg3hy>
References: <20250205222029.2979048-1-shakeel.butt@linux.dev>

Hello.

On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt wrote:
> Oh I totally forgot about your series. In my use-case, it is not about
> dynamically knowing how much they can expand and adjust themselves but
> rather knowing statically upfront what resources they have been given.

From the memcg PoV, the effective value doesn't tell how much they were
given (because of sharing).

> More concretely, these are workloads which used to completely occupy a
> single machine, though within containers but without limits. These
> workloads used to look at machine level metrics at startup on how much
> resources are available.

I've been there but haven't found a convincing mapping of global limits to
memcg limits. The issue is that staying below such a value doesn't
guarantee there will be no OOM, because the value can (generally) be
effectively shared with other groups under the same ancestor.
(Alas, apps typically don't express their memory needs in units of PSI, so
it boils down to a system-wide monitor like systemd-oomd and cooperation
with it.)

> Now these workloads are being moved to multi-tenant environment but
> still the machine is partitioned statically between the workloads. So,
> these workloads need to know upfront how much resources are allocated to
> them upfront and the way the cgroup hierarchy is setup, that information
> is a bit above the tree.

FTR, in systemd setups, for example, this can be partially overcome by the
exposed EffectiveMemoryMax= (the service manager that configures the
resources can also do the ancestry traversal).
Kubernetes has the downward API, through which generic resource info is
shared into containers, and I recall that lxcfs can mangle procfs memory
info with respect to memory limits for legacy apps.
As I think about it, the cgroupns (in)visibility should be resolved by
assigning the proper limit to the namespace root group's memory.max
(read-only for the contained user) and the traversal...

On Thu, Feb 06, 2025 at 11:37:31AM -0800, "T.J. Mercier" wrote:
> but having a single file to read instead of walking up the
> tree with multiple reads to calculate an effective limit would be
> nice.

...in the kernel is nice, but the possible performance gain isn't worth
hiding the shareability of the effective limit.

So I wonder what the current PoV of other MM people is...

Michal