Date: Mon, 10 Feb 2025 17:52:34 -0500
From: Johannes Weiner
To: Michal Koutný
Cc: Shakeel Butt, "T.J. Mercier", Tejun Heo, Michal Hocko, Roman Gushchin, Muchun Song, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Meta kernel team
Subject: Re: [PATCH] memcg: add hierarchical effective limits for v2
Message-ID: <20250210225234.GB2484@cmpxchg.org>
References: <20250205222029.2979048-1-shakeel.butt@linux.dev> <5jwdklebrnbym6c7ynd5y53t3wq453lg2iup6rj4yux5i72own@ay52cqthg3hy>
In-Reply-To: <5jwdklebrnbym6c7ynd5y53t3wq453lg2iup6rj4yux5i72own@ay52cqthg3hy>

On Mon, Feb 10, 2025 at 05:24:17PM +0100, Michal Koutný wrote:
> Hello.
>
> On Thu, Feb 06, 2025 at 11:09:05AM -0800, Shakeel Butt wrote:
> > Oh I totally forgot about your series. In my use-case, it is not about
> > dynamically knowing how much they can expand and adjust themselves but
> > rather knowing statically upfront what resources they have been given.
>
> From the memcg PoV, the effective value doesn't tell how much they were
> given (because of sharing).

It's definitely true that if you have an ancestral limit for several
otherwise unlimited siblings, then interpreting this number as "this
is how much memory I have available" will be completely misleading.

I would also say that sharing a limit with several siblings requires a
certain degree of awareness and cooperation between them. From that
POV, IMO it would be fine to provide a metric with contextual caveats.

The problem is, what do we do with canned, unaware, maybe untrusted
applications? And they don't necessarily know which they are.

It depends heavily on the judgement of the administrator of any given
deployment. Some workloads might be completely untrusted and hard
limited. Another deployment might consider the same workload
predictable enough that it's configured only with a failsafe max limit
that is much higher than where the workload is *expected* to operate.
The allotment might happen altogether with min/low protections and no
max limit. Or there could be a combination of protection slightly
below and a limit slightly above the expected workload size.

It seems basically impossible to write portable code against this
without knowing the intent of the person setting it up.

But how do we communicate intent down to the container? The two broad
options are to do it implicitly or explicitly:

a) Provide a cgroup file that automatically derives the intended
   target size from how min/low/high/max are set up.

   Right now those can be set up super loosely depending on what the
   administrator thinks about the application. In order for this to
   work, we'd likely have to define an idiomatic way of configuring
   the controller. E.g. if you set max by itself, we assume this is
   the target size. If you set low, with or without max, then low is
   the target size. Or if you set both, target is in between.

   I'm not completely convinced this is workable. It might require
   settings beyond what's actually needed for the safe containment of
   the workload, which carries the risk of excluding something useful.
   I don't mean enforced configuration rules, but rather the case
   where a configuration is reasonable and effective given the
   workload and environment, but now the target file shows nonsense.

b) Provide a cgroup file that is freely configurable by the
   administrator with the target size of the container.

   This has obvious drawbacks as well. What's the default value? Also,
   a lot of setups are dead simple: set a hard limit and expect the
   workload to adhere to that, period. Nobody is going to reliably set
   another cgroup file that a workload may or may not consume.

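To make the idiom floated under a) concrete, here is a minimal sketch
in Python. It is purely illustrative and not part of any patch: the
function name derive_target, the use of None for a knob left at its
default, and the split_when_both flag are all hypothetical. Taking the
midpoint for "in between" is likewise just one possible reading.

```python
from typing import Optional


def derive_target(low: Optional[int],
                  max_: Optional[int],
                  split_when_both: bool = False) -> Optional[int]:
    """Hypothetical target-size heuristic for option a).

    low / max_ are byte values; None means the knob was left at its
    default (memory.low = 0, memory.max = "max").
    """
    if low is None and max_ is None:
        # Nothing configured: there is no intent to derive.
        return None
    if low is None:
        # max set by itself: assume it is the target size.
        return max_
    if max_ is None or not split_when_both:
        # low set, with or without max: low is the target size.
        return low
    # Alternative idiom: both set, target is "in between".
    # The midpoint is an assumption made for this sketch.
    return (low + max_) // 2
```

For example, derive_target(4 << 30, 8 << 30) gives the low value,
while passing split_when_both=True gives the midpoint. A failsafe max
set far above the expected workload size would skew either answer,
which is the "target file shows nonsense" case.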
The third option is to wash our hands of all of this, provide the
static hierarchy settings to the leaves (like this patch, plus do it
for the other knobs as well) and let userspace figure it out.

Thoughts?
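
For reference, a rough userspace-side sketch of what a hierarchical
effective max amounts to: the smallest memory.max on the path from a
cgroup up to the root. This is an illustration in Python, not the
kernel implementation from the patch. The helper names and the idea of
passing the raw memory.max contents in as a dict are assumptions made
to keep the example self-contained.

```python
import math
from pathlib import PurePosixPath
from typing import Mapping


def parse_limit(raw: str) -> float:
    """memory.max holds either a byte count or the literal 'max'."""
    raw = raw.strip()
    return math.inf if raw == "max" else int(raw)


def effective_max(cgroup: str, memory_max: Mapping[str, str]) -> float:
    """Smallest memory.max between `cgroup` and the root.

    `memory_max` maps cgroup paths to raw memory.max contents.
    Cgroups missing from the mapping (such as the root, which has no
    memory.max file) are treated as unlimited.
    """
    path = PurePosixPath(cgroup)
    limit = math.inf
    # Walk the cgroup itself, then every ancestor up to "/".
    for node in (path, *path.parents):
        limit = min(limit, parse_limit(memory_max.get(str(node), "max")))
    return limit


# Two unlimited siblings under one 8G parent (hypothetical layout).
tree = {"/pod": "8589934592", "/pod/a": "max", "/pod/b": "max"}
print(effective_max("/pod/a", tree), effective_max("/pod/b", tree))
```

Both siblings report the same 8G here, which is exactly the sharing
caveat from the top of this mail: the number is a ceiling, not an
allotment.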