Message-ID: <77d4299e-e1ee-4471-9b53-90957daa984d@redhat.com>
Date: Sun, 23 Jun 2024 16:52:00 -0400
Subject: Re: [PATCH] memcg: Add a new sysctl parameter for automatically setting memory.high
From: Waiman Long
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton, Jonathan Corbet, Shakeel Butt
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Alex Kalenyuk, Peter Hunt, linux-doc@vger.kernel.org
References: <20240623204514.1032662-1-longman@redhat.com>
In-Reply-To: <20240623204514.1032662-1-longman@redhat.com>

Correct some email addresses.

On 6/23/24 16:45, Waiman Long wrote:
> With memory cgroup v1, there is only a single "memory.limit_in_bytes"
> to be set to specify the maximum amount of memory that is allowed to
> be used. So a lot of memory cgroup using tools and applications allow
> users to specify a single memory limit. When they migrate to cgroup
> v2, they use the given memory limit to set memory.max and disregard
> memory.high for the time being.
>
> Without properly setting memory.high, these user space applications
> cannot make use of the memory cgroup v2 ability to further reduce the
> chance of OOM kills by throttling and early memory reclaim.
>
> This patch adds a new sysctl parameter "vm/memory_high_autoset_ratio"
> to enable setting "memory.high" automatically whenever "memory.max" is
> set as long as "memory.high" hasn't been explicitly set before. This
> will allow a system administrator or a middleware layer to greatly
> reduce the chance of memory cgroup OOM kills without worrying about
> how to properly set memory.high.
>
> The new sysctl parameter will allow a range of 0-100. The default value
> of 0 will disable memory.high auto setting. For any non-zero value "n",
> the actual ratio used will be "n/(n+1)". A user cannot set a fraction
> less than 1/2.
>
> Signed-off-by: Waiman Long
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 10 ++++++
>  include/linux/memcontrol.h              |  3 ++
>  mm/memcontrol.c                         | 41 +++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index e86c968a7a0e..250ec39dd5af 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -46,6 +46,7 @@ Currently, these files are in /proc/sys/vm:
>  - mem_profiling         (only if CONFIG_MEM_ALLOC_PROFILING=y)
>  - memory_failure_early_kill
>  - memory_failure_recovery
> +- memory_high_autoset_ratio
>  - min_free_kbytes
>  - min_slab_ratio
>  - min_unmapped_ratio
> @@ -479,6 +480,15 @@ Enable memory failure recovery (when supported by the platform)
>  0: Always panic on a memory failure.
>
>
> +memory_high_autoset_ratio
> +=========================
> +
> +Specify a ratio by which memory.high should be set as a fraction of
> +memory.max if it hasn't been explicitly set before. It allows a range
> +of 0-100. The default value of 0 means auto setting will be disabled.
> +For any non-zero value "n", the actual ratio used will be "n/(n+1)".
> +
> +
>  min_free_kbytes
>  ===============
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..6be161a6b922 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -221,6 +221,9 @@ struct mem_cgroup {
>  	 */
>  	bool oom_group;
>
> +	/* %true if memory.high has been explicitly set */
> +	bool memory_high_set;
> +
>  	/* protected by memcg_oom_lock */
>  	bool oom_lock;
>  	int under_oom;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 71fe2a95b8bd..2cfb000bf543 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -48,6 +48,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -6889,6 +6890,35 @@ static void mem_cgroup_attach(struct cgroup_taskset *tset)
>  }
>  #endif
>
> +/*
> + * The memory.high autoset ratio specifies a ratio by which memory.high
> + * should be set as a fraction of memory.max if it hasn't been explicitly
> + * set before. The default value of 0 means auto setting will be disabled.
> + * For any non-zero value "n", the actual ratio is "n/(n+1)".
> + */
> +static int sysctl_memory_high_autoset_ratio;
> +
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table memcg_table[] = {
> +	{
> +		.procname = "memory_high_autoset_ratio",
> +		.data = &sysctl_memory_high_autoset_ratio,
> +		.maxlen = sizeof(int),
> +		.mode = 0644,
> +		.proc_handler = proc_dointvec_minmax,
> +		.extra1 = SYSCTL_ZERO,
> +		.extra2 = SYSCTL_ONE_HUNDRED,
> +	},
> +};
> +
> +static inline void memcg_sysctl_init(void)
> +{
> +	register_sysctl_init("vm", memcg_table);
> +}
> +#else
> +static void memcg_sysctl_init(void) { }
> +#endif /* CONFIG_SYSCTL */
> +
>  static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
>  {
>  	if (value == PAGE_COUNTER_MAX)
> @@ -6982,6 +7012,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		return err;
>
>  	page_counter_set_high(&memcg->memory, high);
> +	memcg->memory_high_set = true;
>
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
> @@ -7023,6 +7054,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  	unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
>  	bool drained = false;
>  	unsigned long max;
> +	unsigned int high_ratio = sysctl_memory_high_autoset_ratio;
>  	int err;
>
>  	buf = strstrip(buf);
> @@ -7032,6 +7064,13 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>
>  	xchg(&memcg->memory.max, max);
>
> +	if (high_ratio && !memcg->memory_high_set) {
> +		/* Set memory.high as a fraction of memory.max */
> +		unsigned long high = max * high_ratio / (high_ratio + 1);
> +
> +		page_counter_set_high(&memcg->memory, high);
> +	}
> +
>  	for (;;) {
>  		unsigned long nr_pages = page_counter_read(&memcg->memory);
>
> @@ -7977,6 +8016,8 @@ static int __init mem_cgroup_init(void)
>  		soft_limit_tree.rb_tree_per_node[node] = rtpn;
>  	}
>
> +	memcg_sysctl_init();
> +
>  	return 0;
>  }
>  subsys_initcall(mem_cgroup_init);
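
To make the "n/(n+1)" rule in the quoted patch concrete, here is a minimal userspace sketch of the arithmetic used in the memory_max_write() hunk. It is an illustration only, not kernel code: autoset_high() and its cur_high parameter are hypothetical names that do not appear in the patch, values are page counts as in the patch, and the sketch assumes that max * ratio does not overflow unsigned long.

```c
#include <limits.h>

/*
 * Userspace sketch of the memory.high autoset arithmetic.
 *
 * autoset_high() is a hypothetical helper, not part of the patch.
 *   max      - the new memory.max value, in pages
 *   ratio    - the sysctl value "n" (0-100); 0 disables auto setting
 *   cur_high - the current memory.high value, returned unchanged
 *              when auto setting is disabled
 *
 * Assumption: max * ratio fits in unsigned long.
 */
static unsigned long autoset_high(unsigned long max, unsigned int ratio,
				  unsigned long cur_high)
{
	if (!ratio)
		return cur_high;

	/* Same expression as the patch: max * n / (n + 1). */
	return max * ratio / (ratio + 1);
}
```

For a memory.max of 1000 pages, a ratio of 1 yields 500 pages (the 1/2 lower bound mentioned in the commit message), a ratio of 9 yields 900, and the maximum ratio of 100 yields 990, since the integer division truncates 100000/101.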