Date: Mon, 24 Jun 2024 10:37:52 +0200
From: Michal Hocko
To: Waiman Long
Cc: Johannes Weiner, Roman Gushchin, Muchun Song, Andrew Morton,
    Jonathan Corbet, Shakeel Butt, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, linux-mm@kvack.org, Alex Kalenyuk,
    Peter Hunt, linux-doc@vger.kernel.org
Subject: Re: [PATCH] memcg: Add a new sysctl parameter for automatically setting memory.high
References: <20240623204514.1032662-1-longman@redhat.com> <77d4299e-e1ee-4471-9b53-90957daa984d@redhat.com>
In-Reply-To: <77d4299e-e1ee-4471-9b53-90957daa984d@redhat.com>

On Sun 23-06-24 16:52:00, Waiman Long wrote:
> Correct some email addresses.
>
> On 6/23/24 16:45, Waiman Long wrote:
> > With memory cgroup v1, there is only a single "memory.limit_in_bytes"
> > to be set to specify the maximum amount of memory that is allowed to
> > be used. So a lot of memory cgroup using tools and applications allow
> > users to specify a single memory limit. When they migrate to cgroup
> > v2, they use the given memory limit to set memory.max and disregard
> > memory.high for the time being.
> >
> > Without properly setting memory.high, these user space applications
> > cannot make use of the memory cgroup v2 ability to further reduce the
> > chance of OOM kills by throttling and early memory reclaim.
> >
> > This patch adds a new sysctl parameter "vm/memory_high_autoset_ratio"
> > to enable setting "memory.high" automatically whenever "memory.max" is
> > set as long as "memory.high" hasn't been explicitly set before. This
> > will allow a system administrator or a middleware layer to greatly
> > reduce the chance of memory cgroup OOM kills without worrying about
> > how to properly set memory.high.
> >
> > The new sysctl parameter will allow a range of 0-100. The default value
> > of 0 will disable memory.high auto setting. For any non-zero value "n",
> > the actual ratio used will be "n/(n+1)". A user cannot set a fraction
> > less than 1/2.

I am sorry but this is a bad idea. It is also completely unnecessary. If
somebody goes all the way to set the hard limit there is no reason to
not set the high limit along the way. I see zero reason to make a
global hard coded policy for something like that. Not to mention that
%age is a really bad interface as it gets hugely impractical with large
limits.
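
To make the quoted "n/(n+1)" arithmetic concrete, here is a minimal
user-space sketch. It is not part of the patch. The helper names
autoset_high() and autoset_gap() are hypothetical, uint64_t stands in
for the kernel's unsigned long, and returning max for a ratio of 0 is
an assumption made only so the helper is defined for every input.

```c
#include <stdint.h>

/*
 * Mirrors the expression from the quoted memory_max_write() hunk:
 *     high = max * high_ratio / (high_ratio + 1)
 * All values are in pages. A ratio of 0 means auto setting is
 * disabled; this sketch then simply returns max (an assumption of
 * the sketch, not something the patch does).
 */
static uint64_t autoset_high(uint64_t max_pages, unsigned int ratio)
{
	if (ratio == 0)
		return max_pages;
	return max_pages * ratio / (ratio + 1);
}

/* Headroom left between memory.max and the derived memory.high, in pages. */
static uint64_t autoset_gap(uint64_t max_pages, unsigned int ratio)
{
	return max_pages - autoset_high(max_pages, ratio);
}
```

With the largest allowed ratio, n = 100, the gap is about 1/101 of
memory.max. For a 1 TiB limit with 4 KiB pages (268435456 pages) that
is 2657777 pages, roughly 10 GiB, and no sysctl value can make it
smaller. That is one possible reading of why a ratio scales badly with
large limits.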
> >
> > Signed-off-by: Waiman Long

Nacked-by: Michal Hocko

> > ---
> >  Documentation/admin-guide/sysctl/vm.rst | 10 ++++++
> >  include/linux/memcontrol.h              |  3 ++
> >  mm/memcontrol.c                         | 41 +++++++++++++++++++++++++
> >  3 files changed, 54 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index e86c968a7a0e..250ec39dd5af 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -46,6 +46,7 @@ Currently, these files are in /proc/sys/vm:
> >  - mem_profiling         (only if CONFIG_MEM_ALLOC_PROFILING=y)
> >  - memory_failure_early_kill
> >  - memory_failure_recovery
> > +- memory_high_autoset_ratio
> >  - min_free_kbytes
> >  - min_slab_ratio
> >  - min_unmapped_ratio
> > @@ -479,6 +480,15 @@ Enable memory failure recovery (when supported by the platform)
> >  0: Always panic on a memory failure.
> > +memory_high_autoset_ratio
> > +=========================
> > +
> > +Specify a ratio by which memory.high should be set as a fraction of
> > +memory.max if it hasn't been explicitly set before. It allows a range
> > +of 0-100. The default value of 0 means auto setting will be disabled.
> > +For any non-zero value "n", the actual ratio used will be "n/(n+1)".
> > +
> > +
> >  min_free_kbytes
> >  ===============
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 030d34e9d117..6be161a6b922 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -221,6 +221,9 @@ struct mem_cgroup {
> >  	 */
> >  	bool oom_group;
> > +	/* %true if memory.high has been explicitly set */
> > +	bool memory_high_set;
> > +
> >  	/* protected by memcg_oom_lock */
> >  	bool oom_lock;
> >  	int under_oom;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 71fe2a95b8bd..2cfb000bf543 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -48,6 +48,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -6889,6 +6890,35 @@ static void mem_cgroup_attach(struct cgroup_taskset *tset)
> >  }
> >  #endif
> > +/*
> > + * The memory.high autoset ratio specifies a ratio by which memory.high
> > + * should be set as a fraction of memory.max if it hasn't been explicitly
> > + * set before. The default value of 0 means auto setting will be disabled.
> > + * For any non-zero value "n", the actual ratio is "n/(n+1)".
> > + */
> > +static int sysctl_memory_high_autoset_ratio;
> > +
> > +#ifdef CONFIG_SYSCTL
> > +static struct ctl_table memcg_table[] = {
> > +	{
> > +		.procname	= "memory_high_autoset_ratio",
> > +		.data		= &sysctl_memory_high_autoset_ratio,
> > +		.maxlen		= sizeof(int),
> > +		.mode		= 0644,
> > +		.proc_handler	= proc_dointvec_minmax,
> > +		.extra1		= SYSCTL_ZERO,
> > +		.extra2		= SYSCTL_ONE_HUNDRED,
> > +	},
> > +};
> > +
> > +static inline void memcg_sysctl_init(void)
> > +{
> > +	register_sysctl_init("vm", memcg_table);
> > +}
> > +#else
> > +static void memcg_sysctl_init(void) { }
> > +#endif /* CONFIG_SYSCTL */
> > +
> >  static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
> >  {
> >  	if (value == PAGE_COUNTER_MAX)
> > @@ -6982,6 +7012,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
> >  		return err;
> >  	page_counter_set_high(&memcg->memory, high);
> > +	memcg->memory_high_set = true;
> >  	for (;;) {
> >  		unsigned long nr_pages = page_counter_read(&memcg->memory);
> > @@ -7023,6 +7054,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> >  	unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
> >  	bool drained = false;
> >  	unsigned long max;
> > +	unsigned int high_ratio = sysctl_memory_high_autoset_ratio;
> >  	int err;
> >  	buf = strstrip(buf);
> > @@ -7032,6 +7064,13 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> >  	xchg(&memcg->memory.max, max);
> > +	if (high_ratio && !memcg->memory_high_set) {
> > +		/* Set memory.high as a fraction of memory.max */
> > +		unsigned long high = max * high_ratio / (high_ratio + 1);
> > +
> > +		page_counter_set_high(&memcg->memory, high);
> > +	}
> > +
> >  	for (;;) {
> >  		unsigned long nr_pages = page_counter_read(&memcg->memory);
> > @@ -7977,6 +8016,8 @@ static int __init mem_cgroup_init(void)
> >  		soft_limit_tree.rb_tree_per_node[node] = rtpn;
> >  	}
> > +	memcg_sysctl_init();
> > +
> >  	return 0;
> >  }
> >  subsys_initcall(mem_cgroup_init);

-- 
Michal Hocko
SUSE Labs