From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 799B9C4167B for ; Tue, 5 Dec 2023 09:07:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 17C166B008C; Tue, 5 Dec 2023 04:07:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 12B6B6B0092; Tue, 5 Dec 2023 04:07:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F34C96B0093; Tue, 5 Dec 2023 04:07:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E13F06B008C for ; Tue, 5 Dec 2023 04:07:47 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id ACD2AA0B26 for ; Tue, 5 Dec 2023 09:07:47 +0000 (UTC) X-FDA: 81532186974.08.3AD643C Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by imf08.hostedemail.com (Postfix) with ESMTP id 6DB0F160008 for ; Tue, 5 Dec 2023 09:07:45 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=none; spf=pass (imf08.hostedemail.com: domain of mhocko@suse.com designates 195.135.223.130 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1701767265; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bXYWpxDEf6gm15Z1gUKQtRaKSVPcuT7vKCNfa9fLcVY=; b=sbkX60IdeNZWfE2L/ntNF3XE5Z1iJWAMyiOpkcOK1iomUGISt1KRGfY2vkyl+86dwZavZX RdcbJ1HjsSTD6MT9GF5fwlHF7I9fyxaiUqMNoWoTYwny4XvxfBCvptV+CEL9JpeJqh/Mby obJp66q5wh5x7X/EkV5SLN2vmXYt9jo= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1701767265; a=rsa-sha256; cv=none; b=aER17H6zh6gJkH6a+yg+J/xWwFe2FMQN8rbwME/qQFQ54rGJS6+SnJUmTs63j8JjWZSWSF PILeVKBYG9OELcKJ3eZAq43o+3lgD6C9K6rUBTYxV8ZEBvFMYpAJY4hK+7K049zWPDsZ+g TZnLNMTlJcfl7kEeT3i2irdQk542vT4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=none; spf=pass (imf08.hostedemail.com: domain of mhocko@suse.com designates 195.135.223.130 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 6258222034; Tue, 5 Dec 2023 09:07:42 +0000 (UTC) Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 4648D136CF; Tue, 5 Dec 2023 09:07:42 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id 78bYDl7obmUuSQAAD6G6ig (envelope-from ); Tue, 05 Dec 2023 09:07:42 +0000 Date: Tue, 5 Dec 2023 10:07:41 +0100 From: Michal Hocko To: David Finkel Cc: Muchun Song , core-services@vimeo.com, Jonathan Corbet , Roman Gushchin , Shakeel Butt , Shuah Khan , cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers Message-ID: References: <20231204194156.2411672-1-davidf@vimeo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231204194156.2411672-1-davidf@vimeo.com> X-Spamd-Bar: ++++++++++++++++++ X-Stat-Signature: 5oqctf9oww9w9qkgzrt88p7dz5oft3xf X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 6DB0F160008 X-Rspam-User: X-HE-Tag: 1701767265-269885 X-HE-Meta: U2FsdGVkX1+itvgW/IP/wRIFxwuSYT8H2+DHob8QmkKPF4zM6nhmguFzXJ+TDvOdbr0nEIc+k7y7JVo0qXfANBZLxT2dgDZtQe4oZoUWm1UaYRBPfKQBPmzq1T5DUlFRKcV4Qk8vMLotKR+gYfXmNSzyAc1HDokaqNMUk0nllf8rHQgbY3iYbuiQd4elEHRA/eVtNVwnn1ebQ5N/+EtuO1ygZ2Lhi52U0mS2gyUrl7EkqrjbtFOot+gcwDK7eB1+5K8hCXNZPZQLT7INp8QhecwJ4KWY06nG7TTxiQmHjHZXAeY5vnkvXnLGEgrW0ZCg8JpVWB50Hyvfqz8eYcEmGAN0UPWZ0x00xQYv8xZfhfkobGGoxy43m/kkc1YvBjP0WiQ69gwGWf2fxlAYJYTNDPzk9ky2HnthANvkR3NRNEUqDzjJUX3f/2WzA2gGmavqzG1LzSKcVJUbM9/7xdi8i29A3WixXVQn1vnFIM8YtOh50Jq15Uwx7BU0Q5PBmznK3FXnVgVBJUsJ5Laa5B/6MYJ8heiPIbcF3gnx0vgCF74S4mGAw+rpaEwWwb0Zk2lpAQ9R5HkhZ8dQbk5P8DMq3k7wJOQVURUgQv+ZV7SrDgWsF4SxVKGd7Ny5JlaIEh4nNUy+8RbHvJUdfPDqpx1MCwSbQtnDyMjPPXdy/hdtlaMXhGqCWVG9Qu1XUeA6q2tvkBCanCxNUFz+H1AEAiU80YZdnyWlHdp2R4D+dEBMghlu2Lo+Gy0kNy+ha23XGxkaC3YcPeL3XVn/vrLbY79sA/ZxTtiVioD7u2m2vfBc5AnTHlpKCOtrKu/BLJ63UH/T78yznw4M8ABDKElIOZCG/hz4xbJBa2JjtAiH5hHCy6GTmN62V8KmiwmA3JKygwe2mvOCjuOOZ1hELjfFR+DIauns+fTkRF1yZYsaVRHgALWYH8n+ujsNicUMAqcxYecTaEGcxYcW5BS/ibHPdKH apz4g6Up TmnbaGe1WwIAsQ0xLHah1d3P/ny2Bbg9QvUdlW5uu00wAdKReEncJQkLsm0biXIXr/rL5e7Iwv9LD0GjoVHrCRaA3SJj8U9uLUjTF0zutnvY3CGRIXj1k1QxjwrLz+YIrsfXY8aynKUTELv4WSb++mgKmFS0+EGxM19m9ljCW+tlNDTvmd7vhtmxkqDBtzyYchmW1fbpLQpXAHGrLSJMPZBC52ev6UN2RuWlB X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon 04-12-23 14:41:56, David Finkel wrote: > Other mechanisms for querying the peak memory usage of either a process > or v1 memory cgroup allow for resetting the high watermark. Restore > parity with those mechanisms. > > For example: > - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets > the high watermark. > - writing "5" to the clear_refs pseudo-file in a processes's proc > directory resets the peak RSS. > > This change copies the cgroup v1 behavior so any write to the > memory.peak and memory.swap.peak pseudo-files reset the high watermark > to the current usage. > > This behavior is particularly useful for work scheduling systems that > need to track memory usage of worker processes/cgroups per-work-item. > Since memory can't be squeezed like CPU can (the OOM-killer has > opinions), these systems need to track the peak memory usage to compute > system/container fullness when binpacking workitems. I do not understand the OOM-killer reference here but I do understand that your worker reuses a cgroup and you want a peak memory consumption of a single run to better profile/configure the memcg configuration for the specific worker type. Correct? > Signed-off-by: David Finkel Makes sense to me Acked-by: Michal Hocko Thanks! > --- > Documentation/admin-guide/cgroup-v2.rst | 20 +++--- > mm/memcontrol.c | 23 ++++++ > .../selftests/cgroup/test_memcontrol.c | 72 ++++++++++++++++--- > 3 files changed, 99 insertions(+), 16 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 3f85254f3cef..95af0628dc44 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1305,11 +1305,13 @@ PAGE_SIZE multiple when read back. > reclaim induced by memory.reclaim. > > memory.peak > - A read-only single value file which exists on non-root > - cgroups. > + A read-write single value file which exists on non-root cgroups. > + > + The max memory usage recorded for the cgroup and its descendants since > + either the creation of the cgroup or the most recent reset. > > - The max memory usage recorded for the cgroup and its > - descendants since the creation of the cgroup. > + Any non-empty write to this file resets it to the current memory usage. > + All content written is completely ignored. > > memory.oom.group > A read-write single value file which exists on non-root > @@ -1626,11 +1628,13 @@ PAGE_SIZE multiple when read back. > Healthy workloads are not expected to reach this limit. > > memory.swap.peak > - A read-only single value file which exists on non-root > - cgroups. > + A read-write single value file which exists on non-root cgroups. > + > + The max swap usage recorded for the cgroup and its descendants since > + the creation of the cgroup or the most recent reset. > > - The max swap usage recorded for the cgroup and its > - descendants since the creation of the cgroup. > + Any non-empty write to this file resets it to the current swap usage. > + All content written is completely ignored. > > memory.swap.max > A read-write single value file which exists on non-root > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 1c1061df9cd1..b04af158922d 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -25,6 +25,7 @@ > * Copyright (C) 2020 Alibaba, Inc, Alex Shi > */ > > +#include > #include > #include > #include > @@ -6635,6 +6636,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css, > return (u64)memcg->memory.watermark * PAGE_SIZE; > } > > +static ssize_t memory_peak_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, loff_t off) > +{ > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); > + > + page_counter_reset_watermark(&memcg->memory); > + > + return nbytes; > +} > + > static int memory_min_show(struct seq_file *m, void *v) > { > return seq_puts_memcg_tunable(m, > @@ -6947,6 +6958,7 @@ static struct cftype memory_files[] = { > .name = "peak", > .flags = CFTYPE_NOT_ON_ROOT, > .read_u64 = memory_peak_read, > + .write = memory_peak_write, > }, > { > .name = "min", > @@ -7917,6 +7929,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css, > return (u64)memcg->swap.watermark * PAGE_SIZE; > } > > +static ssize_t swap_peak_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, loff_t off) > +{ > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); > + > + page_counter_reset_watermark(&memcg->swap); > + > + return nbytes; > +} > + > static int swap_high_show(struct seq_file *m, void *v) > { > return seq_puts_memcg_tunable(m, > @@ -7999,6 +8021,7 @@ static struct cftype swap_files[] = { > .name = "swap.peak", > .flags = CFTYPE_NOT_ON_ROOT, > .read_u64 = swap_peak_read, > + .write = swap_peak_write, > }, > { > .name = "swap.events", > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c > index c7c9572003a8..0326c317f1f2 100644 > --- a/tools/testing/selftests/cgroup/test_memcontrol.c > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c > @@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg) > /* > * This test create a memory cgroup, allocates > * some anonymous memory and some pagecache > - * and check memory.current and some memory.stat values. > + * and checks memory.current, memory.peak, and some memory.stat values. > */ > -static int test_memcg_current(const char *root) > +static int test_memcg_current_peak(const char *root) > { > int ret = KSFT_FAIL; > - long current; > + long current, peak, peak_reset; > char *memcg; > > memcg = cg_name(root, "memcg_test"); > @@ -180,12 +180,32 @@ static int test_memcg_current(const char *root) > if (current != 0) > goto cleanup; > > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak != 0) > + goto cleanup; > + > if (cg_run(memcg, alloc_anon_50M_check, NULL)) > goto cleanup; > > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak < MB(50)) > + goto cleanup; > + > + peak_reset = cg_write(memcg, "memory.peak", "\n"); > + if (peak_reset != 0) > + goto cleanup; > + > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak > MB(30)) > + goto cleanup; > + > if (cg_run(memcg, alloc_pagecache_50M_check, NULL)) > goto cleanup; > > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak < MB(50)) > + goto cleanup; > + > ret = KSFT_PASS; > > cleanup: > @@ -815,13 +835,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg) > > /* > * This test checks that memory.swap.max limits the amount of > - * anonymous memory which can be swapped out. > + * anonymous memory which can be swapped out. Additionally, it verifies that > + * memory.swap.peak reflects the high watermark and can be reset. > */ > -static int test_memcg_swap_max(const char *root) > +static int test_memcg_swap_max_peak(const char *root) > { > int ret = KSFT_FAIL; > char *memcg; > - long max; > + long max, peak; > > if (!is_swap_enabled()) > return KSFT_SKIP; > @@ -838,6 +859,12 @@ static int test_memcg_swap_max(const char *root) > goto cleanup; > } > > + if (cg_read_long(memcg, "memory.swap.peak")) > + goto cleanup; > + > + if (cg_read_long(memcg, "memory.peak")) > + goto cleanup; > + > if (cg_read_strcmp(memcg, "memory.max", "max\n")) > goto cleanup; > > @@ -860,6 +887,27 @@ static int test_memcg_swap_max(const char *root) > if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1) > goto cleanup; > > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak < MB(29)) > + goto cleanup; > + > + peak = cg_read_long(memcg, "memory.swap.peak"); > + if (peak < MB(29)) > + goto cleanup; > + > + if (cg_write(memcg, "memory.swap.peak", "\n")) > + goto cleanup; > + > + if (cg_read_long(memcg, "memory.swap.peak") > MB(10)) > + goto cleanup; > + > + > + if (cg_write(memcg, "memory.peak", "\n")) > + goto cleanup; > + > + if (cg_read_long(memcg, "memory.peak")) > + goto cleanup; > + > if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30))) > goto cleanup; > > @@ -867,6 +915,14 @@ static int test_memcg_swap_max(const char *root) > if (max <= 0) > goto cleanup; > > + peak = cg_read_long(memcg, "memory.peak"); > + if (peak < MB(29)) > + goto cleanup; > + > + peak = cg_read_long(memcg, "memory.swap.peak"); > + if (peak < MB(19)) > + goto cleanup; > + > ret = KSFT_PASS; > > cleanup: > @@ -1293,7 +1349,7 @@ struct memcg_test { > const char *name; > } tests[] = { > T(test_memcg_subtree_control), > - T(test_memcg_current), > + T(test_memcg_current_peak), > T(test_memcg_min), > T(test_memcg_low), > T(test_memcg_high), > @@ -1301,7 +1357,7 @@ struct memcg_test { > T(test_memcg_max), > T(test_memcg_reclaim), > T(test_memcg_oom_events), > - T(test_memcg_swap_max), > + T(test_memcg_swap_max_peak), > T(test_memcg_sock), > T(test_memcg_oom_group_leaf_events), > T(test_memcg_oom_group_parent_events), > -- > 2.39.2 -- Michal Hocko SUSE Labs