From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA91AC3DA59 for ; Tue, 16 Jul 2024 07:20:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 065436B0085; Tue, 16 Jul 2024 03:20:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 014D66B0088; Tue, 16 Jul 2024 03:20:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DF7C76B0089; Tue, 16 Jul 2024 03:20:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id BBA256B0085 for ; Tue, 16 Jul 2024 03:20:13 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 4D8301C0409 for ; Tue, 16 Jul 2024 07:20:13 +0000 (UTC) X-FDA: 82344767106.26.0F5C485 Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) by imf29.hostedemail.com (Postfix) with ESMTP id 2E56C12000B for ; Tue, 16 Jul 2024 07:20:10 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=suse.com header.s=google header.b=RAKSQ3nL; spf=pass (imf29.hostedemail.com: domain of mhocko@suse.com designates 209.85.218.52 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721114392; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Ryz13jWMVfVyvMNCAq2D9GKM+ZrB7awLhm0ebTfWLrE=; b=Ny5isIrpW6OKFVwesmkSlqkLGRefa/GBZWPs2IsJstM+ErUK8rg/AWvwy7ApzJt0vbjc+3 sk9EmE5zyuq6qAYsnL8CmVKfRpQO/JA3toeTP2wINe8CCWCXMj9Luduls71ZK6XSDdgggn nqChoBR6Zf8YDjNS6+BzM+druP2BTeY= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=suse.com header.s=google header.b=RAKSQ3nL; spf=pass (imf29.hostedemail.com: domain of mhocko@suse.com designates 209.85.218.52 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721114392; a=rsa-sha256; cv=none; b=7FjnQVDoHlMeRxVRw23taqMS+yICQbehhXfr4xn3/H4VoH/OV3V1xyAxo4G2E2rt2l2LFi 2TBBcITLG2g6iXoi8G6j1PvRNxWOVl9rGc0S1t7I1V8Hmuh5JcTd4nxvnoGkQR2rtrbxrx fMxyYxHv855GX1zGiPy8kuWO7kE8dHk= Received: by mail-ej1-f52.google.com with SMTP id a640c23a62f3a-a77cc73d35fso1069295566b.0 for ; Tue, 16 Jul 2024 00:20:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1721114409; x=1721719209; darn=kvack.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=Ryz13jWMVfVyvMNCAq2D9GKM+ZrB7awLhm0ebTfWLrE=; b=RAKSQ3nLzSy0tI2PLB3LKfYDT1WIfvtJ9hcTpjv66CNSNWiwr0OyW21RhJ2dE7ac+V JFDOANJifdlnJIZAEb+Zyyb+2oC6fWcBvSsgnxZAvCfoHPN7glF8+nYppuW+wNRaCL4B /ovfRVL6q2A+I1BJrkAQfXbP9j6o/ZOqDDF0JActpQydi76oDNlebAYhY1ZqRNviORZC /cfdsi3dNafcJPj8dtAmoS9L7PHgBAH+wvDBf96KJXVrhR6DVX2oMZmpPbMi6oH05XpR 1/+16DfRKrZvDKg0UXs3cMsZFaSuMZPiNqCZdn0VOq793r/TpLcHIj932m7p4xtVLx5I XMZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721114409; x=1721719209; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Ryz13jWMVfVyvMNCAq2D9GKM+ZrB7awLhm0ebTfWLrE=; b=ZblfuyRTBB96Di9rXvk8oU0yYhMGjqir/sGe1t3Fm0jZjQw304t1Y82ErlHD6hw+dN T/Ha3stsOtk0IRDpYPeaCmQns41ruky4Q01eufwWKC/5cKhDcSbLIa644lux3Sjy3JuL HMwavJgVIOVDrA9QawC272FE08CMovSmf77/ePmQ73sjNDwi20HqNxc0ur7OhXU3GaSz D66owFNpKC9YV1xV6JXmNwNCiZrYaeWviT8RBqjYXQZvzSZaEVIkyKUGj4cbzhEtCsOe 1TOjC+Eho5zaUkFgw+Joma7Z90j0ruEOtgZ9fxnnNV5oLzUuLP0ABbPRMS1UjuuCsQAR 93Hw== X-Forwarded-Encrypted: i=1; AJvYcCWG4NOws+9ID2JPO53hH5YhcxyJUAy5SOMAs5w1h8NRpJ0eIqWA5z2/0KLB6Bo7t0LHCB3JeFRfiWz5ISHuTiqpgW0= X-Gm-Message-State: AOJu0YzbFodEWG5rD5VBQN6G8nYBr8TIInmkB2F7vkZ0DyFW2F0xCAwZ 5Ra0vHw58o9vWiIxycgE8FdVgmghgGgT+IZAhSyyDqvfjTLUEK9mVWYvfz1HwoA= X-Google-Smtp-Source: AGHT+IF7KAqVc8tB1cq8XGEqVn4hBgytdxCDM1K+wyDBEAdRIboXhu5uwo7R0OApSSiVBd/NbepQxQ== X-Received: by 2002:a17:906:4a4d:b0:a72:afd9:6109 with SMTP id a640c23a62f3a-a79edbfb0c9mr77909566b.16.1721114409305; Tue, 16 Jul 2024 00:20:09 -0700 (PDT) Received: from localhost (109-81-86-75.rct.o2.cz. [109.81.86.75]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a79bc5d1f57sm275518966b.72.2024.07.16.00.20.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Jul 2024 00:20:09 -0700 (PDT) Date: Tue, 16 Jul 2024 09:20:08 +0200 From: Michal Hocko To: David Finkel Cc: Muchun Song , Andrew Morton , core-services@vimeo.com, Jonathan Corbet , Roman Gushchin , Shuah Khan , Johannes Weiner , Tejun Heo , Zefan Li , cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Shakeel Butt Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers Message-ID: References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: 2E56C12000B X-Stat-Signature: yqp7dbkkb61tbx6uhdgdkxh75c43kfa6 X-HE-Tag: 1721114410-808812 X-HE-Meta: U2FsdGVkX19dL/QbcMpayNEv07dYTNMoJRUus/65aQhEU0VDcHCeUlDbmW/MyY+xaIwv4wFTTA6ZNjdwvyLk4FyrnTvl8l4oS1RgCWloCN/38v8+VT9cENlxEYakHAC1vDWxNX6gsae5S/Oa06rgRSinVFlk6LDHIm3JqKNdaMXWiby3EmwSVLcynLiLTwHKAh7R9M9wJ6TcWIxLJvpve2QqwTqiqznUN/neBpixix+8lT6hu8F5P6Tb/D5JmbuBcs1N+U/qOXcpxmiLoxbzHuhcCkbEGx9uqM3W8C6RhHMqyrXRaIoYhR1SOcCFqilj4tyloAlT52E9+ki3VvQRZLhvNmwajn/2Bvv94aEGPOinq3xAegEzxI1DnmG4Iqa7PqpmFHuHQqnd1+j125wlFzNp/3ubVnTFZfZx2l2zFqcAjKayf+CRDO6pCWpDySrPGAQNZGmsIjtNK/OgVMDfkm4W5D0sAUehfWqwmHkAb8+052rn9ybo4TIb5xFDS1ZSx4ikELl4n5854QWLdwDO40ymhULn1aIrycIvxO0OBLHvf6GySY2ePJbQKIlQ6Bbe4tuH8ajggUZs8iFQY48MM0C85/zqbcZNZNmi4nSLPzJKj7+wj0oepTADKVqVMYPn0PoOj3MylvavxE6xCdPJySHJeJG4XWTcHHDu2haGbbGV574Uana4Z0bbFIFM4sayQh/q4g2DHVKydHNMmCLrfOmsZOx4w2Eu35koe778NS2vNwFQqv1xSes8gXh6nbrBkOolTj74aDfRkBSS1DqT2l3wz2EBgLfHDUeJQI41RK/i5NmQFNdmqptCBvvfVUXfM8mezO5E0Ho4/Mbymbt1fx73g1BvblQMZOZbT12JR6JX66OBy0PvGXP+eI9oO6usP0i7ACIWQODvEqUeU8rS/xj+OyES0FNINfeKFwrZKqRF4oOwaQGBrCrxU04YNC0ZnvP76hJ0OP0+RnXGlrB MFZ2X6ca Uu5iD/gDMS7yrcA/xNSSYOSLEUS6wL92A1XboNzj1gU+QA8rRfAYD0y5dACdtfvQ9hATn3aTdj8tf21s9c5hYecFoj7s688N8YMzN8hMqt3JZ2F5zejWkQY8IyBdudjzcmgocMjHDIwjPW+lMacpX5twB3kwRO+FTf3Di+SxBCKQpIFnv1lcHsiER1z5zXZsH9Xmlddzw+ZK/CBjzOofxRFcg7VK7u3EK1ZatTE1fqrJilbdsBmZBNLBUKwas8RfK3FwO8KqhYYfCu+JCYlQU/WlBs36U73RqHwNMh/jdpAAm15kGB1xpetig0/P4YkQo6wHgGXQxm0qVJhdVfcF3x5Bzz/LRBx+fGPyvXpftIVwbAGFHAJvOZCmivw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000006, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon 15-07-24 16:46:36, David Finkel wrote: > > On Mon, Jul 15, 2024 at 4:38 PM David Finkel wrote: > > > > > > Other mechanisms for querying the peak memory usage of either a process > > > or v1 memory cgroup allow for resetting the high watermark. Restore > > > parity with those mechanisms. > > > > > > For example: > > > - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets > > > the high watermark. > > > - writing "5" to the clear_refs pseudo-file in a processes's proc > > > directory resets the peak RSS. > > > > > > This change copies the cgroup v1 behavior so any write to the > > > memory.peak and memory.swap.peak pseudo-files reset the high watermark > > > to the current usage. > > > > > > This behavior is particularly useful for work scheduling systems that > > > need to track memory usage of worker processes/cgroups per-work-item. > > > Since memory can't be squeezed like CPU can (the OOM-killer has > > > opinions), I do not understand the OOM-killer reference here. Why does it matter? Could you explain please? > > > these systems need to track the peak memory usage to compute > > > system/container fullness when binpacking workitems. Could you elaborate some more on how you are using this please? I expect you recycle memcgs for different runs of workers and reset peak consumptions before a new run and record it after it is done. The thing which is not really clear to me is how the peak value really helps if it can vary a lot among different runs. But maybe I misunderstand. > > > > > > Signed-off-by: David Finkel > > > --- > > > Documentation/admin-guide/cgroup-v2.rst | 20 +++--- > > > mm/memcontrol.c | 23 ++++++ > > > .../selftests/cgroup/test_memcontrol.c | 72 ++++++++++++++++--- > > > 3 files changed, 99 insertions(+), 16 deletions(-) > > > > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > > > index 8fbb0519d556..201d8e5d9f82 100644 > > > --- a/Documentation/admin-guide/cgroup-v2.rst > > > +++ b/Documentation/admin-guide/cgroup-v2.rst > > > @@ -1322,11 +1322,13 @@ PAGE_SIZE multiple when read back. > > > reclaim induced by memory.reclaim. > > > > > > memory.peak > > > - A read-only single value file which exists on non-root > > > - cgroups. > > > + A read-write single value file which exists on non-root cgroups. > > > + > > > + The max memory usage recorded for the cgroup and its descendants since > > > + either the creation of the cgroup or the most recent reset. > > > > > > - The max memory usage recorded for the cgroup and its > > > - descendants since the creation of the cgroup. > > > + Any non-empty write to this file resets it to the current memory usage. > > > + All content written is completely ignored. > > > > > > memory.oom.group > > > A read-write single value file which exists on non-root > > > @@ -1652,11 +1654,13 @@ PAGE_SIZE multiple when read back. > > > Healthy workloads are not expected to reach this limit. > > > > > > memory.swap.peak > > > - A read-only single value file which exists on non-root > > > - cgroups. > > > + A read-write single value file which exists on non-root cgroups. > > > + > > > + The max swap usage recorded for the cgroup and its descendants since > > > + the creation of the cgroup or the most recent reset. > > > > > > - The max swap usage recorded for the cgroup and its > > > - descendants since the creation of the cgroup. > > > + Any non-empty write to this file resets it to the current swap usage. > > > + All content written is completely ignored. > > > > > > memory.swap.max > > > A read-write single value file which exists on non-root > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 8f2f1bb18c9c..abfa547615d6 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -25,6 +25,7 @@ > > > * Copyright (C) 2020 Alibaba, Inc, Alex Shi > > > */ > > > > > > +#include > > > #include > > > #include > > > #include > > > @@ -6915,6 +6916,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css, > > > return (u64)memcg->memory.watermark * PAGE_SIZE; > > > } > > > > > > +static ssize_t memory_peak_write(struct kernfs_open_file *of, > > > + char *buf, size_t nbytes, loff_t off) > > > +{ > > > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); > > > + > > > + page_counter_reset_watermark(&memcg->memory); > > > + > > > + return nbytes; > > > +} > > > + > > > static int memory_min_show(struct seq_file *m, void *v) > > > { > > > return seq_puts_memcg_tunable(m, > > > @@ -7232,6 +7243,7 @@ static struct cftype memory_files[] = { > > > .name = "peak", > > > .flags = CFTYPE_NOT_ON_ROOT, > > > .read_u64 = memory_peak_read, > > > + .write = memory_peak_write, > > > }, > > > { > > > .name = "min", > > > @@ -8201,6 +8213,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css, > > > return (u64)memcg->swap.watermark * PAGE_SIZE; > > > } > > > > > > +static ssize_t swap_peak_write(struct kernfs_open_file *of, > > > + char *buf, size_t nbytes, loff_t off) > > > +{ > > > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); > > > + > > > + page_counter_reset_watermark(&memcg->swap); > > > + > > > + return nbytes; > > > +} > > > + > > > static int swap_high_show(struct seq_file *m, void *v) > > > { > > > return seq_puts_memcg_tunable(m, > > > @@ -8283,6 +8305,7 @@ static struct cftype swap_files[] = { > > > .name = "swap.peak", > > > .flags = CFTYPE_NOT_ON_ROOT, > > > .read_u64 = swap_peak_read, > > > + .write = swap_peak_write, > > > }, > > > { > > > .name = "swap.events", > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c > > > index 41ae8047b889..681972de673b 100644 > > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c > > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c > > > @@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg) > > > /* > > > * This test create a memory cgroup, allocates > > > * some anonymous memory and some pagecache > > > - * and check memory.current and some memory.stat values. > > > + * and checks memory.current, memory.peak, and some memory.stat values. > > > */ > > > -static int test_memcg_current(const char *root) > > > +static int test_memcg_current_peak(const char *root) > > > { > > > int ret = KSFT_FAIL; > > > - long current; > > > + long current, peak, peak_reset; > > > char *memcg; > > > > > > memcg = cg_name(root, "memcg_test"); > > > @@ -180,12 +180,32 @@ static int test_memcg_current(const char *root) > > > if (current != 0) > > > goto cleanup; > > > > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak != 0) > > > + goto cleanup; > > > + > > > if (cg_run(memcg, alloc_anon_50M_check, NULL)) > > > goto cleanup; > > > > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak < MB(50)) > > > + goto cleanup; > > > + > > > + peak_reset = cg_write(memcg, "memory.peak", "\n"); > > > + if (peak_reset != 0) > > > + goto cleanup; > > > + > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak > MB(30)) > > > + goto cleanup; > > > + > > > if (cg_run(memcg, alloc_pagecache_50M_check, NULL)) > > > goto cleanup; > > > > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak < MB(50)) > > > + goto cleanup; > > > + > > > ret = KSFT_PASS; > > > > > > cleanup: > > > @@ -817,13 +837,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg) > > > > > > /* > > > * This test checks that memory.swap.max limits the amount of > > > - * anonymous memory which can be swapped out. > > > + * anonymous memory which can be swapped out. Additionally, it verifies that > > > + * memory.swap.peak reflects the high watermark and can be reset. > > > */ > > > -static int test_memcg_swap_max(const char *root) > > > +static int test_memcg_swap_max_peak(const char *root) > > > { > > > int ret = KSFT_FAIL; > > > char *memcg; > > > - long max; > > > + long max, peak; > > > > > > if (!is_swap_enabled()) > > > return KSFT_SKIP; > > > @@ -840,6 +861,12 @@ static int test_memcg_swap_max(const char *root) > > > goto cleanup; > > > } > > > > > > + if (cg_read_long(memcg, "memory.swap.peak")) > > > + goto cleanup; > > > + > > > + if (cg_read_long(memcg, "memory.peak")) > > > + goto cleanup; > > > + > > > if (cg_read_strcmp(memcg, "memory.max", "max\n")) > > > goto cleanup; > > > > > > @@ -862,6 +889,27 @@ static int test_memcg_swap_max(const char *root) > > > if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1) > > > goto cleanup; > > > > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak < MB(29)) > > > + goto cleanup; > > > + > > > + peak = cg_read_long(memcg, "memory.swap.peak"); > > > + if (peak < MB(29)) > > > + goto cleanup; > > > + > > > + if (cg_write(memcg, "memory.swap.peak", "\n")) > > > + goto cleanup; > > > + > > > + if (cg_read_long(memcg, "memory.swap.peak") > MB(10)) > > > + goto cleanup; > > > + > > > + > > > + if (cg_write(memcg, "memory.peak", "\n")) > > > + goto cleanup; > > > + > > > + if (cg_read_long(memcg, "memory.peak")) > > > + goto cleanup; > > > + > > > if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30))) > > > goto cleanup; > > > > > > @@ -869,6 +917,14 @@ static int test_memcg_swap_max(const char *root) > > > if (max <= 0) > > > goto cleanup; > > > > > > + peak = cg_read_long(memcg, "memory.peak"); > > > + if (peak < MB(29)) > > > + goto cleanup; > > > + > > > + peak = cg_read_long(memcg, "memory.swap.peak"); > > > + if (peak < MB(19)) > > > + goto cleanup; > > > + > > > ret = KSFT_PASS; > > > > > > cleanup: > > > @@ -1295,7 +1351,7 @@ struct memcg_test { > > > const char *name; > > > } tests[] = { > > > T(test_memcg_subtree_control), > > > - T(test_memcg_current), > > > + T(test_memcg_current_peak), > > > T(test_memcg_min), > > > T(test_memcg_low), > > > T(test_memcg_high), > > > @@ -1303,7 +1359,7 @@ struct memcg_test { > > > T(test_memcg_max), > > > T(test_memcg_reclaim), > > > T(test_memcg_oom_events), > > > - T(test_memcg_swap_max), > > > + T(test_memcg_swap_max_peak), > > > T(test_memcg_sock), > > > T(test_memcg_oom_group_leaf_events), > > > T(test_memcg_oom_group_parent_events), > > > -- > > > 2.40.1 > > > > > > > > > -- > > David Finkel > > Senior Principal Software Engineer, Core Services > > > > -- > David Finkel > Senior Principal Software Engineer, Core Services -- Michal Hocko SUSE Labs