From: David Finkel
Date: Mon, 15 Jul 2024 16:46:36 -0400
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
To: Muchun Song, Andrew Morton
Cc: core-services@vimeo.com, Jonathan Corbet, Michal Hocko, Roman Gushchin,
 Shuah Khan, Johannes Weiner, Tejun Heo, Zefan Li, cgroups@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Shakeel Butt
References: <20240715203625.1462309-1-davidf@vimeo.com>
 <20240715203625.1462309-2-davidf@vimeo.com>

[Fixing Shakeel's email address (I didn't notice it changed when comparing my
previous "git send-email" command line and get_maintainer.pl output)]

On Mon, Jul 15, 2024 at 4:42 PM David Finkel wrote:
>
> Note: this is a simple rebase of a patch I sent a few months ago,
> which received two acks before the thread petered out:
> https://www.spinics.net/lists/cgroups/msg40602.html
>
> Thanks,
>
> On Mon, Jul 15, 2024 at 4:38 PM David Finkel wrote:
> >
> > Other mechanisms for querying the
> > peak memory usage of either a process
> > or v1 memory cgroup allow for resetting the high watermark. Restore
> > parity with those mechanisms.
> >
> > For example:
> >  - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
> >    the high watermark.
> >  - Writing "5" to the clear_refs pseudo-file in a process's proc
> >    directory resets the peak RSS.
> >
> > This change copies the cgroup v1 behavior so any write to the
> > memory.peak and memory.swap.peak pseudo-files resets the high watermark
> > to the current usage.
> >
> > This behavior is particularly useful for work scheduling systems that
> > need to track memory usage of worker processes/cgroups per-work-item.
> > Since memory can't be squeezed like CPU can (the OOM-killer has
> > opinions), these systems need to track the peak memory usage to compute
> > system/container fullness when bin-packing work items.
> >
> > Signed-off-by: David Finkel
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst       | 20 +++---
> >  mm/memcontrol.c                               | 23 ++++++
> >  .../selftests/cgroup/test_memcontrol.c        | 72 ++++++++++++++++---
> >  3 files changed, 99 insertions(+), 16 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 8fbb0519d556..201d8e5d9f82 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1322,11 +1322,13 @@ PAGE_SIZE multiple when read back.
> >  	reclaim induced by memory.reclaim.
> >
> >    memory.peak
> > -	A read-only single value file which exists on non-root
> > -	cgroups.
> > +	A read-write single value file which exists on non-root cgroups.
> > +
> > +	The max memory usage recorded for the cgroup and its descendants since
> > +	either the creation of the cgroup or the most recent reset.
> >
> > -	The max memory usage recorded for the cgroup and its
> > -	descendants since the creation of the cgroup.
> > +	Any non-empty write to this file resets it to the current memory usage.
> > +	All content written is completely ignored.
> >
> >    memory.oom.group
> >  	A read-write single value file which exists on non-root
> > @@ -1652,11 +1654,13 @@ PAGE_SIZE multiple when read back.
> >  	Healthy workloads are not expected to reach this limit.
> >
> >    memory.swap.peak
> > -	A read-only single value file which exists on non-root
> > -	cgroups.
> > +	A read-write single value file which exists on non-root cgroups.
> > +
> > +	The max swap usage recorded for the cgroup and its descendants since
> > +	the creation of the cgroup or the most recent reset.
> >
> > -	The max swap usage recorded for the cgroup and its
> > -	descendants since the creation of the cgroup.
> > +	Any non-empty write to this file resets it to the current swap usage.
> > +	All content written is completely ignored.
> >
> >    memory.swap.max
> >  	A read-write single value file which exists on non-root
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8f2f1bb18c9c..abfa547615d6 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -25,6 +25,7 @@
> >   * Copyright (C) 2020 Alibaba, Inc, Alex Shi
> >   */
> >
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -6915,6 +6916,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css,
> >  	return (u64)memcg->memory.watermark * PAGE_SIZE;
> >  }
> >
> > +static ssize_t memory_peak_write(struct kernfs_open_file *of,
> > +				 char *buf, size_t nbytes, loff_t off)
> > +{
> > +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > +
> > +	page_counter_reset_watermark(&memcg->memory);
> > +
> > +	return nbytes;
> > +}
> > +
> >  static int memory_min_show(struct seq_file *m, void *v)
> >  {
> >  	return seq_puts_memcg_tunable(m,
> > @@ -7232,6 +7243,7 @@ static struct cftype memory_files[] = {
> >  		.name = "peak",
> >  		.flags = CFTYPE_NOT_ON_ROOT,
> >  		.read_u64 = memory_peak_read,
> > +		.write = memory_peak_write,
> >  	},
> >  	{
> >  		.name = "min",
> > @@ -8201,6 +8213,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css,
> >  	return (u64)memcg->swap.watermark * PAGE_SIZE;
> >  }
> >
> > +static ssize_t swap_peak_write(struct kernfs_open_file *of,
> > +			       char *buf, size_t nbytes, loff_t off)
> > +{
> > +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > +
> > +	page_counter_reset_watermark(&memcg->swap);
> > +
> > +	return nbytes;
> > +}
> > +
> >  static int swap_high_show(struct seq_file *m, void *v)
> >  {
> >  	return seq_puts_memcg_tunable(m,
> > @@ -8283,6 +8305,7 @@ static struct cftype swap_files[] = {
> >  		.name = "swap.peak",
> >  		.flags = CFTYPE_NOT_ON_ROOT,
> >  		.read_u64 = swap_peak_read,
> > +		.write = swap_peak_write,
> >  	},
> >  	{
> >  		.name = "swap.events",
> > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > index 41ae8047b889..681972de673b 100644
> > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > @@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg)
> >  /*
> >   * This test create a memory cgroup, allocates
> >   * some anonymous memory and some pagecache
> > - * and check memory.current and some memory.stat values.
> > + * and checks memory.current, memory.peak, and some memory.stat values.
> >   */
> > -static int test_memcg_current(const char *root)
> > +static int test_memcg_current_peak(const char *root)
> >  {
> >  	int ret = KSFT_FAIL;
> > -	long current;
> > +	long current, peak, peak_reset;
> >  	char *memcg;
> >
> >  	memcg = cg_name(root, "memcg_test");
> > @@ -180,12 +180,32 @@ static int test_memcg_current(const char *root)
> >  	if (current != 0)
> >  		goto cleanup;
> >
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak != 0)
> > +		goto cleanup;
> > +
> >  	if (cg_run(memcg, alloc_anon_50M_check, NULL))
> >  		goto cleanup;
> >
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak < MB(50))
> > +		goto cleanup;
> > +
> > +	peak_reset = cg_write(memcg, "memory.peak", "\n");
> > +	if (peak_reset != 0)
> > +		goto cleanup;
> > +
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak > MB(30))
> > +		goto cleanup;
> > +
> >  	if (cg_run(memcg, alloc_pagecache_50M_check, NULL))
> >  		goto cleanup;
> >
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak < MB(50))
> > +		goto cleanup;
> > +
> >  	ret = KSFT_PASS;
> >
> >  cleanup:
> > @@ -817,13 +837,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg)
> >
> >  /*
> >   * This test checks that memory.swap.max limits the amount of
> > - * anonymous memory which can be swapped out.
> > + * anonymous memory which can be swapped out. Additionally, it verifies that
> > + * memory.swap.peak reflects the high watermark and can be reset.
> >   */
> > -static int test_memcg_swap_max(const char *root)
> > +static int test_memcg_swap_max_peak(const char *root)
> >  {
> >  	int ret = KSFT_FAIL;
> >  	char *memcg;
> > -	long max;
> > +	long max, peak;
> >
> >  	if (!is_swap_enabled())
> >  		return KSFT_SKIP;
> > @@ -840,6 +861,12 @@ static int test_memcg_swap_max(const char *root)
> >  		goto cleanup;
> >  	}
> >
> > +	if (cg_read_long(memcg, "memory.swap.peak"))
> > +		goto cleanup;
> > +
> > +	if (cg_read_long(memcg, "memory.peak"))
> > +		goto cleanup;
> > +
> >  	if (cg_read_strcmp(memcg, "memory.max", "max\n"))
> >  		goto cleanup;
> >
> > @@ -862,6 +889,27 @@ static int test_memcg_swap_max(const char *root)
> >  	if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1)
> >  		goto cleanup;
> >
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak < MB(29))
> > +		goto cleanup;
> > +
> > +	peak = cg_read_long(memcg, "memory.swap.peak");
> > +	if (peak < MB(29))
> > +		goto cleanup;
> > +
> > +	if (cg_write(memcg, "memory.swap.peak", "\n"))
> > +		goto cleanup;
> > +
> > +	if (cg_read_long(memcg, "memory.swap.peak") > MB(10))
> > +		goto cleanup;
> > +
> > +
> > +	if (cg_write(memcg, "memory.peak", "\n"))
> > +		goto cleanup;
> > +
> > +	if (cg_read_long(memcg, "memory.peak"))
> > +		goto cleanup;
> > +
> >  	if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30)))
> >  		goto cleanup;
> >
> > @@ -869,6 +917,14 @@ static int test_memcg_swap_max(const char *root)
> >  	if (max <= 0)
> >  		goto cleanup;
> >
> > +	peak = cg_read_long(memcg, "memory.peak");
> > +	if (peak < MB(29))
> > +		goto cleanup;
> > +
> > +	peak = cg_read_long(memcg, "memory.swap.peak");
> > +	if (peak < MB(19))
> > +		goto cleanup;
> > +
> >  	ret = KSFT_PASS;
> >
> >  cleanup:
> > @@ -1295,7 +1351,7 @@ struct memcg_test {
> >  	const char *name;
> >  } tests[] = {
> >  	T(test_memcg_subtree_control),
> > -	T(test_memcg_current),
> > +	T(test_memcg_current_peak),
> >  	T(test_memcg_min),
> >  	T(test_memcg_low),
> >  	T(test_memcg_high),
> > @@ -1303,7 +1359,7 @@ struct memcg_test {
> >  	T(test_memcg_max),
> >  	T(test_memcg_reclaim),
> >  	T(test_memcg_oom_events),
> > -	T(test_memcg_swap_max),
> > +	T(test_memcg_swap_max_peak),
> >  	T(test_memcg_sock),
> >  	T(test_memcg_oom_group_leaf_events),
> >  	T(test_memcg_oom_group_parent_events),
> > --
> > 2.40.1
> >
>
>
> --
> David Finkel
> Senior Principal Software Engineer, Core Services

--
David Finkel
Senior Principal Software Engineer, Core Services
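
For illustration only, here is a minimal userspace sketch of how a work
scheduler might use the write handlers described in the quoted patch. It
assumes a kernel with the patch applied, where any non-empty write resets the
watermark. The helper names (read_peak, reset_peak, peak_for_work_item) are
hypothetical and are not part of the patch or of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a single-value cgroup file such as memory.peak.
 * Returns 0 and stores the byte count in *bytes, or -1 on error.
 */
static int read_peak(const char *path, unsigned long long *bytes)
{
	char buf[32];
	char *end;
	ssize_t len;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return -1;
	buf[len] = '\0';

	errno = 0;
	*bytes = strtoull(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}

/*
 * Reset the watermark. Assumption: the patched kernel ignores the content
 * of the write, so "\n" (as in the selftest) is as good as anything.
 */
static int reset_peak(const char *path)
{
	ssize_t len;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	len = write(fd, "\n", 1);
	close(fd);
	return len == 1 ? 0 : -1;
}

/*
 * Hypothetical per-work-item accounting: reset the watermark, run the
 * item, then sample the peak. The caller is responsible for making sure
 * run() executes inside the cgroup that owns the file at 'path'.
 */
static int peak_for_work_item(const char *path, void (*run)(void),
			      unsigned long long *bytes)
{
	if (reset_peak(path))
		return -1;
	run();
	return read_peak(path, bytes);
}
```

A caller might invoke peak_for_work_item("/sys/fs/cgroup/worker/memory.peak",
run_item, &bytes), where both the "worker" cgroup and run_item are made-up
names. The same helpers would apply unchanged to memory.swap.peak.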