From: David Finkel
Date: Mon, 15 Jul 2024 16:42:12 -0400
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
To: Muchun Song, Andrew Morton
Cc: core-services@vimeo.com, Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt, Shuah Khan, Johannes Weiner, Tejun Heo, Zefan Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
References: <20240715203625.1462309-1-davidf@vimeo.com> <20240715203625.1462309-2-davidf@vimeo.com>
In-Reply-To: <20240715203625.1462309-2-davidf@vimeo.com>

Note: this is a simple rebase of a patch I sent a few months ago, which
received two acks before the thread petered out:
https://www.spinics.net/lists/cgroups/msg40602.html

Thanks,

On Mon, Jul 15, 2024 at 4:38 PM David Finkel wrote:
>
> Other mechanisms for querying the peak memory usage of either a process
> or v1 memory cgroup allow for resetting the high watermark. Restore
> parity with those mechanisms.
>
> For example:
>  - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
>    the high watermark.
>  - writing "5" to the clear_refs pseudo-file in a processes's proc
>    directory resets the peak RSS.
>
> This change copies the cgroup v1 behavior so any write to the
> memory.peak and memory.swap.peak pseudo-files reset the high watermark
> to the current usage.
>
> This behavior is particularly useful for work scheduling systems that
> need to track memory usage of worker processes/cgroups per-work-item.
> Since memory can't be squeezed like CPU can (the OOM-killer has
> opinions), these systems need to track the peak memory usage to compute
> system/container fullness when binpacking workitems.
>
> Signed-off-by: David Finkel
> ---
>  Documentation/admin-guide/cgroup-v2.rst       | 20 +++---
>  mm/memcontrol.c                               | 23 ++++++
>  .../selftests/cgroup/test_memcontrol.c        | 72 ++++++++++++++++---
>  3 files changed, 99 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8fbb0519d556..201d8e5d9f82 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1322,11 +1322,13 @@ PAGE_SIZE multiple when read back.
>  	reclaim induced by memory.reclaim.
>
>    memory.peak
> -	A read-only single value file which exists on non-root
> -	cgroups.
> +	A read-write single value file which exists on non-root cgroups.
> +
> +	The max memory usage recorded for the cgroup and its descendants since
> +	either the creation of the cgroup or the most recent reset.
>
> -	The max memory usage recorded for the cgroup and its
> -	descendants since the creation of the cgroup.
> +	Any non-empty write to this file resets it to the current memory usage.
> +	All content written is completely ignored.
>
>    memory.oom.group
>  	A read-write single value file which exists on non-root
> @@ -1652,11 +1654,13 @@ PAGE_SIZE multiple when read back.
>  	Healthy workloads are not expected to reach this limit.
>
>    memory.swap.peak
> -	A read-only single value file which exists on non-root
> -	cgroups.
> +	A read-write single value file which exists on non-root cgroups.
> +
> +	The max swap usage recorded for the cgroup and its descendants since
> +	the creation of the cgroup or the most recent reset.
>
> -	The max swap usage recorded for the cgroup and its
> -	descendants since the creation of the cgroup.
> +	Any non-empty write to this file resets it to the current swap usage.
> +	All content written is completely ignored.
>
>    memory.swap.max
>  	A read-write single value file which exists on non-root
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8f2f1bb18c9c..abfa547615d6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -25,6 +25,7 @@
>   * Copyright (C) 2020 Alibaba, Inc, Alex Shi
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -6915,6 +6916,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css,
>  	return (u64)memcg->memory.watermark * PAGE_SIZE;
>  }
>
> +static ssize_t memory_peak_write(struct kernfs_open_file *of,
> +				 char *buf, size_t nbytes, loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +
> +	page_counter_reset_watermark(&memcg->memory);
> +
> +	return nbytes;
> +}
> +
>  static int memory_min_show(struct seq_file *m, void *v)
>  {
>  	return seq_puts_memcg_tunable(m,
> @@ -7232,6 +7243,7 @@ static struct cftype memory_files[] = {
>  		.name = "peak",
>  		.flags = CFTYPE_NOT_ON_ROOT,
>  		.read_u64 = memory_peak_read,
> +		.write = memory_peak_write,
>  	},
>  	{
>  		.name = "min",
> @@ -8201,6 +8213,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css,
>  	return (u64)memcg->swap.watermark * PAGE_SIZE;
>  }
>
> +static ssize_t swap_peak_write(struct kernfs_open_file *of,
> +			       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +
> +	page_counter_reset_watermark(&memcg->swap);
> +
> +	return nbytes;
> +}
> +
>  static int swap_high_show(struct seq_file *m, void *v)
>  {
>  	return seq_puts_memcg_tunable(m,
> @@ -8283,6 +8305,7 @@ static struct cftype swap_files[] = {
>  		.name = "swap.peak",
>  		.flags = CFTYPE_NOT_ON_ROOT,
>  		.read_u64 = swap_peak_read,
> +		.write = swap_peak_write,
>  	},
>  	{
>  		.name = "swap.events",
> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> index 41ae8047b889..681972de673b 100644
> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg)
>  /*
>   * This test create a memory cgroup, allocates
>   * some anonymous memory and some pagecache
> - * and check memory.current and some memory.stat values.
> + * and checks memory.current, memory.peak, and some memory.stat values.
>   */
> -static int test_memcg_current(const char *root)
> +static int test_memcg_current_peak(const char *root)
>  {
>  	int ret = KSFT_FAIL;
> -	long current;
> +	long current, peak, peak_reset;
>  	char *memcg;
>
>  	memcg = cg_name(root, "memcg_test");
> @@ -180,12 +180,32 @@ static int test_memcg_current(const char *root)
>  	if (current != 0)
>  		goto cleanup;
>
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak != 0)
> +		goto cleanup;
> +
>  	if (cg_run(memcg, alloc_anon_50M_check, NULL))
>  		goto cleanup;
>
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak < MB(50))
> +		goto cleanup;
> +
> +	peak_reset = cg_write(memcg, "memory.peak", "\n");
> +	if (peak_reset != 0)
> +		goto cleanup;
> +
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak > MB(30))
> +		goto cleanup;
> +
>  	if (cg_run(memcg, alloc_pagecache_50M_check, NULL))
>  		goto cleanup;
>
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak < MB(50))
> +		goto cleanup;
> +
>  	ret = KSFT_PASS;
>
>  cleanup:
> @@ -817,13 +837,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg)
>
>  /*
>   * This test checks that memory.swap.max limits the amount of
> - * anonymous memory which can be swapped out.
> + * anonymous memory which can be swapped out. Additionally, it verifies that
> + * memory.swap.peak reflects the high watermark and can be reset.
>   */
> -static int test_memcg_swap_max(const char *root)
> +static int test_memcg_swap_max_peak(const char *root)
>  {
>  	int ret = KSFT_FAIL;
>  	char *memcg;
> -	long max;
> +	long max, peak;
>
>  	if (!is_swap_enabled())
>  		return KSFT_SKIP;
> @@ -840,6 +861,12 @@ static int test_memcg_swap_max(const char *root)
>  		goto cleanup;
>  	}
>
> +	if (cg_read_long(memcg, "memory.swap.peak"))
> +		goto cleanup;
> +
> +	if (cg_read_long(memcg, "memory.peak"))
> +		goto cleanup;
> +
>  	if (cg_read_strcmp(memcg, "memory.max", "max\n"))
>  		goto cleanup;
>
> @@ -862,6 +889,27 @@ static int test_memcg_swap_max(const char *root)
>  	if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1)
>  		goto cleanup;
>
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak < MB(29))
> +		goto cleanup;
> +
> +	peak = cg_read_long(memcg, "memory.swap.peak");
> +	if (peak < MB(29))
> +		goto cleanup;
> +
> +	if (cg_write(memcg, "memory.swap.peak", "\n"))
> +		goto cleanup;
> +
> +	if (cg_read_long(memcg, "memory.swap.peak") > MB(10))
> +		goto cleanup;
> +
> +
> +	if (cg_write(memcg, "memory.peak", "\n"))
> +		goto cleanup;
> +
> +	if (cg_read_long(memcg, "memory.peak"))
> +		goto cleanup;
> +
>  	if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30)))
>  		goto cleanup;
>
> @@ -869,6 +917,14 @@ static int test_memcg_swap_max(const char *root)
>  	if (max <= 0)
>  		goto cleanup;
>
> +	peak = cg_read_long(memcg, "memory.peak");
> +	if (peak < MB(29))
> +		goto cleanup;
> +
> +	peak = cg_read_long(memcg, "memory.swap.peak");
> +	if (peak < MB(19))
> +		goto cleanup;
> +
>  	ret = KSFT_PASS;
>
>  cleanup:
> @@ -1295,7 +1351,7 @@ struct memcg_test {
>  	const char *name;
>  } tests[] = {
>  	T(test_memcg_subtree_control),
> -	T(test_memcg_current),
> +	T(test_memcg_current_peak),
>  	T(test_memcg_min),
>  	T(test_memcg_low),
>  	T(test_memcg_high),
> @@ -1303,7 +1359,7 @@ struct memcg_test {
>  	T(test_memcg_max),
>  	T(test_memcg_reclaim),
>  	T(test_memcg_oom_events),
> -	T(test_memcg_swap_max),
> +	T(test_memcg_swap_max_peak),
>  	T(test_memcg_sock),
>  	T(test_memcg_oom_group_leaf_events),
>  	T(test_memcg_oom_group_parent_events),
> --
> 2.40.1
>

-- 
David Finkel
Senior Principal Software Engineer, Core Services
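
For anyone who wants to see what the per-work-item use case might look like
from user space, here is a minimal sketch, assuming the write semantics
described in this patch (any non-empty write resets the watermark, and the
written content is ignored). The helper names peak_reset() and peak_read()
are hypothetical and not part of the patch or of any library; they only wrap
plain open/write/read calls on a path the caller supplies.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: reset a peak file by writing a single newline.
 * Under the semantics proposed in this patch, any non-empty write to
 * memory.peak or memory.swap.peak resets the high watermark to the
 * current usage. Returns 0 on success, -1 on error.
 */
int peak_reset(const char *path)
{
	int fd;
	ssize_t n;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, "\n", 1);
	close(fd);

	return n == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: read a single-value file such as memory.peak and
 * return its contents as a byte count, or -1 if the file cannot be read
 * or does not start with a number.
 */
long long peak_read(const char *path)
{
	char buf[64];
	char *end;
	long long value;
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	value = strtoll(buf, &end, 10);
	if (end == buf)
		return -1;

	return value;
}
```

A scheduler could call peak_reset() on a worker cgroup's memory.peak before
handing it a work item and peak_read() once the item completes. The cgroup
path (something under the cgroup2 mount, commonly /sys/fs/cgroup) is an
assumption that depends on the local hierarchy.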