From: "T.J. Mercier" <tjmercier@google.com>
Date: Tue, 30 Apr 2024 10:30:51 -0700
Subject: Re: [PATCH v3 4/8] memcg: reduce memory for the lruvec and memcg stats
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Yosry Ahmed, kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240430060612.2171650-5-shakeel.butt@linux.dev>

On Mon, Apr 29, 2024 at 11:06 PM Shakeel Butt wrote:
>
> At the moment, the amount of
> memory allocated for stats related structs
> in the mem_cgroup corresponds to the size of enum node_stat_item.
> However not all fields in enum node_stat_item has corresponding memcg

typo: "have corresponding"

> stats. So, let's use indirection mechanism similar to the one used for
> memcg vmstats management.
>
> For a given x86_64 config, the size of stats with and without patch is:
>
> structs                        size in bytes w/o    with
>
> struct lruvec_stats            1128                 648
> struct lruvec_stats_percpu     752                  432
> struct memcg_vmstats           1832                 1352
> struct memcg_vmstats_percpu    1280                 960
>
> The memory savings is further compounded by the fact that these structs
> are allocated for each cpu and for each node. To be precise, for each
> memcg the memory saved would be:
>
> Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODS * NR_CPUS) +

typo: "NR_NODES"

> (21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)
>
> Where 21 is the number of fields eliminated.
>
> Signed-off-by: Shakeel Butt
> ---
>
> Changes since v2:
> - N/A
>
>  mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 115 insertions(+), 23 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 434cff91b65e..f424c5b2ba9b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -576,35 +576,105 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
>  	return mz;
>  }
>
> +/* Subset of node_stat_item for memcg stats */
> +static const unsigned int memcg_node_stat_items[] = {
> +	NR_INACTIVE_ANON,
> +	NR_ACTIVE_ANON,
> +	NR_INACTIVE_FILE,
> +	NR_ACTIVE_FILE,
> +	NR_UNEVICTABLE,
> +	NR_SLAB_RECLAIMABLE_B,
> +	NR_SLAB_UNRECLAIMABLE_B,
> +	WORKINGSET_REFAULT_ANON,
> +	WORKINGSET_REFAULT_FILE,
> +	WORKINGSET_ACTIVATE_ANON,
> +	WORKINGSET_ACTIVATE_FILE,
> +	WORKINGSET_RESTORE_ANON,
> +	WORKINGSET_RESTORE_FILE,
> +	WORKINGSET_NODERECLAIM,
> +	NR_ANON_MAPPED,
> +	NR_FILE_MAPPED,
> +	NR_FILE_PAGES,
> +	NR_FILE_DIRTY,
> +	NR_WRITEBACK,
> +	NR_SHMEM,
> +	NR_SHMEM_THPS,
> +	NR_FILE_THPS,
> +	NR_ANON_THPS,
> +	NR_KERNEL_STACK_KB,
> +	NR_PAGETABLE,
> +	NR_SECONDARY_PAGETABLE,
> +#ifdef CONFIG_SWAP
> +	NR_SWAPCACHE,
> +#endif
> +};
> +
> +static const unsigned int memcg_stat_items[] = {
> +	MEMCG_SWAP,
> +	MEMCG_SOCK,
> +	MEMCG_PERCPU_B,
> +	MEMCG_VMALLOC,
> +	MEMCG_KMEM,
> +	MEMCG_ZSWAP_B,
> +	MEMCG_ZSWAPPED,
> +};
> +
> +#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> +#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
> +static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> +
> +static void init_memcg_stats(void)
> +{
> +	int8_t i, j = 0;
> +
> +	/* Switch to short once this failure occurs. */
> +	BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
> +
> +	for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
> +		mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
> +
> +	for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
> +		mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
> +}
> +
> +static inline int memcg_stats_index(int idx)
> +{
> +	return mem_cgroup_stats_index[idx] - 1;

Could this just be:

return mem_cgroup_stats_index[idx];

with a postfix increment of j in init_memcg_stats instead of prefix
increment?
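To make the trade-off in that question concrete, here is a stand-alone miniature of the indexing scheme (all names below are made up for illustration, not the kernel's). Because a static lookup table is zero-initialized, storing index+1 keeps the default value 0 free to mean "no memcg counterpart", which the -1 in the reader recovers; a 0-based table built with a postfix increment would instead need every untracked slot pre-filled with some other sentinel, since untracked items would otherwise alias index 0:

```c
#include <assert.h>

/* Illustrative stand-in for enum node_stat_item: only some entries
 * have memcg counterparts. */
enum demo_stat_item {
	DEMO_INACTIVE_ANON,
	DEMO_ACTIVE_ANON,
	DEMO_UNTRACKED_A,	/* no memcg counterpart */
	DEMO_FILE_MAPPED,
	DEMO_UNTRACKED_B,	/* no memcg counterpart */
	DEMO_NR_STAT,
};

/* Stand-in for memcg_node_stat_items[]: the tracked subset. */
static const unsigned int demo_memcg_items[] = {
	DEMO_INACTIVE_ANON,
	DEMO_ACTIVE_ANON,
	DEMO_FILE_MAPPED,
};
#define DEMO_NR_MEMCG \
	((int)(sizeof(demo_memcg_items) / sizeof(demo_memcg_items[0])))

/* Zero-initialized, like the static mem_cgroup_stats_index[] in the
 * patch. Entries hold index+1, so 0 doubles as "not tracked". */
static signed char demo_stats_index[DEMO_NR_STAT];

static void demo_init(void)
{
	int i, j = 0;

	for (i = 0; i < DEMO_NR_MEMCG; ++i)
		demo_stats_index[demo_memcg_items[i]] = ++j;
}

/* Returns the dense index for a tracked item, -1 otherwise. */
static int demo_index(int idx)
{
	return demo_stats_index[idx] - 1;
}
```

After demo_init(), demo_index() maps the three tracked items to 0, 1, 2 and both untracked items to -1, matching the `i < 0` checks in the patch's readers and updaters.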
> +}
> +
>  struct lruvec_stats_percpu {
>  	/* Local (CPU and cgroup) state */
> -	long state[NR_VM_NODE_STAT_ITEMS];
> +	long state[NR_MEMCG_NODE_STAT_ITEMS];
>
>  	/* Delta calculation for lockless upward propagation */
> -	long state_prev[NR_VM_NODE_STAT_ITEMS];
> +	long state_prev[NR_MEMCG_NODE_STAT_ITEMS];
>  };
>
>  struct lruvec_stats {
>  	/* Aggregated (CPU and subtree) state */
> -	long state[NR_VM_NODE_STAT_ITEMS];
> +	long state[NR_MEMCG_NODE_STAT_ITEMS];
>
>  	/* Non-hierarchical (CPU aggregated) state */
> -	long state_local[NR_VM_NODE_STAT_ITEMS];
> +	long state_local[NR_MEMCG_NODE_STAT_ITEMS];
>
>  	/* Pending child counts during tree propagation */
> -	long state_pending[NR_VM_NODE_STAT_ITEMS];
> +	long state_pending[NR_MEMCG_NODE_STAT_ITEMS];
>  };
>
>  unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
>  {
>  	struct mem_cgroup_per_node *pn;
> -	long x;
> +	long x = 0;
> +	int i;
>
>  	if (mem_cgroup_disabled())
>  		return node_page_state(lruvec_pgdat(lruvec), idx);
>
> -	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> -	x = READ_ONCE(pn->lruvec_stats->state[idx]);
> +	i = memcg_stats_index(idx);
> +	if (i >= 0) {
> +		pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> +		x = READ_ONCE(pn->lruvec_stats->state[i]);
> +	}
>  #ifdef CONFIG_SMP
>  	if (x < 0)
>  		x = 0;
> @@ -617,12 +687,16 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>  {
>  	struct mem_cgroup_per_node *pn;
>  	long x = 0;
> +	int i;
>
>  	if (mem_cgroup_disabled())
>  		return node_page_state(lruvec_pgdat(lruvec), idx);
>
> -	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> -	x = READ_ONCE(pn->lruvec_stats->state_local[idx]);
> +	i = memcg_stats_index(idx);
> +	if (i >= 0) {
> +		pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> +		x = READ_ONCE(pn->lruvec_stats->state_local[i]);
> +	}
>  #ifdef CONFIG_SMP
>  	if (x < 0)
>  		x = 0;
> @@ -689,11 +763,11 @@ struct memcg_vmstats_percpu {
>  	/* The above should fit a single cacheline for memcg_rstat_updated() */
>
>  	/* Local (CPU and cgroup) page state & events */
> -	long state[MEMCG_NR_STAT];
> +	long state[NR_MEMCG_STATS];
>  	unsigned long events[NR_MEMCG_EVENTS];
>
>  	/* Delta calculation for lockless upward propagation */
> -	long state_prev[MEMCG_NR_STAT];
> +	long state_prev[NR_MEMCG_STATS];
>  	unsigned long events_prev[NR_MEMCG_EVENTS];
>
>  	/* Cgroup1: threshold notifications & softlimit tree updates */
> @@ -703,15 +777,15 @@ struct memcg_vmstats_percpu {
>
>  struct memcg_vmstats {
>  	/* Aggregated (CPU and subtree) page state & events */
> -	long state[MEMCG_NR_STAT];
> +	long state[NR_MEMCG_STATS];
>  	unsigned long events[NR_MEMCG_EVENTS];
>
>  	/* Non-hierarchical (CPU aggregated) page state & events */
> -	long state_local[MEMCG_NR_STAT];
> +	long state_local[NR_MEMCG_STATS];
>  	unsigned long events_local[NR_MEMCG_EVENTS];
>
>  	/* Pending child counts during tree propagation */
> -	long state_pending[MEMCG_NR_STAT];
> +	long state_pending[NR_MEMCG_STATS];
>  	unsigned long events_pending[NR_MEMCG_EVENTS];
>
>  	/* Stats updates since the last flush */
> @@ -844,7 +918,13 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
>
>  unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
>  {
> -	long x = READ_ONCE(memcg->vmstats->state[idx]);
> +	long x;
> +	int i = memcg_stats_index(idx);
> +
> +	if (i < 0)
> +		return 0;
> +
> +	x = READ_ONCE(memcg->vmstats->state[i]);
>  #ifdef CONFIG_SMP
>  	if (x < 0)
>  		x = 0;
> @@ -876,18 +956,25 @@ static int memcg_state_val_in_pages(int idx, int val)
>   */
>  void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
>  {
> -	if (mem_cgroup_disabled())
> +	int i = memcg_stats_index(idx);
> +
> +	if (mem_cgroup_disabled() || i < 0)
>  		return;
>
> -	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
> +	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
>  	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
>  }
>
>  /* idx can be of type enum memcg_stat_item or node_stat_item. */
>  static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
>  {
> -	long x = READ_ONCE(memcg->vmstats->state_local[idx]);
> +	long x;
> +	int i = memcg_stats_index(idx);
> +
> +	if (i < 0)
> +		return 0;
>
> +	x = READ_ONCE(memcg->vmstats->state_local[i]);
>  #ifdef CONFIG_SMP
>  	if (x < 0)
>  		x = 0;
> @@ -901,6 +988,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
>  {
>  	struct mem_cgroup_per_node *pn;
>  	struct mem_cgroup *memcg;
> +	int i = memcg_stats_index(idx);
> +
> +	if (i < 0)
> +		return;
>
>  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
>  	memcg = pn->memcg;
> @@ -930,10 +1021,10 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
>  	}
>
>  	/* Update memcg */
> -	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
> +	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
>
>  	/* Update lruvec */
> -	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
> +	__this_cpu_add(pn->lruvec_stats_percpu->state[i], val);
>
>  	memcg_rstat_updated(memcg, memcg_state_val_in_pages(idx, val));
>  	memcg_stats_unlock();
> @@ -5702,6 +5793,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  		page_counter_init(&memcg->kmem, &parent->kmem);
>  		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
>  	} else {
> +		init_memcg_stats();
>  		init_memcg_events();
>  		page_counter_init(&memcg->memory, NULL);
>  		page_counter_init(&memcg->swap, NULL);
> @@ -5873,7 +5965,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>
>  	statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
>
> -	for (i = 0; i < MEMCG_NR_STAT; i++) {
> +	for (i = 0; i < NR_MEMCG_STATS; i++) {
>  		/*
>  		 * Collect the aggregated propagation counts of groups
>  		 * below us. We're in a per-cpu loop here and this is
> @@ -5937,7 +6029,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>
>  	lstatc = per_cpu_ptr(pn->lruvec_stats_percpu, cpu);
>
> -	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> +	for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; i++) {
>  		delta = lstats->state_pending[i];
>  		if (delta)
>  			lstats->state_pending[i] = 0;
> --
> 2.43.0
>