From: Muchun Song <songmuchun@bytedance.com>
To: willy@infradead.org, akpm@linux-foundation.org, hannes@cmpxchg.org,
	mhocko@kernel.org, vdavydov.dev@gmail.com, shakeelb@google.com,
	roman.gushchin@linux.dev, shy828301@gmail.com, alexs@kernel.org,
	richard.weiyang@gmail.com, david@fromorbit.com,
	trond.myklebust@hammerspace.com, anna.schumaker@netapp.com,
	jaegeuk@kernel.org, chao@kernel.org, kari.argillander@gmail.com,
	vbabka@suse.cz
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	zhengqi.arch@bytedance.com, duanxiongchun@bytedance.com,
	fam.zheng@bytedance.com, smuchun@gmail.com,
	Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v6 01/16] mm: list_lru: transpose the array of per-node per-memcg lru lists
Date: Mon, 28 Feb 2022 20:21:11 +0800
Message-Id: <20220228122126.37293-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220228122126.37293-1-songmuchun@bytedance.com>
References: <20220228122126.37293-1-songmuchun@bytedance.com>
MIME-Version: 1.0

The current scheme of maintaining per-node per-memcg lru lists looks like:

  struct list_lru {
      struct list_lru_node *node;               (for each node)
          struct list_lru_memcg *memcg_lrus;
              struct list_lru_one *lru[];       (for each memcg)
  }

By effectively transposing the two-dimensional array of list_lru_one
structures (per-node per-memcg => per-memcg per-node) it's possible to save
some memory and simplify the alloc/dealloc paths. The new scheme looks like:

  struct list_lru {
      struct list_lru_memcg *mlrus;
          struct list_lru_per_memcg *mlru[];    (for each memcg)
              struct list_lru_one node[0];      (for each node)
  }

Memory savings come not only from the 'struct rcu_head' but also from the
pointer arrays used to store the pointers to 'struct list_lru_one'. Each
such array is per node and its size is 8 (a pointer) * num_memcgs, so the
total size of the arrays is 8 * num_nodes * memcg_nr_cache_ids. After this
patch, the size becomes 8 * memcg_nr_cache_ids.
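Roughly, a lookup of the lru list for a given memcg and node changes as in
the minimal sketch below (RCU dereference, locking and the root-cgroup
fallback elided; see list_lru_from_memcg_idx() in the diff):

  /* before: per-node pointer array indexed by memcg_cache_id */
  l = lru->node[nid].memcg_lrus->lru[memcg_idx];

  /* after: per-memcg array of per-node lists, indexed by node id */
  l = &lru->mlrus->mlru[memcg_idx]->node[nid];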
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/list_lru.h |  17 ++--
 mm/list_lru.c            | 206 +++++++++++++++++------------------------------
 2 files changed, 86 insertions(+), 137 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 1b5fceb565df..729a27b6ff53 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -31,10 +31,15 @@ struct list_lru_one {
 	long			nr_items;
 };
 
+struct list_lru_per_memcg {
+	/* array of per cgroup per node lists, indexed by node id */
+	struct list_lru_one	node[0];
+};
+
 struct list_lru_memcg {
-	struct rcu_head		rcu;
+	struct rcu_head			rcu;
 	/* array of per cgroup lists, indexed by memcg_cache_id */
-	struct list_lru_one	*lru[];
+	struct list_lru_per_memcg	*mlru[];
 };
 
 struct list_lru_node {
@@ -42,11 +47,7 @@ struct list_lru_node {
 	spinlock_t		lock;
 	/* global list, used for the root cgroup in cgroup aware lrus */
 	struct list_lru_one	lru;
-#ifdef CONFIG_MEMCG_KMEM
-	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
-	struct list_lru_memcg	__rcu *memcg_lrus;
-#endif
-	long nr_items;
+	long			nr_items;
 } ____cacheline_aligned_in_smp;
 
 struct list_lru {
@@ -55,6 +56,8 @@ struct list_lru {
 	struct list_head	list;
 	int			shrinker_id;
 	bool			memcg_aware;
+	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
+	struct list_lru_memcg	__rcu *mlrus;
 #endif
 };
 
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 0cd5e89ca063..7d1356241aa8 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -49,35 +49,37 @@ static int lru_shrinker_id(struct list_lru *lru)
 }
 
 static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 {
-	struct list_lru_memcg *memcg_lrus;
+	struct list_lru_memcg *mlrus;
+	struct list_lru_node *nlru = &lru->node[nid];
+
 	/*
 	 * Either lock or RCU protects the array of per cgroup lists
-	 * from relocation (see memcg_update_list_lru_node).
+	 * from relocation (see memcg_update_list_lru).
 	 */
-	memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
-					   lockdep_is_held(&nlru->lock));
-	if (memcg_lrus && idx >= 0)
-		return memcg_lrus->lru[idx];
+	mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
+	if (mlrus && idx >= 0)
+		return &mlrus->mlru[idx]->node[nid];
 	return &nlru->lru;
 }
 
 static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
 		   struct mem_cgroup **memcg_ptr)
 {
+	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l = &nlru->lru;
 	struct mem_cgroup *memcg = NULL;
 
-	if (!nlru->memcg_lrus)
+	if (!lru->mlrus)
 		goto out;
 
 	memcg = mem_cgroup_from_obj(ptr);
 	if (!memcg)
 		goto out;
 
-	l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+	l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
 out:
 	if (memcg_ptr)
 		*memcg_ptr = memcg;
@@ -103,18 +105,18 @@ static inline bool list_lru_memcg_aware(struct list_lru *lru)
 }
 
 static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 {
-	return &nlru->lru;
+	return &lru->node[nid].lru;
 }
 
 static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
 		   struct mem_cgroup **memcg_ptr)
 {
 	if (memcg_ptr)
 		*memcg_ptr = NULL;
-	return &nlru->lru;
+	return &lru->node[nid].lru;
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
@@ -127,7 +129,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 
 	spin_lock(&nlru->lock);
 	if (list_empty(item)) {
-		l = list_lru_from_kmem(nlru, item, &memcg);
+		l = list_lru_from_kmem(lru, nid, item, &memcg);
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
@@ -150,7 +152,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 
 	spin_lock(&nlru->lock);
 	if (!list_empty(item)) {
-		l = list_lru_from_kmem(nlru, item, NULL);
+		l = list_lru_from_kmem(lru, nid, item, NULL);
 		list_del_init(item);
 		l->nr_items--;
 		nlru->nr_items--;
@@ -180,12 +182,11 @@ EXPORT_SYMBOL_GPL(list_lru_isolate_move);
 unsigned long list_lru_count_one(struct list_lru *lru,
 				 int nid, struct mem_cgroup *memcg)
 {
-	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
 	long count;
 
 	rcu_read_lock();
-	l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+	l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
 	count = READ_ONCE(l->nr_items);
 	rcu_read_unlock();
 
@@ -206,16 +207,16 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
 static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
 		    list_lru_walk_cb isolate, void *cb_arg,
 		    unsigned long *nr_to_walk)
 {
-
+	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
 
-	l = list_lru_from_memcg_idx(nlru, memcg_idx);
+	l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
 restart:
 	list_for_each_safe(item, n, &l->list) {
 		enum lru_status ret;
@@ -272,8 +273,8 @@ list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long ret;
 
 	spin_lock(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
-				  nr_to_walk);
+	ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+				  cb_arg, nr_to_walk);
 	spin_unlock(&nlru->lock);
 	return ret;
 }
@@ -288,8 +289,8 @@ list_lru_walk_one_irq(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long ret;
 
 	spin_lock_irq(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
-				  nr_to_walk);
+	ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+				  cb_arg, nr_to_walk);
 	spin_unlock_irq(&nlru->lock);
 	return ret;
 }
@@ -308,7 +309,7 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
 			struct list_lru_node *nlru = &lru->node[nid];
 
 			spin_lock(&nlru->lock);
-			isolated += __list_lru_walk_one(nlru, memcg_idx,
+			isolated += __list_lru_walk_one(lru, nid, memcg_idx,
 							isolate, cb_arg,
 							nr_to_walk);
 			spin_unlock(&nlru->lock);
@@ -328,166 +329,111 @@ static void init_one_lru(struct list_lru_one *l)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void __memcg_destroy_list_lru_node(struct list_lru_memcg *memcg_lrus,
-					  int begin, int end)
+static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus,
+					 int begin, int end)
 {
 	int i;
 
 	for (i = begin; i < end; i++)
-		kfree(memcg_lrus->lru[i]);
+		kfree(mlrus->mlru[i]);
 }
 
-static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus,
-				      int begin, int end)
+static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
+				     int begin, int end)
 {
 	int i;
 
 	for (i = begin; i < end; i++) {
-		struct list_lru_one *l;
+		int nid;
+		struct list_lru_per_memcg *mlru;
 
-		l = kmalloc(sizeof(struct list_lru_one), GFP_KERNEL);
-		if (!l)
+		mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL);
+		if (!mlru)
 			goto fail;
 
-		init_one_lru(l);
-		memcg_lrus->lru[i] = l;
+		for_each_node(nid)
+			init_one_lru(&mlru->node[nid]);
+		mlrus->mlru[i] = mlru;
 	}
 	return 0;
 fail:
-	__memcg_destroy_list_lru_node(memcg_lrus, begin, i);
+	memcg_destroy_list_lru_range(mlrus, begin, i);
 	return -ENOMEM;
 }
 
-static int memcg_init_list_lru_node(struct list_lru_node *nlru)
+static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
 {
-	struct list_lru_memcg *memcg_lrus;
+	struct list_lru_memcg *mlrus;
 	int size = memcg_nr_cache_ids;
 
-	memcg_lrus = kvmalloc(struct_size(memcg_lrus, lru, size), GFP_KERNEL);
-	if (!memcg_lrus)
+	lru->memcg_aware = memcg_aware;
+	if (!memcg_aware)
+		return 0;
+
+	mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
+	if (!mlrus)
 		return -ENOMEM;
 
-	if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) {
-		kvfree(memcg_lrus);
+	if (memcg_init_list_lru_range(mlrus, 0, size)) {
+		kvfree(mlrus);
 		return -ENOMEM;
 	}
-	RCU_INIT_POINTER(nlru->memcg_lrus, memcg_lrus);
+	RCU_INIT_POINTER(lru->mlrus, mlrus);
 
 	return 0;
 }
 
-static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
+static void memcg_destroy_list_lru(struct list_lru *lru)
 {
-	struct list_lru_memcg *memcg_lrus;
+	struct list_lru_memcg *mlrus;
+
+	if (!list_lru_memcg_aware(lru))
+		return;
+
 	/*
 	 * This is called when shrinker has already been unregistered,
 	 * and nobody can use it. So, there is no need to use kvfree_rcu().
 	 */
-	memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
-	__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
-	kvfree(memcg_lrus);
+	mlrus = rcu_dereference_protected(lru->mlrus, true);
+	memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids);
+	kvfree(mlrus);
 }
 
-static int memcg_update_list_lru_node(struct list_lru_node *nlru,
-				      int old_size, int new_size)
+static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
 {
 	struct list_lru_memcg *old, *new;
 
 	BUG_ON(old_size > new_size);
 
-	old = rcu_dereference_protected(nlru->memcg_lrus,
+	old = rcu_dereference_protected(lru->mlrus,
 					lockdep_is_held(&list_lrus_mutex));
-	new = kvmalloc(struct_size(new, lru, new_size), GFP_KERNEL);
+	new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL);
 	if (!new)
 		return -ENOMEM;
 
-	if (__memcg_init_list_lru_node(new, old_size, new_size)) {
+	if (memcg_init_list_lru_range(new, old_size, new_size)) {
 		kvfree(new);
 		return -ENOMEM;
 	}
 
-	memcpy(&new->lru, &old->lru, flex_array_size(new, lru, old_size));
-	rcu_assign_pointer(nlru->memcg_lrus, new);
+	memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
+	rcu_assign_pointer(lru->mlrus, new);
 	kvfree_rcu(old, rcu);
 	return 0;
 }
 
-static void memcg_cancel_update_list_lru_node(struct list_lru_node *nlru,
-					      int old_size, int new_size)
-{
-	struct list_lru_memcg *memcg_lrus;
-
-	memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus,
-					lockdep_is_held(&list_lrus_mutex));
-	/* do not bother shrinking the array back to the old size, because we
-	 * cannot handle allocation failures here */
-	__memcg_destroy_list_lru_node(memcg_lrus, old_size, new_size);
-}
-
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
-{
-	int i;
-
-	lru->memcg_aware = memcg_aware;
-
-	if (!memcg_aware)
-		return 0;
-
-	for_each_node(i) {
-		if (memcg_init_list_lru_node(&lru->node[i]))
-			goto fail;
-	}
-	return 0;
-fail:
-	for (i = i - 1; i >= 0; i--) {
-		if (!lru->node[i].memcg_lrus)
-			continue;
-		memcg_destroy_list_lru_node(&lru->node[i]);
-	}
-	return -ENOMEM;
-}
-
-static void memcg_destroy_list_lru(struct list_lru *lru)
-{
-	int i;
-
-	if (!list_lru_memcg_aware(lru))
-		return;
-
-	for_each_node(i)
-		memcg_destroy_list_lru_node(&lru->node[i]);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru,
-				 int old_size, int new_size)
-{
-	int i;
-
-	for_each_node(i) {
-		if (memcg_update_list_lru_node(&lru->node[i],
-					       old_size, new_size))
-			goto fail;
-	}
-	return 0;
-fail:
-	for (i = i - 1; i >= 0; i--) {
-		if (!lru->node[i].memcg_lrus)
-			continue;
-
-		memcg_cancel_update_list_lru_node(&lru->node[i],
-						  old_size, new_size);
-	}
-	return -ENOMEM;
-}
-
 static void memcg_cancel_update_list_lru(struct list_lru *lru,
 					 int old_size, int new_size)
 {
-	int i;
+	struct list_lru_memcg *mlrus;
 
-	for_each_node(i)
-		memcg_cancel_update_list_lru_node(&lru->node[i],
-						  old_size, new_size);
+	mlrus = rcu_dereference_protected(lru->mlrus,
+					  lockdep_is_held(&list_lrus_mutex));
+	/*
+	 * Do not bother shrinking the array back to the old size, because we
+	 * cannot handle allocation failures here.
	 */
+	memcg_destroy_list_lru_range(mlrus, old_size, new_size);
 }
 
 int memcg_update_all_list_lrus(int new_size)
@@ -524,8 +470,8 @@ static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
 	 */
 	spin_lock_irq(&nlru->lock);
 
-	src = list_lru_from_memcg_idx(nlru, src_idx);
-	dst = list_lru_from_memcg_idx(nlru, dst_idx);
+	src = list_lru_from_memcg_idx(lru, nid, src_idx);
+	dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
 
 	list_splice_init(&src->list, &dst->list);
 
-- 
2.11.0