hi, Shakeel,

On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> >
> [...]
> >
> > we reported regression on a 2-node Skylake server. so I found a 1-node
> > Skylake desktop (we don't have a 1-node server) to check.
> >
> Please try the following patch on both single node and dual node
> machines:

the regression is partially recovered by applying your patch (but one
case regressed even more, as shown below). details:

since you mentioned the whole patch-set behavior last time, I applied
the patch on top of
  a94032b35e5f9 memcg: use proper type for mod_memcg_state

below,
  fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch

for the regression in our original report, the test machine is:

model: Skylake
nr_node: 2
nr_cpu: 104
memory: 192G

the regression is partially recovered:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     91713           -13.0%      79833            -4.5%      87614        will-it-scale.per_process_ops

detailed data is in part [1] of the attachment.

in later threads, we also reported similar regressions on other
platforms.

on:
model: Ice Lake
nr_node: 2
nr_cpu: 64
memory: 256G
brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

the regression is partially recovered, though less clearly than above:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    240373           -12.9%     209394           -10.1%     215996        will-it-scale.per_process_ops

detailed data is in part [2] of the attachment.

on:
model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX

the regression is NOT recovered, and is even slightly worse:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     78072            -5.6%      73683            -6.5%      72975        will-it-scale.per_process_ops

detailed data is in part [3] of the attachment.
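btw, in case it helps narrow down the remaining gap on Sapphire Rapids,
a cacheline-contention profile should show whether the hot cacheline
moved or actually went away. a rough sketch of how such a profile could
be collected with perf's c2c mode while page_fault2 is running (just a
suggestion for further analysis; no such data is attached here):

  # sample cache-to-cache transfers system-wide for 10 seconds while
  # the benchmark runs, then summarize the contended cachelines
  perf c2c record -a -- sleep 10
  perf c2c report --stats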
for single-node machines, we reported last time that there was no
regression on:

model: Skylake
nr_node: 1
nr_cpu: 36
memory: 32G
brand: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

we confirmed it's not impacted by this new patch, either:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-d08/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    136040            -0.1%     135881            -0.1%     135953        will-it-scale.per_process_ops

if you need detailed data for this comparison, please let us know.

BTW, after the last update, we found another single-node machine which
can reproduce the regression in our original report:

model: Cascade Lake
nr_node: 1
nr_cpu: 36
memory: 128G
brand: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

the regression is also partially recovered now:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-csl-d02/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    187483           -19.4%     151162           -12.1%     164714        will-it-scale.per_process_ops

detailed data is in part [4] of the attachment.

>
> From 00a84b489b9e18abd1b8ec575ea31afacaf0734b Mon Sep 17 00:00:00 2001
> From: Shakeel Butt
> Date: Tue, 21 May 2024 20:27:11 -0700
> Subject: [PATCH] memcg: rearrange fields of mem_cgroup_per_node
>
> At the moment the fields of mem_cgroup_per_node which get read on the
> performance-critical path share the cacheline with the fields which
> might get updated. This causes contention on that cacheline for
> concurrent readers. Let's move all the read-only pointers to the start
> of the struct, followed by the memcg-v1-only fields, with the fields
> which get updated often at the end.
>
> Signed-off-by: Shakeel Butt
> ---
>  include/linux/memcontrol.h | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..16efd9737be9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -96,23 +96,25 @@ struct mem_cgroup_reclaim_iter {
>   * per-node information in memory controller.
>   */
>  struct mem_cgroup_per_node {
> -	struct lruvec		lruvec;
> +	/* Keep the read-only fields at the start */
> +	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
> +						/* use container_of	   */
>
>  	struct lruvec_stats_percpu __percpu	*lruvec_stats_percpu;
>  	struct lruvec_stats			*lruvec_stats;
> -
> -	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> -
> -	struct mem_cgroup_reclaim_iter	iter;
> -
>  	struct shrinker_info __rcu	*shrinker_info;
>
> +	/* memcg-v1 only stuff in middle */
> +
>  	struct rb_node		tree_node;	/* RB tree node */
>  	unsigned long		usage_in_excess;/* Set to the value by which */
>  						/* the soft limit is exceeded*/
>  	bool			on_tree;
> -	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
> -						/* use container_of	   */
> +
> +	/* Fields which get updated often at the end */
> +	struct lruvec		lruvec;
> +	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> +	struct mem_cgroup_reclaim_iter	iter;
>  };
>
>  struct mem_cgroup_threshold {
> --
> 2.43.0
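
one more note on the layout change itself: the patch relies on field
order alone, so whether the read-mostly pointers and the hot fields
really land on different cachelines still depends on the sizes and
alignment of everything in between (pahole -C mem_cgroup_per_node
vmlinux shows the actual boundaries). below is a minimal userspace
sketch of the same idea with the separation made explicit -- the field
names are hypothetical, and the kernel equivalent of the alignment
annotation would be ____cacheline_aligned_in_smp:

  /* build: cc -std=c11 -o layout_sketch layout_sketch.c */
  #include <stdalign.h>
  #include <stdatomic.h>
  #include <stddef.h>
  #include <stdio.h>

  #define CACHELINE 64    /* assumed cacheline size */

  struct pernode_sketch {
          /* read-mostly: written once at init, then only read on the
           * fast path (stand-ins for the memcg/lruvec_stats pointers) */
          const void *backing;
          const void *stats;

          /* frequently written: forced onto its own cacheline so that
           * stores here do not invalidate the line the readers above
           * are pulling the pointers from */
          alignas(CACHELINE) atomic_long hot_counter;
  };

  /* fails to build if the hot field shares a line with the pointers */
  _Static_assert(offsetof(struct pernode_sketch, hot_counter) % CACHELINE == 0,
                 "hot field not cacheline aligned");

  int main(void)
  {
          printf("size=%zu, hot_counter offset=%zu\n",
                 sizeof(struct pernode_sketch),
                 offsetof(struct pernode_sketch, hot_counter));
          return 0;
  }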