From: Frank Hofmann
Date: Wed, 2 Mar 2022 11:49:58 +0000
Subject: Re: Regression in workingset_refault latency on 5.15
To: Shakeel Butt
Cc: Ivan Babrou, Michal Koutný, Daniel Dao, kernel-team, Linux MM,
 Johannes Weiner, Roman Gushchin, Feng Tang, Michal Hocko, Hillf Danton,
 Andrew Morton, Linus Torvalds
In-Reply-To: <20220302025022.nnmpwxmkqed2icck@google.com>
References: <20220224185236.qgzm3jpoz2orjfcw@google.com>
 <20220225180345.GD12037@blackbody.suse.cz>
 <20220228230949.xrmy6j2glxsoffko@google.com>
 <20220302025022.nnmpwxmkqed2icck@google.com>

On Wed, Mar 2, 2022 at 2:50 AM 'Shakeel Butt' via kernel-team+notifications wrote:
>
> On Tue, Mar 01, 2022 at 04:48:00PM -0800, Ivan Babrou wrote:
> > [...]
>
> > Looks like you were right that targeted flush is not going to be as good.
> >
> Thanks a lot. Can you please try the following patch (independent of other
> patches) as well?

Hi Shakeel,

maybe it's just me, but the code change below confuses me. I'll comment
inline.

>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d9b8df5ef212..499f75e066f3 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -75,29 +75,8 @@ enum mem_cgroup_events_target {
>         MEM_CGROUP_NTARGETS,
>  };
>
> -struct memcg_vmstats_percpu {
> -        /* Local (CPU and cgroup) page state & events */
> -        long state[MEMCG_NR_STAT];
> -        unsigned long events[NR_VM_EVENT_ITEMS];
> -
> -        /* Delta calculation for lockless upward propagation */
> -        long state_prev[MEMCG_NR_STAT];
> -        unsigned long events_prev[NR_VM_EVENT_ITEMS];
> -
> -        /* Cgroup1: threshold notifications & softlimit tree updates */
> -        unsigned long nr_page_events;
> -        unsigned long targets[MEM_CGROUP_NTARGETS];
> -};
> -
> -struct memcg_vmstats {
> -        /* Aggregated (CPU and subtree) page state & events */
> -        long state[MEMCG_NR_STAT];
> -        unsigned long events[NR_VM_EVENT_ITEMS];
> -
> -        /* Pending child counts during tree propagation */
> -        long state_pending[MEMCG_NR_STAT];
> -        unsigned long events_pending[NR_VM_EVENT_ITEMS];
> -};
> +struct memcg_vmstats_percpu;
> +struct memcg_vmstats;
>
>  struct mem_cgroup_reclaim_iter {
>         struct mem_cgroup *position;
> @@ -304,7 +283,7 @@ struct mem_cgroup {
>         MEMCG_PADDING(_pad1_);
>
>         /* memory.stat */
> -       struct memcg_vmstats    vmstats;
> +       struct memcg_vmstats    *vmstats;
>
>         /* memory.events */
>         atomic_long_t           memory_events[MEMCG_NR_MEMORY_EVENTS];
> @@ -964,11 +943,7 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
>                 local_irq_restore(flags);
>  }
>
> -static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
> -{
> -        return READ_ONCE(memcg->vmstats.state[idx]);
> -}
> -
> +unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx);
>  static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
>                                                enum node_stat_item idx)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 32ba963ebf2e..c65f155c2048 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -688,6 +688,71 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
>         queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
>  }
>
> +static const unsigned int memcg_vm_events[] = {
> +        PGPGIN,
> +        PGPGOUT,
> +        PGFAULT,
> +        PGMAJFAULT,
> +        PGREFILL,
> +        PGSCAN_KSWAPD,
> +        PGSCAN_DIRECT,
> +        PGSTEAL_KSWAPD,
> +        PGSTEAL_DIRECT,
> +        PGACTIVATE,
> +        PGDEACTIVATE,
> +        PGLAZYFREE,
> +        PGLAZYFREED,
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +        THP_FAULT_ALLOC,
> +        THP_COLLAPSE_ALLOC,
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +};
> +#define NR_MEMCG_EVENTS ARRAY_SIZE(memcg_vm_events)
> +
> +static int memcg_events_index[NR_VM_EVENT_ITEMS] __read_mostly;
> +
> +static void init_memcg_events(void)
> +{
> +        int i;
> +
> +        for (i = 0; i < NR_MEMCG_EVENTS; ++i)
> +                memcg_events_index[memcg_vm_events[i]] = i + 1;
> +}
> +
> +static int get_memcg_events_index(enum vm_event_item idx)
> +{
> +        return memcg_events_index[idx] - 1;
> +}

This is the part that confuses me. The table memcg_vm_events[] is basically
a membership ("truth") set: if an event is in the set, memcg_events_index[]
gets an entry for it. But the code re-bases the index array to start at one,
and the accessor then turns the "unsigned zero" (event not in the set) back
into a "signed -1" ... all in the service of a membership test. (See the
sketch at the end of this mail for how I read the scheme.)

> +
> +struct memcg_vmstats_percpu {
> +        /* Local (CPU and cgroup) page state & events */
> +        long state[MEMCG_NR_STAT];
> +        unsigned long events[NR_MEMCG_EVENTS];
> +
> +        /* Delta calculation for lockless upward propagation */
> +        long state_prev[MEMCG_NR_STAT];
> +        unsigned long events_prev[NR_MEMCG_EVENTS];
> +
> +        /* Cgroup1: threshold notifications & softlimit tree updates */
> +        unsigned long nr_page_events;
> +        unsigned long targets[MEM_CGROUP_NTARGETS];
> +};
> +
> +struct memcg_vmstats {
> +        /* Aggregated (CPU and subtree) page state & events */
> +        long state[MEMCG_NR_STAT];
> +        unsigned long events[NR_MEMCG_EVENTS];
> +
> +        /* Pending child counts during tree propagation */
> +        long state_pending[MEMCG_NR_STAT];
> +        unsigned long events_pending[NR_MEMCG_EVENTS];
> +};
> +
> +unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
> +{
> +        return READ_ONCE(memcg->vmstats->state[idx]);
> +}
> +
>  /**
>   * __mod_memcg_state - update cgroup memory statistics
>   * @memcg: the memory cgroup
> @@ -831,25 +896,33 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
>  void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>                            unsigned long count)
>  {
> -        if (mem_cgroup_disabled())
> +        int index = get_memcg_events_index(idx);
> +
> +        if (mem_cgroup_disabled() || index < 0)

And this is where it gets used; basically, it says "if the event is not in
the monitored set, return". Likewise in all the hunks below. Why not make
this explicit? Something like:

static int memcg_event_exists_percg(int index)
{
        switch (index) {
        case PGPGIN:
        case PGPGOUT:
        case PGFAULT:
        case PGMAJFAULT:
        case PGREFILL:
        case PGSCAN_KSWAPD:
        case PGSCAN_DIRECT:
        case PGSTEAL_KSWAPD:
        case PGSTEAL_DIRECT:
        case PGACTIVATE:
        case PGDEACTIVATE:
        case PGLAZYFREE:
        case PGLAZYFREED:
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
        case THP_FAULT_ALLOC:
        case THP_COLLAPSE_ALLOC:
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
                return 1;
        default:
                return 0;
        }
}

and let the compiler choose how to lay that out at each use site?

>                  return;
>
> -        __this_cpu_add(memcg->vmstats_percpu->events[idx], count);
> +        __this_cpu_add(memcg->vmstats_percpu->events[index], count);

[ ... ]

I understand there is a memory saving from the smaller table sizes, since
the vm events tracked per memcg are a (small) subset of all vm events. How
relevant is that saving to this change, though?

Best regards,
Frank Hofmann
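
P.S.: to make sure I'm reading the init/lookup pair right, here is a tiny
stand-alone sketch of the scheme as I understand it. It is userspace-only
and uses a made-up, much smaller enum in place of vm_event_item (the names
and sizes here are hypothetical), so it only demonstrates the +1/-1
re-basing, not the patched kernel code:

#include <stdio.h>

enum vm_event_item { PGPGIN, PGPGOUT, PGFAULT, PGMAJFAULT, SWAP_RA,
                     NR_VM_EVENT_ITEMS };

/* stand-in for the patch's memcg_vm_events[]: the per-memcg subset */
static const unsigned int memcg_vm_events[] = {
        PGPGIN, PGPGOUT, PGFAULT, PGMAJFAULT,
};
#define NR_MEMCG_EVENTS (sizeof(memcg_vm_events) / sizeof(memcg_vm_events[0]))

/* zero-initialised: 0 encodes "event not tracked per memcg" */
static int memcg_events_index[NR_VM_EVENT_ITEMS];

static void init_memcg_events(void)
{
        for (unsigned int i = 0; i < NR_MEMCG_EVENTS; ++i)
                memcg_events_index[memcg_vm_events[i]] = i + 1; /* re-base to 1 */
}

static int get_memcg_events_index(enum vm_event_item idx)
{
        return memcg_events_index[idx] - 1; /* the "absent" 0 becomes -1 */
}

int main(void)
{
        init_memcg_events();
        printf("PGFAULT -> %d\n", get_memcg_events_index(PGFAULT)); /* prints 2 */
        printf("SWAP_RA -> %d\n", get_memcg_events_index(SWAP_RA)); /* prints -1 */
        return 0;
}

If that reading is correct, the zero initialisation of the static array is
what encodes "not tracked", and the switch-based membership test suggested
above would avoid carrying the extra table and the +1/-1 shuffle, at the
price of spelling the event list out twice.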