From: Vlastimil Babka
To: Mel Gorman , Andrew Morton
Cc: Chuck Lever , Jesper Dangaard Brouer , Thomas Gleixner ,
 Peter Zijlstra , Ingo Molnar , Michal Hocko , Linux-MM ,
 Linux-RT-Users , LKML
Subject: Re: [PATCH 3/9] mm/vmstat: Convert NUMA statistics to basic NUMA counters
Date: Thu, 22 Apr 2021 17:18:38 +0200
In-Reply-To: <20210422111441.24318-4-mgorman@techsingularity.net>
References: <20210422111441.24318-1-mgorman@techsingularity.net>
 <20210422111441.24318-4-mgorman@techsingularity.net>

On 4/22/21 1:14 PM, Mel Gorman wrote:
> NUMA statistics are maintained on the zone level for hits, misses, foreign
> etc but nothing relies on them being perfectly accurate for functional
> correctness. The counters are used by userspace to get a general overview
> of a workload's NUMA behaviour but the page allocator incurs a high cost to
> maintain perfect accuracy similar to what is required for a vmstat like
> NR_FREE_PAGES. There is even a sysctl vm.numa_stat to allow userspace to
> turn off the collection of NUMA statistics like NUMA_HIT.
>
> This patch converts NUMA_HIT and friends to be NUMA events with similar
> accuracy to VM events. There is a possibility that slight errors will be
> introduced but the overall trend as seen by userspace will be similar.
> The counters are no longer updated from vmstat_refresh context as it is
> unnecessary overhead for counters that may never be read by userspace.
> Note that counters could be maintained at the node level to save space
> but it would have a user-visible impact due to /proc/zoneinfo.
>
> [lkp@intel.com: Fix misplaced closing brace for !CONFIG_NUMA]
> Signed-off-by: Mel Gorman

...

> @@ -731,26 +722,34 @@ static int fold_diff(int *zone_diff, int *numa_diff, int *node_diff)
>  	}
>  	return changes;
>  }
> -#else
> -static int fold_diff(int *zone_diff, int *node_diff)
> +
> +#ifdef CONFIG_NUMA
> +static void fold_vm_zone_numa_events(struct zone *zone)
>  {
> -	int i;
> -	int changes = 0;
> +	int zone_numa_events[NR_VM_NUMA_EVENT_ITEMS] = { 0, };

Should this be long? pzstats are, and the global counters too, so it seems
weird to use int as the intermediate sum counter. (Sketch below the quoted
hunks.)

> +	int cpu;
> +	enum numa_stat_item item;
>
> -	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
> -		if (zone_diff[i]) {
> -			atomic_long_add(zone_diff[i], &vm_zone_stat[i]);
> -			changes++;
> -		}
> +	for_each_online_cpu(cpu) {
> +		struct per_cpu_zonestat *pzstats;
>
> -	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> -		if (node_diff[i]) {
> -			atomic_long_add(node_diff[i], &vm_node_stat[i]);
> -			changes++;
> +		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> +		for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++)
> +			zone_numa_events[item] += xchg(&pzstats->vm_numa_event[item], 0);
>  	}
> -	return changes;
> +
> +	for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++)
> +		zone_numa_event_add(zone_numa_events[item], zone, item);
>  }
> -#endif /* CONFIG_NUMA */
> +
> +void fold_vm_numa_events(void)
> +{
> +	struct zone *zone;
> +
> +	for_each_populated_zone(zone)
> +		fold_vm_zone_numa_events(zone);
> +}
> +#endif
>
>  /*
>   * Update the zone counters for the current cpu.
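What I mean concretely is just widening the intermediate array, roughly like
this (untested sketch; it assumes the per-cpu vm_numa_event[] fields are
unsigned long, as introduced earlier in this patch):

static void fold_vm_zone_numa_events(struct zone *zone)
{
	/* long accumulator: summing per-cpu longs into an int can truncate */
	unsigned long zone_numa_events[NR_VM_NUMA_EVENT_ITEMS] = { 0, };
	int cpu;
	enum numa_stat_item item;

	for_each_online_cpu(cpu) {
		struct per_cpu_zonestat *pzstats;

		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
		for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++)
			zone_numa_events[item] += xchg(&pzstats->vm_numa_event[item], 0);
	}

	for (item = 0; item < NR_VM_NUMA_EVENT_ITEMS; item++)
		zone_numa_event_add(zone_numa_events[item], zone, item);
}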
> @@ -774,15 +773,14 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
>  	struct zone *zone;
>  	int i;
>  	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
> -#ifdef CONFIG_NUMA
> -	int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, };
> -#endif
>  	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
>  	int changes = 0;
>
>  	for_each_populated_zone(zone) {
>  		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
> +#ifdef CONFIG_NUMA
>  		struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
> +#endif
>
>  		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
>  			int v;
> @@ -799,17 +797,6 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
>  			}
>  		}
>  #ifdef CONFIG_NUMA
> -		for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) {
> -			int v;
> -
> -			v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
> -			if (v) {
> -
> -				atomic_long_add(v, &zone->vm_numa_stat[i]);
> -				global_numa_diff[i] += v;
> -				__this_cpu_write(pcp->expire, 3);
> -			}
> -		}
>
>  		if (do_pagesets) {
>  			cond_resched();
> @@ -857,12 +844,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
>  		}
>  	}
>
> -#ifdef CONFIG_NUMA
> -	changes += fold_diff(global_zone_diff, global_numa_diff,
> -			global_node_diff);
> -#else
>  	changes += fold_diff(global_zone_diff, global_node_diff);
> -#endif
>  	return changes;
>  }
>
> @@ -877,9 +859,6 @@ void cpu_vm_stats_fold(int cpu)
>  	struct zone *zone;
>  	int i;
>  	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
> -#ifdef CONFIG_NUMA
> -	int global_numa_diff[NR_VM_NUMA_STAT_ITEMS] = { 0, };
> -#endif
>  	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
>
>  	for_each_populated_zone(zone) {
> @@ -887,7 +866,7 @@ void cpu_vm_stats_fold(int cpu)
>
>  		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
>
> -		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
> +		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
>  			if (pzstats->vm_stat_diff[i]) {
>  				int v;
>
> @@ -896,17 +875,17 @@ void cpu_vm_stats_fold(int cpu)
>  				atomic_long_add(v, &zone->vm_stat[i]);
>  				global_zone_diff[i] += v;
>  			}
> -
> +		}
>  #ifdef CONFIG_NUMA
> -		for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
> -			if (pzstats->vm_numa_stat_diff[i]) {
> +		for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) {
> +			if (pzstats->vm_numa_event[i]) {
>  				int v;

Also long?

>
> -				v = pzstats->vm_numa_stat_diff[i];
> -				pzstats->vm_numa_stat_diff[i] = 0;
> -				atomic_long_add(v, &zone->vm_numa_stat[i]);
> -				global_numa_diff[i] += v;
> +				v = pzstats->vm_numa_event[i];
> +				pzstats->vm_numa_event[i] = 0;
> +				zone_numa_event_add(v, zone, i);
>  			}
> +		}
>  #endif
>  	}
>
> @@ -926,11 +905,7 @@ void cpu_vm_stats_fold(int cpu)
>  		}
>  	}
>
> -#ifdef CONFIG_NUMA
> -	fold_diff(global_zone_diff, global_numa_diff, global_node_diff);
> -#else
>  	fold_diff(global_zone_diff, global_node_diff);
> -#endif
>  }
>
>  /*
> @@ -939,43 +914,36 @@ void cpu_vm_stats_fold(int cpu)
>   */
>  void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
>  {
> -	int i;
> +	int i, v;

And the 'v' here. Maybe keep it local to each loop below and make it long
for the NUMA one?
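I.e. something like this for the body (untested sketch, using only the
helpers this patch already introduces):

	int i;

	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
		if (pzstats->vm_stat_diff[i]) {
			/* vm_stat_diff is a narrow per-cpu delta, int is fine */
			int v = pzstats->vm_stat_diff[i];

			pzstats->vm_stat_diff[i] = 0;
			zone_page_state_add(v, zone, i);
		}
	}

#ifdef CONFIG_NUMA
	for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) {
		if (pzstats->vm_numa_event[i]) {
			/* vm_numa_event is unsigned long, don't truncate to int */
			long v = pzstats->vm_numa_event[i];

			pzstats->vm_numa_event[i] = 0;
			zone_numa_event_add(v, zone, i);
		}
	}
#endif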
> -	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
> +	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
>  		if (pzstats->vm_stat_diff[i]) {
> -			int v = pzstats->vm_stat_diff[i];
> +			v = pzstats->vm_stat_diff[i];
>  			pzstats->vm_stat_diff[i] = 0;
> -			atomic_long_add(v, &zone->vm_stat[i]);
> -			atomic_long_add(v, &vm_zone_stat[i]);
> +			zone_page_state_add(v, zone, i);
>  		}
> +	}
>
>  #ifdef CONFIG_NUMA
> -	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
> -		if (pzstats->vm_numa_stat_diff[i]) {
> -			int v = pzstats->vm_numa_stat_diff[i];
> -
> -			pzstats->vm_numa_stat_diff[i] = 0;
> -			atomic_long_add(v, &zone->vm_numa_stat[i]);
> -			atomic_long_add(v, &vm_numa_stat[i]);
> +	for (i = 0; i < NR_VM_NUMA_EVENT_ITEMS; i++) {
> +		if (pzstats->vm_numa_event[i]) {
> +			v = pzstats->vm_numa_event[i];
> +			pzstats->vm_numa_event[i] = 0;
> +			zone_numa_event_add(v, zone, i);
>  		}
> +	}
>  #endif
>  }
>  #endif
>
>  #ifdef CONFIG_NUMA
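FWIW, the reason long looks like the natural width in all three places above:
assuming the zone_numa_event_add() helper added earlier in this patch mirrors
zone_page_state_add() (that hunk isn't quoted here), it already takes a long
and feeds atomic_long_t counters:

static inline void zone_numa_event_add(long x, struct zone *zone,
				       enum numa_stat_item item)
{
	/* fold the delta into both the zone and the global counter */
	atomic_long_add(x, &zone->vm_numa_event[item]);
	atomic_long_add(x, &vm_numa_event[item]);
}

So int intermediates would only narrow the sums on their way into it.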