Re: [PATCH v4 01/12] mm/vmstat: remove remote node draining


From: Marcelo Tosatti <mtosatti@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>, Leonardo Bras <leobras@redhat.com>
Cc: Christoph Lameter <cl@linux.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Aaron Tomlin <atomlin@atomlin.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Russell King <linux@armlinux.org.uk>,
	Huacai Chen <chenhuacai@kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	x86@kernel.org, David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v4 01/12] mm/vmstat: remove remote node draining
Date: Mon, 13 Mar 2023 13:41:06 -0300
Message-ID: <ZA9SImzAVxuFiOxb@tpad>
In-Reply-To: <015212c4-8e3b-253a-f307-56a3e331060e@suse.cz>

On Thu, Mar 09, 2023 at 06:11:30PM +0100, Vlastimil Babka wrote:
> On 3/5/23 14:36, Marcelo Tosatti wrote:
> > Draining of pages from the local pcp for a remote zone should not be
> > necessary, since once the system is low on memory (or compaction on a
> > zone is in effect), drain_all_pages should be called freeing any unused
> > pcps.
> 
> Hmm I can't easily say this is a good idea without a proper evaluation. It's
> true that drain_all_pages() will be called from
> __alloc_pages_direct_reclaim(), but if you look closely, it's only after
> __perform_reclaim() and subsequent get_page_from_freelist() failure, which
> means very low on memory. There's also kcompactd_do_work() caller, but that
> shouldn't respond to order-0 pressure.
> 
> Maybe part of the problem is that pages sitting in pcplist are not counted
> as NR_FREE_PAGES, so watermark checks will not "see" them as available. That
> would be true for both local and remote nodes as we process the zonelist,
> but if pages are stuck on remote pcplists, it could make the problem worse
> than if they stayed only in local ones.
> 
> But the good news should be that for this series you shouldn't actually need
> this patch, AFAIU. Last year Mel did the "Drain remote per-cpu directly"
> series and followups, so remote draining of pcplists is today done without
> sending an IPI to the remote CPU.

Vlastimil,

I have sent -v5, which should address the concern about pages stuck on
remote pcplists. Let me know if you have other concerns.
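
To make the watermark concern quoted above concrete, here is a minimal
user-space sketch. It is a toy model under stated assumptions, not
kernel code: struct toy_zone, toy_watermark_ok() and toy_drain_all()
are hypothetical names invented for illustration. The only thing taken
from the discussion is that pages parked on pcplists are not counted
in NR_FREE_PAGES, so a watermark check cannot "see" them until they
are drained.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* assumption: small fixed CPU count for the model */

/* Toy stand-in for a zone: one global free counter plus per-CPU caches. */
struct toy_zone {
	long nr_free_pages;		/* models NR_FREE_PAGES (free lists only) */
	long pcp_count[TOY_NR_CPUS];	/* pages parked on each CPU's pcplist */
	long watermark;			/* minimum free pages that must remain */
};

/* The check reads only the global counter, so pcplist pages are invisible. */
static bool toy_watermark_ok(const struct toy_zone *z, long request)
{
	return z->nr_free_pages - request >= z->watermark;
}

/* Models a drain: hand every CPU's cached pages back to the free counter. */
static long toy_drain_all(struct toy_zone *z)
{
	long drained = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		drained += z->pcp_count[cpu];
		z->pcp_count[cpu] = 0;
	}
	z->nr_free_pages += drained;
	return drained;
}
```

In this model a zone with 100 free pages, 60 pages spread over two
pcplists and a watermark of 128 fails the check for a one-page
request, and passes it once toy_drain_all() has run.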

> BTW I kinda like the implementation that relies on "per-cpu spin lock"
> that's used mostly locally thus uncontended, and only rarely remotely. Much
> simpler than complicated cmpxchg schemes, wonder if those are really still
> such a win these days, e.g. in SLUB or here in vmstat...

Good question (to which I don't know the answer).
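
For readers comparing the two approaches, a rough user-space sketch
using C11 atomics follows. It is only an illustration under stated
assumptions: all toy_* names are hypothetical, and neither variant is
the kernel's pcp lock or the this_cpu_cmpxchg() code from this series.
Variant A guards a plain counter with a small spin lock that a remote
drainer also takes; variant B updates the counter with a
compare-and-exchange loop and drains it with an exchange.

```c
#include <assert.h>
#include <stdatomic.h>

/* Variant A: plain counter behind a tiny spin lock. */
struct toy_locked_counter {
	atomic_flag lock;
	long value;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Local update: the lock is normally uncontended on this path. */
static void toy_locked_add(struct toy_locked_counter *c, long delta)
{
	toy_lock(&c->lock);
	c->value += delta;
	toy_unlock(&c->lock);
}

/* Remote drain: take the same lock, fetch the value and reset it. */
static long toy_locked_drain(struct toy_locked_counter *c)
{
	long v;

	toy_lock(&c->lock);
	v = c->value;
	c->value = 0;
	toy_unlock(&c->lock);
	return v;
}

/* Variant B: lock-free counter. */
struct toy_cas_counter {
	atomic_long value;
};

/* Update with a compare-and-exchange retry loop. */
static void toy_cas_add(struct toy_cas_counter *c, long delta)
{
	long old = atomic_load(&c->value);

	while (!atomic_compare_exchange_weak(&c->value, &old, old + delta))
		;	/* 'old' was refreshed by the failed attempt; retry */
}

/* Drain with a single atomic exchange. */
static long toy_cas_drain(struct toy_cas_counter *c)
{
	return atomic_exchange(&c->value, 0);
}
```

The sketch says nothing about which variant is faster; that would
need measurement on the workloads in question.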

Thanks.

> > For reference, the original commit which introduces remote node
> > draining is 4037d452202e34214e8a939fa5621b2b3bbb45b7.
> > 
> > Acked-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Index: linux-vmstat-remote/include/linux/mmzone.h
> > ===================================================================
> > --- linux-vmstat-remote.orig/include/linux/mmzone.h
> > +++ linux-vmstat-remote/include/linux/mmzone.h
> > @@ -679,9 +679,6 @@ struct per_cpu_pages {
> >  	int high;		/* high watermark, emptying needed */
> >  	int batch;		/* chunk size for buddy add/remove */
> >  	short free_factor;	/* batch scaling factor during free */
> > -#ifdef CONFIG_NUMA
> > -	short expire;		/* When 0, remote pagesets are drained */
> > -#endif
> >  
> >  	/* Lists of pages, one per migrate type stored on the pcp-lists */
> >  	struct list_head lists[NR_PCP_LISTS];
> > Index: linux-vmstat-remote/mm/vmstat.c
> > ===================================================================
> > --- linux-vmstat-remote.orig/mm/vmstat.c
> > +++ linux-vmstat-remote/mm/vmstat.c
> > @@ -803,20 +803,16 @@ static int fold_diff(int *zone_diff, int
> >   *
> >   * The function returns the number of global counters updated.
> >   */
> > -static int refresh_cpu_vm_stats(bool do_pagesets)
> > +static int refresh_cpu_vm_stats(void)
> >  {
> >  	struct pglist_data *pgdat;
> >  	struct zone *zone;
> >  	int i;
> >  	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
> >  	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
> > -	int changes = 0;
> >  
> >  	for_each_populated_zone(zone) {
> >  		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
> > -#ifdef CONFIG_NUMA
> > -		struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
> > -#endif
> >  
> >  		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
> >  			int v;
> > @@ -826,44 +822,8 @@ static int refresh_cpu_vm_stats(bool do_
> >  
> >  				atomic_long_add(v, &zone->vm_stat[i]);
> >  				global_zone_diff[i] += v;
> > -#ifdef CONFIG_NUMA
> > -				/* 3 seconds idle till flush */
> > -				__this_cpu_write(pcp->expire, 3);
> > -#endif
> >  			}
> >  		}
> > -#ifdef CONFIG_NUMA
> > -
> > -		if (do_pagesets) {
> > -			cond_resched();
> > -			/*
> > -			 * Deal with draining the remote pageset of this
> > -			 * processor
> > -			 *
> > -			 * Check if there are pages remaining in this pageset
> > -			 * if not then there is nothing to expire.
> > -			 */
> > -			if (!__this_cpu_read(pcp->expire) ||
> > -			       !__this_cpu_read(pcp->count))
> > -				continue;
> > -
> > -			/*
> > -			 * We never drain zones local to this processor.
> > -			 */
> > -			if (zone_to_nid(zone) == numa_node_id()) {
> > -				__this_cpu_write(pcp->expire, 0);
> > -				continue;
> > -			}
> > -
> > -			if (__this_cpu_dec_return(pcp->expire))
> > -				continue;
> > -
> > -			if (__this_cpu_read(pcp->count)) {
> > -				drain_zone_pages(zone, this_cpu_ptr(pcp));
> > -				changes++;
> > -			}
> > -		}
> > -#endif
> >  	}
> >  
> >  	for_each_online_pgdat(pgdat) {
> > @@ -880,8 +840,7 @@ static int refresh_cpu_vm_stats(bool do_
> >  		}
> >  	}
> >  
> > -	changes += fold_diff(global_zone_diff, global_node_diff);
> > -	return changes;
> > +	return fold_diff(global_zone_diff, global_node_diff);
> >  }
> >  
> >  /*
> > @@ -1867,7 +1826,7 @@ int sysctl_stat_interval __read_mostly =
> >  #ifdef CONFIG_PROC_FS
> >  static void refresh_vm_stats(struct work_struct *work)
> >  {
> > -	refresh_cpu_vm_stats(true);
> > +	refresh_cpu_vm_stats();
> >  }
> >  
> >  int vmstat_refresh(struct ctl_table *table, int write,
> > @@ -1877,6 +1836,8 @@ int vmstat_refresh(struct ctl_table *tab
> >  	int err;
> >  	int i;
> >  
> > +	drain_all_pages(NULL);
> > +
> >  	/*
> >  	 * The regular update, every sysctl_stat_interval, may come later
> >  	 * than expected: leaving a significant amount in per_cpu buckets.
> > @@ -1931,7 +1892,7 @@ int vmstat_refresh(struct ctl_table *tab
> >  
> >  static void vmstat_update(struct work_struct *w)
> >  {
> > -	if (refresh_cpu_vm_stats(true)) {
> > +	if (refresh_cpu_vm_stats()) {
> >  		/*
> >  		 * Counters were updated so we expect more updates
> >  		 * to occur in the future. Keep on running the
> > @@ -1994,7 +1955,7 @@ void quiet_vmstat(void)
> >  	 * it would be too expensive from this path.
> >  	 * vmstat_shepherd will take care about that for us.
> >  	 */
> > -	refresh_cpu_vm_stats(false);
> > +	refresh_cpu_vm_stats();
> >  }
> >  
> >  /*
> > Index: linux-vmstat-remote/mm/page_alloc.c
> > ===================================================================
> > --- linux-vmstat-remote.orig/mm/page_alloc.c
> > +++ linux-vmstat-remote/mm/page_alloc.c
> > @@ -3176,26 +3176,6 @@ static int rmqueue_bulk(struct zone *zon
> >  	return allocated;
> >  }
> >  
> > -#ifdef CONFIG_NUMA
> > -/*
> > - * Called from the vmstat counter updater to drain pagesets of this
> > - * currently executing processor on remote nodes after they have
> > - * expired.
> > - */
> > -void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
> > -{
> > -	int to_drain, batch;
> > -
> > -	batch = READ_ONCE(pcp->batch);
> > -	to_drain = min(pcp->count, batch);
> > -	if (to_drain > 0) {
> > -		spin_lock(&pcp->lock);
> > -		free_pcppages_bulk(zone, to_drain, pcp, 0);
> > -		spin_unlock(&pcp->lock);
> > -	}
> > -}
> > -#endif
> > -
> >  /*
> >   * Drain pcplists of the indicated processor and zone.
> >   */
> > 
> > 
> > 
> 
> 
>
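
As a reading aid for the hunk that removes the pcp->expire handling,
the countdown can be modelled in user space roughly as below. This is
a simplified sketch under assumptions: struct toy_pcp and
toy_refresh_pass() are hypothetical names, one call stands for one
pass of the old refresh_cpu_vm_stats(true) over a single zone, and the
drain is reduced to subtracting min(count, batch) from the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the fields the removed code used. */
struct toy_pcp {
	int expire;	/* countdown; the pageset is drained when it hits 0 */
	int count;	/* pages currently held on the pcplist */
	int batch;	/* upper bound on pages drained in one pass */
};

/*
 * One refresh pass for one zone. Returns the number of pages drained.
 * stats_changed: a per-CPU counter diff was folded during this pass.
 * zone_is_local: the zone belongs to the CPU's own NUMA node.
 */
static int toy_refresh_pass(struct toy_pcp *pcp, bool stats_changed,
			    bool zone_is_local)
{
	int to_drain;

	if (stats_changed)
		pcp->expire = 3;	/* re-arm the countdown on activity */

	/* Nothing armed or nothing cached: nothing to expire. */
	if (!pcp->expire || !pcp->count)
		return 0;

	/* Zones local to this CPU were never drained by this path. */
	if (zone_is_local) {
		pcp->expire = 0;
		return 0;
	}

	/* Still counting down. */
	if (--pcp->expire)
		return 0;

	to_drain = pcp->count < pcp->batch ? pcp->count : pcp->batch;
	pcp->count -= to_drain;
	return to_drain;
}
```

With count = 10 and batch = 4 on a remote zone, the pass that sees
activity arms the countdown, the next pass only decrements it, and the
pass after that drains 4 pages. A local zone is never drained in this
model.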

Thread overview: 17+ messages
2023-03-05 13:36 [PATCH v4 00/12] fold per-CPU vmstats remotely Marcelo Tosatti
2023-03-05 13:36 ` [PATCH v4 01/12] mm/vmstat: remove remote node draining Marcelo Tosatti
2023-03-09 17:11   ` Vlastimil Babka
2023-03-13 16:41     ` Marcelo Tosatti [this message]
2023-03-05 13:36 ` [PATCH v4 02/12] this_cpu_cmpxchg: ARM64: switch this_cpu_cmpxchg to locked, add _local function Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 03/12] this_cpu_cmpxchg: loongarch: " Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 04/12] this_cpu_cmpxchg: S390: " Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 05/12] this_cpu_cmpxchg: x86: " Marcelo Tosatti
2023-03-06 11:22   ` Peter Zijlstra
2023-03-08 21:42     ` Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 06/12] add this_cpu_cmpxchg_local and asm-generic definitions Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 07/12] convert this_cpu_cmpxchg users to this_cpu_cmpxchg_local Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 08/12] mm/vmstat: switch counter modification to cmpxchg Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 09/12] vmstat: switch per-cpu vmstat counters to 32-bits Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 10/12] mm/vmstat: use xchg in cpu_vm_stats_fold Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 11/12] mm/vmstat: switch vmstat shepherd to flush per-CPU counters remotely Marcelo Tosatti
2023-03-05 13:37 ` [PATCH v4 12/12] mm/vmstat: refresh stats remotely instead of via work item Marcelo Tosatti
