From: Vlastimil Babka <vbabka@suse.cz>
To: Sandipan Das <sandipan@linux.ibm.com>, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, khlebnikov@yandex-team.ru, mhocko@suse.com,
	kirill@shutemov.name, aneesh.kumar@linux.ibm.com,
	srikar@linux.vnet.ibm.com
Subject: Re: [PATCH v2] mm: Reset numa stats for boot pagesets
Date: Mon, 11 May 2020 13:13:06 +0200
Message-ID: <e29844d7-e4bd-5482-33e2-345dd64468b7@suse.cz>
In-Reply-To: <9c9c2d1b15e37f6e6bf32f99e3100035e90c4ac9.1588868430.git.sandipan@linux.ibm.com>

On 5/7/20 6:29 PM, Sandipan Das wrote:
> Initially, the per-cpu pagesets of each zone are set to the
> boot pagesets. The real pagesets are allocated later but
> before that happens, page allocations do occur and the numa
> stats for the boot pagesets get incremented since they are
> common to all zones at that point.
> 
> The real pagesets, however, are allocated for the populated
> zones only. Unpopulated zones, like those associated with
> memory-less nodes, continue using the boot pageset and end
> up skewing the numa stats of the corresponding node.
> 
> E.g.
> 
>   $ numactl -H
>   available: 2 nodes (0-1)
>   node 0 cpus: 0 1 2 3
>   node 0 size: 0 MB
>   node 0 free: 0 MB
>   node 1 cpus: 4 5 6 7
>   node 1 size: 8131 MB
>   node 1 free: 6980 MB
>   node distances:
>   node   0   1
>     0:  10  40
>     1:  40  10
> 
>   $ numastat
>                              node0           node1
>   numa_hit                     108           56495
>   numa_miss                      0               0
>   numa_foreign                   0               0
>   interleave_hit                 0            4537
>   local_node                   108           31547
>   other_node                     0           24948
> 
> Hence, the boot pageset stats need to be cleared after
> the real pagesets are allocated.
> 
> From this point onwards, the stats of the boot pagesets do
> not change as page allocations requested for a memory-less
> node will either fail (if __GFP_THISNODE is used) or get
> fulfilled by a preferred zone of a different node based on
> the fallback zonelist.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
With suggestion below.

> ---
> 
> The previous version and discussion around it can be found at
> https://lore.kernel.org/linux-mm/20200504070304.127361-1-sandipan@linux.ibm.com/
> 
> Changes in v2:
> 
> - Reset the stats of the boot pagesets instead of explicitly
>   returning zero as suggested by Vlastimil.
> 
> - Changed the subject to reflect the above.
> 
> ---
>  mm/page_alloc.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 69827d4fa052..1543e32f7e4e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6256,6 +6256,25 @@ void __init setup_per_cpu_pageset(void)
>  	for_each_populated_zone(zone)
>  		setup_zone_pageset(zone);
>  
> +#ifdef CONFIG_NUMA
> +	if (static_branch_likely(&vm_numa_stat_key)) {

I would just remove this test and do the reset unconditionally, as the branch
can only be disabled later in boot, via sysctl.
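
To illustrate what the unconditional variant boils down to, here is a minimal
userspace model, not kernel code and not a replacement hunk. It assumes the
per-cpu boot_pageset can be modelled as a plain array indexed by CPU; every
model_* name and both NR_MODEL_* sizes are hypothetical stand-ins, and only the
vm_numa_stat_diff field name and the memset() come from the patch.

```c
#include <string.h>

/* Hypothetical sizes, chosen only for this illustration. */
#define NR_MODEL_CPUS            8
#define NR_MODEL_NUMA_STAT_ITEMS 6

/* Stand-in for struct per_cpu_pageset; only the NUMA diffs are modelled. */
struct model_pageset {
	unsigned short vm_numa_stat_diff[NR_MODEL_NUMA_STAT_ITEMS];
};

/* Stand-in for the per-cpu boot_pageset: one instance per possible CPU. */
static struct model_pageset model_boot_pageset[NR_MODEL_CPUS];

/* Models an early-boot allocation bumping a NUMA counter on one CPU. */
static void model_count_numa_event(int cpu, int item)
{
	model_boot_pageset[cpu].vm_numa_stat_diff[item]++;
}

/*
 * Models the reset with no static-branch test around it: walk every
 * possible CPU and zero the NUMA stat diffs of its boot pageset.
 */
static void model_reset_boot_numa_stats(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		struct model_pageset *pcp = &model_boot_pageset[cpu];

		memset(pcp->vm_numa_stat_diff, 0,
		       sizeof(pcp->vm_numa_stat_diff));
	}
}

/* Sums one counter across all CPUs, the way a stats reader would. */
static unsigned long model_sum_numa_event(int item)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		sum += model_boot_pageset[cpu].vm_numa_stat_diff[item];

	return sum;
}
```

In this model, counts accumulated before model_reset_boot_numa_stats() no
longer show up in model_sum_numa_event() afterwards, which is the effect the
patch is after for zones that keep using the boot pagesets.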

> +		struct per_cpu_pageset *pcp;
> +		int cpu;
> +
> +		/*
> +		 * Unpopulated zones continue using the boot pagesets.
> +		 * The numa stats for these pagesets need to be reset.
> +		 * Otherwise, they will end up skewing the stats of
> +		 * the nodes these zones are associated with.
> +		 */
> +		for_each_possible_cpu(cpu) {
> +			pcp = &per_cpu(boot_pageset, cpu);
> +			memset(pcp->vm_numa_stat_diff, 0,
> +			       sizeof(pcp->vm_numa_stat_diff));
> +		}
> +	}
> +#endif
> +
>  	for_each_online_pgdat(pgdat)
>  		pgdat->per_cpu_nodestats =
>  			alloc_percpu(struct per_cpu_nodestat);
> 