Subject: Re: [PATCH v2] mm: Reset numa stats for boot pagesets
To: Sandipan Das, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, khlebnikov@yandex-team.ru, mhocko@suse.com,
 kirill@shutemov.name, aneesh.kumar@linux.ibm.com, srikar@linux.vnet.ibm.com
References: <9c9c2d1b15e37f6e6bf32f99e3100035e90c4ac9.1588868430.git.sandipan@linux.ibm.com>
From: Vlastimil Babka
Date: Mon, 11 May 2020 13:13:06 +0200
In-Reply-To: <9c9c2d1b15e37f6e6bf32f99e3100035e90c4ac9.1588868430.git.sandipan@linux.ibm.com>

On 5/7/20 6:29 PM, Sandipan Das wrote:
> Initially, the per-cpu pagesets of each zone are set to the
> boot pagesets. The real pagesets are allocated later but
> before that happens, page allocations do occur and the numa
> stats for the boot pagesets get incremented since they are
> common to all zones at that point.
>
> The real pagesets, however, are allocated for the populated
> zones only. Unpopulated zones, like those associated with
> memory-less nodes, continue using the boot pageset and end
> up skewing the numa stats of the corresponding node.
>
> E.g.
>
>   $ numactl -H
>   available: 2 nodes (0-1)
>   node 0 cpus: 0 1 2 3
>   node 0 size: 0 MB
>   node 0 free: 0 MB
>   node 1 cpus: 4 5 6 7
>   node 1 size: 8131 MB
>   node 1 free: 6980 MB
>   node distances:
>   node   0   1
>     0:  10  40
>     1:  40  10
>
>   $ numastat
>                              node0           node1
>   numa_hit                     108           56495
>   numa_miss                      0               0
>   numa_foreign                   0               0
>   interleave_hit                 0            4537
>   local_node                   108           31547
>   other_node                     0           24948
>
> Hence, the boot pageset stats need to be cleared after
> the real pagesets are allocated.
>
> From this point onwards, the stats of the boot pagesets do
> not change as page allocations requested for a memory-less
> node will either fail (if __GFP_THISNODE is used) or get
> fulfilled by a preferred zone of a different node based on
> the fallback zonelist.
>
> Signed-off-by: Sandipan Das

Acked-by: Vlastimil Babka

With suggestion below.

> ---
>
> The previous version and discussion around it can be found at
> https://lore.kernel.org/linux-mm/20200504070304.127361-1-sandipan@linux.ibm.com/
>
> Changes in v2:
>
> - Reset the stats of the boot pagesets instead of explicitly
>   returning zero as suggested by Vlastimil.
>
> - Changed the subject to reflect the above.
>
> ---
>  mm/page_alloc.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 69827d4fa052..1543e32f7e4e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6256,6 +6256,25 @@ void __init setup_per_cpu_pageset(void)
>  	for_each_populated_zone(zone)
>  		setup_zone_pageset(zone);
>
> +#ifdef CONFIG_NUMA
> +	if (static_branch_likely(&vm_numa_stat_key)) {

I would just remove this test and do it unconditionally, as the branch can be
only disabled later in boot by a sysctl.

> +		struct per_cpu_pageset *pcp;
> +		int cpu;
> +
> +		/*
> +		 * Unpopulated zones continue using the boot pagesets.
> +		 * The numa stats for these pagesets need to be reset.
> +		 * Otherwise, they will end up skewing the stats of
> +		 * the nodes these zones are associated with.
> +		 */
> +		for_each_possible_cpu(cpu) {
> +			pcp = &per_cpu(boot_pageset, cpu);
> +			memset(pcp->vm_numa_stat_diff, 0,
> +			       sizeof(pcp->vm_numa_stat_diff));
> +		}
> +	}
> +#endif
> +
>  	for_each_online_pgdat(pgdat)
>  		pgdat->per_cpu_nodestats =
>  			alloc_percpu(struct per_cpu_nodestat);
>
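
For illustration only, here is a rough user-space sketch of the reset with the
static-branch test dropped, as suggested above. It is not kernel code: the
per-cpu boot_pageset is modelled as a plain array, and every name prefixed with
model_ or MODEL_ (including the CPU count and the counter count, which is taken
from the six rows numastat prints above) is a hypothetical stand-in. The 16-bit
counter type is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are kernel definitions. */
#define MODEL_NR_CPUS       8
#define MODEL_NR_NUMA_STATS 6	/* one slot per counter row printed by numastat */

struct model_pageset {
	uint16_t vm_numa_stat_diff[MODEL_NR_NUMA_STATS];
};

/* Models the per-cpu boot_pageset as a plain array indexed by CPU. */
struct model_pageset model_boot_pageset[MODEL_NR_CPUS];

/* Models an early page allocation bumping one NUMA counter on one CPU. */
void model_count_numa_event(int cpu, int item)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(item >= 0 && item < MODEL_NR_NUMA_STATS);
	model_boot_pageset[cpu].vm_numa_stat_diff[item]++;
}

/* Unconditional reset: the loop is not guarded by any "stats enabled" test. */
void model_reset_boot_numa_stats(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_pageset *pcp = &model_boot_pageset[cpu];

		memset(pcp->vm_numa_stat_diff, 0,
		       sizeof(pcp->vm_numa_stat_diff));
	}
}

/* Sums one counter across all modelled CPUs, roughly what a stats reader would see. */
unsigned long model_sum_numa_stat(int item)
{
	unsigned long sum = 0;

	assert(item >= 0 && item < MODEL_NR_NUMA_STATS);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += model_boot_pageset[cpu].vm_numa_stat_diff[item];
	return sum;
}
```

In this model, counts accumulated through model_count_numa_event() before the
reset no longer show up in model_sum_numa_stat() afterwards, which is the
effect the patch is after for zones that keep using the boot pagesets.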