Subject: Re: [PATCH] mm: vmstat: Use zeroed stats for unpopulated zones
To: Michal Hocko
Cc: Sandipan Das, akpm@linux-foundation.org, linux-mm@kvack.org, khlebnikov@yandex-team.ru, kirill@shutemov.name, aneesh.kumar@linux.ibm.com, srikar@linux.vnet.ibm.com
References: <20200504070304.127361-1-sandipan@linux.ibm.com> <20200504102441.GM22838@dhcp22.suse.cz> <959f15af-28a8-371b-c5c3-cd7489d2a7fb@suse.cz> <20200506140241.GB6345@dhcp22.suse.cz>
From: Vlastimil Babka
Date: Wed, 6 May 2020 17:09:51 +0200
In-Reply-To: <20200506140241.GB6345@dhcp22.suse.cz>

On 5/6/20 4:02 PM, Michal Hocko wrote:
> On Wed 06-05-20 15:33:36, Vlastimil Babka wrote:
>> On 5/4/20 12:26 PM, Michal Hocko wrote:
>> > On Mon 04-05-20 12:33:04, Sandipan Das wrote:
>> >> For unpopulated zones, the pagesets point to the common
>> >> boot_pageset which can have non-zero vm_numa_stat counts.
>> >> Because of this memory-less nodes end up having non-zero
>> >> NUMA statistics. This can be observed on any architecture
>> >> that supports memory-less NUMA nodes.
>> >>
>> >> E.g.
>> >>
>> >> $ numactl -H
>> >> available: 2 nodes (0-1)
>> >> node 0 cpus: 0 1 2 3
>> >> node 0 size: 0 MB
>> >> node 0 free: 0 MB
>> >> node 1 cpus: 4 5 6 7
>> >> node 1 size: 8131 MB
>> >> node 1 free: 6980 MB
>> >> node distances:
>> >> node   0   1
>> >>   0:  10  40
>> >>   1:  40  10
>> >>
>> >> $ numastat
>> >>                   node0     node1
>> >> numa_hit            108     56495
>> >> numa_miss             0         0
>> >> numa_foreign          0         0
>> >> interleave_hit        0      4537
>> >> local_node          108     31547
>> >> other_node            0     24948
>> >>
>> >> Hence, return zero explicitly for all the stats of an
>> >> unpopulated zone.
>> >
>> > I hope I am not just confused but I would expect that at least
>> > numa_foreign and other_node to be non zero.
>>
>> Hmm, checking zone_statistics():
>>
>> NUMA_FOREIGN increment uses preferred zone, which is the first in zone in
>> zonelist, so it will be a zone from node 1 even for allocations on cpu
>> associated to node 0 - assuming node 0's unpopulated zones are not included in
>> node 0's zonelist.
>
> But the allocation could have been requested for node 0 regardless of
> the amount of memory the node has.

Yes, if we allocate from cpu 0-3 then it should be a miss on node 0. But the
zonelists are optimized in a way that they don't include empty zones -
build_zonerefs_node() checks managed_zone(). As a result, node 0's zonelist has
no node 0 zones, which confuses the stats code. We should probably document
that numa stats are bogus on systems with memoryless nodes. This patch makes it
somewhat more obvious by presenting nice zeroes on the memoryless node itself,
but node 1 now includes stats from node 0.

>> NUMA_OTHER uses numa_node_id(), which would mean the node 0's cpus have node 1
>> in their numa_node_id() ? Is that correct?
>
> numa_node_id should reflect the real node the CPU is associated with.

You're right, numa_node_id() is probably fine. But NUMA_OTHER is actually
incremented at the zone where the allocation succeeds.
This probably doesn't match Documentation/admin-guide/numastat.rst, even on
systems without memoryless nodes:

    other_node      A process ran on this node and got memory from another node.
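
To make the accounting above easier to follow, here is a toy Python model of it. This is only a sketch under stated assumptions, not kernel code: the branch structure in zone_statistics() below reflects my reading of the kernel function of the same name around this time, nodes stand in for zones, and build_zonelist(), allocate() and the dict layout are hypothetical helpers invented for the illustration. It also assumes every allocation succeeds from the first zone in the zonelist.

```python
from collections import Counter

# Topology from the numactl output quoted above: node 0 has CPUs but no memory.
DISTANCE = {0: {0: 10, 1: 40}, 1: {0: 40, 1: 10}}
POPULATED = [1]  # nodes that have managed memory


def build_zonelist(nid):
    """Hypothetical stand-in for zonelist construction: empty nodes are
    skipped (as build_zonerefs_node() does via managed_zone()), and the
    rest are ordered by distance from nid."""
    return sorted(POPULATED, key=lambda n: DISTANCE[nid][n])


def zone_statistics(stats, preferred_nid, zone_nid, cpu_nid):
    """Assumed counter logic; every counter except numa_foreign is
    charged to the node where the allocation succeeded."""
    local_stat = "local_node" if zone_nid == cpu_nid else "other_node"
    if zone_nid == preferred_nid:
        stats[zone_nid]["numa_hit"] += 1
    else:
        stats[zone_nid]["numa_miss"] += 1
        stats[preferred_nid]["numa_foreign"] += 1
    stats[zone_nid][local_stat] += 1


def allocate(stats, cpu_nid):
    """Hypothetical allocation: always succeeds from the first zone."""
    zonelist = build_zonelist(cpu_nid)
    preferred = zonelist[0]  # never node 0, because node 0 is empty
    zone_statistics(stats, preferred, zonelist[0], cpu_nid)


stats = {0: Counter(), 1: Counter()}
for _ in range(3):
    allocate(stats, cpu_nid=0)  # CPUs 0-3 sit on the memoryless node
for _ in range(2):
    allocate(stats, cpu_nid=1)  # CPUs 4-7 sit on node 1

for nid in sorted(stats):
    print(nid, dict(stats[nid]))
```

In this model, allocations from node 0's CPUs show up as numa_hit and other_node on node 1, and node 0 never records a miss or a foreign allocation. That is the same shape as the quoted numastat output, where node 1's local_node and other_node (31547 + 24948) add up to its numa_hit (56495).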