Message-ID: <7a4d0960-c985-2b5f-bb5d-492542f9f087@linux.alibaba.com>
Date: Sun, 31 Jul 2022 22:11:26 +0800
Subject: Re: [RFC PATCH] mm: add last level page table numa info to /proc/pid/numa_pgtable
From: haoxin <xhao@linux.alibaba.com>
To: Matthew Wilcox
Cc: adobriyan@gmail.com, akpm@linux-foundation.org, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, xhao@linux.alibaba.com
References: <20220730163528.48377-1-xhao@linux.alibaba.com>
On 2022/7/31 1:29 AM, Matthew Wilcox wrote:
> On Sun, Jul 31, 2022 at 12:35:28AM +0800, Xin Hao wrote:
>> In many data center servers, the shared memory architecture is
>> Non-Uniform Memory Access (NUMA). Remote NUMA node data accesses
>> often bring high latency, but it is easy to overlook that remote
>> NUMA accesses to the page tables themselves can also lead to
>> performance degradation.
>>
>> So add a new interface in /proc. This will help developers get
>> more info about performance issues when they are caused by
>> cross-NUMA access.
>
> Interesting.  The implementation seems rather more complex than
> necessary though.
>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 2d04e3470d4c..a51befb47ea8 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1999,4 +1999,133 @@ const struct file_operations proc_pid_numa_maps_operations = {
>>  	.release	= proc_map_release,
>>  };
>>
>> +struct pgtable_numa_maps {
>> +	unsigned long node[MAX_NUMNODES];
>> +};
>> +
>> +struct pgtable_numa_private {
>> +	struct proc_maps_private proc_maps;
>> +	struct pgtable_numa_maps md;
>> +};
>
> 	struct pgtable_numa_private {
> 		struct proc_maps_private proc_maps;
> 		unsigned long node[MAX_NUMNODES];
> 	};
>
>> +static void gather_pgtable_stats(struct page *page, struct pgtable_numa_maps *md)
>> +{
>> +	md->node[page_to_nid(page)] += 1;
>> +}
>> +
>> +static struct page *can_gather_pgtable_numa_stats(pmd_t pmd, struct vm_area_struct *vma,
>> +						  unsigned long addr)
>> +{
>> +	struct page *page;
>> +	int nid;
>> +
>> +	if (!pmd_present(pmd))
>> +		return NULL;
>> +
>> +	if (pmd_huge(pmd))
>> +		return NULL;
>> +
>> +	page = pmd_page(pmd);
>> +	nid = page_to_nid(page);
>> +	if (!node_isset(nid, node_states[N_MEMORY]))
>> +		return NULL;
>> +
>> +	return page;
>> +}
>> +
>> +static int gather_pgtable_numa_stats(pmd_t *pmd, unsigned long addr,
>> +				     unsigned long end, struct mm_walk *walk)
>> +{
>> +	struct pgtable_numa_maps *md = walk->private;
>> +	struct vm_area_struct *vma = walk->vma;
>> +	struct page *page;
>> +
>> +	if (pmd_huge(*pmd)) {
>> +		struct page *pmd_page;
>> +
>> +		pmd_page = virt_to_page(pmd);
>> +		if (!pmd_page)
>> +			return 0;
>> +
>> +		if (!node_isset(page_to_nid(pmd_page), node_states[N_MEMORY]))
>> +			return 0;
>> +
>> +		gather_pgtable_stats(pmd_page, md);
>> +		goto out;
>> +	}
>> +
>> +	page = can_gather_pgtable_numa_stats(*pmd, vma, addr);
>> +	if (!page)
>> +		return 0;
>> +
>> +	gather_pgtable_stats(page, md);
>> +
>> +out:
>> +	cond_resched();
>> +	return 0;
>> +}
>
> 	static int gather_pgtable_numa_stats(pmd_t *pmd, unsigned long addr,
> 			unsigned long end, struct mm_walk *walk)
> 	{
> 		struct pgtable_numa_private *priv = walk->private;
> 		struct vm_area_struct *vma = walk->vma;
> 		struct page *page;
> 		int nid;
>
> 		if (pmd_huge(*pmd)) {
> 			page = virt_to_page(pmd);
> 		} else {
> 			page = pmd_page(*pmd);
> 		}
>
> 		nid = page_to_nid(page);
> 		priv->node[nid]++;
>
> 		return 0;
> 	}

Oh, thank you for reviewing the code, I will fix it in the next version.