From: Yang Shi <shy828301@gmail.com>
Date: Tue, 1 Mar 2022 14:03:11 -0800
Subject: Re: [PATCH -V14 3/3] memory tiering: skip to scan fast memory
To: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra, Mel Gorman, Andrew Morton, Linux MM <linux-mm@kvack.org>,
 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Feng Tang,
 Dave Hansen, Baolin Wang, Johannes Weiner, Oscar Salvador, Michal Hocko,
 Rik van Riel, Zi Yan, Wei Xu, Shakeel Butt, zhongjiang-ali
In-Reply-To: <20220301085329.3210428-4-ying.huang@intel.com>
References: <20220301085329.3210428-1-ying.huang@intel.com>
 <20220301085329.3210428-4-ying.huang@intel.com>
On Tue, Mar 1, 2022 at 12:54 AM Huang Ying <ying.huang@intel.com> wrote:
>
> If NUMA balancing isn't used to optimize page placement among sockets
> but only among memory types, the hot pages in the fast memory node
> cannot be migrated (promoted) anywhere. So it's unnecessary to scan
> the pages in the fast memory node by changing their PTE/PMD mappings
> to PROT_NONE, and the page faults that scanning would trigger are
> avoided too.
>
> In testing, with only the memory tiering NUMA balancing mode enabled,
> the patch reduces the number of NUMA balancing hint faults for the
> DRAM node to almost 0, while the benchmark score doesn't change
> visibly.

Reviewed-by: Yang Shi <shy828301@gmail.com>

> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Dave Hansen
> Tested-by: Baolin Wang
> Reviewed-by: Baolin Wang
> Acked-by: Johannes Weiner
> Reviewed-by: Oscar Salvador
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Rik van Riel
> Cc: Mel Gorman
> Cc: Peter Zijlstra
> Cc: Yang Shi
> Cc: Zi Yan
> Cc: Wei Xu
> Cc: Shakeel Butt
> Cc: zhongjiang-ali
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  mm/huge_memory.c | 30 +++++++++++++++++++++---------
>  mm/mprotect.c    | 13 ++++++++++++-
>  2 files changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 406a3c28c026..9ce126cb0cfd 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -34,6 +34,7 @@
>  #include <linux/oom.h>
>  #include <linux/numa.h>
>  #include <linux/page_owner.h>
> +#include <linux/sched/sysctl.h>
>
>  #include <asm/tlb.h>
>  #include <asm/pgalloc.h>
> @@ -1766,17 +1767,28 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  	}
>  #endif
>
> -	/*
> -	 * Avoid trapping faults against the zero page. The read-only
> -	 * data is likely to be read-cached on the local CPU and
> -	 * local/remote hits to the zero page are not interesting.
> -	 */
> -	if (prot_numa && is_huge_zero_pmd(*pmd))
> -		goto unlock;
> +	if (prot_numa) {
> +		struct page *page;
> +		/*
> +		 * Avoid trapping faults against the zero page. The read-only
> +		 * data is likely to be read-cached on the local CPU and
> +		 * local/remote hits to the zero page are not interesting.
> +		 */
> +		if (is_huge_zero_pmd(*pmd))
> +			goto unlock;
>
> -	if (prot_numa && pmd_protnone(*pmd))
> -		goto unlock;
> +		if (pmd_protnone(*pmd))
> +			goto unlock;
>
> +		page = pmd_page(*pmd);
> +		/*
> +		 * Skip scanning top tier node if normal numa
> +		 * balancing is disabled
> +		 */
> +		if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +		    node_is_toptier(page_to_nid(page)))
> +			goto unlock;
> +	}
>  	/*
>  	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
>  	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
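(Aside for readers following along: the new check keys off the NUMA
balancing mode bits introduced earlier in this patchset. A minimal sketch
of the relevant pieces, assuming the definitions the earlier patch adds
to include/linux/sched/sysctl.h:

	/* kernel.numa_balancing is now a bitmask rather than a boolean */
	#define NUMA_BALANCING_DISABLED		0x0
	#define NUMA_BALANCING_NORMAL		0x1	/* socket-to-socket balancing */
	#define NUMA_BALANCING_MEMORY_TIERING	0x2	/* slow-to-fast promotion */

	extern int sysctl_numa_balancing_mode;

With "sysctl kernel.numa_balancing=2" only the tiering bit is set, so
!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) holds and top-tier
(DRAM) pages are skipped here and in the change_pte_range() hunk below.)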
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 0138dfcdb1d8..2fe03e695c81 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -29,6 +29,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/mm_inline.h>
>  #include <linux/pgtable.h>
> +#include <linux/sched/sysctl.h>
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/tlbflush.h>
> @@ -83,6 +84,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  		 */
>  		if (prot_numa) {
>  			struct page *page;
> +			int nid;
>
>  			/* Avoid TLB flush if possible */
>  			if (pte_protnone(oldpte))
> @@ -109,7 +111,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  			 * Don't mess with PTEs if page is already on the node
>  			 * a single-threaded process is running on.
>  			 */
> -			if (target_node == page_to_nid(page))
> +			nid = page_to_nid(page);
> +			if (target_node == nid)
> +				continue;
> +
> +			/*
> +			 * Skip scanning top tier node if normal numa
> +			 * balancing is disabled
> +			 */
> +			if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> +			    node_is_toptier(nid))
>  				continue;
>  		}
>
> --
> 2.30.2
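One more note for anyone experimenting with this: node_is_toptier() used
above is the helper introduced earlier in this series. A minimal sketch of
its definition, assuming the include/linux/node.h version from that patch
(shown here for reference; the exact policy may change later):

	/*
	 * A node is "top tier" iff it has local CPUs: DRAM nodes qualify,
	 * CPU-less slow-memory nodes (e.g. PMEM) do not.
	 */
	static inline bool node_is_toptier(int node)
	{
		return node_state(node, N_CPU);
	}

So in memory-tiering-only mode the PROT_NONE scanning is confined to the
CPU-less slow nodes, which is exactly where promotion candidates live.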