From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE0CEC433F5 for ; Wed, 8 Dec 2021 06:28:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F1A0D6B0093; Wed, 8 Dec 2021 01:19:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id ECA6E6B0095; Wed, 8 Dec 2021 01:19:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D90A66B0096; Wed, 8 Dec 2021 01:19:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay038.a.hostedemail.com [64.99.140.38]) by kanga.kvack.org (Postfix) with ESMTP id C9FF66B0093 for ; Wed, 8 Dec 2021 01:19:53 -0500 (EST) Received: by unirelay09.hostedemail.com (Postfix, from userid 108) id 27C9720A77; Wed, 8 Dec 2021 06:17:49 +0000 (UTC) Received: by unirelay09.hostedemail.com (Postfix, from userid 108) id 71ABF20EAD; Wed, 8 Dec 2021 05:33:54 +0000 (UTC) Received: from smtpin31.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 46535211C4 for ; Wed, 8 Dec 2021 03:16:39 +0000 (UTC) X-FDA: 78893164476.31.7977EBF Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by imf09.hostedemail.com (Postfix) with ESMTP id 9E054300010A for ; Wed, 8 Dec 2021 03:16:37 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10191"; a="236485350" X-IronPort-AV: E=Sophos;i="5.87,296,1631602800"; d="scan'208";a="236485350" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 19:16:36 -0800 X-IronPort-AV: E=Sophos;i="5.87,296,1631602800"; d="scan'208";a="502881240" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.239.159.50]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 19:16:30 -0800 From: "Huang, Ying" To: Hasan Al Maruf Cc: akpm@linux-foundation.org, dave.hansen@linux.intel.com, feng.tang@intel.com, hasanalmaruf@fb.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mgorman@suse.de, mgorman@techsingularity.net, mhocko@suse.com, osalvador@suse.de, peterz@infradead.org, riel@surriel.com, shakeelb@google.com, shy828301@gmail.com, weixugc@google.com, ziy@nvidia.com, Johannes Weiner Subject: Re: [PATCH -V10 RESEND 2/6] NUMA balancing: optimize page placement for memory tiering system References: <20211207022757.2523359-3-ying.huang@intel.com> <20211207063639.83762-1-hasanalmaruf@fb.com> Date: Wed, 08 Dec 2021 11:16:28 +0800 In-Reply-To: <20211207063639.83762-1-hasanalmaruf@fb.com> (Hasan Al Maruf's message of "Tue, 7 Dec 2021 01:36:39 -0500") Message-ID: <87wnkf3hwz.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Stat-Signature: jjw37s8fu33uya5k9ud7zdnj7ysrj9js Authentication-Results: imf09.hostedemail.com; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none); spf=none (imf09.hostedemail.com: domain of ying.huang@intel.com has no SPF policy when checking 192.55.52.120) smtp.mailfrom=ying.huang@intel.com X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 9E054300010A X-HE-Tag: 1638933397-997869 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hasan Al Maruf writes: > Hi Huang, > >>+void set_numabalancing_state(bool enabled) >>+{ >>+ if (enabled) >>+ sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL; >>+ else >>+ sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED; >>+ __set_numabalancing_state(enabled); >>+} >>+ > > One of the properties of optimized NUMA Balancing for tiered memory is we > are not going to scan top-tier nodes as promotion doesn't make sense there > (implemented in the next patch [3/6]). However, if a system has only > single memory node with CPU, does it make sense to run > `NUMA_BALANCING_NORMAL` mode there? What do you think about downgrading to > `NUMA_BALANCING_MEMORY_TIERING` mode if a user setup NUMA Balancing on > the default mode of `NUMA_BALANCING_NORMAL` on a single toptier memory > node? Consider a system with only 1 NUMA node and no PMEM, should we refuse NUMA balancing to be enabled at all? Per my understanding, the philosophy behind is to keep thing as small as possible instead of as smart as possible. Do you agree? >>diff --git a/mm/vmscan.c b/mm/vmscan.c >>index c266e64d2f7e..5edb5dfa8900 100644 >>--- a/mm/vmscan.c >>+++ b/mm/vmscan.c >>@@ -56,6 +56,7 @@ >> >> #include >> #include >>+#include >> >> #include "internal.h" >> >>@@ -3919,6 +3920,12 @@ static bool pgdat_watermark_boosted(pg_data_t *pgdat, int highest_zoneidx) >> return false; >> } >> >>+/* >>+ * Keep the free pages on fast memory node a little more than the high >>+ * watermark to accommodate the promoted pages. >>+ */ >>+#define NUMA_BALANCING_PROMOTE_WATERMARK (10UL * 1024 * 1024 >> PAGE_SHIFT) >>+ >> /* >> * Returns true if there is an eligible zone balanced for the request order >> * and highest_zoneidx >>@@ -3940,6 +3947,15 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx) >> continue; >> >> mark = high_wmark_pages(zone); >>+ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && >>+ numa_demotion_enabled && >>+ next_demotion_node(pgdat->node_id) != NUMA_NO_NODE) { >>+ unsigned long promote_mark; >>+ >>+ promote_mark = min(NUMA_BALANCING_PROMOTE_WATERMARK, >>+ pgdat->node_present_pages >> 6); >>+ mark += promote_mark; >>+ } >> if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx)) >> return true; >> } > > This can be moved to a different patch. I think, this patch [2/6] can be > splitted into two basic patches -- 1. NUMA Balancing interface for tiered > memory and 2. maintaining a headroom for promotion. Johannes has taught me that, if we introduce a new function, variable, or interface, it's better to introduce its user together. So that we can determine whether it's necessary to do that, whether the definition is suitable, etc. I think that makes sense. So I try to do that in this patchset too. As in [2/5] of your patchset below, another possibility is to make 1. NUMA balancing interface for tiered memory and 2. skip scanning top tier memory in NUMA balancing in one patch. One concern is that although this is an optimization, there's almost no measurable performance difference. This makes it hard to justify to extend the user space interface. Do you have better data to support this? > Instead of having a static value for `NUMA_BALANCING_PROMOTE_WATERMARK` > what about decoupling the allocation and reclamation and add a user-space > interface for controling them? This means to add a new user space ABI. Because we may need to support the new ABI forever, we should have strong justification to add it. I am not against to add an ABI to adjust promotion watermark in general. I think that the path could be, - Have a simplest solution that works without introducing new ABI, like something in this patch, or revised. - Then try to add a new ABI in a separate patch with enough justification, for example, with much improved performance data. Do you agree? > Do you think patch [2/5] and [3/5] of this series can be merged to your > current patchset? > > https://lore.kernel.org/all/cover.1637778851.git.hasanalmaruf@fb.com/ Best Regards, Huang, Ying