From: "Huang, Ying"
To: Mel Gorman
Cc: Peter Zijlstra, Ingo Molnar, Feng Tang, Andrew Morton, Michal Hocko,
 Rik van Riel, Dave Hansen, Dan Williams
Subject: Re: [RFC -V2 3/8] autonuma, memory tiering: Use kswapd to demote cold pages to PMEM
References: <20200218082634.1596727-1-ying.huang@intel.com>
 <20200218082634.1596727-4-ying.huang@intel.com>
 <20200218090932.GD3420@suse.de>
Date: Wed, 19 Feb 2020 14:05:09 +0800
In-Reply-To: <20200218090932.GD3420@suse.de> (Mel Gorman's message of
 "Tue, 18 Feb 2020 09:09:32 +0000")
Message-ID: <87o8tvglii.fsf@yhuang-dev.intel.com>

Mel Gorman writes:

> On Tue, Feb 18, 2020 at 04:26:29PM +0800, Huang, Ying wrote:
>> From: Huang Ying
>>
>> In a memory tiering system, if the memory size of the workloads is
>> smaller than that of the faster memory (e.g. DRAM) nodes, all pages
>> of the workloads should be put in the faster memory nodes. But then
>> there is no need to use the slower memory (e.g. PMEM) at all.
>>
>> So in the common case, the memory size of the workload should be
>> larger than that of the faster memory nodes. To optimize
>> performance, the hot pages should be promoted to the faster memory
>> nodes while the cold pages should be demoted to the slower memory
>> nodes. To achieve that, we have two choices:
>>
>> a. Promote the hot pages from the slower memory node to the faster
>>    memory node. This creates some memory pressure in the faster
>>    memory node and thus triggers memory reclaim, in which the cold
>>    pages are demoted to the slower memory node.
>>
>> b. Demote the cold pages from the faster memory node to the slower
>>    memory node. This creates some free memory space in the faster
>>    memory node, so the hot pages in the slower memory node can be
>>    promoted to the faster memory node.
>>
>> Choice "a" creates memory pressure in the faster memory node. If
>> the memory pressure of the workload is high too, the combined
>> pressure may become so high that the memory allocation latency of
>> the workload suffers, e.g. direct reclaim may be triggered.
>>
>> Choice "b" works much better in this respect. If the memory
>> pressure of the workload is high, the workload will consume the
>> free memory, and hot page promotion will stop earlier because its
>> allocation watermark is higher than that of normal memory
>> allocation.
>>
>> This patch implements choice "b". If the memory tiering NUMA
>> balancing mode is enabled, a node isn't the slowest node, and the
>> free memory size of the node is below the high watermark, the
>> kswapd of the node is woken up to free memory until the free memory
>> size is above the high watermark plus the autonuma promotion rate
>> limit. If the free memory size is below the high watermark,
>> autonuma promotion stops working. This avoids creating too much
>> memory pressure in the system.
>>
>> Signed-off-by: "Huang, Ying"
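To make the wakeup condition above concrete, it amounts to something
like the sketch below. This is a simplified illustration, not the
actual patch: NUMA_BALANCING_MEMORY_TIERING, next_demotion_node(), and
folding the promotion rate limit into kswapd's reclaim target are
assumed names from this RFC series.

/*
 * Hedged sketch: wake kswapd on a fast-memory node when its free
 * memory falls below the high watermark, so that cold pages are
 * demoted to the next-lower tier before promotion runs out of room.
 */
static void wakeup_kswapd_for_promotion(int nid)
{
	pg_data_t *pgdat = NODE_DATA(nid);
	int z;

	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
		return;

	/* The slowest tier has no node to demote to; nothing to do. */
	if (next_demotion_node(nid) == NUMA_NO_NODE)
		return;

	for (z = pgdat->nr_zones - 1; z >= 0; z--) {
		struct zone *zone = &pgdat->node_zones[z];

		if (!populated_zone(zone))
			continue;

		/*
		 * Below the high watermark: wake kswapd, which (in this
		 * sketch) reclaims until free memory is above the high
		 * watermark plus the autonuma promotion rate limit.
		 */
		if (zone_page_state(zone, NR_FREE_PAGES) <
		    high_wmark_pages(zone))
			wakeup_kswapd(zone, 0, 0, zone_idx(zone));
	}
}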
> Unfortunately I stopped reading at this point. It depends on another
> series entirely, and the two really need to be presented together,
> instead of relying on searching mail archives to find the other
> patches and trying to assemble the full picture :(. Ideally there
> would be supporting data showing roughly how the system behaves at
> each major stage. I know this will be a pain, but the original NUMA
> balancing had the same problem and ultimately started with one large
> series that got the basics right, followed by other series that
> improved it in stages. That process is *still* ongoing today.

Sorry for the inconvenience. We will post a new patchset that includes
both series, and we will add supporting data at each major stage where
possible.

Best Regards,
Huang, Ying
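P.S. The promotion-side throttling mentioned above ("hot page
promotion will stop earlier because its allocation watermark is
higher") corresponds to a watermark gate on the promotion target node.
Mainline's migrate_balanced_pgdat() in mm/migrate.c performs a similar
check for NUMA balancing migrations; the form below is a hedged
sketch, not the exact patch code.

/*
 * Hedged sketch: allow promoting nr_pages to a node only if some zone
 * stays above the high watermark after the migration. Normal
 * allocations only need the low watermark, so promotion stops earlier
 * than allocation does as free memory shrinks.
 */
static bool promotion_target_balanced(pg_data_t *pgdat,
				      unsigned long nr_pages)
{
	int z;

	for (z = pgdat->nr_zones - 1; z >= 0; z--) {
		struct zone *zone = &pgdat->node_zones[z];

		if (!populated_zone(zone))
			continue;

		/* Require the high watermark plus migration headroom. */
		if (zone_watermark_ok(zone, 0,
				      high_wmark_pages(zone) + nr_pages,
				      ZONE_MOVABLE, 0))
			return true;
	}
	return false;
}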