Date: Tue, 18 Feb 2020 09:09:32 +0000
From: Mel Gorman
To: "Huang, Ying"
Cc: Peter Zijlstra, Ingo Molnar, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Feng Tang, Andrew Morton, Michal Hocko,
    Rik van Riel, Dave Hansen, Dan Williams
Subject: Re: [RFC -V2 3/8] autonuma, memory tiering: Use kswapd to demote cold pages to PMEM
Message-ID: <20200218090932.GD3420@suse.de>
References: <20200218082634.1596727-1-ying.huang@intel.com>
    <20200218082634.1596727-4-ying.huang@intel.com>
In-Reply-To: <20200218082634.1596727-4-ying.huang@intel.com>

On Tue, Feb 18, 2020 at 04:26:29PM +0800, Huang, Ying wrote:
> From: Huang Ying
>
> In a memory tiering system, if the memory size of the workloads is
> smaller than that of the faster memory (e.g. DRAM) nodes, all pages of
> the workloads should be put in the faster memory nodes. But this
> makes it unnecessary to use slower memory (e.g. PMEM) at all.
>
> So in common cases, the memory size of the workload should be larger
> than that of the faster memory nodes. And to optimize the
> performance, the hot pages should be promoted to the faster memory
> nodes while the cold pages should be demoted to the slower memory
> nodes. To achieve that, we have two choices,
>
> a. Promote the hot pages from the slower memory node to the faster
>    memory node. This will create some memory pressure in the faster
>    memory node, thus triggering memory reclaim, where the cold
>    pages will be demoted to the slower memory node.
>
> b. Demote the cold pages from the faster memory node to the slower
>    memory node. This will create some free memory space in the faster
>    memory node, and the hot pages in the slower memory node could be
>    promoted to the faster memory node.
>
> Choice "a" will create memory pressure in the faster memory
> node. If the memory pressure of the workload is high too, the memory
> pressure may become so high that the memory allocation latency of the
> workload is affected, e.g. direct reclaim may be triggered.
>
> Choice "b" works much better in this respect. If the memory
> pressure of the workload is high, it will consume the free memory and
> hot page promotion will stop earlier if its allocation watermark
> is higher than that of normal memory allocation.
>
> In this patch, choice "b" is implemented. If memory tiering NUMA
> balancing mode is enabled, the node isn't the slowest node, and the
> free memory size of the node is below the high watermark, the kswapd
> of the node will be woken up to free some memory until the free memory
> size is above the high watermark + autonuma promotion rate limit. If
> the free memory size is below the high watermark, autonuma promotion
> will stop working. This avoids creating too much memory pressure on
> the system.
>
> Signed-off-by: "Huang, Ying"

Unfortunately, I stopped reading at this point. This patch depends on
another series entirely, and the two really need to be presented
together instead of relying on reviewers searching mail archives for
other patches to try to assemble the full picture :(. Ideally, each
major stage would come with supporting data showing roughly how it
behaves.

I know this will be a pain, but the original NUMA balancing had the same
problem and ultimately started with one large series that got the basics
right, followed by other series that improved it in stages. That process
is *still* ongoing today.

-- 
Mel Gorman
SUSE Labs
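
For illustration only, the watermark policy described in the quoted
changelog can be sketched as a small standalone C model. This is not the
patch's code: the struct, its fields, and the three function names are
hypothetical, and the sketch assumes a single free-page counter per node
rather than the kernel's per-zone watermarks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node snapshot; all sizes are in pages. */
struct node_state {
    unsigned long free_pages;       /* current free memory on the node */
    unsigned long high_wmark;       /* high watermark of the node */
    unsigned long promo_rate_limit; /* autonuma promotion rate limit */
    bool is_slowest;                /* true for the slowest tier, e.g. PMEM */
};

/*
 * kswapd is woken for demotion only when tiering mode is enabled, the
 * node has a slower tier to demote to, and free memory has dropped
 * below the high watermark.
 */
static bool should_wake_kswapd(const struct node_state *n, bool tiering_enabled)
{
    assert(n != NULL);
    return tiering_enabled && !n->is_slowest &&
           n->free_pages < n->high_wmark;
}

/*
 * kswapd keeps demoting until free memory is above the high watermark
 * plus the promotion rate limit.
 */
static bool kswapd_can_stop(const struct node_state *n)
{
    assert(n != NULL);
    return n->free_pages > n->high_wmark + n->promo_rate_limit;
}

/* Promotion into this node stops while it is below the high watermark. */
static bool promotion_allowed(const struct node_state *n)
{
    assert(n != NULL);
    return n->free_pages >= n->high_wmark;
}
```

With a high watermark of 100 pages and a rate limit of 20 pages, a node
at 90 free pages wakes kswapd and refuses promotion; once kswapd has
pushed it past 120 free pages, demotion stops and promotion is allowed
again.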