Message-ID: <872bdaee-21a0-005b-b66c-893eb331e39a@linux.alibaba.com>
Date: Mon, 20 Jun 2022 11:19:23 +0800
Subject: Re: [PATCH -V3 0/3] memory tiering: hot page selection
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Huang Ying, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner,
 Michal Hocko, Rik van Riel, Mel Gorman, Peter Zijlstra, Dave Hansen,
 Yang Shi, Zi Yan, Wei Xu, osalvador, Shakeel Butt, Zhong Jiang
In-Reply-To: <20220614081635.194014-1-ying.huang@intel.com>
References: <20220614081635.194014-1-ying.huang@intel.com>
On 6/14/2022 4:16 PM, Huang Ying wrote:
> To optimize page placement in a memory tiering system with NUMA
> balancing, the hot pages in the slow memory nodes need to be
> identified. Essentially, the original NUMA balancing implementation
> selects the most recently accessed (MRU) pages to promote. But this
> isn't a perfect algorithm for identifying the hot pages, because pages
> with quite low access frequency may eventually be accessed, given that
> the NUMA balancing page table scanning period can be quite long
> (e.g. 60 seconds). So in this patchset, we implement a new hot page
> identification algorithm based on the latency between NUMA balancing
> page table scanning and the hint page fault, which is a kind of most
> frequently accessed (MFU) algorithm.
>
> In NUMA balancing memory tiering mode, if there are hot pages in the
> slow memory node and cold pages in the fast memory node, we need to
> promote/demote the hot/cold pages between the fast and slow memory
> nodes.
>
> One choice is to promote/demote as fast as possible. But the CPU
> cycles and memory bandwidth consumed by a high promoting/demoting
> throughput will hurt the latency of some workloads, because of
> inflated access latency and contention for the slow memory bandwidth.
>
> A way to resolve this issue is to restrict the maximum
> promoting/demoting throughput. It will take longer to finish the
> promoting/demoting, but the workload latency will be better. This is
> implemented in this patchset as the page promotion rate limit
> mechanism.
>
> The promotion hot threshold is workload and system configuration
> dependent. So in this patchset, a method to adjust the hot threshold
> automatically is implemented. The basic idea is to control the number
> of candidate promotion pages to match the promotion rate limit.
>
> We used the pmbench memory accessing benchmark to test the patchset on
> a 2-socket server system with DRAM and PMEM installed. The test
> results are as follows:
>
>                   pmbench score   promote rate
>                    (accesses/s)           MB/s
>                   -------------   ------------
>   base              146887704.1          725.6
>   hot selection     165695601.2          544.0
>   rate limit        162814569.8          165.2
>   auto adjustment   170495294.0          136.9
>
> From the results above,
>
> With the hot page selection patch [1/3], the pmbench score increases
> by about 12.8%, and the promote rate (overhead) decreases by about
> 25.0%, compared with the base kernel.
>
> With the rate limit patch [2/3], the pmbench score decreases by about
> 1.7%, and the promote rate decreases by about 69.6%, compared with the
> hot page selection patch.
>
> With the threshold auto adjustment patch [3/3], the pmbench score
> increases by about 4.7%, and the promote rate decreases by about
> 17.1%, compared with the rate limit patch.
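
To make the scan-to-fault latency idea above concrete, here is a minimal
user-space C sketch of how I read patch [1/3]. All names (struct
page_info, record_scan_time, page_is_hot, hot_threshold_ms) are invented
for illustration only and do not match the actual kernel code:

/*
 * Simplified user-space illustration of the scan-to-fault latency idea.
 * The real kernel implementation records the scan time differently and
 * does the comparison in the NUMA hint page fault path.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct page_info {
        long scan_time_ms;      /* when the page table scanner visited the page */
};

static long now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Called when NUMA balancing unmaps the page to arm a hint page fault. */
static void record_scan_time(struct page_info *pi)
{
        pi->scan_time_ms = now_ms();
}

/*
 * Called from the hint page fault. A short scan-to-fault latency means
 * the page was touched again soon after scanning, i.e. it is accessed
 * frequently (MFU-like), instead of promoting on any fault (MRU-like).
 */
static bool page_is_hot(const struct page_info *pi, long hot_threshold_ms)
{
        return now_ms() - pi->scan_time_ms <= hot_threshold_ms;
}

int main(void)
{
        struct page_info pi;

        record_scan_time(&pi);
        /* ... time passes, then the task touches the page again ... */
        printf("promote: %d\n", page_is_hot(&pi, 1000 /* ms, arbitrary */));
        return 0;
}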
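
And a similarly rough, hypothetical sketch of how the rate limit in patch
[2/3] and the threshold auto adjustment in patch [3/3] could fit together.
The window length, the limit and the adjustment steps below are arbitrary
examples, not the values or heuristics used by the patchset; the hot check
from the previous sketch would consult hot_threshold_ms:

/*
 * Hypothetical sketch of promotion rate limiting plus hot-threshold
 * auto adjustment: steer the number of hot candidates per window toward
 * the promotion rate limit by tightening or relaxing the threshold.
 */
#include <stdbool.h>
#include <stdio.h>

#define WINDOW_MS                1000
#define PROMOTE_LIMIT_PER_WINDOW 20000  /* pages per window, arbitrary */

struct promo_state {
        long window_start_ms;
        long candidates;        /* pages that passed the hot check */
        long promoted;          /* pages actually promoted */
        long hot_threshold_ms;  /* adjusted once per window */
};

/* A hot candidate arrived at time now_ms; may it actually be promoted? */
static bool try_promote(struct promo_state *s, long now_ms)
{
        if (now_ms - s->window_start_ms >= WINDOW_MS) {
                /* Window rolled over: adjust threshold, reset counters. */
                if (s->candidates > PROMOTE_LIMIT_PER_WINDOW)
                        s->hot_threshold_ms /= 2;       /* too many candidates */
                else
                        s->hot_threshold_ms += 100;     /* too few candidates */
                s->window_start_ms = now_ms;
                s->candidates = 0;
                s->promoted = 0;
        }

        s->candidates++;
        if (s->promoted >= PROMOTE_LIMIT_PER_WINDOW)
                return false;   /* over budget, skip this promotion */
        s->promoted++;
        return true;
}

int main(void)
{
        struct promo_state s = { .hot_threshold_ms = 1000 };
        long t;

        /* Feed 30k candidates per second for three seconds. */
        for (t = 0; t < 3000; t++) {
                int i;

                for (i = 0; i < 30; i++)
                        try_promote(&s, t);
        }
        printf("hot threshold is now %ld ms\n", s.hot_threshold_ms);
        return 0;
}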

I did a simple test with mysql on my machine, which contains 1 DRAM node
(30G) and 1 PMEM node (126G).

sysbench /usr/share/sysbench/oltp_read_write.lua \
......
--tables=200 \
--table-size=1000000 \
--report-interval=10 \
--threads=16 \
--time=120

The tps can be improved by about 5% according to the data below, and I
think this is a good start for optimizing the promotion. So for this
series, please feel free to add:

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Without this patchset:
transactions: 2080188 (3466.48 per sec.)

With this patchset:
transactions: 2174296 (3623.40 per sec.)