From: Ying Huang
To: Aneesh Kumar K V, linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen,
    Brice Goglin, Michal Hocko, Linux Kernel Mailing List,
    Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
    Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Subject: Re: [PATCH v5 9/9] mm/demotion: Update node_is_toptier to work with memory tiers
Date: Mon, 06 Jun 2022 15:24:38 +0800
Message-ID: <11f94e0c50f17f4a6a2f974cb69a1ae72853e2be.camel@intel.com>
References: <20220603134237.131362-1-aneesh.kumar@linux.ibm.com>
    <20220603134237.131362-10-aneesh.kumar@linux.ibm.com>
    <6e94b7e2a6192e4cacba1db3676b5b5cf9b98eac.camel@intel.com>

On Mon, 2022-06-06 at 09:22 +0530, Aneesh Kumar K V
wrote:
> On 6/6/22 8:41 AM, Ying Huang wrote:
> > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > > With memory tiers support we can have memory on NUMA nodes
> > > in the top tier from which we want to avoid promotion tracking NUMA
> > > faults. Update node_is_toptier to work with memory tiers. To
> > > avoid taking locks, a nodemask is maintained for all demotion
> > > targets. All NUMA nodes are by default top tier nodes and as
> > > we add new lower memory tiers NUMA nodes get added to the
> > > demotion targets thereby moving them out of the top tier.
> >
> > Check the usage of node_is_toptier(),
> >
> > - migrate_misplaced_page()
> >   node_is_toptier() is used to check whether migration is a promotion.
> > We can avoid to use it. Just compare the rank of the nodes.
> >
> > - change_pte_range() and change_huge_pmd()
> >   node_is_toptier() is used to avoid scanning fast memory (DRAM) pages
> > for promotion. So I think we should change the name to node_is_fast()
> > as follows,
> >
> > static inline bool node_is_fast(int node)
> > {
> > 	return NODE_DATA(node)->mt_rank >= MEMORY_RANK_DRAM;
> > }
> >
>
> But that gives special meaning to MEMORY_RANK_DRAM. As detailed in other
> patches, absolute value of rank doesn't carry any meaning. It is only
> the relative value w.r.t other memory tiers that decide whether it is
> fast or not. Agreed by default memory tiers get built with
> MEMORY_RANK_DRAM. But userspace can change the rank value of 'memtier1'
> Hence to determine a node is consisting of fast memory is essentially
> figuring out whether node is the top most tier in memory hierarchy and
> not just the memory tier rank value is >= MEMORY_RANK_DRAM?

In a system with 3 tiers,

HBM	0
DRAM	1
PMEM	2

In your implementation, only HBM will be considered fast. But what we
need is to consider both HBM and DRAM fast, because we use NUMA
balancing to promote PMEM pages to DRAM. It's unnecessary to scan HBM
and DRAM pages for that. And there is no requirement to promote DRAM
pages to HBM with NUMA balancing.

I can understand that the memory tiers are more dynamic now. For the
requirements of NUMA balancing, we need the lowest memory tier (rank)
in which at least one node has a CPU. The nodes in that tier and in the
higher tiers will be considered fast.

Best Regards,
Huang, Ying
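
For illustration only, the rule described above (find the lowest tier
rank that contains a node with a CPU, then treat that tier and every
higher one as fast) could be sketched in userspace C roughly as follows.
This is not kernel code and not part of the patch series: struct
node_info, lowest_cpu_rank() and the three-argument node_is_fast() are
hypothetical names, and the sketch assumes non-negative ranks where a
higher rank means faster memory, as in the quoted node_is_fast().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a NUMA node; not a kernel structure. */
struct node_info {
	int rank;     /* rank of the node's memory tier; assumed >= 0, higher is faster */
	bool has_cpu; /* true if at least one CPU is attached to this node */
};

/*
 * Return the lowest rank found among nodes that have a CPU,
 * or -1 if no node in the array has a CPU.
 */
static int lowest_cpu_rank(const struct node_info *nodes, size_t nr_nodes)
{
	int rank = -1;

	for (size_t i = 0; i < nr_nodes; i++) {
		if (!nodes[i].has_cpu)
			continue;
		if (rank < 0 || nodes[i].rank < rank)
			rank = nodes[i].rank;
	}
	return rank;
}

/*
 * A node is "fast" if its rank is at or above the lowest rank that
 * contains a CPU node. If no node has a CPU, nothing is fast here;
 * that fallback is an assumption of this sketch.
 */
static bool node_is_fast(const struct node_info *nodes, size_t nr_nodes,
			 size_t node)
{
	int threshold = lowest_cpu_rank(nodes, nr_nodes);

	assert(node < nr_nodes);
	return threshold >= 0 && nodes[node].rank >= threshold;
}
```

With made-up ranks HBM = 300 (no CPU), DRAM = 200 (with CPU) and
PMEM = 100 (no CPU), the threshold is the DRAM rank, so the HBM and DRAM
nodes are treated as fast and the PMEM node is not. A real
implementation would presumably cache the threshold rather than
recompute it on every call.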