Message-ID: <6365983a8fbd8c325bb18959c51e9417fd821c91.camel@intel.com>
Subject: Re: [PATCH v2 0/5] mm: demotion: Introduce new node state N_DEMOTION_TARGETS
From: "ying.huang@intel.com"
To: Jagdish Gediya, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com, baolin.wang@linux.alibaba.com, dave.hansen@linux.intel.com, dan.j.williams@intel.com, Yang Shi, Wei Xu
Date: Thu, 14 Apr 2022 15:00:46 +0800
In-Reply-To: <20220413092206.73974-1-jvgediya@linux.ibm.com>
References: <20220413092206.73974-1-jvgediya@linux.ibm.com>

On Wed, 2022-04-13 at 14:52 +0530, Jagdish Gediya wrote:
> Current implementation to find the demotion targets works
> based on node state N_MEMORY, however some systems may have
> dram only memory numa node which are N_MEMORY
> but not the right choices as demotion targets.
>
> This patch series introduces the new node state
> N_DEMOTION_TARGETS, which is used to distinguish the nodes which
> can be used as demotion targets, node_states[N_DEMOTION_TARGETS]
> is used to hold the list of nodes which can be used as demotion
> targets, support is also added to set the demotion target
> list from user space so that default behavior can be overridden.

It appears that your proposed user space interface cannot solve all
problems. For example, consider the following system, where nodes 0
and 2 are CPU + DRAM nodes and node 1 is a slow memory node near
node 0:

available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus:
node 1 size: n MB
node 1 free: n MB
node 2 cpus: 2 3
node 2 size: n MB
node 2 free: n MB
node distances:
node   0   1   2
  0:  10  40  20
  1:  40  10  80
  2:  20  80  10

Demotion order 1:

node    demotion_target
 0              1
 1              X
 2              X

Demotion order 2:

node    demotion_target
 0              1
 1              X
 2              1

Demotion order 1 is preferred if we want to reduce cross-socket
traffic, while demotion order 2 is preferred if we want to take full
advantage of the slow memory node. We can take either choice as the
automatically generated order, and make the other choice possible via
a user space override.

I don't know how to implement this via your proposed user space
interface. How about the following user space interface?

1. Add a file "demotion_order_override" in
   /sys/devices/system/node/

2. When read, "1" is output if the demotion order of the system has
   been overridden; "0" is output if not.

3. When "1" is written, the demotion order of the system switches to
   overridden mode. When "0" is written, the demotion order of the
   system switches to automatic mode and the demotion order is
   re-generated.

4. Add a file "demotion_targets" for each node in
   /sys/devices/system/node/nodeX/

5. When read, the demotion targets of nodeX are output.

6. When a node list is written to the file, the demotion targets of
   nodeX are set to the written nodes, and the demotion order of the
   system switches to overridden mode.

To reduce the complexity, the demotion order of the system is either
in overridden mode or automatic mode. When converting from automatic
mode to overridden mode, the existing demotion targets of all nodes
are retained until they are explicitly changed. When converting from
overridden mode to automatic mode, the demotion order of the system
is re-generated automatically.

In overridden mode, the demotion targets of a hot-added or
hot-removed node are set to empty, and a hot-removed node is also
removed from the demotion targets of every other node.

This is an extension of the interface used in the following patch,

https://lore.kernel.org/lkml/20191016221149.74AE222C@viggo.jf.intel.com/

What do you think about this?

> node state N_DEMOTION_TARGETS is also set from the dax kmem
> driver, certain type of memory which registers through dax kmem
> (e.g. HBM) may not be the right choices for demotion so in future
> they should be distinguished based on certain attributes and dax
> kmem driver should avoid setting them as N_DEMOTION_TARGETS,
> however current implementation also doesn't distinguish any
> such memory and it considers all N_MEMORY as demotion targets
> so this patch series doesn't modify the current behavior.
>

Best Regards,
Huang, Ying

[snip]
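
To make the semantics of points 1-6 easier to discuss, here is a rough
user-space Python model of them. It is only a sketch, not kernel code.
The class DemotionOrder, its method names, and the policy used for the
automatic order (each CPU-less node becomes the demotion target of its
nearest CPU node, which happens to yield demotion order 1 for the
example system) are assumptions made for illustration. Node lists are
parsed as plain comma-separated ids; ranges such as "0-2" are not
handled.

```python
class DemotionOrder:
    """Hypothetical model of demotion_order_override / demotion_targets."""

    def __init__(self, distances, cpu_nodes):
        self.distances = distances        # distances[a][b]: NUMA distance
        self.cpu_nodes = set(cpu_nodes)   # nodes that have CPUs
        self.online = set(distances)      # nodes currently present
        self.overridden = False           # False means automatic mode
        self.targets = {}
        self._regenerate()

    def _regenerate(self):
        # ASSUMPTION: automatic policy, not taken from the patch series.
        # Each CPU-less node is assigned to its nearest CPU node.
        self.targets = {n: set() for n in self.online}
        cpus = sorted(self.cpu_nodes & self.online)
        for slow in sorted(self.online - self.cpu_nodes):
            if cpus:
                nearest = min(cpus, key=lambda c: self.distances[c][slow])
                self.targets[nearest].add(slow)

    def read_override(self):
        # Point 2: "1" if overridden, "0" if automatic.
        return "1" if self.overridden else "0"

    def write_override(self, value):
        # Point 3: "1" keeps the existing targets, "0" re-generates them.
        if value == "1":
            self.overridden = True
        elif value == "0":
            self.overridden = False
            self._regenerate()
        else:
            raise ValueError("expected '0' or '1'")

    def read_targets(self, node):
        # Point 5: output the demotion targets of the node.
        return ",".join(str(n) for n in sorted(self.targets.get(node, set())))

    def write_targets(self, node, nodelist):
        # Point 6: set the targets and switch to overridden mode.
        new = {int(tok) for tok in nodelist.split(",") if tok.strip()}
        if not new <= self.online:
            raise ValueError("node list contains an unknown node")
        self.targets[node] = new
        self.overridden = True

    def hot_remove(self, node):
        self.online.discard(node)
        if self.overridden:
            # Overridden mode: empty the removed node's targets and drop
            # it from the targets of every other node.
            self.targets[node] = set()
            for tgt in self.targets.values():
                tgt.discard(node)
        else:
            # ASSUMPTION: automatic mode simply re-generates the order.
            self._regenerate()


# Example system from this mail: nodes 0 and 2 have CPUs, node 1 is slow.
distances = {
    0: {0: 10, 1: 40, 2: 20},
    1: {0: 40, 1: 10, 2: 80},
    2: {0: 20, 1: 80, 2: 10},
}
order = DemotionOrder(distances, cpu_nodes=[0, 2])
order.write_targets(2, "1")   # move from demotion order 1 to order 2
for n in sorted(distances):
    print(n, order.read_targets(n) or "X")
```

In this model, writing "1" to the targets of node 2 is the only step
needed to go from demotion order 1 to demotion order 2, and writing "0"
to the override file goes back to order 1.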