From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi,
	Davidlohr Bueso, Tim C Chen, Michal Hocko,
	Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
	Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner,
	jvgediya.oss@gmail.com, Jagdish Gediya
Subject: Re: [PATCH v9 1/8] mm/demotion: Add support for explicit memory tiers
Date: Mon, 18 Jul 2022 14:57:42 +0800
Message-ID: <87tu7e3o2h.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <3659f1bb-a82e-1aad-f297-808a2c17687d@linux.ibm.com>
	(Aneesh Kumar K. V.'s message of "Fri, 15 Jul 2022 14:38:43 +0530")
References: <20220714045351.434957-1-aneesh.kumar@linux.ibm.com>
	<20220714045351.434957-2-aneesh.kumar@linux.ibm.com>
	<87bktq4xs7.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<3659f1bb-a82e-1aad-f297-808a2c17687d@linux.ibm.com>

Aneesh
Kumar K V writes:

> On 7/15/22 1:23 PM, Huang, Ying wrote:

[snip]

>>
>> You dropped the original sysfs interface patches from the series, but
>> the kernel internal implementation is still for the original sysfs
>> interface.  For example, memory tier ID is for the original sysfs
>> interface, not for the new proposed sysfs interface.  So I suggest
>> that you implement with the new interface in mind.  What do you think
>> about the following design?
>>
>
> Sorry I am not able to follow you here. This patchset completely drops
> exposing memory tiers to userspace via sysfs. Instead it allows
> creation of memory tiers with specific tierID from within the
> kernel/device driver. Default tierID is 200 and dax kmem creates memory
> tier with tierID 100.
>
>
>> - Each NUMA node belongs to a memory type, and each memory type
>>   corresponds to an "abstract distance", so each NUMA node corresponds
>>   to a "distance".  For simplicity, we can start with static distances,
>>   for example, DRAM (default): 150, PMEM: 250.  The distance of each
>>   NUMA node can be recorded in a global array,
>>
>>     int node_distances[MAX_NUMNODES];
>>
>>   or, just
>>
>>     pgdat->distance
>>
>
> I don't follow this. I guess you are trying to have a different design.
> Would it be much easier if you can write this in the form of a patch?

I have written some pseudo code below to show my basic idea.

#define MEMORY_TIER_ADISTANCE_DRAM	150
#define MEMORY_TIER_ADISTANCE_PMEM	250

struct memory_tier {
	/* abstract distance range covered by the memory tier */
	int adistance_start;
	int adistance_len;
	struct list_head list;
	nodemask_t nodemask;
};

/* RCU list of memory tiers */
static LIST_HEAD(memory_tiers);

/* abstract distance of each NUMA node, 0 means "not set" */
int node_adistances[MAX_NUMNODES];

struct memory_tier *find_create_memory_tier(int adistance)
{
	struct memory_tier *tier;

	/* reuse the tier whose range already covers this distance */
	list_for_each_entry(tier, &memory_tiers, list) {
		if (adistance >= tier->adistance_start &&
		    adistance < tier->adistance_start + tier->adistance_len)
			return tier;
	}
	/* allocate a new memory tier and return */
}

void memory_tier_add_node(int nid)
{
	int adistance;
	struct memory_tier *tier;

	/* fall back to the DRAM default if no driver set a distance */
	adistance = node_adistances[nid] ?: MEMORY_TIER_ADISTANCE_DRAM;
	tier = find_create_memory_tier(adistance);
	node_set(nid, tier->nodemask);
	/* setup demotion data structure, etc */
}

static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
						 unsigned long action, void *_arg)
{
	struct memory_notify *arg = _arg;
	int nid;

	nid = arg->status_change_nid;
	if (nid < 0)
		return notifier_from_errno(0);

	switch (action) {
	case MEM_ONLINE:
		memory_tier_add_node(nid);
		break;
	}

	return notifier_from_errno(0);
}

/* kmem.c */
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
	/* record the distance before the memory is added and onlined */
	node_adistances[dev_dax->target_node] = MEMORY_TIER_ADISTANCE_PMEM;
	/* add_memory_driver_managed() */
}

[snip]

Best Regards,
Huang, Ying
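
P.S. To make the range lookup above easier to play with, here is a rough
user-space sketch of the same idea. It is not kernel code. The names
tier_add_node(), find_create_tier(), MAX_NODES, MAX_TIERS and TIER_CHUNK
are made up for illustration, a plain bitmask stands in for nodemask_t,
and the rule that a new tier covers a fixed-size, chunk-aligned range is
only an assumption, because the pseudo code above leaves the allocation
step open.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES	8	/* hypothetical stand-in for MAX_NUMNODES */
#define MAX_TIERS	8	/* hypothetical upper bound for this sketch */
#define ADISTANCE_DRAM	150
#define ADISTANCE_PMEM	250
#define TIER_CHUNK	100	/* assumption: each tier covers 100 distance units */

struct tier {
	int adistance_start;	/* first abstract distance covered */
	int adistance_len;	/* number of distance values covered */
	uint64_t nodemask;	/* bit N set means node N is in this tier */
};

static struct tier tiers[MAX_TIERS];
static int nr_tiers;
/* abstract distance of each node, 0 means "not set by a driver" */
static int node_adistances[MAX_NODES];

/* Return the tier covering adistance, creating it if none exists yet. */
struct tier *find_create_tier(int adistance)
{
	struct tier *t;
	int i;

	for (i = 0; i < nr_tiers; i++) {
		t = &tiers[i];
		if (adistance >= t->adistance_start &&
		    adistance < t->adistance_start + t->adistance_len)
			return t;
	}

	if (adistance < 0 || nr_tiers == MAX_TIERS)
		return NULL;

	/* Assumed policy: align the new range down to a chunk boundary. */
	t = &tiers[nr_tiers++];
	t->adistance_start = adistance - adistance % TIER_CHUNK;
	t->adistance_len = TIER_CHUNK;
	t->nodemask = 0;
	return t;
}

/* Put node nid into the tier matching its distance (DRAM by default). */
struct tier *tier_add_node(int nid)
{
	struct tier *t;
	int adistance;

	if (nid < 0 || nid >= MAX_NODES)
		return NULL;

	adistance = node_adistances[nid] ? node_adistances[nid] : ADISTANCE_DRAM;
	t = find_create_tier(adistance);
	if (t)
		t->nodemask |= UINT64_C(1) << nid;
	return t;
}
```

With this chunk size, a node left at the default distance and a node
whose distance was set to ADISTANCE_PMEM before onlining end up in two
different tiers, while a second default node joins the existing DRAM
tier instead of creating a new one.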