From: "Huang, Ying"
To: Alistair Popple
Cc: Andrew Morton, "Aneesh Kumar K . V", Wei Xu, Dan Williams, Dave Hansen, "Davidlohr Bueso", Johannes Weiner, "Jonathan Cameron", Michal Hocko, Yang Shi, Rafael J Wysocki, Dave Jiang
Subject: Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management
Date: Tue, 25 Jul 2023 11:14:38 +0800
In-Reply-To: <87r0owzqdc.fsf@nvdebian.thelocal> (Alistair Popple's message of "Tue, 25 Jul 2023 12:13:49 +1000")
Message-ID: <87r0owy95t.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Alistair,

Thanks a lot for the comments!

Alistair Popple writes:

> Huang Ying writes:
>
>> The abstract distance may be calculated by various drivers, such as
>> ACPI HMAT, CXL CDAT, etc. While it may be used by various code which
>> hot-add memory node, such as dax/kmem etc. To decouple the algorithm
>> users and the providers, the abstract distance calculation algorithms
>> management mechanism is implemented in this patch. It provides
>> interface for the providers to register the implementation, and
>> interface for the users.
>
> I wonder if we need this level of decoupling though? It seems to me like
> it would be simpler and better for drivers to calculate the abstract
> distance directly themselves by calling the desired algorithm (eg. ACPI
> HMAT) and pass this when creating the nodes rather than having a
> notifier chain.

As I understand it, ACPI HMAT and memory device drivers (such as
dax/kmem) may belong to different subsystems (ACPI vs. dax), and calling
functions directly across subsystems is not a good idea. So I think it
is better to use a general subsystem, memory-tier.c, to decouple them.
If it turns out that a notifier chain is unnecessary, we can use plain
function pointers instead.

> At the moment it seems we've only identified two possible algorithms
> (ACPI HMAT and CXL CDAT) and I don't think it would make sense for one
> of those to fallback to the other based on priority, so why not just
> have drivers call the correct algorithm directly?
For example, consider a system with PMEM (persistent memory: Optane
DCPMM, AEP, or something else) in DIMM slots and CXL.mem connected via
a CXL link to a remote memory pool. We will need ACPI HMAT for the PMEM
and CXL CDAT for the CXL.mem. One way is to make dax/kmem identify the
type of each device and call the corresponding algorithm. The other way
(suggested by this series) is to make dax/kmem call a notifier chain;
then CXL CDAT or ACPI HMAT can identify the type of the device and
calculate the distance if it is a type they handle. I don't think it is
good to make dax/kmem know about every possible type of memory device.

>> Multiple algorithm implementations can cooperate via calculating
>> abstract distance for different memory nodes. The preference of
>> algorithm implementations can be specified via
>> priority (notifier_block.priority).
>
> How/what decides the priority though? That seems like something better
> decided by a device driver than the algorithm driver IMHO.

Do we need a priority specific to each memory device driver, or could we
just share a common one? For example, should the priority of CXL CDAT
always be higher than that of ACPI HMAT? Or should it be architecture
specific?

Also, I don't think all memory device drivers must be forced to use the
general notifier chain interface. If a memory device driver has a better
understanding of its device, it can determine the abstract distance in
some other way. For example, a CXL memory device driver can determine
the abstract distance by itself, while other memory device drivers use
the general notifier chain interface at the same time.

--
Best Regards,
Huang, Ying
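The dispatch scheme argued for in this reply (algorithm providers register with a priority, a user such as dax/kmem calls one entry point, and the first provider that recognizes the device supplies the distance) can be sketched in plain userspace C. This is a minimal illustration under stated assumptions, not the code in the patch: the names register_adist_algo(), calc_adist(), struct mem_dev and struct adist_algo, the priorities 200/100, and the distance values 700/600 are all hypothetical, and the kernel's real notifier_block machinery is replaced by a hand-rolled list sorted by priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, loosely modelled on NOTIFY_DONE / NOTIFY_STOP. */
enum { ADIST_SKIP = 0, ADIST_CLAIMED = 1 };

/* Hypothetical description of a hot-added memory node. */
struct mem_dev {
	int nid;
	bool is_cxl;         /* reached through a CXL link */
	bool has_hmat_entry; /* described by firmware (ACPI HMAT) */
};

/* One registered algorithm; higher priority is consulted first. */
struct adist_algo {
	int (*calc)(const struct mem_dev *dev, int *adist);
	int priority;
	struct adist_algo *next;
};

static struct adist_algo *adist_chain;

/* Keep the chain sorted by descending priority; equal priorities keep
 * registration order. */
static void register_adist_algo(struct adist_algo *algo)
{
	struct adist_algo **pos = &adist_chain;

	assert(algo && algo->calc);
	while (*pos && (*pos)->priority >= algo->priority)
		pos = &(*pos)->next;
	algo->next = *pos;
	*pos = algo;
}

/* User side (e.g. dax/kmem): walk the chain and leave *adist untouched if
 * nobody claims the device. Returns 0 if a value was produced, else -1. */
static int calc_adist(const struct mem_dev *dev, int *adist)
{
	for (struct adist_algo *a = adist_chain; a; a = a->next)
		if (a->calc(dev, adist) == ADIST_CLAIMED)
			return 0;
	return -1;
}

/* Provider sketches: the values are placeholders, not real HMAT/CDAT data. */
static int cdat_like_calc(const struct mem_dev *dev, int *adist)
{
	if (!dev->is_cxl)
		return ADIST_SKIP;
	*adist = 700;
	return ADIST_CLAIMED;
}

static int hmat_like_calc(const struct mem_dev *dev, int *adist)
{
	if (!dev->has_hmat_entry)
		return ADIST_SKIP;
	*adist = 600;
	return ADIST_CLAIMED;
}

static struct adist_algo cdat_algo = { .calc = cdat_like_calc, .priority = 200 };
static struct adist_algo hmat_algo = { .calc = hmat_like_calc, .priority = 100 };

static void register_example_algos(void)
{
	/* Registration order does not matter; priority decides lookup order. */
	register_adist_algo(&hmat_algo);
	register_adist_algo(&cdat_algo);
}
```

With this layout the caller only invokes calc_adist() and never needs to know whether a node is PMEM or CXL.mem. A CXL device that also has an HMAT entry is answered by the higher-priority CDAT-like provider, which is exactly the case where the choice of priority raised in the thread matters. A driver that understands its device better can still bypass the chain and compute the distance itself.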