From: "Huang, Ying"
To: Alistair Popple
Cc: Andrew Morton, "Aneesh Kumar K . V", Wei Xu, Dan Williams, Dave Hansen, "Davidlohr Bueso", Johannes Weiner, "Jonathan Cameron", Michal Hocko, Yang Shi, Rafael J Wysocki, Dave Jiang
Subject: Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management
Date: Fri, 11 Aug 2023 11:51:11 +0800
In-Reply-To: <87lef0x23q.fsf@nvdebian.thelocal> (Alistair Popple's message of "Fri, 28 Jul 2023 11:20:05 +1000")
Message-ID: <87r0oack40.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Alistair,

Sorry for the late response.  I just came back from vacation.

Alistair Popple writes:

> "Huang, Ying" writes:
>
>> Alistair Popple writes:
>>
>>> "Huang, Ying" writes:
>>>
>>>> Alistair Popple writes:
>>>>
>>>>>>>> While other memory device drivers can use the general notifier chain
>>>>>>>> interface at the same time.
>>>>>
>>>>> How would that work in practice though? The abstract distance as far as
>>>>> I can tell doesn't have any meaning other than establishing preferences
>>>>> for memory demotion order.
>>>>> Therefore all calculations are relative to
>>>>> the rest of the calculations on the system. So if a driver does its own
>>>>> thing how does it choose a sensible distance? IMHO the value here is in
>>>>> coordinating all that through a standard interface, whether that is HMAT
>>>>> or something else.
>>>>
>>>> Only if different algorithms follow the same basic principle.  For
>>>> example, the abstract distance of default DRAM nodes is fixed
>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
>>>> in linear direct proportion to the memory latency and inversely
>>>> proportional to the memory bandwidth.  Use the memory latency and
>>>> bandwidth of default DRAM nodes as base.
>>>>
>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>>> some other methods to report the raw memory latency and bandwidth, we
>>>> can use them too.
>>>
>>> Argh! So we could address my concerns by having drivers feed
>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>> Ie. Rather than having drivers calculate abstract distance themselves we
>>> have the notifier chains return the raw performance data from which the
>>> abstract distance is derived.
>>
>> Now, memory device drivers only need a general interface to get the
>> abstract distance from the NUMA node ID.  In the future, if they need
>> more interfaces, we can add them.  For example, the interface you
>> suggested above.
>
> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
> distance, it's a meaningless number. The only reason they care about it
> is so they can pass it to alloc_memory_type():
>
> struct memory_dev_type *alloc_memory_type(int adistance)
>
> Instead alloc_memory_type() should be taking bandwidth/latency numbers
> and the calculation of abstract distance should be done there.
> That
> resolves the issues about how drivers are supposed to divine adistance
> and also means that when CDAT is added we don't have to duplicate the
> calculation code.

In the current design, the abstract distance is the key concept of
memory types and memory tiers.  It is also used as the interface to
allocate memory types.  This provides more flexibility than some other
interfaces (e.g. read/write bandwidth/latency).  For example, in the
current dax/kmem.c, if HMAT isn't available in the system, the default
abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is used.  This is still
useful to support some systems now.  On a system without HMAT/CDAT, it's
possible to calculate the abstract distance from ACPI SLIT, although
this is quite limited.  I'm not sure whether all systems will provide
read/write bandwidth/latency data for all memory devices.

HMAT and CDAT or some other mechanisms may provide the read/write
bandwidth/latency data to be used to calculate the abstract distance.
For them, we can provide a shared implementation in mm/memory-tiers.c to
map from read/write bandwidth/latency to the abstract distance.  Can
this solve your concerns about the consistency among algorithms?  If so,
we can do that when we add the second algorithm that needs that.

--
Best Regards,
Huang, Ying
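
For illustration only, the shared mapping discussed above could look
roughly like the userspace sketch below.  It is not code from this patch
series.  It only applies the principle stated in the thread: the
abstract distance grows linearly with latency and inversely with
bandwidth, using the default DRAM node as the base.  The names
sketch_mem_perf, sketch_perf_to_adistance() and SKETCH_ADISTANCE_DRAM
are hypothetical, the constant is a placeholder standing in for
MEMTIER_ADISTANCE_DRAM, and summing the read and write figures is an
assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder baseline (assumption); stands in for MEMTIER_ADISTANCE_DRAM. */
#define SKETCH_ADISTANCE_DRAM 576

/* Hypothetical container for raw data as HMAT/CDAT might report it. */
struct sketch_mem_perf {
	uint64_t read_latency;		/* e.g. in nanoseconds */
	uint64_t write_latency;
	uint64_t read_bandwidth;	/* e.g. in MB/s */
	uint64_t write_bandwidth;
};

/*
 * Scale the DRAM baseline by (device latency / DRAM latency) and by
 * (DRAM bandwidth / device bandwidth).  Read and write figures are
 * simply summed, which is an assumption made for this sketch.
 */
static int sketch_perf_to_adistance(const struct sketch_mem_perf *dram,
				    const struct sketch_mem_perf *dev,
				    int *adist)
{
	uint64_t dram_lat, dram_bw, dev_lat, dev_bw;

	if (!dram || !dev || !adist)
		return -EINVAL;

	dram_lat = dram->read_latency + dram->write_latency;
	dram_bw = dram->read_bandwidth + dram->write_bandwidth;
	dev_lat = dev->read_latency + dev->write_latency;
	dev_bw = dev->read_bandwidth + dev->write_bandwidth;

	/* Missing or zero data cannot be mapped to a distance. */
	if (!dram_lat || !dram_bw || !dev_lat || !dev_bw)
		return -EINVAL;

	*adist = (int)(SKETCH_ADISTANCE_DRAM * dev_lat / dram_lat *
		       dram_bw / dev_bw);
	return 0;
}
```

With such a helper, a device with the same numbers as DRAM gets the
baseline distance, while a device with twice the latency and half the
bandwidth gets four times the baseline.  A caller without usable data
gets -EINVAL and can fall back to a fixed default such as
MEMTIER_DEFAULT_DAX_ADISTANCE.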