From: "Huang, Ying"
To: Alistair Popple
Cc: Andrew Morton, "Aneesh Kumar K . V", Wei Xu, Dan Williams, Dave Hansen, "Davidlohr Bueso", Johannes Weiner, "Jonathan Cameron", Michal Hocko, Yang Shi, Rafael J Wysocki, Dave Jiang
Subject: Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management
Date: Tue, 22 Aug 2023 06:50:51 +0800
In-Reply-To: <87cyzgwrys.fsf@nvdebian.thelocal> (Alistair Popple's message of "Mon, 21 Aug 2023 21:26:24 +1000")
Message-ID: <87il98c8ms.fsf@yhuang6-desk2.ccr.corp.intel.com>

Alistair Popple writes:

> "Huang, Ying" writes:
>
>> Hi, Alistair,
>>
>> Sorry for the late response.  I just came back from vacation.
>
> Ditto for this response :-)
>
> I see Andrew has taken this into mm-unstable though, so my bad for not
> getting around to following all this up sooner.
>
>> Alistair Popple writes:
>>
>>> "Huang, Ying" writes:
>>>
>>>> Alistair Popple writes:
>>>>
>>>>> "Huang, Ying" writes:
>>>>>
>>>>>> Alistair Popple writes:
>>>>>>
>>>>>>>>>> While other memory device drivers can use the general notifier chain
>>>>>>>>>> interface at the same time.
>>>>>>>
>>>>>>> How would that work in practice though? The abstract distance as far as
>>>>>>> I can tell doesn't have any meaning other than establishing preferences
>>>>>>> for memory demotion order. Therefore all calculations are relative to
>>>>>>> the rest of the calculations on the system. So if a driver does its own
>>>>>>> thing how does it choose a sensible distance? IMHO the value here is in
>>>>>>> coordinating all that through a standard interface, whether that is HMAT
>>>>>>> or something else.
>>>>>>
>>>>>> Only if different algorithms follow the same basic principle.  For
>>>>>> example, the abstract distance of default DRAM nodes is fixed
>>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
>>>>>> directly proportional to the memory latency and inversely
>>>>>> proportional to the memory bandwidth.  Use the memory latency and
>>>>>> bandwidth of default DRAM nodes as base.
>>>>>>
>>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>>>>> some other methods to report the raw memory latency and bandwidth, we
>>>>>> can use them too.
>>>>>
>>>>> Argh! So we could address my concerns by having drivers feed
>>>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>>>> Ie. Rather than having drivers calculate abstract distance themselves we
>>>>> have the notifier chains return the raw performance data from which the
>>>>> abstract distance is derived.
>>>>
>>>> Now, memory device drivers only need a general interface to get the
>>>> abstract distance from the NUMA node ID.  In the future, if they need
>>>> more interfaces, we can add them.  For example, the interface you
>>>> suggested above.
>>>
>>> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
>>> distance, it's a meaningless number. The only reason they care about it
>>> is so they can pass it to alloc_memory_type():
>>>
>>>   struct memory_dev_type *alloc_memory_type(int adistance)
>>>
>>> Instead alloc_memory_type() should be taking bandwidth/latency numbers
>>> and the calculation of abstract distance should be done there. That
>>> resolves the issues about how drivers are supposed to divine adistance
>>> and also means that when CDAT is added we don't have to duplicate the
>>> calculation code.
>>
>> In the current design, the abstract distance is the key concept of
>> memory types and memory tiers.  And it is used as the interface to allocate
>> memory types.  This provides more flexibility than some other interfaces
>> (e.g. read/write bandwidth/latency).  For example, in current
>> dax/kmem.c, if HMAT isn't available in the system, the default abstract
>> distance: MEMTIER_DEFAULT_DAX_ADISTANCE is used.  This is still useful
>> to support some systems now.  On a system without HMAT/CDAT, it's
>> possible to calculate abstract distance from ACPI SLIT, although this is
>> quite limited.  I'm not sure whether all systems will provide read/write
>> bandwidth/latency data for all memory devices.
>>
>> HMAT and CDAT or some other mechanisms may provide the read/write
>> bandwidth/latency data to be used to calculate abstract distance.  For
>> them, we can provide a shared implementation in mm/memory-tiers.c to map
>> from read/write bandwidth/latency to the abstract distance.  Can this
>> solve your concerns about the consistency among algorithms?  If so, we
>> can do that when we add the second algorithm that needs that.
>
> I guess it would address my concerns if we did that now. I don't see why
> we need to wait for a second implementation for that though - the whole
> series seems to be built around adding a framework for supporting
> multiple algorithms even though only one exists. So I think we should
> support that fully, or simplify the whole thing and just assume the only
> thing that exists is HMAT and get rid of the general interface until a
> second algorithm comes along.

We will need a general interface even with only one algorithm
implementation, because it's not good to make a dax subsystem driver
(dax/kmem) depend on an ACPI subsystem driver (acpi/hmat).  We need
some general interface at the subsystem level (memory tier here)
between them.

Best Regards,
Huang, Ying
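
For illustration only, the principle quoted above (abstract distance
directly proportional to latency and inversely proportional to
bandwidth, with default DRAM as the base) could be sketched as a shared
helper like the one below.  This is a user-space sketch, not the code in
this series: struct example_perf, example_perf_to_adistance() and the
base value EXAMPLE_ADISTANCE_DRAM are hypothetical names and numbers
chosen to keep the arithmetic easy to follow.

```c
#include <stdint.h>

/*
 * Hypothetical base value for illustration; this is NOT the kernel's
 * MEMTIER_ADISTANCE_DRAM.
 */
#define EXAMPLE_ADISTANCE_DRAM 1000

/* Hypothetical container for raw numbers such as HMAT/CDAT might report. */
struct example_perf {
	uint32_t read_latency;		/* e.g. nanoseconds */
	uint32_t write_latency;
	uint32_t read_bandwidth;	/* e.g. MB/s */
	uint32_t write_bandwidth;
};

/*
 * Map raw performance data to an abstract distance, scaled relative to
 * the default DRAM numbers.  Returns 0 on success, or -1 if any summed
 * latency or bandwidth is zero (no usable data).
 */
static int example_perf_to_adistance(const struct example_perf *dram,
				     const struct example_perf *dev,
				     int *adist)
{
	uint64_t dram_lat = (uint64_t)dram->read_latency + dram->write_latency;
	uint64_t dram_bw = (uint64_t)dram->read_bandwidth + dram->write_bandwidth;
	uint64_t dev_lat = (uint64_t)dev->read_latency + dev->write_latency;
	uint64_t dev_bw = (uint64_t)dev->read_bandwidth + dev->write_bandwidth;

	if (!dram_lat || !dram_bw || !dev_lat || !dev_bw)
		return -1;

	/* Higher latency raises the distance, higher bandwidth lowers it. */
	*adist = (int)(EXAMPLE_ADISTANCE_DRAM * dev_lat / dram_lat *
		       dram_bw / dev_bw);
	return 0;
}
```

With a helper of this shape, a device with twice the DRAM latency and
half the DRAM bandwidth ends up at four times the base distance, while
the DRAM numbers themselves map back to the base.  Any provider of raw
latency/bandwidth data could share the same mapping.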