From: "Huang, Ying"
To: Ravi Jonnalagadda
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Date: Fri, 03 Nov 2023 15:00:18 +0800
Message-ID: <87o7gbz5h9.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20231102093542.70-1-ravis.opensrc@micron.com> (Ravi Jonnalagadda's message of "Thu, 2 Nov 2023 15:05:42 +0530")
References: <87a5rw1wu8.fsf@yhuang6-desk2.ccr.corp.intel.com> <20231102093542.70-1-ravis.opensrc@micron.com>

Ravi Jonnalagadda writes:

> Should Node based interleave solution be considered complex or not would probably
> depend on number of numa nodes that would be present in the system and whether
> we are able to setup the default weights correctly to obtain optimum bandwidth
> expansion.

Node based interleave is more complex than tier based interleave,
because you generally have fewer tiers than nodes.

>>
>>> Pros and Cons of Memory Tier based interleave:
>>> Pros:
>>> 1. Programming weight per initiator would apply for all the nodes in the tier.
>>> 2. Weights can be calculated considering the cumulative bandwidth of all
>>> the nodes in the tier and need to be programmed once for all the nodes in a
>>> given tier.
>>> 3. It may be useful in cases where numa nodes with similar latency and bandwidth
>>> characteristics increase, possibly with pooling use cases.
>>
>>4. simpler.
>>
>>> Cons:
>>> 1. If nodes with different bandwidth and latency characteristics are placed
>>> in same tier as seen in the current mainline kernel, it will be difficult to
>>> apply a correct interleave weight policy.
>>> 2. There will be a need for functionality to move nodes between different tiers
>>> or create new tiers to place such nodes for programming correct interleave weights.
>>> We are working on a patch to support it currently.
>>
>>Thanks! If we have such system, we will need this.
>>
>>> 3. For systems where each numa node is having different characteristics,
>>> a single node might end up existing in different memory tier, which would be
>>> equivalent to node based interleaving.
>>
>>No. A node can only exist in one memory tier.
>
> Sorry for the confusion what i meant was, if each node is having different
> characteristics, to program the memory tier weights correctly we need to place
> each node in a separate tier of it's own. So each memory tier will contain
> only a single node and the solution would resemble node based interleaving.
>
>>
>>> On newer systems where all CXL memory from different devices under a
>>> port are combined to form single numa node, this scenario might be
>>> applicable.
>>
>>You mean the different memory ranges of a NUMA node may have different
>>performance? I don't think that we can deal with this.
>
> Example Configuration: On a server that we are using now, four different
> CXL cards are combined to form a single NUMA node and two other cards are
> exposed as two individual numa nodes.
> So if we have the ability to combine multiple CXL memory ranges to a
> single NUMA node the number of NUMA nodes in the system would potentially
> decrease even if we can't combine the entire range to form a single node.

Sorry, I misunderstood your words. Yes, it's possible that there is one
tier for each node in some systems. But I guess we will have fewer tiers
than nodes in general.

--
Best Regards,
Huang, Ying

>>
>>> 4. Users may need to keep track of different memory tiers and what nodes are present
>>> in each tier for invoking interleave policy.
>>
>>I don't think this is a con. With node based solution, you need to know
>>your system too.
>>
>>>>
>>>>> Could you elaborate on the 'get what you pay for' usecase you
>>>>> mentioned?
>>>>
>>
>>--
>>Best Regards,
>>Huang, Ying
> --
> Best Regards,
> Ravi Jonnalagadda
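
For illustration only, here is a minimal user-space sketch of the relationship discussed in this thread between tier based and node based weights. It is not code from the patch set. The function names (expand_tier_weights, weighted_interleave), the tier names, and the weights are all hypothetical, and the sketch assumes that every node simply inherits the weight programmed for its tier, which is only one possible reading of "programming weight per initiator would apply for all the nodes in the tier". It shows that fewer weights need to be programmed when tiers group several nodes, and that when every tier holds exactly one node the result is the same as node based interleaving.

```python
from itertools import cycle, islice


def expand_tier_weights(tiers, tier_weights):
    """Map per-tier weights to per-node weights.

    tiers: dict tier_name -> list of node ids (hypothetical layout).
    tier_weights: dict tier_name -> positive int weight.
    Assumption: each node inherits the weight of its tier.
    A node may belong to only one tier, as stated in the thread.
    """
    node_weights = {}
    for tier, nodes in tiers.items():
        for node in nodes:
            if node in node_weights:
                raise ValueError(f"node {node} appears in more than one tier")
            node_weights[node] = tier_weights[tier]
    return node_weights


def weighted_interleave(node_weights, n_pages):
    """Return the node chosen for each of n_pages page allocations.

    Each round visits nodes in ascending id order and places `weight`
    consecutive pages on each node before moving to the next one.
    """
    if not node_weights or any(w <= 0 for w in node_weights.values()):
        raise ValueError("need at least one node and positive weights")
    one_round = [
        node
        for node in sorted(node_weights)
        for _ in range(node_weights[node])
    ]
    return list(islice(cycle(one_round), n_pages))


if __name__ == "__main__":
    # Two tiers, four nodes: only two weights have to be programmed.
    tiers = {"dram": [0, 1], "cxl": [2, 3]}
    weights = expand_tier_weights(tiers, {"dram": 3, "cxl": 1})
    print(weights)                          # {0: 3, 1: 3, 2: 1, 3: 1}
    print(weighted_interleave(weights, 8))  # [0, 0, 0, 1, 1, 1, 2, 3]

    # One node per tier: tier weights are exactly node weights.
    single = expand_tier_weights(
        {"t0": [0], "t1": [1], "t2": [2]},
        {"t0": 3, "t1": 2, "t2": 1},
    )
    print(single)                           # {0: 3, 1: 2, 2: 1}
```

In the first layout two weights cover four nodes. In the second layout the number of tier weights equals the number of nodes, which is the case Ravi describes as resembling node based interleaving.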