From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price
Cc: Gregory Price, linux-mm@kvack.org, Aneesh Kumar K.V, Wei Xu,
    Alistair Popple, Dan Williams, Dave Hansen, Johannes Weiner,
    Jonathan Cameron, Michal Hocko, Tim Chen, Yang Shi
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
In-Reply-To: <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com> (Ying Huang's message of "Wed, 18 Oct 2023 16:29:02 +0800")
References: <20231009204259.875232-1-gregory.price@memverge.com>
    <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 18 Oct 2023 16:31:11 +0800
Message-ID: <87lec0wcvk.fsf@yhuang6-desk2.ccr.corp.intel.com>
Forgot to Cc more people.

"Huang, Ying" writes:

> Gregory Price writes:
>
>> On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
>>> Gregory Price writes:
>>>
>>> > == Mutex to Semaphore change:
>>> >
>>> > Since it is expected that many threads will be accessing this data
>>> > during allocations, a mutex is not appropriate.
>>>
>>> IIUC, this is a change for performance. If so, please show some
>>> performance data.
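[Editor's note: the read-mostly access pattern under discussion here (many allocation-path readers, rare weight updates, for which the thread later settles on the existing RCU mechanism in memory-tiers.c) can be sketched in user space. This is an analogy only, not kernel code; all names are hypothetical.]

```python
# RCU-style copy/update/publish in user space: readers dereference a shared
# reference to an immutable snapshot with no lock on the hot path, while a
# writer builds a new snapshot and publishes it with one atomic reference
# swap. Hypothetical tier names and weights for illustration.

_weights = {"memory_tier4": 85, "memory_tier22": 15}

def read_weight(tier):
    snap = _weights              # one atomic reference read; no lock taken
    return snap.get(tier, 1)     # in-flight readers may still see the old snapshot

def update_weight(tier, value):
    global _weights
    new = dict(_weights)         # copy the current snapshot
    new[tier] = value            # update the copy
    _weights = new               # publish: subsequent readers switch over atomically
```

A mutex would serialize every allocation-path lookup; with this pattern only writers pay a cost, which matches the "many readers, rare updates" workload described above.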
>>>
>>
>> This change will be dropped in v3 in favor of the existing
>> RCU mechanism in memory-tiers.c, as pointed out by Matthew.
>>
>>> > == Source-node relative weighting:
>>> >
>>> > 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
>>> >    echo source_node:weight > /path/to/interleave_weight
>>>
>>> If source_node is considered, why not consider target_node too? On a
>>> system with only 1 tier (DRAM), do you want weighted interleaving among
>>> NUMA nodes? If so, why tie weighted interleaving to memory tiers?
>>> Why not just introduce weighted interleaving for NUMA nodes?
>>>
>>
>> The short answer: practicality and ease of use.
>>
>> The long answer: we have been discussing how to make this more flexible.
>>
>> Personally, I agree with you. If Task A is on Socket 0, the weight on
>> Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
>> However, right now, DRAM nodes are lumped into the same tier together,
>> resulting in them having the same weight.
>>
>> If you scroll back through the list, you'll find an RFC I posted for
>> set_mempolicy2 which implements weighted interleave in mm/mempolicy.
>> However, mm/mempolicy is extremely `current-centric` at the moment,
>> which makes changing weights at runtime (in response to a hotplug
>> event, for example) very difficult.
>>
>> I still think there is room to extend set_mempolicy to allow
>> task-defined weights to take preference over tier-defined weights.
>>
>> We have discussed adding the following features to memory-tiers:
>>
>> 1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
>>    to lumping all nodes of a similar quality into the same tier
>>
>> 2) enabling movement of nodes between tiers (for the purpose of
>>    reconfiguring due to hotplug and other situations)
>>
>> For users that require fine-grained control over each individual node,
>> this would allow weights to be applied per-node, because a
>> node = tier.
>> For the majority of use cases, it would allow clumping of
>> nodes into tiers based on physical topology and performance class, and
>> then allow the general weighting to apply. This seems like the most
>> obvious use case for a majority of users, and also the
>> easiest to set up in the short term.
>>
>> That said, there are probably 3 or 4 different ways/places to implement
>> this feature. The question is: what is the clear and obvious way?
>> I don't have a definitive answer for that, hence the RFC.
>>
>> There are at least 5 proposals that I know of at the moment:
>>
>> 1) mempolicy
>> 2) memory-tiers
>> 3) memory-block interleaving (weighting among blocks inside a node)
>>    Maybe relevant if Dynamic Capacity devices arrive, but it seems
>>    like the wrong place to do this.
>> 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
>> 5) "just do it in hardware"
>
> It may be easier to start with the use case. What are the practical use
> cases in your mind that cannot be satisfied with a simple per-memory-tier
> weight? Can you compare the memory layout under the different proposals?
>
>>> > # Set tier4 weight from node 0 to 85
>>> > echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier4 weight from node 1 to 65
>>> > echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>>> > # Set tier22 weight from node 0 to 15
>>> > echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>>> > # Set tier22 weight from node 1 to 10
>>> > echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> --
> Best Regards,
> Huang, Ying
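[Editor's note: the weights in the echo examples above imply a placement ratio, e.g. 85:15 DDR:CXL for allocations sourced from node 0. As a rough sketch of how such weights could drive page placement, one classic approach is smooth weighted round-robin, which follows the ratio while keeping placements interleaved rather than batched. This is an illustration only, not the patch set's implementation.]

```python
def weighted_interleave(weights, npages):
    """Pick a target tier for each of npages via smooth weighted
    round-robin. `weights` maps tier name -> integer weight.
    A sketch under assumed semantics, not the in-kernel policy code."""
    total = sum(weights.values())
    current = {tier: 0 for tier in weights}
    placement = []
    for _ in range(npages):
        for tier, w in weights.items():
            current[tier] += w            # every tier earns its weight
        best = max(current, key=current.get)
        current[best] -= total            # the winner pays the total
        placement.append(best)
    return placement

# Node-0 weights from the example: 85 for DDR (tier4), 15 for CXL (tier22).
pages = weighted_interleave({"memory_tier4": 85, "memory_tier22": 15}, 100)
```

Over 100 placements this yields exactly 85 on memory_tier4 and 15 on memory_tier22, with the CXL picks spread out roughly every 6-7 pages instead of clustered at the end.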