From: "Huang, Ying"
To: Gregory Price
Cc: Gregory Price, Aneesh Kumar K.V, Wei Xu, Alistair Popple, Dan Williams, Dave Hansen, Johannes Weiner, Jonathan Cameron, Michal Hocko, Tim Chen, Yang Shi
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
In-Reply-To: (Gregory Price's message of "Mon, 16 Oct 2023 22:52:58 -0400")
References: <20231009204259.875232-1-gregory.price@memverge.com> <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com> <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 19 Oct 2023 14:28:42 +0800
Message-ID: <87edhrunvp.fsf@yhuang6-desk2.ccr.corp.intel.com>

Gregory Price writes:

> On Wed, Oct 18, 2023 at 04:29:02PM +0800, Huang, Ying wrote:
>> Gregory Price writes:
>>
>> > There are at least 5 proposals that I know of at the moment:
>> >
>> > 1) mempolicy
>> > 2) memory-tiers
>> > 3) memory-block interleaving? (weighting among blocks inside a node)
>> >    Maybe relevant if Dynamic Capacity devices arrive, but it seems
>> >    like the wrong place to do this.
>> > 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
>> > 5) "just do it in hardware"
>>
>> It may be easier to start with the use case. What are the practical use
>> cases in your mind that cannot be satisfied with a simple per-memory-tier
>> weight? Can you compare the memory layout under the different proposals?
>>
>
> Before I delve in, one clarifying question: when you asked whether
> weights should be part of node or memory-tiers, I took that to mean
> whether they should be part of mempolicy or memory-tiers.
>
> Were you suggesting that weights should actually be part of
> drivers/base/node.c?

Yes. drivers/base/node.c vs. memory tiers.

> Because I had not considered that, and it seems reasonable, easy to
> implement, and would not require tying mempolicy.c to memory-tiers.c.
>
> Beyond this, I think there have been 3 imagined use cases (now including
> this one).
>
> a)
> numactl --weighted-interleave=Node:weight,0:16,1:4,...
>
> b)
> echo weight > /sys/.../memory-tiers/memtier/access0/interleave_weight
> numactl --interleave=0,1
>
> c)
> echo weight > /sys/bus/node/node0/access0/interleave_weight
> numactl --interleave=0,1
>
> d)
> Option b or c, but with --weighted-interleave=0,1 instead.
> This requires libnuma changes to pick up, but it retains --interleave
> as-is to avoid user confusion.
>
> The downside of an approach like (a), which was my original approach,
> is that the weights cannot really change if a node is hotplugged. Tasks
> would need to detect this and change the policy themselves. That's not
> a good solution.
>
> However, in both the (b) and (c) designs, weights can be rebalanced in
> response to any number of events. Ultimately (b) and (c) are equivalent,
> but placing the weights in nodes is cleaner and more intuitive. If
> memory-tiers wants to use or change this information, nothing prevents it.
>
> Assuming this is your meaning, I agree and I will pivot to this.

Can you give a not-so-abstract example? For instance, on a system with
nodes 0, 1, 2, 3 and memory tiers 4 (nodes 0, 1) and 22 (nodes 2, 3), ...,
a workload runs on the CPUs of node 0, ..., and interleaves memory across
nodes 0, 1, .... Then compare the resulting behavior (including memory
bandwidth) of the node-based and memory-tier-based solutions.

-- 
Best Regards,
Huang, Ying