From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price
Cc: Gregory Price, ...
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
In-Reply-To: (Gregory Price's message of "Mon, 16 Oct 2023 21:28:33 -0400")
References: <20231009204259.875232-1-gregory.price@memverge.com>
 <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 18 Oct 2023 16:29:02 +0800
Message-ID: <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Gregory Price writes:

> On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
>> Gregory Price writes:
>>
>> > == Mutex to Semaphore change:
>> >
>> > Since it is expected that many threads will be accessing this data
>> > during allocations, a mutex is not appropriate.
>>
>> IIUC, this is a change for performance.  If so, please show some
>> performance data.
>>
>
> This change will be dropped in v3 in favor of the existing
> RCU mechanism in memory-tiers.c, as pointed out by Matthew.
>
>> > == Source-node relative weighting:
>> >
>> > 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
>> >    echo source_node:weight > /path/to/interleave_weight
>>
>> If source_node is considered, why not consider target_node too?  On a
>> system with only 1 tier (DRAM), do you want weighted interleaving among
>> NUMA nodes?  If so, why tie weighted interleaving to memory tiers?
>> Why not just introduce weighted interleaving for NUMA nodes?
>>
>
> The short answer: Practicality and ease-of-use.
>
> The long answer: We have been discussing how to make this more flexible.
>
> Personally, I agree with you.  If Task A is on Socket 0, the weight on
> Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
> However, right now, DRAM nodes are lumped into the same tier together,
> resulting in them having the same weight.
>
> If you scroll back through the list, you'll find an RFC I posted for
> set_mempolicy2, which implements weighted interleave in mm/mempolicy.
> However, mm/mempolicy is extremely `current-centric` at the moment,
> which makes changing weights at runtime (in response to a hotplug
> event, for example) very difficult.
>
> I still think there is room to extend set_mempolicy to allow
> task-defined weights to take preference over tier-defined weights.
>
> We have discussed adding the following features to memory-tiers:
>
> 1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
>    to lumping all nodes of a similar quality into the same tier
>
> 2) enabling movement of nodes between tiers (for the purpose of
>    reconfiguring due to hotplug and other situations)
>
> For users that require fine-grained control over each individual node,
> this would allow weights to be applied per node, because a
> node == tier.  For the majority of use cases, it would allow clumping of
> nodes into tiers based on physical topology and performance class, and
> then allow the general weighting to apply.  This seems like the most
> obvious use case that a majority of users would adopt, and also the
> easiest to set up in the short term.
>
> That said, there are probably 3 or 4 different ways/places to implement
> this feature.  The question is: what is the clear and obvious way?
> I don't have a definitive answer for that, hence the RFC.
>
> There are at least 5 proposals that I know of at the moment:
>
> 1) mempolicy
> 2) memory-tiers
> 3) memory-block interleaving (weighting among blocks inside a node)?
>    Maybe relevant if Dynamic Capacity devices arrive, but it seems
>    like the wrong place to do this.
> 4) multi-device nodes (e.g. cxl create-region ... mem0 mem1 ...)
> 5) "just do it in hardware"

It may be easier to start with the use cases.  What are the practical
use cases in your mind that cannot be satisfied with a simple
per-memory-tier weight?  Can you compare the memory layout with the
different proposals?

>> > # Set tier4 weight from node 0 to 85
>> > echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>> > # Set tier4 weight from node 1 to 65
>> > echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>> > # Set tier22 weight from node 0 to 15
>> > echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>> > # Set tier22 weight from node 1 to 10
>> > echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight

--
Best Regards,
Huang, Ying
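
To make the quoted weights concrete, below is a minimal, self-contained
userspace sketch of how an 85/15 per-tier split (the tier4/tier22 weights
set from source node 0 in the example above) could translate into page
placement under a weighted round-robin scheme.  It only illustrates the
arithmetic under that assumption: the struct, the credit/refill loop, and
all names are hypothetical and are not the kernel implementation discussed
in this thread.

	/*
	 * Illustrative sketch only: distribute allocations across two
	 * memory tiers in proportion to their interleave weights.
	 * Weights taken from the quoted sysfs example (node 0: 85/15).
	 */
	#include <stdio.h>

	struct tier {
		const char *name;
		unsigned int weight;	/* pages handed out per pass */
	};

	int main(void)
	{
		struct tier tiers[] = {
			{ "memory_tier4 (DDR)",  85 },
			{ "memory_tier22 (CXL)", 15 },
		};
		unsigned int credit[2] = { tiers[0].weight, tiers[1].weight };
		unsigned int placed[2] = { 0, 0 };
		unsigned int pages = 1000;

		/* Weighted round robin: spend each tier's credit, then refill. */
		for (unsigned int i = 0, t = 0; i < pages; ) {
			if (credit[t]) {
				credit[t]--;
				placed[t]++;
				i++;
			} else if (++t == 2) {
				t = 0;
				credit[0] = tiers[0].weight;
				credit[1] = tiers[1].weight;
			}
		}

		for (int t = 0; t < 2; t++)
			printf("%-22s %u pages (%.1f%%)\n", tiers[t].name,
			       placed[t], 100.0 * placed[t] / pages);
		return 0;
	}

Running this places 850 of 1000 pages on the DDR tier and 150 on the CXL
tier, matching the 85:15 ratio set from node 0 in the quoted example.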