From: Gregory Price
To: linux-kernel@vger.kernel.org
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, ying.huang@intel.com, akpm@linux-foundation.org, mhocko@kernel.org, tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, Gregory Price
Subject: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
Date: Wed, 8 Nov 2023 19:25:14 -0500
Message-Id: <20231109002517.106829-1-gregory.price@memverge.com>

This patchset implements weighted interleave and adds a new
cgroup sysfs entry: cgroup/memory.interleave_weights (not present in
the root cgroup).

The il_weight of a node is used by mempolicy to implement weighted
interleave when `numactl --interleave=...` is invoked. By default, the
il_weight of every node is 1, which preserves the existing round-robin
interleave behavior.

Interleave weights denote the number of pages that should be allocated
from a node when interleaving occurs, and have a range of 1-255. The
weight of a node can never be 0; the preferred way to prevent
allocation from a node is to remove it from the cpuset or mempolicy
altogether.

For example, if a node's interleave weight is set to 5, 5 pages will be
allocated from that node before the next node is scheduled for
allocations.

  # Set node weight for node 0 to 5
  echo 0:5 > /sys/fs/cgroup/user.slice/memory.interleave_weights

  # Set node weight for node 1 to 3
  echo 1:3 > /sys/fs/cgroup/user.slice/memory.interleave_weights

  # View the currently set weights
  cat /sys/fs/cgroup/user.slice/memory.interleave_weights
  0:5,1:3

Weights will only be displayed for possible nodes.

This makes it possible to set an interleaving strategy that matches the
bandwidth of the memory devices available on the system. An example
system:

  Node 0 - CPU+DRAM, 400GB/s BW (200GB/s cross socket)
  Node 1 - CXL memory, 64GB/s BW, on Node 0 root complex

In this setup, the effective weights for a node set of [0,1] may be
[86, 14] (86% of memory on Node 0, 14% on Node 1, since
400 / (400 + 64) is about 0.86). A proportionally smaller pair of
weights may also be used to produce quicker rounds and a better overall
distribution.

This spreads memory across devices with different latency and bandwidth
attributes in a way that makes the best use of the available resources.

~Gregory

=============
Version Notes:

= v4 notes

Moved interleave weights to cgroups from nodes.

Omitted them from the root cgroup for initial testing and comment, but
it may be reasonable to place them there too.
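The weighted interleave behavior described at the top of this letter can be illustrated with a small user-space C simulation. It first derives per-node weights proportional to bandwidth, then hands out pages in weighted round-robin order, so that a node with weight N receives N consecutive pages before the next node is used. This is a sketch under stated assumptions, not the kernel's mm/mempolicy.c code. The helpers `weights_from_bandwidth`, `il_init`, `il_next` and `il_distribute`, as well as the `struct il_state` cursor, are hypothetical. The bandwidth figures are the example numbers given above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits mirroring the documented weight range (1-255). */
#define IL_WEIGHT_MIN 1
#define IL_WEIGHT_MAX 255

/*
 * Hypothetical helper: derive weights as each node's percentage of the
 * total bandwidth, rounded to nearest and clamped to 1..255 (a weight
 * can never be 0). The kernel does not compute weights automatically.
 */
void weights_from_bandwidth(const unsigned int *bw, unsigned char *weight,
                            size_t nr)
{
	unsigned long total = 0;

	for (size_t i = 0; i < nr; i++)
		total += bw[i];
	assert(total > 0);

	for (size_t i = 0; i < nr; i++) {
		/* Round bw[i] * 100 / total to the nearest integer. */
		unsigned long w = (bw[i] * 100UL + total / 2) / total;

		if (w < IL_WEIGHT_MIN)
			w = IL_WEIGHT_MIN;
		if (w > IL_WEIGHT_MAX)
			w = IL_WEIGHT_MAX;
		weight[i] = (unsigned char)w;
	}
}

/* Hypothetical interleave cursor: current node and pages left in its turn. */
struct il_state {
	size_t cur;        /* index into the interleave node set */
	unsigned int left; /* pages still to allocate from nodes[cur] */
};

void il_init(struct il_state *st, const unsigned char *weight)
{
	st->cur = 0;
	st->left = weight[0];
}

/* Return the node index for the next page and advance the cursor. */
size_t il_next(struct il_state *st, const unsigned char *weight, size_t nr)
{
	if (st->left == 0) {
		/* Current node used up its weight: move to the next node. */
		st->cur = (st->cur + 1) % nr;
		st->left = weight[st->cur];
	}
	assert(st->left >= IL_WEIGHT_MIN);
	st->left--;
	return st->cur;
}

/* Count how many of `pages` allocations land on each node. */
void il_distribute(const unsigned char *weight, size_t nr,
                   unsigned int pages, unsigned int *count)
{
	struct il_state st;

	il_init(&st, weight);
	for (size_t i = 0; i < nr; i++)
		count[i] = 0;
	for (unsigned int p = 0; p < pages; p++)
		count[il_next(&st, weight, nr)]++;
}
```

With bandwidths of 400 and 64, `weights_from_bandwidth` yields weights of 86 and 14, and 100 consecutive allocations split 86/14. With the 0:5,1:3 weights from the shell example, the allocation order is five pages on node 0, then three on node 1, and then the pattern repeats. A smaller pair with the same ratio, such as 43:7, keeps the same long-run split while making each round shorter.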
== Weighted interleave
mm/mempolicy: modify interleave mempolicy to use node weights

The mempolicy MPOL_INTERLEAVE uses the node weights defined in the
cgroup memory.interleave_weights interface to implement weighted
interleave. Since all nodes default to a weight of 1, the original
interleave behavior is retained by default.

============
RFC History

Node based weights
By: Gregory Price
https://lore.kernel.org/linux-mm/20231031003810.4532-1-gregory.price@memverge.com/

Memory-tier based weights
By: Ravi Shankar
https://lore.kernel.org/all/20230927095002.10245-1-ravis.opensrc@micron.com/

Mempolicy multi-node weighting w/ set_mempolicy2:
By: Gregory Price
https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@memverge.com/

Hasan Al Maruf: N:M weighting in mempolicy
https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/

Huang, Ying's presentation at LPC 2022, 16th slide in
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

===================

Gregory Price (3):
  mm/memcontrol: implement memcg.interleave_weights
  mm/mempolicy: implement weighted interleave
  Documentation: sysfs entries for cgroup.memory.interleave_weights

 Documentation/admin-guide/cgroup-v2.rst       |  45 +++++
 .../admin-guide/mm/numa_memory_policy.rst     |  11 ++
 include/linux/memcontrol.h                    |  31 ++++
 include/linux/mempolicy.h                     |   3 +
 mm/memcontrol.c                               | 172 ++++++++++++++++++
 mm/mempolicy.c                                | 153 +++++++++++++---
 6 files changed, 387 insertions(+), 28 deletions(-)

-- 
2.39.1