From: Gregory Price
To: linux-kernel@vger.kernel.org
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org, ying.huang@intel.com,
	akpm@linux-foundation.org, mhocko@kernel.org, tj@kernel.org,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net,
	roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
	Gregory Price
Subject: [RFC PATCH v4 1/3] mm/memcontrol: implement memcg.interleave_weights
Date: Wed, 8 Nov 2023 19:25:15 -0500
Message-Id: <20231109002517.106829-2-gregory.price@memverge.com>
In-Reply-To: <20231109002517.106829-1-gregory.price@memverge.com>
References: <20231109002517.106829-1-gregory.price@memverge.com>

Create an RCU-protected array of unsigned char[MAX_NUMNODES] where
interleave weights can be stored. The intent is for mempolicy to use
these weights to implement weighted interleave for bandwidth
optimization.

Node weights assigned via cgroup/memory.interleave_weights

Example: Set a 3:1 weighting ratio for nodes 0 and 1 respectively.

  echo 0:3 > cgroup/memory.interleave_weights
  echo 1:1 > cgroup/memory.interleave_weights

Example output:

  cat cgroup/memory.interleave_weights
  0:3,1:1

Child cgroups inherit parent interleave weights and may override them.
To revert weights to inheriting from the parent, write "-1:0"

Example:

  echo -1:0 > cgroup/memory.interleave_weights

Signed-off-by: Gregory Price
---
 include/linux/memcontrol.h |  31 +++++++
 mm/memcontrol.c            | 172 +++++++++++++++++++++++++++++++++++++
 2 files changed, 203 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e4e24da16d2c..338a9dcda446 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -21,6 +21,8 @@
 #include
 #include
 #include
+#include
+#include
 
 struct mem_cgroup;
 struct obj_cgroup;
@@ -167,6 +169,15 @@ struct mem_cgroup_thresholds {
 	struct mem_cgroup_threshold_ary *spare;
 };
 
+/* For mempolicy information */
+struct mem_cgroup_mempolicy {
+	/*
+	 * When interleaving is applied, do allocations on each node by the
+	 * weight value. Size is always MAX_NUMNODES. Protected by RCU.
+	 */
+	unsigned char *il_weights;
+};
+
 /*
  * Remember four most recent foreign writebacks with dirty pages in this
  * cgroup. Inode sharing is expected to be uncommon and, even if we miss
@@ -265,6 +276,12 @@ struct mem_cgroup {
 	/* thresholds for mem+swap usage. RCU-protected */
 	struct mem_cgroup_thresholds memsw_thresholds;
 
+	/* protect the mempolicy settings */
+	struct mutex mempolicy_lock;
+
+	/* mempolicy defaults for tasks */
+	struct mem_cgroup_mempolicy mempolicy;
+
 	/* For oom notifier event fd */
 	struct list_head oom_notify;
 
@@ -1159,6 +1176,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 						gfp_t gfp_mask,
 						unsigned long *total_scanned);
 
+
+unsigned char mem_cgroup_get_il_weight(unsigned int nid);
+
+unsigned int mem_cgroup_get_il_weights(nodemask_t *nodes,
+				       unsigned char *weights);
+
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1591,6 +1614,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 {
 	return 0;
 }
+
+static unsigned char mem_cgroup_get_il_weight(unsigned int nid) { return 0; }
+
+static unsigned int mem_cgroup_get_il_weights(nodemask_t *nodes,
+					      unsigned char *weights)
+{
+	return 0;
+}
 #endif /* CONFIG_MEMCG */
 
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5b009b233ab8..67e8c1767471 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5319,6 +5319,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	INIT_WORK(&memcg->high_work, high_work_func);
 	INIT_LIST_HEAD(&memcg->oom_notify);
 	mutex_init(&memcg->thresholds_lock);
+	mutex_init(&memcg->mempolicy_lock);
 	spin_lock_init(&memcg->move_lock);
 	vmpressure_init(&memcg->vmpressure);
 	INIT_LIST_HEAD(&memcg->event_list);
@@ -7896,6 +7897,176 @@ static struct cftype zswap_files[] = {
 };
 #endif /* CONFIG_MEMCG_KMEM && CONFIG_ZSWAP */
 
+unsigned char mem_cgroup_get_il_weight(unsigned int nid)
+{
+	struct mem_cgroup *memcg;
+	unsigned char weight = 0;
+	unsigned char *weights;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	while (!mem_cgroup_is_root(memcg)) {
+		weights = rcu_dereference(memcg->mempolicy.il_weights);
+		if (weights) {
+			weight = weights[nid];
+			break;
+		}
+		memcg = parent_mem_cgroup(memcg);
+	}
+	rcu_read_unlock();
+
+	return weight;
+}
+
+unsigned int mem_cgroup_get_il_weights(nodemask_t *nodes,
+				       unsigned char *weights)
+{
+	struct mem_cgroup *memcg;
+	unsigned char *memcg_weights;
+	unsigned int nid;
+	unsigned int total = 0;
+	unsigned char weight;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	while (memcg && !mem_cgroup_is_root(memcg)) {
+		memcg_weights = rcu_dereference(memcg->mempolicy.il_weights);
+		if (!memcg_weights) {
+			memcg = parent_mem_cgroup(memcg);
+			continue;
+		}
+
+		for_each_node_mask(nid, *nodes) {
+			weight = memcg_weights[nid];
+			weights[nid] = weight ? weight : 1;
+			total += weights[nid];
+		}
+		break;
+	}
+	rcu_read_unlock();
+
+	return total;
+}
+
+static int mpol_ilw_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg;
+	unsigned char *weights;
+	unsigned int nid;
+	unsigned int count = 0;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_seq(m);
+
+	while (memcg && !mem_cgroup_is_root(memcg)) {
+		weights = rcu_dereference(memcg->mempolicy.il_weights);
+		if (weights)
+			break;
+		memcg = parent_mem_cgroup(memcg);
+	}
+	for_each_node(nid) {
+		seq_printf(m, "%s%d:%d", (count++ ? "," : ""), nid,
+			   weights ? weights[nid] : 1);
+	}
+	seq_putc(m, '\n');
+	rcu_read_unlock();
+
+	return 0;
+}
+
+static ssize_t mpol_ilw_write(struct kernfs_open_file *of, char *buf,
+			      size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	struct mem_cgroup *pmcg;
+	unsigned char *new_weights = NULL, *old_weights = NULL;
+	int node;
+	unsigned char weight;
+	ssize_t ret;
+	char *sep = memchr(buf, ':', nbytes);
+	bool parent_weights = false;
+
+	if (!sep || sep == buf || sep == (buf + nbytes - 1))
+		return -EINVAL;
+	*sep = '\0';
+
+	ret = kstrtoint(buf, 10, &node);
+	if (ret)
+		return ret;
+
+	ret = kstrtou8(sep + 1, 10, &weight);
+	if (ret)
+		return ret;
+
+	/*
+	 * if value is -1:0, clear weights and set pointer to NULL
+	 * this allows the parent cgroup settings to take over
+	 */
+	if (node == -1 && weight == 0)
+		goto set_weights;
+	else if (node < 0)
+		return -EINVAL;
+	else if (node >= MAX_NUMNODES || weight == 0)
+		return -EINVAL;
+
+	new_weights = kzalloc(sizeof(unsigned char)*MAX_NUMNODES, GFP_KERNEL);
+	if (!new_weights)
+		return -ENOMEM;
+set_weights:
+	/* acquire mutex and readlock so we can read from parents if needed */
+	mutex_lock(&memcg->mempolicy_lock);
+	rcu_read_lock();
+	old_weights = rcu_dereference(memcg->mempolicy.il_weights);
+
+	/* If we're clearing the weights, don't bother looking at old ones */
+	if (!new_weights)
+		goto swap_weights;
+
+	/* Check for parent weights to inherit */
+	pmcg = memcg;
+	while (!old_weights) {
+		pmcg = parent_mem_cgroup(pmcg);
+
+		if (!pmcg || mem_cgroup_is_root(pmcg))
+			break;
+		old_weights = rcu_dereference(pmcg->mempolicy.il_weights);
+		parent_weights = true;
+	}
+
+	/* Copy the old weights or default all nodes to 1 */
+	if (old_weights)
+		memcpy(new_weights, old_weights,
+		       sizeof(unsigned char)*MAX_NUMNODES);
+	else
+		memset(new_weights, 1,
+		       sizeof(unsigned char)*MAX_NUMNODES);
+	new_weights[node] = weight;
+
+swap_weights:
+	rcu_assign_pointer(memcg->mempolicy.il_weights, new_weights);
+
+	rcu_read_unlock();
+	synchronize_rcu();
+
+	/* If we are inheriting weights from the parent, do not free */
+	if (old_weights && !parent_weights)
+		kfree(old_weights);
+
+	mutex_unlock(&memcg->mempolicy_lock);
+
+	return nbytes;
+}
+
+static struct cftype mempolicy_files[] = {
+	{
+		.name = "interleave_weights",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = mpol_ilw_show,
+		.write = mpol_ilw_write,
+	},
+	{ } /* terminate */
+};
+
 static int __init mem_cgroup_swap_init(void)
 {
 	if (mem_cgroup_disabled())
@@ -7906,6 +8077,7 @@ static int __init mem_cgroup_swap_init(void)
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
 	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, zswap_files));
 #endif
+	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, mempolicy_files));
 	return 0;
 }
 subsys_initcall(mem_cgroup_swap_init);
-- 
2.39.1
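
For readers following the inheritance and default-weight rules in the patch
above, here is a minimal userspace sketch in plain C. It is not kernel code
and not part of the patch. SKETCH_MAX_NODES, struct sketch_group and the
sketch_*() helpers are hypothetical names invented for illustration, and
there is no RCU or locking. The lookup walks towards the root and uses the
first weight table it finds. A missing table or a zero entry counts as 1,
which mirrors mem_cgroup_get_il_weights() and the show handler. The weighted
round-robin in sketch_pick_node() is only an assumption about how a consumer
such as mempolicy might use the weights; this patch itself only stores them.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical userspace stand-in for a memcg with optional weights. */
struct sketch_group {
	const struct sketch_group *parent;	/* NULL for the root group */
	const unsigned char *il_weights;	/* NULL means "inherit" */
};

/* Walk towards the root; return the first weight table found, or NULL. */
static const unsigned char *sketch_find_weights(const struct sketch_group *g)
{
	/* As in the patch, the root group itself is never consulted. */
	while (g && g->parent) {
		if (g->il_weights)
			return g->il_weights;
		g = g->parent;
	}
	return NULL;
}

/* Effective weight of a node: no table or a zero entry counts as 1. */
static unsigned int sketch_weight(const struct sketch_group *g,
				  unsigned int nid)
{
	const unsigned char *w = sketch_find_weights(g);

	if (nid >= SKETCH_MAX_NODES)
		return 0;
	return (w && w[nid]) ? w[nid] : 1;
}

/*
 * Assumed consumer: choose the node for the n-th allocation by weighted
 * round-robin over nodes [0, nr_nodes).
 */
static unsigned int sketch_pick_node(const struct sketch_group *g,
				     unsigned int nr_nodes, unsigned int n)
{
	unsigned int total = 0, nid;

	if (nr_nodes == 0 || nr_nodes > SKETCH_MAX_NODES)
		return 0;

	for (nid = 0; nid < nr_nodes; nid++)
		total += sketch_weight(g, nid);

	/* Every effective weight is at least 1, so total is non-zero. */
	n %= total;
	for (nid = 0; nid < nr_nodes; nid++) {
		unsigned int w = sketch_weight(g, nid);

		if (n < w)
			return nid;
		n -= w;
	}
	return 0;	/* not reached */
}
```

With a parent table of {3, 1} and a child that sets nothing, the child
inherits 3:1, so three of every four consecutive picks land on node 0 and
one lands on node 1, matching the 0:3,1:1 example in the commit message.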