From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, corbet@lwn.net, akpm@linux-foundation.org, gregory.price@memverge.com, honggyu.kim@sk.com, rakie.kim@sk.com, hyeongtak.ji@sk.com, mhocko@kernel.org, ying.huang@intel.com, vtavarespetr@micron.com, jgroves@micron.com, ravis.opensrc@micron.com, sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com, seungjun.ha@samsung.com, hannes@cmpxchg.org, dan.j.williams@intel.com
Subject: [PATCH v4 1/3] mm/mempolicy: implement the sysfs-based weighted_interleave interface
Date: Tue, 30 Jan 2024 13:20:44 -0500
Message-Id: <20240130182046.74278-2-gregory.price@memverge.com>
In-Reply-To: <20240130182046.74278-1-gregory.price@memverge.com>
References:
 <20240130182046.74278-1-gregory.price@memverge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Rakie Kim

This patch provides a way to set per-node interleave weights through
sysfs at /sys/kernel/mm/mempolicy/weighted_interleave/nodeN.

The sysfs structure is designed as follows.

  $ tree /sys/kernel/mm/mempolicy/
  /sys/kernel/mm/mempolicy/ [1]
  └── weighted_interleave [2]
      ├── node0 [3]
      └── node1

Each file above can be explained as follows.

[1] mm/mempolicy: configuration interface for the mempolicy subsystem

[2] weighted_interleave/: configuration interface for the weighted
    interleave policy

[3] weighted_interleave/nodeN: weight for nodeN

If a node value is set to `0`, the system-default value will be used.
As of this patch, the system-default for all nodes is always 1.
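
For reviewers who want to try the interface, here is a minimal userspace C sketch. It is not part of this patch. Both helpers are hypothetical:

- `effective_weight()` restates the fallback rule of get_il_weight() without RCU. A missing table or a stored 0 yields the default of 1.
- `write_node_weight()` assumes a kernel with this series applied and root privileges, since the nodeN files are created with mode 0644.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Per this patch, the system default is always 1 for every node. */
#define IW_SYSTEM_DEFAULT 1

/*
 * Hypothetical helper: a lock-free userspace restatement of the kernel's
 * get_il_weight(). A NULL table or a stored 0 both mean "use the default".
 */
uint8_t effective_weight(const uint8_t *table, int node)
{
	uint8_t weight;

	assert(node >= 0);
	weight = table ? table[node] : IW_SYSTEM_DEFAULT;
	return weight ? weight : IW_SYSTEM_DEFAULT;
}

/*
 * Hypothetical helper: set the weight of node `nid` through the sysfs file
 * added by this patch. Writing 0 resets the node to the system default;
 * values above 255 fail kstrtou8() and node_store() returns -EINVAL.
 * Returns 0 on success, -1 on failure.
 */
int write_node_weight(int nid, unsigned int weight)
{
	char path[96];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/mempolicy/weighted_interleave/node%d", nid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", weight) > 0;
	/* stdio buffers the short write, so a store() error shows up at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Given a table of {0, 3}, effective_weight() returns 1 for node0 and 3 for node1. On a live system, write_node_weight(1, 3) is equivalent to `echo 3 > /sys/kernel/mm/mempolicy/weighted_interleave/node1`. Writing 0 puts node1 back on the system default.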

Suggested-by: Huang Ying
Signed-off-by: Rakie Kim
Signed-off-by: Honggyu Kim
Co-developed-by: Gregory Price
Signed-off-by: Gregory Price
Co-developed-by: Hyeongtak Ji
Signed-off-by: Hyeongtak Ji
---
 .../ABI/testing/sysfs-kernel-mm-mempolicy     |   4 +
 ...fs-kernel-mm-mempolicy-weighted-interleave |  25 ++
 mm/mempolicy.c                                | 223 ++++++++++++++++++
 3 files changed, 252 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
new file mode 100644
index 000000000000..8ac327fd7fb6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
@@ -0,0 +1,4 @@
+What:		/sys/kernel/mm/mempolicy/
+Date:		January 2024
+Contact:	Linux memory management mailing list
+Description:	Interface for Mempolicy
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
new file mode 100644
index 000000000000..0b7972de04e9
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
@@ -0,0 +1,25 @@
+What:		/sys/kernel/mm/mempolicy/weighted_interleave/
+Date:		January 2024
+Contact:	Linux memory management mailing list
+Description:	Configuration Interface for the Weighted Interleave policy
+
+What:		/sys/kernel/mm/mempolicy/weighted_interleave/nodeN
+Date:		January 2024
+Contact:	Linux memory management mailing list
+Description:	Weight configuration interface for nodeN
+
+		The interleave weight for a memory node (N). These weights are
+		utilized by tasks which have set their mempolicy to
+		MPOL_WEIGHTED_INTERLEAVE.
+
+		These weights only affect new allocations, and changes at runtime
+		will not cause migrations on already allocated pages.
+
+		The minimum weight for a node is always 1.
+
+		Minimum weight: 1
+		Maximum weight: 255
+
+		Writing an empty string or `0` will reset the weight to the
+		system default. The system default may be set by the kernel
+		or drivers at boot or during hotplug events.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..440128a398ef 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -131,6 +131,32 @@ static struct mempolicy default_policy = {
 
 static struct mempolicy preferred_node_policy[MAX_NUMNODES];
 
+/*
+ * iw_table is the sysfs-set interleave weight table, a value of 0 denotes
+ * system-default value should be used. A NULL iw_table also denotes that
+ * system-default values should be used. Until the system-default table
+ * is implemented, the system-default is always 1.
+ *
+ * iw_table is RCU protected
+ */
+static u8 __rcu *iw_table;
+static DEFINE_MUTEX(iw_table_lock);
+
+static u8 get_il_weight(int node)
+{
+	u8 *table;
+	u8 weight;
+
+	rcu_read_lock();
+	table = rcu_dereference(iw_table);
+	/* if no iw_table, use system default */
+	weight = table ? table[node] : 1;
+	/* if value in iw_table is 0, use system default */
+	weight = weight ? weight : 1;
+	rcu_read_unlock();
+	return weight;
+}
+
 /**
  * numa_nearest_node - Find nearest node by state
  * @node: Node id to start the search
@@ -3067,3 +3093,200 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		p += scnprintf(p, buffer + maxlen - p, ":%*pbl",
 			       nodemask_pr_args(&nodes));
 }
+
+#ifdef CONFIG_SYSFS
+struct iw_node_attr {
+	struct kobj_attribute kobj_attr;
+	int nid;
+};
+
+static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
+			 char *buf)
+{
+	struct iw_node_attr *node_attr;
+	u8 weight;
+
+	node_attr = container_of(attr, struct iw_node_attr, kobj_attr);
+	weight = get_il_weight(node_attr->nid);
+	return sysfs_emit(buf, "%d\n", weight);
+}
+
+static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
+			  const char *buf, size_t count)
+{
+	struct iw_node_attr *node_attr;
+	u8 *new;
+	u8 *old;
+	u8 weight = 0;
+
+	node_attr = container_of(attr, struct iw_node_attr, kobj_attr);
+	if (count == 0 || sysfs_streq(buf, ""))
+		weight = 0;
+	else if (kstrtou8(buf, 0, &weight))
+		return -EINVAL;
+
+	new = kzalloc(nr_node_ids, GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	mutex_lock(&iw_table_lock);
+	old = rcu_dereference_protected(iw_table,
+					lockdep_is_held(&iw_table_lock));
+	if (old)
+		memcpy(new, old, nr_node_ids);
+	new[node_attr->nid] = weight;
+	rcu_assign_pointer(iw_table, new);
+	mutex_unlock(&iw_table_lock);
+	synchronize_rcu();
+	kfree(old);
+	return count;
+}
+
+static struct iw_node_attr **node_attrs;
+
+static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
+				  struct kobject *parent)
+{
+	if (!node_attr)
+		return;
+	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
+	kfree(node_attr->kobj_attr.attr.name);
+	kfree(node_attr);
+}
+
+static void sysfs_wi_release(struct kobject *wi_kobj)
+{
+	int i;
+
+	for (i = 0; i < nr_node_ids; i++)
+		sysfs_wi_node_release(node_attrs[i], wi_kobj);
+	kfree(wi_kobj);
+}
+
+static const struct kobj_type wi_ktype = {
+	.sysfs_ops = &kobj_sysfs_ops,
+	.release = sysfs_wi_release,
+};
+
+static int add_weight_node(int nid, struct kobject *wi_kobj)
+{
+	struct iw_node_attr *node_attr;
+	char *name;
+
+	node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
+	if (!node_attr)
+		return -ENOMEM;
+
+	name = kasprintf(GFP_KERNEL, "node%d", nid);
+	if (!name) {
+		kfree(node_attr);
+		return -ENOMEM;
+	}
+
+	sysfs_attr_init(&node_attr->kobj_attr.attr);
+	node_attr->kobj_attr.attr.name = name;
+	node_attr->kobj_attr.attr.mode = 0644;
+	node_attr->kobj_attr.show = node_show;
+	node_attr->kobj_attr.store = node_store;
+	node_attr->nid = nid;
+
+	if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+		kfree(node_attr->kobj_attr.attr.name);
+		kfree(node_attr);
+		pr_err("failed to add attribute to weighted_interleave\n");
+		return -ENOMEM;
+	}
+
+	node_attrs[nid] = node_attr;
+	return 0;
+}
+
+static int add_weighted_interleave_group(struct kobject *root_kobj)
+{
+	struct kobject *wi_kobj;
+	int nid, err;
+
+	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+	if (!wi_kobj)
+		return -ENOMEM;
+
+	err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+				   "weighted_interleave");
+	if (err) {
+		kobject_put(wi_kobj);
+		return err;
+	}
+
+	for_each_node_state(nid, N_POSSIBLE) {
+		err = add_weight_node(nid, wi_kobj);
+		if (err) {
+			pr_err("failed to add sysfs [node%d]\n", nid);
+			break;
+		}
+	}
+	if (err)
+		kobject_put(wi_kobj);
+	return err;
+}
+
+static void mempolicy_kobj_release(struct kobject *kobj)
+{
+	u8 *old;
+
+	mutex_lock(&iw_table_lock);
+	old = rcu_dereference_protected(iw_table,
+					lockdep_is_held(&iw_table_lock));
+	rcu_assign_pointer(iw_table, NULL);
+	mutex_unlock(&iw_table_lock);
+	synchronize_rcu();
+	kfree(old);
+	kfree(node_attrs);
+	kfree(kobj);
+}
+
+static const struct kobj_type mempolicy_ktype = {
+	.release = mempolicy_kobj_release
+};
+
+static int __init mempolicy_sysfs_init(void)
+{
+	int err;
+	static struct kobject *mempolicy_kobj;
+
+	mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
+	if (!mempolicy_kobj) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+			     GFP_KERNEL);
+	if (!node_attrs) {
+		err = -ENOMEM;
+		goto mempol_out;
+	}
+
+	err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
+				   "mempolicy");
+	if (err) {
+		kobject_put(mempolicy_kobj);
+		goto err_out;
+	}
+
+	err = add_weighted_interleave_group(mempolicy_kobj);
+	if (err) {
+		pr_err("mempolicy sysfs structure failed to initialize\n");
+		kobject_put(mempolicy_kobj);
+		return err;
+	}
+
+	return err;
+mempol_out:
+	kfree(mempolicy_kobj);
+err_out:
+	pr_err("failed to add mempolicy kobject to the system\n");
+	return err;
+}
+
+late_initcall(mempolicy_sysfs_init);
+#endif /* CONFIG_SYSFS */
-- 
2.39.1