From: Weixi Zhu
Subject: [RFC PATCH 1/6] mm/gmem: add heterogeneous NUMA node
Date: Tue, 28 Nov 2023 20:50:20 +0800
Message-ID: <20231128125025.4449-2-weixi.zhu@huawei.com>
In-Reply-To: <20231128125025.4449-1-weixi.zhu@huawei.com>
References: <20231128125025.4449-1-weixi.zhu@huawei.com>

This patch adds a new NUMA node state, N_HETEROGENEOUS, which identifies
heterogeneous NUMA (hNUMA) nodes. Note that an hNUMA node may not be
directly accessible by the CPU. Each hNUMA node is identified by a NUMA id.

This can be extended to describe a NUMA topology that includes device-local
DRAM, where a cache-coherent bus does not need to exist between the CPU and
the device-local DRAM. Furthermore, it allows an application user to issue
memory hints that bind to specific hNUMA nodes.

Signed-off-by: Weixi Zhu
---
 drivers/base/node.c      |  6 ++++
 include/linux/gmem.h     | 19 ++++++++++
 include/linux/nodemask.h | 10 ++++++
 init/main.c              |  2 ++
 mm/Kconfig               | 14 ++++++++
 mm/Makefile              |  1 +
 mm/gmem.c                | 78 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c          |  3 ++
 8 files changed, 133 insertions(+)
 create mode 100644 include/linux/gmem.h
 create mode 100644 mm/gmem.c

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 493d533f8375..aa4d2ca266aa 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -928,6 +928,9 @@ static struct node_attr node_state_attr[] = {
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 	[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
 					   N_GENERIC_INITIATOR),
+#ifdef CONFIG_GMEM
+	[N_HETEROGENEOUS] = _NODE_ATTR(has_hetero_memory, N_HETEROGENEOUS),
+#endif
 };
 
 static struct attribute *node_state_attrs[] = {
@@ -940,6 +943,9 @@ static struct attribute *node_state_attrs[] = {
 	&node_state_attr[N_MEMORY].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
 	&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
+#ifdef CONFIG_GMEM
+	&node_state_attr[N_HETEROGENEOUS].attr.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/gmem.h b/include/linux/gmem.h
new file mode 100644
index 000000000000..fff877873557
--- /dev/null
+++ b/include/linux/gmem.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generalized Memory Management.
+ *
+ * Copyright (C) 2023- Huawei, Inc.
+ * Author: Weixi Zhu
+ *
+ */
+#ifndef _GMEM_H
+#define _GMEM_H
+
+#ifdef CONFIG_GMEM
+/* h-NUMA topology */
+void __init hnuma_init(void);
+#else
+static inline void hnuma_init(void) {}
+#endif
+
+#endif /* _GMEM_H */
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 8d07116caaf1..66e4640a52ba 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -407,6 +407,9 @@ enum node_states {
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
 	N_CPU,		/* The node has one or more cpus */
 	N_GENERIC_INITIATOR,	/* The node has one or more Generic Initiators */
+#ifdef CONFIG_GMEM
+	N_HETEROGENEOUS,	/* The node has heterogeneous memory */
+#endif
 	NR_NODE_STATES
 };
 
@@ -536,6 +539,13 @@ static inline int node_random(const nodemask_t *maskp)
 #define for_each_node(node)	   for_each_node_state(node, N_POSSIBLE)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
+#ifdef CONFIG_GMEM
+/* For h-NUMA topology */
+#define hnode_map		node_states[N_HETEROGENEOUS]
+#define num_hnodes()		num_node_state(N_HETEROGENEOUS)
+#define for_each_hnode(node)	for_each_node_state(node, N_HETEROGENEOUS)
+#endif
+
 /*
  * For nodemask scratch area.
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..12dfb5b63d51 100644
--- a/init/main.c
+++ b/init/main.c
@@ -100,6 +100,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -901,6 +902,7 @@ void start_kernel(void)
 	setup_per_cpu_areas();
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
 	boot_cpu_hotplug_init();
+	hnuma_init();
 
 	pr_notice("Kernel command line: %s\n", saved_command_line);
 	/* parameters may set static keys */
diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..1a7d8194513c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1270,6 +1270,20 @@ config LOCK_MM_AND_FIND_VMA
 	bool
 	depends on !STACK_GROWSUP
 
+config GMEM
+	bool "generalized memory management for external memory devices"
+	depends on (ARM64 || X86_64) && MMU && TRANSPARENT_HUGEPAGE
+	select ARCH_USES_HIGH_VMA_FLAGS
+	default y
+	help
+	  Supporting GMEM (generalized memory management) for external memory
+	  devices
+
+	  GMEM extends Linux MM to share its machine-independent MM code. Only
+	  high-level interface is provided for device drivers. This prevents
+	  accelerator drivers from reinventing the wheel, but relies on drivers to
+	  implement their hardware-dependent functions declared by GMEM.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 33873c8aedb3..f48ea2eb4a44 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -138,3 +138,4 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
+obj-$(CONFIG_GMEM) += gmem.o
diff --git a/mm/gmem.c b/mm/gmem.c
new file mode 100644
index 000000000000..767eb070b22e
--- /dev/null
+++ b/mm/gmem.c
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generalized Memory Management.
+ *
+ * Copyright (C) 2023- Huawei, Inc.
+ * Author: Weixi Zhu
+ *
+ */
+#include
+#include
+
+DEFINE_SPINLOCK(hnode_lock);
+
+struct hnode {
+	unsigned int id;
+	struct gm_dev *dev;
+	struct xarray pages;
+};
+
+struct hnode *hnodes[MAX_NUMNODES];
+
+static bool is_hnode(int node)
+{
+	return !node_isset(node, node_possible_map) &&
+	       node_isset(node, hnode_map);
+}
+
+static bool is_hnode_allowed(int node)
+{
+	return is_hnode(node) && node_isset(node, current->mems_allowed);
+}
+
+static struct hnode *get_hnode(unsigned int hnid)
+{
+	return hnodes[hnid];
+}
+
+void __init hnuma_init(void)
+{
+	unsigned int node;
+
+	for_each_node(node)
+		node_set(node, hnode_map);
+}
+
+static unsigned int alloc_hnode_id(void)
+{
+	unsigned int node;
+
+	spin_lock(&hnode_lock);
+	node = first_unset_node(hnode_map);
+	node_set(node, hnode_map);
+	spin_unlock(&hnode_lock);
+
+	return node;
+}
+
+static void free_hnode_id(unsigned int nid)
+{
+	node_clear(nid, hnode_map);
+}
+
+static void hnode_init(struct hnode *hnode, unsigned int hnid,
+		       struct gm_dev *dev)
+{
+	hnodes[hnid] = hnode;
+	hnodes[hnid]->id = hnid;
+	hnodes[hnid]->dev = dev;
+	xa_init(&hnodes[hnid]->pages);
+}
+
+static void hnode_deinit(unsigned int hnid, struct gm_dev *dev)
+{
+	hnodes[hnid]->id = 0;
+	hnodes[hnid]->dev = NULL;
+	xa_destroy(&hnodes[hnid]->pages);
+	hnodes[hnid] = NULL;
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 733732e7e0ba..a785b62a1542 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -192,6 +192,9 @@ EXPORT_SYMBOL(latent_entropy);
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_POSSIBLE] = NODE_MASK_ALL,
 	[N_ONLINE] = { { [0] = 1UL } },
+#ifdef CONFIG_GMEM
+	[N_HETEROGENEOUS] = NODE_MASK_NONE,
+#endif
 #ifndef CONFIG_NUMA
 	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
-- 
2.25.1
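
For readers who want to follow the id bookkeeping in mm/gmem.c without building a kernel, here is a rough user-space model of it. It is only a sketch, not the kernel code: every `model_*` name, `MAX_NODES`, and `NO_NODE` is hypothetical, a 64-bit integer stands in for `nodemask_t`, and locking is omitted. The exhaustion check in `model_alloc_hnode_id()` is an addition of this sketch and is not present in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64		/* hypothetical stand-in for MAX_NUMNODES */
#define NO_NODE   MAX_NODES	/* sketch-only value: no free id is left */

typedef uint64_t mask_t;	/* one bit per node id, models nodemask_t */

static mask_t possible_map;	/* models node_possible_map */
static mask_t hnode_map;	/* models node_states[N_HETEROGENEOUS] */

static bool mask_test(mask_t m, unsigned int id)
{
	return id < MAX_NODES && ((m >> id) & 1);
}

/* Models hnuma_init(): mark every possible node id as taken in hnode_map. */
static void model_hnuma_init(void)
{
	hnode_map |= possible_map;
}

/* Models alloc_hnode_id(): claim the lowest id not yet set in hnode_map. */
static unsigned int model_alloc_hnode_id(void)
{
	for (unsigned int id = 0; id < MAX_NODES; id++) {
		if (!mask_test(hnode_map, id)) {
			hnode_map |= (mask_t)1 << id;
			return id;
		}
	}
	return NO_NODE;		/* sketch-only: the map is full */
}

/* Models free_hnode_id(): release an id so that it can be handed out again. */
static void model_free_hnode_id(unsigned int id)
{
	assert(id < MAX_NODES);
	hnode_map &= ~((mask_t)1 << id);
}

/* Models is_hnode(): set in hnode_map but not an ordinary possible node. */
static bool model_is_hnode(unsigned int id)
{
	return !mask_test(possible_map, id) && mask_test(hnode_map, id);
}
```

In this model, ids that belong to ordinary nodes are reserved once at init time, so an allocated hNUMA id never collides with them, and the `is_hnode()` test reduces to "set in the heterogeneous map, but not in the possible map".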