From: Gregory Price <gourry@gourry.net>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, damon@lists.linux.dev,
	kernel-team@meta.com, gregkh@linuxfoundation.org, rafael@kernel.org,
	dakr@kernel.org, dave@stgolabs.net, jonathan.cameron@huawei.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com,
	longman@redhat.com, akpm@linux-foundation.org, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com, osalvador@suse.de,
	ziy@nvidia.com, matthew.brost@intel.com, joshua.hahnjy@gmail.com,
	rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	yury.norov@gmail.com, linux@rasmusvillemoes.dk, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, tj@kernel.org, hannes@cmpxchg.org,
	mkoutny@suse.com, jackmanb@google.com, sj@kernel.org,
	baolin.wang@linux.alibaba.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	muchun.song@linux.dev, xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
	jannh@google.com, linmiaohe@huawei.com, nao.horiguchi@gmail.com,
	pfalcato@suse.de, rientjes@google.com, shakeel.butt@linux.dev,
	riel@surriel.com, harry.yoo@oracle.com, cl@gentwo.org,
	roman.gushchin@linux.dev, chrisl@kernel.org, kasong@tencent.com,
	shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
	zhengqi.arch@bytedance.com, terry.bowman@amd.com
Subject: [RFC PATCH v4 01/27] numa: introduce N_MEMORY_PRIVATE node state
Date: Sun, 22 Feb 2026 03:48:16 -0500
Message-ID: <20260222084842.1824063-2-gourry@gourry.net>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260222084842.1824063-1-gourry@gourry.net>
References: <20260222084842.1824063-1-gourry@gourry.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

N_MEMORY nodes are intended to contain general System RAM. Today, some
device drivers hotplug their memory (marked Specific Purpose or Reserved)
to get access to mm/ services, but do not intend it for general
consumption.

Create N_MEMORY_PRIVATE for nodes whose memory is not intended for
general consumption. This state is mutually exclusive with N_MEMORY.

Add the node_private infrastructure for N_MEMORY_PRIVATE nodes:

- struct node_private: per-node container stored in NODE_DATA(nid),
  holding driver callbacks (ops), owner, and refcount.

- struct node_private_ops: initial structure carrying only a flags
  field. Callbacks will be added by subsequent commits as each consumer
  is wired up.

- folio_is_private_node() / page_is_private_node(): check whether a
  folio/page resides on a private node.

- folio_node_private_ops() / node_private_flags(): retrieve the ops
  vtable or flags for a folio's node.

- Registration API: node_private_register()/unregister() for drivers to
  claim private nodes, and node_private_set_ops()/clear_ops() to attach
  and detach service callbacks. Only one driver may register a given
  node; attempting to register a different node_private returns -EBUSY.

- sysfs attribute exposing the N_MEMORY_PRIVATE node state.

Zonelist construction changes for private nodes are deferred to a
subsequent commit.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
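Reviewer note (not part of the commit message): a minimal sketch of the
intended driver-side registration flow, based on the API added below.
struct my_dev, my_dev_attach() and my_dev_detach() are illustrative
names only, not part of this series, and the flags value is a
placeholder until NP_OPS_* constants are introduced by later patches.

#include <linux/node_private.h>

struct my_dev {
	int nid;			/* driver-owned private node */
	struct node_private np;		/* driver-owned; must outlive unregister */
	struct node_private_ops ops;
};

static int my_dev_attach(struct my_dev *dev)
{
	int ret;

	/*
	 * Register before onlining memory on dev->nid; registration
	 * fails with -EBUSY if the node already has N_MEMORY memory.
	 */
	dev->np.owner = dev;
	ret = node_private_register(dev->nid, &dev->np);
	if (ret)
		return ret;

	dev->ops.flags = 0;	/* NP_OPS_* exclusion flags, once defined */
	ret = node_private_set_ops(dev->nid, &dev->ops);
	if (ret)
		node_private_unregister(dev->nid);
	return ret;
}

static void my_dev_detach(struct my_dev *dev)
{
	node_private_clear_ops(dev->nid, &dev->ops);
	/* Returns -EBUSY until all memory on dev->nid has been offlined. */
	if (node_private_unregister(dev->nid))
		return;
	/* Refcount has dropped to 0; np and ops may now be freed. */
}

Because the driver owns the node_private storage, embedding it in the
device structure works as long as the driver does not free that
structure before node_private_unregister() returns 0.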
 drivers/base/node.c          | 197 ++++++++++++++++++++++++++++++++
 include/linux/mmzone.h       |   4 +
 include/linux/node_private.h | 210 +++++++++++++++++++++++++++++++++++
 include/linux/nodemask.h     |   1 +
 4 files changed, 412 insertions(+)
 create mode 100644 include/linux/node_private.h

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 00cf4532f121..646dc48a23b5 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include <linux/node_private.h>
 
 static const struct bus_type node_subsys = {
 	.name = "node",
@@ -861,6 +862,198 @@ void register_memory_blocks_under_node_hotplug(int nid, unsigned long start_pfn,
 			   (void *)&nid, register_mem_block_under_node_hotplug);
 	return;
 }
+
+static DEFINE_MUTEX(node_private_lock);
+static bool node_private_initialized;
+
+/**
+ * node_private_register - Register a private node
+ * @nid: Node identifier
+ * @np: The node_private structure (driver-allocated, driver-owned)
+ *
+ * Register a driver for a private node. Only one driver can register
+ * per node. If another driver has already registered (with a different np),
+ * -EBUSY is returned. Re-registration with the same np is allowed.
+ *
+ * The driver owns the node_private memory and must ensure it remains valid
+ * until refcount reaches 0 after node_private_unregister().
+ *
+ * Returns 0 on success, negative errno on failure.
+ */
+int node_private_register(int nid, struct node_private *np)
+{
+	struct node_private *existing;
+	pg_data_t *pgdat;
+	int ret = 0;
+
+	if (!np || !node_possible(nid))
+		return -EINVAL;
+
+	if (!node_private_initialized)
+		return -ENODEV;
+
+	mutex_lock(&node_private_lock);
+	mem_hotplug_begin();
+
+	/* N_MEMORY_PRIVATE and N_MEMORY are mutually exclusive */
+	if (node_state(nid, N_MEMORY)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	pgdat = NODE_DATA(nid);
+	existing = rcu_dereference_protected(pgdat->node_private,
+					     lockdep_is_held(&node_private_lock));
+
+	/* Only one source may register this node */
+	if (existing) {
+		if (existing != np) {
+			ret = -EBUSY;
+			goto out;
+		}
+		goto out;
+	}
+
+	refcount_set(&np->refcount, 1);
+	init_completion(&np->released);
+
+	rcu_assign_pointer(pgdat->node_private, np);
+	pgdat->private = true;
+
+out:
+	mem_hotplug_done();
+	mutex_unlock(&node_private_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(node_private_register);
+
+/**
+ * node_private_set_ops - Set service callbacks on a registered private node
+ * @nid: Node identifier
+ * @ops: Service callbacks and flags (driver-owned, must outlive registration)
+ *
+ * Validates flag dependencies and sets the ops on the node's node_private.
+ * The node must already be registered via node_private_register().
+ *
+ * Returns 0 on success, -EINVAL for invalid flag combinations,
+ * -ENODEV if no node_private is registered on @nid.
+ */
+int node_private_set_ops(int nid, const struct node_private_ops *ops)
+{
+	struct node_private *np;
+	int ret = 0;
+
+	if (!ops)
+		return -EINVAL;
+
+	if (!node_possible(nid))
+		return -EINVAL;
+
+	mutex_lock(&node_private_lock);
+	np = rcu_dereference_protected(NODE_DATA(nid)->node_private,
+				       lockdep_is_held(&node_private_lock));
+	if (!np)
+		ret = -ENODEV;
+	else
+		np->ops = ops;
+	mutex_unlock(&node_private_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(node_private_set_ops);
+
+/**
+ * node_private_clear_ops - Clear service callbacks from a private node
+ * @nid: Node identifier
+ * @ops: Expected ops pointer (must match current ops)
+ *
+ * Clears the ops only if @ops matches the currently registered ops,
+ * preventing one service from accidentally clearing another's callbacks.
+ *
+ * Returns 0 on success, -ENODEV if no node_private is registered,
+ * -EINVAL if @ops does not match.
+ */
+int node_private_clear_ops(int nid, const struct node_private_ops *ops)
+{
+	struct node_private *np;
+	int ret = 0;
+
+	if (!node_possible(nid))
+		return -EINVAL;
+
+	mutex_lock(&node_private_lock);
+	np = rcu_dereference_protected(NODE_DATA(nid)->node_private,
+				       lockdep_is_held(&node_private_lock));
+	if (!np)
+		ret = -ENODEV;
+	else if (np->ops != ops)
+		ret = -EINVAL;
+	else
+		np->ops = NULL;
+	mutex_unlock(&node_private_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(node_private_clear_ops);
+
+/**
+ * node_private_unregister - Unregister a private node
+ * @nid: Node identifier
+ *
+ * Unregister the driver from a private node. Only succeeds if all memory
+ * has been offlined and the node is no longer N_MEMORY_PRIVATE.
+ * When successful, drops the refcount to 0 indicating the driver can
+ * free its context.
+ *
+ * N_MEMORY_PRIVATE state is cleared by offline_pages() when the last
+ * memory is offlined, not by this function.
+ *
+ * Return: 0 if unregistered, -EBUSY if N_MEMORY_PRIVATE is still set
+ * (other memory blocks remain on this node).
+ */
+int node_private_unregister(int nid)
+{
+	struct node_private *np;
+	pg_data_t *pgdat;
+
+	if (!node_possible(nid))
+		return 0;
+
+	mutex_lock(&node_private_lock);
+	mem_hotplug_begin();
+
+	pgdat = NODE_DATA(nid);
+	np = rcu_dereference_protected(pgdat->node_private,
+				       lockdep_is_held(&node_private_lock));
+	if (!np) {
+		mem_hotplug_done();
+		mutex_unlock(&node_private_lock);
+		return 0;
+	}
+
+	/*
+	 * Only unregister if all memory is offline and N_MEMORY_PRIVATE is
+	 * cleared. N_MEMORY_PRIVATE is cleared by offline_pages() when the
+	 * last memory block is offlined.
+	 */
+	if (node_state(nid, N_MEMORY_PRIVATE)) {
+		mem_hotplug_done();
+		mutex_unlock(&node_private_lock);
+		return -EBUSY;
+	}
+
+	rcu_assign_pointer(pgdat->node_private, NULL);
+	pgdat->private = false;
+
+	mem_hotplug_done();
+	mutex_unlock(&node_private_lock);
+
+	synchronize_rcu();
+
+	if (!refcount_dec_and_test(&np->refcount))
+		wait_for_completion(&np->released);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(node_private_unregister);
+
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /**
@@ -959,6 +1152,7 @@ static struct node_attr node_state_attr[] = {
 	[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
 #endif
 	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
+	[N_MEMORY_PRIVATE] = _NODE_ATTR(has_private_memory, N_MEMORY_PRIVATE),
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 	[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
 					   N_GENERIC_INITIATOR),
@@ -972,6 +1166,7 @@ static struct attribute *node_state_attrs[] = {
 	&node_state_attr[N_HIGH_MEMORY].attr.attr,
 #endif
 	&node_state_attr[N_MEMORY].attr.attr,
+	&node_state_attr[N_MEMORY_PRIVATE].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
 	&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
 	NULL
@@ -1007,5 +1202,7 @@ void __init node_dev_init(void)
 			panic("%s() failed to add node: %d\n", __func__, ret);
 	}
 
+	node_private_initialized = true;
+
 	register_memory_blocks_under_nodes();
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b01cb1e49896..992eb1c5a2c6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,6 +25,8 @@
 #include
 #include
 
+struct node_private;
+
 /* Free memory management - zoned buddy allocator. */
 #ifndef CONFIG_ARCH_FORCE_MAX_ORDER
 #define MAX_PAGE_ORDER 10
@@ -1514,6 +1516,8 @@ typedef struct pglist_data {
 	atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS];
 #ifdef CONFIG_NUMA
 	struct memory_tier __rcu *memtier;
+	struct node_private __rcu *node_private;
+	bool private;
 #endif
 #ifdef CONFIG_MEMORY_FAILURE
 	struct memory_failure_stats mf_stats;
diff --git a/include/linux/node_private.h b/include/linux/node_private.h
new file mode 100644
index 000000000000..6a70ec39d569
--- /dev/null
+++ b/include/linux/node_private.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NODE_PRIVATE_H
+#define _LINUX_NODE_PRIVATE_H
+
+#include
+#include
+#include
+#include
+#include
+
+struct page;
+struct vm_area_struct;
+struct vm_fault;
+
+/**
+ * struct node_private_ops - Callbacks for private node services
+ *
+ * Services register these callbacks to intercept MM operations that affect
+ * their private nodes.
+ *
+ * Flag bits control which MM subsystems may operate on folios on this node.
+ *
+ * The pgdat->node_private pointer is RCU-protected. Callbacks fall into
+ * three categories based on their calling context:
+ *
+ * Folio-referenced callbacks (RCU released before callback):
+ * The caller holds a reference to a folio on the private node, which
+ * pins the node's memory online and prevents node_private teardown.
+ *
+ * Refcounted callbacks (RCU released before callback):
+ * The caller has no folio on the private node (e.g., folios are on a
+ * source node being migrated TO this node). A temporary refcount is
+ * taken on node_private under rcu_read_lock to keep the structure (and
+ * the service module) alive across the callback. node_private_unregister
+ * waits for all temporary references to drain before returning.
+ *
+ * Non-folio callbacks (rcu_read_lock held during callback):
+ * No folio reference exists, so rcu_read_lock is held across the
+ * callback to prevent node_private from being freed.
+ * These callbacks MUST NOT sleep.
+ *
+ * @flags: Operation exclusion flags (NP_OPS_* constants).
+ *
+ */
+struct node_private_ops {
+	unsigned long flags;
+};
+
+/**
+ * struct node_private - Per-node container for N_MEMORY_PRIVATE nodes
+ *
+ * This structure is allocated by the driver and passed to node_private_register().
+ * The driver owns the memory and must ensure it remains valid until after
+ * node_private_unregister() returns with the reference count dropped to 0.
+ *
+ * @owner: Opaque driver identifier
+ * @refcount: Reference count (1 = registered; temporary refs for non-folio
+ *            callbacks that may sleep; 0 = fully released)
+ * @released: Signaled when refcount drops to 0; unregister waits on this
+ * @ops: Service callbacks and exclusion flags (NULL until service registers)
+ */
+struct node_private {
+	void *owner;
+	refcount_t refcount;
+	struct completion released;
+	const struct node_private_ops *ops;
+};
+
+#ifdef CONFIG_NUMA
+
+#include
+
+/**
+ * folio_is_private_node - Check if folio is on an N_MEMORY_PRIVATE node
+ * @folio: The folio to check
+ *
+ * Returns true if the folio resides on a private node.
+ */
+static inline bool folio_is_private_node(struct folio *folio)
+{
+	return node_state(folio_nid(folio), N_MEMORY_PRIVATE);
+}
+
+/**
+ * page_is_private_node - Check if page is on an N_MEMORY_PRIVATE node
+ * @page: The page to check
+ *
+ * Returns true if the page resides on a private node.
+ */
+static inline bool page_is_private_node(struct page *page)
+{
+	return node_state(page_to_nid(page), N_MEMORY_PRIVATE);
+}
+
+static inline const struct node_private_ops *
+folio_node_private_ops(struct folio *folio)
+{
+	const struct node_private_ops *ops;
+	struct node_private *np;
+
+	rcu_read_lock();
+	np = rcu_dereference(NODE_DATA(folio_nid(folio))->node_private);
+	ops = np ? np->ops : NULL;
+	rcu_read_unlock();
+
+	return ops;
+}
+
+static inline unsigned long node_private_flags(int nid)
+{
+	struct node_private *np;
+	unsigned long flags;
+
+	rcu_read_lock();
+	np = rcu_dereference(NODE_DATA(nid)->node_private);
+	flags = (np && np->ops) ? np->ops->flags : 0;
+	rcu_read_unlock();
+
+	return flags;
+}
+
+static inline bool folio_private_flags(struct folio *f, unsigned long flag)
+{
+	return node_private_flags(folio_nid(f)) & flag;
+}
+
+static inline bool node_private_has_flag(int nid, unsigned long flag)
+{
+	return node_private_flags(nid) & flag;
+}
+
+static inline bool zone_private_flags(struct zone *z, unsigned long flag)
+{
+	return node_private_flags(zone_to_nid(z)) & flag;
+}
+
+#else /* !CONFIG_NUMA */
+
+static inline bool folio_is_private_node(struct folio *folio)
+{
+	return false;
+}
+
+static inline bool page_is_private_node(struct page *page)
+{
+	return false;
+}
+
+static inline const struct node_private_ops *
+folio_node_private_ops(struct folio *folio)
+{
+	return NULL;
+}
+
+static inline unsigned long node_private_flags(int nid)
+{
+	return 0;
+}
+
+static inline bool folio_private_flags(struct folio *f, unsigned long flag)
+{
+	return false;
+}
+
+static inline bool node_private_has_flag(int nid, unsigned long flag)
+{
+	return false;
+}
+
+static inline bool zone_private_flags(struct zone *z, unsigned long flag)
+{
+	return false;
+}
+
+#endif /* CONFIG_NUMA */
+
+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+
+int node_private_register(int nid, struct node_private *np);
+int node_private_unregister(int nid);
+int node_private_set_ops(int nid, const struct node_private_ops *ops);
+int node_private_clear_ops(int nid, const struct node_private_ops *ops);
+
+#else /* !CONFIG_NUMA || !CONFIG_MEMORY_HOTPLUG */
+
+static inline int node_private_register(int nid, struct node_private *np)
+{
+	return -ENODEV;
+}
+
+static inline int node_private_unregister(int nid)
+{
+	return 0;
+}
+
+static inline int node_private_set_ops(int nid,
+				       const struct node_private_ops *ops)
+{
+	return -ENODEV;
+}
+
+static inline int node_private_clear_ops(int nid,
+					 const struct node_private_ops *ops)
+{
+	return -ENODEV;
+}
+
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
+#endif /* _LINUX_NODE_PRIVATE_H */
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index bd38648c998d..c9bcfd5a9a06 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -391,6 +391,7 @@ enum node_states {
 	N_HIGH_MEMORY = N_NORMAL_MEMORY,
 #endif
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
+	N_MEMORY_PRIVATE,	/* The node's memory is private */
 	N_CPU,			/* The node has one or more cpus */
 	N_GENERIC_INITIATOR,	/* The node has one or more Generic Initiators */
 	NR_NODE_STATES
-- 
2.53.0