From: Gregory Price 
To: linux-mm@kvack.org, jgroves@micron.com, ravis.opensrc@micron.com,
	sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
	tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
	gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
	hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
	peterz@infradead.org
Subject: [RFC PATCH 05/11] mm/mempolicy: refactor kernel_get_mempolicy for code re-use
Date: Wed, 6 Dec 2023 19:27:53 -0500
Message-Id: <20231207002759.51418-6-gregory.price@memverge.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20231207002759.51418-1-gregory.price@memverge.com>
References: <20231207002759.51418-1-gregory.price@memverge.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Pull operation flag checking from inside do_get_mempolicy out to
kernel_get_mempolicy.  This allows us to flatten the internal code and
break it into separate functions that future syscalls (get_mempolicy2,
process_get_mempolicy) can re-use, even after additional extensions are
made.

The primary change is that the flag is treated as the multiplexer that
it actually is.  For get_mempolicy, the flags represent 3 different
primary operations:

	if (flags & MPOL_F_MEMS_ALLOWED)
		return task->mems_allowed
	else if (flags & MPOL_F_ADDR)
		return vma mempolicy information
	else
		return task mempolicy information

Plus the behavior-modifying flag:

	if (flags & MPOL_F_NODE)
		change the return value of (int __user *policy)
		based on whether MPOL_F_ADDR was set

The original behavior of get_mempolicy is retained, but the new
mempolicy_args structure is used to pass the operations down the stack.
This will allow us to extend the internal functions without affecting
the legacy behavior of get_mempolicy.

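For reference, the legacy interface whose behavior must be preserved can
be exercised from userspace roughly as follows.  This is a minimal
sketch assuming libnuma's <numaif.h> wrapper for get_mempolicy(2)
(compile with -lnuma); error handling is omitted:

	#include <numaif.h>	/* get_mempolicy(), MPOL_F_* flags */
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		unsigned long nodemask[16] = { 0 };	/* 1024 node bits */
		unsigned long maxnode = sizeof(nodemask) * 8;
		int mode = -1;
		char *buf = malloc(4096);

		/* MPOL_F_MEMS_ALLOWED: fetch the task's mems_allowed */
		get_mempolicy(&mode, nodemask, maxnode, NULL,
			      MPOL_F_MEMS_ALLOWED);

		/* MPOL_F_ADDR: fetch the policy of the VMA backing buf */
		get_mempolicy(&mode, nodemask, maxnode, buf, MPOL_F_ADDR);
		printf("vma policy mode: %d\n", mode);

		/* no flags: fetch the task policy */
		get_mempolicy(&mode, nodemask, maxnode, NULL, 0);
		printf("task policy mode: %d\n", mode);

		/* MPOL_F_NODE with MPOL_F_ADDR: report the node backing buf
		 * (touch the page first so it is actually present) */
		buf[0] = 1;
		get_mempolicy(&mode, NULL, 0, buf, MPOL_F_ADDR | MPOL_F_NODE);
		printf("buf resides on node: %d\n", mode);

		free(buf);
		return 0;
	}

All of these paths enter the kernel through kernel_get_mempolicy(),
which after this patch dispatches to the split helpers below instead of
the monolithic do_get_mempolicy().
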
Signed-off-by: Gregory Price 
---
 mm/mempolicy.c | 240 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 150 insertions(+), 90 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4c343218c033..fecdc781b6a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -898,106 +898,107 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 	return ret;
 }
 
-/* Retrieve NUMA policy */
-static long do_get_mempolicy(int *policy, nodemask_t *nmask,
-			     unsigned long addr, unsigned long flags)
+/* Retrieve the mems_allowed for current task */
+static inline long do_get_mems_allowed(nodemask_t *nmask)
 {
-	int err;
-	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma = NULL;
-	struct mempolicy *pol = current->mempolicy, *pol_refcount = NULL;
+	task_lock(current);
+	*nmask = cpuset_current_mems_allowed;
+	task_unlock(current);
+	return 0;
+}
 
-	if (flags &
-	    ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED))
-		return -EINVAL;
+/* If the policy has additional node information to retrieve, return it */
+static long do_get_policy_node(struct mempolicy *pol)
+{
+	/*
+	 * For MPOL_INTERLEAVE, the extended node information is the next
+	 * node that will be selected for interleave. For weighted interleave
+	 * we return the next node based on the current weight.
+	 */
+	if (pol == current->mempolicy && pol->mode == MPOL_INTERLEAVE)
+		return next_node_in(current->il_prev, pol->nodes);
 
-	if (flags & MPOL_F_MEMS_ALLOWED) {
-		if (flags & (MPOL_F_NODE|MPOL_F_ADDR))
-			return -EINVAL;
-		*policy = 0;	/* just so it's initialized */
+	if (pol == current->mempolicy &&
+	    pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
+		if (pol->wil.cur_weight)
+			return current->il_prev;
+		else
+			return next_node_in(current->il_prev, pol->nodes);
+	}
+	return -EINVAL;
+}
+
+/* Handle user_nodemask condition when fetching nodemask for userspace */
+static void do_get_mempolicy_nodemask(struct mempolicy *pol, nodemask_t *nmask)
+{
+	if (mpol_store_user_nodemask(pol)) {
+		*nmask = pol->w.user_nodemask;
+	} else {
 		task_lock(current);
-		*nmask = cpuset_current_mems_allowed;
+		get_policy_nodemask(pol, nmask);
 		task_unlock(current);
-		return 0;
 	}
+}
 
-	if (flags & MPOL_F_ADDR) {
-		pgoff_t ilx;		/* ignored here */
-		/*
-		 * Do NOT fall back to task policy if the
-		 * vma/shared policy at addr is NULL.  We
-		 * want to return MPOL_DEFAULT in this case.
-		 */
-		mmap_read_lock(mm);
-		vma = vma_lookup(mm, addr);
-		if (!vma) {
-			mmap_read_unlock(mm);
-			return -EFAULT;
-		}
-		pol = __get_vma_policy(vma, addr, &ilx);
-	} else if (addr)
-		return -EINVAL;
+/* Retrieve NUMA policy for a VMA associated with a given address */
+static long do_get_vma_mempolicy(struct mempolicy_args *args)
+{
+	pgoff_t ilx;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma = NULL;
+	struct mempolicy *pol = NULL;
 
+	mmap_read_lock(mm);
+	vma = vma_lookup(mm, args->addr);
+	if (!vma) {
+		mmap_read_unlock(mm);
+		return -EFAULT;
+	}
+	pol = __get_vma_policy(vma, args->addr, &ilx);
 	if (!pol)
-		pol = &default_policy;	/* indicates default behavior */
+		pol = &default_policy;
+	/* this may cause a double-reference, resolved by a put+cond_put */
+	mpol_get(pol);
+	mmap_read_unlock(mm);
 
-	if (flags & MPOL_F_NODE) {
-		if (flags & MPOL_F_ADDR) {
-			/*
-			 * Take a refcount on the mpol, because we are about to
-			 * drop the mmap_lock, after which only "pol" remains
-			 * valid, "vma" is stale.
-			 */
-			pol_refcount = pol;
-			vma = NULL;
-			mpol_get(pol);
-			mmap_read_unlock(mm);
-			err = lookup_node(mm, addr);
-			if (err < 0)
-				goto out;
-			*policy = err;
-		} else if (pol == current->mempolicy &&
-				pol->mode == MPOL_INTERLEAVE) {
-			*policy = next_node_in(current->il_prev, pol->nodes);
-		} else if (pol == current->mempolicy &&
-				(pol->mode == MPOL_WEIGHTED_INTERLEAVE)) {
-			if (pol->wil.cur_weight)
-				*policy = current->il_prev;
-			else
-				*policy = next_node_in(current->il_prev,
-						       pol->nodes);
-		} else {
-			err = -EINVAL;
-			goto out;
-		}
-	} else {
-		*policy = pol == &default_policy ? MPOL_DEFAULT :
-				pol->mode;
-		/*
-		 * Internal mempolicy flags must be masked off before exposing
-		 * the policy to userspace.
-		 */
-		*policy |= (pol->flags & MPOL_MODE_FLAGS);
-	}
+	/* Fetch the node for the given address */
+	args->addr_node = lookup_node(mm, args->addr);
 
-	err = 0;
-	if (nmask) {
-		if (mpol_store_user_nodemask(pol)) {
-			*nmask = pol->w.user_nodemask;
-		} else {
-			task_lock(current);
-			get_policy_nodemask(pol, nmask);
-			task_unlock(current);
-		}
+	args->mode = pol == &default_policy ? MPOL_DEFAULT : pol->mode;
+	args->mode_flags = (pol->flags & MPOL_MODE_FLAGS);
+
+	/* If this policy has extra node info, fetch that */
+	args->policy_node = do_get_policy_node(pol);
+
+	if (args->policy_nodes)
+		do_get_mempolicy_nodemask(pol, args->policy_nodes);
+
+	if (pol != &default_policy) {
+		mpol_put(pol);
+		mpol_cond_put(pol);
 	}
 
- out:
-	mpol_cond_put(pol);
-	if (vma)
-		mmap_read_unlock(mm);
-	if (pol_refcount)
-		mpol_put(pol_refcount);
-	return err;
+	return 0;
+}
+
+/* Retrieve NUMA policy for the current task */
+static long do_get_task_mempolicy(struct mempolicy_args *args)
+{
+	struct mempolicy *pol = current->mempolicy;
+
+	if (!pol)
+		pol = &default_policy;	/* indicates default behavior */
+
+	args->mode = pol == &default_policy ? MPOL_DEFAULT : pol->mode;
+	/* Internal flags must be masked off before exposing to userspace */
+	args->mode_flags = (pol->flags & MPOL_MODE_FLAGS);
+
+	args->policy_node = do_get_policy_node(pol);
+
+	if (args->policy_nodes)
+		do_get_mempolicy_nodemask(pol, args->policy_nodes);
+
+	return 0;
 }
 
 #ifdef CONFIG_MIGRATION
@@ -1734,16 +1735,75 @@ static int kernel_get_mempolicy(int __user *policy,
 				unsigned long addr,
 				unsigned long flags)
 {
+	struct mempolicy_args args;
 	int err;
-	int pval;
+	int pval = 0;
 	nodemask_t nodes;
 
 	if (nmask != NULL && maxnode < nr_node_ids)
 		return -EINVAL;
 
-	addr = untagged_addr(addr);
+	if (flags &
+	    ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED))
+		return -EINVAL;
 
-	err = do_get_mempolicy(&pval, &nodes, addr, flags);
+	/* Ensure any data that may be copied to userland is initialized */
+	memset(&args, 0, sizeof(args));
+	args.policy_nodes = &nodes;
+	args.addr = untagged_addr(addr);
+
+	/*
+	 * get_mempolicy was originally multiplexed based on 3 flags:
+	 *   MPOL_F_MEMS_ALLOWED: fetch task->mems_allowed
+	 *   MPOL_F_ADDR        : operate on vma->mempolicy
+	 *   MPOL_F_NODE        : change return value of *policy
+	 *
+	 * Split this behavior out here, rather than in the internal
+	 * functions, so that the internal functions can be re-used by
+	 * future get_mempolicy2 interfaces and the arg structure made
+	 * extensible.
+	 */
+	if (flags & MPOL_F_MEMS_ALLOWED) {
+		if (flags & (MPOL_F_NODE|MPOL_F_ADDR))
+			return -EINVAL;
+		pval = 0;	/* just so it's initialized */
+		err = do_get_mems_allowed(&nodes);
+	} else if (flags & MPOL_F_ADDR) {
+		/* If F_ADDR, we operate on a vma policy (or default) */
+		err = do_get_vma_mempolicy(&args);
+		if (err)
+			return err;
+		/* if (F_ADDR | F_NODE), *pval is the address' node */
+		if (flags & MPOL_F_NODE) {
+			/* if we failed to fetch, that's likely an EFAULT */
+			if (args.addr_node < 0)
+				return args.addr_node;
+			pval = args.addr_node;
+		} else
+			pval = args.mode | args.mode_flags;
+	} else {
+		/* if not F_ADDR and addr != null, EINVAL */
+		if (addr)
+			return -EINVAL;
+
+		err = do_get_task_mempolicy(&args);
+		if (err)
+			return err;
+		/*
+		 * if F_NODE was set and mode was MPOL_INTERLEAVE,
+		 * *pval is equal to the next interleave node.
+		 *
+		 * if args.policy_node < 0, this means the mode did
+		 * not have a policy.  This presently emulates the
+		 * original behavior of (F_NODE) & (!MPOL_INTERLEAVE)
+		 * producing -EINVAL.
+		 */
+		if (flags & MPOL_F_NODE) {
+			if (args.policy_node < 0)
+				return args.policy_node;
+			pval = args.policy_node;
+		} else
+			pval = args.mode | args.mode_flags;
+	}
 
 	if (err)
 		return err;
-- 
2.39.1
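
To illustrate the intended re-use, here is a hypothetical sketch (not
part of this patch) of how a get_mempolicy2()-style entry point could
call the split helpers.  The mempolicy_args layout is assumed from the
fields this patch populates; the real structure and uapi are defined
elsewhere in the series:

	static long sketch_get_mempolicy2(struct mempolicy_args *args,
					  bool vma_scope)
	{
		nodemask_t nodes;
		long err;

		/* args is assumed to arrive zeroed from the syscall wrapper */
		args->policy_nodes = &nodes;

		if (vma_scope)
			err = do_get_vma_mempolicy(args);  /* vma or default */
		else
			err = do_get_task_mempolicy(args); /* task policy */
		if (err)
			return err;

		/*
		 * args->mode, args->mode_flags, args->policy_node (and
		 * args->addr_node in the vma case) plus nodes are now
		 * populated and can be copied out through whatever
		 * extensible uapi structure the new syscall defines.
		 */
		return 0;
	}

The point is that such an entry point never has to re-multiplex on the
MPOL_F_* flags; it simply picks the vma or task helper and copies the
populated fields out.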