From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, "Aneesh Kumar K.V", Ben Widawsky,
    Dave Hansen, Feng Tang, Michal Hocko, Andrea Arcangeli, Mel Gorman,
    Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen,
    Dan Williams, Huang Ying
Subject: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
Date: Wed, 13 Oct 2021 15:15:39 +0530
Message-Id: <20211013094539.962357-1-aneesh.kumar@linux.ibm.com>

This mempolicy mode can be used with either the set_mempolicy(2) or
mbind(2) interface. Like MPOL_PREFERRED, it allows an application to
set a preferred node from which the kernel will fulfill memory
allocation requests. Unlike MPOL_PREFERRED, it takes a set of nodes:
the remaining nodes in the nodemask are used as fallback allocation
nodes if memory is not available on the preferred node. Unlike
MPOL_PREFERRED_MANY, it will not fall back to all nodes in the system.
Like MPOL_BIND, it operates over a set of nodes and will cause a
SIGSEGV or invoke the OOM killer if memory is not available on those
nodes.

This patch lets applications hint a preferred memory allocation node
and fall back to _only_ a specified set of nodes if memory is not
available on the preferred node. Fallback allocation is attempted from
the node which is nearest to the preferred node.

This new memory policy gives applications explicit control over
slow-memory allocation and avoids the default fallback to slow-memory
NUMA nodes. The difference from MPOL_BIND is the ability to specify a
preferred node, namely the first node in the nodemask argument passed.
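As an illustration only (not part of the patch), a minimal userspace
sketch exercising the new mode through the raw set_mempolicy(2)
syscall. The MPOL_PREFERRED_STRICT value below mirrors the mode's
position in the uapi enum as extended by this patch; the node numbers
are made up:

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Value 6 assumes the existing uapi enum layout plus this patch. */
  #define MPOL_PREFERRED_STRICT	6

  int main(void)
  {
  	/*
  	 * Nodes 0-2: the lowest-numbered node in the mask (node 0)
  	 * acts as the preferred node; fallback is restricted to
  	 * nodes 1 and 2, never the rest of the system.
  	 */
  	unsigned long nodemask = (1UL << 0) | (1UL << 1) | (1UL << 2);

  	/* maxnode follows the libnuma convention of mask bits + 1 */
  	if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_STRICT,
  		    &nodemask, 8 * sizeof(nodemask) + 1) != 0) {
  		perror("set_mempolicy(MPOL_PREFERRED_STRICT)");
  		return EXIT_FAILURE;
  	}

  	/*
  	 * From here on, this task's allocations come from node 0 when
  	 * possible, then nodes 1-2, and never from any other node.
  	 */
  	return EXIT_SUCCESS;
  }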
Cc: Ben Widawsky
Cc: Dave Hansen
Cc: Feng Tang
Cc: Michal Hocko
Cc: Andrea Arcangeli
Cc: Mel Gorman
Cc: Mike Kravetz
Cc: Randy Dunlap
Cc: Vlastimil Babka
Cc: Andi Kleen
Cc: Dan Williams
Cc: Huang Ying
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 .../admin-guide/mm/numa_memory_policy.rst |  7 +++
 include/uapi/linux/mempolicy.h            |  1 +
 mm/mempolicy.c                            | 43 +++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 64fd0ba0d057..4dfdcbd22d67 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -252,6 +252,13 @@ MPOL_PREFERRED_MANY
 	can fall back to all existing numa nodes. This is effectively
 	MPOL_PREFERRED allowed for a mask rather than a single node.
 
+MPOL_PREFERRED_STRICT
+	This mode specifies that the allocation should be attempted
+	from the first node specified in the nodemask of the policy.
+	If that allocation fails, the kernel will search other nodes
+	in the nodemask, in order of increasing distance from the
+	preferred node based on information provided by the platform firmware.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 046d0ccba4cd..8aa1d1963235 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -23,6 +23,7 @@ enum {
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
 	MPOL_PREFERRED_MANY,
+	MPOL_PREFERRED_STRICT,
 	MPOL_MAX,	/* always last member of enum */
 };
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1592b081c58e..59080dd1ea69 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -407,6 +407,10 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 		.create = mpol_new_nodemask,
 		.rebind = mpol_rebind_preferred,
 	},
+	[MPOL_PREFERRED_STRICT] = {
+		.create = mpol_new_nodemask,
+		.rebind = mpol_rebind_preferred,
+	},
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
@@ -900,6 +904,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
 	case MPOL_INTERLEAVE:
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_PREFERRED_STRICT:
 		*nodes = p->nodes;
 		break;
 	case MPOL_LOCAL:
@@ -1781,7 +1786,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 	    cpuset_nodemask_valid_mems_allowed(&policy->nodes))
 		return &policy->nodes;
 
-	if (mode == MPOL_PREFERRED_MANY)
+	if (mode == MPOL_PREFERRED_MANY || mode == MPOL_PREFERRED_STRICT)
 		return &policy->nodes;
 
 	return NULL;
@@ -1796,7 +1801,7 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
  */
 static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 {
-	if (policy->mode == MPOL_PREFERRED) {
+	if (policy->mode == MPOL_PREFERRED || policy->mode == MPOL_PREFERRED_STRICT) {
 		nd = first_node(policy->nodes);
 	} else {
 		/*
@@ -1840,6 +1845,7 @@ unsigned int mempolicy_slab_node(void)
 
 	switch (policy->mode) {
 	case MPOL_PREFERRED:
+	case MPOL_PREFERRED_STRICT:
 		return first_node(policy->nodes);
 
 	case MPOL_INTERLEAVE:
@@ -1952,7 +1958,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 					huge_page_shift(hstate_vma(vma)));
 	} else {
 		nid = policy_node(gfp_flags, *mpol, numa_node_id());
-		if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY)
+		if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY ||
+		    mode == MPOL_PREFERRED_STRICT)
 			*nodemask = &(*mpol)->nodes;
 	}
 	return nid;
@@ -1986,6 +1993,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 	switch (mempolicy->mode) {
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_PREFERRED_STRICT:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		*mask = mempolicy->nodes;
@@ -2072,6 +2080,23 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
 	return page;
 }
 
+static struct page *alloc_pages_preferred_strict(gfp_t gfp, unsigned int order,
+						 struct mempolicy *pol)
+{
+	int nid;
+	gfp_t preferred_gfp;
+
+	/*
+	 * With MPOL_PREFERRED_STRICT first node in the policy nodemask
+	 * is picked as the preferred node id and the fallback allocation
+	 * is still restricted to the preferred nodes in the nodemask.
+	 */
+	preferred_gfp = gfp | __GFP_NOWARN;
+	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
+	nid = first_node(pol->nodes);
+	return __alloc_pages(preferred_gfp, order, nid, &pol->nodes);
+}
+
 /**
  * alloc_pages_vma - Allocate a page for a VMA.
  * @gfp: GFP flags.
@@ -2113,6 +2138,12 @@ struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		goto out;
 	}
 
+	if (pol->mode == MPOL_PREFERRED_STRICT) {
+		page = alloc_pages_preferred_strict(gfp, order, pol);
+		mpol_cond_put(pol);
+		goto out;
+	}
+
 	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
 		int hpage_node = node;
 
@@ -2193,6 +2224,8 @@ struct page *alloc_pages(gfp_t gfp, unsigned order)
 	else if (pol->mode == MPOL_PREFERRED_MANY)
 		page = alloc_pages_preferred_many(gfp, order,
 				numa_node_id(), pol);
+	else if (pol->mode == MPOL_PREFERRED_STRICT)
+		page = alloc_pages_preferred_strict(gfp, order, pol);
 	else
 		page = __alloc_pages(gfp, order,
 				policy_node(gfp, pol, numa_node_id()),
@@ -2265,6 +2298,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	case MPOL_INTERLEAVE:
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_PREFERRED_STRICT:
 		return !!nodes_equal(a->nodes, b->nodes);
 	case MPOL_LOCAL:
 		return true;
@@ -2405,6 +2439,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_PREFERRED:
+	case MPOL_PREFERRED_STRICT:
 		if (node_isset(curnid, pol->nodes))
 			goto out;
 		polnid = first_node(pol->nodes);
@@ -2866,6 +2901,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 		err = 0;
 		goto out;
 	case MPOL_PREFERRED_MANY:
+	case MPOL_PREFERRED_STRICT:
 	case MPOL_BIND:
 		/*
 		 * Insist on a nodelist
@@ -2953,6 +2989,7 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		break;
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
+	case MPOL_PREFERRED_STRICT:
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 		nodes = pol->nodes;
-- 
2.31.1
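For the VMA-scoped mbind(2) interface mentioned in the changelog, a
similarly hedged sketch (again assuming the enum value added by this
patch; the mapping size and node numbers are illustrative):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>

  /* Value 6 assumes the existing uapi enum layout plus this patch. */
  #define MPOL_PREFERRED_STRICT	6

  int main(void)
  {
  	size_t len = 64UL << 20;	/* a 64 MiB anonymous mapping */
  	unsigned long nodemask = (1UL << 0) | (1UL << 1);
  	void *buf;

  	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
  		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  	if (buf == MAP_FAILED) {
  		perror("mmap");
  		return EXIT_FAILURE;
  	}

  	/* Bind only this range: prefer node 0, fall back to node 1 only. */
  	if (syscall(SYS_mbind, buf, len, MPOL_PREFERRED_STRICT,
  		    &nodemask, 8 * sizeof(nodemask) + 1, 0) != 0) {
  		perror("mbind(MPOL_PREFERRED_STRICT)");
  		return EXIT_FAILURE;
  	}

  	memset(buf, 1, len);	/* fault pages in under the new policy */
  	munmap(buf, len);
  	return EXIT_SUCCESS;
  }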