From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Vlastimil Babka
Subject: [PATCH 08/12] mm/mempolicy: Create a page allocator for policy
Date: Tue, 30 Jun 2020 14:25:13 -0700
Message-Id: <20200630212517.308045-9-ben.widawsky@intel.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

This patch adds a helper function which takes care of handling multiple
preferred nodes. It will be called by future patches that need it,
specifically for VMA-based and task-based page allocation. Huge pages
don't quite fit the same pattern because they use different underlying
page allocation functions.

This consumes the previous interleave-policy-specific allocation
function to make a one-stop shop for policy-based allocation. For now,
only the interleave policy will use it, so there should be no
functional change yet. However, if bisection points to issues in the
next few commits, this patch is the likely culprit.

Similar functionality is offered via policy_node() and
policy_nodemask(). By themselves, however, neither can achieve this
fallback style across sets of nodes.

Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 60 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3b38c9c4e580..1009cf90ad37 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2199,22 +2199,56 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	return ret;
 }
 
-/* Allocate a page in interleaved policy.
-   Own path because it needs to do special accounting. */
-static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
-					unsigned nid)
+/* Handle page allocation for all but interleaved policies */
+static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
+				       unsigned int order, int preferred_nid)
 {
 	struct page *page;
+	gfp_t gfp_mask = gfp;
 
-	page = __alloc_pages(gfp, order, nid);
-	/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
-	if (!static_branch_likely(&vm_numa_stat_key))
+	if (pol->mode == MPOL_INTERLEAVE) {
+		page = __alloc_pages(gfp, order, preferred_nid);
+		/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
+		if (!static_branch_likely(&vm_numa_stat_key))
+			return page;
+		if (page && page_to_nid(page) == preferred_nid) {
+			preempt_disable();
+			__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
+			preempt_enable();
+		}
 		return page;
-	if (page && page_to_nid(page) == nid) {
-		preempt_disable();
-		__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
-		preempt_enable();
 	}
+
+	VM_BUG_ON(preferred_nid != NUMA_NO_NODE);
+
+	preferred_nid = numa_node_id();
+
+	/*
+	 * There is a two pass approach implemented here for
+	 * MPOL_PREFERRED_MANY. In the first pass we pretend the preferred nodes
+	 * are bound, but allow the allocation to fail. The below table explains
+	 * how this is achieved.
+	 *
+	 * | Policy                        | preferred nid | nodemask   |
+	 * |-------------------------------|---------------|------------|
+	 * | MPOL_DEFAULT                  | local         | NULL       |
+	 * | MPOL_PREFERRED                | best          | NULL       |
+	 * | MPOL_INTERLEAVE               | ERR           | ERR        |
+	 * | MPOL_BIND                     | local         | pol->nodes |
+	 * | MPOL_PREFERRED_MANY           | best          | pol->nodes |
+	 * | MPOL_PREFERRED_MANY (round 2) | local         | NULL       |
+	 * +-------------------------------+---------------+------------+
+	 */
+	if (pol->mode == MPOL_PREFERRED_MANY)
+		gfp_mask |= __GFP_RETRY_MAYFAIL;
+
+	page = __alloc_pages_nodemask(gfp_mask, order,
+				      policy_node(gfp, pol, preferred_nid),
+				      policy_nodemask(gfp, pol));
+
+	if (unlikely(!page && pol->mode == MPOL_PREFERRED_MANY))
+		page = __alloc_pages_nodemask(gfp, order, preferred_nid, NULL);
+
 	return page;
 }
 
@@ -2256,8 +2290,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
+		page = alloc_pages_policy(pol, gfp, order, nid);
 		mpol_cond_put(pol);
-		page = alloc_page_interleave(gfp, order, nid);
 		goto out;
 	}
 
@@ -2341,7 +2375,7 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 	 * nor system default_policy
 	 */
 	if (pol->mode == MPOL_INTERLEAVE)
-		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
+		page = alloc_pages_policy(pol, gfp, order, interleave_nodes(pol));
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_node(gfp, pol, numa_node_id()),
-- 
2.27.0
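
For illustration only, and not part of the patch above: a minimal,
self-contained userspace sketch of the two-pass idea that
alloc_pages_policy() applies for MPOL_PREFERRED_MANY. First attempt the
allocation as if the preferred nodes were binding but allow it to fail,
then retry without the restriction. try_alloc_on_nodes() and the node
list are hypothetical stand-ins, not kernel or libnuma API.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a node-restricted allocation that is
 * allowed to fail: returns NULL when none of the listed nodes can
 * satisfy the request (here, nodes 0 and 1 are simulated as full).
 */
static void *try_alloc_on_nodes(size_t size, const int *nodes, int nnodes)
{
	for (int i = 0; i < nnodes; i++) {
		if (nodes[i] >= 2)	/* pretend only nodes >= 2 have free memory */
			return malloc(size);
	}
	return NULL;
}

/* Two-pass allocation: preferred set first (may fail), then anywhere. */
static void *alloc_preferred_many(size_t size, const int *nodes, int nnodes)
{
	void *p = try_alloc_on_nodes(size, nodes, nnodes);	/* pass 1 */

	if (!p)
		p = malloc(size);	/* pass 2: unrestricted fallback */
	return p;
}

int main(void)
{
	int preferred[] = { 0, 1 };	/* both "full" in this simulation */
	void *p = alloc_preferred_many(4096, preferred, 2);

	printf("allocation %s\n", p ? "succeeded" : "failed");
	free(p);
	return 0;
}

In the patch itself the same shape appears as a first
__alloc_pages_nodemask() call carrying __GFP_RETRY_MAYFAIL and the
policy's nodemask, followed, only for MPOL_PREFERRED_MANY, by a second
call with a NULL nodemask.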