From: Ben Widawsky
To: linux-mm, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Dave Hansen, Ben Widawsky, Andrew Morton, Vlastimil Babka
Subject: [PATCH 08/12] mm/mempolicy: Create a page allocator for policy
Date: Tue, 30 Jun 2020 14:25:13 -0700
Message-Id: <20200630212517.308045-9-ben.widawsky@intel.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20200630212517.308045-1-ben.widawsky@intel.com>
References: <20200630212517.308045-1-ben.widawsky@intel.com>

This patch adds a helper function which takes care of handling multiple
preferred nodes. It will be called by future patches that need it,
specifically for VMA-based and task-based page allocation. Huge pages
don't quite fit the same pattern because they use different underlying
page allocation functions.

This consumes the previous interleave-policy-specific allocation
function to make a one-stop shop for policy-based allocation. For now,
only the interleave policy will use it, so there should be no
functional change yet. However, if bisection points to issues in the
next few commits, this patch is the likely culprit.

Similar functionality is offered via policy_node() and
policy_nodemask(). By themselves, however, neither can achieve this
fallback style across sets of nodes.

Cc: Andrew Morton
Cc: Michal Hocko
Cc: Vlastimil Babka
Signed-off-by: Ben Widawsky
---
 mm/mempolicy.c | 60 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3b38c9c4e580..1009cf90ad37 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2199,22 +2199,56 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 	return ret;
 }
 
-/* Allocate a page in interleaved policy.
-   Own path because it needs to do special accounting. */
-static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
-					unsigned nid)
+/* Handle page allocation for all but interleaved policies */
+static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
+				       unsigned int order, int preferred_nid)
 {
 	struct page *page;
+	gfp_t gfp_mask = gfp;
 
-	page = __alloc_pages(gfp, order, nid);
-	/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
-	if (!static_branch_likely(&vm_numa_stat_key))
+	if (pol->mode == MPOL_INTERLEAVE) {
+		page = __alloc_pages(gfp, order, preferred_nid);
+		/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
+		if (!static_branch_likely(&vm_numa_stat_key))
+			return page;
+		if (page && page_to_nid(page) == preferred_nid) {
+			preempt_disable();
+			__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
+			preempt_enable();
+		}
 		return page;
-	if (page && page_to_nid(page) == nid) {
-		preempt_disable();
-		__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
-		preempt_enable();
 	}
+
+	VM_BUG_ON(preferred_nid != NUMA_NO_NODE);
+
+	preferred_nid = numa_node_id();
+
+	/*
+	 * There is a two pass approach implemented here for
+	 * MPOL_PREFERRED_MANY. In the first pass we pretend the preferred nodes
+	 * are bound, but allow the allocation to fail. The below table explains
+	 * how this is achieved.
+	 *
+	 * | Policy                        | preferred nid | nodemask   |
+	 * |-------------------------------|---------------|------------|
+	 * | MPOL_DEFAULT                  | local         | NULL       |
+	 * | MPOL_PREFERRED                | best          | NULL       |
+	 * | MPOL_INTERLEAVE               | ERR           | ERR        |
+	 * | MPOL_BIND                     | local         | pol->nodes |
+	 * | MPOL_PREFERRED_MANY           | best          | pol->nodes |
+	 * | MPOL_PREFERRED_MANY (round 2) | local         | NULL       |
+	 * +-------------------------------+---------------+------------+
+	 */
+	if (pol->mode == MPOL_PREFERRED_MANY)
+		gfp_mask |= __GFP_RETRY_MAYFAIL;
+
+	page = __alloc_pages_nodemask(gfp_mask, order,
+				      policy_node(gfp, pol, preferred_nid),
+				      policy_nodemask(gfp, pol));
+
+	if (unlikely(!page && pol->mode == MPOL_PREFERRED_MANY))
+		page = __alloc_pages_nodemask(gfp, order, preferred_nid, NULL);
+
 	return page;
 }
 
@@ -2256,8 +2290,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
+		page = alloc_pages_policy(pol, gfp, order, nid);
 		mpol_cond_put(pol);
-		page = alloc_page_interleave(gfp, order, nid);
 		goto out;
 	}
 
@@ -2341,7 +2375,7 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 	 * nor system default_policy
 	 */
 	if (pol->mode == MPOL_INTERLEAVE)
-		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
+		page = alloc_pages_policy(pol, gfp, order, interleave_nodes(pol));
 	else
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_node(gfp, pol, numa_node_id()),
-- 
2.27.0
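
For illustration only, and not part of the patch above: a minimal,
self-contained userspace sketch of the two-pass idea that
alloc_pages_policy() applies for MPOL_PREFERRED_MANY. First attempt the
allocation as if the preferred nodes were binding but allow it to fail,
then retry without the restriction. try_alloc_on_nodes() and the node
list are hypothetical stand-ins, not kernel or libnuma API.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a node-restricted allocation that is
 * allowed to fail: returns NULL when none of the listed nodes can
 * satisfy the request (here, nodes 0 and 1 are simulated as full).
 */
static void *try_alloc_on_nodes(size_t size, const int *nodes, int nnodes)
{
	for (int i = 0; i < nnodes; i++) {
		if (nodes[i] >= 2)	/* pretend only nodes >= 2 have free memory */
			return malloc(size);
	}
	return NULL;
}

/* Two-pass allocation: preferred set first (may fail), then anywhere. */
static void *alloc_preferred_many(size_t size, const int *nodes, int nnodes)
{
	void *p = try_alloc_on_nodes(size, nodes, nnodes);	/* pass 1 */

	if (!p)
		p = malloc(size);	/* pass 2: unrestricted fallback */
	return p;
}

int main(void)
{
	int preferred[] = { 0, 1 };	/* both "full" in this simulation */
	void *p = alloc_preferred_many(4096, preferred, 2);

	printf("allocation %s\n", p ? "succeeded" : "failed");
	free(p);
	return 0;
}

In the patch itself the same shape appears as a first
__alloc_pages_nodemask() call carrying __GFP_RETRY_MAYFAIL and the
policy's nodemask, followed, only for MPOL_PREFERRED_MANY, by a second
call with a NULL nodemask.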