Date: Tue, 1 Sep 2020 16:53:10 +0200
From: Michal Hocko
To: Li Xinhai
Cc: Mike Kravetz, "linux-mm@kvack.org", akpm, guro
Subject: Re: [PATCH] mm/hugetlb: try preferred node first when alloc gigantic page from cma
Message-ID: <20200901145310.GG16650@dhcp22.suse.cz>
References: <20200830140418.605627-1-lixinhai.lxh@gmail.com> <640ddf82-26b1-3e38-5245-df481bc0756e@oracle.com> <20200901134119.GE16650@dhcp22.suse.cz> <202009012220421669005@gmail.com>
In-Reply-To: <202009012220421669005@gmail.com>

On Tue 01-09-20 22:20:44, Li Xinhai wrote:
> On 2020-09-01 at 21:41 Michal Hocko wrote:
> >On Mon 31-08-20 14:44:40, Mike Kravetz wrote:
> >> On 8/30/20 7:04 AM, Li Xinhai wrote:
> >> > Since commit cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic
> >> > hugepages using cma"), the gigantic page would be allocated from node
> >> > which is not the preferred node, although there are pages available from
> >> > that node. The reason is that the nid parameter has been ignored in
> >> > alloc_gigantic_page().
> >> >
> >> > After this patch, the preferred node is tried first before other allowed
> >> > nodes.
> >>
> >> Thank you!
> >> This is an issue that needs to be fixed.
> >>
> >> > Fixes: cf11e85fc08cc6a4 ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> >> > Cc: Roman Gushchin
> >> > Cc: Mike Kravetz
> >> > Cc: Michal Hocko
> >> > Signed-off-by: Li Xinhai
> >> > ---
> >> >  mm/hugetlb.c | 9 ++++++++-
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> > index a301c2d672bf..4a28b8853d47 100644
> >> > --- a/mm/hugetlb.c
> >> > +++ b/mm/hugetlb.c
> >> > @@ -1256,8 +1256,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
> >> >  		struct page *page;
> >> >  		int node;
> >> >  
> >> > +		if (hugetlb_cma[nid]) {
> >> > +			page = cma_alloc(hugetlb_cma[nid], nr_pages,
> >> > +					huge_page_order(h), true);
> >> > +			if (page)
> >> > +				return page;
> >> > +		}
> >> > +
> >>
> >> When looking at your changes, I noticed that this code for allocation
> >> from CMA does not take gfp_mask into account.  The 'normal' use case
> >> is to allocate pool pages with something similar to:
> >>
> >> echo 16 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >>
> >> The routine alloc_pool_huge_page will try to interleave pages among nodes:
> >>
> >> ...
> >>         gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> >>
> >>         for_each_node_mask_to_alloc(h, nr_nodes, node, nodes_allowed) {
> >> ...
> >>
> >> which will eventually call alloc_gigantic_page.  If __GFP_THISNODE is
> >> set we really do not want to execute the below for loop in alloc_gigantic_page.
> >
> >Yes, this is the case indeed.
> >
> >> I think the convention in the mm code is that only the lowest level
> >> allocation routines should interpret the GFP flags.  We may need to make
> >> an exception here and check for __GFP_THISNODE.
> >
> >Yes this is true, But alloc_gigantic_page is actually low level
> >allocation routine in fact.
> >
> Thanks for the review, we need to consider the __GFP_THISNODE flag.

Yeah, my bad. Quite ugly but a larger rework would be needed to make it
nicer. Not sure this is worth it.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a301c2d672bf..55baaac848da 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1256,6 +1256,16 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
 		struct page *page;
 		int node;
 
+		if (nid != NUMA_NO_NODE && hugetlb_cma[nid]) {
+			page = cma_alloc(hugetlb_cma[nid], nr_pages,
+					huge_page_order(h), true);
+			if (page)
+				return page;
+		}
+
+		if (gfp_mask & __GFP_THISNODE)
+			goto fallback;
+
 		for_each_node_mask(node, *nodemask) {
 			if (!hugetlb_cma[node])
 				continue;
@@ -1266,6 +1276,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
 				return page;
 		}
 	}
+fallback:
 #endif
 
 	return alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
-- 
Michal Hocko
SUSE Labs
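
The allocation order discussed in this thread (preferred node's CMA area
first, stop there when __GFP_THISNODE is set, otherwise walk the remaining
allowed nodes) can be modelled in plain userspace C. This is only a sketch
under stated assumptions: MAX_NODES, NO_NODE, F_THISNODE, cma_free[],
model_cma_alloc() and model_alloc_gigantic_cma() are hypothetical stand-ins
invented for illustration, not kernel symbols, and each per-node CMA area is
reduced to a simple counter of free gigantic pages.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel API. */
#define MAX_NODES   4      /* node count assumed for this model */
#define NO_NODE     (-1)   /* plays the role of NUMA_NO_NODE */
#define F_THISNODE  0x1u   /* plays the role of __GFP_THISNODE */

/* Free gigantic pages per node's CMA area; 0 means no area or exhausted. */
int cma_free[MAX_NODES];

/* Models cma_alloc(): take one page from the node's area, or fail. */
bool model_cma_alloc(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	if (cma_free[node] == 0)
		return false;
	cma_free[node]--;
	return true;
}

/*
 * Returns the node whose CMA area satisfied the request, or NO_NODE when
 * the real code would fall through to alloc_contig_pages().
 */
int model_alloc_gigantic_cma(unsigned int flags, int nid,
			     const bool allowed[MAX_NODES])
{
	int node;

	/* Step 1: try the preferred node first, if one was given. */
	if (nid != NO_NODE && model_cma_alloc(nid))
		return nid;

	/* Step 2: a node-bound request must not spill onto other nodes. */
	if (flags & F_THISNODE)
		return NO_NODE;

	/* Step 3: otherwise walk every allowed node. */
	for (node = 0; node < MAX_NODES; node++) {
		if (!allowed[node])
			continue;
		if (model_cma_alloc(node))
			return node;
	}
	return NO_NODE;
}
```

With one free page on each of nodes 0 and 1 and a preferred node of 1, the
first request is served from node 1. A following request with F_THISNODE set
returns NO_NODE instead of taking node 0's page, which is the behaviour Mike
asked for in the alloc_pool_huge_page interleaving case.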