linux-mm.kvack.org archive mirror
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Anton Blanchard <anton@samba.org>
Cc: linux-mm@kvack.org, clameter@SGI.com, ak@suse.de,
	nish.aravamudan@gmail.com, mel@csn.ul.ie, apw@shadowen.org,
	Christoph Lameter <clameter@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Whitney <eric.whitney@hp.com>
Subject: Re: [PATCH] Fix hugetlb pool allocation with empty nodes - V2
Date: Fri, 04 May 2007 16:29:02 -0400
Message-ID: <1178310543.5236.43.camel@localhost>
In-Reply-To: <20070503022107.GA13592@kryten>

On Wed, 2007-05-02 at 21:21 -0500, Anton Blanchard wrote:
> An interesting bug was pointed out to me where we failed to allocate
> hugepages evenly. In the example below node 7 has no memory (it only has
> CPUs). Node 0 and 1 have plenty of free memory. After doing:

Here's my attempt to fix the problem [I see it on HP platforms as well],
without removing the population check in build_zonelists_node().  Seems
to work.

[Because I had to rebase the patch to 21-rc7-mm2 where I'm working, I
just refreshed the entire patch, instead of creating an incremental
patch on top of Anton's.]

---------------------------------------------------------------------

[PATCH] Fix hugetlb pool allocation with empty nodes V2

Against 2.6.21-rc7-mm2

Changes V1 [Anton]  -> V2 [Lee]:

1) restored the populated_zone() check in build_zonelists_node() [V1
   removed it] to avoid empty zones in the allocation zonelists.

2) added a populated_zone() check to alloc_fresh_huge_page().  Skip
   nodes whose zone corresponding to GFP_HIGHUSER is empty.

--------
Original description:

An interesting bug was pointed out to me where we failed to allocate
hugepages evenly. In the example below node 7 has no memory (it only has
CPUs). Node 0 and 1 have plenty of free memory. After doing:

# echo 16 > /proc/sys/vm/nr_hugepages

We see the imbalance:

# cat /sys/devices/system/node/node*/meminfo|grep HugePages_Total
Node 0 HugePages_Total:     6
Node 1 HugePages_Total:     10
Node 7 HugePages_Total:     0

It didn't take long to realise that alloc_fresh_huge_page is allocating
from node 7 without GFP_THISNODE set, so we fall back to its next
preferred node (i.e. 1). This means we end up with a 1/3 2/3 imbalance.

After fixing this it still didn't work, and after some more poking I see
why. When building our fallback zonelist in build_zonelists_node we
skip empty zones. This means node 7's zonelist never contains node 7's
own empty zones and instead starts with node 1's. Therefore when we ask
for a page from node 7 using the GFP_THISNODE flag, we end up with
node 1 memory.

<snip bit about removing pop check from build_zonelists_node...>

Add a zone population check to alloc_fresh_huge_page() and skip nodes
whose GFP_HIGHUSER zone is unpopulated.

V2 testing:

Tested on 4-node, 32GB HP NUMA platform with funky 512MB pseudo-zone
for hardware interleaved memory.  The pseudo-zone contains only ZONE_DMA
memory.  Without this patch, after "echo 64 >/proc/sys/vm/nr_hugepages",
"cat /sys/devices/system/node/node*/meminfo | grep HugeP" would yield:

Node 0 HugePages_Total:    25
Node 0 HugePages_Free:     25
Node 1 HugePages_Total:    13
Node 1 HugePages_Free:     13
Node 2 HugePages_Total:    13
Node 2 HugePages_Free:     13
Node 3 HugePages_Total:    13
Node 3 HugePages_Free:     13
Node 4 HugePages_Total:     0
Node 4 HugePages_Free:      0

With patch:

Node 0 HugePages_Total:    16
Node 0 HugePages_Free:     16
Node 1 HugePages_Total:    16
Node 1 HugePages_Free:     16
Node 2 HugePages_Total:    16
Node 2 HugePages_Free:     16
Node 3 HugePages_Total:    16
Node 3 HugePages_Free:     16
Node 4 HugePages_Total:     0
Node 4 HugePages_Free:      0


Originally
Signed-off-by: Anton Blanchard <anton@samba.org>

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/hugetlb.c |   25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

Index: Linux/mm/hugetlb.c
===================================================================
--- Linux.orig/mm/hugetlb.c	2007-05-04 15:41:10.000000000 -0400
+++ Linux/mm/hugetlb.c	2007-05-04 15:48:22.000000000 -0400
@@ -107,11 +107,24 @@ static int alloc_fresh_huge_page(void)
 {
 	static int nid = 0;
-	struct page *page;
+	struct page *page = NULL;
-	page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
-					HUGETLB_PAGE_ORDER);
-	nid = next_node(nid, node_online_map);
-	if (nid == MAX_NUMNODES)
-		nid = first_node(node_online_map);
+	int start_nid = nid;
+
+	do {
+		pg_data_t *pgdat = NODE_DATA(nid);
+		struct zone *zone = pgdat->node_zones + gfp_zone(GFP_HIGHUSER);
+
+		/*
+		 * accept only nodes with populated "HIGHUSER" zone
+		 */
+		if (populated_zone(zone))
+			page = alloc_pages_node(nid,
+					GFP_HIGHUSER|__GFP_COMP|GFP_THISNODE,
+					HUGETLB_PAGE_ORDER);
+
+		nid = next_node(nid, node_online_map);
+		if (nid == MAX_NUMNODES)
+			nid = first_node(node_online_map);
+	} while (!page && nid != start_nid);
 	if (page) {
 		set_compound_page_dtor(page, free_huge_page);
 		spin_lock(&hugetlb_lock);



