From: Anton Blanchard <anton@samba.org>
To: linux-mm@kvack.org, clameter@SGI.com, ak@suse.de
Cc: nish.aravamudan@gmail.com, mel@csn.ul.ie, apw@shadowen.org
Subject: [PATCH] Fix hugetlb pool allocation with empty nodes
Date: Wed, 2 May 2007 21:21:07 -0500 [thread overview]
Message-ID: <20070503022107.GA13592@kryten> (raw)
An interesting bug was pointed out to me where we failed to allocate
hugepages evenly. In the example below node 7 has no memory (it only has
CPUs). Node 0 and 1 have plenty of free memory. After doing:
# echo 16 > /proc/sys/vm/nr_hugepages
We see the imbalance:
# cat /sys/devices/system/node/node*/meminfo|grep HugePages_Total
Node 0 HugePages_Total: 6
Node 1 HugePages_Total: 10
Node 7 HugePages_Total: 0
It didnt take long to realise that alloc_fresh_huge_page is allocating
from node 7 without GFP_THISNODE set, so we fallback to its next
preferred node (ie 1). This means we end up with a 1/3 2/3 imbalance.
After fixing this it still didnt work, and after some more poking I see
why. When building our fallback zonelist in build_zonelists_node we
skip empty zones. This means zone 7 never registers node 7's empty
zonelists and instead registers node 1's. Therefore when we ask for a
page from node 7, using the GFP_THISNODE flag we end up with node 1
memory.
By removing the populated_zone() check in build_zonelists_node we fix
the problem:
# cat /sys/devices/system/node/node*/meminfo|grep HugePages_Total
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 7 HugePages_Total: 0
Im guessing registering empty remote zones might make the SGI guys a bit
unhappy, maybe we should just force the registration of empty local
zones? Does anyone care?
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: kernel/mm/hugetlb.c
===================================================================
--- kernel.orig/mm/hugetlb.c 2007-05-02 20:46:03.000000000 -0500
+++ kernel/mm/hugetlb.c 2007-05-02 20:48:15.000000000 -0500
@@ -103,11 +103,18 @@ static int alloc_fresh_huge_page(void)
{
static int nid = 0;
struct page *page;
- page = alloc_pages_node(nid, GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN,
+ int start_nid = nid;
+
+ do {
+ page = alloc_pages_node(nid,
+ GFP_HIGHUSER|__GFP_COMP|GFP_THISNODE,
HUGETLB_PAGE_ORDER);
- nid = next_node(nid, node_online_map);
- if (nid == MAX_NUMNODES)
- nid = first_node(node_online_map);
+
+ nid = next_node(nid, node_online_map);
+ if (nid == MAX_NUMNODES)
+ nid = first_node(node_online_map);
+ } while (!page && nid != start_nid);
+
if (page) {
set_compound_page_dtor(page, free_huge_page);
spin_lock(&hugetlb_lock);
Index: kernel/mm/page_alloc.c
===================================================================
--- kernel.orig/mm/page_alloc.c 2007-05-02 20:46:03.000000000 -0500
+++ kernel/mm/page_alloc.c 2007-05-02 20:47:59.000000000 -0500
@@ -1659,10 +1659,8 @@ static int __meminit build_zonelists_nod
do {
zone_type--;
zone = pgdat->node_zones + zone_type;
- if (populated_zone(zone)) {
- zonelist->zones[nr_zones++] = zone;
- check_highest_zone(zone_type);
- }
+ zonelist->zones[nr_zones++] = zone;
+ check_highest_zone(zone_type);
} while (zone_type);
return nr_zones;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next reply other threads:[~2007-05-03 2:21 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-03 2:21 Anton Blanchard [this message]
2007-05-03 3:02 ` Christoph Lameter
2007-05-03 6:07 ` Anton Blanchard
2007-05-03 6:37 ` Christoph Lameter
2007-05-03 8:59 ` Andi Kleen
2007-05-03 13:22 ` Anton Blanchard
2007-05-04 20:29 ` [PATCH] Fix hugetlb pool allocation with empty nodes - V2 Lee Schermerhorn
2007-05-04 21:27 ` Christoph Lameter
2007-05-04 22:39 ` Nish Aravamudan
2007-05-07 13:40 ` Lee Schermerhorn
2007-05-09 16:37 ` [PATCH] Fix hugetlb pool allocation with empty nodes - V2 -> V3 Lee Schermerhorn
2007-05-09 16:57 ` Christoph Lameter
2007-05-09 19:17 ` Lee Schermerhorn
2007-05-16 17:27 ` Nish Aravamudan
2007-05-16 20:01 ` Lee Schermerhorn
2007-05-09 19:59 ` Nish Aravamudan
2007-05-09 20:37 ` Lee Schermerhorn
2007-05-09 20:54 ` Christoph Lameter
2007-05-09 22:34 ` Nish Aravamudan
2007-05-15 16:30 ` Lee Schermerhorn
2007-05-16 23:47 ` Nish Aravamudan
2007-05-16 19:59 ` Nish Aravamudan
2007-05-16 20:32 ` Lee Schermerhorn
2007-05-16 22:17 ` [PATCH/RFC] Fix hugetlb pool allocation with empty nodes - V4 Lee Schermerhorn
2007-05-18 0:30 ` Nish Aravamudan
2007-05-21 14:57 ` Lee Schermerhorn
2007-05-21 17:51 ` Nish Aravamudan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070503022107.GA13592@kryten \
--to=anton@samba.org \
--cc=ak@suse.de \
--cc=apw@shadowen.org \
--cc=clameter@SGI.com \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=nish.aravamudan@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox