Date: Wed, 17 Jun 2009 14:18:33 +0100
From: Mel Gorman
Subject: Re: [PATCH 1/5] Free huge pages round robin to balance across nodes
Message-ID: <20090617131833.GG28529@csn.ul.ie>
References: <20090616135228.25248.22018.sendpatchset@lts-notebook>
 <20090616135236.25248.93692.sendpatchset@lts-notebook>
In-Reply-To: <20090616135236.25248.93692.sendpatchset@lts-notebook>
To: Lee Schermerhorn
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Nishanth Aravamudan,
 Adam Litke, Andy Whitcroft, eric.whitney@hp.com

On Tue, Jun 16, 2009 at 09:52:36AM -0400, Lee Schermerhorn wrote:
> [PATCH 1/5] Free huge pages round robin to balance across nodes
>
> Against: 17may09 mmotm
>
> Currently, although increasing nr_hugepages will [attempt to]
> distribute the new huge pages across all nodes in the system,
> reducing nr_hugepages will free or surplus all free pages
> from nodes in node id order.  This patch frees huge pages
> from nodes in round robin fashion in an attempt to keep
> [persistent] hugepage allocations balanced across the nodes.
>
> The new function free_pool_huge_page() is modeled on, and
> performs roughly the inverse of, alloc_fresh_huge_page().
> It replaces dequeue_huge_page(), which now has no callers
> and can be removed.
>
> The helper function hstate_next_to_free_node() uses the new
> hstate member next_to_free_nid to distribute "frees" across
> all nodes with huge pages.
>
> I placed this patch first in the series because I think it
> [or something similar] should be applied independent of the
> rest of the series.
>

Agreed.

Reading through, I can't see any problems with the patch and it does
make the freeing symmetric with the allocation. For clarity though,
would it be worth renaming hugetlb_next_nid to next_to_alloc_nid so
that there is a clear relationship between the round-robin allocation
and freeing of pages among online nodes?
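Something along these lines, purely to illustrate the naming (a sketch
against this patch, not compiled; hstate_next_to_alloc_node() is a
hypothetical rename of the existing allocation-side helper, mirroring
your hstate_next_to_free_node() below):

	/* Defines one hugetlb page size */
	struct hstate {
		int next_to_alloc_nid;	/* renamed from hugetlb_next_nid */
		int next_to_free_nid;
		unsigned int order;
		unsigned long mask;
		unsigned long max_huge_pages;
		...
	};

	/*
	 * Advance to the next online node for allocation, wrapping
	 * around at the end of the online map -- the mirror image of
	 * hstate_next_to_free_node() in this patch.
	 */
	static int hstate_next_to_alloc_node(struct hstate *h)
	{
		int next_nid;
		next_nid = next_node(h->next_to_alloc_nid, node_online_map);
		if (next_nid == MAX_NUMNODES)
			next_nid = first_node(node_online_map);
		h->next_to_alloc_nid = next_nid;
		return next_nid;
	}

That would leave the allocation and free paths obviously symmetric in
both the struct members and the helpers.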
> Signed-off-by: Lee Schermerhorn
>
>  include/linux/hugetlb.h |    1
>  mm/hugetlb.c            |   68 +++++++++++++++++++++++++++++++++---------------
>  2 files changed, 48 insertions(+), 21 deletions(-)
>
> Index: linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h
> ===================================================================
> --- linux-2.6.30-rc8-mmotm-090603-1633.orig/include/linux/hugetlb.h	2009-06-04 12:59:29.000000000 -0400
> +++ linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h	2009-06-04 12:59:31.000000000 -0400
> @@ -184,6 +184,7 @@ unsigned long hugetlb_get_unmapped_area(
>  /* Defines one hugetlb page size */
>  struct hstate {
>  	int hugetlb_next_nid;
> +	int next_to_free_nid;
>  	unsigned int order;
>  	unsigned long mask;
>  	unsigned long max_huge_pages;
> Index: linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c
> ===================================================================
> --- linux-2.6.30-rc8-mmotm-090603-1633.orig/mm/hugetlb.c	2009-06-04 12:59:29.000000000 -0400
> +++ linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c	2009-06-04 12:59:31.000000000 -0400
> @@ -455,24 +455,6 @@ static void enqueue_huge_page(struct hst
>  	h->free_huge_pages_node[nid]++;
>  }
>
> -static struct page *dequeue_huge_page(struct hstate *h)
> -{
> -	int nid;
> -	struct page *page = NULL;
> -
> -	for (nid = 0; nid < MAX_NUMNODES; ++nid) {
> -		if (!list_empty(&h->hugepage_freelists[nid])) {
> -			page = list_entry(h->hugepage_freelists[nid].next,
> -					  struct page, lru);
> -			list_del(&page->lru);
> -			h->free_huge_pages--;
> -			h->free_huge_pages_node[nid]--;
> -			break;
> -		}
> -	}
> -	return page;
> -}
> -
>  static struct page *dequeue_huge_page_vma(struct hstate *h,
>  			struct vm_area_struct *vma,
>  			unsigned long address, int avoid_reserve)
> @@ -683,6 +665,52 @@ static int alloc_fresh_huge_page(struct
>  	return ret;
>  }
>
> +/*
> + * helper for free_pool_huge_page() - find next node
> + * from which to free a huge page
> + */
> +static int hstate_next_to_free_node(struct hstate *h)
> +{
> +	int next_nid;
> +	next_nid = next_node(h->next_to_free_nid, node_online_map);
> +	if (next_nid == MAX_NUMNODES)
> +		next_nid = first_node(node_online_map);
> +	h->next_to_free_nid = next_nid;
> +	return next_nid;
> +}
> +
> +/*
> + * Free huge page from pool from next node to free.
> + * Attempt to keep persistent huge pages more or less
> + * balanced over allowed nodes.
> + * Called with hugetlb_lock locked.
> + */
> +static int free_pool_huge_page(struct hstate *h)
> +{
> +	int start_nid;
> +	int nid;
> +	int ret = 0;
> +
> +	start_nid = h->next_to_free_nid;
> +	nid = h->next_to_free_nid;
> +
> +	do {
> +		if (!list_empty(&h->hugepage_freelists[nid])) {
> +			struct page *page =
> +				list_entry(h->hugepage_freelists[nid].next,
> +					  struct page, lru);
> +			list_del(&page->lru);
> +			h->free_huge_pages--;
> +			h->free_huge_pages_node[nid]--;
> +			update_and_free_page(h, page);
> +			ret = 1;
> +		}
> +		nid = hstate_next_to_free_node(h);
> +	} while (!ret && nid != start_nid);
> +
> +	return ret;
> +}
> +
>  static struct page *alloc_buddy_huge_page(struct hstate *h,
>  			struct vm_area_struct *vma, unsigned long address)
>  {
> @@ -1226,10 +1254,8 @@ static unsigned long set_max_huge_pages(
>  	min_count = max(count, min_count);
>  	try_to_free_low(h, min_count);
>  	while (min_count < persistent_huge_pages(h)) {
> -		struct page *page = dequeue_huge_page(h);
> -		if (!page)
> +		if (!free_pool_huge_page(h))
>  			break;
> -		update_and_free_page(h, page);
>  	}
>  	while (count < persistent_huge_pages(h)) {
>  		if (!adjust_pool_surplus(h, 1))

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
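P.S. For anyone wanting to see the balancing in action, something like
the following should do (assuming a NUMA machine with enough free
memory on each node, and that the kernel exposes the per-node
HugePages_ counters in each node's meminfo; the pool sizes here are
illustrative):

	# grow the pool, then shrink it and watch the per-node counts
	echo 32 > /proc/sys/vm/nr_hugepages
	grep HugePages_Total /sys/devices/system/node/node*/meminfo
	echo 16 > /proc/sys/vm/nr_hugepages
	grep HugePages_Total /sys/devices/system/node/node*/meminfo

Without the patch, the shrink empties nodes in node id order; with it,
the per-node totals should drop more or less evenly.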