From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, linux-numa@vger.org,
akpm@linux-foundation.org, Nishanth Aravamudan <nacc@us.ibm.com>,
David Rientjes <rientjes@google.com>, Adam Litke <agl@us.ibm.com>,
Andy Whitcroft <apw@canonical.com>,
eric.whitney@hp.com
Subject: Re: [PATCH 1/3] Balance Freeing of Huge Pages across Nodes
Date: Tue, 30 Jun 2009 14:05:16 +0100
Message-ID: <20090630130515.GD17561@csn.ul.ie>
In-Reply-To: <20090629215234.20038.62303.sendpatchset@lts-notebook>
On Mon, Jun 29, 2009 at 05:52:34PM -0400, Lee Schermerhorn wrote:
> [PATCH 1/3] Balance Freeing of Huge Pages across Nodes
>
> Against: 25jun09 mmotm
>
> Free huge pages from nodes in round-robin fashion in an
> attempt to keep [persistent a.k.a. static] huge pages balanced
> across nodes.
>
> New function free_pool_huge_page() is modeled on and
> performs roughly the inverse of alloc_fresh_huge_page().
> It replaces dequeue_huge_page(), which now has no callers,
> so this patch removes it.
>
> Helper function hstate_next_node_to_free() uses new hstate
> member next_nid_to_free to distribute "frees" across all
> nodes with huge pages.
>
> V2:
>
> At Mel Gorman's suggestion: renamed hstate_next_node() to
> hstate_next_node_to_alloc() for symmetry. Also, renamed
> hstate member hugetlb_next_nid to next_nid_to_alloc.
> ["hugetlb" is implicit in the hstate struct, I think].
>
> New in this version:
>
> Modified adjust_pool_surplus() to use hstate_next_node_to_alloc()
> and hstate_next_node_to_free() to advance node id for adjusting
> surplus huge page count, as this is equivalent to allocating and
> freeing persistent huge pages. [Can't blame Mel for this part.]
>
> V3:
>
> Minor cleanup: rename 'nid' to 'next_nid' in free_pool_huge_page() to
> better match alloc_fresh_huge_page() conventions.
>
> Acked-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
>
> include/linux/hugetlb.h | 3 -
> mm/hugetlb.c | 132 +++++++++++++++++++++++++++++++-----------------
> 2 files changed, 88 insertions(+), 47 deletions(-)
>
> Index: linux-2.6.31-rc1-mmotm-090625-1549/include/linux/hugetlb.h
> ===================================================================
> --- linux-2.6.31-rc1-mmotm-090625-1549.orig/include/linux/hugetlb.h 2009-06-29 10:21:12.000000000 -0400
> +++ linux-2.6.31-rc1-mmotm-090625-1549/include/linux/hugetlb.h 2009-06-29 10:27:18.000000000 -0400
> @@ -183,7 +183,8 @@ unsigned long hugetlb_get_unmapped_area(
> #define HSTATE_NAME_LEN 32
> /* Defines one hugetlb page size */
> struct hstate {
> - int hugetlb_next_nid;
> + int next_nid_to_alloc;
> + int next_nid_to_free;
> unsigned int order;
> unsigned long mask;
> unsigned long max_huge_pages;
> Index: linux-2.6.31-rc1-mmotm-090625-1549/mm/hugetlb.c
> ===================================================================
> --- linux-2.6.31-rc1-mmotm-090625-1549.orig/mm/hugetlb.c 2009-06-29 10:21:12.000000000 -0400
> +++ linux-2.6.31-rc1-mmotm-090625-1549/mm/hugetlb.c 2009-06-29 15:53:55.000000000 -0400
> @@ -455,24 +455,6 @@ static void enqueue_huge_page(struct hst
> h->free_huge_pages_node[nid]++;
> }
>
> -static struct page *dequeue_huge_page(struct hstate *h)
> -{
> - int nid;
> - struct page *page = NULL;
> -
> - for (nid = 0; nid < MAX_NUMNODES; ++nid) {
> - if (!list_empty(&h->hugepage_freelists[nid])) {
> - page = list_entry(h->hugepage_freelists[nid].next,
> - struct page, lru);
> - list_del(&page->lru);
> - h->free_huge_pages--;
> - h->free_huge_pages_node[nid]--;
> - break;
> - }
> - }
> - return page;
> -}
> -
> static struct page *dequeue_huge_page_vma(struct hstate *h,
> struct vm_area_struct *vma,
> unsigned long address, int avoid_reserve)
> @@ -640,7 +622,7 @@ static struct page *alloc_fresh_huge_pag
>
> /*
> * Use a helper variable to find the next node and then
> - * copy it back to hugetlb_next_nid afterwards:
> + * copy it back to next_nid_to_alloc afterwards:
> * otherwise there's a window in which a racer might
> * pass invalid nid MAX_NUMNODES to alloc_pages_exact_node.
> * But we don't need to use a spin_lock here: it really
> @@ -649,13 +631,13 @@ static struct page *alloc_fresh_huge_pag
> * if we just successfully allocated a hugepage so that
> * the next caller gets hugepages on the next node.
> */
> -static int hstate_next_node(struct hstate *h)
> +static int hstate_next_node_to_alloc(struct hstate *h)
> {
> int next_nid;
> - next_nid = next_node(h->hugetlb_next_nid, node_online_map);
> + next_nid = next_node(h->next_nid_to_alloc, node_online_map);
> if (next_nid == MAX_NUMNODES)
> next_nid = first_node(node_online_map);
> - h->hugetlb_next_nid = next_nid;
> + h->next_nid_to_alloc = next_nid;
> return next_nid;
> }
>
Strictly speaking, next_nid_to_alloc looks more like last_nid_alloced, but I
don't think it makes an important difference. Implementing it this way is
shorter and automatically ensures next_nid is an online node.

If you wanted to be pedantic, I think the following untested code would
make it really next_nid_to_alloc, but I don't think it's terribly
important.
static int hstate_next_node_to_alloc(struct hstate *h)
{
	int this_nid = h->next_nid_to_alloc;

	/* Check the node didn't get off-lined since */
	if (unlikely(!node_online(this_nid))) {
		this_nid = next_node(this_nid, node_online_map);
		if (this_nid == MAX_NUMNODES)
			this_nid = first_node(node_online_map);
		h->next_nid_to_alloc = this_nid;
	}

	/* Advance the hint so the next caller uses the next online node */
	h->next_nid_to_alloc = next_node(this_nid, node_online_map);
	if (h->next_nid_to_alloc == MAX_NUMNODES)
		h->next_nid_to_alloc = first_node(node_online_map);

	return this_nid;
}
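
To make the difference concrete, here is a toy userspace analogue (purely
illustrative, not kernel code: the "online" bitmask and the next_online()
helper are stand-ins I'm using for node_online_map and
next_node()/first_node()). The patch's helper advances the hint and returns
the new node, so it only hands out the starting node after a full wrap; the
pedantic version returns the current node and then advances:

#include <stdio.h>

#define MAX_NODES 8

static unsigned int online = 0x07;	/* nodes 0, 1 and 2 "online" */

/* First online node at or after nid, wrapping past MAX_NODES */
static int next_online(int nid)
{
	int i;

	for (i = nid; i < MAX_NODES; i++)
		if (online & (1 << i))
			return i;
	for (i = 0; i < nid; i++)
		if (online & (1 << i))
			return i;
	return -1;
}

static int state;	/* stands in for h->next_nid_to_alloc */

/* What the patch does: advance first, return the *new* node */
static int advance_then_return(void)
{
	state = next_online(state + 1);
	return state;
}

/* The pedantic version: return the current node, then advance */
static int return_then_advance(void)
{
	int this = state;

	state = next_online(state + 1);
	return this;
}

int main(void)
{
	int i;

	for (i = 0; i < 5; i++)
		printf("%d ", advance_then_return());	/* 1 2 0 1 2 */
	printf("\n");

	state = 0;
	for (i = 0; i < 5; i++)
		printf("%d ", return_then_advance());	/* 0 1 2 0 1 */
	printf("\n");
	return 0;
}

Run, the first loop prints "1 2 0 1 2" and the second "0 1 2 0 1". In the
kernel it washes out because alloc_fresh_huge_page() reads
h->next_nid_to_alloc directly and allocates on it before calling the helper,
which is why I don't think the distinction is terribly important.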
> @@ -666,14 +648,15 @@ static int alloc_fresh_huge_page(struct
> int next_nid;
> int ret = 0;
>
> - start_nid = h->hugetlb_next_nid;
> + start_nid = h->next_nid_to_alloc;
> + next_nid = start_nid;
>
> do {
> - page = alloc_fresh_huge_page_node(h, h->hugetlb_next_nid);
> + page = alloc_fresh_huge_page_node(h, next_nid);
> if (page)
> ret = 1;
> - next_nid = hstate_next_node(h);
> - } while (!page && h->hugetlb_next_nid != start_nid);
> + next_nid = hstate_next_node_to_alloc(h);
> + } while (!page && next_nid != start_nid);
>
> if (ret)
> count_vm_event(HTLB_BUDDY_PGALLOC);
> @@ -683,6 +666,52 @@ static int alloc_fresh_huge_page(struct
> return ret;
> }
>
> +/*
> + * helper for free_pool_huge_page() - find next node
> + * from which to free a huge page
> + */
> +static int hstate_next_node_to_free(struct hstate *h)
> +{
> + int next_nid;
> + next_nid = next_node(h->next_nid_to_free, node_online_map);
> + if (next_nid == MAX_NUMNODES)
> + next_nid = first_node(node_online_map);
> + h->next_nid_to_free = next_nid;
> + return next_nid;
> +}
> +
> +/*
> + * Free huge page from pool from next node to free.
> + * Attempt to keep persistent huge pages more or less
> + * balanced over allowed nodes.
> + * Called with hugetlb_lock locked.
> + */
> +static int free_pool_huge_page(struct hstate *h)
> +{
> + int start_nid;
> + int next_nid;
> + int ret = 0;
> +
> + start_nid = h->next_nid_to_free;
> + next_nid = start_nid;
> +
> + do {
> + if (!list_empty(&h->hugepage_freelists[next_nid])) {
> + struct page *page =
> + list_entry(h->hugepage_freelists[next_nid].next,
> + struct page, lru);
> + list_del(&page->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[next_nid]--;
> + update_and_free_page(h, page);
> + ret = 1;
> + }
> + next_nid = hstate_next_node_to_free(h);
> + } while (!ret && next_nid != start_nid);
> +
> + return ret;
> +}
> +
> static struct page *alloc_buddy_huge_page(struct hstate *h,
> struct vm_area_struct *vma, unsigned long address)
> {
> @@ -1007,7 +1036,7 @@ int __weak alloc_bootmem_huge_page(struc
> void *addr;
>
> addr = __alloc_bootmem_node_nopanic(
> - NODE_DATA(h->hugetlb_next_nid),
> + NODE_DATA(h->next_nid_to_alloc),
> huge_page_size(h), huge_page_size(h), 0);
>
> if (addr) {
> @@ -1019,7 +1048,7 @@ int __weak alloc_bootmem_huge_page(struc
> m = addr;
> goto found;
> }
> - hstate_next_node(h);
> + hstate_next_node_to_alloc(h);
> nr_nodes--;
> }
> return 0;
> @@ -1140,31 +1169,43 @@ static inline void try_to_free_low(struc
> */
> static int adjust_pool_surplus(struct hstate *h, int delta)
> {
> - static int prev_nid;
> - int nid = prev_nid;
> + int start_nid, next_nid;
> int ret = 0;
>
> VM_BUG_ON(delta != -1 && delta != 1);
> - do {
> - nid = next_node(nid, node_online_map);
> - if (nid == MAX_NUMNODES)
> - nid = first_node(node_online_map);
>
> - /* To shrink on this node, there must be a surplus page */
> - if (delta < 0 && !h->surplus_huge_pages_node[nid])
> - continue;
> - /* Surplus cannot exceed the total number of pages */
> - if (delta > 0 && h->surplus_huge_pages_node[nid] >=
> + if (delta < 0)
> + start_nid = h->next_nid_to_alloc;
> + else
> + start_nid = h->next_nid_to_free;
> + next_nid = start_nid;
> +
> + do {
> + int nid = next_nid;
> + if (delta < 0) {
> + next_nid = hstate_next_node_to_alloc(h);
> + /*
> + * To shrink on this node, there must be a surplus page
> + */
> + if (!h->surplus_huge_pages_node[nid])
> + continue;
> + }
> + if (delta > 0) {
> + next_nid = hstate_next_node_to_free(h);
> + /*
> + * Surplus cannot exceed the total number of pages
> + */
> + if (h->surplus_huge_pages_node[nid] >=
> h->nr_huge_pages_node[nid])
> - continue;
> + continue;
> + }
>
> h->surplus_huge_pages += delta;
> h->surplus_huge_pages_node[nid] += delta;
> ret = 1;
> break;
> - } while (nid != prev_nid);
> + } while (next_nid != start_nid);
>
> - prev_nid = nid;
> return ret;
> }
>
> @@ -1226,10 +1267,8 @@ static unsigned long set_max_huge_pages(
> min_count = max(count, min_count);
> try_to_free_low(h, min_count);
> while (min_count < persistent_huge_pages(h)) {
> - struct page *page = dequeue_huge_page(h);
> - if (!page)
> + if (!free_pool_huge_page(h))
> break;
> - update_and_free_page(h, page);
> }
> while (count < persistent_huge_pages(h)) {
> if (!adjust_pool_surplus(h, 1))
> @@ -1441,7 +1480,8 @@ void __init hugetlb_add_hstate(unsigned
> h->free_huge_pages = 0;
> for (i = 0; i < MAX_NUMNODES; ++i)
> INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> - h->hugetlb_next_nid = first_node(node_online_map);
> + h->next_nid_to_alloc = first_node(node_online_map);
> + h->next_nid_to_free = first_node(node_online_map);
> snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> huge_page_size(h)/1024);
>
Nothing problematic jumps out at me, even with hstate_next_node_to_alloc()
as it is.
Acked-by: Mel Gorman <mel@csn.ul.ie>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>