From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: clameter@sgi.com, anton@samba.org, wli@holomorphy.com,
linux-mm@kvack.org
Subject: Re: [RFC][PATCH 1/2] hugetlb: search harder for memory in alloc_fresh_huge_page()
Date: Tue, 07 Aug 2007 16:15:22 -0400
Message-ID: <1186517722.5067.31.camel@localhost>
In-Reply-To: <20070807171432.GY15714@us.ibm.com>

On Tue, 2007-08-07 at 10:14 -0700, Nishanth Aravamudan wrote:
> hugetlb: search harder for memory in alloc_fresh_huge_page()
>
> Currently, alloc_fresh_huge_page() returns NULL when it is not able to
> allocate a huge page on the current node, as specified by its custom
> interleave variable. The callers of this function, though, assume that a
> failure in alloc_fresh_huge_page() indicates that no hugepages can be
> allocated on the system at all. This might not be the case, for
> instance, if we have an uneven NUMA system, and we happen to try to
> allocate a hugepage on a node with less memory and fail, while there is
> still plenty of free memory on the other nodes.
>
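For reference, the failure mode described above comes from the callers'
pool-sizing loop, which bails out on the first failed call. A rough sketch
of that pattern, loosely modeled on the then-current hugetlb_init()
(reconstructed from memory for illustration, not quoted from the tree):

	unsigned long i;

	/*
	 * Pre-patch behavior: alloc_fresh_huge_page() returning 0 for a
	 * single node ends the whole loop, so one small or full node on
	 * an unbalanced NUMA system cuts the hugepage pool short even
	 * though other nodes still have plenty of free memory.
	 */
	for (i = 0; i < max_huge_pages; ++i) {
		if (!alloc_fresh_huge_page())
			break;
	}
	max_huge_pages = free_huge_pages = nr_huge_pages = i;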
> To correct this, make alloc_fresh_huge_page() search through all online
> nodes before deciding no hugepages can be allocated. Add a helper
> function for actually allocating the hugepage.
>
> While there are interleave interfaces that could be exported from the
> mempolicy layer, that seems like an inappropriate design decision. Work
> is needed on a subsystem-level interleaving interface, but I'm still not
> quite sure how that should look. Hence the custom interleaving here.
>
> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
>
> ---
> I split up patch 1/5 into two bits, as they are really two logical
> changes. Does this look better, Christoph?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d7ca59d..17a377e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -101,36 +101,59 @@ static void free_huge_page(struct page *page)
> spin_unlock(&hugetlb_lock);
> }
>
> -static int alloc_fresh_huge_page(void)
> +static struct page *alloc_fresh_huge_page_node(int nid)
> {
> - static int prev_nid;
> struct page *page;
> - int nid;
> -
> - /*
> - * Copy static prev_nid to local nid, work on that, then copy it
> - * back to prev_nid afterwards: otherwise there's a window in which
> - * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node.
> - * But we don't need to use a spin_lock here: it really doesn't
> - * matter if occasionally a racer chooses the same nid as we do.
> - */
> - nid = next_node(prev_nid, node_online_map);
> - if (nid == MAX_NUMNODES)
> - nid = first_node(node_online_map);
> - prev_nid = nid;
>
> - page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
> - HUGETLB_PAGE_ORDER);
> + page = alloc_pages_node(nid,
> + htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
> + HUGETLB_PAGE_ORDER);
> if (page) {
> set_compound_page_dtor(page, free_huge_page);
> spin_lock(&hugetlb_lock);
> nr_huge_pages++;
> - nr_huge_pages_node[page_to_nid(page)]++;
> + nr_huge_pages_node[nid]++;
Not that I don't trust __GFP_THISNODE, but may I suggest a
"VM_BUG_ON(page_to_nid(page) != nid)" -- up above the spin_lock(), of
course? Better yet, add the assertion and drop this one-line change;
this isn't a hot path, I think.
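Concretely, something along these lines -- an untested sketch that keeps
the existing page_to_nid() accounting and adds the assertion above the
lock:

	page = alloc_pages_node(nid,
		htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN,
		HUGETLB_PAGE_ORDER);
	if (page) {
		/* __GFP_THISNODE should make a mismatch impossible */
		VM_BUG_ON(page_to_nid(page) != nid);
		set_compound_page_dtor(page, free_huge_page);
		spin_lock(&hugetlb_lock);
		nr_huge_pages++;
		nr_huge_pages_node[page_to_nid(page)]++;
		spin_unlock(&hugetlb_lock);
		put_page(page);	/* free it into the hugepage allocator */
	}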
> spin_unlock(&hugetlb_lock);
> put_page(page); /* free it into the hugepage allocator */
> - return 1;
> }
> - return 0;
> +
> + return page;
> +}
> +
> +static int alloc_fresh_huge_page(void)
> +{
> + static int nid = -1;
> + struct page *page;
> + int start_nid;
> + int next_nid;
> + int ret = 0;
> +
> + if (nid < 0)
> + nid = first_node(node_online_map);
> + start_nid = nid;
> +
> + do {
> + page = alloc_fresh_huge_page_node(nid);
> + if (page)
> + ret = 1;
> + /*
> + * Use a helper variable to find the next node and then
> + * copy it back to nid afterwards: otherwise there's
> + * a window in which a racer might pass invalid nid
> + * MAX_NUMNODES to alloc_pages_node. But we don't need
> + * to use a spin_lock here: it really doesn't matter if
> + * occasionally a racer chooses the same nid as we do.
> + * Move nid forward in the mask even if we just
> + * successfully allocated a hugepage so that the next
> + * caller gets hugepages on the next node.
> + */
> + next_nid = next_node(nid, node_online_map);
> + if (next_nid == MAX_NUMNODES)
> + next_nid = first_node(node_online_map);
> + nid = next_nid;
> + } while (!page && nid != start_nid);
> +
> + return ret;
> }
>
> static struct page *alloc_huge_page(struct vm_area_struct *vma,
>