linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Nish Aravamudan" <nish.aravamudan@gmail.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	Anton Blanchard <anton@samba.org>,
	linux-mm@kvack.org, ak@suse.de, mel@csn.ul.ie, apw@shadowen.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Whitney <eric.whitney@hp.com>,
	andyw@uk.ibm.com
Subject: Re: [PATCH] Fix hugetlb pool allocation with empty nodes - V2 -> V3
Date: Wed, 16 May 2007 12:59:38 -0700	[thread overview]
Message-ID: <29495f1d0705161259p70a1e499tb831889fd2bcebcb@mail.gmail.com> (raw)
In-Reply-To: <1178728661.5047.64.camel@localhost>

On 5/9/07, Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> On Fri, 2007-05-04 at 14:27 -0700, Christoph Lameter wrote:
> > On Fri, 4 May 2007, Lee Schermerhorn wrote:
> >
> > > On Wed, 2007-05-02 at 21:21 -0500, Anton Blanchard wrote:
> > > > An interesting bug was pointed out to me where we failed to allocate
> > > > hugepages evenly. In the example below node 7 has no memory (it only has
> > > > CPUs). Node 0 and 1 have plenty of free memory. After doing:
> > >
> > > Here's my attempt to fix the problem [I see it on HP platforms as well],
> > > without removing the population check in build_zonelists_node().  Seems
> > > to work.
> >
> > I think we need something like for_each_online_node for each node with
> > memory otherwise we are going to replicate this all over the place for
> > memoryless nodes. Add a nodemap for populated nodes?
> >
> > I.e.
> >
> > for_each_mem_node?
> >
> > Then you do not have to check the zone flags all the time. May avoid a lot
> > of mess?
>
> OK, here's a rework that exports a node_populated_map and associated
> access functions from page_alloc.c where we already check for populated
> zones.  Maybe this should be "node_hugepages_map" ?
>
> Also, we might consider exporting this to user space for applications
> that want to "interleave across all nodes with hugepages"--not that
> hugetlbfs mappings currently obey "vma policy".  Could still be used
> with the "set task policy before allocating region" method [not that I
> advocate this method ;-)].
>
> I don't think that a 'for_each_*_node()' macro is appropriate for this
> usage, as allocate_fresh_huge_page() is an "incremental allocator" that
> returns a page from the "next eligible node" on each call.
>
> By the way:  does anything protect the "static int nid" in
> allocate_fresh_huge_page() from racing attempts to set nr_hugepages?
> Can this happen?  Do we care?
>
> Again, I chose to rework Anton's original patch, maintaining his
> rationale/discussion, rather create a separate patch.  Note the "Rework"
> comments therein--especially regarding NORMAL zone.  I expect we'll need
> a few more rounds of "discussion" on this issue.  And, it'll require
> rework to merge with the "change zonelist order" series that hits the
> same area.
>
> Lee
>
> [PATCH] Fix hugetlb pool allocation with empty nodes - V3

<snip>

===================================================================
> --- Linux.orig/mm/page_alloc.c  2007-05-08 11:47:45.000000000 -0400
> +++ Linux/mm/page_alloc.c       2007-05-09 11:16:27.000000000 -0400

<snip>

> @@ -2021,11 +2024,14 @@ void show_free_areas(void)
>   * Builds allocation fallback zone lists.
>   *
>   * Add all populated zones of a node to the zonelist.
> + * Record nodes with populated gfp_zone(GFP_HIGHUSER) for huge page allocation.
>   */
>  static int __meminit build_zonelists_node(pg_data_t *pgdat,
> -                       struct zonelist *zonelist, int nr_zones, enum zone_type zone_type)
> +                       struct zonelist *zonelist, int nr_zones,
> +                       enum zone_type zone_type)
>  {
>         struct zone *zone;
> +       enum zone_type zone_highuser = gfp_zone(GFP_HIGHUSER);
>
>         BUG_ON(zone_type >= MAX_NR_ZONES);
>         zone_type++;
> @@ -2036,7 +2042,10 @@ static int __meminit build_zonelists_nod
>                 if (populated_zone(zone)) {
>                         zonelist->zones[nr_zones++] = zone;
>                         check_highest_zone(zone_type);
> -               }
> +                       if (zone_type == zone_highuser)
> +                               node_set_populated(pgdat->node_id);
> +               } else if (zone_type == zone_highuser)
> +                       node_not_populated(pgdat->node_id);
>
>         } while (zone_type);
>         return nr_zones;

This completely breaks hugepage allocation on 4-node x86_64 box I have
here. Each node has <4GB of memory, so all memory is ZONE_DMA and
ZONE_DMA32. gfp_zone(GFP_HIGHUSER) is ZONE_NORMAL, though. So all
nodes are not populated by the default initialization to an empty
nodemask.

Thanks to Andy Whitcroft for helping me debug this.

I'm not sure how to fix this -- but I ran into while trying to base my
sysfs hugepage allocation patches on top of yours.

Thoughts?

Thanks,
Nish

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2007-05-16 19:59 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-03  2:21 [PATCH] Fix hugetlb pool allocation with empty nodes Anton Blanchard
2007-05-03  3:02 ` Christoph Lameter
2007-05-03  6:07   ` Anton Blanchard
2007-05-03  6:37     ` Christoph Lameter
2007-05-03  8:59 ` Andi Kleen
2007-05-03 13:22   ` Anton Blanchard
2007-05-04 20:29 ` [PATCH] Fix hugetlb pool allocation with empty nodes - V2 Lee Schermerhorn
2007-05-04 21:27   ` Christoph Lameter
2007-05-04 22:39     ` Nish Aravamudan
2007-05-07 13:40     ` Lee Schermerhorn
2007-05-09 16:37     ` [PATCH] Fix hugetlb pool allocation with empty nodes - V2 -> V3 Lee Schermerhorn
2007-05-09 16:57       ` Christoph Lameter
2007-05-09 19:17         ` Lee Schermerhorn
2007-05-16 17:27           ` Nish Aravamudan
2007-05-16 20:01             ` Lee Schermerhorn
2007-05-09 19:59       ` Nish Aravamudan
2007-05-09 20:37         ` Lee Schermerhorn
2007-05-09 20:54           ` Christoph Lameter
2007-05-09 22:34           ` Nish Aravamudan
2007-05-15 16:30             ` Lee Schermerhorn
2007-05-16 23:47               ` Nish Aravamudan
2007-05-16 19:59       ` Nish Aravamudan [this message]
2007-05-16 20:32         ` Lee Schermerhorn
2007-05-16 22:17         ` [PATCH/RFC] Fix hugetlb pool allocation with empty nodes - V4 Lee Schermerhorn
2007-05-18  0:30           ` Nish Aravamudan
2007-05-21 14:57             ` Lee Schermerhorn
2007-05-21 17:51               ` Nish Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29495f1d0705161259p70a1e499tb831889fd2bcebcb@mail.gmail.com \
    --to=nish.aravamudan@gmail.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=andyw@uk.ibm.com \
    --cc=anton@samba.org \
    --cc=apw@shadowen.org \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox