From: Christoph Lameter <clameter@sgi.com>
To: Mel Gorman <mel@skynet.ie>
Cc: linux-mm@kvack.org, Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
ak@suse.de, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
akpm@linux-foundation.org, pj@sgi.com
Subject: Re: NUMA policy issues with ZONE_MOVABLE
Date: Wed, 25 Jul 2007 12:31:21 -0700 (PDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0707251212300.8820@schroedinger.engr.sgi.com> (raw)
In-Reply-To: <20070725111646.GA9098@skynet.ie>
On Wed, 25 Jul 2007, Mel Gorman wrote:
> o check_highest_zone will be the highest populated zone that is not ZONE_MOVEABLE
> o bind_zonelist builds a zonelist of all populated zones, not policy_zone and lower
> o The page allocator checks what the highest usable zone is and ignores
> zones in the zonelist that should not be used
Which is a performance impact that we would rather avoid since we are now
filtering zonelists on every allocation. But we have other issues as well
that would be fixed by this approach.
How about changing __alloc_pages to lookup the zonelist on its own based
on a node parameter and a set of allowed nodes? That may significantly
clean up the memory policy layer and the cpuset layer. But it will
increase the effort to scan zonelists on each allocation. A large system
with 1024 nodes may have more than 1024 zones on each nodelist!
> On the second point here, policy_zone and how it is used is a bit
> mad. Particularly, its behaviour on machines with multiple zones is a
> little unpredictable with cross-platform applications potentially behaving
> different on IA64 than x86_64 for example. However, a test patch that would
> delete it looked as if it would break NUMAQ if a process was bound to nodes
> 2 and 3 but not 0 for example because slab allocations would fail. Similar,
> it would have consequences on x86_64 with NORMAL and DMA32.
Nope it would not fail. NUMAQ has policy_zone == HIGHMEM and slab
allocations do not use highmem. Memory policies are not applied to slab
allocs on NUMAQ. Thus slab allocations will use node 0 even
if you restrict allocs to node 2 and 3.
> Here is the patch just to handle policies with ZONE_MOVABLE. The highest
> zone still gets treated as it does today but allocations using ZONE_MOVABLE
> will still be policied. It has been boot-tested and a basic compile job run
> on a x86_64 NUMA machine (elm3b6 on test.kernel.org). Is there a
> standard test for regression testing policies?
There is a test in the numactl package by Andi Kleen.
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index e147cf5..5bdd656 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -166,7 +166,7 @@ extern enum zone_type policy_zone;
>
> static inline void check_highest_zone(enum zone_type k)
> {
> - if (k > policy_zone)
> + if (k > policy_zone && k != ZONE_MOVABLE)
> policy_zone = k;
> }
That actually cleans up stuff...
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 71b84b4..e798be5 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -149,7 +144,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes)
> lower zones etc. Avoid empty zones because the memory allocator
> doesn't like them. If you implement node hot removal you
> have to fix that. */
> - k = policy_zone;
> + k = MAX_NR_ZONES - 1;
k = ZONE_MOVABLE?
> while (1) {
> for_each_node_mask(nd, *nodes) {
> struct zone *z = &NODE_DATA(nd)->node_zones[k];
So bind zonelists now include two zones per node: The origin of
ZONE_MOVABLE and ZONE_MOVABLE.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 40954fb..22485d5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1157,6 +1157,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order,
> nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */
> int zlc_active = 0; /* set if using zonelist_cache */
> int did_zlc_setup = 0; /* just call zlc_setup() one time */
> + enum zone_type highest_zoneidx;
>
> zonelist_scan:
> /*
> @@ -1165,10 +1166,23 @@ zonelist_scan:
> */
> z = zonelist->zones;
>
> + /* For memory policies, get the highest allowed zone by the flags */
> + if (NUMA_BUILD)
> + highest_zoneidx = gfp_zone(gfp_mask);
> +
> do {
> if (NUMA_BUILD && zlc_active &&
> !zlc_zone_worth_trying(zonelist, z, allowednodes))
> continue;
> +
> + /*
> + * In NUMA, this could be a policy zonelist which contains
> + * zones that may not be allowed by the current gfp_mask.
> + * Check the zone is allowed by the current flags
> + */
> + if (NUMA_BUILD && zone_idx(*z) > highest_zoneidx)
> + continue;
> +
Skip the zones that are higher?
So for a __GFP_MOVABLE alloc we would scan all zones and for
policy_zone just the policy zone.
Lee should probably also review this in detail since he has recent
experience fiddling around with memory policies. Paul has also
experience in this area.
Maybe this can actually help to deal with some of the corner cases of
memory policies (just hope the performance impact is not significant).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-07-25 19:31 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-25 4:20 Christoph Lameter
2007-07-25 4:47 ` Nick Piggin
2007-07-25 5:05 ` Christoph Lameter
2007-07-25 5:24 ` Nick Piggin
2007-07-25 6:00 ` Christoph Lameter
2007-07-25 6:09 ` Nick Piggin
2007-07-25 9:32 ` Andi Kleen
2007-07-25 6:36 ` KAMEZAWA Hiroyuki
2007-07-25 11:16 ` Mel Gorman
2007-07-25 14:30 ` Lee Schermerhorn
2007-07-25 19:31 ` Christoph Lameter [this message]
2007-07-26 4:15 ` KAMEZAWA Hiroyuki
2007-07-26 4:53 ` Christoph Lameter
2007-07-26 7:41 ` KAMEZAWA Hiroyuki
2007-07-26 16:16 ` Mel Gorman
2007-07-26 18:03 ` Christoph Lameter
2007-07-26 18:26 ` Mel Gorman
2007-07-26 13:23 ` Mel Gorman
2007-07-26 18:07 ` Christoph Lameter
2007-07-26 22:59 ` Mel Gorman
2007-07-27 1:22 ` Christoph Lameter
2007-07-27 8:20 ` Mel Gorman
2007-07-27 15:45 ` Mel Gorman
2007-07-27 17:35 ` Christoph Lameter
2007-07-27 17:46 ` Mel Gorman
2007-07-27 18:38 ` Christoph Lameter
2007-07-27 18:00 ` [PATCH] Document Linux Memory Policy - V2 Lee Schermerhorn
2007-07-27 18:38 ` Randy Dunlap
2007-07-27 19:01 ` Lee Schermerhorn
2007-07-27 19:21 ` Randy Dunlap
2007-07-27 18:55 ` Christoph Lameter
2007-07-27 19:24 ` Lee Schermerhorn
2007-07-31 15:14 ` Mel Gorman
2007-07-31 16:34 ` Lee Schermerhorn
2007-07-31 19:10 ` Christoph Lameter
2007-07-31 19:46 ` Lee Schermerhorn
2007-07-31 19:58 ` Christoph Lameter
2007-07-31 20:23 ` Lee Schermerhorn
2007-07-31 20:48 ` [PATCH] Document Linux Memory Policy - V3 Lee Schermerhorn
2007-08-03 13:52 ` Mel Gorman
2007-07-28 7:28 ` NUMA policy issues with ZONE_MOVABLE KAMEZAWA Hiroyuki
2007-07-28 11:57 ` Mel Gorman
2007-07-28 14:10 ` KAMEZAWA Hiroyuki
2007-07-28 14:21 ` KAMEZAWA Hiroyuki
2007-07-30 12:41 ` Mel Gorman
2007-07-30 18:06 ` Christoph Lameter
2007-07-27 14:24 ` Lee Schermerhorn
2007-08-01 18:59 ` Lee Schermerhorn
2007-08-02 0:36 ` KAMEZAWA Hiroyuki
2007-08-02 17:10 ` Mel Gorman
2007-08-02 17:51 ` Lee Schermerhorn
2007-07-26 18:09 ` Lee Schermerhorn
2007-08-02 14:09 ` Mel Gorman
2007-08-02 18:56 ` Christoph Lameter
2007-08-02 19:42 ` Mel Gorman
2007-08-02 19:52 ` Christoph Lameter
2007-08-03 9:32 ` Mel Gorman
2007-08-03 16:36 ` Christoph Lameter
2007-07-25 14:27 ` Lee Schermerhorn
2007-07-25 17:39 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0707251212300.8820@schroedinger.engr.sgi.com \
--to=clameter@sgi.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mel@skynet.ie \
--cc=pj@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox