From: mel@skynet.ie (Mel Gorman)
To: Christoph Lameter <clameter@sgi.com>
Cc: Lee.Schermerhorn@hp.com, ak@suse.de,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 5/6] Filter based on a nodemask as well as a gfp_mask
Date: Tue, 21 Aug 2007 10:12:02 +0100
Message-ID: <20070821091202.GE29794@skynet.ie>
In-Reply-To: <Pine.LNX.4.64.0708171417510.9635@schroedinger.engr.sgi.com>
On (17/08/07 14:29), Christoph Lameter didst pronounce:
> On Fri, 17 Aug 2007, Mel Gorman wrote:
>
> > @@ -696,6 +696,16 @@ static inline struct zonelist *node_zone
> > return &NODE_DATA(nid)->node_zonelist;
> > }
> >
> > +static inline int zone_in_nodemask(unsigned long zone_addr,
> > + nodemask_t *nodes)
> > +{
> > +#ifdef CONFIG_NUMA
> > + return node_isset(zonelist_zone(zone_addr)->node, *nodes);
> > +#else
> > + return 1;
> > +#endif /* CONFIG_NUMA */
> > +}
> > +
>
> This is dereferencing the zone in a filtering operation. I wonder if
> we could encode the node in the zone_addr as well? x86_64 aligns zones on
> page boundaries. So we have 10 bits left after taking 2 for the zone id.
>
I had considered it but had not gotten around to an implementation. A
quick look shows that it is likely to be a win on x86_64 and ppc64, as in
those places NODES_SHIFT is small enough to fit into the lower bits of
the zone addresses. It does not appear to be the case on IA-64 though:
INTERNODE_CACHE_SHIFT will be around 7 there but NODES_SHIFT defaults to
10, so the node ID will not fit in the spare bits.
I'll try it out anyway.
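To make the bit budget concrete, here is a rough userspace sketch of the
encoding, not kernel code. It assumes zones aligned to 4096 bytes (12 spare
low bits), 2 bits of zone index and 10 bits of node ID. Every name in it
(encode_zone(), nodemask_sketch_t and so on) is hypothetical and only there
for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: 4096-byte aligned zones leave 12 low address bits free. */
#define ZONE_ALIGN_SHIFT 12
#define ZONE_ALIGN       ((uintptr_t)1 << ZONE_ALIGN_SHIFT)
#define ZONEID_BITS      2
#define NODEID_BITS      10
#define ZONEID_MASK      (((uintptr_t)1 << ZONEID_BITS) - 1)
#define NODEID_MASK      (((uintptr_t)1 << NODEID_BITS) - 1)
#define MAX_NODES_SKETCH (1u << NODEID_BITS)

/* With only 7 spare bits this check would fail at compile time. */
_Static_assert(ZONEID_BITS + NODEID_BITS <= ZONE_ALIGN_SHIFT,
	       "zone index and node ID do not fit in the spare bits");

/* Hypothetical stand-ins for struct zone and nodemask_t. */
struct zone { int node; long present_pages; };
typedef struct { uint64_t bits[MAX_NODES_SKETCH / 64]; } nodemask_sketch_t;

static void mask_set(nodemask_sketch_t *m, unsigned nid)
{
	m->bits[nid / 64] |= UINT64_C(1) << (nid % 64);
}

static int mask_test(const nodemask_sketch_t *m, unsigned nid)
{
	return (int)((m->bits[nid / 64] >> (nid % 64)) & 1);
}

/* Pack the zone index and node ID into the low bits of the address. */
static uintptr_t encode_zone(struct zone *z, unsigned zone_idx, unsigned nid)
{
	uintptr_t addr = (uintptr_t)z;

	assert((addr & (ZONE_ALIGN - 1)) == 0);
	assert(zone_idx <= ZONEID_MASK && nid <= NODEID_MASK);
	return addr | ((uintptr_t)nid << ZONEID_BITS) | zone_idx;
}

static struct zone *decode_zone(uintptr_t v)
{
	return (struct zone *)(v & ~(ZONE_ALIGN - 1));
}

static unsigned decode_zone_idx(uintptr_t v)
{
	return (unsigned)(v & ZONEID_MASK);
}

static unsigned decode_nid(uintptr_t v)
{
	return (unsigned)((v >> ZONEID_BITS) & NODEID_MASK);
}

/* Filter on the node without dereferencing the zone. */
static int zone_in_nodemask_encoded(uintptr_t v, const nodemask_sketch_t *nodes)
{
	return mask_test(nodes, decode_nid(v));
}
```

Changing ZONE_ALIGN_SHIFT to 7 in the sketch trips the static assertion,
which is the IA-64 concern in miniature.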
> > -int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
> > +int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
> > {
> > - int i;
> > -
> > - for (i = 0; zl->_zones[i]; i++) {
> > - int nid = zone_to_nid(zonelist_zone(zl->_zones[i]));
> > + int nid;
> >
> > + for_each_node_mask(nid, *nodemask)
> > if (node_isset(nid, current->mems_allowed))
> > return 1;
> > - }
> > +
> > return 0;
>
> Hmmm... This is equivalent to
>
> nodemask_t temp;
>
> nodes_and(temp, nodemask, current->mems_allowed);
> return !nodes_empty(temp);
>
> which avoids the loop over all nodes.
>
Good point. I've replaced the code with your version.
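For anyone following along, the equivalence can be checked with a small
userspace sketch. It assumes at most 64 nodes so that a plain 64-bit word
can stand in for nodemask_t; mask64_t, valid_loop() and valid_and() are
hypothetical names. In the kernel, the nodes_intersects() helper from
nodemask.h expresses the same test, and since nodemask is a pointer in the
new function it would need to be dereferenced in the call.

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t, limited to 64 nodes. */
typedef uint64_t mask64_t;

/* Loop form: walk every node and test membership in both masks. */
static int valid_loop(mask64_t nodemask, mask64_t mems_allowed)
{
	for (unsigned nid = 0; nid < 64; nid++)
		if (((nodemask >> nid) & 1) && ((mems_allowed >> nid) & 1))
			return 1;
	return 0;
}

/* Mask form: one AND plus an emptiness test, no per-node loop. */
static int valid_and(mask64_t nodemask, mask64_t mems_allowed)
{
	return (nodemask & mems_allowed) != 0;
}
```

Both functions return 1 exactly when the two masks share at least one node.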
> > - }
> > - if (num == 0) {
> > - kfree(zl);
> > - return ERR_PTR(-EINVAL);
> > + for_each_node_mask(nd, *nodemask) {
> > + struct zone *z = &NODE_DATA(nd)->node_zones[k];
> > + if (z->present_pages > 0)
> > + return 1;
>
> Here you could use an and with the N_HIGH_MEMORY or N_NORMAL_MEMORY
> nodemask.
>
I'm basing against 2.6.23-rc3 at the moment. I'll add an additional
patch later to use the N_HIGH_MEMORY and N_NORMAL_MEMORY nodemasks.
> > @@ -1149,12 +1125,19 @@ unsigned slab_node(struct mempolicy *pol
> > case MPOL_INTERLEAVE:
> > return interleave_nodes(policy);
> >
> > - case MPOL_BIND:
> > + case MPOL_BIND: {
>
> No { } needed.
>
> > /*
> > * Follow bind policy behavior and start allocation at the
> > * first node.
> > */
> > - return zone_to_nid(zonelist_zone(policy->v.zonelist->_zones[0]));
> > + struct zonelist *zonelist;
> > + unsigned long *z;
Without the {}, it would fail to compile here, as the declarations
would directly follow the case label.
> > + enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL);
> > + zonelist = &NODE_DATA(numa_node_id())->node_zonelist;
> > + z = first_zones_zonelist(zonelist, &policy->v.nodes,
> > + highest_zoneidx);
> > + return zone_to_nid(zonelist_zone(*z));
> > + }
> >
> > case MPOL_PREFERRED:
> > if (policy->v.preferred_node >= 0)
>
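On the braces in the MPOL_BIND case above: in C as the kernel was built at
the time, a label must be followed by a statement and a declaration is not
one, so a compound statement is needed when the case opens with
declarations. A minimal standalone sketch of the pattern follows; the enum
and pick_node() are hypothetical and unrelated to the real slab_node().

```c
/* Hypothetical policy kinds, for illustration only. */
enum policy_kind { POLICY_INTERLEAVE, POLICY_BIND, POLICY_PREFERRED };

static int pick_node(enum policy_kind kind, const int *candidates, int fallback)
{
	switch (kind) {
	case POLICY_BIND: {
		/*
		 * The block starts with declarations. Without the
		 * braces they would directly follow the case label.
		 */
		const int *first = candidates;
		int nid = *first;

		return nid;
	}
	case POLICY_PREFERRED:
		return fallback;
	default:
		return -1;
	}
}
```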
> > @@ -1330,14 +1314,6 @@ struct mempolicy *__mpol_copy(struct mem
> > }
> > *new = *old;
> > atomic_set(&new->refcnt, 1);
> > - if (new->policy == MPOL_BIND) {
> > - int sz = ksize(old->v.zonelist);
> > - new->v.zonelist = kmemdup(old->v.zonelist, sz, GFP_KERNEL);
> > - if (!new->v.zonelist) {
> > - kmem_cache_free(policy_cache, new);
> > - return ERR_PTR(-ENOMEM);
> > - }
> > - }
> > return new;
>
> That is a good optimization.
>
Thanks
> > @@ -1680,32 +1647,6 @@ void mpol_rebind_policy(struct mempolicy
> > *mpolmask, *newmask);
> > *mpolmask = *newmask;
> > break;
> > - case MPOL_BIND: {
> > - nodemask_t nodes;
> > - unsigned long *z;
> > - struct zonelist *zonelist;
> > -
> > - nodes_clear(nodes);
> > - for (z = pol->v.zonelist->_zones; *z; z++)
> > - node_set(zone_to_nid(zonelist_zone(*z)), nodes);
> > - nodes_remap(tmp, nodes, *mpolmask, *newmask);
> > - nodes = tmp;
> > -
> > - zonelist = bind_zonelist(&nodes);
> > -
> > - /* If no mem, then zonelist is NULL and we keep old zonelist.
> > - * If that old zonelist has no remaining mems_allowed nodes,
> > - * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
> > - */
> > -
> > - if (!IS_ERR(zonelist)) {
> > - /* Good - got mem - substitute new zonelist */
> > - kfree(pol->v.zonelist);
> > - pol->v.zonelist = zonelist;
> > - }
> > - *mpolmask = *newmask;
> > - break;
> > - }
>
> Simply dropped? We still need to recalculate the node_mask depending on
> the new cpuset environment!
>
It's not simply dropped. The previous patch chunk made the MPOL_BIND case
fall through to take the same action as MPOL_INTERLEAVE. Is that wrong?
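To spell out what I mean by the fall-through, here is a userspace sketch
under stated assumptions: the policy holds only a nodemask (a 64-bit word
here), and remap_sketch() is a simplified ordinal remap loosely modelled on
my understanding of nodes_remap(), not the kernel implementation. All names
are hypothetical.

```c
#include <stdint.h>

enum { SK_DEFAULT, SK_PREFERRED, SK_BIND, SK_INTERLEAVE };

struct policy_sketch { int mode; uint64_t nodes; };

static int popcount64(uint64_t x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Position of the n-th (0-based) set bit in m; n must be < popcount64(m). */
static unsigned nth_set_bit(uint64_t m, int n)
{
	for (unsigned b = 0; b < 64; b++)
		if (((m >> b) & 1) && n-- == 0)
			return b;
	return 64;
}

/* Map each bit of src from its ordinal position in old_mask to new_mask. */
static uint64_t remap_sketch(uint64_t src, uint64_t old_mask, uint64_t new_mask)
{
	int w = popcount64(new_mask);
	uint64_t dst = 0;

	if (w == 0)
		return src;
	for (unsigned b = 0; b < 64; b++) {
		if (!((src >> b) & 1))
			continue;
		if (!((old_mask >> b) & 1)) {
			dst |= UINT64_C(1) << b;	/* not in old: keep */
			continue;
		}
		int ord = popcount64(old_mask & ((UINT64_C(1) << b) - 1));
		dst |= UINT64_C(1) << nth_set_bit(new_mask, ord % w);
	}
	return dst;
}

static void rebind_sketch(struct policy_sketch *pol, uint64_t *mpolmask,
			  uint64_t newmask)
{
	switch (pol->mode) {
	case SK_BIND:
		/* fall through: bind now only carries a nodemask */
	case SK_INTERLEAVE:
		pol->nodes = remap_sketch(pol->nodes, *mpolmask, newmask);
		*mpolmask = newmask;
		break;
	default:
		break;
	}
}
```

With this shape, the bind case needs no zonelist rebuild: its nodemask is
remapped exactly as the interleave one is.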
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab