Re: [PATCH 4/6] hugetlb: derive huge pages nodes allowed from task mempolicy

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org, mel@csn.ul.ie,
	randy.dunlap@oracle.com, nacc@us.ibm.com, rientjes@google.com,
	agl@us.ibm.com, apw@canonical.com, eric.whitney@hp.com
Subject: Re: [PATCH 4/6] hugetlb:  derive huge pages nodes allowed from task mempolicy
Date: Fri, 11 Sep 2009 09:12:27 -0400	[thread overview]
Message-ID: <1252674747.4392.229.camel@useless.americas.hpqcorp.net> (raw)
In-Reply-To: <20090910161525.dce065b0.akpm@linux-foundation.org>

On Thu, 2009-09-10 at 16:15 -0700, Andrew Morton wrote:
> On Wed, 09 Sep 2009 12:31:52 -0400
> Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:
> 
> > This patch derives a "nodes_allowed" node mask from the numa
> > mempolicy of the task modifying the number of persistent huge
> > pages to control the allocation, freeing and adjusting of surplus
> > huge pages.
> >
> > ...
> >
> 
> > Index: linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c
> > ===================================================================
> > --- linux-2.6.31-rc7-mmotm-090827-1651.orig/mm/mempolicy.c	2009-09-09 11:57:26.000000000 -0400
> > +++ linux-2.6.31-rc7-mmotm-090827-1651/mm/mempolicy.c	2009-09-09 11:57:36.000000000 -0400
> > @@ -1564,6 +1564,57 @@ struct zonelist *huge_zonelist(struct vm
> >  	}
> >  	return zl;
> >  }
> > +
> > +/*
> > + * alloc_nodemask_of_mempolicy
> > + *
> > + * Returns a [pointer to a] nodelist based on the current task's mempolicy.
> > + *
> > + * If the task's mempolicy is "default" [NULL], return NULL for default
> > + * behavior.  Otherwise, extract the policy nodemask for 'bind'
> > + * or 'interleave' policy or construct a nodemask for 'preferred' or
> > + * 'local' policy and return a pointer to a kmalloc()ed nodemask_t.
> > + *
> > + * N.B., it is the caller's responsibility to free a returned nodemask.
> > + */
> > +nodemask_t *alloc_nodemask_of_mempolicy(void)
> > +{
> > +	nodemask_t *nodes_allowed = NULL;
> > +	struct mempolicy *mempolicy;
> > +	int nid;
> > +
> > +	if (!current->mempolicy)
> > +		return NULL;
> > +
> > +	mpol_get(current->mempolicy);
> > +	nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL);
> 
> Ho hum.  I guess a caller which didn't permit GFP_KERNEL would be
> pretty lame.
> 
> > +	if (!nodes_allowed)
> > +		return NULL;		/* silently default */
> 
> Missed an mpol_put().

Ah, yes.  But, see below...

> 
> > +	nodes_clear(*nodes_allowed);
> > +	mempolicy = current->mempolicy;
> > +	switch (mempolicy->mode) {
> > +	case MPOL_PREFERRED:
> > +		if (mempolicy->flags & MPOL_F_LOCAL)
> > +			nid = numa_node_id();
> > +		else
> > +			nid = mempolicy->v.preferred_node;
> > +		node_set(nid, *nodes_allowed);
> > +		break;
> > +
> > +	case MPOL_BIND:
> > +		/* Fall through */
> > +	case MPOL_INTERLEAVE:
> > +		*nodes_allowed =  mempolicy->v.nodes;
> > +		break;
> > +
> > +	default:
> > +		BUG();
> > +	}
> > +
> > +	mpol_put(current->mempolicy);
> > +	return nodes_allowed;
> > +}
> 
> Do we actually need the mpol_get()/put here?  Can some other process
> really some in and trash a process's current->mempolicy when that
> process isn't looking?

You're correct.  In this context, I can/will eliminate the get/put.

We only really need the reference count in two places:

1) for shared policies [shmem, ...]:  one task could be replacing a
shared policy while another task is trying to allocate using it's
nodemask.

2) for show_numa_maps(), as we're looking at another task's mempolicy
[when vma policies default to task mempolicy].

But, here, where the current task is examining its own mempolicy, we
don't need the get/put as only the task itself can change it's
mempolicy.

> 
> If so, why the heck isn't the code racy?
> 
> static inline void mpol_get(struct mempolicy *pol)
> {
> 	if (pol)
> 		atomic_inc(&pol->refcnt);
> }
> 
> If it's possible for some other task to trash current->mempolicy then
> that trashing can happen between the `if' and the `atomic_inc', so
> we're screwed.  
> 
> So either we need some locking here or the mpol_get() isn't needed on
> current's mempolicy or the mpol_get() has some secret side-effect?

Good point.  Need to revisit this, altho' not in the context of this
series, IMO.  May need an atomic_inc_if_nonzero() or such there.

> 
> 
> Fixlets:
> 
> --- a/mm/hugetlb.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix
> +++ a/mm/hugetlb.c
> @@ -1253,7 +1253,7 @@ static unsigned long set_max_huge_pages(
>  
>  	nodes_allowed = alloc_nodemask_of_mempolicy();
>  	if (!nodes_allowed) {
> -		printk(KERN_WARNING "%s unable to allocate nodes allowed mask "
> +		printk(KERN_WARNING "%s: unable to allocate nodes allowed mask "
>  			"for huge page allocation.  Falling back to default.\n",
>  			current->comm);
>  		nodes_allowed = &node_online_map;
> --- a/mm/mempolicy.c~hugetlb-derive-huge-pages-nodes-allowed-from-task-mempolicy-fix
> +++ a/mm/mempolicy.c
> @@ -1589,7 +1589,7 @@ nodemask_t *alloc_nodemask_of_mempolicy(
>  	mpol_get(current->mempolicy);
>  	nodes_allowed = kmalloc(sizeof(*nodes_allowed), GFP_KERNEL);
>  	if (!nodes_allowed)
> -		return NULL;		/* silently default */
> +		goto out;		/* silently default */
>  
>  	nodes_clear(*nodes_allowed);
>  	mempolicy = current->mempolicy;
> @@ -1611,7 +1611,7 @@ nodemask_t *alloc_nodemask_of_mempolicy(
>  	default:
>  		BUG();
>  	}
> -
> +out:
>  	mpol_put(current->mempolicy);
>  	return nodes_allowed;
>  }
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-09-11 13:12 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-09 16:31 [PATCH 0/6] hugetlb: V6 constrain allocation/free based on " Lee Schermerhorn
2009-09-09 16:31 ` [PATCH 1/6] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-09-09 16:31 ` [PATCH 2/6] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-09-09 16:31 ` [PATCH 3/6] hugetlb: introduce alloc_nodemask_of_node Lee Schermerhorn
2009-09-10 23:05   ` Andrew Morton
2009-09-10 23:17     ` David Rientjes
2009-09-10 23:36       ` Andrew Morton
2009-09-10 23:43         ` David Rientjes
2009-09-11 13:11     ` Lee Schermerhorn
2009-09-11 22:38       ` David Rientjes
2009-09-09 16:31 ` [PATCH 4/6] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-09-10 23:15   ` Andrew Morton
2009-09-11 13:12     ` Lee Schermerhorn [this message]
2009-09-09 16:31 ` [PATCH 5/6] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-09-10 12:32   ` Mel Gorman
2009-09-10 14:26     ` Lee Schermerhorn
2009-09-10 19:50       ` David Rientjes
2009-09-10 19:58         ` Lee Schermerhorn
2009-09-10 23:31   ` Andrew Morton
2009-09-11 13:12     ` Lee Schermerhorn
2009-09-09 16:32 ` [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn
2009-09-09 16:32 ` [PATCH 1/3] hugetlb: use only nodes with memory for huge pages Lee Schermerhorn
2009-09-10 23:33   ` Andrew Morton
2009-09-11 13:54     ` Lee Schermerhorn
2009-09-09 16:32 ` [PATCH 2/3] hugetlb: handle memory hot-plug events Lee Schermerhorn
2009-09-09 16:32 ` [PATCH 3/3] hugetlb: offload per node attribute registrations Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1252674747.4392.229.camel@useless.americas.hpqcorp.net \
    --to=lee.schermerhorn@hp.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=nacc@us.ibm.com \
    --cc=randy.dunlap@oracle.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox