Re: [PATCH 4/11] hugetlb: derive huge pages nodes allowed from task mempolicy

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Rientjes <rientjes@google.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
	akpm@linux-foundation.org, Mel Gorman <mel@csn.ul.ie>,
	Randy Dunlap <randy.dunlap@oracle.com>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 4/11] hugetlb:  derive huge pages nodes allowed from task mempolicy
Date: Wed, 7 Oct 2009 13:09:32 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.1.00.0910071254290.1928@chino.kir.corp.google.com> (raw)
In-Reply-To: <1254933058.4483.223.camel@useless.americas.hpqcorp.net>

On Wed, 7 Oct 2009, Lee Schermerhorn wrote:

> > > Index: linux-2.6.31-mmotm-090925-1435/mm/hugetlb.c
> > > ===================================================================
> > > --- linux-2.6.31-mmotm-090925-1435.orig/mm/hugetlb.c	2009-09-30 12:48:45.000000000 -0400
> > > +++ linux-2.6.31-mmotm-090925-1435/mm/hugetlb.c	2009-10-02 21:22:04.000000000 -0400
> > > @@ -1334,29 +1334,71 @@ static struct hstate *kobj_to_hstate(str
> > >  	return NULL;
> > >  }
> > >  
> > > -static ssize_t nr_hugepages_show(struct kobject *kobj,
> > > +static ssize_t nr_hugepages_show_common(struct kobject *kobj,
> > >  					struct kobj_attribute *attr, char *buf)
> > >  {
> > >  	struct hstate *h = kobj_to_hstate(kobj);
> > >  	return sprintf(buf, "%lu\n", h->nr_huge_pages);
> > >  }
> > > -static ssize_t nr_hugepages_store(struct kobject *kobj,
> > > -		struct kobj_attribute *attr, const char *buf, size_t count)
> > > +static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
> > > +			struct kobject *kobj, struct kobj_attribute *attr,
> > > +			const char *buf, size_t len)
> > >  {
> > >  	int err;
> > > -	unsigned long input;
> > > +	unsigned long count;
> > >  	struct hstate *h = kobj_to_hstate(kobj);
> > > +	NODEMASK_ALLOC(nodemask, nodes_allowed);
> > >  
> > 
> > In the two places you do NODEMASK_ALLOC(), here and 
> > hugetlb_sysctl_handler(), you'll need to check that nodes_allowed is 
> > non-NULL since it's possible that kmalloc() will return NULL for 
> > CONFIG_NODES_SHIFT > 8.
> > 
> > In such a case, it's probably sufficient to simply set nodes_allowed to 
> > node_states[N_HIGH_MEMORY] so that we can still free hugepages when we're 
> > oom, a common memory freeing tactic.
> > 
> > You could do that by simply returning false from 
> > init_nodemask_of_mempolicy() if !nodes_allowed since NODEMASK_FREE() can 
> > take a NULL pointer, but it may be easier to factor that logic into your 
> > conditional below:
> > 
> > > -	err = strict_strtoul(buf, 10, &input);
> > > +	err = strict_strtoul(buf, 10, &count);
> > >  	if (err)
> > >  		return 0;
> > >  
> > > -	h->max_huge_pages = set_max_huge_pages(h, input, &node_online_map);
> > > +	if (!(obey_mempolicy && init_nodemask_of_mempolicy(nodes_allowed))) {
> > > +		NODEMASK_FREE(nodes_allowed);
> > > +		nodes_allowed = &node_online_map;
> > > +	}
> > > +	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
> > >
> > 
> > You can get away with just testing !nodes_allowed here since the stack 
> > allocation variation of NODEMASK_ALLOC() is such that nodes_allowed will 
> > always be an initialized pointer pointing to _nodes_allowed so you won't 
> > have an uninitialized warning.
> > 
> > Once that's done, you can get rid of the check for a NULL nodes_allowed in 
> > try_to_free_low() from patch 2 since it will always be valid in 
> > set_max_huge_pages().
> 
> 
> OK.  already removed the NULL check from try_to_free_low().  And I made
> the change to init_nodemask_of_mempolicy to return false on NULL mask.
> 
> I'm not completely happy with dropping back to default behavior
> [node_online_map here, replaced with node_states[N_HIGH_MEMORY] in
> subsequent patch] on failure to allocate nodes_allowed.  We only do the
> NODEMASK_ALLOC when we've come in from either nr_hugepages_mempolicy or
> a per node attribute [subsequent patch], so I'm not sure that ignoring
> the mempolicy, if any, or the specified node id, is a good thing here.
> Not silently, at least.  I haven't addressed this, yet.  We can submit
> an incremental patch.  Thoughts?
> 

Hmm, it's debatable since the NODEMASK_ALLOC() slab allocation is 
GFP_KERNEL which would cause direct reclaim (and perhaps even the oom 
killer) to free memory.  If the oom killer were invoked, current would 
probably even be killed because of how the oom killer works for 
CONSTRAINT_MEMORY_POLICY.  So the end result is that the pages would 
eventually be freed because current would get access to memory reserves 
via TIF_MEMDIE but would die immediately after returning.  It was nice of 
current to sacrifice itself like that.

Unfortunately, I think the long term solution is that NODEMASK_ALLOC() is 
going to require a gfp parameter to pass to kmalloc() and in this case we 
should union __GFP_NORETRY.  Then, if nodes_allowed can't be allocated I 
think it would be better to simply return -ENOMEM to userspace so it can 
either reduce the number of global hugepages or free memory in another 
way.  (There might be a caveat where the user's mempolicy already includes 
all online nodes and they use nr_hugepages_mempolicy where they couldn't 
free hugepages because of -ENOMEM but could via nr_hugepages, but I don't 
think you need to address that.)

The worst case allocation is probably 512 bytes for CONFIG_NODES_SHIFT of 
12 so I don't think using __GFP_NORETRY here is going to be that 
ridiculous.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-10-07 20:09 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-06  3:17 [PATCH 0/11] hugetlb: V9 numa control of persistent huge pages alloc/free Lee Schermerhorn
2009-10-06  3:17 ` [PATCH 1/11] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-10-06  3:17 ` [PATCH 2/11] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-10-06  9:09   ` David Rientjes
2009-10-07  3:26   ` David Rientjes
2009-10-07 14:13     ` Lee Schermerhorn
2009-10-06  3:17 ` [PATCH 3/11] hugetlb: factor init_nodemask_of_node Lee Schermerhorn
2009-10-07  3:21   ` David Rientjes
2009-10-06  3:18 ` [PATCH 4/11] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-10-07  3:26   ` David Rientjes
2009-10-07 16:30     ` Lee Schermerhorn
2009-10-07 20:09       ` David Rientjes [this message]
2009-10-06  3:18 ` [PATCH 5/11] hugetlb: accomodate reworked NODEMASK_ALLOC Lee Schermerhorn
2009-10-06  3:18 ` [PATCH 6/11] hugetlb: add generic definition of NUMA_NO_NODE Lee Schermerhorn
2009-10-06  9:28   ` David Rientjes
2009-10-06  3:18 ` [PATCH 7/11] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-10-07  4:04   ` David Rientjes
2009-10-06  3:18 ` [PATCH 8/11] hugetlb: update hugetlb documentation for NUMA controls Lee Schermerhorn
2009-10-06  3:18 ` [PATCH 9/11] hugetlb: use only nodes with memory for huge pages Lee Schermerhorn
2009-10-06  3:18 ` [PATCH 10/11] hugetlb: handle memory hot-plug events Lee Schermerhorn
2009-10-07  4:12   ` David Rientjes
2009-10-06  3:19 ` [PATCH 11/11] hugetlb: offload per node attribute registrations Lee Schermerhorn
2009-10-06 16:01   ` Andi Kleen
2009-10-06 16:28     ` Lee Schermerhorn
2009-10-06 16:46       ` Andi Kleen
2009-10-06 17:57         ` Lee Schermerhorn
2009-10-07  8:24   ` [patch] mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined David Rientjes
2009-10-07 14:25     ` Christoph Lameter
2009-10-07 16:48     ` Lee Schermerhorn
2009-10-07 19:53       ` David Rientjes
2009-10-06 16:02 ` [PATCH 0/11] hugetlb: V9 numa control of persistent huge pages alloc/free Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2009-09-15 20:43 [PATCH 0/11] hugetlb: V7 constrain allocation/free based on task mempolicy Lee Schermerhorn
2009-09-15 20:44 ` [PATCH 4/11] hugetlb: derive huge pages nodes allowed from " Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.1.00.0910071254290.1928@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=nacc@us.ibm.com \
    --cc=randy.dunlap@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox