linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	minchan.kim@gmail.com, vedran.furac@gmail.com,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>
Subject: Re: [RFC][-mm][PATCH 1/6] oom-killer: updates for classification of OOM
Date: Wed, 4 Nov 2009 09:01:16 +0900	[thread overview]
Message-ID: <20091104090116.fd319d1d.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.0911031150500.11821@chino.kir.corp.google.com>

On Tue, 3 Nov 2009 12:18:40 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Mon, 2 Nov 2009, KAMEZAWA Hiroyuki wrote:
> 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Rewrite oom constarint to be up to date.
> > 
> > (1). Now, at badness calculation, oom_constraint and other information
> >    (which is available easily) are ignore. Pass them.
> > 
> > (2)Adds more classes of oom constraint as _MEMCG and _LOWMEM.
> >    This is just a change for interface and doesn't add new logic, at this stage.
> > 
> > (3) Pass nodemask to oom_kill. Now alloc_pages() are totally rewritten and
> >   it uses nodemask as its argument. By this, mempolicy doesn't have its own
> >   private zonelist. So, Passing nodemask to out_of_memory() is necessary.
> >   But, pagefault_out_of_memory() doesn't have enough information. We should
> >   visit this again, later.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  drivers/char/sysrq.c |    2 -
> >  fs/proc/base.c       |    4 +-
> >  include/linux/oom.h  |    8 +++-
> >  mm/oom_kill.c        |  101 +++++++++++++++++++++++++++++++++++++++------------
> >  mm/page_alloc.c      |    2 -
> >  5 files changed, 88 insertions(+), 29 deletions(-)
> > 
> > Index: mmotm-2.6.32-Nov2/include/linux/oom.h
> > ===================================================================
> > --- mmotm-2.6.32-Nov2.orig/include/linux/oom.h
> > +++ mmotm-2.6.32-Nov2/include/linux/oom.h
> > @@ -10,23 +10,27 @@
> >  #ifdef __KERNEL__
> >  
> >  #include <linux/types.h>
> > +#include <linux/nodemask.h>
> >  
> >  struct zonelist;
> >  struct notifier_block;
> >  
> >  /*
> > - * Types of limitations to the nodes from which allocations may occur
> > + * Types of limitations to zones from which allocations may occur
> >   */
> >  enum oom_constraint {
> >  	CONSTRAINT_NONE,
> > +	CONSTRAINT_LOWMEM,
> >  	CONSTRAINT_CPUSET,
> >  	CONSTRAINT_MEMORY_POLICY,
> > +	CONSTRAINT_MEMCG
> >  };
> >  
> >  extern int try_set_zone_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> >  extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> >  
> > -extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order);
> > +extern void out_of_memory(struct zonelist *zonelist,
> > +		gfp_t gfp_mask, int order, nodemask_t *mask);
> >  extern int register_oom_notifier(struct notifier_block *nb);
> >  extern int unregister_oom_notifier(struct notifier_block *nb);
> >  
> > Index: mmotm-2.6.32-Nov2/mm/oom_kill.c
> > ===================================================================
> > --- mmotm-2.6.32-Nov2.orig/mm/oom_kill.c
> > +++ mmotm-2.6.32-Nov2/mm/oom_kill.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/notifier.h>
> >  #include <linux/memcontrol.h>
> >  #include <linux/security.h>
> > +#include <linux/mempolicy.h>
> >  
> >  int sysctl_panic_on_oom;
> >  int sysctl_oom_kill_allocating_task;
> > @@ -55,6 +56,8 @@ static int has_intersects_mems_allowed(s
> >   * badness - calculate a numeric value for how bad this task has been
> >   * @p: task struct of which task we should calculate
> >   * @uptime: current uptime in seconds
> > + * @constraint: type of oom_kill region
> > + * @mem: set if called by memory cgroup
> >   *
> >   * The formula used is relatively simple and documented inline in the
> >   * function. The main rationale is that we want to select a good task
> > @@ -70,7 +73,9 @@ static int has_intersects_mems_allowed(s
> >   *    of least surprise ... (be careful when you change it)
> >   */
> >  
> > -unsigned long badness(struct task_struct *p, unsigned long uptime)
> > +static unsigned long __badness(struct task_struct *p,
> > +		      unsigned long uptime, enum oom_constraint constraint,
> > +		      struct mem_cgroup *mem)
> >  {
> >  	unsigned long points, cpu_time, run_time;
> >  	struct mm_struct *mm;
> > @@ -193,30 +198,68 @@ unsigned long badness(struct task_struct
> >  	return points;
> >  }
> >  
> > +/* for /proc */
> > +unsigned long global_badness(struct task_struct *p, unsigned long uptime)
> > +{
> > +	return __badness(p, uptime, CONSTRAINT_NONE, NULL);
> > +}
> 
> I don't understand why this is necessary, CONSTRAINT_NONE should be 
> available to proc_oom_score() via linux/oom.h.  It would probably be 
> better to not rename badness() and use it directly.
> 
> > +
> > +
> >  /*
> >   * Determine the type of allocation constraint.
> >   */
> > -static inline enum oom_constraint constrained_alloc(struct zonelist *zonelist,
> > -						    gfp_t gfp_mask)
> > -{
> > +
> >  #ifdef CONFIG_NUMA
> > +static inline enum oom_constraint guess_oom_context(struct zonelist *zonelist,
> > +		gfp_t gfp_mask, nodemask_t *nodemask)
> 
> Why is this renamed from constrained_alloc()?  If the new code is really a 
> guess, we probably shouldn't be altering the oom killing behavior to kill 
> innocent tasks if it's wrong.
> 
No reasons. This just comes from my modification history.
I'll revert this part.


> > +{
> >  	struct zone *zone;
> >  	struct zoneref *z;
> >  	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> > -	nodemask_t nodes = node_states[N_HIGH_MEMORY];
> > +	enum oom_constraint ret = CONSTRAINT_NONE;
> >  
> > -	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
> > -		if (cpuset_zone_allowed_softwall(zone, gfp_mask))
> > -			node_clear(zone_to_nid(zone), nodes);
> > -		else
> > +	/*
> > +	 * In numa environ, almost all allocation will be against NORMAL zone.
> > +	 * But some small area, ex)GFP_DMA for ia64 or GFP_DMA32 for x86-64
> > +	 * can cause OOM. We can use policy_zone for checking lowmem.
> > +	 */
> > +	if (high_zoneidx < policy_zone)
> > +		return CONSTRAINT_LOWMEM;
> > +	/*
> > +	 * Now, only mempolicy specifies nodemask. But if nodemask
> > +	 * covers all nodes, this oom is global oom.
> > +	 */
> > +	if (nodemask && !nodes_equal(node_states[N_HIGH_MEMORY], *nodemask))
> > +		ret = CONSTRAINT_MEMORY_POLICY;
> > +	/*
> > + 	 * If not __GFP_THISNODE, zonelist containes all nodes. And if
> > + 	 * zonelist contains a zone which isn't allowed under cpuset, we assume
> > + 	 * this allocation failure is caused by cpuset's constraint.
> > + 	 * Note: all nodes are scanned if nodemask=NULL.
> > + 	 */
> > +	for_each_zone_zonelist_nodemask(zone,
> > +			z, zonelist, high_zoneidx, nodemask) {
> > +		if (!cpuset_zone_allowed_softwall(zone, gfp_mask))
> >  			return CONSTRAINT_CPUSET;
> > +	}
> 
> This could probably be written as
> 
> 	int nid;
> 	if (nodemask)
> 		for_each_node_mask(nid, *nodemask)
> 			if (!__cpuset_node_allowed_softwall(nid, gfp_mask))
> 				return CONSTRAINT_CPUSET;
> 
> and then you don't need the struct zoneref or struct zone.

IIUC, typical allocation under cpuset is nodemask=NULL.
We'll have to scan zonelist.

The cpuset doesn't use nodemask at calling alloc_pages() because of its
softwall and hardwall feature and its nature of hierarchy.


Thanks,
-Kame




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-11-04  0:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-02  7:22 [RFC][-mm][PATCH 0/6] oom-killer: total renewal KAMEZAWA Hiroyuki
2009-11-02  7:24 ` [RFC][-mm][PATCH 1/6] oom-killer: updates for classification of OOM KAMEZAWA Hiroyuki
2009-11-02 17:05   ` Christoph Lameter
2009-11-02 23:02     ` KAMEZAWA Hiroyuki
2009-11-03 20:18   ` David Rientjes
2009-11-04  0:01     ` KAMEZAWA Hiroyuki [this message]
2009-11-02  7:25 ` [RFC][-mm][PATCH 2/6] oom-killer: count swap usage per process KAMEZAWA Hiroyuki
2009-11-02 17:07   ` Christoph Lameter
2009-11-02 23:03     ` KAMEZAWA Hiroyuki
2009-11-03 19:47   ` David Rientjes
2009-11-04  0:02     ` KAMEZAWA Hiroyuki
2009-11-02  7:26 ` [RFC][-mm][PATCH 3/6] oom-killer: count lowmem rss KAMEZAWA Hiroyuki
2009-11-02 17:09   ` Christoph Lameter
2009-11-02 23:11     ` KAMEZAWA Hiroyuki
2009-11-03 20:24   ` David Rientjes
2009-11-04  0:22     ` KAMEZAWA Hiroyuki
2009-11-02  7:27 ` [RFC][-mm][PATCH 4/6] oom-killer: fork bomb detector KAMEZAWA Hiroyuki
2009-11-02  8:39   ` KAMEZAWA Hiroyuki
2009-11-02  7:28 ` [RFC][-mm][PATCH 5/6] oom-killer: check last total_vm expansion KAMEZAWA Hiroyuki
2009-11-03 20:29   ` David Rientjes
2009-11-04  0:25     ` KAMEZAWA Hiroyuki
2009-11-02  7:30 ` [RFC][-mm][PATCH 6/6] oom-killer: rewrite badness KAMEZAWA Hiroyuki
2009-11-02 15:04 ` [RFC][-mm][PATCH 0/6] oom-killer: total renewal Minchan Kim
2009-11-02 15:44   ` KAMEZAWA Hiroyuki
2009-11-03 20:34 ` David Rientjes
2009-11-03 23:56   ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091104090116.fd319d1d.kamezawa.hiroyu@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=rientjes@google.com \
    --cc=vedran.furac@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox