linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch -mm 1/5] oom: prevent including sched.h in header file
@ 2007-09-22 17:47 David Rientjes
  2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes
  0 siblings, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Christoph Lameter, Alexey Dobriyan, linux-mm

It's not necessary to include all of linux/sched.h in linux/oom.h.
Instead, simply include prototypes for the relevant structs and include
linux/types.h for gfp_t.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/oom.h |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -1,8 +1,6 @@
 #ifndef __INCLUDE_LINUX_OOM_H
 #define __INCLUDE_LINUX_OOM_H
 
-#include <linux/sched.h>
-
 /* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */
 #define OOM_DISABLE (-17)
 /* inclusive */
@@ -11,6 +9,11 @@
 
 #ifdef __KERNEL__
 
+#include <linux/types.h>
+
+struct zonelist;
+struct notifier_block;
+
 /*
  * Types of limitations to the nodes from which allocations may occur
  */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch -mm 2/5] oom: add header file to Kbuild as unifdef
  2007-09-22 17:47 [patch -mm 1/5] oom: prevent including sched.h in header file David Rientjes
@ 2007-09-22 17:47 ` David Rientjes
  2007-09-22 17:47   ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes
  0 siblings, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Christoph Lameter, Alexey Dobriyan, linux-mm

Preprocess include/linux/oom.h before exporting it to userspace.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/Kbuild |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -286,6 +286,7 @@ unifdef-y += nfs_idmap.h
 unifdef-y += n_r3964.h
 unifdef-y += nubus.h
 unifdef-y += nvram.h
+unifdef-y += oom.h
 unifdef-y += parport.h
 unifdef-y += patchkey.h
 unifdef-y += pci.h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock
  2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes
@ 2007-09-22 17:47   ` David Rientjes
  2007-09-22 17:47     ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes
  0 siblings, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm

There's no reason to sleep in try_set_zone_oom() or clear_zonelist_oom()
if the lock can't be acquired; it will be available soon enough once
the zonelist scanning is done.  All other threads waiting for the OOM
killer are also contingent on the exiting task being able to acquire
the lock in clear_zonelist_oom() so it doesn't make sense to put it
to sleep.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -28,7 +28,7 @@
 
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
-static DEFINE_MUTEX(zone_scan_mutex);
+static DEFINE_SPINLOCK(zone_scan_mutex);
 /* #define DEBUG */
 
 /**
@@ -403,7 +403,7 @@ int try_set_zone_oom(struct zonelist *zonelist)
 
 	z = zonelist->zones;
 
-	mutex_lock(&zone_scan_mutex);
+	spin_lock(&zone_scan_mutex);
 	do {
 		if (zone_is_oom_locked(*z)) {
 			ret = 0;
@@ -420,7 +420,7 @@ int try_set_zone_oom(struct zonelist *zonelist)
 		zone_set_flag(*z, ZONE_OOM_LOCKED);
 	} while (*(++z) != NULL);
 out:
-	mutex_unlock(&zone_scan_mutex);
+	spin_unlock(&zone_scan_mutex);
 	return ret;
 }
 
@@ -435,11 +435,11 @@ void clear_zonelist_oom(struct zonelist *zonelist)
 
 	z = zonelist->zones;
 
-	mutex_lock(&zone_scan_mutex);
+	spin_lock(&zone_scan_mutex);
 	do {
 		zone_clear_flag(*z, ZONE_OOM_LOCKED);
 	} while (*(++z) != NULL);
-	mutex_unlock(&zone_scan_mutex);
+	spin_unlock(&zone_scan_mutex);
 }
 
 /**

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-22 17:47   ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes
@ 2007-09-22 17:47     ` David Rientjes
  2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm

Introduces new zone flag interface for testing and setting flags:

	int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)

Instead of setting and clearing ZONE_RECLAIM_LOCKED each time
shrink_zone() is called, this flag is test and set before starting zone
reclaim.  Zone reclaim starts in __alloc_pages() when a zone's watermark
fails and the system is in zone_reclaim_mode.  If it's already in
reclaim, there's no need to start again so it is simply considered full
for that allocation attempt.

There is a change of behavior with regard to concurrent zone shrinking.
It is now possible for try_to_free_pages() or kswapd to already be
shrinking a particular zone when __alloc_pages() starts zone reclaim.  In
this case, it is possible for two concurrent threads to invoke
shrink_zone() for a single zone.

This change forbids a zone to be in zone reclaim twice, which was always
the behavior, but allows for concurrent try_to_free_pages() or kswapd
shrinking when starting zone reclaim.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/mmzone.h |    4 ++++
 mm/vmscan.c            |   23 +++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
 {
 	set_bit(flag, &zone->flags);
 }
+static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)
+{
+	return test_and_set_bit(flag, &zone->flags);
+}
 static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag)
 {
 	clear_bit(flag, &zone->flags);
diff --git a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1067,8 +1067,6 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
 	unsigned long nr_to_scan;
 	unsigned long nr_reclaimed = 0;
 
-	zone_set_flag(zone, ZONE_RECLAIM_LOCKED);
-
 	/*
 	 * Add one to `nr_to_scan' just to make sure that the kernel will
 	 * slowly sift through the active list.
@@ -1107,8 +1105,6 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
 	}
 
 	throttle_vm_writeout(sc->gfp_mask);
-
-	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);
 	return nr_reclaimed;
 }
 
@@ -1852,6 +1848,7 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 {
 	cpumask_t mask;
 	int node_id;
+	int ret;
 
 	/*
 	 * Zone reclaim reclaims unmapped file backed pages and
@@ -1869,13 +1866,13 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 			<= zone->min_slab_pages)
 		return 0;
 
+	if (zone_is_all_unreclaimable(zone))
+		return 0;
+
 	/*
-	 * Avoid concurrent zone reclaims, do not reclaim in a zone that does
-	 * not have reclaimable pages and if we should not delay the allocation
-	 * then do not scan.
+	 * Do not scan if the allocation should not be delayed.
 	 */
-	if (!(gfp_mask & __GFP_WAIT) || zone_is_all_unreclaimable(zone) ||
-		zone_is_reclaim_locked(zone) || (current->flags & PF_MEMALLOC))
+	if (!(gfp_mask & __GFP_WAIT) || (current->flags & PF_MEMALLOC))
 			return 0;
 
 	/*
@@ -1888,6 +1885,12 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	mask = node_to_cpumask(node_id);
 	if (!cpus_empty(mask) && node_id != numa_node_id())
 		return 0;
-	return __zone_reclaim(zone, gfp_mask, order);
+
+	if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
+		return 0;
+	ret = __zone_reclaim(zone, gfp_mask, order);
+	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);
+
+	return ret;
 }
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-22 17:47     ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes
@ 2007-09-22 17:47       ` David Rientjes
  2007-09-25  4:37         ` Balbir Singh
  2007-09-26 20:06         ` Andrew Morton
  2007-09-24 19:05       ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter
  2007-09-25  4:26       ` Balbir Singh
  2 siblings, 2 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm

Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks
(excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu,
oom_adj score, and name.

Helpful for determining why an OOM condition occurred and what rogue task
caused it.

It is configurable so that large systems, such as those with several
thousand tasks, do not incur a performance penalty associated with data
they may not desire.

There currently do not appear to be any other generic kernel callers that
dump all this information.  Perhaps in the future it will be worthwhile
to construct a generic task dump interface based on passing a set of
flags that specify what per-task information shall be shown.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/sysctl/vm.txt |   21 +++++++++++++++++++++
 kernel/sysctl.c             |    9 +++++++++
 mm/oom_kill.c               |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm:
 - min_unmapped_ratio
 - min_slab_ratio
 - panic_on_oom
+- oom_dump_tasks
 - oom_kill_allocating_task
 - mmap_min_address
 - numa_zonelist_order
@@ -223,6 +224,26 @@ according to your policy of failover.
 
 =============================================================
 
+oom_dump_tasks
+
+This enables a system-wide task dump (excluding kernel threads) that
+includes such information as pid, uid, tgid, vm size, rss, cpu,
+oom_adj score, and name.  This is helpful to determine why the OOM
+killer was invoked and to identify the rogue task that caused it.
+
+If this is set to zero, this information is suppressed.  On very
+large systems with thousands of tasks it may not be feasible to dump
+the memory state information for each one.  Such systems should not
+be forced to incur a performance penalty in OOM conditions when the
+information may not be desired.
+
+If this is set to non-zero, this information is shown whenever the
+OOM killer actually kills a memory-hogging task.
+
+The default value is 0.
+
+=============================================================
+
 oom_kill_allocating_task
 
 This enables or disables killing the OOM-triggering task in
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -64,6 +64,7 @@ extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
 extern int sysctl_panic_on_oom;
 extern int sysctl_oom_kill_allocating_task;
+extern int sysctl_oom_dump_tasks;
 extern int max_threads;
 extern int core_uses_pid;
 extern int suid_dumpable;
@@ -807,6 +808,14 @@ static ctl_table vm_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "oom_dump_tasks",
+		.data		= &sysctl_oom_dump_tasks,
+		.maxlen		= sizeof(sysctl_oom_dump_tasks),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
 		.ctl_name	= VM_OVERCOMMIT_RATIO,
 		.procname	= "overcommit_ratio",
 		.data		= &sysctl_overcommit_ratio,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -28,6 +28,7 @@
 
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
+int sysctl_oom_dump_tasks;
 static DEFINE_SPINLOCK(zone_scan_mutex);
 /* #define DEBUG */
 
@@ -266,6 +267,36 @@ static struct task_struct *select_bad_process(unsigned long *ppoints)
 }
 
 /**
+ * Dumps the current memory state of all system tasks, excluding kernel threads.
+ * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj
+ * score, and name.
+ *
+ * Call with tasklist_lock read-locked.
+ */
+static void dump_tasks(void)
+{
+	struct task_struct *g, *p;
+
+	printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
+	       "name\n");
+	do_each_thread(g, p) {
+		/*
+		 * total_vm and rss sizes do not exist for tasks with a
+		 * detached mm so there's no need to report them.
+		 */
+		if (!p->mm)
+			continue;
+
+		task_lock(p);
+		printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d     %3d %s\n",
+		       p->pid, p->uid, p->tgid, p->mm->total_vm,
+		       get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj,
+		       p->comm);
+		task_unlock(p);
+	} while_each_thread(g, p);
+}
+
+/**
  * Send SIGKILL to the selected  process irrespective of  CAP_SYS_RAW_IO
  * flag though it's unlikely that  we select a process with CAP_SYS_RAW_IO
  * set.
@@ -352,6 +383,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			current->comm, gfp_mask, order, current->oomkilladj);
 		dump_stack();
 		show_mem();
+		if (sysctl_oom_dump_tasks)
+			dump_tasks();
 	}
 
 	/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-22 17:47     ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes
  2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
@ 2007-09-24 19:05       ` Christoph Lameter
  2007-09-24 19:14         ` David Rientjes
  2007-09-25  4:26       ` Balbir Singh
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-09-24 19:05 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm

On Sat, 22 Sep 2007, David Rientjes wrote:

> +++ b/include/linux/mmzone.h
> @@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
>  {
>  	set_bit(flag, &zone->flags);
>  }
> +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)
> +{
> +	return test_and_set_bit(flag, &zone->flags);
> +}

Missing blank line.

>  static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag)
>  {
>  	clear_bit(flag, &zone->flags);
> diff --git a/mm/vmscan.c b/mm/vmscan.c

The rest looks fine.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-24 19:05       ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter
@ 2007-09-24 19:14         ` David Rientjes
  2007-09-24 20:11           ` Christoph Lameter
  2007-09-24 21:02           ` David Rientjes
  0 siblings, 2 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-24 19:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm

On Mon, 24 Sep 2007, Christoph Lameter wrote:

> > +++ b/include/linux/mmzone.h
> > @@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
> >  {
> >  	set_bit(flag, &zone->flags);
> >  }
> > +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)
> > +{
> > +	return test_and_set_bit(flag, &zone->flags);
> > +}
> 
> Missing blank line.
> 

The only blank line for inlined functions added to mmzone.h for zone 
flag support is between the generic flavors that set, test and set, or 
clear the flags and the explicit flavors that test specific bits; so this 
newline behavior is correct as written.

I was hoping to avoid doing things like

	#define ZoneSetReclaimLocked(zone)	zone_set_flag((zone),	\
							ZONE_RECLAIM_LOCKED)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-24 19:14         ` David Rientjes
@ 2007-09-24 20:11           ` Christoph Lameter
  2007-09-24 21:02           ` David Rientjes
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-09-24 20:11 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm

On Mon, 24 Sep 2007, David Rientjes wrote:

> > > +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)
> > > +{
> > > +	return test_and_set_bit(flag, &zone->flags);
> > > +}
> > 
> > Missing blank line.
> > 
> 
> The only blank line for inlined functions added to mmzone.h for zone 
> flag support is between the generic flavors that set, test and set, or 
> clear the flags and the explicit flavors that test specific bits; so this 
> newline behavior is correct as written.
> 
> I was hoping to avoid doing things like
> 
> 	#define ZoneSetReclaimLocked(zone)	zone_set_flag((zone),	\
> 							ZONE_RECLAIM_LOCKED)

Not sure what the #define is supposed to tell me.

Please add the corresponding blank lines at the end of functions. One 
function seems to be running into the next.

It should look like the rest of mmzone.h:

static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
        unsigned long map = section->section_mem_map;
        map &= SECTION_MAP_MASK;
        return (struct page *)map;
}

static inline int valid_section(struct mem_section *section)
{
        return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
}

static inline int section_has_mem_map(struct mem_section *section)
{
        return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
}

static inline int valid_section_nr(unsigned long nr)
{
        return valid_section(__nr_to_section(nr));
}

static inline struct mem_section *__pfn_to_section(unsigned long pfn)
{
        return __nr_to_section(pfn_to_section_nr(pfn));
}

static inline int pfn_valid(unsigned long pfn)
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}

Note that there is a empty line at the end of each function.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-24 19:14         ` David Rientjes
  2007-09-24 20:11           ` Christoph Lameter
@ 2007-09-24 21:02           ` David Rientjes
  2007-09-24 21:09             ` Christoph Lameter
  1 sibling, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-24 21:02 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm

Add newlines between new zone flag tester/modifier functions.

Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/mmzone.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -320,10 +320,12 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag)
 {
 	set_bit(flag, &zone->flags);
 }
+
 static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)
 {
 	return test_and_set_bit(flag, &zone->flags);
 }
+
 static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag)
 {
 	clear_bit(flag, &zone->flags);
@@ -333,10 +335,12 @@ static inline int zone_is_all_unreclaimable(const struct zone *zone)
 {
 	return test_bit(ZONE_ALL_UNRECLAIMABLE, &zone->flags);
 }
+
 static inline int zone_is_reclaim_locked(const struct zone *zone)
 {
 	return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags);
 }
+
 static inline int zone_is_oom_locked(const struct zone *zone)
 {
 	return test_bit(ZONE_OOM_LOCKED, &zone->flags);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-24 21:02           ` David Rientjes
@ 2007-09-24 21:09             ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2007-09-24 21:09 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm

On Mon, 24 Sep 2007, David Rientjes wrote:

> Add newlines between new zone flag tester/modifier functions.

Acked-by: Christoph Lameter <clameter@sgi.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-22 17:47     ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes
  2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
  2007-09-24 19:05       ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter
@ 2007-09-25  4:26       ` Balbir Singh
  2007-09-25  4:34         ` David Rientjes
  2 siblings, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-09-25  4:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

David Rientjes wrote:
> +
> +	if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
> +		return 0;

What's the consequence of this on the caller of zone_reclaim()?
I see that the zone is marked as full and will not be re-examined
again.

Am I missing something?

> +	ret = __zone_reclaim(zone, gfp_mask, order);
> +	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);
> +
> +	return ret;
>  }
>  #endif

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25  4:26       ` Balbir Singh
@ 2007-09-25  4:34         ` David Rientjes
  2007-09-25  6:17           ` Balbir Singh
  2007-09-25 21:14           ` Christoph Lameter
  0 siblings, 2 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-25  4:34 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

On Tue, 25 Sep 2007, Balbir Singh wrote:

> > +
> > +	if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
> > +		return 0;
> 
> What's the consequence of this on the caller of zone_reclaim()?
> I see that the zone is marked as full and will not be re-examined
> again.
> 

It's only marked as full in the zonelist cache for the zonelist that 
__alloc_pages() was called with, which is an optimization.  The zone is 
already flagged as being in __zone_reclaim() so there's no need to 
reinvoke it for this allocation attempt; that behavior is unchanged from 
current behavior.

One thing that has been changed in -mm with regard to my last patchset is 
that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
concurrently.

ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
@ 2007-09-25  4:37         ` Balbir Singh
  2007-09-25  4:57           ` David Rientjes
  2007-09-26 20:06         ` Andrew Morton
  1 sibling, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-09-25  4:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

David Rientjes wrote:
>  /**
> + * Dumps the current memory state of all system tasks, excluding kernel threads.
> + * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj
> + * score, and name.
> + *
> + * Call with tasklist_lock read-locked.
> + */
> +static void dump_tasks(void)
> +{
> +	struct task_struct *g, *p;
> +
> +	printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
> +	       "name\n");
> +	do_each_thread(g, p) {
> +		/*
> +		 * total_vm and rss sizes do not exist for tasks with a
> +		 * detached mm so there's no need to report them.
> +		 */
> +		if (!p->mm)
> +			continue;
> +
> +		task_lock(p);
> +		printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d     %3d %s\n",
> +		       p->pid, p->uid, p->tgid, p->mm->total_vm,
> +		       get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj,
> +		       p->comm);
> +		task_unlock(p);
> +	} while_each_thread(g, p);
> +}
> +


I like this, but can we make this cgroup aware?
-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-25  4:37         ` Balbir Singh
@ 2007-09-25  4:57           ` David Rientjes
  0 siblings, 0 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-25  4:57 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

On Tue, 25 Sep 2007, Balbir Singh wrote:

> > +static void dump_tasks(void)
> > +{
> > +	struct task_struct *g, *p;
> > +
> > +	printk(KERN_INFO "[ pid ]   uid  tgid total_vm      rss cpu oom_adj "
> > +	       "name\n");
> > +	do_each_thread(g, p) {
> > +		/*
> > +		 * total_vm and rss sizes do not exist for tasks with a
> > +		 * detached mm so there's no need to report them.
> > +		 */
> > +		if (!p->mm)
> > +			continue;
> > +
> > +		task_lock(p);
> > +		printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d     %3d %s\n",
> > +		       p->pid, p->uid, p->tgid, p->mm->total_vm,
> > +		       get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj,
> > +		       p->comm);
> > +		task_unlock(p);
> > +	} while_each_thread(g, p);
> > +}
> > +
> 
> 
> I like this, but can we make this cgroup aware?

To do that it would be necessary to pass the struct mem_cgroup pointer to 
oom_kill_process(), pass it to dump_tasks(), and filter any tasks where
mm->mem_cgroup != mem.

We can fold that into any changes that are made for OOM killer cgroup 
serialization when the time comes.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25  4:34         ` David Rientjes
@ 2007-09-25  6:17           ` Balbir Singh
  2007-09-25  6:29             ` David Rientjes
  2007-09-25 21:15             ` Christoph Lameter
  2007-09-25 21:14           ` Christoph Lameter
  1 sibling, 2 replies; 26+ messages in thread
From: Balbir Singh @ 2007-09-25  6:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

David Rientjes wrote:
> On Tue, 25 Sep 2007, Balbir Singh wrote:
> 
>>> +
>>> +	if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
>>> +		return 0;
>> What's the consequence of this on the caller of zone_reclaim()?
>> I see that the zone is marked as full and will not be re-examined
>> again.
>>
> 
> It's only marked as full in the zonelist cache for the zonelist that 
> __alloc_pages() was called with, which is an optimization.  The zone is 
> already flagged as being in __zone_reclaim() so there's no need to 
> reinvoke it for this allocation attempt; that behavior is unchanged from 
> current behavior.
> 

OK

> One thing that has been changed in -mm with regard to my last patchset is 
> that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> concurrently.
> 

Aah.. interesting. Could you define concurrently more precisely,
concurrently as in the same zone or for different zones concurrently?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25  6:17           ` Balbir Singh
@ 2007-09-25  6:29             ` David Rientjes
  2007-09-25 21:15             ` Christoph Lameter
  1 sibling, 0 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-25  6:29 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm

On Tue, 25 Sep 2007, Balbir Singh wrote:

> > One thing that has been changed in -mm with regard to my last patchset is 
> > that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> > concurrently.
> > 
> 
> Aah.. interesting. Could you define concurrently more precisely,
> concurrently as in the same zone or for different zones concurrently?
> 

Same zone, thankfully.

Previous to my 9-patch series that serialized the OOM killer, there was an 
atomic_t reclaim_in_progress member of each struct zone that was 
incremented each time shrink_zone() was called and decremented each time 
it exited.  The only place where this was tested was in zone_reclaim() and 
it returned 0 to __alloc_pages() if it was non-zero prior to calling 
__zone_reclaim().

So other callers to shrink_zone(), such as kswapd and try_to_free_pages(), 
were still able to invoke it several times for the same zone but 
zone_reclaim() could not if the zone was being shrunk, regardless of where 
shrink_zone() was called from. 

That's partially still true: kswapd and try_to_free_pages() (and actually 
balance_pgdat()) can still call it several times, but the first call to 
zone_reclaim() will also succeed since we're now indicating zone reclaims 
with a flag instead of an accumulator.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25  4:34         ` David Rientjes
  2007-09-25  6:17           ` Balbir Singh
@ 2007-09-25 21:14           ` Christoph Lameter
  2007-09-25 21:17             ` David Rientjes
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-09-25 21:14 UTC (permalink / raw)
  To: David Rientjes; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm

On Mon, 24 Sep 2007, David Rientjes wrote:

> ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim().

ZONE_RECLAIM_LOCKED means that one zone reclaim is already running and 
other processes should not perform zone reclaim on the same zone. They 
will instead fall back to allocate memory from zones that are not local.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25  6:17           ` Balbir Singh
  2007-09-25  6:29             ` David Rientjes
@ 2007-09-25 21:15             ` Christoph Lameter
  2007-09-25 21:19               ` David Rientjes
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-09-25 21:15 UTC (permalink / raw)
  To: Balbir Singh; +Cc: David Rientjes, Andrew Morton, Andrea Arcangeli, linux-mm

On Tue, 25 Sep 2007, Balbir Singh wrote:

> > One thing that has been changed in -mm with regard to my last patchset is 
> > that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> > concurrently.
> > 
> 
> Aah.. interesting. Could you define concurrently more precisely,
> concurrently as in the same zone or for different zones concurrently?

There was no change. They were allowed to call shrink_zone concurrently 
before.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25 21:14           ` Christoph Lameter
@ 2007-09-25 21:17             ` David Rientjes
  0 siblings, 0 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-25 21:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm

On Tue, 25 Sep 2007, Christoph Lameter wrote:

> On Mon, 24 Sep 2007, David Rientjes wrote:
> 
> > ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim().
> 
> ZONE_RECLAIM_LOCKED means that one zone reclaim is already running and 
> other processes should not perform zone reclaim on the same zone. They 
> will instead fall back to allocate memory from zones that are not local.
> 

Yes, and that's still true.  But the point is that shrink_zone() can be 
called from different points (__zone_reclaim(), kswapd, 
try_to_free_pages(), balance_pgdat()) for a zone and it will not stop zone 
reclaim from being invoked on the same zone.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25 21:15             ` Christoph Lameter
@ 2007-09-25 21:19               ` David Rientjes
  2007-09-25 21:39                 ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-25 21:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm

On Tue, 25 Sep 2007, Christoph Lameter wrote:

> > > One thing that has been changed in -mm with regard to my last patchset is 
> > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> > > concurrently.
> > > 
> > 
> > Aah.. interesting. Could you define concurrently more precisely,
> > concurrently as in the same zone or for different zones concurrently?
> 
> There was no change. They were allowed to call shrink_zone concurrently 
> before.
> 

Yes, there was.  Before the patchset, zone reclaim would not be able to 
call shrink_zone() on a zone that it is already being invoked for, 
regardless of where it was previous invoked from.  Now it is.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25 21:19               ` David Rientjes
@ 2007-09-25 21:39                 ` Christoph Lameter
  2007-09-25 21:43                   ` David Rientjes
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2007-09-25 21:39 UTC (permalink / raw)
  To: David Rientjes; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm

On Tue, 25 Sep 2007, David Rientjes wrote:

> On Tue, 25 Sep 2007, Christoph Lameter wrote:
> 
> > > > One thing that has been changed in -mm with regard to my last patchset is 
> > > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> > > > concurrently.
> > > > 
> > > 
> > > Aah.. interesting. Could you define concurrently more precisely,
> > > concurrently as in the same zone or for different zones concurrently?
> > 
> > There was no change. They were allowed to call shrink_zone concurrently 
> > before.
> > 
> 
> Yes, there was.  Before the patchset, zone reclaim would not be able to 
> call shrink_zone() on a zone that it is already being invoked for, 
> regardless of where it was previous invoked from.  Now it is.

Right. I just wanted to make sure of this. The statement above 
seems to indicate that there was a change to the concurrency of kswapd 
and try_to_free_pages with one another.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim
  2007-09-25 21:39                 ` Christoph Lameter
@ 2007-09-25 21:43                   ` David Rientjes
  0 siblings, 0 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-25 21:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm

On Tue, 25 Sep 2007, Christoph Lameter wrote:

> > > > > One thing that has been changed in -mm with regard to my last patchset is 
> > > > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() 
> > > > > concurrently.
> > > > > 
> > > > 
> > > > Aah.. interesting. Could you define concurrently more precisely,
> > > > concurrently as in the same zone or for different zones concurrently?
> > > 
> > > There was no change. They were allowed to call shrink_zone concurrently 
> > > before.
> > > 
> > 
> > Yes, there was.  Before the patchset, zone reclaim would not be able to 
> > call shrink_zone() on a zone that it is already being invoked for, 
> > regardless of where it was previous invoked from.  Now it is.
> 
> Right. I just wanted to make sure of this. The statement above 
> seems to indicate that there was a change to the concurrency of kswapd 
> and try_to_free_pages with one another.
> 

Ah, I see, I was trying to say that zone reclaim can now be called 
concurrently with kswapd and try_to_free_pages() in shrink_zone().  Thanks 
for pointing that out, it was poorly worded.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
  2007-09-25  4:37         ` Balbir Singh
@ 2007-09-26 20:06         ` Andrew Morton
  2007-09-26 20:46           ` David Rientjes
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2007-09-26 20:06 UTC (permalink / raw)
  To: David Rientjes; +Cc: andrea, clameter, linux-mm

On Sat, 22 Sep 2007 10:47:13 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks
> (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu,
> oom_adj score, and name.
> 
> Helpful for determining why an OOM condition occurred and what rogue task
> caused it.
> 
> It is configurable so that large systems, such as those with several
> thousand tasks, do not incur a performance penalty associated with data
> they may not desire.
> 
> There currently do not appear to be any other generic kernel callers that
> dump all this information.  Perhaps in the future it will be worthwhile
> to construct a generic task dump interface based on passing a set of
> flags that specify what per-task information shall be shown.

It isn't obvious to me why this has "oom" in its name.  It is just a
general display-stuff-about-task-memory handler, isn't it?

Nor is it obvious why we need it at all.  This sort of information can
already be gathered from /proc/pid/whatever.  If the system is all wedged
and you can't get console control then this info dump doesn't provide you
with info which you're interested in anyway - you want to see the global
(or per-cgroup) info, not the per-task info.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-26 20:06         ` Andrew Morton
@ 2007-09-26 20:46           ` David Rientjes
  2007-09-26 21:47             ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: David Rientjes @ 2007-09-26 20:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: andrea, clameter, linux-mm

On Wed, 26 Sep 2007, Andrew Morton wrote:

> > Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks
> > (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu,
> > oom_adj score, and name.
> > 
> > Helpful for determining why an OOM condition occurred and what rogue task
> > caused it.
> > 
> > It is configurable so that large systems, such as those with several
> > thousand tasks, do not incur a performance penalty associated with data
> > they may not desire.
> > 
> > There currently do not appear to be any other generic kernel callers that
> > dump all this information.  Perhaps in the future it will be worthwhile
> > to construct a generic task dump interface based on passing a set of
> > flags that specify what per-task information shall be shown.
> 
> It isn't obvious to me why this has "oom" in its name.  It is just a
> general display-stuff-about-task-memory handler, isn't it?
> 

Yes.  When other subsystems have been converted to using it, probably with 
a callback filter function and flags to specify what traits to show for 
each task, it will be feasible to move it out of the OOM killer.  Until 
that happens, however, it can remain static and in oom_kill.c.

There's several places in the kernel where a tasklist is dumped but the 
information they dump are very different.  Any generic tasklist dumping 
interface will become complex just based on the number of possible traits 
to display.

> Nor is it obvious why we need it at all.  This sort of information can
> already be gathered from /proc/pid/whatever.  If the system is all wedged
> and you can't get console control then this info dump doesn't provide you
> with info which you're interested in anyway - you want to see the global
> (or per-cgroup) info, not the per-task info.
> 

It can be gathered by other means, yes, but not at the time of OOM nor 
immediately before a task is killed.  This tasklist dump is done very 
close to the OOM kill and it represents the per-task memory state, whether 
system or cgroup, that triggered that event.  This could be done other 
ways, for instance with an OOM userspace notifier, but that would delay 
the SIGKILL being sent.  So in the interest of a fast OOM killer, it's 
best to dump the information ourselves, if the user chose to enable that 
functionality.

The information should be displayed in a per-task manner because the 
global memory state doesn't really matter: we know we're OOM, because 
we're in the OOM killer.  Showing how little free memory we have isn't 
immediately helpful on a system-wide basis.  But oom_dump_tasks, the way 
I've written it, allows you to identify the "rogue" task that is using way 
more memory than expected and allows you to alter oom_adj scores in the 
case when the task you've identified, and the one you want dead, isn't the 
one that ends up being killed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-26 20:46           ` David Rientjes
@ 2007-09-26 21:47             ` Andrew Morton
  2007-09-27  6:15               ` David Rientjes
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2007-09-26 21:47 UTC (permalink / raw)
  To: David Rientjes; +Cc: andrea, clameter, linux-mm

On Wed, 26 Sep 2007 13:46:49 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> > Nor is it obvious why we need it at all.  This sort of information can
> > already be gathered from /proc/pid/whatever.  If the system is all wedged
> > and you can't get console control then this info dump doesn't provide you
> > with info which you're interested in anyway - you want to see the global
> > (or per-cgroup) info, not the per-task info.
> > 
> 
> It can be gathered by other means, yes, but not at the time of OOM nor 
> immediately before a task is killed.  This tasklist dump is done very 
> close to the OOM kill and it represents the per-task memory state, whether 
> system or cgroup, that triggered that event.

OK, that's useful.  But your changelog was completely wrong - it implies
that this sysctl _causes_ the dump, rather than stating that the sysctl
_enables_ an oom-kill-time dump:

  Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system
  tasks (excluding kernel threads) and their pid, uid, tgid, vm size, rss
  cpu, oom_adj score, and name.

  Helpful for determining why an OOM condition occurred and what rogue
  task caused it.

and silly me believed it.

And now I understand why it has "oom_' in the name.

I'd further suggest that the Documentation/sysctl/vm.txt buries the lede a
bit.  Better would be to start with "Enabling this causes a system-wide
task dump to be produced when the kernel performs an oom-killing ..."

>  This could be done other 
> ways, for instance with an OOM userspace notifier, but that would delay 
> the SIGKILL being sent.  So in the interest of a fast OOM killer, it's 
> best to dump the information ourselves, if the user chose to enable that 
> functionality.
> 
> The information should be displayed in a per-task manner because the 
> global memory state doesn't really matter: we know we're OOM, because 
> we're in the OOM killer.  Showing how little free memory we have isn't 
> immediately helpful on a system-wide basis.  But oom_dump_tasks, the way 
> I've written it, allows you to identify the "rogue" task that is using way 
> more memory than expected and allows you to alter oom_adj scores in the 
> case when the task you've identified, and the one you want dead, isn't the 
> one that ends up being killed.

Well I get it now ;)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state
  2007-09-26 21:47             ` Andrew Morton
@ 2007-09-27  6:15               ` David Rientjes
  0 siblings, 0 replies; 26+ messages in thread
From: David Rientjes @ 2007-09-27  6:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: andrea, clameter, linux-mm

On Wed, 26 Sep 2007, Andrew Morton wrote:

> > It can be gathered by other means, yes, but not at the time of OOM nor 
> > immediately before a task is killed.  This tasklist dump is done very 
> > close to the OOM kill and it represents the per-task memory state, whether 
> > system or cgroup, that triggered that event.
> 
> OK, that's useful.  But your changelog was completely wrong - it implies
> that this sysctl _causes_ the dump, rather than stating that the sysctl
> _enables_ an oom-kill-time dump:
> 

Oops, that does sound confusing.  I'll reword the changelog and the 
addition to Documentation/sysctl/vm.txt so it's not ambiguous.

Sorry for the confusion.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-09-27  6:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-22 17:47 [patch -mm 1/5] oom: prevent including sched.h in header file David Rientjes
2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes
2007-09-22 17:47   ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes
2007-09-22 17:47     ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes
2007-09-22 17:47       ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes
2007-09-25  4:37         ` Balbir Singh
2007-09-25  4:57           ` David Rientjes
2007-09-26 20:06         ` Andrew Morton
2007-09-26 20:46           ` David Rientjes
2007-09-26 21:47             ` Andrew Morton
2007-09-27  6:15               ` David Rientjes
2007-09-24 19:05       ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter
2007-09-24 19:14         ` David Rientjes
2007-09-24 20:11           ` Christoph Lameter
2007-09-24 21:02           ` David Rientjes
2007-09-24 21:09             ` Christoph Lameter
2007-09-25  4:26       ` Balbir Singh
2007-09-25  4:34         ` David Rientjes
2007-09-25  6:17           ` Balbir Singh
2007-09-25  6:29             ` David Rientjes
2007-09-25 21:15             ` Christoph Lameter
2007-09-25 21:19               ` David Rientjes
2007-09-25 21:39                 ` Christoph Lameter
2007-09-25 21:43                   ` David Rientjes
2007-09-25 21:14           ` Christoph Lameter
2007-09-25 21:17             ` David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox