* [patch -mm 1/5] oom: prevent including sched.h in header file @ 2007-09-22 17:47 David Rientjes 2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Christoph Lameter, Alexey Dobriyan, linux-mm It's not necessary to include all of linux/sched.h in linux/oom.h. Instead, simply include prototypes for the relevant structs and include linux/types.h for gfp_t. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> --- include/linux/oom.h | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -1,8 +1,6 @@ #ifndef __INCLUDE_LINUX_OOM_H #define __INCLUDE_LINUX_OOM_H -#include <linux/sched.h> - /* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */ #define OOM_DISABLE (-17) /* inclusive */ @@ -11,6 +9,11 @@ #ifdef __KERNEL__ +#include <linux/types.h> + +struct zonelist; +struct notifier_block; + /* * Types of limitations to the nodes from which allocations may occur */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch -mm 2/5] oom: add header file to Kbuild as unifdef 2007-09-22 17:47 [patch -mm 1/5] oom: prevent including sched.h in header file David Rientjes @ 2007-09-22 17:47 ` David Rientjes 2007-09-22 17:47 ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Christoph Lameter, Alexey Dobriyan, linux-mm Preprocess include/linux/oom.h before exporting it to userspace. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> --- include/linux/Kbuild | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/Kbuild b/include/linux/Kbuild --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -286,6 +286,7 @@ unifdef-y += nfs_idmap.h unifdef-y += n_r3964.h unifdef-y += nubus.h unifdef-y += nvram.h +unifdef-y += oom.h unifdef-y += parport.h unifdef-y += patchkey.h unifdef-y += pci.h -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock 2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes @ 2007-09-22 17:47 ` David Rientjes 2007-09-22 17:47 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw) To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm There's no reason to sleep in try_set_zone_oom() or clear_zonelist_oom() if the lock can't be acquired; it will be available soon enough once the zonelist scanning is done. All other threads waiting for the OOM killer are also contingent on the exiting task being able to acquire the lock in clear_zonelist_oom() so it doesn't make sense to put it to sleep. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> --- mm/oom_kill.c | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -28,7 +28,7 @@ int sysctl_panic_on_oom; int sysctl_oom_kill_allocating_task; -static DEFINE_MUTEX(zone_scan_mutex); +static DEFINE_SPINLOCK(zone_scan_mutex); /* #define DEBUG */ /** @@ -403,7 +403,7 @@ int try_set_zone_oom(struct zonelist *zonelist) z = zonelist->zones; - mutex_lock(&zone_scan_mutex); + spin_lock(&zone_scan_mutex); do { if (zone_is_oom_locked(*z)) { ret = 0; @@ -420,7 +420,7 @@ int try_set_zone_oom(struct zonelist *zonelist) zone_set_flag(*z, ZONE_OOM_LOCKED); } while (*(++z) != NULL); out: - mutex_unlock(&zone_scan_mutex); + spin_unlock(&zone_scan_mutex); return ret; } @@ -435,11 +435,11 @@ void clear_zonelist_oom(struct zonelist *zonelist) z = zonelist->zones; - mutex_lock(&zone_scan_mutex); + spin_lock(&zone_scan_mutex); do { zone_clear_flag(*z, ZONE_OOM_LOCKED); } while (*(++z) != NULL); - mutex_unlock(&zone_scan_mutex); + spin_unlock(&zone_scan_mutex); } /** -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-22 17:47 ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes @ 2007-09-22 17:47 ` David Rientjes 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw) To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm Introduces new zone flag interface for testing and setting flags: int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) Instead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is called, this flag is test and set before starting zone reclaim. Zone reclaim starts in __alloc_pages() when a zone's watermark fails and the system is in zone_reclaim_mode. If it's already in reclaim, there's no need to start again so it is simply considered full for that allocation attempt. There is a change of behavior with regard to concurrent zone shrinking. It is now possible for try_to_free_pages() or kswapd to already be shrinking a particular zone when __alloc_pages() starts zone reclaim. In this case, it is possible for two concurrent threads to invoke shrink_zone() for a single zone. This change forbids a zone to be in zone reclaim twice, which was always the behavior, but allows for concurrent try_to_free_pages() or kswapd shrinking when starting zone reclaim. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> --- include/linux/mmzone.h | 4 ++++ mm/vmscan.c | 23 +++++++++++++---------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) { set_bit(flag, &zone->flags); } +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) +{ + return test_and_set_bit(flag, &zone->flags); +} static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag) { clear_bit(flag, &zone->flags); diff --git a/mm/vmscan.c b/mm/vmscan.c --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1067,8 +1067,6 @@ static unsigned long shrink_zone(int priority, struct zone *zone, unsigned long nr_to_scan; unsigned long nr_reclaimed = 0; - zone_set_flag(zone, ZONE_RECLAIM_LOCKED); - /* * Add one to `nr_to_scan' just to make sure that the kernel will * slowly sift through the active list. @@ -1107,8 +1105,6 @@ static unsigned long shrink_zone(int priority, struct zone *zone, } throttle_vm_writeout(sc->gfp_mask); - - zone_clear_flag(zone, ZONE_RECLAIM_LOCKED); return nr_reclaimed; } @@ -1852,6 +1848,7 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) { cpumask_t mask; int node_id; + int ret; /* * Zone reclaim reclaims unmapped file backed pages and @@ -1869,13 +1866,13 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) <= zone->min_slab_pages) return 0; + if (zone_is_all_unreclaimable(zone)) + return 0; + /* - * Avoid concurrent zone reclaims, do not reclaim in a zone that does - * not have reclaimable pages and if we should not delay the allocation - * then do not scan. + * Do not scan if the allocation should not be delayed. */ - if (!(gfp_mask & __GFP_WAIT) || zone_is_all_unreclaimable(zone) || - zone_is_reclaim_locked(zone) || (current->flags & PF_MEMALLOC)) + if (!(gfp_mask & __GFP_WAIT) || (current->flags & PF_MEMALLOC)) return 0; /* @@ -1888,6 +1885,12 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) mask = node_to_cpumask(node_id); if (!cpus_empty(mask) && node_id != numa_node_id()) return 0; - return __zone_reclaim(zone, gfp_mask, order); + + if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED)) + return 0; + ret = __zone_reclaim(zone, gfp_mask, order); + zone_clear_flag(zone, ZONE_RECLAIM_LOCKED); + + return ret; } #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-22 17:47 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes @ 2007-09-22 17:47 ` David Rientjes 2007-09-25 4:37 ` Balbir Singh 2007-09-26 20:06 ` Andrew Morton 2007-09-24 19:05 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter 2007-09-25 4:26 ` Balbir Singh 2 siblings, 2 replies; 26+ messages in thread From: David Rientjes @ 2007-09-22 17:47 UTC (permalink / raw) To: Andrew Morton; +Cc: Andrea Arcangeli, Christoph Lameter, linux-mm Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu, oom_adj score, and name. Helpful for determining why an OOM condition occurred and what rogue task caused it. It is configurable so that large systems, such as those with several thousand tasks, do not incur a performance penalty associated with data they may not desire. There currently do not appear to be any other generic kernel callers that dump all this information. Perhaps in the future it will be worthwhile to construct a generic task dump interface based on passing a set of flags that specify what per-task information shall be shown. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> --- Documentation/sysctl/vm.txt | 21 +++++++++++++++++++++ kernel/sysctl.c | 9 +++++++++ mm/oom_kill.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 0 deletions(-) diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm: - min_unmapped_ratio - min_slab_ratio - panic_on_oom +- oom_dump_tasks - oom_kill_allocating_task - mmap_min_address - numa_zonelist_order @@ -223,6 +224,26 @@ according to your policy of failover. ============================================================= +oom_dump_tasks + +This enables a system-wide task dump (excluding kernel threads) that +includes such information as pid, uid, tgid, vm size, rss, cpu, +oom_adj score, and name. This is helpful to determine why the OOM +killer was invoked and to identify the rogue task that caused it. + +If this is set to zero, this information is suppressed. On very +large systems with thousands of tasks it may not be feasible to dump +the memory state information for each one. Such systems should not +be forced to incur a performance penalty in OOM conditions when the +information may not be desired. + +If this is set to non-zero, this information is shown whenever the +OOM killer actually kills a memory-hogging task. + +The default value is 0. + +============================================================= + oom_kill_allocating_task This enables or disables killing the OOM-triggering task in diff --git a/kernel/sysctl.c b/kernel/sysctl.c --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -64,6 +64,7 @@ extern int sysctl_overcommit_memory; extern int sysctl_overcommit_ratio; extern int sysctl_panic_on_oom; extern int sysctl_oom_kill_allocating_task; +extern int sysctl_oom_dump_tasks; extern int max_threads; extern int core_uses_pid; extern int suid_dumpable; @@ -807,6 +808,14 @@ static ctl_table vm_table[] = { .proc_handler = &proc_dointvec, }, { + .ctl_name = CTL_UNNUMBERED, + .procname = "oom_dump_tasks", + .data = &sysctl_oom_dump_tasks, + .maxlen = sizeof(sysctl_oom_dump_tasks), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, + { .ctl_name = VM_OVERCOMMIT_RATIO, .procname = "overcommit_ratio", .data = &sysctl_overcommit_ratio, diff --git a/mm/oom_kill.c b/mm/oom_kill.c --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -28,6 +28,7 @@ int sysctl_panic_on_oom; int sysctl_oom_kill_allocating_task; +int sysctl_oom_dump_tasks; static DEFINE_SPINLOCK(zone_scan_mutex); /* #define DEBUG */ @@ -266,6 +267,36 @@ static struct task_struct *select_bad_process(unsigned long *ppoints) } /** + * Dumps the current memory state of all system tasks, excluding kernel threads. + * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj + * score, and name. + * + * Call with tasklist_lock read-locked. + */ +static void dump_tasks(void) +{ + struct task_struct *g, *p; + + printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj " + "name\n"); + do_each_thread(g, p) { + /* + * total_vm and rss sizes do not exist for tasks with a + * detached mm so there's no need to report them. + */ + if (!p->mm) + continue; + + task_lock(p); + printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n", + p->pid, p->uid, p->tgid, p->mm->total_vm, + get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj, + p->comm); + task_unlock(p); + } while_each_thread(g, p); +} + +/** * Send SIGKILL to the selected process irrespective of CAP_SYS_RAW_IO * flag though it's unlikely that we select a process with CAP_SYS_RAW_IO * set. @@ -352,6 +383,8 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, current->comm, gfp_mask, order, current->oomkilladj); dump_stack(); show_mem(); + if (sysctl_oom_dump_tasks) + dump_tasks(); } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes @ 2007-09-25 4:37 ` Balbir Singh 2007-09-25 4:57 ` David Rientjes 2007-09-26 20:06 ` Andrew Morton 1 sibling, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-09-25 4:37 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm David Rientjes wrote: > /** > + * Dumps the current memory state of all system tasks, excluding kernel threads. > + * State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj > + * score, and name. > + * > + * Call with tasklist_lock read-locked. > + */ > +static void dump_tasks(void) > +{ > + struct task_struct *g, *p; > + > + printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj " > + "name\n"); > + do_each_thread(g, p) { > + /* > + * total_vm and rss sizes do not exist for tasks with a > + * detached mm so there's no need to report them. > + */ > + if (!p->mm) > + continue; > + > + task_lock(p); > + printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n", > + p->pid, p->uid, p->tgid, p->mm->total_vm, > + get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj, > + p->comm); > + task_unlock(p); > + } while_each_thread(g, p); > +} > + I like this, but can we make this cgroup aware? -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-25 4:37 ` Balbir Singh @ 2007-09-25 4:57 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2007-09-25 4:57 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm On Tue, 25 Sep 2007, Balbir Singh wrote: > > +static void dump_tasks(void) > > +{ > > + struct task_struct *g, *p; > > + > > + printk(KERN_INFO "[ pid ] uid tgid total_vm rss cpu oom_adj " > > + "name\n"); > > + do_each_thread(g, p) { > > + /* > > + * total_vm and rss sizes do not exist for tasks with a > > + * detached mm so there's no need to report them. > > + */ > > + if (!p->mm) > > + continue; > > + > > + task_lock(p); > > + printk(KERN_INFO "[%5d] %5d %5d %8lu %8lu %3d %3d %s\n", > > + p->pid, p->uid, p->tgid, p->mm->total_vm, > > + get_mm_rss(p->mm), (int)task_cpu(p), p->oomkilladj, > > + p->comm); > > + task_unlock(p); > > + } while_each_thread(g, p); > > +} > > + > > > I like this, but can we make this cgroup aware? To do that it would be necessary to pass the struct mem_cgroup pointer to oom_kill_process(), pass it to dump_tasks(), and filter any tasks where mm->mem_cgroup != mem. We can fold that into any changes that are made for OOM killer cgroup serialization when the time comes. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes 2007-09-25 4:37 ` Balbir Singh @ 2007-09-26 20:06 ` Andrew Morton 2007-09-26 20:46 ` David Rientjes 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2007-09-26 20:06 UTC (permalink / raw) To: David Rientjes; +Cc: andrea, clameter, linux-mm On Sat, 22 Sep 2007 10:47:13 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks > (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu, > oom_adj score, and name. > > Helpful for determining why an OOM condition occurred and what rogue task > caused it. > > It is configurable so that large systems, such as those with several > thousand tasks, do not incur a performance penalty associated with data > they may not desire. > > There currently do not appear to be any other generic kernel callers that > dump all this information. Perhaps in the future it will be worthwhile > to construct a generic task dump interface based on passing a set of > flags that specify what per-task information shall be shown. It isn't obvious to me why this has "oom" in its name. It is just a general display-stuff-about-task-memory handler, isn't it? Nor is it obvious why we need it at all. This sort of information can already be gathered from /proc/pid/whatever. If the system is all wedged and you can't get console control then this info dump doesn't provide you with info which you're interested in anyway - you want to see the global (or per-cgroup) info, not the per-task info. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-26 20:06 ` Andrew Morton @ 2007-09-26 20:46 ` David Rientjes 2007-09-26 21:47 ` Andrew Morton 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-26 20:46 UTC (permalink / raw) To: Andrew Morton; +Cc: andrea, clameter, linux-mm On Wed, 26 Sep 2007, Andrew Morton wrote: > > Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks > > (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu, > > oom_adj score, and name. > > > > Helpful for determining why an OOM condition occurred and what rogue task > > caused it. > > > > It is configurable so that large systems, such as those with several > > thousand tasks, do not incur a performance penalty associated with data > > they may not desire. > > > > There currently do not appear to be any other generic kernel callers that > > dump all this information. Perhaps in the future it will be worthwhile > > to construct a generic task dump interface based on passing a set of > > flags that specify what per-task information shall be shown. > > It isn't obvious to me why this has "oom" in its name. It is just a > general display-stuff-about-task-memory handler, isn't it? > Yes. When other subsystems have been converted to using it, probably with a callback filter function and flags to specify what traits to show for each task, it will be feasible to move it out of the OOM killer. Until that happens, however, it can remain static and in oom_kill.c. There's several places in the kernel where a tasklist is dumped but the information they dump are very different. Any generic tasklist dumping interface will become complex just based on the number of possible traits to display. > Nor is it obvious why we need it at all. This sort of information can > already be gathered from /proc/pid/whatever. If the system is all wedged > and you can't get console control then this info dump doesn't provide you > with info which you're interested in anyway - you want to see the global > (or per-cgroup) info, not the per-task info. > It can be gathered by other means, yes, but not at the time of OOM nor immediately before a task is killed. This tasklist dump is done very close to the OOM kill and it represents the per-task memory state, whether system or cgroup, that triggered that event. This could be done other ways, for instance with an OOM userspace notifier, but that would delay the SIGKILL being sent. So in the interest of a fast OOM killer, it's best to dump the information ourselves, if the user chose to enable that functionality. The information should be displayed in a per-task manner because the global memory state doesn't really matter: we know we're OOM, because we're in the OOM killer. Showing how little free memory we have isn't immediately helpful on a system-wide basis. But oom_dump_tasks, the way I've written it, allows you to identify the "rogue" task that is using way more memory than expected and allows you to alter oom_adj scores in the case when the task you've identified, and the one you want dead, isn't the one that ends up being killed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-26 20:46 ` David Rientjes @ 2007-09-26 21:47 ` Andrew Morton 2007-09-27 6:15 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2007-09-26 21:47 UTC (permalink / raw) To: David Rientjes; +Cc: andrea, clameter, linux-mm On Wed, 26 Sep 2007 13:46:49 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > > Nor is it obvious why we need it at all. This sort of information can > > already be gathered from /proc/pid/whatever. If the system is all wedged > > and you can't get console control then this info dump doesn't provide you > > with info which you're interested in anyway - you want to see the global > > (or per-cgroup) info, not the per-task info. > > > > It can be gathered by other means, yes, but not at the time of OOM nor > immediately before a task is killed. This tasklist dump is done very > close to the OOM kill and it represents the per-task memory state, whether > system or cgroup, that triggered that event. OK, that's useful. But your changelog was completely wrong - it implies that this sysctl _causes_ the dump, rather than stating that the sysctl _enables_ an oom-kill-time dump: Adds a new sysctl, 'oom_dump_tasks', that dumps a list of all system tasks (excluding kernel threads) and their pid, uid, tgid, vm size, rss cpu, oom_adj score, and name. Helpful for determining why an OOM condition occurred and what rogue task caused it. and silly me believed it. And now I understand why it has "oom_' in the name. I'd further suggest that the Documentation/sysctl/vm.txt buries the lede a bit. Better would be to start with "Enabling this causes a system-wide task dump to be produced when the kernel performs an oom-killing ..." > This could be done other > ways, for instance with an OOM userspace notifier, but that would delay > the SIGKILL being sent. So in the interest of a fast OOM killer, it's > best to dump the information ourselves, if the user chose to enable that > functionality. > > The information should be displayed in a per-task manner because the > global memory state doesn't really matter: we know we're OOM, because > we're in the OOM killer. Showing how little free memory we have isn't > immediately helpful on a system-wide basis. But oom_dump_tasks, the way > I've written it, allows you to identify the "rogue" task that is using way > more memory than expected and allows you to alter oom_adj scores in the > case when the task you've identified, and the one you want dead, isn't the > one that ends up being killed. Well I get it now ;) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 5/5] oom: add sysctl to dump tasks memory state 2007-09-26 21:47 ` Andrew Morton @ 2007-09-27 6:15 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2007-09-27 6:15 UTC (permalink / raw) To: Andrew Morton; +Cc: andrea, clameter, linux-mm On Wed, 26 Sep 2007, Andrew Morton wrote: > > It can be gathered by other means, yes, but not at the time of OOM nor > > immediately before a task is killed. This tasklist dump is done very > > close to the OOM kill and it represents the per-task memory state, whether > > system or cgroup, that triggered that event. > > OK, that's useful. But your changelog was completely wrong - it implies > that this sysctl _causes_ the dump, rather than stating that the sysctl > _enables_ an oom-kill-time dump: > Oops, that does sound confusing. I'll reword the changelog and the addition to Documentation/sysctl/vm.txt so it's not ambiguous. Sorry for the confusion. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-22 17:47 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes @ 2007-09-24 19:05 ` Christoph Lameter 2007-09-24 19:14 ` David Rientjes 2007-09-25 4:26 ` Balbir Singh 2 siblings, 1 reply; 26+ messages in thread From: Christoph Lameter @ 2007-09-24 19:05 UTC (permalink / raw) To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm On Sat, 22 Sep 2007, David Rientjes wrote: > +++ b/include/linux/mmzone.h > @@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) > { > set_bit(flag, &zone->flags); > } > +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) > +{ > + return test_and_set_bit(flag, &zone->flags); > +} Missing blank line. > static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag) > { > clear_bit(flag, &zone->flags); > diff --git a/mm/vmscan.c b/mm/vmscan.c The rest looks fine. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-24 19:05 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter @ 2007-09-24 19:14 ` David Rientjes 2007-09-24 20:11 ` Christoph Lameter 2007-09-24 21:02 ` David Rientjes 0 siblings, 2 replies; 26+ messages in thread From: David Rientjes @ 2007-09-24 19:14 UTC (permalink / raw) To: Christoph Lameter; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm On Mon, 24 Sep 2007, Christoph Lameter wrote: > > +++ b/include/linux/mmzone.h > > @@ -320,6 +320,10 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) > > { > > set_bit(flag, &zone->flags); > > } > > +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) > > +{ > > + return test_and_set_bit(flag, &zone->flags); > > +} > > Missing blank line. > The only blank line for inlined functions added to mmzone.h for zone flag support is between the generic flavors that set, test and set, or clear the flags and the explicit flavors that test specific bits; so this newline behavior is correct as written. I was hoping to avoid doing things like #define ZoneSetReclaimLocked(zone) zone_set_flag((zone), \ ZONE_RECLAIM_LOCKED) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-24 19:14 ` David Rientjes @ 2007-09-24 20:11 ` Christoph Lameter 2007-09-24 21:02 ` David Rientjes 1 sibling, 0 replies; 26+ messages in thread From: Christoph Lameter @ 2007-09-24 20:11 UTC (permalink / raw) To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm On Mon, 24 Sep 2007, David Rientjes wrote: > > > +static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) > > > +{ > > > + return test_and_set_bit(flag, &zone->flags); > > > +} > > > > Missing blank line. > > > > The only blank line for inlined functions added to mmzone.h for zone > flag support is between the generic flavors that set, test and set, or > clear the flags and the explicit flavors that test specific bits; so this > newline behavior is correct as written. > > I was hoping to avoid doing things like > > #define ZoneSetReclaimLocked(zone) zone_set_flag((zone), \ > ZONE_RECLAIM_LOCKED) Not sure what the #define is supposed to tell me. Please add the corresponding blank lines at the end of functions. One function seems to be running into the next. It should look like the rest of mmzone.h: static inline struct page *__section_mem_map_addr(struct mem_section *section) { unsigned long map = section->section_mem_map; map &= SECTION_MAP_MASK; return (struct page *)map; } static inline int valid_section(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_MARKED_PRESENT)); } static inline int section_has_mem_map(struct mem_section *section) { return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP)); } static inline int valid_section_nr(unsigned long nr) { return valid_section(__nr_to_section(nr)); } static inline struct mem_section *__pfn_to_section(unsigned long pfn) { return __nr_to_section(pfn_to_section_nr(pfn)); } static inline int pfn_valid(unsigned long pfn) { if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); } Note that there is a empty line at the end of each function. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-24 19:14 ` David Rientjes 2007-09-24 20:11 ` Christoph Lameter @ 2007-09-24 21:02 ` David Rientjes 2007-09-24 21:09 ` Christoph Lameter 1 sibling, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-24 21:02 UTC (permalink / raw) To: Christoph Lameter; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm Add newlines between new zone flag tester/modifier functions. Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> --- include/linux/mmzone.h | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -320,10 +320,12 @@ static inline void zone_set_flag(struct zone *zone, zone_flags_t flag) { set_bit(flag, &zone->flags); } + static inline int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) { return test_and_set_bit(flag, &zone->flags); } + static inline void zone_clear_flag(struct zone *zone, zone_flags_t flag) { clear_bit(flag, &zone->flags); @@ -333,10 +335,12 @@ static inline int zone_is_all_unreclaimable(const struct zone *zone) { return test_bit(ZONE_ALL_UNRECLAIMABLE, &zone->flags); } + static inline int zone_is_reclaim_locked(const struct zone *zone) { return test_bit(ZONE_RECLAIM_LOCKED, &zone->flags); } + static inline int zone_is_oom_locked(const struct zone *zone) { return test_bit(ZONE_OOM_LOCKED, &zone->flags); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-24 21:02 ` David Rientjes @ 2007-09-24 21:09 ` Christoph Lameter 0 siblings, 0 replies; 26+ messages in thread From: Christoph Lameter @ 2007-09-24 21:09 UTC (permalink / raw) To: David Rientjes; +Cc: Andrew Morton, Andrea Arcangeli, linux-mm On Mon, 24 Sep 2007, David Rientjes wrote: > Add newlines between new zone flag tester/modifier functions. Acked-by: Christoph Lameter <clameter@sgi.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-22 17:47 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes 2007-09-24 19:05 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter @ 2007-09-25 4:26 ` Balbir Singh 2007-09-25 4:34 ` David Rientjes 2 siblings, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-09-25 4:26 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm David Rientjes wrote: > + > + if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED)) > + return 0; What's the consequence of this on the caller of zone_reclaim()? I see that the zone is marked as full and will not be re-examined again. Am I missing something? > + ret = __zone_reclaim(zone, gfp_mask, order); > + zone_clear_flag(zone, ZONE_RECLAIM_LOCKED); > + > + return ret; > } > #endif -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 4:26 ` Balbir Singh @ 2007-09-25 4:34 ` David Rientjes 2007-09-25 6:17 ` Balbir Singh 2007-09-25 21:14 ` Christoph Lameter 0 siblings, 2 replies; 26+ messages in thread From: David Rientjes @ 2007-09-25 4:34 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm On Tue, 25 Sep 2007, Balbir Singh wrote: > > + > > + if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED)) > > + return 0; > > What's the consequence of this on the caller of zone_reclaim()? > I see that the zone is marked as full and will not be re-examined > again. > It's only marked as full in the zonelist cache for the zonelist that __alloc_pages() was called with, which is an optimization. The zone is already flagged as being in __zone_reclaim() so there's no need to reinvoke it for this allocation attempt; that behavior is unchanged from current behavior. One thing that has been changed in -mm with regard to my last patchset is that kswapd and try_to_free_pages() are allowed to call shrink_zone() concurrently. ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim(). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 4:34 ` David Rientjes @ 2007-09-25 6:17 ` Balbir Singh 2007-09-25 6:29 ` David Rientjes 2007-09-25 21:15 ` Christoph Lameter 2007-09-25 21:14 ` Christoph Lameter 1 sibling, 2 replies; 26+ messages in thread From: Balbir Singh @ 2007-09-25 6:17 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm David Rientjes wrote: > On Tue, 25 Sep 2007, Balbir Singh wrote: > >>> + >>> + if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED)) >>> + return 0; >> What's the consequence of this on the caller of zone_reclaim()? >> I see that the zone is marked as full and will not be re-examined >> again. >> > > It's only marked as full in the zonelist cache for the zonelist that > __alloc_pages() was called with, which is an optimization. The zone is > already flagged as being in __zone_reclaim() so there's no need to > reinvoke it for this allocation attempt; that behavior is unchanged from > current behavior. > OK > One thing that has been changed in -mm with regard to my last patchset is > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > concurrently. > Aah.. interesting. Could you define concurrently more precisely, concurrently as in the same zone or for different zones concurrently? -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 6:17 ` Balbir Singh @ 2007-09-25 6:29 ` David Rientjes 2007-09-25 21:15 ` Christoph Lameter 1 sibling, 0 replies; 26+ messages in thread From: David Rientjes @ 2007-09-25 6:29 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Andrea Arcangeli, Christoph Lameter, linux-mm On Tue, 25 Sep 2007, Balbir Singh wrote: > > One thing that has been changed in -mm with regard to my last patchset is > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > > concurrently. > > > > Aah.. interesting. Could you define concurrently more precisely, > concurrently as in the same zone or for different zones concurrently? > Same zone, thankfully. Previous to my 9-patch series that serialized the OOM killer, there was an atomic_t reclaim_in_progress member of each struct zone that was incremented each time shrink_zone() was called and decremented each time it exited. The only place where this was tested was in zone_reclaim() and it returned 0 to __alloc_pages() if it was non-zero prior to calling __zone_reclaim(). So other callers to shrink_zone(), such as kswapd and try_to_free_pages(), were still able to invoke it several times for the same zone but zone_reclaim() could not if the zone was being shrunk, regardless of where shrink_zone() was called from. That's partially still true: kswapd and try_to_free_pages() (and actually balance_pgdat()) can still call it several times, but the first call to zone_reclaim() will also succeed since we're now indicating zone reclaims with a flag instead of an accumulator. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 6:17 ` Balbir Singh 2007-09-25 6:29 ` David Rientjes @ 2007-09-25 21:15 ` Christoph Lameter 2007-09-25 21:19 ` David Rientjes 1 sibling, 1 reply; 26+ messages in thread From: Christoph Lameter @ 2007-09-25 21:15 UTC (permalink / raw) To: Balbir Singh; +Cc: David Rientjes, Andrew Morton, Andrea Arcangeli, linux-mm On Tue, 25 Sep 2007, Balbir Singh wrote: > > One thing that has been changed in -mm with regard to my last patchset is > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > > concurrently. > > > > Aah.. interesting. Could you define concurrently more precisely, > concurrently as in the same zone or for different zones concurrently? There was no change. They were allowed to call shrink_zone concurrently before. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 21:15 ` Christoph Lameter @ 2007-09-25 21:19 ` David Rientjes 2007-09-25 21:39 ` Christoph Lameter 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2007-09-25 21:19 UTC (permalink / raw) To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm On Tue, 25 Sep 2007, Christoph Lameter wrote: > > > One thing that has been changed in -mm with regard to my last patchset is > > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > > > concurrently. > > > > > > > Aah.. interesting. Could you define concurrently more precisely, > > concurrently as in the same zone or for different zones concurrently? > > There was no change. They were allowed to call shrink_zone concurrently > before. > Yes, there was. Before the patchset, zone reclaim would not be able to call shrink_zone() on a zone that it is already being invoked for, regardless of where it was previous invoked from. Now it is. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 21:19 ` David Rientjes @ 2007-09-25 21:39 ` Christoph Lameter 2007-09-25 21:43 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Christoph Lameter @ 2007-09-25 21:39 UTC (permalink / raw) To: David Rientjes; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm On Tue, 25 Sep 2007, David Rientjes wrote: > On Tue, 25 Sep 2007, Christoph Lameter wrote: > > > > > One thing that has been changed in -mm with regard to my last patchset is > > > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > > > > concurrently. > > > > > > > > > > Aah.. interesting. Could you define concurrently more precisely, > > > concurrently as in the same zone or for different zones concurrently? > > > > There was no change. They were allowed to call shrink_zone concurrently > > before. > > > > Yes, there was. Before the patchset, zone reclaim would not be able to > call shrink_zone() on a zone that it is already being invoked for, > regardless of where it was previous invoked from. Now it is. Right. I just wanted to make sure of this. The statement above seems to indicate that there was a change to the concurrency of kswapd and try_to_free_pages with one another. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 21:39 ` Christoph Lameter @ 2007-09-25 21:43 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2007-09-25 21:43 UTC (permalink / raw) To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm On Tue, 25 Sep 2007, Christoph Lameter wrote: > > > > > One thing that has been changed in -mm with regard to my last patchset is > > > > > that kswapd and try_to_free_pages() are allowed to call shrink_zone() > > > > > concurrently. > > > > > > > > > > > > > Aah.. interesting. Could you define concurrently more precisely, > > > > concurrently as in the same zone or for different zones concurrently? > > > > > > There was no change. They were allowed to call shrink_zone concurrently > > > before. > > > > > > > Yes, there was. Before the patchset, zone reclaim would not be able to > > call shrink_zone() on a zone that it is already being invoked for, > > regardless of where it was previous invoked from. Now it is. > > Right. I just wanted to make sure of this. The statement above > seems to indicate that there was a change to the concurrency of kswapd > and try_to_free_pages with one another. > Ah, I see, I was trying to say that zone reclaim can now be called concurrently with kswapd and try_to_free_pages() in shrink_zone(). Thanks for pointing that out, it was poorly worded. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 4:34 ` David Rientjes 2007-09-25 6:17 ` Balbir Singh @ 2007-09-25 21:14 ` Christoph Lameter 2007-09-25 21:17 ` David Rientjes 1 sibling, 1 reply; 26+ messages in thread From: Christoph Lameter @ 2007-09-25 21:14 UTC (permalink / raw) To: David Rientjes; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm On Mon, 24 Sep 2007, David Rientjes wrote: > ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim(). ZONE_RECLAIM_LOCKED means that one zone reclaim is already running and other processes should not perform zone reclaim on the same zone. They will instead fall back to allocate memory from zones that are not local. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim 2007-09-25 21:14 ` Christoph Lameter @ 2007-09-25 21:17 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2007-09-25 21:17 UTC (permalink / raw) To: Christoph Lameter; +Cc: Balbir Singh, Andrew Morton, Andrea Arcangeli, linux-mm On Tue, 25 Sep 2007, Christoph Lameter wrote: > On Mon, 24 Sep 2007, David Rientjes wrote: > > > ZONE_RECLAIM_LOCKED will be cleared upon return from __zone_reclaim(). > > ZONE_RECLAIM_LOCKED means that one zone reclaim is already running and > other processes should not perform zone reclaim on the same zone. They > will instead fall back to allocate memory from zones that are not local. > Yes, and that's still true. But the point is that shrink_zone() can be called from different points (__zone_reclaim(), kswapd, try_to_free_pages(), balance_pgdat()) for a zone and it will not stop zone reclaim from being invoked on the same zone. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2007-09-27 6:15 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-09-22 17:47 [patch -mm 1/5] oom: prevent including sched.h in header file David Rientjes 2007-09-22 17:47 ` [patch -mm 2/5] oom: add header file to Kbuild as unifdef David Rientjes 2007-09-22 17:47 ` [patch -mm 3/5] oom: convert zone_scan_lock from mutex to spinlock David Rientjes 2007-09-22 17:47 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim David Rientjes 2007-09-22 17:47 ` [patch -mm 5/5] oom: add sysctl to dump tasks memory state David Rientjes 2007-09-25 4:37 ` Balbir Singh 2007-09-25 4:57 ` David Rientjes 2007-09-26 20:06 ` Andrew Morton 2007-09-26 20:46 ` David Rientjes 2007-09-26 21:47 ` Andrew Morton 2007-09-27 6:15 ` David Rientjes 2007-09-24 19:05 ` [patch -mm 4/5] mm: test and set zone reclaim lock before starting reclaim Christoph Lameter 2007-09-24 19:14 ` David Rientjes 2007-09-24 20:11 ` Christoph Lameter 2007-09-24 21:02 ` David Rientjes 2007-09-24 21:09 ` Christoph Lameter 2007-09-25 4:26 ` Balbir Singh 2007-09-25 4:34 ` David Rientjes 2007-09-25 6:17 ` Balbir Singh 2007-09-25 6:29 ` David Rientjes 2007-09-25 21:15 ` Christoph Lameter 2007-09-25 21:19 ` David Rientjes 2007-09-25 21:39 ` Christoph Lameter 2007-09-25 21:43 ` David Rientjes 2007-09-25 21:14 ` Christoph Lameter 2007-09-25 21:17 ` David Rientjes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox