* [RFC Patch] Thrashing notification
@ 2007-11-05 14:47 Daniel Spång
2007-11-05 15:43 ` Daniel Spång
2007-11-05 18:30 ` Marcelo Tosatti
0 siblings, 2 replies; 8+ messages in thread
From: Daniel Spång @ 2007-11-05 14:47 UTC (permalink / raw)
To: linux-mm; +Cc: marcelo, drepper, riel, akpm, mbligh, balbir, 7eggert
This patch provides a way to notify user applications when the system
is about to thrash. It checks the scanning priority of the inactive
lru list and notifies user applications via sysfs when the priority
reaches a threshold. In comparison to Marcelo Tosatti's oom
notification patch, this patch also works on systems without swap.
Applications can poll() on this sysfs file and can then free memory in
one way or another to prevent an oom situation.
Using a test application (http://spng.se/oomtest/) that runs multiple
allocator threads and a single release thread, one can see that this
works fairly well. See the same page for more details and graphs.
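As an illustration of the userspace side, here is a minimal sketch of a
poll() waiter. It assumes the patch is applied, so that
/sys/kernel/nr_thrashing exists, and it assumes the usual sysfs convention
that sysfs_notify() is reported to pollers as POLLPRI | POLLERR. The
function name wait_for_thrashing() is made up for this example and is not
part of the patch.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Wait until the sysfs attribute at `path` is signalled via sysfs_notify()
 * or `timeout_ms` expires. Returns 1 and stores the counter in *count when
 * notified, 0 on timeout, -1 on error.
 */
int wait_for_thrashing(const char *path, int timeout_ms, long *count)
{
	char buf[32];
	struct pollfd pfd;
	ssize_t n;
	int ret;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	pfd.events = POLLPRI | POLLERR;
	pfd.revents = 0;

	/* Read once first; sysfs only reports changes made after a read. */
	if (read(pfd.fd, buf, sizeof(buf) - 1) < 0) {
		close(pfd.fd);
		return -1;
	}

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0) {
		close(pfd.fd);
		return ret;	/* 0 = timeout, -1 = poll() failed */
	}

	/* Rewind and re-read to pick up the updated counter. */
	if (lseek(pfd.fd, 0, SEEK_SET) < 0 ||
	    (n = read(pfd.fd, buf, sizeof(buf) - 1)) < 0) {
		close(pfd.fd);
		return -1;
	}
	buf[n] = '\0';
	*count = strtol(buf, NULL, 10);
	close(pfd.fd);
	return 1;
}
```

An application would typically call
wait_for_thrashing("/sys/kernel/nr_thrashing", -1, &count) in a loop from a
dedicated thread and drop its caches whenever the call returns 1.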
Signed-off-by: Daniel Spång <daniel.spang@gmail.com>
diff -purN linux-2.6.23.1-mm1/include/linux/thrashing_notify.h linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h
--- linux-2.6.23.1-mm1/include/linux/thrashing_notify.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h 2007-11-05 14:23:26.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _LINUX_THRASHING_NOTIFY_H
+#define _LINUX_THRASHING_NOTIFY_H
+
+void thrashing_notify(int priority);
+
+extern int thrashing_notifier_threshold;
+
+#endif /* _LINUX_THRASHING_NOTIFY_H */
diff -purN linux-2.6.23.1-mm1/kernel/sysctl.c linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c
--- linux-2.6.23.1-mm1/kernel/sysctl.c 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c 2007-11-05 14:22:29.000000000 +0100
@@ -46,6 +46,7 @@
#include <linux/nfs_fs.h>
#include <linux/acpi.h>
#include <linux/reboot.h>
+#include <linux/thrashing_notify.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -102,6 +103,7 @@ static int minolduid;
static int min_percpu_pagelist_fract = 8;
static int ngroups_max = NGROUPS_MAX;
+static int def_priority = DEF_PRIORITY;
#ifdef CONFIG_KMOD
extern char modprobe_path[];
@@ -1071,6 +1073,16 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "thrashing_notifier_threshold",
+ .data = &thrashing_notifier_threshold,
+ .maxlen = sizeof thrashing_notifier_threshold,
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &def_priority,
+ },
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
diff -purN linux-2.6.23.1-mm1/mm/Makefile linux-2.6.23.1-mm1_thrashing/mm/Makefile
--- linux-2.6.23.1-mm1/mm/Makefile 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/Makefile 2007-11-05 14:22:11.000000000 +0100
@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o $(mmu-y)
+ page_isolation.o thrashing_notify.o $(mmu-y)
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
diff -purN linux-2.6.23.1-mm1/mm/thrashing_notify.c linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c
--- linux-2.6.23.1-mm1/mm/thrashing_notify.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c 2007-11-05 14:22:46.000000000 +0100
@@ -0,0 +1,56 @@
+/*
+ * mm/thrashing_notify.c
+ *
+ * Copyright (C) 2007 Daniel Spång <daniel.spang@gmail.com>
+ *
+ * Released under the GPL, see the file COPYING for details.
+ */
+
+#include <linux/thrashing_notify.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/mman.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+
+/*
+ * The count of thrashing occasions.
+ *
+ * Published to userspace at /sys/kernel/nr_thrashing
+ */
+int nr_thrashing = 0;
+
+int thrashing_notifier_threshold = 4;
+
+static ssize_t nr_thrashing_show(struct kset *kset, char *page)
+{
+ return sprintf(page, "%u\n", nr_thrashing);
+}
+
+static struct subsys_attribute nr_thrashing_attr = __ATTR_RO(nr_thrashing);
+
+static struct attribute *nr_thrashing_attrs[] = {
+ &nr_thrashing_attr.attr,
+ NULL,
+};
+
+static struct attribute_group nr_thrashing_attr_group = {
+ .attrs = nr_thrashing_attrs,
+};
+
+void thrashing_notify(int priority)
+{
+ nr_thrashing++;
+ sysfs_notify(&kernel_subsys.kobj, NULL, "nr_thrashing");
+}
+
+static int __init thrashing_init(void)
+{
+ return sysfs_create_group(&kernel_subsys.kobj,
+ &nr_thrashing_attr_group);
+}
+
+module_init(thrashing_init)
+
diff -purN linux-2.6.23.1-mm1/mm/vmscan.c linux-2.6.23.1-mm1_thrashing/mm/vmscan.c
--- linux-2.6.23.1-mm1/mm/vmscan.c 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/vmscan.c 2007-11-05 14:21:55.000000000 +0100
@@ -39,6 +39,7 @@
#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/memcontrol.h>
+#include <linux/thrashing_notify.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1285,6 +1286,9 @@ static unsigned long do_try_to_free_page
sc->nr_io_pages = 0;
if (!priority)
disable_swap_token();
+ if (priority == thrashing_notifier_threshold)
+ thrashing_notify(priority);
nr_reclaimed += shrink_zones(priority, zones, sc);
/*
* Don't shrink slabs when reclaiming memory from
@@ -1448,7 +1452,9 @@ loop_again:
/* The swap token gets in the way of swapout... */
if (!priority)
disable_swap_token();
+ if (priority == thrashing_notifier_threshold)
+ thrashing_notify(priority);
sc.nr_io_pages = 0;
all_zones_ok = 1;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC Patch] Thrashing notification
2007-11-05 14:47 [RFC Patch] Thrashing notification Daniel Spång
@ 2007-11-05 15:43 ` Daniel Spång
2007-11-05 18:30 ` Marcelo Tosatti
1 sibling, 0 replies; 8+ messages in thread
From: Daniel Spång @ 2007-11-05 15:43 UTC (permalink / raw)
To: linux-mm; +Cc: marcelo, drepper, riel, akpm, mbligh, balbir, 7eggert
On Mon, 05 Nov 2007 15:47:53 +0100, Daniel Spång <daniel.spang@gmail.com>
wrote:
> This patch provides a way to notify user applications when the system
> is about to thrash. It checks the scanning priority of the inactive
> lru list and notifies user applications via sysfs when the priority
> reaches a threshold. In comparison to Marcelo Tosatti's oom
> notification patch, this patch also works on systems without swap.
>
> Applications can poll() on this sysfs file and can then free memory in
> one way or another to prevent an oom situation.
>
> Using a test application http://spng.se/oomtest/ that uses multiple
> allocator threads and a single release thread one can see that this
> works fairly well. See http://spng.se/oomtest/ for more details
> and graphs.
>
> Signed-off-by: Daniel Spång <daniel.spang@gmail.com>
>
> diff -purN linux-2.6.23.1-mm1/include/linux/thrashing_notify.h
> linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h
> --- linux-2.6.23.1-mm1/include/linux/thrashing_notify.h 1970-01-01
> 01:00:00.000000000 +0100
> +++
> linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h 2007-11-05
> 14:23:26.000000000 +0100
> @@ -0,0 +1,8 @@
> +#ifndef _LINUX_THRASHING_NOTIFY_H
> +#define _LINUX_THRASHING_NOTIFY_H
> +
> +void thrashing_notify(int priority);
> +
> +extern int thrashing_notifier_threshold;
> +
> +#endif /* _LINUX_THRASHING_NOTIFY_H */
> diff -purN linux-2.6.23.1-mm1/kernel/sysctl.c
> linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c
> --- linux-2.6.23.1-mm1/kernel/sysctl.c 2007-11-01 14:59:16.000000000
> +0100
> +++ linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c 2007-11-05
> 14:22:29.000000000 +0100
> @@ -46,6 +46,7 @@
> #include <linux/nfs_fs.h>
> #include <linux/acpi.h>
> #include <linux/reboot.h>
> +#include <linux/thrashing_notify.h>
> #include <asm/uaccess.h>
> #include <asm/processor.h>
> @@ -102,6 +103,7 @@ static int minolduid;
> static int min_percpu_pagelist_fract = 8;
> static int ngroups_max = NGROUPS_MAX;
> +static int def_priority = DEF_PRIORITY;
> #ifdef CONFIG_KMOD
> extern char modprobe_path[];
> @@ -1071,6 +1073,16 @@ static struct ctl_table vm_table[] = {
> .extra1 = &zero,
> },
> #endif
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "thrashing_notifier_threshold",
> + .data = &thrashing_notifier_threshold,
> + .maxlen = sizeof thrashing_notifier_threshold,
> + .mode = 0644,
> + .proc_handler = &proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &def_priority,
> + },
> /*
> * NOTE: do not add new entries to this table unless you have read
> * Documentation/sysctl/ctl_unnumbered.txt
> diff -purN linux-2.6.23.1-mm1/mm/Makefile
> linux-2.6.23.1-mm1_thrashing/mm/Makefile
> --- linux-2.6.23.1-mm1/mm/Makefile 2007-11-01 14:59:16.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/Makefile 2007-11-05
> 14:22:11.000000000 +0100
> @@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
> page_alloc.o page-writeback.o pdflush.o \
> readahead.o swap.o truncate.o vmscan.o \
> prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
> - page_isolation.o $(mmu-y)
> + page_isolation.o thrashing_notify.o $(mmu-y)
> obj-$(CONFIG_BOUNCE) += bounce.o
> obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
> diff -purN linux-2.6.23.1-mm1/mm/thrashing_notify.c
> linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c
> --- linux-2.6.23.1-mm1/mm/thrashing_notify.c 1970-01-01
> 01:00:00.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c 2007-11-05
> 14:22:46.000000000 +0100
> @@ -0,0 +1,56 @@
> +/*
> + * mm/thrashing_notify.c
> + *
> + * Copyright (C) 2007 Daniel Spång <daniel.spang@gmail.com>
> + *
> + * Released under the GPL, see the file COPYING for details.
> + */
> +
> +#include <linux/thrashing_notify.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/mman.h>
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> +
> +/*
> + * The count of thrashing occasions.
> + *
> + * Published to userspace at /sys/kernel/nr_thrashing
> + */
> +int nr_thrashing = 0;
> +
> +int thrashing_notifier_threshold = 4;
> +
> +static ssize_t nr_thrashing_show(struct kset *kset, char *page)
> +{
> + return sprintf(page, "%u\n", nr_thrashing);
> +}
> +
> +static struct subsys_attribute nr_thrashing_attr =
> __ATTR_RO(nr_thrashing);
> +
> +static struct attribute *nr_thrashing_attrs[] = {
> + &nr_thrashing_attr.attr,
> + NULL,
> +};
> +
> +static struct attribute_group nr_thrashing_attr_group = {
> + .attrs = nr_thrashing_attrs,
> +};
> +
> +void thrashing_notify(int priority)
> +{
> + nr_thrashing++;
> + sysfs_notify(&kernel_subsys.kobj, NULL, "nr_thrashing");
> +}
> +
> +static int __init thrashing_init(void)
> +{
> + return sysfs_create_group(&kernel_subsys.kobj,
> + &nr_thrashing_attr_group);
> +}
> +
> +module_init(thrashing_init)
> +
> diff -purN linux-2.6.23.1-mm1/mm/vmscan.c
> linux-2.6.23.1-mm1_thrashing/mm/vmscan.c
> --- linux-2.6.23.1-mm1/mm/vmscan.c 2007-11-01 14:59:16.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/vmscan.c 2007-11-05
> 14:21:55.000000000 +0100
> @@ -39,6 +39,7 @@
> #include <linux/kthread.h>
> #include <linux/freezer.h>
> #include <linux/memcontrol.h>
> +#include <linux/thrashing_notify.h>
> #include <asm/tlbflush.h>
> #include <asm/div64.h>
> @@ -1285,6 +1286,9 @@ static unsigned long do_try_to_free_page
> sc->nr_io_pages = 0;
> if (!priority)
> disable_swap_token();
> + if (priority == thrashing_notifier_threshold)
> + thrashing_notify(priority);
> nr_reclaimed += shrink_zones(priority, zones, sc);
> /*
> * Don't shrink slabs when reclaiming memory from
> @@ -1448,7 +1452,9 @@ loop_again:
> /* The swap token gets in the way of swapout... */
> if (!priority)
> disable_swap_token();
> + if (priority == thrashing_notifier_threshold)
> + thrashing_notify(priority);
> sc.nr_io_pages = 0;
> all_zones_ok = 1;
>
Oh, some rotten bits. Here is the correct one.
diff -purN linux-2.6.23.1-mm1/include/linux/thrashing_notify.h linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h
--- linux-2.6.23.1-mm1/include/linux/thrashing_notify.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h 2007-11-05 14:23:26.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _LINUX_THRASHING_NOTIFY_H
+#define _LINUX_THRASHING_NOTIFY_H
+
+void thrashing_notify(int priority);
+
+extern int thrashing_notifier_threshold;
+
+#endif /* _LINUX_THRASHING_NOTIFY_H */
diff -purN linux-2.6.23.1-mm1/kernel/fork.c linux-2.6.23.1-mm1_thrashing/kernel/fork.c
--- linux-2.6.23.1-mm1/kernel/fork.c 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/kernel/fork.c 2007-11-05 14:16:04.000000000 +0100
@@ -1443,6 +1443,9 @@ long do_fork(unsigned long clone_flags,
task_pid_nr_ns(p, current->nsproxy->pid_ns) :
task_pid_vnr(p);
+ if (clone_flags & CLONE_PARENT_SETTID)
+ put_user(nr, parent_tidptr);
+
if (clone_flags & CLONE_VFORK) {
p->vfork_done = &vfork;
init_completion(&vfork);
diff -purN linux-2.6.23.1-mm1/kernel/sysctl.c linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c
--- linux-2.6.23.1-mm1/kernel/sysctl.c 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c 2007-11-05 14:22:29.000000000 +0100
@@ -46,6 +46,7 @@
#include <linux/nfs_fs.h>
#include <linux/acpi.h>
#include <linux/reboot.h>
+#include <linux/thrashing_notify.h>
#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -102,6 +103,7 @@ static int minolduid;
static int min_percpu_pagelist_fract = 8;
static int ngroups_max = NGROUPS_MAX;
+static int def_priority = DEF_PRIORITY;
#ifdef CONFIG_KMOD
extern char modprobe_path[];
@@ -1071,6 +1073,16 @@ static struct ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "thrashing_notifier_threshold",
+ .data = &thrashing_notifier_threshold,
+ .maxlen = sizeof thrashing_notifier_threshold,
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &def_priority,
+ },
/*
* NOTE: do not add new entries to this table unless you have read
* Documentation/sysctl/ctl_unnumbered.txt
diff -purN linux-2.6.23.1-mm1/mm/Makefile linux-2.6.23.1-mm1_thrashing/mm/Makefile
--- linux-2.6.23.1-mm1/mm/Makefile 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/Makefile 2007-11-05 14:22:11.000000000 +0100
@@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o $(mmu-y)
+ page_isolation.o thrashing_notify.o $(mmu-y)
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
diff -purN linux-2.6.23.1-mm1/mm/thrashing_notify.c linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c
--- linux-2.6.23.1-mm1/mm/thrashing_notify.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c 2007-11-05 14:22:46.000000000 +0100
@@ -0,0 +1,56 @@
+/*
+ * mm/thrashing_notify.c
+ *
+ * Copyright (C) 2007 Daniel Spång <daniel.spang@gmail.com>
+ *
+ * Released under the GPL, see the file COPYING for details.
+ */
+
+#include <linux/thrashing_notify.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/mman.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+
+/*
+ * The count of thrashing occasions.
+ *
+ * Published to userspace at /sys/kernel/nr_thrashing
+ */
+int nr_thrashing = 0;
+
+int thrashing_notifier_threshold = 4;
+
+static ssize_t nr_thrashing_show(struct kset *kset, char *page)
+{
+ return sprintf(page, "%u\n", nr_thrashing);
+}
+
+static struct subsys_attribute nr_thrashing_attr = __ATTR_RO(nr_thrashing);
+
+static struct attribute *nr_thrashing_attrs[] = {
+ &nr_thrashing_attr.attr,
+ NULL,
+};
+
+static struct attribute_group nr_thrashing_attr_group = {
+ .attrs = nr_thrashing_attrs,
+};
+
+void thrashing_notify(int priority)
+{
+ nr_thrashing++;
+ sysfs_notify(&kernel_subsys.kobj, NULL, "nr_thrashing");
+}
+
+static int __init thrashing_init(void)
+{
+ return sysfs_create_group(&kernel_subsys.kobj,
+ &nr_thrashing_attr_group);
+}
+
+module_init(thrashing_init)
+
diff -purN linux-2.6.23.1-mm1/mm/vmscan.c linux-2.6.23.1-mm1_thrashing/mm/vmscan.c
--- linux-2.6.23.1-mm1/mm/vmscan.c 2007-11-01 14:59:16.000000000 +0100
+++ linux-2.6.23.1-mm1_thrashing/mm/vmscan.c 2007-11-05 16:21:40.000000000 +0100
@@ -39,6 +39,7 @@
#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/memcontrol.h>
+#include <linux/thrashing_notify.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1285,6 +1286,8 @@ static unsigned long do_try_to_free_page
sc->nr_io_pages = 0;
if (!priority)
disable_swap_token();
+ if (priority == thrashing_notifier_threshold)
+ thrashing_notify(priority);
nr_reclaimed += shrink_zones(priority, zones, sc);
/*
* Don't shrink slabs when reclaiming memory from
@@ -1448,7 +1451,8 @@ loop_again:
/* The swap token gets in the way of swapout... */
if (!priority)
disable_swap_token();
-
+ if (priority == thrashing_notifier_threshold)
+ thrashing_notify(priority);
sc.nr_io_pages = 0;
all_zones_ok = 1;
* Re: [RFC Patch] Thrashing notification
2007-11-05 14:47 [RFC Patch] Thrashing notification Daniel Spång
2007-11-05 15:43 ` Daniel Spång
@ 2007-11-05 18:30 ` Marcelo Tosatti
2007-11-05 20:17 ` Rik van Riel
2007-11-06 10:36 ` Daniel Spång
1 sibling, 2 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2007-11-05 18:30 UTC (permalink / raw)
To: Daniel Spång
Cc: linux-mm, marcelo, drepper, riel, akpm, mbligh, balbir, 7eggert
On Mon, Nov 05, 2007 at 03:47:53PM +0100, Daniel Spang wrote:
> This patch provides a way to notify user applications when the system
> is about to thrash. It checks the scanning priority of the inactive
> lru list and notifies user applications via sysfs when the priority
> reaches a threshold. In comparison to Marcelo Tosatti's oom
> notification patch, this patch also works on systems without swap.
>
> Applications can poll() on this sysfs file and can then free memory in
> one way or another to prevent an oom situation.
>
> Using a test application http://spng.se/oomtest/ that uses multiple
> allocator threads and a single release thread one can see that this
> works fairly well. See http://spng.se/oomtest/ for more details
> and graphs.
Daniel,
Hooking into try_to_free_pages() makes the scheme susceptible to
specifics such as:
- can the task write out pages?
- is the allocation a higher-order one?
- which zones is it operating on?
Remember that notifications are sent to applications which can allocate
globally... It is not very useful to send notifications to userspace
tasks that have a large percentage of their memory in highmem if the
system is having a lowmem zone shortage (granted, the notify-on-swap
heuristic has that problem too, but you can then argue that swap affects
system performance globally, and it generally does on desktop systems).
Other than that, tuning "priority" from try_to_free_pages() is rather
difficult for users/admins.
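To make the tuning difficulty concrete, here is a small userspace model of
what a given priority value means. It is a sketch and not kernel code. It
assumes DEF_PRIORITY is 12 and that each reclaim pass targets roughly
(lru_pages >> priority) + 1 pages, which is how mm/vmscan.c of this era
sized a pass as far as I recall. The helper names are invented for the
example.

```c
#define DEF_PRIORITY 12	/* assumed starting reclaim priority */

/* Pages one reclaim pass aims to scan on an LRU list of lru_pages pages. */
unsigned long scan_target(unsigned long lru_pages, int priority)
{
	return (lru_pages >> priority) + 1;
}

/*
 * Number of full passes that must have failed to free enough memory
 * before the notifier fires at `threshold`. Returns -1 if the threshold
 * lies outside 0..DEF_PRIORITY and is therefore never reached.
 */
int passes_before_notify(int threshold)
{
	int passes = 0;

	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		if (priority == threshold)
			return passes;
		passes++;
	}
	return -1;
}
```

With the patch's default threshold of 4, the notification is sent only
after eight passes have come up short, at the point where a pass covers
about 1/16 of the list. Each step of the threshold doubles or halves that
fraction, which is why the knob is hard to reason about in terms of free
memory.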
My previous patches had the zone limitation, but the following way of
asking "are we low on memory?" gets rid of it:
+static unsigned int mem_notify_poll(struct file *file, poll_table *wait)
+{
+ unsigned int val = 0;
+ struct zone *zone;
+ int tpages_low, tpages_free, tpages_reserve;
+
+ tpages_low = tpages_free = tpages_reserve = 0;
+
+ poll_wait(file, &mem_wait, wait);
+
+ for_each_zone(zone) {
+ if (!populated_zone(zone))
+ continue;
+ tpages_low += zone->pages_low;
+ tpages_free += zone_page_state(zone, NR_FREE_PAGES);
+ /* always use the reserve of the highest allocation type */
+ tpages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
+ }
+
+ if (mem_notify_status || (tpages_free <= tpages_low + tpages_reserve))
+ val = POLLIN;
+
+ return val;
+}
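The arithmetic in mem_notify_poll() can be tried outside the kernel. The
sketch below reproduces only the summation and the final comparison; the
struct zone_sample type and the function name are stand-ins invented for
the example, and the mem_notify_status flag is left out.

```c
#include <stddef.h>

/* Per-zone numbers corresponding to the fields read in mem_notify_poll(). */
struct zone_sample {
	long pages_low;		/* zone->pages_low watermark */
	long free_pages;	/* NR_FREE_PAGES for the zone */
	long reserve;		/* lowmem_reserve[MAX_NR_ZONES-1] */
};

/* Returns 1 when total free pages are at or below watermark plus reserve. */
int low_on_memory(const struct zone_sample *zones, size_t nr_zones)
{
	long low = 0, free_pages = 0, reserve = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		low += zones[i].pages_low;
		free_pages += zones[i].free_pages;
		reserve += zones[i].reserve;
	}
	return free_pages <= low + reserve;
}
```

Because the totals are summed over all populated zones, a single zone that
has dipped below its own watermark does not trigger a notification as long
as the system as a whole still has headroom.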
> Signed-off-by: Daniel Spang <daniel.spang@gmail.com>
>
> diff -purN linux-2.6.23.1-mm1/include/linux/thrashing_notify.h linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h
> --- linux-2.6.23.1-mm1/include/linux/thrashing_notify.h 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/include/linux/thrashing_notify.h 2007-11-05 14:23:26.000000000 +0100
> @@ -0,0 +1,8 @@
> +#ifndef _LINUX_THRASHING_NOTIFY_H
> +#define _LINUX_THRASHING_NOTIFY_H
> +
> +void thrashing_notify(int priority);
> +
> +extern int thrashing_notifier_threshold;
> +
> +#endif /* _LINUX_THRASHING_NOTIFY_H */
> diff -purN linux-2.6.23.1-mm1/kernel/sysctl.c linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c
> --- linux-2.6.23.1-mm1/kernel/sysctl.c 2007-11-01 14:59:16.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/kernel/sysctl.c 2007-11-05 14:22:29.000000000 +0100
> @@ -46,6 +46,7 @@
> #include <linux/nfs_fs.h>
> #include <linux/acpi.h>
> #include <linux/reboot.h>
> +#include <linux/thrashing_notify.h>
>
> #include <asm/uaccess.h>
> #include <asm/processor.h>
> @@ -102,6 +103,7 @@ static int minolduid;
> static int min_percpu_pagelist_fract = 8;
>
> static int ngroups_max = NGROUPS_MAX;
> +static int def_priority = DEF_PRIORITY;
>
> #ifdef CONFIG_KMOD
> extern char modprobe_path[];
> @@ -1071,6 +1073,16 @@ static struct ctl_table vm_table[] = {
> .extra1 = &zero,
> },
> #endif
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "thrashing_notifier_threshold",
> + .data = &thrashing_notifier_threshold,
> + .maxlen = sizeof thrashing_notifier_threshold,
> + .mode = 0644,
> + .proc_handler = &proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &def_priority,
> + },
> /*
> * NOTE: do not add new entries to this table unless you have read
> * Documentation/sysctl/ctl_unnumbered.txt
> diff -purN linux-2.6.23.1-mm1/mm/Makefile linux-2.6.23.1-mm1_thrashing/mm/Makefile
> --- linux-2.6.23.1-mm1/mm/Makefile 2007-11-01 14:59:16.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/Makefile 2007-11-05 14:22:11.000000000 +0100
> @@ -11,7 +11,7 @@ obj-y := bootmem.o filemap.o mempool.o
> page_alloc.o page-writeback.o pdflush.o \
> readahead.o swap.o truncate.o vmscan.o \
> prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
> - page_isolation.o $(mmu-y)
> + page_isolation.o thrashing_notify.o $(mmu-y)
>
> obj-$(CONFIG_BOUNCE) += bounce.o
> obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
> diff -purN linux-2.6.23.1-mm1/mm/thrashing_notify.c linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c
> --- linux-2.6.23.1-mm1/mm/thrashing_notify.c 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/thrashing_notify.c 2007-11-05 14:22:46.000000000 +0100
> @@ -0,0 +1,56 @@
> +/*
> + * mm/thrashing_notify.c
> + *
> + * Copyright (C) 2007 Daniel Spang <daniel.spang@gmail.com>
> + *
> + * Released under the GPL, see the file COPYING for details.
> + */
> +
> +#include <linux/thrashing_notify.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/mman.h>
> +#include <linux/init.h>
> +#include <linux/types.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> +
> +/*
> + * The count of thrashing occasions.
> + *
> + * Published to userspace at /sys/kernel/nr_thrashing
> + */
> +int nr_thrashing = 0;
> +
> +int thrashing_notifier_threshold = 4;
> +
> +static ssize_t nr_thrashing_show(struct kset *kset, char *page)
> +{
> + return sprintf(page, "%u\n", nr_thrashing);
> +}
> +
> +static struct subsys_attribute nr_thrashing_attr = __ATTR_RO(nr_thrashing);
> +
> +static struct attribute *nr_thrashing_attrs[] = {
> + &nr_thrashing_attr.attr,
> + NULL,
> +};
> +
> +static struct attribute_group nr_thrashing_attr_group = {
> + .attrs = nr_thrashing_attrs,
> +};
> +
> +void thrashing_notify(int priority)
> +{
> + nr_thrashing++;
> + sysfs_notify(&kernel_subsys.kobj, NULL, "nr_thrashing");
> +}
> +
> +static int __init thrashing_init(void)
> +{
> + return sysfs_create_group(&kernel_subsys.kobj,
> + &nr_thrashing_attr_group);
> +}
> +
> +module_init(thrashing_init)
> +
> diff -purN linux-2.6.23.1-mm1/mm/vmscan.c linux-2.6.23.1-mm1_thrashing/mm/vmscan.c
> --- linux-2.6.23.1-mm1/mm/vmscan.c 2007-11-01 14:59:16.000000000 +0100
> +++ linux-2.6.23.1-mm1_thrashing/mm/vmscan.c 2007-11-05 14:21:55.000000000 +0100
> @@ -39,6 +39,7 @@
> #include <linux/kthread.h>
> #include <linux/freezer.h>
> #include <linux/memcontrol.h>
> +#include <linux/thrashing_notify.h>
>
> #include <asm/tlbflush.h>
> #include <asm/div64.h>
> @@ -1285,6 +1286,9 @@ static unsigned long do_try_to_free_page
> sc->nr_io_pages = 0;
> if (!priority)
> disable_swap_token();
> + if (priority == thrashing_notifier_threshold)
> + thrashing_notify(priority);
> nr_reclaimed += shrink_zones(priority, zones, sc);
> /*
> * Don't shrink slabs when reclaiming memory from
> @@ -1448,7 +1452,9 @@ loop_again:
> /* The swap token gets in the way of swapout... */
> if (!priority)
> disable_swap_token();
> + if (priority == thrashing_notifier_threshold)
> + thrashing_notify(priority);
> sc.nr_io_pages = 0;
> all_zones_ok = 1;
>
* Re: [RFC Patch] Thrashing notification
2007-11-05 18:30 ` Marcelo Tosatti
@ 2007-11-05 20:17 ` Rik van Riel
2007-11-06 10:41 ` Daniel Spång
2007-11-06 10:36 ` Daniel Spång
1 sibling, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2007-11-05 20:17 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Daniel Spång, linux-mm, drepper, akpm, mbligh, balbir, 7eggert
On Mon, 5 Nov 2007 13:30:25 -0500
Marcelo Tosatti <marcelo@kvack.org> wrote:
> Hooking into try_to_free_pages() makes the scheme susceptible to
> specifics such as:
The specifics of where the hook goes can be changed. I am sure the
two of you can come up with the best way to do things. Just keep
shooting holes in each other's ideas until one idea remains which
neither of you can find a problem with[1] :)
> Remember that notifications are sent to applications which can allocate
> globally...
This is the bigger problem with the sysfs code: every task that
watches the sysfs node will get woken up. That could be a big
problem when there are hundreds of processes watching that file.
Marcelo's code, which only wakes up one task at a time, has the
potential to work much better. That code can also be enhanced
to wake up tasks that use a lot of memory on the specific NUMA
node that has a memory shortage.
[1] Yes, that is how I usually come up with VM ideas :)
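The wakeup cost can be put in rough numbers with a toy model. This is an
illustration only: it assumes every woken watcher frees the same number of
pages and that notifications repeat until the deficit is covered, which
neither patch promises. The names wake_stats and simulate_wakeups() are
invented for the example.

```c
struct wake_stats {
	unsigned int rounds;	/* notifications sent */
	unsigned int wakeups;	/* tasks woken in total */
};

/*
 * Count notifications and task wakeups needed to free `deficit` pages when
 * each woken watcher frees `chunk` pages. wake_all != 0 models the sysfs
 * behaviour (every watcher wakes); wake_all == 0 models waking one task
 * per notification.
 */
struct wake_stats simulate_wakeups(unsigned long deficit, unsigned long chunk,
				   unsigned int watchers, int wake_all)
{
	struct wake_stats st = { 0, 0 };
	unsigned long freed = 0;
	unsigned int per_round = wake_all ? watchers : 1;

	if (chunk == 0 || watchers == 0)
		return st;	/* nobody can free anything */

	while (freed < deficit) {
		st.rounds++;
		st.wakeups += per_round;
		freed += (unsigned long)per_round * chunk;
	}
	return st;
}
```

For a deficit of 100 pages, 200 watchers and 50 pages freed per wakeup, the
wake-all variant wakes 200 tasks in one round while the wake-one variant
wakes 2 tasks over two rounds.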
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: [RFC Patch] Thrashing notification
2007-11-05 18:30 ` Marcelo Tosatti
2007-11-05 20:17 ` Rik van Riel
@ 2007-11-06 10:36 ` Daniel Spång
1 sibling, 0 replies; 8+ messages in thread
From: Daniel Spång @ 2007-11-06 10:36 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-mm, drepper, riel, akpm, mbligh, balbir, 7eggert
On 11/5/07, Marcelo Tosatti <marcelo@kvack.org> wrote:
> Hooking into try_to_free_pages() makes the scheme susceptible to
> specifics such as:
>
> - can the task writeout pages?
> - is the allocation a higher order one?
> - in what zones is it operating on?
>
> Remember that notifications are sent to applications which can allocate
> globally... It is not very useful to send notifications for a userspace
> which has a large percentage of its memory in highmem if the system is
> having a lowmem zone shortage (granted that the notify-on-swap heuristic
> has that problem, but you can then argue that swap affects system
> performance globally, and it generally does in desktop systems).
On a swapless system, the alternative is often to get killed by the oom killer.
> Other than that tuning "priority" from try_to_free_pages() is rather
> difficult for users/admins.
Yes, that parameter might need some tuning, but my initial tests show
that it is pretty robust if you stay away from the ends of the interval.
> My previous patches had the zone limitation, but the following way of
> asking "are we low on memory?" gets rid of it:
>
> +static unsigned int mem_notify_poll(struct file *file, poll_table *wait)
> +{
> + unsigned int val = 0;
> + struct zone *zone;
> + int tpages_low, tpages_free, tpages_reserve;
> +
> + tpages_low = tpages_free = tpages_reserve = 0;
> +
> + poll_wait(file, &mem_wait, wait);
> +
> + for_each_zone(zone) {
> + if (!populated_zone(zone))
> + continue;
> + tpages_low += zone->pages_low;
> + tpages_free += zone_page_state(zone, NR_FREE_PAGES);
> + /* always use the reserve of the highest allocation type */
> + tpages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
> + }
> +
> + if (mem_notify_status || (tpages_free <= tpages_low + tpages_reserve))
> + val = POLLIN;
> +
> + return val;
> +}
* Re: [RFC Patch] Thrashing notification
2007-11-05 20:17 ` Rik van Riel
@ 2007-11-06 10:41 ` Daniel Spång
2007-11-06 20:01 ` Rik van Riel
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Spång @ 2007-11-06 10:41 UTC (permalink / raw)
To: Rik van Riel
Cc: Marcelo Tosatti, linux-mm, drepper, akpm, mbligh, balbir, 7eggert
On 11/5/07, Rik van Riel <riel@redhat.com> wrote:
> On Mon, 5 Nov 2007 13:30:25 -0500
> Marcelo Tosatti <marcelo@kvack.org> wrote:
>
> > Hooking into try_to_free_pages() makes the scheme susceptible to
> > specifics such as:
>
> The specifics of where the hook goes can be changed. I am sure the
> two of you can come up with the best way to do things. Just keep
> shooting holes in each other's ideas until one idea remains which
> neither of you can find a problem with[1] :)
>
> > Remember that notifications are sent to applications which can allocate
> > globally...
>
> This is the bigger problem with the sysfs code: every task that
> watches the sysfs node will get woken up. That could be a big
> problem when there are hundreds of processes watching that file.
>
> Marcelo's code, which only wakes up one task at a time, has the
> potential to work much better. That code can also be enhanced
> to wake up tasks that use a lot of memory on the specific NUMA
> node that has a memory shortage.
>
> [1] Yes, that is how I usually come up with VM ideas :)
I have actually no problem at all using a device to get the message to
userspace. My patch was more like a demonstration of when to trigger
the notification. I still (obviously) think that we need a
notification for systems without swap too.
A concern, or feature =), with the notify-on-swap method is that with
responsive user applications, the system will never use swap at all.
There are surely systems where this behavior is desirable, but on
desktop systems, for example, the memory occupied by inactive processes
might be better used by active ones.
I think there is a need for both notifications: first a notification
when we are about to swap, and then one that triggers when the total free
VM is low or when the system is thrashing, preferably using the same
notification method.
* Re: [RFC Patch] Thrashing notification
2007-11-06 10:41 ` Daniel Spång
@ 2007-11-06 20:01 ` Rik van Riel
2007-11-06 21:27 ` Daniel Spång
0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2007-11-06 20:01 UTC (permalink / raw)
To: Daniel Spång
Cc: Marcelo Tosatti, linux-mm, drepper, akpm, mbligh, balbir, 7eggert
On Tue, 6 Nov 2007 11:41:20 +0100
"Daniel Spång" <daniel.spang@gmail.com> wrote:
> I actually have no problem at all with using a device to get the message
> to userspace. My patch was more of a demonstration of when to trigger
> the notification. I still (obviously) think that we need a
> notification for systems without swap too.
I agree.
To get out of the "my patch is better" line of conversation,
I guess you and Marcelo should probably try to figure out
some threshold that you both agree on.
> A concern, or feature =), with the notify-on-swap method is that with
> responsive user applications, the system will never use swap at all.
> There are surely systems where this behavior is desirable, but on
> desktop systems, for example, the memory occupied by inactive processes
> might be better used by active ones.
Well, if the inactive processes get woken up by the low memory
notification and free some of their memory, the active processes
will use the memory from the inactive ones :)
Not using swap is generally considered a good thing on desktops.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: [RFC Patch] Thrashing notification
2007-11-06 20:01 ` Rik van Riel
@ 2007-11-06 21:27 ` Daniel Spång
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Spång @ 2007-11-06 21:27 UTC (permalink / raw)
To: Rik van Riel
Cc: Marcelo Tosatti, linux-mm, drepper, akpm, mbligh, balbir, 7eggert
On 11/6/07, Rik van Riel <riel@redhat.com> wrote:
> To get out of the "my patch is better" line of conversation,
> I guess you and Marcelo should probably try to figure out
> some threshold that you both agree on.
Sure, we will do that. =)
> > A concern, or feature =), with the notify-on-swap method is that with
> > responsive user applications, the system will never use swap at all.
> > There are surely systems where this behavior is desirable, but on
> > desktop systems, for example, the memory occupied by inactive processes
> > might be better used by active ones.
>
> Well, if the inactive processes get woken up by the low memory
> notification and free some of their memory, the active processes
> will use the memory from the inactive ones :)
Yes, but it will probably take some time before all applications start
to use this, and even if they do, we might have to consider the case
where the limit is reached and applications have no more memory to
spare.
Also, if applications are swamped with notifications, we might find
ourselves in a new kind of thrashing state where dumb applications
(e.g., my test application) repeatedly and unsuccessfully try to
release memory. So instead of notifying each time the priority threshold
is reached, or on every interval in which swapping has occurred, we could
enter a state, not leave it until memory pressure has decreased, and
notify only on state changes.
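A minimal sketch of the notify-only-on-state-change idea, written as plain userspace C so it can be tried outside the kernel. Everything here is an assumption made for illustration: `struct pressure_state`, `pressure_update()`, and the two thresholds are hypothetical, and `level` is an abstract pressure measure where a higher number means more pressure, so an actual reclaim metric would have to be mapped onto it.

```c
#include <stdbool.h>

/* Hypothetical tracker; names and thresholds are illustrative only. */
struct pressure_state {
    int enter_level;     /* pressure at or above this enters the state */
    int leave_level;     /* pressure at or below this leaves the state;
                            should be lower than enter_level */
    bool under_pressure; /* true while in the memory-pressure state */
};

/*
 * Feed one pressure sample. Returns true only when the state changes,
 * which is the only time a notification would be sent.
 */
bool pressure_update(struct pressure_state *s, int level)
{
    if (!s->under_pressure && level >= s->enter_level) {
        s->under_pressure = true;
        return true;
    }
    if (s->under_pressure && level <= s->leave_level) {
        s->under_pressure = false;
        return true;
    }
    return false;
}
```

The gap between the two thresholds is what keeps an application that fails to free memory from being notified over and over while pressure hovers around a single limit.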