From: Christoph Lameter <clameter@sgi.com>
To: akpm@osdl.org
Cc: nickpiggin@yahoo.com.au, Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: NUMA: Patch for node based swapping V2
Date: Wed, 13 Oct 2004 08:14:35 -0700 (PDT)
Message-ID: <Pine.LNX.4.58.0410130812560.9057@schroedinger.engr.sgi.com>
In-Reply-To: <416D0AA4.30701@yahoo.com.au>
This was discussed yesterday on linux-mm.
Changelog:
* NUMA: Add the ability to invoke kswapd on a node if local free memory falls
below a certain threshold. A node may fill up its memory simply by copying a
file, which fills the node's memory with cached pages. Currently the node's
memory is only reclaimed once all nodes in the system fall below a certain
threshold. Until then, processes on the node are only allocated
off-node memory. Invoking kswapd on the node fixes this situation until
a better solution can be found.
* The threshold may be set in /proc/sys/vm/node_swap in units of percent * 10
(so 100 means roughly 10%). The threshold is zero by default, which means
that node swapping is off.
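As a rough illustration of the threshold arithmetic (not part of the patch): the limit in pages is (present_pages * node_swap) / 1024, so a setting of 100 is slightly below an exact 10%. The helper name node_swap_limit() and the zone size in the note below are assumptions made up for this sketch.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not a kernel function: number of free pages
 * below which kswapd would be woken for a zone.
 *
 * node_swap is in units of percent * 10. Dividing by 1024 (>> 10)
 * instead of 1000 keeps the check cheap and makes the effective
 * threshold a little lower than the nominal percentage.
 *
 * Note: the product may overflow where unsigned long is 32 bits and
 * the zone is very large; this sketch does not guard against that.
 */
static unsigned long node_swap_limit(unsigned long present_pages, int node_swap)
{
	return (present_pages * (unsigned long)node_swap) >> 10;
}

/* Convenience printer for trying out values by hand. */
static void print_node_swap_limit(unsigned long present_pages, int node_swap)
{
	printf("present=%lu node_swap=%d limit=%lu pages\n",
	       present_pages, node_swap,
	       node_swap_limit(present_pages, node_swap));
}
```

For an assumed zone of 262144 pages (1 GiB of 4 KiB pages), node_swap = 100 gives a limit of 25600 pages, about 9.8% of the zone, and node_swap = 0 gives a limit of 0, which leaves the feature off.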
Index: linux-2.6.9-rc4/mm/page_alloc.c
===================================================================
--- linux-2.6.9-rc4.orig/mm/page_alloc.c 2004-10-10 19:57:03.000000000 -0700
+++ linux-2.6.9-rc4/mm/page_alloc.c 2004-10-13 07:58:57.000000000 -0700
@@ -41,6 +41,19 @@
long nr_swap_pages;
int numnodes = 1;
int sysctl_lower_zone_protection = 0;
+#ifdef CONFIG_NUMA
+/*
+ * sysctl_node_swap is a percentage of the pages available
+ * in a zone multiplied by 10. If the available pages
+ * in a zone drop below this limit then kswapd is invoked
+ * for this zone alone. This results in the reclaiming
+ * of local memory. Local memory may be filled up by simply reading
+ * a file. If local memory is not available then off-node memory
+ * will be allocated to a process, which makes all memory accesses
+ * less efficient than they could be.
+ */
+int sysctl_node_swap = 0;
+#endif
EXPORT_SYMBOL(totalram_pages);
EXPORT_SYMBOL(nr_swap_pages);
@@ -483,6 +496,14 @@
p = &z->pageset[cpu];
if (pg == orig) {
z->pageset[cpu].numa_hit++;
+ /*
+ * If zone allocation has left less than
+ * (sysctl_node_swap / 10) % of the zone free invoke kswapd.
+ * (the page limit is obtained through (pages*limit)/1024 to
+ * make the calculation more efficient)
+ */
+ if (z->free_pages < ((z->present_pages * sysctl_node_swap) >> 10))
+ wakeup_kswapd(z);
} else {
p->numa_miss++;
zonelist->zones[0]->pageset[cpu].numa_foreign++;
Index: linux-2.6.9-rc4/kernel/sysctl.c
===================================================================
--- linux-2.6.9-rc4.orig/kernel/sysctl.c 2004-10-10 19:57:03.000000000 -0700
+++ linux-2.6.9-rc4/kernel/sysctl.c 2004-10-11 12:54:51.000000000 -0700
@@ -65,6 +65,9 @@
extern int min_free_kbytes;
extern int printk_ratelimit_jiffies;
extern int printk_ratelimit_burst;
+#ifdef CONFIG_NUMA
+extern int sysctl_node_swap;
+#endif
#if defined(CONFIG_X86_LOCAL_APIC) && defined(__i386__)
int unknown_nmi_panic;
@@ -800,7 +803,17 @@
.extra1 = &zero,
},
#endif
- { .ctl_name = 0 }
+#ifdef CONFIG_NUMA
+ {
+ .ctl_name = VM_NODE_SWAP,
+ .procname = "node_swap",
+ .data = &sysctl_node_swap,
+ .maxlen = sizeof(sysctl_node_swap),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
+#endif
+ { .ctl_name = 0 }
};
static ctl_table proc_table[] = {
Index: linux-2.6.9-rc4/include/linux/sysctl.h
===================================================================
--- linux-2.6.9-rc4.orig/include/linux/sysctl.h 2004-10-10 19:58:05.000000000 -0700
+++ linux-2.6.9-rc4/include/linux/sysctl.h 2004-10-11 12:54:51.000000000 -0700
@@ -167,6 +167,7 @@
VM_HUGETLB_GROUP=25, /* permitted hugetlb group */
VM_VFS_CACHE_PRESSURE=26, /* dcache/icache reclaim pressure */
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
+ VM_NODE_SWAP=28, /* Swap local node memory limit (in % *10) */
};
Index: linux-2.6.9-rc4/mm/vmscan.c
===================================================================
--- linux-2.6.9-rc4.orig/mm/vmscan.c 2004-10-10 19:57:04.000000000 -0700
+++ linux-2.6.9-rc4/mm/vmscan.c 2004-10-11 12:54:51.000000000 -0700
@@ -1168,9 +1168,11 @@
*/
void wakeup_kswapd(struct zone *zone)
{
+ extern int sysctl_node_swap;
+
if (zone->present_pages == 0)
return;
- if (zone->free_pages > zone->pages_low)
+ if (zone->free_pages > ((zone->present_pages * sysctl_node_swap) >> 10) && zone->free_pages > zone->pages_low)
return;
if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
return;
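The early-return check in wakeup_kswapd() can be modelled outside the kernel roughly as follows. This is only a sketch: struct zone_model and should_wake_kswapd() are hypothetical names, the /1024 scaling follows the comment in the page_alloc.c hunk, and the waitqueue_active() test is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct zone fields the check reads. */
struct zone_model {
	unsigned long present_pages;
	unsigned long free_pages;
	unsigned long pages_low;
};

/*
 * Returns true when the modified check would go on to wake kswapd.
 * kswapd is skipped only if free memory is above BOTH the node_swap
 * limit and the zone's pages_low watermark.
 */
static bool should_wake_kswapd(const struct zone_model *z, int node_swap)
{
	unsigned long limit;

	if (z->present_pages == 0)
		return false;	/* empty zone, nothing to reclaim */

	limit = (z->present_pages * (unsigned long)node_swap) >> 10;

	if (z->free_pages > limit && z->free_pages > z->pages_low)
		return false;	/* enough free memory on both counts */

	return true;
}
```

With node_swap = 0 the limit is 0, so the decision reduces to the existing pages_low comparison, which matches the statement that node swapping is off by default.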