From: Christoph Lameter <clameter@sgi.com>
To: linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, Hugh Dickins <hugh@veritas.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	linux-mm@kvack.org, Andi Kleen <ak@suse.de>,
	Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
	Christoph Lameter <clameter@sgi.com>
Subject: [RFC3 06/14] Zone Reclaim
Date: Wed, 14 Dec 2005 16:14:46 -0800 (PST)
Message-ID: <20051215001446.31405.63375.sendpatchset@schroedinger.engr.sgi.com>
In-Reply-To: <20051215001415.31405.24898.sendpatchset@schroedinger.engr.sgi.com>

Zone reclaim allows pages to be reclaimed from a zone when its number of
free pages falls below the watermark, even if other zones still have enough
pages available. Zone reclaim is of particular importance for NUMA machines,
where it can be more beneficial to reclaim a local page than to take the
performance penalty that comes with allocating a page from a remote zone.
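
For illustration only (this is not part of the patch): a minimal userspace
sketch of the fallback decision described above, assuming a toy zone model.
struct toy_zone, toy_zone_reclaim() and toy_pick_zone() are hypothetical
names and do not exist in the kernel; the real check is zone_watermark_ok()
and takes more parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct zone; every field here is hypothetical. */
struct toy_zone {
	int is_local;      /* non-zero if the zone is on the allocating node */
	long free_pages;   /* pages currently free */
	long watermark;    /* free pages that must remain after allocating */
	long reclaimable;  /* pages a reclaim pass could free */
};

/* Pretend reclaim: only local zones are reclaimed; returns 1 on success. */
static int toy_zone_reclaim(struct toy_zone *z, long nr_pages)
{
	if (!z->is_local || z->reclaimable < nr_pages)
		return 0;
	z->reclaimable -= nr_pages;
	z->free_pages += nr_pages;
	return 1;
}

/*
 * Walk the zones in preference order (index 0 is the most preferred) and
 * return the index of the zone to allocate from, or -1 if none qualifies.
 * A zone below its watermark is skipped unless reclaim mode is on and the
 * reclaim attempt succeeds.
 */
static int toy_pick_zone(struct toy_zone *zones, size_t n,
			 int reclaim_mode, long nr_pages)
{
	size_t i;

	assert(nr_pages > 0);
	for (i = 0; i < n; i++) {
		struct toy_zone *z = &zones[i];

		if (z->free_pages - nr_pages < z->watermark) {
			if (!reclaim_mode || !toy_zone_reclaim(z, nr_pages))
				continue;
		}
		return (int)i;
	}
	return -1;
}
```

With reclaim mode off, a local zone below its watermark is skipped in favor
of the next zone; with reclaim mode on, the same request is satisfied locally
as long as the toy reclaim pass frees enough pages.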

Zone reclaim is enabled if the distance to any other node is higher than
RECLAIM_DISTANCE, which may be defined by an arch. RECLAIM_DISTANCE defaults
to 20, the distance to another node in the same component (enclosure or
motherboard), so by default zone reclaim is only enabled when some node is
farther away than that.
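
As a rough illustration of the enabling rule (again not part of the patch),
the sketch below scans a distance table and reports whether any node is
farther away than the threshold. TOY_NODES, TOY_RECLAIM_DISTANCE and
toy_reclaim_mode() are assumptions made for the example; the patch itself
sets zone_reclaim_mode while build_zonelists() walks the nodes.

```c
#include <assert.h>

#define TOY_NODES 4              /* hypothetical table size */
#define TOY_RECLAIM_DISTANCE 20  /* mirrors the default RECLAIM_DISTANCE */

/*
 * Return 1 if any pair of distinct nodes is farther apart than
 * TOY_RECLAIM_DISTANCE, otherwise 0. dist[from][to] plays the role of
 * node_distance(from, to).
 */
static int toy_reclaim_mode(int dist[TOY_NODES][TOY_NODES], int nr_nodes)
{
	int from, to;

	assert(nr_nodes >= 1 && nr_nodes <= TOY_NODES);
	for (from = 0; from < nr_nodes; from++)
		for (to = 0; to < nr_nodes; to++)
			if (from != to &&
			    dist[from][to] > TOY_RECLAIM_DISTANCE)
				return 1;
	return 0;
}
```

A table in which every remote distance is 20 leaves the mode off, while a
table with any entry above 20 (for example nodes in a different enclosure)
turns it on.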

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc5-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/page_alloc.c 2005-12-14 14:57:33.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/page_alloc.c 2005-12-14 15:24:22.000000000 -0800
@@ -1186,7 +1186,9 @@ get_page_from_freelist(gfp_t gfp_mask, u
 				mark = (*z)->pages_high;
 			if (!zone_watermark_ok(*z, order, mark,
 				    classzone_idx, alloc_flags))
-				continue;
+				if (!zone_reclaim_mode ||
+				    !zone_reclaim(*z, gfp_mask, order))
+					continue;
 		}
 		page = buffered_rmqueue(*z, order, gfp_mask);
@@ -1957,13 +1959,22 @@ static void __init build_zonelists(pg_da
 	prev_node = local_node;
 	nodes_clear(used_mask);
 	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
+		int distance = node_distance(local_node, node);
+
+		/*
+		 * If another node is sufficiently far away then it is better
+		 * to reclaim pages in a zone before going off node.
+		 */
+		if (distance > RECLAIM_DISTANCE)
+			zone_reclaim_mode = 1;
+
 		/*
 		 * We don't want to pressure a particular node.
 		 * So adding penalty to the first node in same
 		 * distance group to make it round-robin.
 		 */
-		if (node_distance(local_node, node) !=
-				node_distance(local_node, prev_node))
+
+		if (distance != node_distance(local_node, prev_node))
 			node_load[node] += load;
 		prev_node = node;
 		load--;
Index: linux-2.6.15-rc5-mm2/include/linux/swap.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/swap.h 2005-12-13 20:41:05.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/swap.h 2005-12-14 15:24:22.000000000 -0800
@@ -172,6 +172,17 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(struct zone **, gfp_t);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
+extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+#else
+#define zone_reclaim_mode 0
+static inline int zone_reclaim(struct zone *z, gfp_t mask,
+				unsigned int order)
+{
+	return 0;
+}
+#endif
 extern int shrink_all_memory(int);
 extern int vm_swappiness;
Index: linux-2.6.15-rc5-mm2/include/linux/topology.h
===================================================================
--- linux-2.6.15-rc5-mm2.orig/include/linux/topology.h 2005-12-12 09:10:34.000000000 -0800
+++ linux-2.6.15-rc5-mm2/include/linux/topology.h 2005-12-14 15:24:22.000000000 -0800
@@ -56,6 +56,9 @@
 #define REMOTE_DISTANCE 20
 #define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
 #endif
+#ifndef RECLAIM_DISTANCE
+#define RECLAIM_DISTANCE 20
+#endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS (1)
 #endif
Index: linux-2.6.15-rc5-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc5-mm2.orig/mm/vmscan.c 2005-12-14 15:24:19.000000000 -0800
+++ linux-2.6.15-rc5-mm2/mm/vmscan.c 2005-12-14 15:24:43.000000000 -0800
@@ -1823,3 +1823,60 @@ static int __init kswapd_init(void)
 }
 module_init(kswapd_init)
+
+#ifdef CONFIG_NUMA
+/*
+ * Zone reclaim mode
+ *
+ * If non-zero call zone_reclaim when the number of free pages falls below
+ * the watermarks.
+ */
+int zone_reclaim_mode __read_mostly;
+
+/*
+ * Try to free up some pages from this zone through reclaim.
+ */
+int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
+{
+	struct scan_control sc;
+	int nr_pages = 1 << order;
+	struct task_struct *p = current;
+	struct reclaim_state reclaim_state;
+
+	if (!(gfp_mask & __GFP_WAIT) ||
+	    zone->zone_pgdat->node_id != numa_node_id() ||
+	    zone->all_unreclaimable ||
+	    atomic_read(&zone->reclaim_in_progress) > 0)
+		return 0;
+
+	/*
+	 * Check if there is a reasonable amount of recoverable memory before
+	 * doing the scan.
+	 */
+	if (zone_page_state(zone, NR_PAGECACHE) <=
+	    zone_page_state(zone, NR_MAPPED) + nr_pages)
+		return 0;
+
+	sc.gfp_mask = gfp_mask;
+	sc.may_writepage = 0;
+	sc.may_swap = 0;
+	sc.nr_mapped = global_page_state(NR_MAPPED);
+	sc.nr_scanned = 0;
+	sc.nr_reclaimed = 0;
+	sc.priority = 0;
+	disable_swap_token();
+
+	sc.swap_cluster_max = max(nr_pages, SWAP_CLUSTER_MAX);
+
+	cond_resched();
+	p->flags |= PF_MEMALLOC;
+	reclaim_state.reclaimed_slab = 0;
+	p->reclaim_state = &reclaim_state;
+	shrink_zone(zone, &sc);
+	p->reclaim_state = NULL;
+	current->flags &= ~PF_MEMALLOC;
+	cond_resched();
+	return sc.nr_reclaimed >= (1 << order);
+}
+#endif
+
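
Note on the early exit in zone_reclaim() (an illustration, not part of the
patch): the scan is skipped unless the zone holds more page cache pages than
mapped pages plus the request size, since only unmapped page cache is
treated as cheaply recoverable here. A standalone sketch of that predicate,
where toy_worth_scanning() is a hypothetical helper and the counters are
passed in as plain numbers instead of being read via zone_page_state():

```c
#include <assert.h>

/*
 * Return 1 if a reclaim scan looks worthwhile: the unmapped part of the
 * page cache must exceed the 2^order pages being requested.
 */
static int toy_worth_scanning(unsigned long pagecache, unsigned long mapped,
			      unsigned int order)
{
	unsigned long nr_pages;

	assert(order < 8 * sizeof(unsigned long));
	nr_pages = 1UL << order;
	return pagecache > mapped + nr_pages;
}
```

For an order-3 request (8 pages), 210 page cache pages with 200 mapped pass
the check, while 208 page cache pages with 200 mapped do not.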