linux-mm.kvack.org archive mirror
* [PATCH 1/4] VM: add may_swap flag to scan_control
       [not found] <20050601141154.GN14894@localhost>
@ 2005-06-01 14:22 ` Martin Hicks
  2005-06-01 14:23 ` [PATCH 2/4] VM: early zone reclaim Martin Hicks
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Martin Hicks @ 2005-06-01 14:22 UTC (permalink / raw)
  To: Linux MM, Andrew Morton; +Cc: Ray Bryant

This adds an extra switch to the scan_control struct.  It simply
lets the reclaim code know whether it's allowed to swap pages out.

This was required for a simple per-zone reclaimer.  Without this
addition, pages would be swapped out as soon as a zone ran out of
memory and early reclaim kicked in.

Signed-off-by: Martin Hicks <mort@sgi.com>

 mm/vmscan.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.12-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/vmscan.c	2005-05-26 12:27:01.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/vmscan.c	2005-05-26 12:27:05.000000000 -0700
@@ -74,6 +74,9 @@ struct scan_control {
 
 	int may_writepage;
 
+	/* Can pages be swapped as part of reclaim? */
+	int may_swap;
+
 	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
 	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
 	 * In this context, it doesn't matter that we scan the
@@ -414,7 +417,7 @@ static int shrink_list(struct list_head 
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
+		if (PageAnon(page) && !PageSwapCache(page) && sc->may_swap) {
 			void *cookie = page->mapping;
 			pgoff_t index = page->index;
 
@@ -930,6 +933,7 @@ int try_to_free_pages(struct zone **zone
 
 	sc.gfp_mask = gfp_mask;
 	sc.may_writepage = 0;
+	sc.may_swap = 1;
 
 	inc_page_state(allocstall);
 
@@ -1030,6 +1034,7 @@ loop_again:
 	total_reclaimed = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
+	sc.may_swap = 1;
 	sc.nr_mapped = read_page_state(nr_mapped);
 
 	inc_page_state(pageoutrun);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org


* [PATCH 2/4] VM: early zone reclaim
       [not found] <20050601141154.GN14894@localhost>
  2005-06-01 14:22 ` [PATCH 1/4] VM: add may_swap flag to scan_control Martin Hicks
@ 2005-06-01 14:23 ` Martin Hicks
  2005-06-01 14:23 ` [PATCH 3/4] VM: add __GFP_NORECLAIM Martin Hicks
  2005-06-01 14:23 ` [PATCH 4/4] VM: rate limit early reclaim Martin Hicks
  3 siblings, 0 replies; 4+ messages in thread
From: Martin Hicks @ 2005-06-01 14:23 UTC (permalink / raw)
  To: Linux MM, Andrew Morton; +Cc: Ray Bryant

This is the core of the (much simplified) early reclaim.  The goal of
this patch is to reclaim some easily-freed pages from a zone before
falling back onto another zone.

One of the major uses of this is NUMA machines.  With the default
allocator behavior the allocator would look for memory in another
zone, which might be off-node, before trying to reclaim from the
current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected
on a per-zone basis and is turned on/off via a syscall.

Signed-off-by: Martin Hicks <mort@sgi.com>

 arch/i386/kernel/syscall_table.S |    2 -
 arch/ia64/kernel/entry.S         |    2 -
 include/asm-i386/unistd.h        |    2 -
 include/asm-ia64/unistd.h        |    1 
 include/linux/mmzone.h           |    6 +++
 include/linux/swap.h             |    1 
 kernel/sys_ni.c                  |    1 
 mm/page_alloc.c                  |   31 ++++++++++++++++--
 mm/vmscan.c                      |   64 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 103 insertions(+), 7 deletions(-)

Index: linux-2.6.12-rc5-mm1/arch/ia64/kernel/entry.S
===================================================================
--- linux-2.6.12-rc5-mm1.orig/arch/ia64/kernel/entry.S	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/arch/ia64/kernel/entry.S	2005-05-26 12:27:11.000000000 -0700
@@ -1573,7 +1573,7 @@ sys_call_table:
 	data8 sys_keyctl
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall			// 1275
-	data8 sys_ni_syscall
+	data8 sys_set_zone_reclaim
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
Index: linux-2.6.12-rc5-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/linux/mmzone.h	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/linux/mmzone.h	2005-05-26 12:27:11.000000000 -0700
@@ -145,6 +145,12 @@ struct zone {
 	int			all_unreclaimable; /* All pages pinned */
 
 	/*
+	 * Does the allocator try to reclaim pages from the zone as soon
+	 * as it fails a watermark_ok() in __alloc_pages?
+	 */
+	int			reclaim_pages;
+
+	/*
 	 * prev_priority holds the scanning priority for this zone.  It is
 	 * defined as the scanning priority at which we achieved our reclaim
 	 * target at the previous try_to_free_pages() or balance_pgdat()
Index: linux-2.6.12-rc5-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/linux/swap.h	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/linux/swap.h	2005-05-26 12:27:11.000000000 -0700
@@ -173,6 +173,7 @@ extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(struct zone **, unsigned int, unsigned int);
+extern int zone_reclaim(struct zone *, unsigned int, unsigned int);
 extern int shrink_all_memory(int);
 extern int vm_swappiness;
 
Index: linux-2.6.12-rc5-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/page_alloc.c	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/page_alloc.c	2005-05-26 12:27:11.000000000 -0700
@@ -724,6 +724,14 @@ int zone_watermark_ok(struct zone *z, in
 	return 1;
 }
 
+static inline int
+check_zone_reclaim(struct zone *z, unsigned int gfp_mask)
+{
+	if (!z->reclaim_pages)
+		return 0;
+	return 1;
+}
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
@@ -763,14 +771,29 @@ __alloc_pages(unsigned int __nocast gfp_
  restart:
 	/* Go through the zonelist once, looking for a zone with enough free */
 	for (i = 0; (z = zones[i]) != NULL; i++) {
-
-		if (!zone_watermark_ok(z, order, z->pages_low,
-				       classzone_idx, 0, 0))
-			continue;
+		int do_reclaim = check_zone_reclaim(z, gfp_mask);
 
 		if (!cpuset_zone_allowed(z))
 			continue;
 
+		/*
+		 * If the zone is to attempt early page reclaim then this loop
+		 * will try to reclaim pages and check the watermark a second
+		 * time before giving up and falling back to the next zone.
+		 */
+	zone_reclaim_retry:
+		if (!zone_watermark_ok(z, order, z->pages_low,
+				       classzone_idx, 0, 0)) {
+			if (!do_reclaim)
+				continue;
+			else {
+				zone_reclaim(z, gfp_mask, order);
+				/* Only try reclaim once */
+				do_reclaim = 0;
+				goto zone_reclaim_retry;
+			}
+		}
+
 		page = buffered_rmqueue(z, order, gfp_mask);
 		if (page)
 			goto got_pg;
Index: linux-2.6.12-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/vmscan.c	2005-05-26 12:27:05.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/vmscan.c	2005-05-26 12:27:11.000000000 -0700
@@ -1326,3 +1326,67 @@ static int __init kswapd_init(void)
 }
 
 module_init(kswapd_init)
+
+
+/*
+ * Try to free up some pages from this zone through reclaim.
+ */
+int zone_reclaim(struct zone *zone, unsigned int gfp_mask, unsigned int order)
+{
+	struct scan_control sc;
+	int nr_pages = 1 << order;
+	int total_reclaimed = 0;
+
+	/* The reclaim may sleep, so don't do it if sleep isn't allowed */
+	if (!(gfp_mask & __GFP_WAIT))
+		return 0;
+	if (zone->all_unreclaimable)
+		return 0;
+
+	sc.gfp_mask = gfp_mask;
+	sc.may_writepage = 0;
+	sc.may_swap = 0;
+	sc.nr_mapped = read_page_state(nr_mapped);
+	sc.nr_scanned = 0;
+	sc.nr_reclaimed = 0;
+	/* scan at the highest priority */
+	sc.priority = 0;
+
+	if (nr_pages > SWAP_CLUSTER_MAX)
+		sc.swap_cluster_max = nr_pages;
+	else
+		sc.swap_cluster_max = SWAP_CLUSTER_MAX;
+
+	shrink_zone(zone, &sc);
+	total_reclaimed = sc.nr_reclaimed;
+
+	return total_reclaimed;
+}
+
+asmlinkage long sys_set_zone_reclaim(unsigned int node, unsigned int zone,
+				     unsigned int state)
+{
+	struct zone *z;
+	int i;
+
+	if (node >= MAX_NUMNODES || !node_online(node))
+		return -EINVAL;
+
+	/* This will break if we ever add more zones */
+	if (!(zone & (1<<ZONE_DMA|1<<ZONE_NORMAL|1<<ZONE_HIGHMEM)))
+		return -EINVAL;
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		if (!(zone & 1<<i))
+			continue;
+
+		z = &NODE_DATA(node)->node_zones[i];
+
+		if (state)
+			z->reclaim_pages = 1;
+		else
+			z->reclaim_pages = 0;
+	}
+
+	return 0;
+}
Index: linux-2.6.12-rc5-mm1/kernel/sys_ni.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/kernel/sys_ni.c	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/kernel/sys_ni.c	2005-05-26 12:27:11.000000000 -0700
@@ -77,6 +77,7 @@ cond_syscall(sys_request_key);
 cond_syscall(sys_keyctl);
 cond_syscall(compat_sys_keyctl);
 cond_syscall(compat_sys_socketcall);
+cond_syscall(sys_set_zone_reclaim);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
Index: linux-2.6.12-rc5-mm1/include/asm-i386/unistd.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/asm-i386/unistd.h	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/asm-i386/unistd.h	2005-05-26 12:27:11.000000000 -0700
@@ -256,7 +256,7 @@
 #define __NR_io_submit		248
 #define __NR_io_cancel		249
 #define __NR_fadvise64		250
-
+#define __NR_set_zone_reclaim	251
 #define __NR_exit_group		252
 #define __NR_lookup_dcookie	253
 #define __NR_epoll_create	254
Index: linux-2.6.12-rc5-mm1/include/asm-ia64/unistd.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/asm-ia64/unistd.h	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/asm-ia64/unistd.h	2005-05-26 12:27:11.000000000 -0700
@@ -263,6 +263,7 @@
 #define __NR_add_key			1271
 #define __NR_request_key		1272
 #define __NR_keyctl			1273
+#define __NR_set_zone_reclaim		1276
 
 #ifdef __KERNEL__
 
Index: linux-2.6.12-rc5-mm1/arch/i386/kernel/syscall_table.S
===================================================================
--- linux-2.6.12-rc5-mm1.orig/arch/i386/kernel/syscall_table.S	2005-05-26 12:26:59.000000000 -0700
+++ linux-2.6.12-rc5-mm1/arch/i386/kernel/syscall_table.S	2005-05-26 12:27:11.000000000 -0700
@@ -251,7 +251,7 @@ ENTRY(sys_call_table)
 	.long sys_io_submit
 	.long sys_io_cancel
 	.long sys_fadvise64	/* 250 */
-	.long sys_ni_syscall
+	.long sys_set_zone_reclaim
 	.long sys_exit_group
 	.long sys_lookup_dcookie
 	.long sys_epoll_create

* [PATCH 3/4] VM: add __GFP_NORECLAIM
       [not found] <20050601141154.GN14894@localhost>
  2005-06-01 14:22 ` [PATCH 1/4] VM: add may_swap flag to scan_control Martin Hicks
  2005-06-01 14:23 ` [PATCH 2/4] VM: early zone reclaim Martin Hicks
@ 2005-06-01 14:23 ` Martin Hicks
  2005-06-01 14:23 ` [PATCH 4/4] VM: rate limit early reclaim Martin Hicks
  3 siblings, 0 replies; 4+ messages in thread
From: Martin Hicks @ 2005-06-01 14:23 UTC (permalink / raw)
  To: Linux MM, Andrew Morton; +Cc: Ray Bryant

When using early zone reclaim, it was noticed that allocating new
pages that should be spread across the whole system caused the
eviction of local pages.

This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts.  The example that is implemented here is
for page cache pages.  We want page cache pages to be spread across the
whole system, and we don't want page cache pages to evict other pages
to get local memory.

Signed-off-by:  Martin Hicks <mort@sgi.com>

 include/linux/gfp.h     |    3 ++-
 include/linux/pagemap.h |    4 ++--
 mm/page_alloc.c         |    2 ++
 3 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.12-rc5-mm1/include/linux/gfp.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/linux/gfp.h	2005-05-26 12:26:57.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/linux/gfp.h	2005-05-26 12:27:15.000000000 -0700
@@ -39,6 +39,7 @@ struct vm_area_struct;
 #define __GFP_COMP	0x4000u	/* Add compound page metadata */
 #define __GFP_ZERO	0x8000u	/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
+#define __GFP_NORECLAIM  0x20000u /* No early zone reclaim during allocation */
 
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
@@ -47,7 +48,7 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 			__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 			__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-			__GFP_NOMEMALLOC)
+			__GFP_NOMEMALLOC|__GFP_NORECLAIM)
 
 #define GFP_ATOMIC	(__GFP_HIGH)
 #define GFP_NOIO	(__GFP_WAIT)
Index: linux-2.6.12-rc5-mm1/include/linux/pagemap.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/linux/pagemap.h	2005-05-26 12:26:57.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/linux/pagemap.h	2005-05-26 12:27:15.000000000 -0700
@@ -52,12 +52,12 @@ void release_pages(struct page **pages, 
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
-	return alloc_pages(mapping_gfp_mask(x), 0);
+	return alloc_pages(mapping_gfp_mask(x)|__GFP_NORECLAIM, 0);
 }
 
 static inline struct page *page_cache_alloc_cold(struct address_space *x)
 {
-	return alloc_pages(mapping_gfp_mask(x)|__GFP_COLD, 0);
+	return alloc_pages(mapping_gfp_mask(x)|__GFP_COLD|__GFP_NORECLAIM, 0);
 }
 
 typedef int filler_t(void *, struct page *);
Index: linux-2.6.12-rc5-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/page_alloc.c	2005-05-26 12:27:11.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/page_alloc.c	2005-05-26 12:27:15.000000000 -0700
@@ -729,6 +729,8 @@ check_zone_reclaim(struct zone *z, unsig
 {
 	if (!z->reclaim_pages)
 		return 0;
+	if (gfp_mask & __GFP_NORECLAIM)
+		return 0;
 	return 1;
 }
 

* [PATCH 4/4] VM: rate limit early reclaim
       [not found] <20050601141154.GN14894@localhost>
                   ` (2 preceding siblings ...)
  2005-06-01 14:23 ` [PATCH 3/4] VM: add __GFP_NORECLAIM Martin Hicks
@ 2005-06-01 14:23 ` Martin Hicks
  3 siblings, 0 replies; 4+ messages in thread
From: Martin Hicks @ 2005-06-01 14:23 UTC (permalink / raw)
  To: Linux MM, Andrew Morton; +Cc: Ray Bryant

When early zone reclaim is turned on, the LRU is scanned more frequently
when a zone is low on memory.  This patch limits how often zone reclaim
can be called by skipping the scan if another thread (either via kswapd
or sync reclaim) is already reclaiming from the zone.

Signed-off-by: Martin Hicks <mort@sgi.com> 

 include/linux/mmzone.h |    2 ++
 mm/page_alloc.c        |    1 +
 mm/vmscan.c            |   10 ++++++++++
 3 files changed, 13 insertions(+)

Index: linux-2.6.12-rc5-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/vmscan.c	2005-05-26 12:27:11.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/vmscan.c	2005-05-26 12:27:17.000000000 -0700
@@ -903,7 +903,9 @@ shrink_caches(struct zone **zones, struc
 		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
 
+		atomic_inc(&zone->reclaim_in_progress);
 		shrink_zone(zone, sc);
+		atomic_dec(&zone->reclaim_in_progress);
 	}
 }
  
@@ -1114,7 +1116,9 @@ scan:
 			sc.nr_reclaimed = 0;
 			sc.priority = priority;
 			sc.swap_cluster_max = nr_pages? nr_pages : SWAP_CLUSTER_MAX;
+			atomic_inc(&zone->reclaim_in_progress);
 			shrink_zone(zone, &sc);
+			atomic_dec(&zone->reclaim_in_progress);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
 						lru_pages);
@@ -1357,9 +1361,15 @@ int zone_reclaim(struct zone *zone, unsi
 	else
 		sc.swap_cluster_max = SWAP_CLUSTER_MAX;
 
+	/* Don't reclaim the zone if there are other reclaimers active */
+	if (!atomic_inc_and_test(&zone->reclaim_in_progress))
+		goto out;
+
 	shrink_zone(zone, &sc);
 	total_reclaimed = sc.nr_reclaimed;
 
+ out:
+	atomic_dec(&zone->reclaim_in_progress);
 	return total_reclaimed;
 }
 
Index: linux-2.6.12-rc5-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.12-rc5-mm1.orig/include/linux/mmzone.h	2005-05-26 12:27:11.000000000 -0700
+++ linux-2.6.12-rc5-mm1/include/linux/mmzone.h	2005-05-26 12:27:17.000000000 -0700
@@ -149,6 +149,8 @@ struct zone {
 	 * as it fails a watermark_ok() in __alloc_pages?
 	 */
 	int			reclaim_pages;
+	/* A count of how many reclaimers are scanning this zone */
+	atomic_t		reclaim_in_progress;
 
 	/*
 	 * prev_priority holds the scanning priority for this zone.  It is
Index: linux-2.6.12-rc5-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.12-rc5-mm1.orig/mm/page_alloc.c	2005-05-26 12:27:15.000000000 -0700
+++ linux-2.6.12-rc5-mm1/mm/page_alloc.c	2005-05-26 12:27:17.000000000 -0700
@@ -1757,6 +1757,7 @@ static void __init free_area_init_core(s
 		zone->nr_scan_inactive = 0;
 		zone->nr_active = 0;
 		zone->nr_inactive = 0;
+		atomic_set(&zone->reclaim_in_progress, -1);
 		if (!size)
 			continue;
 
