* [PATCH] Zone reclaim: Allow modification of zone reclaim behavior
@ 2006-01-30 20:24 Christoph Lameter
  2006-01-30 21:45 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Lameter @ 2006-01-30 20:24 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm

In some situations one may want zone_reclaim to behave differently. For
example a process writing large amounts of memory will spew onto other
nodes to cache the writes if many pages in a zone become dirty. This may
impact the performance of processes running on other nodes.

Allowing writes during reclaim puts a stop to that behavior and throttles
the process by restricting the pages to the local zone.

Similarly one may want to contain processes to local memory by enabling
regular swap behavior during zone_reclaim. Off node memory allocation
can then be controlled through memory policies and cpusets.
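
As a rough illustration of the flag handling described above, here is a
user-space sketch, not kernel code. The RECLAIM_* values are copied from
the patch in this message; struct reclaim_opts and
reclaim_opts_from_mode() are hypothetical names standing in for the two
scan_control fields that zone_reclaim() sets.

```c
/* Flag values as defined by the patch in this message. */
#define RECLAIM_OFF   0
#define RECLAIM_ZONE  (1 << 0)  /* Run shrink_cache on the zone */
#define RECLAIM_WRITE (1 << 1)  /* Writeout pages during reclaim */
#define RECLAIM_SWAP  (1 << 2)  /* Swap pages out during reclaim */

/* Hypothetical stand-in for the two scan_control fields the patch sets. */
struct reclaim_opts {
	int may_writepage;
	int may_swap;
};

/*
 * Derive the per-scan options from a mode bitmask. The !! idiom
 * collapses any nonzero masked bit to exactly 1, as in the patch.
 */
static struct reclaim_opts reclaim_opts_from_mode(int mode)
{
	struct reclaim_opts opts;

	opts.may_writepage = !!(mode & RECLAIM_WRITE);
	opts.may_swap = !!(mode & RECLAIM_SWAP);
	return opts;
}
```

With mode 1 both fields come out 0, which matches the behavior before
this patch. With mode 3 only may_writepage is set, and with mode 7 both
are set.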

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16-rc1-mm4/mm/vmscan.c
===================================================================
--- linux-2.6.16-rc1-mm4.orig/mm/vmscan.c	2006-01-30 11:31:31.000000000 -0800
+++ linux-2.6.16-rc1-mm4/mm/vmscan.c	2006-01-30 12:19:49.000000000 -0800
@@ -1831,6 +1831,11 @@ module_init(kswapd_init)
  */
 int zone_reclaim_mode __read_mostly;
 
+#define RECLAIM_OFF 0
+#define RECLAIM_ZONE (1<<0)	/* Run shrink_cache on the zone */
+#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
+#define RECLAIM_SWAP (1<<2)	/* Swap pages out during reclaim */
+
 /*
  * Mininum time between zone reclaim scans
  */
@@ -1869,8 +1874,8 @@ int zone_reclaim(struct zone *zone, gfp_
 	if (!cpus_empty(mask) && node_id != numa_node_id())
 		return 0;
 
-	sc.may_writepage = 0;
-	sc.may_swap = 0;
+	sc.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE);
+	sc.may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP);
 	sc.nr_scanned = 0;
 	sc.nr_reclaimed = 0;
 	sc.priority = ZONE_RECLAIM_PRIORITY + 1;
Index: linux-2.6.16-rc1-mm4/Documentation/sysctl/vm.txt
===================================================================
--- linux-2.6.16-rc1-mm4.orig/Documentation/sysctl/vm.txt	2006-01-30 12:13:33.000000000 -0800
+++ linux-2.6.16-rc1-mm4/Documentation/sysctl/vm.txt	2006-01-30 12:22:18.000000000 -0800
@@ -127,17 +127,39 @@ the high water marks for each per cpu pa
 
 zone_reclaim_mode:
 
-This is set during bootup to 1 if it is determined that pages from
-remote zones will cause a significant performance reduction. The
+Zone_reclaim_mode allows someone to set more or less aggressive approaches
+to reclaim memory when a zone runs out of memory. If it is set to zero then
+no zone reclaim occurs. Allocations will be satisfied from other zones /
+nodes in the system.
+
+This value is an OR of the following:
+
+1	= Zone reclaim on
+2	= Zone reclaim writes dirty pages out
+4	= Zone reclaim swaps pages
+
+zone_reclaim_mode is set during bootup to 1 if it is determined that pages
+from remote zones will cause a measurable performance reduction. The
 page allocator will then reclaim easily reusable pages (those page
-cache pages that are currently not used) before going off node.
+cache pages that are currently not used) before allocating off node pages.
 
-The user can override this setting. It may be beneficial to switch
-off zone reclaim if the system is used for a file server and all
-of memory should be used for caching files from disk.
+It may be beneficial to switch off zone reclaim if the system is
+used for a file server and all of memory should be used for caching files
+from disk. In that case the caching effect is more important than
+data locality.
+
+Allowing zone reclaim to write out pages stops processes that are
+writing large amounts of data from dirtying pages on other nodes. Zone
+reclaim will write out dirty pages if a zone fills up and so effectively
+throttle the process. This may decrease the performance of a single process
+since it cannot use all of system memory to buffer the outgoing writes
+anymore but it preserves the memory on other nodes so that the performance
+of other processes running on other nodes will not be affected.
+
+Allowing regular swap effectively restricts allocations to the local
+node unless explicitly overridden by memory policies or cpuset
+configurations.
 
-It may be beneficial to switch this on if one wants to do zone
-reclaim regardless of the numa distances in the system.
 ================================================================
 
 zone_reclaim_interval:
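
The ORed value documented above can be composed as in the following
sketch. The helper names are hypothetical and not part of the patch. The
sketch assumes the sysctl is exposed as /proc/sys/vm/zone_reclaim_mode
(the documentation text lives in Documentation/sysctl/vm.txt) and only
formats the text one would write there; it does not touch the file.

```c
#include <stdio.h>

/* Bit values from the documentation text in the patch. */
#define ZRM_RECLAIM_ON   1  /* Zone reclaim on */
#define ZRM_WRITE_DIRTY  2  /* Zone reclaim writes dirty pages out */
#define ZRM_SWAP_PAGES   4  /* Zone reclaim swaps pages */

/* Hypothetical helper: OR the documented bits together. */
static int zone_reclaim_mode_value(int reclaim_on, int write_dirty, int swap)
{
	int mode = 0;

	if (reclaim_on)
		mode |= ZRM_RECLAIM_ON;
	if (write_dirty)
		mode |= ZRM_WRITE_DIRTY;
	if (swap)
		mode |= ZRM_SWAP_PAGES;
	return mode;
}

/*
 * Hypothetical helper: format the value as a decimal line, the form
 * one would write to the sysctl file. Returns what snprintf() returns.
 */
static int zone_reclaim_mode_format(int mode, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", mode);
}
```

For example, reclaim plus writeout but no swap gives 1 | 2 = 3, and
enabling all three gives 7.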

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] Zone reclaim: Allow modification of zone reclaim behavior
  2006-01-30 20:24 [PATCH] Zone reclaim: Allow modification of zone reclaim behavior Christoph Lameter
@ 2006-01-30 21:45 ` Andrew Morton
  2006-01-30 21:59   ` Christoph Lameter
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2006-01-30 21:45 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> In some situations one may want zone_reclaim to behave differently. For
>  example a process writing large amounts of memory will spew onto other
>  nodes to cache the writes if many pages in a zone become dirty. This may
>  impact the performance of processes running on other nodes.
> 
>  Allowing writes during reclaim puts a stop to that behavior and throttles
>  the process by restricting the pages to the local zone.
> 
>  Similarly one may want to contain processes to local memory by enabling
>  regular swap behavior during zone_reclaim. Off node memory allocation
>  can then be controlled through memory policies and cpusets.

The proliferating /proc configurability is a worry.  It'll confuse people
and people just won't know that it's there and it's yet another question
which maintenance people need to ask end-users during problem resolution.

Is there not some means by which we can simply get these things right?

Why wouldn't we want to perform writeback or swapout during zone reclaim?

Why wouldn't we want to reclaim slab during zone reclaim?

* Re: [PATCH] Zone reclaim: Allow modification of zone reclaim behavior
  2006-01-30 21:45 ` Andrew Morton
@ 2006-01-30 21:59   ` Christoph Lameter
  0 siblings, 0 replies; 3+ messages in thread
From: Christoph Lameter @ 2006-01-30 21:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm

On Mon, 30 Jan 2006, Andrew Morton wrote:

> The proliferating /proc configurability is a worry.  It'll confuse people
> and people just won't know that it's there and it's yet another question
> which maintenance people need to ask end-users during problem resolution.
> 
> Is there not some means by which we can simply get these things right?

I wish I knew some other way to do this. We will have to do significant
changes to the VM to even have the data available to make the proper
decisions in these settings. See my zone based counter patches from before
Christmas. These make it possible to get rid of the reclaim_interval but
are so extensive you would not want them for 2.6.16. More brainwork is
needed after the counters are in to figure out a way to make the other
knobs unnecessary.

> Why wouldn't we want to perform writeback or swapout during zone reclaim?

Because that will reduce performance. If writeback is performed during 
reclaim then a process cannot dirty all of available memory. It will be 
throttled after using up all of a node's memory. This is a significant
regression from current performance.

If you do swapout then the process is restricted to a node and will start 
swapping if more memory starts being used than a node has available. This
is going to drastically reduce performance.

zone_reclaim in its default configuration is simply throwing out pages 
that have no references left. These are pagecache pages that may be left 
from a copy operation or from an application that has terminated.
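
To make the three cases above concrete, here is a toy model. It is not
how vmscan.c decides anything. The page_kind enum and
zone_reclaim_may_free() are hypothetical, and the model assumes that any
nonzero mode means zone reclaim is on, following the documentation
statement that a value of zero disables it.

```c
/* Flag values as defined by the patch. */
#define RECLAIM_OFF   0
#define RECLAIM_ZONE  (1 << 0)
#define RECLAIM_WRITE (1 << 1)
#define RECLAIM_SWAP  (1 << 2)

/* Hypothetical, simplified page categories for illustration only. */
enum page_kind {
	PAGE_UNREFERENCED_PAGECACHE,  /* clean pagecache, no references left */
	PAGE_DIRTY_PAGECACHE,         /* needs writeback before it can go */
	PAGE_ANONYMOUS                /* needs swapout before it can go */
};

/* Toy model: may zone reclaim free a page of this kind under this mode? */
static int zone_reclaim_may_free(int mode, enum page_kind kind)
{
	if (mode == RECLAIM_OFF)
		return 0;  /* assumption: zero means no zone reclaim at all */

	switch (kind) {
	case PAGE_UNREFERENCED_PAGECACHE:
		return 1;  /* the default configuration only drops these */
	case PAGE_DIRTY_PAGECACHE:
		return !!(mode & RECLAIM_WRITE);
	case PAGE_ANONYMOUS:
		return !!(mode & RECLAIM_SWAP);
	}
	return 0;
}
```

In this model the default mode of 1 frees only unreferenced pagecache
pages, so a process that dirties or allocates more than the node holds
falls back to other nodes instead of being throttled or swapped.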

> Why wouldn't we want to reclaim slab during zone reclaim?

Because it's too expensive to do and because slab reclaim is not able to
cleanly reclaim per zone right now. It does a global shrink operation on 
nodes that may still have lots of memory available.

We can skip some of these for 2.6.16 if you do not want the knobs. The 
default behavior without the knobs should be fine for most cases.
