* VM tuning patch, take 2
@ 2001-06-07 17:48 Jonathan Morton
2001-06-07 16:59 ` Marcelo Tosatti
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jonathan Morton @ 2001-06-07 17:48 UTC (permalink / raw)
To: linux-mm
I've been coding my butt off for the past ... well, couple of evenings, and
I've got a modified 2.4.5 kernel which addresses some of the problems with
stock 2.4.5 VM. To summarise:
- ageing is now done evenly, and independently of the number of mappings on
a given page. This is done by introducing a 4th LRU list (alongside
active, inactive_clean and inactive_dirty) which holds pages attached to a
process but not in the swapcache. This list is scanned immediately before
calling swap_out(), and ages pages both up and down. Maintenance of the new
list is done automatically as part of the existing add_page_to_*_list()
macros, and new pages are discovered by try_to_swap_out(). A count of pages
on the list is also maintained, which I'd like to report in /proc/meminfo.
- try_to_swap_out() will now refuse to move a page which still has positive
age into the swapcache. This helps preserve the working-set information,
and may help to reduce swap bloat. It may re-introduce the cause of cache
collapse, but I haven't seen any evidence of this being disastrous yet.
- new pages are still given an age of PAGE_AGE_START, which is 2.
PAGE_AGE_ADV has been increased to 4, and PAGE_AGE_MAX to 128. Pages which
are demand-paged in from swap are given an initial age of PAGE_AGE_MAX/2,
or 64 - this should help to keep these (expensive) pages around for as long
as possible. Ageing down is now done using a decrement instead of a
division by 2, preserving the age information for longer.
- includes the original patch to reclaim dead swapcache pages quickly. I
need to update this to the version which includes SMP locking and is
factored out into a function. It would be nice to include the bdflush
swapcache-reclaim patch too.
- also includes my own patches to fix vm_enough_memory() and
out_of_memory() to be consistent with each other and reality. This is a
big bug which has gone unfixed for too long, and which people ARE noticing.
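As a rough userspace sketch (the struct, helper names and the `referenced` flag are simplified stand-ins for illustration, not the actual 2.4.5 kernel code), the ageing policy described in the points above behaves like this:

```c
#include <assert.h>

#define PAGE_AGE_START   2
#define PAGE_AGE_ADV     4
#define PAGE_AGE_MAX   128

struct page_sim {
	int age;
	int referenced;    /* touched since the last scan? */
};

/* One ageing pass over a page: age referenced pages up (capped at
 * PAGE_AGE_MAX), others down by a decrement rather than a halving,
 * so the accumulated age information decays slowly. */
static void age_page(struct page_sim *p)
{
	if (p->referenced) {
		p->age += PAGE_AGE_ADV;
		if (p->age > PAGE_AGE_MAX)
			p->age = PAGE_AGE_MAX;
		p->referenced = 0;
	} else if (p->age > 0) {
		p->age--;
	}
}

/* Pages demand-paged back in from swap start at half the maximum age,
 * so these expensive pages stick around longer. */
static int swapin_age(void)
{
	return PAGE_AGE_MAX / 2;
}

/* try_to_swap_out() analogue: refuse to move a page into the
 * swapcache while it still has positive age. */
static int may_swap_out(const struct page_sim *p)
{
	return p->age == 0;
}
```

Under this scheme a freshly referenced page needs many idle scans before it becomes a swap-out candidate, which is the point of the decrement-based decay.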
The result is a kernel which, compared to stock 2.4.x, exhibits
considerably better performance under high VM load (of the limited types I
have available), uses less swap, and is far less likely to go OOM
unexpectedly.
Compiling MySQL with 256Mb RAM and make -j 15 now takes 6m15s on my Athlon
(make -j 10 takes around 5m and completes within physical RAM), during
which the mpg123 playing on a separate terminal stutters slightly a few
times but is not badly affected (the disk containing the MP3s is physically
separate from the swap device). Swap usage goes to around 70Mb under these
conditions.
I'm just about to test single-threaded compilation with 48Mb and 32Mb
physical RAM, for comparison. Previous best times are 6m30s and 2h15m
respectively...
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@cyberspace.org (not for attachments)
The key to knowledge is not to rely on people to teach you it.
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: VM tuning patch, take 2
  2001-06-07 17:48 VM tuning patch, take 2 Jonathan Morton
@ 2001-06-07 16:59 ` Marcelo Tosatti
  2001-06-07 18:47   ` Jonathan Morton
  2 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 16:59 UTC (permalink / raw)
To: Jonathan Morton; +Cc: linux-mm

On Thu, 7 Jun 2001, Jonathan Morton wrote:

> - new pages are still given an age of PAGE_AGE_START, which is 2.
> PAGE_AGE_ADV has been increased to 4, and PAGE_AGE_MAX to 128. Pages which
> are demand-paged in from swap are given an initial age of PAGE_AGE_MAX/2,
> or 64 - this should help to keep these (expensive) pages around for as long
> as possible. Ageing down is now done using a decrement instead of a
> division by 2, preserving the age information for longer.

Just one comment about this specific change: I would not like to tweak
the PAGE_AGE_* values until we have centralized page aging (i.e. only
kswapd doing the aging).
* Re: VM tuning patch, take 2
  2001-06-07 16:59 ` Marcelo Tosatti
@ 2001-06-07 18:47   ` Jonathan Morton
  2001-06-09  3:17     ` Rik van Riel
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2001-06-07 18:47 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-mm

>> - new pages are still given an age of PAGE_AGE_START, which is 2.
>> PAGE_AGE_ADV has been increased to 4, and PAGE_AGE_MAX to 128. Pages which
>> are demand-paged in from swap are given an initial age of PAGE_AGE_MAX/2,
>> or 64 - this should help to keep these (expensive) pages around for as long
>> as possible. Ageing down is now done using a decrement instead of a
>> division by 2, preserving the age information for longer.
>
> Just one comment about this specific change. I would not like to tweak the
> PAGE_AGE_* values until we have centralized page aging. (ie only kswapd
> doing the aging)

I forgot to mention: I have also applied the patch which causes
allocations to wait on kswapd. As far as I can tell, the actual numbers
attached to the ageing matter far less than how they are applied.
* Re: VM tuning patch, take 2
  2001-06-07 18:47 ` Jonathan Morton
@ 2001-06-09  3:17   ` Rik van Riel
  2001-06-09  1:52     ` Marcelo Tosatti
  0 siblings, 1 reply; 9+ messages in thread
From: Rik van Riel @ 2001-06-09 3:17 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Marcelo Tosatti, linux-mm

On Thu, 7 Jun 2001, Jonathan Morton wrote:

> I forgot to mention, I also have applied the patch which causes
> allocations to wait on kswapd. As far as I can tell, the actual
> numbers attached to the ageing matter far less than how they are
> applied.

Ahhh cool, this should indeed cause lots of CPU eating problems.

I have a similar patch which makes processes wait on IO completion
when they find too many dirty pages on the inactive_dirty list ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)
* Re: VM tuning patch, take 2
  2001-06-09  3:17 ` Rik van Riel
@ 2001-06-09  1:52   ` Marcelo Tosatti
  2001-06-09  3:55     ` Rik van Riel
  0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2001-06-09 1:52 UTC (permalink / raw)
To: Rik van Riel; +Cc: Jonathan Morton, linux-mm

On Sat, 9 Jun 2001, Rik van Riel wrote:

<snip>

> I have a similar patch which makes processes wait on IO completion
> when they find too many dirty pages on the inactive_dirty list ;)

If we ever want to make that PageLaunder thing reality (well, if we
really want a decent VM we _need_ that), we need to make the accounting
on a buffer_head basis and decrease the amount of data being written out
to disk at end_buffer_io_sync().

The reason is write() --- it's impossible to account for pages written
via write().

:(
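Marcelo's per-buffer_head accounting idea can be sketched in userspace roughly like this (the struct, counter and function names below are hypothetical simplifications modelled loosely on the 2.4 buffer layer, not the real kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of dirty data currently in flight to disk. */
static long outstanding_io;

struct buffer_head_sim {
	size_t b_size;      /* size of this buffer */
	int    b_accounted; /* counted in outstanding_io? */
};

/* Submission side: charge the buffer against the in-flight total.
 * Because accounting happens per buffer_head, data queued via
 * write() is charged the same way as page-cache writeback. */
static void submit_bh_sim(struct buffer_head_sim *bh)
{
	bh->b_accounted = 1;
	outstanding_io += bh->b_size;
}

/* Completion side (the end_buffer_io_sync() analogue): the in-flight
 * total is only ever decreased here, when the IO actually finishes. */
static void end_buffer_io_sim(struct buffer_head_sim *bh)
{
	if (bh->b_accounted) {
		outstanding_io -= bh->b_size;
		bh->b_accounted = 0;
	}
}
```

A throttling policy could then sleep whenever `outstanding_io` exceeds some threshold, regardless of whether the dirty data originated from write() or from page reclaim.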
* Re: VM tuning patch, take 2
  2001-06-09  1:52 ` Marcelo Tosatti
@ 2001-06-09  3:55   ` Rik van Riel
  0 siblings, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2001-06-09 3:55 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Jonathan Morton, linux-mm

On Fri, 8 Jun 2001, Marcelo Tosatti wrote:
> On Sat, 9 Jun 2001, Rik van Riel wrote:
>
> <snip>
>
> > I have a similar patch which makes processes wait on IO completion
> > when they find too many dirty pages on the inactive_dirty list ;)
>
> If we ever want to make that PageLaunder thing reality (well, if we
> really want a decent VM we _need_ that) we need to make the accounting
> on a buffer_head basis and decrease the amount of data being written
> out to disk at end_buffer_io_sync().
>
> The reason is write() --- it's impossible to account for pages written
> via write().
>
> :(

This doesn't seem to be a big issue in my patch at all ... see below for
the patch; I'll port it to a newer kernel RSN.

The reasons why it's not a big issue with the patch:

1) we scan only part of the inactive list in the first scanning round;
   when we don't encounter freeable pages there, we go into the
   launder_loop, asynchronously write pages to disk and SCAN TWICE THE
   AMOUNT we scanned in the first loop ... here we can encounter clean,
   freeable pages

2) the inactive_list doesn't get re-ordered; if we write out a page
   we'll see it again as soon as it unlocks, instead of us waiting until
   the whole inactive_dirty list has "rolled over" and we've submitted
   all pages for IO

3) if, in the launder_loop, we failed to free pages, we leave a reminder
   for other tasks to sleep synchronously on the last piece of IO they
   submit; this way we:
   3a) don't waste CPU spinning on page_launder()
   3b) get freeable pages with less IO than we used to

regards,

Rik

--- linux-2.4.5-ac2/mm/vmscan.c.orig	Fri Jun  1 22:31:48 2001
+++ linux-2.4.5-ac2/mm/vmscan.c	Mon Jun  4 11:27:33 2001
@@ -430,11 +430,19 @@
 #define MAX_LAUNDER (4 * (1 << page_cluster))
 int page_launder(int gfp_mask, int sync)
 {
+	static int cannot_free_pages;
 	int launder_loop, maxscan, cleaned_pages, maxlaunder;
 	int can_get_io_locks;
-	struct list_head * page_lru;
+	struct list_head * page_lru, * marker_lru;
 	struct page * page;

+	/* Our bookmark of where we are in the inactive_dirty list. */
+	struct page marker_page_struct = {
+		flags: (1<<PG_marker),
+		lru: { NULL, NULL },
+	};
+	marker_lru = &marker_page_struct.lru;
+
 	/*
 	 * We can only grab the IO locks (eg. for flushing dirty
 	 * buffers to disk) if __GFP_IO is set.
@@ -447,10 +455,36 @@
 dirty_page_rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = nr_inactive_dirty_pages;
-	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
-				maxscan-- > 0) {
+	/*
+	 * By not scanning all inactive dirty pages we'll write out
+	 * really old dirty pages before evicting newer clean pages.
+	 * This should cause some LRU behaviour if we have a large
+	 * amount of inactive pages (due to eg. drop behind).
+	 *
+	 * It also makes us accumulate dirty pages until we have enough
+	 * to be worth writing to disk without causing excessive disk
+	 * seeks and eliminates the infinite penalty clean pages incurred
+	 * vs. dirty pages.
+	 */
+	maxscan = nr_inactive_dirty_pages / 4;
+	if (launder_loop)
+		maxscan *= 2;
+	list_add_tail(marker_lru, &inactive_dirty_list);
+	while ((page_lru = marker_lru->prev) != &inactive_dirty_list &&
+				maxscan-- > 0 && free_shortage()) {
 		page = list_entry(page_lru, struct page, lru);
+		/* We move the bookmark forward by flipping the page ;) */
+		list_del(page_lru);
+		list_add(page_lru, marker_lru);
+
+		/* Don't waste CPU if chances are we cannot free anything. */
+		if (launder_loop && maxlaunder < 0 && cannot_free_pages)
+			break;
+
+		/* Skip other people's marker pages. */
+		if (PageMarker(page)) {
+			continue;
+		}

 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -472,11 +506,9 @@

 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
+		 * Skip the page, we'll take a look when it unlocks.
 		 */
 		if (TryLockPage(page)) {
-			list_del(page_lru);
-			list_add(page_lru, &inactive_dirty_list);
 			continue;
 		}

@@ -490,10 +522,8 @@
 			if (!writepage)
 				goto page_active;

-			/* First time through? Move it to the back of the list */
+			/* First time through? Skip the page. */
 			if (!launder_loop) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);
 				continue;
 			}
@@ -552,7 +582,7 @@

 			/* The buffers were not freed. */
 			if (!clearedbuf) {
-				add_page_to_inactive_dirty_list(page);
+				add_page_to_inactive_dirty_list_marker(page);

 			/* The page was only in the buffer cache. */
 			} else if (!page->mapping) {
@@ -608,6 +638,8 @@
 			UnlockPage(page);
 		}
 	}
+	/* Remove our marker. */
+	list_del(marker_lru);
 	spin_unlock(&pagemap_lru_lock);

 	/*
@@ -626,12 +658,22 @@
 		/* If we cleaned pages, never do synchronous IO. */
 		if (cleaned_pages)
 			sync = 0;
+		/* If we cannot free pages, always sleep on IO. */
+		else if (cannot_free_pages)
+			sync = 1;
 		/* We only do a few "out of order" flushes. */
 		maxlaunder = MAX_LAUNDER;
-		/* Kflushd takes care of the rest. */
+		/* Let bdflush take care of the rest. */
 		wakeup_bdflush(0);
 		goto dirty_page_rescan;
 	}
+
+	/*
+	 * If we failed to free pages (because all pages are dirty)
+	 * we remember this for the next time. This will prevent us
+	 * from wasting too much CPU here.
+	 */
+	cannot_free_pages = !cleaned_pages;

 	/* Return the number of pages moved to the inactive_clean list. */
 	return cleaned_pages;
--- linux-2.4.5-ac2/include/linux/mm.h.orig	Fri Jun  1 22:33:26 2001
+++ linux-2.4.5-ac2/include/linux/mm.h	Mon Jun  4 09:49:52 2001
@@ -282,6 +282,7 @@
 #define PG_skip			10
 #define PG_inactive_clean	11
 #define PG_highmem		12
+#define PG_marker		13
 				/* bits 21-29 unused */
 #define PG_arch_1		30
 #define PG_reserved		31
@@ -353,6 +354,9 @@
 #define PageInactiveClean(page)	test_bit(PG_inactive_clean, &(page)->flags)
 #define SetPageInactiveClean(page)	set_bit(PG_inactive_clean, &(page)->flags)
 #define ClearPageInactiveClean(page)	clear_bit(PG_inactive_clean, &(page)->flags)
+
+#define PageMarker(page)	test_bit(PG_marker, &(page)->flags)
+#define SetPageMarker(page)	set_bit(PG_marker, &(page)->flags)

 #ifdef CONFIG_HIGHMEM
 #define PageHighMem(page)	test_bit(PG_highmem, &(page)->flags)
--- linux-2.4.5-ac2/include/linux/fs.h.orig	Mon Jun  4 09:41:21 2001
+++ linux-2.4.5-ac2/include/linux/fs.h	Mon Jun  4 09:49:39 2001
@@ -1309,7 +1309,6 @@
 extern void set_blocksize(kdev_t, int);
 extern struct buffer_head * bread(kdev_t, int, int);
 extern void wakeup_bdflush(int wait);
-extern int flush_dirty_buffers(int);
 extern int brw_page(int, struct page *, kdev_t, int [], int);
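The bookmark trick in Rik's patch above — a dummy "marker" page inserted into the list, advanced by flipping each scanned entry behind it, so the list order seen by others never changes — can be sketched in userspace like this (minimal list primitives and hypothetical node fields, not the kernel's actual list.h):

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n immediately after h. */
static void list_add_sim(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

/* Insert n at the tail (immediately before h). */
static void list_add_tail_sim(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

static void list_del_sim(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
}

struct node { struct list_head lru; int is_marker; int scanned; };

/* Scan up to maxscan entries from the tail, using a marker node as a
 * bookmark: each scanned entry is moved to the other side of the
 * marker, so the overall list order is preserved and no entry is
 * visited twice in one pass. */
static int scan_with_marker(struct list_head *list, int maxscan)
{
	struct node marker = { { 0, 0 }, 1, 0 };
	struct list_head *pos;
	int scanned = 0;

	list_add_tail_sim(&marker.lru, list);
	while ((pos = marker.lru.prev) != list && maxscan-- > 0) {
		struct node *p = (struct node *)pos; /* lru is first member */

		/* Advance the bookmark by flipping the entry past it. */
		list_del_sim(pos);
		list_add_sim(pos, &marker.lru);

		if (p->is_marker)	/* skip other scanners' markers */
			continue;

		p->scanned = 1;		/* "process" the entry */
		scanned++;
	}
	list_del_sim(&marker.lru);	/* remove our bookmark */
	return scanned;
}
```

Because entries are flipped past the marker rather than rotated to the head, a concurrent scanner (or an interrupted one resuming later) sees the same tail-to-head ordering, which is exactly the property points 1 and 2 in the message above rely on.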
* Re: VM tuning patch, take 2
  2001-06-07 17:48 VM tuning patch, take 2 Jonathan Morton
  2001-06-07 16:59 ` Marcelo Tosatti
@ 2001-06-07 18:23 ` Jonathan Morton
  2001-06-07 18:26   ` Jeff Garzik
  2001-06-09  3:15 ` Rik van Riel
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2001-06-07 18:23 UTC (permalink / raw)
To: linux-mm

> I'm just about to test single-threaded compilation with 48Mb and 32Mb
> physical RAM, for comparison. Previous best times are 6m30s and 2h15m
> respectively...

...which have now completed. Results as follows:

mem=   2.4.5     earlier tweaks   now
48M    8m30s     6m30s            5m58s
32M    unknown   2h15m            12m34s

That's some improvement! :D  Now to do the cleanups and make the diff.
* Re: VM tuning patch, take 2
  2001-06-07 18:23 ` Jonathan Morton
@ 2001-06-07 18:26   ` Jeff Garzik
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2001-06-07 18:26 UTC (permalink / raw)
To: Jonathan Morton; +Cc: linux-mm

Jonathan Morton wrote:
> mem=   2.4.5     earlier tweaks   now
> 48M    8m30s     6m30s            5m58s
> 32M    unknown   2h15m            12m34s
>
> That's some improvement! :D  Now to do the cleanups and make the diff.

impressive :)

--
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |
* Re: VM tuning patch, take 2
  2001-06-07 17:48 VM tuning patch, take 2 Jonathan Morton
  2001-06-07 16:59 ` Marcelo Tosatti
  2001-06-07 18:23 ` Jonathan Morton
@ 2001-06-09  3:15 ` Rik van Riel
  2 siblings, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2001-06-09 3:15 UTC (permalink / raw)
To: Jonathan Morton; +Cc: linux-mm

On Thu, 7 Jun 2001, Jonathan Morton wrote:

> - ageing is now done evenly, and independently of the number of
> mappings on a given page. This is done by introducing a 4th LRU list
> (aside from active, inactive_clean and inactive_dirty) which holds
> pages attached to a process but not in the swapcache.

IMHO it would be better to add these to the active list so both
filesystem-backed and swap-backed pages will be aged the same.

> - try_to_swap_out() will now refuse to move a page into the swapcache
> which still has positive age. This helps preserve the working set
> information, and may help to reduce swap bloat. It may re-introduce
> the cause of cache collapse, but I haven't seen any evidence of this
> being disastrous, as yet.

This should only affect swap bloat and nothing else. The "cache
collapse" thing vmstat might show is just a "lack" of swap cache pages
being generated...

> - new pages are still given an age of PAGE_AGE_START, which is 2.
> PAGE_AGE_ADV has been increased to 4, and PAGE_AGE_MAX to 128. Pages which
> are demand-paged in from swap are given an initial age of PAGE_AGE_MAX/2,
> or 64 - this should help to keep these (expensive) pages around for as long
> as possible. Ageing down is now done using a decrement instead of a
> division by 2, preserving the age information for longer.

I think the PAGE_AGE_START should be the same for all pages. About
decrement vs. division by two, I think this is something we may want to
make tunable (I have the code for this floating around somewhere, hold
on).

regards,

Rik
end of thread, other threads:[~2001-06-09  3:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-07 17:48 VM tuning patch, take 2 Jonathan Morton
2001-06-07 16:59 ` Marcelo Tosatti
2001-06-07 18:47   ` Jonathan Morton
2001-06-09  3:17     ` Rik van Riel
2001-06-09  1:52       ` Marcelo Tosatti
2001-06-09  3:55         ` Rik van Riel
2001-06-07 18:23 ` Jonathan Morton
2001-06-07 18:26   ` Jeff Garzik
2001-06-09  3:15 ` Rik van Riel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox