Re: VM tuning patch, take 2

From: Rik van Riel <riel@conectiva.com.br>
To: Marcelo Tosatti <marcelo@conectiva.com.br>
Cc: Jonathan Morton <chromi@cyberspace.org>, linux-mm@kvack.org
Subject: Re: VM tuning patch, take 2
Date: Sat, 9 Jun 2001 00:55:15 -0300 (BRST)
Message-ID: <Pine.LNX.4.21.0106090050240.10415-100000@imladris.rielhome.conectiva>
In-Reply-To: <Pine.LNX.4.21.0106082248320.3343-100000@freak.distro.conectiva>

On Fri, 8 Jun 2001, Marcelo Tosatti wrote:
> On Sat, 9 Jun 2001, Rik van Riel wrote:
> 
> <snip>
> 
> > I have a similar patch which makes processes wait on IO completion
> > when they find too many dirty pages on the inactive_dirty list ;)
> 
> If we ever want to make that PageLaunder thing reality (well, if we really
> want a decent VM we _need_ that) we need to make the accounting on a
> buffer_head basis and decrease the amount of data being written out to
> disk at end_buffer_io_sync().
> 
> The reason is write() --- it's impossible to account for pages written
> via write().
> 
> :( 

This doesn't seem to be a big issue in my patch at all ...
See below for the patch, I'll port it to a newer kernel RSN.

The reasons why it's not a big issue with the patch:

1) we scan only part of the inactive_dirty list in the first
   scanning round; when we don't encounter freeable pages
   there, we go into the launder_loop, asynchronously write
   pages to disk and SCAN TWICE THE AMOUNT we scanned in the
   first loop ... there we can encounter clean, freeable pages

2) the inactive_dirty list doesn't get re-ordered; if we write
   out a page we'll see it again as soon as it unlocks, instead
   of waiting until the whole inactive_dirty list has "rolled
   over" and we've submitted all pages for IO

3) if, in the launder_loop, we failed to free pages, we leave
   a reminder for other tasks to sleep synchronously on the last
   piece of IO they submit. This way we:
	3a) don't waste CPU spinning on page_launder()
	3b) get freeable pages with less IO than we used to
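
For anyone who wants to play with the bookmark idea from points 1) and 2)
outside the kernel, here is a rough userspace sketch. It is not the patch:
struct item, scan_budget() and scan_with_marker() are made-up names, the
list helpers only imitate the kernel's list_head, and there is no locking
or IO. It only shows that stepping a marker through the list from the tail
visits a bounded number of entries without re-ordering them.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list, loosely modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_insert(struct list_head *n, struct list_head *prev,
			struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	list_insert(n, head, head->next);
}

/* Insert n right before head, i.e. at the tail. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	list_insert(n, head->prev, head);
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical stand-in for struct page; lru must stay the first member. */
struct item {
	struct list_head lru;
	int is_marker;
	int visited;
};

/* A quarter of the list on the first pass, twice that on the second. */
static int scan_budget(int nr_entries, int second_pass)
{
	int budget = nr_entries / 4;

	if (second_pass)
		budget *= 2;
	return budget;
}

/*
 * Walk from the tail towards the head, looking at no more than `budget`
 * entries. Each entry is moved from just before the marker to just after
 * it, so the marker advances while the entries keep their relative order.
 * Returns the number of real (non-marker) entries visited.
 */
static int scan_with_marker(struct list_head *head, int budget)
{
	struct item marker = { .is_marker = 1 };
	struct list_head *pos;
	int seen = 0;

	assert(head != NULL);
	list_add_tail(&marker.lru, head);
	while ((pos = marker.lru.prev) != head && budget-- > 0) {
		struct item *it = (struct item *)pos;

		list_del(pos);
		list_add(pos, &marker.lru);

		/* Skip markers left by other scanners. */
		if (it->is_marker)
			continue;
		it->visited++;
		seen++;
	}
	/* Remove our marker; the list is now exactly as we found it. */
	list_del(&marker.lru);
	return seen;
}
```

With 8 entries on the list, scan_with_marker(&head, scan_budget(8, 0))
looks at the 2 entries nearest the tail and scan_budget(8, 1) raises that
to 4; in both cases the list order afterwards is unchanged.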

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



--- linux-2.4.5-ac2/mm/vmscan.c.orig	Fri Jun  1 22:31:48 2001
+++ linux-2.4.5-ac2/mm/vmscan.c	Mon Jun  4 11:27:33 2001
@@ -430,11 +430,19 @@
 #define MAX_LAUNDER 		(4 * (1 << page_cluster))
 int page_launder(int gfp_mask, int sync)
 {
+	static int cannot_free_pages;
 	int launder_loop, maxscan, cleaned_pages, maxlaunder;
 	int can_get_io_locks;
-	struct list_head * page_lru;
+	struct list_head * page_lru, * marker_lru;
 	struct page * page;
 
+	/* Our bookmark of where we are in the inactive_dirty list. */
+	struct page marker_page_struct = {
+		flags: (1<<PG_marker),
+		lru: { NULL, NULL },
+	};
+	marker_lru = &marker_page_struct.lru;
+
 	/*
 	 * We can only grab the IO locks (eg. for flushing dirty
 	 * buffers to disk) if __GFP_IO is set.
@@ -447,10 +455,36 @@
 
 dirty_page_rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = nr_inactive_dirty_pages;
-	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
-				maxscan-- > 0) {
+	/*
+	 * By not scanning all inactive dirty pages we'll write out
+	 * really old dirty pages before evicting newer clean pages.
+	 * This should cause some LRU behaviour if we have a large
+	 * amount of inactive pages (due to eg. drop behind).
+	 *
+	 * It also makes us accumulate dirty pages until we have enough
+	 * to be worth writing to disk without causing excessive disk
+	 * seeks and eliminates the infinite penalty clean pages incurred
+	 * vs. dirty pages.
+	 */
+	maxscan = nr_inactive_dirty_pages / 4;
+	if (launder_loop)
+		maxscan *= 2;
+	list_add_tail(marker_lru, &inactive_dirty_list);
+	while ((page_lru = marker_lru->prev) != &inactive_dirty_list &&
+			maxscan-- > 0 && free_shortage()) {
 		page = list_entry(page_lru, struct page, lru);
+		/* We move the bookmark forward by flipping the page ;) */
+		list_del(page_lru);
+		list_add(page_lru, marker_lru);
+
+		/* Don't waste CPU if chances are we cannot free anything. */
+		if (launder_loop && maxlaunder < 0 && cannot_free_pages)
+			break;
+	
+		/* Skip other people's marker pages. */
+		if (PageMarker(page)) {
+			continue;
+		}
 
 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -472,11 +506,9 @@
 
 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
+		 * Skip the page, we'll take a look when it unlocks.
 		 */
 		if (TryLockPage(page)) {
-			list_del(page_lru);
-			list_add(page_lru, &inactive_dirty_list);
 			continue;
 		}
 
@@ -490,10 +522,8 @@
 			if (!writepage)
 				goto page_active;
 
-			/* First time through? Move it to the back of the list */
+			/* First time through? Skip the page. */
 			if (!launder_loop) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);
 				continue;
 			}
@@ -552,7 +582,7 @@
 
 			/* The buffers were not freed. */
 			if (!clearedbuf) {
-				add_page_to_inactive_dirty_list(page);
+				add_page_to_inactive_dirty_list_marker(page);
 
 			/* The page was only in the buffer cache. */
 			} else if (!page->mapping) {
@@ -608,6 +638,8 @@
 			UnlockPage(page);
 		}
 	}
+	/* Remove our marker. */
+	list_del(marker_lru);
 	spin_unlock(&pagemap_lru_lock);
 
 	/*
@@ -626,12 +658,22 @@
 		/* If we cleaned pages, never do synchronous IO. */
 		if (cleaned_pages)
 			sync = 0;
+		/* If we cannot free pages, always sleep on IO. */
+		else if (cannot_free_pages)
+			sync = 1;
 		/* We only do a few "out of order" flushes. */
 		maxlaunder = MAX_LAUNDER;
-		/* Kflushd takes care of the rest. */
+		/* Let bdflush take care of the rest. */
 		wakeup_bdflush(0);
 		goto dirty_page_rescan;
 	}
+
+	/*
+	 * If we failed to free pages (because all pages are dirty)
+	 * we remember this for the next time. This will prevent us
+	 * from wasting too much CPU here.
+	 */
+	cannot_free_pages = !cleaned_pages;
 
 	/* Return the number of pages moved to the inactive_clean list. */
 	return cleaned_pages;
--- linux-2.4.5-ac2/include/linux/mm.h.orig	Fri Jun  1 22:33:26 2001
+++ linux-2.4.5-ac2/include/linux/mm.h	Mon Jun  4 09:49:52 2001
@@ -282,6 +282,7 @@
 #define PG_skip			10
 #define PG_inactive_clean	11
 #define PG_highmem		12
+#define PG_marker		13
 				/* bits 21-29 unused */
 #define PG_arch_1		30
 #define PG_reserved		31
@@ -353,6 +354,9 @@
 #define PageInactiveClean(page)	test_bit(PG_inactive_clean, &(page)->flags)
 #define SetPageInactiveClean(page)	set_bit(PG_inactive_clean, &(page)->flags)
 #define ClearPageInactiveClean(page)	clear_bit(PG_inactive_clean, &(page)->flags)
+
+#define PageMarker(page)	test_bit(PG_marker, &(page)->flags)
+#define SetPageMarker(page)	set_bit(PG_marker, &(page)->flags)
 
 #ifdef CONFIG_HIGHMEM
 #define PageHighMem(page)		test_bit(PG_highmem, &(page)->flags)
--- linux-2.4.5-ac2/include/linux/fs.h.orig	Mon Jun  4 09:41:21 2001
+++ linux-2.4.5-ac2/include/linux/fs.h	Mon Jun  4 09:49:39 2001
@@ -1309,7 +1309,6 @@
 extern void set_blocksize(kdev_t, int);
 extern struct buffer_head * bread(kdev_t, int, int);
 extern void wakeup_bdflush(int wait);
-extern int flush_dirty_buffers(int);
 
 extern int brw_page(int, struct page *, kdev_t, int [], int);
 
