Date: 2001-09-24 23:53 UTC
From: Rik van Riel
To: linux-mm
Cc: linux-kernel
Subject: [PATCH] page aging + launder 2.4.9-ac15

[ TESTERS WANTED ]

Hi,

the following patch implements some minor page aging changes
and a rather large page_launder() change for 2.4.9-ac15:

page aging:
 - use min()/max() for the page age updates
   (see the short sketch after this list)
 - only enforce the inactive target when we have a
   free_shortage()
 - set the page age to a minimum of PAGE_AGE_START
   when a page is added back to the active list
 - still age pages even when we don't have an
   inactive shortage in the zone; this way we keep
   the ages up-to-date in workloads where we're
   'grazing' a zone lightly
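
The aging side boils down to a few clamped updates on page->age,
roughly like this (just a summary sketch; the exact hunks are in
the patch below):

    /* age a page up, clamped at PAGE_AGE_MAX */
    page->age = min((int)(page->age + PAGE_AGE_ADV), PAGE_AGE_MAX);

    /* age a page down, never going below zero */
    page->age -= min(PAGE_AGE_DECL, (int)page->age);

    /* a page moved back to the active list gets at least PAGE_AGE_START */
    page->age = max((int)page->age, PAGE_AGE_START);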

page launder:
 - stick to the idea of not writing out pages when
   the number of dirty pages is relatively small ...
 - ... but drop the launder_loop logic, making kswapd
   much more CPU friendly in extreme cases
 - writeout is controlled by a static variable, which
   is set if the percentage of not-freed pages goes above
   a high water mark and is cleared once it goes below
   the low water mark (see the sketch after this list)
 - (these watermarks are still untuned ... need to fix that)
 - some other random cruft was removed from page_launder()
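
In case the watermark logic is easier to follow as code, the
decision at the end of page_launder() amounts to something like
the sketch below; update_writeout_state() is just an illustrative
name, in the patch the check sits inline in page_launder():

    #define WRITE_LOW_WATER   10   /* percent, still untuned */
    #define WRITE_HIGH_WATER  25   /* percent, still untuned */

    /* survives across calls, which is what gives us the hysteresis */
    static int should_write;

    /* illustrative helper; the patch does this inline in page_launder() */
    static void update_writeout_state(int failed_pages, int cleaned_pages)
    {
        int total = failed_pages + cleaned_pages;

        /* mostly clean again: stop doing writeout */
        if (should_write && failed_pages * 100 < WRITE_LOW_WATER * total)
            should_write = 0;
        /* too many dirty pages: start doing (async) writeout */
        else if (!should_write && failed_pages * 100 > WRITE_HIGH_WATER * total)
            should_write = 1;
    }

With the current numbers, a run that fails to free more than a
quarter of the pages it actually processed switches writeout on
for later calls, and it stays on until some run fails on less
than a tenth of them.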


Testers for this patch are requested.  Please try this thing
under various workloads, in particular whatever workload you
use your computer for most of the time ... ;)

regards,

Rik
--
IA64: a worthy successor to the i860.

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/



--- linux-2.4.9-ac15/mm/vmscan.c.orig	Mon Sep 24 17:30:48 2001
+++ linux-2.4.9-ac15/mm/vmscan.c	Mon Sep 24 20:39:41 2001
@@ -28,19 +28,12 @@

 static inline void age_page_up(struct page *page)
 {
-	unsigned long age = page->age;
-	age += PAGE_AGE_ADV;
-	if (age > PAGE_AGE_MAX)
-		age = PAGE_AGE_MAX;
-	page->age = age;
+	page->age = min((int) (page->age + PAGE_AGE_ADV), PAGE_AGE_MAX);
 }

 static inline void age_page_down(struct page *page)
 {
-	unsigned long age = page->age;
-	if (age > 0)
-		age -= PAGE_AGE_DECL;
-	page->age = age;
+	page->age -= min(PAGE_AGE_DECL, (int)page->age);
 }

 /*
@@ -108,6 +101,23 @@
 	pte_t pte;
 	swp_entry_t entry;

+	/* Don't look at this pte if it's been accessed recently. */
+	if (ptep_test_and_clear_young(page_table)) {
+		age_page_up(page);
+		return;
+	}
+
+	/*
+	 * If the page is on the active list, page aging is done in
+	 * refill_inactive_scan(), anonymous pages are aged here.
+	 * This is done so heavily shared pages (think libc.so)
+	 * don't get punished heavily while they are still in use.
+	 * The alternative would be to put anonymous pages on the
+	 * active list too, but that increases complexity (for now).
+	 */
+	if (!PageActive(page))
+		age_page_down(page);
+
 	/*
 	 * If we have plenty inactive pages on this
 	 * zone, skip it.
@@ -133,23 +143,6 @@
 	pte = ptep_get_and_clear(page_table);
 	flush_tlb_page(vma, address);

-	/* Don't look at this pte if it's been accessed recently. */
-	if (ptep_test_and_clear_young(page_table)) {
-		age_page_up(page);
-		return;
-	}
-
-	/*
-	 * If the page is on the active list, page aging is done in
-	 * refill_inactive_scan(), anonymous pages are aged here.
-	 * This is done so heavily shared pages (think libc.so)
-	 * don't get punished heavily while they are still in use.
-	 * The alternative would be to put anonymous pages on the
-	 * active list too, but that increases complexity (for now).
-	 */
-	if (!PageActive(page))
-		age_page_down(page);
-
 	/*
 	 * Is the page already in the swap cache? If so, then
 	 * we can just drop our reference to it without doing
@@ -447,6 +440,7 @@
 				page_count(page) > (1 + !!page->buffers)) {
 			del_page_from_inactive_clean_list(page);
 			add_page_to_active_list(page);
+			page->age = max((int)page->age, PAGE_AGE_START);
 			continue;
 		}

@@ -536,34 +530,56 @@
  * @gfp_mask: what operations we are allowed to do
  * @sync: are we allowed to do synchronous IO in emergencies ?
  *
- * When this function is called, we are most likely low on free +
- * inactive_clean pages. Since we want to refill those pages as
- * soon as possible, we'll make two loops over the inactive list,
- * one to move the already cleaned pages to the inactive_clean lists
- * and one to (often asynchronously) clean the dirty inactive pages.
+ * This function is called when we are low on free / inactive_clean
+ * pages; its purpose is to refill the free/clean list as efficiently
+ * as possible.
  *
- * In situations where kswapd cannot keep up, user processes will
- * end up calling this function. Since the user process needs to
- * have a page before it can continue with its allocation, we'll
- * do synchronous page flushing in that case.
+ * This means we do writes asynchronously as long as possible and will
+ * only sleep on IO when we don't have another option. Since writeouts
+ * cause disk seeks and make read IO slower, we skip writes altogether
+ * when the amount of dirty pages is small.
  *
  * This code is heavily inspired by the FreeBSD source code. Thanks
  * go out to Matthew Dillon.
  */
-#define MAX_LAUNDER 		(4 * (1 << page_cluster))
-#define CAN_DO_FS		(gfp_mask & __GFP_FS)
-#define CAN_DO_IO		(gfp_mask & __GFP_IO)
+#define	CAN_DO_FS	((gfp_mask & __GFP_FS) && should_write)
+#define	WRITE_LOW_WATER		10
+#define	WRITE_HIGH_WATER	25
 int page_launder(int gfp_mask, int sync)
 {
-	int launder_loop, maxscan, cleaned_pages, maxlaunder;
+	int maxscan, cleaned_pages, failed_pages, total;
 	struct list_head * page_lru;
 	struct page * page;

-	launder_loop = 0;
-	maxlaunder = 0;
+	/*
+	 * This flag determines if we should do writeouts of dirty data
+	 * or not.  When more than WRITE_HIGH_WATER percentage of the
+	 * pages we process would need to be written out we set this flag
+	 * and will do writeout, the flag is cleared once we go below
+	 * WRITE_LOW_WATER.  Note that only pages we actually process
+	 * get counted, ie. pages where we make it beyond the TryLock.
+	 *
+	 * XXX: These watermarks still need tuning.
+	 */
+	static int should_write = 0;
+
 	cleaned_pages = 0;
+	failed_pages = 0;
+
+	/*
+	 * The gfp_mask tells try_to_free_buffers() below if it should
+	 * do IO or may be allowed to wait on IO synchronously.
+	 *
+	 * Note that synchronous IO only happens when a page has not been
+	 * written out yet when we see it for a second time; this is done
+	 * through magic in try_to_free_buffers().
+	 */
+	if (!should_write)
+		gfp_mask &= ~(__GFP_WAIT | __GFP_IO);
+	else if (!sync)
+		gfp_mask &= ~__GFP_WAIT;

-dirty_page_rescan:
+	/* The main launder loop. */
 	spin_lock(&pagemap_lru_lock);
 	maxscan = nr_inactive_dirty_pages;
 	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
@@ -579,34 +595,24 @@
 			continue;
 		}

-		/*
-		 * If the page is or was in use, we move it to the active
-		 * list. Note that pages in use by processes can easily
-		 * end up here to be unmapped later, but we just don't
-		 * want them clogging up the inactive list.
-		 */
+		/* Page is or was in use?  Move it to the active list. */
 		if ((PageReferenced(page) || page->age > 0 ||
 				page_count(page) > (1 + !!page->buffers) ||
 				page_ramdisk(page)) && !PageLocked(page)) {
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
+			page->age = max((int)page->age, PAGE_AGE_START);
 			continue;
 		}

 		/*
-		 * If we have plenty free pages on a zone:
-		 *
-		 * 1) we avoid a writeout for that page if its dirty.
-		 * 2) if its a buffercache page, and not a pagecache
-		 * one, we skip it since we cannot move it to the
-		 * inactive clean list --- we have to free it.
+		 * If this zone has plenty of pages free,
+		 * don't spend time on cleaning it.
 		 */
 		if (zone_free_plenty(page->zone)) {
-			if (!page->mapping || page_dirty(page)) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
-				continue;
-			}
+			list_del(page_lru);
+			list_add(page_lru, &inactive_dirty_list);
+			continue;
 		}

 		/*
@@ -633,11 +639,12 @@
 			if (!writepage)
 				goto page_active;

-			/* First time through? Move it to the back of the list */
-			if (!launder_loop || !CAN_DO_FS) {
+			/* Can't do it? Move it to the back of the list. */
+			if (!CAN_DO_FS) {
 				list_del(page_lru);
 				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);
+				failed_pages++;
 				continue;
 			}

@@ -664,9 +671,7 @@
 		 * buffer pages
 		 */
 		if (page->buffers) {
-			unsigned int buffer_mask;
 			int clearedbuf;
-			int freed_page = 0;
 			/*
 			 * Since we might be doing disk IO, we have to
 			 * drop the spinlock and take an extra reference
@@ -676,16 +681,8 @@
 			page_cache_get(page);
 			spin_unlock(&pagemap_lru_lock);

-			/* Will we do (asynchronous) IO? */
-			if (launder_loop && maxlaunder == 0 && sync)
-				buffer_mask = gfp_mask;				/* Do as much as we can */
-			else if (launder_loop && maxlaunder-- > 0)
-				buffer_mask = gfp_mask & ~__GFP_WAIT;			/* Don't wait, async write-out */
-			else
-				buffer_mask = gfp_mask & ~(__GFP_WAIT | __GFP_IO);	/* Don't even start IO */
-
-			/* Try to free metadata underlying the page. */
-			clearedbuf = try_to_release_page(page, buffer_mask);
+			/* Try to free the page buffers. */
+			clearedbuf = try_to_release_page(page, gfp_mask);

 			/*
 			 * Re-take the spinlock. Note that we cannot
@@ -697,11 +694,11 @@
 			/* The buffers were not freed. */
 			if (!clearedbuf) {
 				add_page_to_inactive_dirty_list(page);
+				failed_pages++;

 			/* The page was only in the buffer cache. */
 			} else if (!page->mapping) {
 				atomic_dec(&buffermem_pages);
-				freed_page = 1;
 				cleaned_pages++;

 			/* The page has more users besides the cache and us. */
@@ -722,13 +719,6 @@
 			UnlockPage(page);
 			page_cache_release(page);

-			/*
-			 * If we're freeing buffer cache pages, stop when
-			 * we've got enough free memory.
-			 */
-			if (freed_page && !free_shortage())
-				break;
-
 			continue;
 		} else if (page->mapping && !PageDirty(page)) {
 			/*
@@ -750,33 +740,22 @@
 			 */
 			del_page_from_inactive_dirty_list(page);
 			add_page_to_active_list(page);
+			page->age = max((int)page->age, PAGE_AGE_START);
 			UnlockPage(page);
 		}
 	}
 	spin_unlock(&pagemap_lru_lock);

 	/*
-	 * If we don't have enough free pages, we loop back once
-	 * to queue the dirty pages for writeout. When we were called
-	 * by a user process (that /needs/ a free page) and we didn't
-	 * free anything yet, we wait synchronously on the writeout of
-	 * MAX_SYNC_LAUNDER pages.
-	 *
-	 * We also wake up bdflush, since bdflush should, under most
-	 * loads, flush out the dirty pages before we have to wait on
-	 * IO.
+	 * Set the should_write flag for the next callers of page_launder.
+	 * If we go below the low watermark we stop the writeout of dirty
+	 * pages; writeout is started when we get above the high watermark.
 	 */
-	if (CAN_DO_IO && !launder_loop && free_shortage()) {
-		launder_loop = 1;
-		/* If we cleaned pages, never do synchronous IO. */
-		if (cleaned_pages)
-			sync = 0;
-		/* We only do a few "out of order" flushes. */
-		maxlaunder = MAX_LAUNDER;
-		/* Kflushd takes care of the rest. */
-		wakeup_bdflush(0);
-		goto dirty_page_rescan;
-	}
+	total = failed_pages + cleaned_pages;
+	if (should_write && failed_pages * 100 < WRITE_LOW_WATER * total)
+		should_write = 0;
+	else if (!should_write && failed_pages * 100 > WRITE_HIGH_WATER * total)
+		should_write = 1;

 	/* Return the number of pages moved to the inactive_clean list. */
 	return cleaned_pages;
@@ -998,27 +977,26 @@
 	return progress;
 }

-
-
+/*
+ * Worker function for kswapd and try_to_free_pages; we get
+ * called whenever there is a shortage of free/inactive_clean
+ * pages.
+ *
+ * This function will also move pages to the inactive list,
+ * if needed.
+ */
 static int do_try_to_free_pages(unsigned int gfp_mask, int user)
 {
 	int ret = 0;

 	/*
-	 * If we're low on free pages, move pages from the
-	 * inactive_dirty list to the inactive_clean list.
-	 *
-	 * Usually bdflush will have pre-cleaned the pages
-	 * before we get around to moving them to the other
-	 * list, so this is a relatively cheap operation.
+	 * Eat memory from filesystem page cache, buffer cache,
+	 * dentry, inode and filesystem quota caches.
 	 */
-
-	if (free_shortage()) {
-		ret += page_launder(gfp_mask, user);
-		shrink_dcache_memory(0, gfp_mask);
-		shrink_icache_memory(0, gfp_mask);
-		shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-	}
+	ret += page_launder(gfp_mask, user);
+	shrink_dcache_memory(0, gfp_mask);
+	shrink_icache_memory(0, gfp_mask);
+	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);

 	/*
 	 * If needed, we move pages from the active list
@@ -1028,7 +1006,7 @@
 		ret += refill_inactive(gfp_mask);

 	/*
-	 * Reclaim unused slab cache if memory is low.
+	 * Reclaim unused slab cache memory.
 	 */
 	kmem_cache_reap(gfp_mask);

@@ -1080,7 +1058,7 @@
 		static long recalc = 0;

 		/* If needed, try to free some memory. */
-		if (inactive_shortage() || free_shortage())
+		if (free_shortage())
 			do_try_to_free_pages(GFP_KSWAPD, 0);

 		/* Once a second ... */
@@ -1090,8 +1068,6 @@
 			/* Do background page aging. */
 			refill_inactive_scan(DEF_PRIORITY);
 		}
-
-		run_task_queue(&tq_disk);

 		/*
 		 * We go to sleep if either the free page shortage

