* [PATCH] Recent VM fiasco - fixed
@ 2000-05-08 17:21 Zlatko Calusic
2000-05-08 17:43 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 17:21 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]
Hi to all!
After I _finally_ got tired of the constantly worsening VM behaviour
in recent kernels, I thought I could spare a few hours this weekend
just to see what's going on. I was quite surprised to see that the VM
subsystem, while in its worst shape so far (at least in 2.3.x), is
quite easily repairable even by unskilled hands... I compiled and
checked a few kernels back to 2.3.51, and found that new code was
constantly being added that only made things worse. Short history:
2.3.51 - mostly OK, but reading from disk takes too much CPU (kswapd)
2.3.99-pre1, 2 - as .51 + aggressive swap out during writing
2.3.99-pre3, 4, 5 - reading better
2.3.99-pre5, 6 - both reading and writing take 100% CPU!!!
I also tried some pre7-x (I forget which one) but that one was f****d
up beyond recognition (read: it was killing my processes, including
X11, like mad every time I started writing to disk). Thus the patch
that follows, which fixes all of the above-mentioned problems, was
made against pre6, sorry. I'll make another patch when pre7 gets out,
if things are still not properly fixed.
BTW, this patch mostly *removes* recently added cruft, and returns to
the known state of operation. Once that is achieved, it is easy to
selectively re-add the good things I might have removed and change
behaviour as desired, but I would like to urge people to test things
thoroughly before releasing patches this close to 2.4.
Then again, I might have introduced bugs in this patch, too. :)
But, I *tried* to break it (spent some time doing that), and testing
didn't reveal any bad behaviour.
Enjoy!
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 15658 bytes --]
Index: 9906.2/include/linux/swap.h
--- 9906.2/include/linux/swap.h Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/C/b/20_swap.h 1.4.1.15.1.1 644)
+++ 9906.5/include/linux/swap.h Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/C/b/20_swap.h 1.4.1.15.1.1.1.1 644)
@@ -87,7 +87,6 @@
/* linux/mm/vmscan.c */
extern int try_to_free_pages(unsigned int gfp_mask, zone_t *zone);
-extern int swap_out(unsigned int gfp_mask, int priority);
/* linux/mm/page_io.c */
extern void rw_swap_page(int, struct page *, int);
Index: 9906.2/mm/vmscan.c
--- 9906.2/mm/vmscan.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/13_vmscan.c 1.5.1.22 644)
+++ 9906.5/mm/vmscan.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/13_vmscan.c 1.5.1.22.2.1 644)
@@ -48,7 +48,6 @@
if ((page-mem_map >= max_mapnr) || PageReserved(page))
goto out_failed;
- mm->swap_cnt--;
/* Don't look at this pte if it's been accessed recently. */
if (pte_young(pte)) {
/*
@@ -220,8 +219,6 @@
result = try_to_swap_out(mm, vma, address, pte, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address += PAGE_SIZE;
pte++;
} while (address && (address < end));
@@ -251,8 +248,6 @@
int result = swap_out_pmd(mm, vma, pmd, address, end, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address = (address + PMD_SIZE) & PMD_MASK;
pmd++;
} while (address && (address < end));
@@ -277,8 +272,6 @@
int result = swap_out_pgd(mm, vma, pgdir, address, end, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address = (address + PGDIR_SIZE) & PGDIR_MASK;
pgdir++;
} while (address && (address < end));
@@ -328,7 +321,7 @@
* N.B. This function returns only 0 or 1. Return values != 1 from
* the lower level routines result in continued processing.
*/
-int swap_out(unsigned int priority, int gfp_mask)
+static int swap_out(unsigned int priority, int gfp_mask)
{
struct task_struct * p;
int counter;
@@ -363,7 +356,6 @@
p = init_task.next_task;
for (; p != &init_task; p = p->next_task) {
struct mm_struct *mm = p->mm;
- p->hog = 0;
if (!p->swappable || !mm)
continue;
if (mm->rss <= 0)
@@ -377,26 +369,9 @@
pid = p->pid;
}
}
- if (assign == 1) {
- /* we just assigned swap_cnt, normalise values */
- assign = 2;
- p = init_task.next_task;
- for (; p != &init_task; p = p->next_task) {
- int i = 0;
- struct mm_struct *mm = p->mm;
- if (!p->swappable || !mm || mm->rss <= 0)
- continue;
- /* small processes are swapped out less */
- while ((mm->swap_cnt << 2 * (i + 1) < max_cnt))
- i++;
- mm->swap_cnt >>= i;
- mm->swap_cnt += i; /* if swap_cnt reaches 0 */
- /* we're big -> hog treatment */
- if (!i)
- p->hog = 1;
- }
- }
read_unlock(&tasklist_lock);
+ if (assign == 1)
+ assign = 2;
if (!best) {
if (!assign) {
assign = 1;
@@ -437,14 +412,13 @@
{
int priority;
int count = SWAP_CLUSTER_MAX;
- int ret;
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
priority = 6;
do {
- while ((ret = shrink_mmap(priority, gfp_mask, zone))) {
+ while (shrink_mmap(priority, gfp_mask, zone)) {
if (!--count)
goto done;
}
@@ -467,9 +441,7 @@
}
}
- /* Then, try to page stuff out..
- * We use swapcount here because this doesn't actually
- * free pages */
+ /* Then, try to page stuff out.. */
while (swap_out(priority, gfp_mask)) {
if (!--count)
goto done;
@@ -497,10 +469,7 @@
*/
int kswapd(void *unused)
{
- int i;
struct task_struct *tsk = current;
- pg_data_t *pgdat;
- zone_t *zone;
tsk->session = 1;
tsk->pgrp = 1;
@@ -521,25 +490,38 @@
*/
tsk->flags |= PF_MEMALLOC;
- while (1) {
+ for (;;) {
+ int work_to_do = 0;
+
/*
* If we actually get into a low-memory situation,
* the processes needing more memory will wake us
* up on a more timely basis.
*/
- pgdat = pgdat_list;
- while (pgdat) {
- for (i = 0; i < MAX_NR_ZONES; i++) {
- zone = pgdat->node_zones + i;
- if (tsk->need_resched)
- schedule();
- if ((!zone->size) || (!zone->zone_wake_kswapd))
- continue;
- do_try_to_free_pages(GFP_KSWAPD, zone);
+ do {
+ pg_data_t *pgdat = pgdat_list;
+
+ while (pgdat) {
+ int i;
+
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ zone_t *zone = pgdat->node_zones + i;
+
+ if (!zone->size)
+ continue;
+ if (!zone->low_on_memory)
+ continue;
+ work_to_do = 1;
+ do_try_to_free_pages(GFP_KSWAPD, zone);
+ }
+ pgdat = pgdat->node_next;
}
- pgdat = pgdat->node_next;
- }
- run_task_queue(&tq_disk);
+ run_task_queue(&tq_disk);
+ if (tsk->need_resched)
+ break;
+ if (nr_free_pages() > freepages.high)
+ break;
+ } while (work_to_do);
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
Index: 9906.2/mm/filemap.c
--- 9906.2/mm/filemap.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/16_filemap.c 1.6.1.3.2.4.1.1.2.2.2.1.1.21.1.1 644)
+++ 9906.5/mm/filemap.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/16_filemap.c 1.6.1.3.2.4.1.1.2.2.2.1.1.21.1.1.2.1 644)
@@ -238,55 +238,41 @@
int shrink_mmap(int priority, int gfp_mask, zone_t *zone)
{
- int ret = 0, loop = 0, count;
+ int ret = 0, count;
LIST_HEAD(young);
LIST_HEAD(old);
LIST_HEAD(forget);
struct list_head * page_lru, * dispose;
- struct page * page = NULL;
- struct zone_struct * p_zone;
- int maxloop = 256 >> priority;
+ struct page * page;
if (!zone)
BUG();
- count = nr_lru_pages >> priority;
- if (!count)
- return ret;
+ count = nr_lru_pages / (priority+1);
spin_lock(&pagemap_lru_lock);
-again:
- /* we need pagemap_lru_lock for list_del() ... subtle code below */
+
while (count > 0 && (page_lru = lru_cache.prev) != &lru_cache) {
page = list_entry(page_lru, struct page, lru);
list_del(page_lru);
- p_zone = page->zone;
- /*
- * These two tests are there to make sure we don't free too
- * many pages from the "wrong" zone. We free some anyway,
- * they are the least recently used pages in the system.
- * When we don't free them, leave them in &old.
- */
- dispose = &old;
- if (p_zone != zone && (loop > (maxloop / 4) ||
- p_zone->free_pages > p_zone->pages_high))
+ dispose = &lru_cache;
+ if (test_and_clear_bit(PG_referenced, &page->flags))
+ /* Roll the page at the top of the lru list,
+ * we could also be more aggressive putting
+ * the page in the young-dispose-list, so
+ * avoiding to free young pages in each pass.
+ */
goto dispose_continue;
- /* The page is in use, or was used very recently, put it in
- * &young to make sure that we won't try to free it the next
- * time */
- dispose = &young;
-
- if (test_and_clear_bit(PG_referenced, &page->flags))
+ dispose = &old;
+ /* don't account passes over not DMA pages */
+ if (zone && (!memclass(page->zone, zone)))
goto dispose_continue;
count--;
- if (!page->buffers && page_count(page) > 1)
- goto dispose_continue;
- /* Page not used -> free it; if that fails -> &old */
- dispose = &old;
+ dispose = &young;
if (TryLockPage(page))
goto dispose_continue;
@@ -297,11 +283,22 @@
page locked down ;). */
spin_unlock(&pagemap_lru_lock);
+ /* avoid unscalable SMP locking */
+ if (!page->buffers && page_count(page) > 1)
+ goto unlock_noput_continue;
+
+ /* Take the pagecache_lock spinlock held to avoid
+ other tasks to notice the page while we are looking at its
+ page count. If it's a pagecache-page we'll free it
+ in one atomic transaction after checking its page count. */
+ spin_lock(&pagecache_lock);
+
/* avoid freeing the page while it's locked */
get_page(page);
/* Is it a buffer page? */
if (page->buffers) {
+ spin_unlock(&pagecache_lock);
if (!try_to_free_buffers(page))
goto unlock_continue;
/* page was locked, inode can't go away under us */
@@ -309,14 +306,9 @@
atomic_dec(&buffermem_pages);
goto made_buffer_progress;
}
+ spin_lock(&pagecache_lock);
}
- /* Take the pagecache_lock spinlock held to avoid
- other tasks to notice the page while we are looking at its
- page count. If it's a pagecache-page we'll free it
- in one atomic transaction after checking its page count. */
- spin_lock(&pagecache_lock);
-
/*
* We can't free pages unless there's just one user
* (count == 2 because we added one ourselves above).
@@ -325,6 +317,12 @@
goto cache_unlock_continue;
/*
+ * We did the page aging part.
+ */
+ if (nr_lru_pages < freepages.min * priority)
+ goto cache_unlock_continue;
+
+ /*
* Is it a page swap page? If so, we want to
* drop it if it is no longer used, even if it
* were to be marked referenced..
@@ -353,13 +351,21 @@
cache_unlock_continue:
spin_unlock(&pagecache_lock);
unlock_continue:
- spin_lock(&pagemap_lru_lock);
UnlockPage(page);
put_page(page);
+dispose_relock_continue:
+ /* even if the dispose list is local, a truncate_inode_page()
+ may remove a page from its queue so always
+ synchronize with the lru lock while accesing the
+ page->lru field */
+ spin_lock(&pagemap_lru_lock);
list_add(page_lru, dispose);
continue;
- /* we're holding pagemap_lru_lock, so we can just loop again */
+unlock_noput_continue:
+ UnlockPage(page);
+ goto dispose_relock_continue;
+
dispose_continue:
list_add(page_lru, dispose);
}
@@ -374,11 +380,6 @@
spin_lock(&pagemap_lru_lock);
/* nr_lru_pages needs the spinlock */
nr_lru_pages--;
-
- loop++;
- /* wrong zone? not looped too often? roll again... */
- if (page->zone != zone && loop < maxloop)
- goto again;
out:
list_splice(&young, &lru_cache);
Index: 9906.2/mm/page_alloc.c
--- 9906.2/mm/page_alloc.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/18_page_alloc 1.5.2.21 644)
+++ 9906.5/mm/page_alloc.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/18_page_alloc 1.5.2.21.2.1 644)
@@ -58,8 +58,6 @@
*/
#define BAD_RANGE(zone,x) (((zone) != (x)->zone) || (((x)-mem_map) < (zone)->offset) || (((x)-mem_map) >= (zone)->offset+(zone)->size))
-#if 0
-
static inline unsigned long classfree(zone_t *zone)
{
unsigned long free = 0;
@@ -73,8 +71,6 @@
return(free);
}
-#endif
-
/*
* Buddy system. Hairy. You really aren't expected to understand this
*
@@ -156,10 +152,8 @@
spin_unlock_irqrestore(&zone->lock, flags);
- if (zone->free_pages > zone->pages_high) {
- zone->zone_wake_kswapd = 0;
+ if (zone->free_pages > zone->pages_high)
zone->low_on_memory = 0;
- }
}
#define MARK_USED(index, order, area) \
@@ -186,8 +180,7 @@
return page;
}
-static FASTCALL(struct page * rmqueue(zone_t *zone, unsigned long order));
-static struct page * rmqueue(zone_t *zone, unsigned long order)
+static inline struct page * rmqueue(zone_t *zone, unsigned long order)
{
free_area_t * area = zone->free_area + order;
unsigned long curr_order = order;
@@ -227,115 +220,72 @@
return NULL;
}
-static int zone_balance_memory(zonelist_t *zonelist)
-{
- int tried = 0, freed = 0;
- zone_t **zone;
- int gfp_mask = zonelist->gfp_mask;
- extern wait_queue_head_t kswapd_wait;
-
- zone = zonelist->zones;
- for (;;) {
- zone_t *z = *(zone++);
- if (!z)
- break;
- if (z->free_pages > z->pages_low)
- continue;
-
- z->zone_wake_kswapd = 1;
- wake_up_interruptible(&kswapd_wait);
-
- /* Are we reaching the critical stage? */
- if (!z->low_on_memory) {
- /* Not yet critical, so let kswapd handle it.. */
- if (z->free_pages > z->pages_min)
- continue;
- z->low_on_memory = 1;
- }
- /*
- * In the atomic allocation case we only 'kick' the
- * state machine, but do not try to free pages
- * ourselves.
- */
- tried = 1;
- freed |= try_to_free_pages(gfp_mask, z);
- }
- if (tried && !freed) {
- if (!(gfp_mask & __GFP_HIGH))
- return 0;
- }
- return 1;
-}
-
/*
* This is the 'heart' of the zoned buddy allocator:
*/
struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
{
zone_t **zone = zonelist->zones;
- int gfp_mask = zonelist->gfp_mask;
- static int low_on_memory;
-
- /*
- * If this is a recursive call, we'd better
- * do our best to just allocate things without
- * further thought.
- */
- if (current->flags & PF_MEMALLOC)
- goto allocate_ok;
-
- /* If we're a memory hog, unmap some pages */
- if (current->hog && low_on_memory &&
- (gfp_mask & __GFP_WAIT))
- swap_out(4, gfp_mask);
/*
* (If anyone calls gfp from interrupts nonatomically then it
- * will sooner or later tripped up by a schedule().)
+ * will be sooner or later tripped up by a schedule().)
*
* We are falling back to lower-level zones if allocation
* in a higher zone fails.
*/
for (;;) {
zone_t *z = *(zone++);
+
if (!z)
break;
+
if (!z->size)
BUG();
- /* Are we supposed to free memory? Don't make it worse.. */
- if (!z->zone_wake_kswapd && z->free_pages > z->pages_low) {
+ /*
+ * If this is a recursive call, we'd better
+ * do our best to just allocate things without
+ * further thought.
+ */
+ if (!(current->flags & PF_MEMALLOC)) {
+ if (z->free_pages <= z->pages_high) {
+ unsigned long free = classfree(z);
+
+ if (free <= z->pages_low) {
+ extern wait_queue_head_t kswapd_wait;
+
+ z->low_on_memory = 1;
+ wake_up_interruptible(&kswapd_wait);
+ }
+
+ if (free <= z->pages_min) {
+ int gfp_mask = zonelist->gfp_mask;
+
+ if (!try_to_free_pages(gfp_mask, z)) {
+ if (!(gfp_mask & __GFP_HIGH))
+ return NULL;
+ }
+ }
+ }
+ }
+
+ /*
+ * This is an optimization for the 'higher order zone
+ * is empty' case - it can happen even in well-behaved
+ * systems, think the page-cache filling up all RAM.
+ * We skip over empty zones. (this is not exact because
+ * we do not take the spinlock and it's not exact for
+ * the higher order case, but will do it for most things.)
+ */
+ if (z->free_pages) {
struct page *page = rmqueue(z, order);
- low_on_memory = 0;
+
if (page)
return page;
}
}
-
- low_on_memory = 1;
- /*
- * Ok, no obvious zones were available, start
- * balancing things a bit..
- */
- if (zone_balance_memory(zonelist)) {
- zone = zonelist->zones;
-allocate_ok:
- for (;;) {
- zone_t *z = *(zone++);
- if (!z)
- break;
- if (z->free_pages) {
- struct page *page = rmqueue(z, order);
- if (page)
- return page;
- }
- }
- }
return NULL;
-
-/*
- * The main chunk of the balancing code is in this offline branch:
- */
}
/*
@@ -599,7 +549,6 @@
zone->pages_low = mask*2;
zone->pages_high = mask*3;
zone->low_on_memory = 0;
- zone->zone_wake_kswapd = 0;
zone->zone_mem_map = mem_map + offset;
zone->zone_start_mapnr = offset;
zone->zone_start_paddr = zone_start_paddr;
@@ -642,7 +591,8 @@
while (get_option(&str, &zone_balance_ratio[j++]) == 2);
printk("setup_mem_frac: ");
- for (j = 0; j < MAX_NR_ZONES; j++) printk("%d ", zone_balance_ratio[j]);
+ for (j = 0; j < MAX_NR_ZONES; j++)
+ printk("%d ", zone_balance_ratio[j]);
printk("\n");
return 1;
}
Index: 9906.2/include/linux/mmzone.h
--- 9906.2/include/linux/mmzone.h Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/u/c/2_mmzone.h 1.9 644)
+++ 9906.5/include/linux/mmzone.h Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/u/c/2_mmzone.h 1.10 644)
@@ -29,7 +29,6 @@
unsigned long offset;
unsigned long free_pages;
char low_on_memory;
- char zone_wake_kswapd;
unsigned long pages_min, pages_low, pages_high;
/*
[-- Attachment #3: Type: text/plain, Size: 12 bytes --]
--
Zlatko
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:21 [PATCH] Recent VM fiasco - fixed Zlatko Calusic
@ 2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
2000-05-09 7:56 ` Daniel Stone
0 siblings, 2 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 17:43 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> BTW, this patch mostly *removes* cruft recently added, and
> returns to the known state of operation.
Which doesn't work.
Think of a 1GB machine which has a 16MB DMA zone,
a 950MB normal zone and a very small HIGHMEM zone.
With the old VM code the HIGHMEM zone would be
swapping like mad while the other two zones are
idle.
It's Not That Kind Of Party(tm)
cheers,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:43 ` Rik van Riel
@ 2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
2000-05-09 7:56 ` Daniel Stone
1 sibling, 2 replies; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 18:16 UTC (permalink / raw)
To: riel; +Cc: linux-mm, linux-kernel, Linus Torvalds
Rik van Riel <riel@conectiva.com.br> writes:
> On 8 May 2000, Zlatko Calusic wrote:
>
> > BTW, this patch mostly *removes* cruft recently added, and
> > returns to the known state of operation.
>
> Which doesn't work.
>
> Think of a 1GB machine which has a 16MB DMA zone,
> a 950MB normal zone and a very small HIGHMEM zone.
>
> With the old VM code the HIGHMEM zone would be
> swapping like mad while the other two zones are
> idle.
>
> It's Not That Kind Of Party(tm)
>
OK, I see now what you have in mind, and I'll try to test it when I
get home (yes, I'm a late worker... it's my only connection to the Net :)).
If only I could buy 1GB to test it in the real setup. ;)
But still, optimizing for 1GB while at the same time completely
killing performance and even *usability* for 99% of users doesn't
look like a good solution, does it?
There were a lot of VM changes recently (>100K of patches), in which we
went further and further away from the (IMHO) mostly stable code base
trying to fix zone balancing. Maybe it's time we try again, fresh from
the "start"?
I'll admit I didn't understand most of the conversation about zone
balancing recently on linux-mm. And I know it's because I didn't have
much time lately to hack the kernel, unfortunately.
But after a few hours spent dealing with the horrible VM in pre6, I'm
not scared anymore. And I think the solution to all our problems with
zone balancing must be very simple. But it's probably hard to find, so
it will need lots of modeling and testing. I don't think adding a few
lines here and there all the time will take us anywhere.
Regards,
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:16 ` Zlatko Calusic
@ 2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-08 18:20 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: riel, linux-mm, linux-kernel
On 8 May 2000, Zlatko Calusic wrote:
>
> But still, optimizing for 1GB while at the same time completely
> killing performance and even *usability* for 99% of users doesn't
> look like a good solution, does it?
Oh, definitely. I'll make a new pre7 that has a lot of the simplifications
discussed here over the weekend, and that seems to work for me (tested
on both a 512MB setup and a 64MB setup for some sanity).
This pre7 almost certainly won't be all that perfect either, but it
gives a better starting point.
> But after a few hours spent dealing with the horrible VM in pre6,
> I'm not scared anymore.
Good. This is really not scary stuff. Much of it is quite straightforward,
and is mainly just getting the right "feel". It's really easy to make
mistakes here, but they tend to be mistakes that just make the system act
badly, not the kind of _really_ scary mistakes (the ones that make it
corrupt disks randomly ;)
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
@ 2000-05-08 18:46 ` Rik van Riel
2000-05-08 18:53 ` Zlatko Calusic
1 sibling, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 18:46 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
> > On 8 May 2000, Zlatko Calusic wrote:
> >
> > > BTW, this patch mostly *removes* cruft recently added, and
> > > returns to the known state of operation.
> >
> > Which doesn't work.
> >
> > Think of a 1GB machine which has a 16MB DMA zone,
> > a 950MB normal zone and a very small HIGHMEM zone.
> >
> > With the old VM code the HIGHMEM zone would be
> > swapping like mad while the other two zones are
> > idle.
> >
> > It's Not That Kind Of Party(tm)
>
> OK, I see now what you have in mind, and I'll try to test it when I
> get home (yes, I'm a late worker... it's my only connection to the Net :)).
> If only I could buy 1GB to test it in the real setup. ;)
>
> But still, optimizing for 1GB while at the same time completely
> killing performance and even *usability* for 99% of users doesn't
> look like a good solution, does it?
20MB and 24MB machines will be in the same situation, if
that's of any help to you ;)
> But after a few hours spent dealing with the horrible VM in pre6,
> I'm not scared anymore. And I think the solution to all our
> problems with zone balancing must be very simple.
It is. Linus is working on a conservative & simple solution
while I'm trying a bit more "far-out" code (active and inactive
list à la BSD, etc...). We should have at least one good VM
subsystem within the next few weeks ;)
regards,
Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:46 ` Rik van Riel
@ 2000-05-08 18:53 ` Zlatko Calusic
2000-05-08 19:04 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 18:53 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, linux-kernel, Linus Torvalds
Rik van Riel <riel@conectiva.com.br> writes:
> 20MB and 24MB machines will be in the same situation, if
> that's of any help to you ;)
>
Yes, you are right. And thanks for that tip (booting with mem=24m)
because that will be my first test case later tonight.
> > But after a few hours spent dealing with the horrible VM in pre6,
> > I'm not scared anymore. And I think the solution to all our
> > problems with zone balancing must be very simple.
>
> It is. Linus is working on a conservative & simple solution
> while I'm trying a bit more "far-out" code (active and inactive
> list à la BSD, etc...). We should have at least one good VM
> subsystem within the next few weeks ;)
>
Nice. I'm also in favour of some kind of active/inactive list
solution (looks promising), but that is probably 2.5.x stuff.
I would be happy to see 2.4 out ASAP. Later, when it stabilizes, we
will have lots of fun in 2.5, that's for sure.
Regards,
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:53 ` Zlatko Calusic
@ 2000-05-08 19:04 ` Rik van Riel
0 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 19:04 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
>
> > > But after a few hours spent dealing with the horrible VM in pre6,
> > > I'm not scared anymore. And I think the solution to all our
> > > problems with zone balancing must be very simple.
> >
> > It is. Linus is working on a conservative & simple solution
> > while I'm trying a bit more "far-out" code (active and inactive
> > list à la BSD, etc...). We should have at least one good VM
> > subsystem within the next few weeks ;)
>
> Nice. I'm also in favour of some kind of active/inactive list
> solution (looks promising), but that is probably 2.5.x stuff.
I have it booting (against pre7-4) and it seems almost
stable ;) (with _low_ overhead)
> I would be happy to see 2.4 out ASAP. Later, when it stabilizes,
> we will have lots of fun in 2.5, that's for sure.
Of course, this has the highest priority.
regards,
Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
@ 2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
2000-05-09 10:21 ` Rik van Riel
1 sibling, 2 replies; 67+ messages in thread
From: Daniel Stone @ 2000-05-09 7:56 UTC (permalink / raw)
To: riel; +Cc: Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
Rik,
That's astonishing, I'm sure, but think of us poor bastards who DON'T have
an SMP machine with >1gig of RAM.
This is a P120 with 32MB. Lately, "fine" has degenerated into bad, then
into worse, then into absolutely obscene. It even kills my PGSQL
compiles. And I killed *EVERYTHING* there was to kill.
The only processes left were init, bash and gcc/cc1. The VM still wiped
it out.
d
On Mon, 8 May 2000, Rik van Riel wrote:
> On 8 May 2000, Zlatko Calusic wrote:
>
> > BTW, this patch mostly *removes* cruft recently added, and
> > returns to the known state of operation.
>
> Which doesn't work.
>
> Think of a 1GB machine which has a 16MB DMA zone,
> a 950MB normal zone and a very small HIGHMEM zone.
>
> With the old VM code the HIGHMEM zone would be
> swapping like mad while the other two zones are
> idle.
>
> It's Not That Kind Of Party(tm)
>
> cheers,
>
> Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 7:56 ` Daniel Stone
@ 2000-05-09 8:25 ` Christoph Rohland
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 10:21 ` Rik van Riel
1 sibling, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-09 8:25 UTC (permalink / raw)
To: Daniel Stone; +Cc: riel, Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
Daniel Stone <tamriel@ductape.net> writes:
> That's astonishing, I'm sure, but think of us poor bastards who
> DON'T have an SMP machine with >1gig of RAM.
He has to care about us fortunate guys with e.g. 8GB of memory, too.
The recent kernels are broken for that as well.
Greetings
Christoph
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 8:25 ` Christoph Rohland
@ 2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-09 15:44 UTC (permalink / raw)
To: Christoph Rohland
Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
On 9 May 2000, Christoph Rohland wrote:
> Daniel Stone <tamriel@ductape.net> writes:
>
> > That's astonishing, I'm sure, but think of us poor bastards who
> > DON'T have an SMP machine with >1gig of RAM.
>
> He has to care about us fortunate guys with e.g. 8GB of memory, too.
> The recent kernels are broken for that as well.
Try out the really recent one - pre7-8. So far it has some good reviews,
and I've tested it on both a 20MB machine and a 512MB one..
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
@ 2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
2000-05-10 4:05 ` James H. Cloos Jr.
2 siblings, 0 replies; 67+ messages in thread
From: Simon Kirby @ 2000-05-09 16:12 UTC (permalink / raw)
To: linux-mm, linux-kernel
On Tue, May 09, 2000 at 08:44:43AM -0700, Linus Torvalds wrote:
> On 9 May 2000, Christoph Rohland wrote:
>
> > Daniel Stone <tamriel@ductape.net> writes:
> >
> > > That's astonishing, I'm sure, but think of us poor bastards who
> > > DON'T have an SMP machine with >1gig of RAM.
> >
> > He has to care about us fortunate guys with e.g. 8GB of memory, too.
> > The recent kernels are broken for that as well.
>
> Try out the really recent one - pre7-8. So far it has some good reviews,
> and I've tested it on both a 20MB machine and a 512MB one..
On my dual 450 MHz SMP box with 128 MB of RAM, there's still
definitely something broken (pre7-8). I notice it most with mutt
loading the linux-kernel folder... The folder is about 54 MB, and it
takes kswapd about 3 to 4 seconds of CPU time to clear out old stuff
when it loads. This is pretty bad considering mutt itself takes only
about 5 seconds of real time to load the folder.
The main thing that fills up my cache is playback of MP3s off disk,
which is pretty much running all the time. If I open the folder, quit,
let the MP3 playback eat up the free memory into cache, and then run
mutt again, kswapd's CPU use goes up another 3 or 4 seconds.
I never used to see this with 2.2 kernels...
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
@ 2000-05-09 17:42 ` Christoph Rohland
2000-05-09 19:50 ` Linus Torvalds
2000-05-10 4:05 ` James H. Cloos Jr.
2 siblings, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-09 17:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> Try out the really recent one - pre7-8. So far it has some good reviews,
> and I've tested it both on a 20MB machine and a 512MB one..
Nope, it more or less locks up after the first attempt to swap
something out. I can still run ls and free, but as soon as something
touches /proc it locks up. Also my test programs don't do anything
any more.
I append the mem and task info from sysrq. Mem info seems to not
change after lockup.
Greetings
Christoph
SysRq: Show Memory
Mem-info:
Free pages: 713756kB ( 2040kB HighMem)
( Free: 178439, lru_cache: 3149 (1024 2048 3072) )
DMA: 1*4kB 2*8kB 1*16kB 4*32kB 3*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 6*2048kB = 13796kB)
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 340*2048kB = 697920kB)
HighMem: 2*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB = 2040kB)
Swap cache: add 0, delete 0, find 0/0
Free swap: 4048296kB
2162688 pages of RAM
1867776 pages of HIGHMEM
104332 reserved pages
868894 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory: 1340kB
CLEAN: 175 buffers, 700 kbyte, 3 used (last=47), 0 locked, 0 protected, 0 dirty
LOCKED: 217 buffers, 868 kbyte, 19 used (last=190), 0 locked, 0 protected, 0 dirty
SysRq: Show State
free sibling
task PC stack pid father child younger older
init R C1089F0C 0 1 0 612 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
kswapd D C4F154E8 0 2 1 (L-TLB) 3
sig: 0 0000000000000000 ffffffffffffffff : X
kflushd S C3FEC000 0 3 1 (L-TLB) 4 2
sig: 0 0000000000000000 ffffffffffffffff : X
kupdate R C3FEBFC4 0 4 1 (L-TLB) 278 3
sig: 0 0000000000000000 fffffffffff9ffff : X
portmap S 7FFFFFFF 2856 278 1 (NOTLB) 341 4
sig: 0 0000000000000000 0000000000000000 : X
syslogd R 7FFFFFFF 0 341 1 (NOTLB) 352 278
sig: 1 0000000000002000 0000000000000000 : 14 X
klogd R C3E78000 0 352 1 (NOTLB) 368 341
sig: 0 0000000000000000 0000000000000000 : X
atd S C3E49F78 2856 368 1 (NOTLB) 384 352
sig: 0 0000000000000000 0000000000010000 : X
crond R C3E3DF78 2856 384 1 (NOTLB) 404 368
sig: 0 0000000000000000 0000000000000000 : X
inetd S 7FFFFFFF 2856 404 1 (NOTLB) 413 384
sig: 0 0000000000000000 0000000000000000 : X
sshd S 7FFFFFFF 0 413 1 634 (NOTLB) 429 404
sig: 0 0000000000000000 0000000000000000 : X
lpd S 7FFFFFFF 0 429 1 (NOTLB) 469 413
sig: 0 0000000000000000 0000000000000000 : X
automount R C3EC56C0 0 469 1 (NOTLB) 471 429
sig: 1 0000000000002000 0000000000000000 : 14 X
automount R CD486AA0 4992 471 1 (NOTLB) 511 469
sig: 1 0000000000002000 0000000000000000 : 14 X
sendmail R C119FF0C 5956 511 1 (NOTLB) 528 471
sig: 0 0000000000000000 0000000000000000 : X
gpm S C117BF0C 0 528 1 (NOTLB) 544 511
sig: 0 0000000000000000 0000000000000000 : X
httpd R C1181F0C 0 544 1 557 (NOTLB) 571 528
sig: 0 0000000000000000 0000000000000000 : X
httpd S C117DF38 0 548 544 (NOTLB) 549
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1185F38 0 549 544 (NOTLB) 550 548
sig: 0 0000000000000000 0000000000000000 : X
httpd S 7FFFFFFF 0 550 544 (NOTLB) 551 549
sig: 0 0000000000000000 0000000000000000 : X
httpd S C113FF38 0 551 544 (NOTLB) 552 550
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1133F38 0 552 544 (NOTLB) 553 551
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1129F38 0 553 544 (NOTLB) 554 552
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1127F38 0 554 544 (NOTLB) 555 553
sig: 0 0000000000000000 0000000000000000 : X
httpd S C110FF38 0 555 544 (NOTLB) 556 554
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1101F38 0 556 544 (NOTLB) 557 555
sig: 0 0000000000000000 0000000000000000 : X
httpd S F75F5F38 0 557 544 (NOTLB) 556
sig: 0 0000000000000000 0000000000000000 : X
xfs S F75B7F0C 0 571 1 (NOTLB) 606 544
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 5124 606 1 (NOTLB) 607 571
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 607 1 (NOTLB) 608 606
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 608 1 (NOTLB) 609 607
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 609 1 (NOTLB) 610 608
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 610 1 (NOTLB) 611 609
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 611 1 (NOTLB) 612 610
sig: 0 0000000000000000 0000000000000000 : X
login S 00000000 2856 612 1 617 (NOTLB) 611
sig: 0 0000000000000000 0000000000000000 : X
bash S 00000000 0 617 612 633 (NOTLB)
sig: 0 0000000000000000 0000000000010000 : X
vmstat R F74E5F78 0 633 617 (NOTLB)
sig: 1 0000000000080000 0000000000000000 : 20 X
sshd R 7FFFFFFF 0 634 413 636 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
xterm S 7FFFFFFF 4900 636 634 639 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
bash S 7FFFFFFF 0 639 636 652 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F746A000 2856 642 639 651 (NOTLB) 652
sig: 0 0000000000000000 0000000000000000 : X
ipctst D F6AB52B4 2856 643 642 (NOTLB) 644
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7458000 0 644 642 (NOTLB) 645 643
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7448000 0 645 642 (NOTLB) 646 644
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7436000 0 646 642 (NOTLB) 647 645
sig: 0 0000000000000000 0000000000000000 : X
ipctst R C01DCB90 0 647 642 (NOTLB) 648 646
sig: 0 0000000000000000 0000000000000000 : X
ipctst R current 0 648 642 (NOTLB) 649 647
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F746FCB4 0 649 642 (NOTLB) 650 648
sig: 0 0000000000000000 0000000000000000 : X
ipctst R C0123017 0 650 642 (NOTLB) 651 649
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F73E2000 0 651 642 (NOTLB) 650
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F678C000 0 652 639 653 (NOTLB) 642
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F6784000 5612 653 652 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 17:42 ` Christoph Rohland
@ 2000-05-09 19:50 ` Linus Torvalds
2000-05-10 11:25 ` Christoph Rohland
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-09 19:50 UTC (permalink / raw)
To: Christoph Rohland
Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
On 9 May 2000, Christoph Rohland wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > Try out the really recent one - pre7-8. So far it has some good reviews,
> > and I've tested it both on a 20MB machine and a 512MB one..
>
> Nope, does more or less lockup after the first attempt to swap
> something out. I can still run ls and free. but as soon as something
> touches /proc it locks up. Also my test programs do not do anything
> any more.
This may be due to an unrelated bug with the task_lock() fixing (see
separate patch from Manfred for that one).
> I append the mem and task info from sysrq. Mem info seems to not
> change after lockup.
I suspect that if you do right-alt + scrolllock, you'll see it looping on
a spinlock. Which is why the memory info isn't changing ;)
But I'll double-check the shm code (I didn't test anything that did any
shared memory, for example).
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 19:50 ` Linus Torvalds
@ 2000-05-10 11:25 ` Christoph Rohland
2000-05-10 11:50 ` Zlatko Calusic
0 siblings, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-10 11:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> On 9 May 2000, Christoph Rohland wrote:
>
> > Linus Torvalds <torvalds@transmeta.com> writes:
> >
> > > Try out the really recent one - pre7-8. So far it has some good reviews,
> > > and I've tested it both on a 20MB machine and a 512MB one..
> > I append the mem and task info from sysrq. Mem info seems to not
> > change after lockup.
>
> I suspect that if you do right-alt + scrolllock, you'll see it looping on
> a spinlock. Which is why the memory info isn't changing ;)
>
> But I'll double-check the shm code (I didn't test anything that did any
> shared memory, for example).
Juan Quintela's patch fixes the lockup. shm paging locked up on the
page lock.
Now I can give more data about pre7-8. After a short run I can say the
following:
The machine seems to be stable, but VM is mainly unbalanced:
[root@ls3016 /root]# vmstat 5
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
[...]
9 3 0 0 1460016 1588 11284 0 0 0 0 109 23524 4 96 0
9 3 1 7552 557432 1004 19320 0 1607 0 402 186 42582 2 89 9
11 1 1 41972 111368 424 53740 0 6884 2 1721 277 25904 0 89 10
11 1 0 48084 11896 276 59404 0 1133 1 284 181 4439 0 95 5
13 2 2 48352 466952 180 52960 5 158 4 39 230 6381 2 98 0
10 3 1 53400 934204 248 59940 498 1442 128 363 272 3953 1 99 0
11 3 1 52624 878696 300 59820 248 50 81 13 148 971 0 100 0
11 1 0 4556 883852 316 16164 855 0 214 1 127 25188 3 97 0
12 0 0 3936 525620 316 15544 0 0 0 0 109 33969 4 96 0
12 0 0 3936 2029556 316 15544 0 0 0 0 123 19659 4 96 0
11 1 0 3936 686856 316 15544 0 0 0 0 117 14370 3 97 0
12 0 0 3936 388176 320 15544 0 0 0 0 121 7477 3 97 0
10 3 1 47660 5216 88 19992 0 9353 0 2341 757 1267 0 97 3
VM: killing process ipctst
6 6 1 36792 484880 152 26892 65 12307 21 3078 1619 2184 0 94 6
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
10 1 1 39620 66736 148 29364 8 494 2 125 327 1980 0 100 0
VM: killing process ipctst
9 2 1 46536 627356 116 31072 87 8675 23 2169 1784 1412 0 96 4
10 0 1 46664 617368 116 31200 0 26 0 6 258 112 0 100 0
10 0 1 47300 607184 116 31832 0 126 0 32 291 110 0 100 0
So we are swapping out with lots of free memory and killing random
processes. The machine also becomes quite unresponsive compared to
pre4 on the same tests.
Greetings
Christoph
--
Christoph Rohland Tel: +49 6227 748201
SAP AG Fax: +49 6227 758201
LinuxLab Email: cr@sap.com
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 11:25 ` Christoph Rohland
@ 2000-05-10 11:50 ` Zlatko Calusic
2000-05-11 23:40 ` Mark Hahn
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-10 11:50 UTC (permalink / raw)
To: Christoph Rohland
Cc: Linus Torvalds, Daniel Stone, riel, linux-mm, linux-kernel
Christoph Rohland <cr@sap.com> writes:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > On 9 May 2000, Christoph Rohland wrote:
> >
> > > Linus Torvalds <torvalds@transmeta.com> writes:
> > >
> > > > Try out the really recent one - pre7-8. So far it has some good reviews,
> > > > and I've tested it both on a 20MB machine and a 512MB one..
>
> > > I append the mem and task info from sysrq. Mem info seems to not
> > > change after lockup.
> >
> > I suspect that if you do right-alt + scrolllock, you'll see it looping on
> > a spinlock. Which is why the memory info isn't changing ;)
> >
> > But I'll double-check the shm code (I didn't test anything that did any
> > shared memory, for example).
>
> Juan Quintela's patch fixes the lockup. shm paging locked up on the
> page lock.
>
> Now I can give more data about pre7-8. After a short run I can say the
> following:
>
> The machine seems to be stable, but VM is mainly unbalanced:
>
> [root@ls3016 /root]# vmstat 5
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
>
> [...]
>
> 9 3 0 0 1460016 1588 11284 0 0 0 0 109 23524 4 96 0
> 9 3 1 7552 557432 1004 19320 0 1607 0 402 186 42582 2 89 9
> 11 1 1 41972 111368 424 53740 0 6884 2 1721 277 25904 0 89 10
[ too many lines error, truncating... ]
> 9 2 1 46536 627356 116 31072 87 8675 23 2169 1784 1412 0 96 4
> 10 0 1 46664 617368 116 31200 0 26 0 6 258 112 0 100 0
> 10 0 1 47300 607184 116 31832 0 126 0 32 291 110 0 100 0
>
> So we are swapping out with lots of free memory and killing random
> processes. The machine also becomes quite unresponsive compared to
> pre4 on the same tests.
>
I'll second this!
I checked pre7-8 briefly, but I/O & MM interaction is bad. Lots of
swapping, lots of wasted CPU cycles and lots of dead writer processes
(write(2): out of memory, while there is 100MB in the page cache).
Back to my patch and working on the solution for the 20-24 MB & 1GB
machines. Anybody with spare 1GB RAM to help development? :)
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 11:50 ` Zlatko Calusic
@ 2000-05-11 23:40 ` Mark Hahn
0 siblings, 0 replies; 67+ messages in thread
From: Mark Hahn @ 2000-05-11 23:40 UTC (permalink / raw)
To: linux-mm
> I checked pre7-8 briefly, but I/O & MM interaction is bad. Lots of
> swapping, lots of wasted CPU cycles and lots of dead writer processes
> (write(2): out of memory, while there is 100MB in the page cache).
I've checked pre7-8 and -9 fairly extensively, and it works GREAT.
this is the first kernel since around 2.3.36 that passes my main criteria:
1. I have an app that sequentially traverses 12 40M chunks of data by
mmaping one, reading each u16, unmapping, on to the next. until
pre7-8, old 40M chunks would NOT be scavenged, and instead the ~10M
rss of the analysis program would be thrashed, over and over.
with pre7-8 and -9, there's only incidental swapping, and performance
is roughly 2.2x better than preceding kernels.
2. big compilations (kernel make -j2) seem to run fine:
under 2.3.99-7-8:
334.65user 20.28system 3:01.53elapsed 195%CPU (330186major+472843minor)pf
334.23user 20.28system 2:58.13elapsed 199%CPU (340672major+472770minor)pf
334.33user 20.28system 2:57.79elapsed 199%CPU (329202major+472769minor)pf
287.99user 17.51system 2:33.72elapsed 198%CPU (270411major+396913minor)pf
335.65user 20.31system 3:01.13elapsed 196%CPU (332370major+472770minor)pf
under 2.3.99-pre7 (somewhat hacked):
333.55user 20.37system 3:19.69elapsed 177%CPU (341428major+472709minor)
334.02user 19.53system 3:09.28elapsed 186%CPU (330283major+472709minor)
334.57user 18.98system 3:08.02elapsed 188%CPU (328941major+472709minor)
334.89user 18.97system 3:07.91elapsed 188%CPU (328941major+472709minor)
333.22user 20.36system 3:07.75elapsed 188%CPU (328941major+472709minor)
334.15user 19.42system 3:07.84elapsed 188%CPU (328941major+472709minor)
under 2.3.36:
332.59user 19.93system 3:38.24elapsed 161%CPU (331704major+468634minor)
332.16user 21.14system 3:07.62elapsed 188%CPU (328998major+468634minor)
296.87user 17.93system 2:39.25elapsed 197%CPU (284086major+408452minor)
332.48user 20.89system 3:07.80elapsed 188%CPU (328998major+468634minor)
296.28user 18.08system 2:39.04elapsed 197%CPU (283978major+408169minor)
under 2.3.99-7-9:
331.28user 21.01system 3:18.83elapsed 177%CPU (328941major+472703minor)
334.06user 19.17system 3:07.72elapsed 188%CPU (328941major+472703minor)
332.79user 20.59system 3:07.73elapsed 188%CPU (328941major+472703minor)
334.29user 19.22system 3:07.55elapsed 188%CPU (328941major+472703minor)
332.25user 20.96system 3:07.55elapsed 188%CPU (328941major+472703minor)
332.09user 21.45system 3:07.67elapsed 188%CPU (328941major+472703minor)
334.04user 19.62system 3:07.72elapsed 188%CPU (328941major+472703minor)
334.38user 18.98system 3:07.50elapsed 188%CPU (328941major+472703minor)
333.67user 19.54system 3:07.54elapsed 188%CPU (328941major+472703minor)
wow, those identical PF numbers are kinda eerie! the machine was otherwise
idle during these tests, but not single-user. I don't really understand
why 2.3.36 would sometimes perform *significantly* better.
3. disk bandwidth (bonnie) is excellent on 2.3.99-7-8 or -9
I usually use this machine remotely, so I can't comment on "feel".
big memory or IO load didn't seem to hurt the update latency of top/vmstat
type tools. machine is a dual celeron/550, bx, 128M, single udma.
I briefly tested a kernel build on an old 32M cyrix 166, and it
was a little slower than 2.3.36.
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
@ 2000-05-10 4:05 ` James H. Cloos Jr.
2000-05-10 7:29 ` James H. Cloos Jr.
2 siblings, 1 reply; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-10 4:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Linus> Try out the really recent one - pre7-8. So far it has some good
Linus> reviews, and I've tested it both on a 20MB machine and a 512MB
Linus> one..
pre7-8 still isn't completely fixed, but it is better than pre6.
Try doing something like 'cp -a linux-2.3.99-pre7-8 foobar' and
watching kswapd in top (or qps, et al.). On my dual-proc box, kswapd
still maxes out one of the cpus. Tar doesn't seem to show it, but
bzcat can get an occasional segfault on large files.
The filesystem, though, has 1k rather than 4k blocks. Yeah, just
tested again on a fs w/ 4k blocks. kswapd only used 50% to 65% of a
cpu, but that was an ide drive and the former was on a scsi drive.[1]
OTOH, in pre6 X would hit (or at least report) 2^32-1 major faults
after only a few hours of usage. That bug is gone in pre7-8.
[1] asus p2b-ds mb using onboard adaptec scsi and piix ide; drives are
all IBM ultrastars and deskstars.
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Save Trees: Get E-Gold! <URL:http://jhcloos.com/go?e-gold>
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 4:05 ` James H. Cloos Jr.
@ 2000-05-10 7:29 ` James H. Cloos Jr.
2000-05-11 0:16 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-10 7:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
Ok. Tried w/ Manfred's patch (i.e. the 2nd half). kswapd still uses a lot
of cpu doing recursive cp(1)s, but it is less than in virgin pre7-8. I
got about 10s of cpu for cp and 40s for kswapd doing a cp -a of the 7-8
tree (after compiling) on the ide drive (w/ 4k ext2 blocks). On the 1k
ext2 block scsi partition, it was 1m50s for kswapd and 20s for cp to cp
three such trees. kswapd %cpu never exceeded 65% on the latter and 50%
on the former; substantially better than in virgin 7-8, but not as good
as earlier kernels (though I don't have any numbers to back that up). I
did this test in single user mode w/ only top running (on another vc).
Hope the datapoint helps!
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Save Trees: Get E-Gold! <URL:http://jhcloos.com/go?e-gold>
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 7:29 ` James H. Cloos Jr.
@ 2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
` (3 more replies)
0 siblings, 4 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 0:16 UTC (permalink / raw)
To: James H. Cloos Jr.; +Cc: linux-mm, linux-kernel
Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
actually how block fs dirty data is flushed out. Instead of just waking up
kflushd and hoping for the best, we actually just write it out (and even
wait on it, if absolutely required).
Which makes the whole process much more streamlined, and makes the numbers
more repeatable. It also fixes the problem with dirty buffer cache data
much more efficiently than the kflushd approach, and mmap002 is not a
problem any more. At least for me.
[ I noticed that mmap002 finishes a whole lot faster if I never actually
wait for the writes to complete, but that had some nasty behaviour under
low memory circumstances, so it's not what pre7-9 actually does. I
_suspect_ that I should start actually waiting for pages only when
priority reaches 0 - comments welcomed, see fs/buffer.c and the
sync_page_buffers() function ]
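The policy in the bracketed note can be modeled in a few lines of user-space C. This is only a toy illustration of the idea (block on locked buffers only once allocation priority reaches 0); the struct and function names are invented and this is not the actual fs/buffer.c code:

```c
#include <stdbool.h>
#include <stddef.h>

struct buf {
    bool dirty;   /* needs writing out */
    bool locked;  /* I/O already in flight */
};

/* Write out dirty buffers; skip locked ones unless the allocation is
 * critical (priority 0), in which case "wait" for their I/O first.
 * Returns the number of buffers written out. */
int sync_buffers(struct buf *bufs, size_t n, int priority)
{
    bool wait = (priority == 0);    /* only block for critical allocations */
    int written = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].locked && !wait)
            continue;               /* in-flight I/O: don't stall, move on */
        if (bufs[i].locked)
            bufs[i].locked = false; /* stand-in for waiting on the I/O */
        if (bufs[i].dirty) {
            bufs[i].dirty = false;  /* stand-in for submitting the write */
            written++;
        }
    }
    return written;
}
```

At high priority a locked-and-dirty buffer is simply skipped; at priority 0 the same buffer is waited on and then written, which is the "wait on it, if absolutely required" behaviour described above.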
kswapd is still quite aggressive, and will show higher CPU time than
before. This is a tweaking issue - I suspect it is too aggressive right
now, but it needs more testing and feedback.
Just the dirty buffer handling made quite an enormous difference, so
please do test this if you hated earlier pre7 kernels.
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
@ 2000-05-11 0:32 ` Linus Torvalds
2000-05-11 16:36 ` [PATCH] Recent VM fiasco - fixed (pre7-9) Rajagopal Ananthanarayanan
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
` (2 subsequent siblings)
3 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 0:32 UTC (permalink / raw)
To: Rajagopal Ananthanarayanan, Juan J. Quintela, Rik van Riel; +Cc: linux-mm
Some more explanations on the differences between pre7-8 and pre7-9..
Basically pre7-9 survives mmap002 quite gracefully, and I think it does so
for all the right reasons. It's not tuned for that load at all, it's just
that mmap002 was really good at showing two weak points of the mm layer:
- try_to_free_pages() could actually return success without freeing a
single page (just moving pages around to the swap cache). This was bad,
because it could cause us to get into a situation where we
"successfully" free'd pages without ever adding any to the list. Which
would, for all the obvious reasons, cause problems later when we
couldn't allocate a page after all..
- The "sync_page_buffers()" thing to sync pages directly to disk rather
than wait for bdflush to do it for us (and have people run out of
memory before bdflush got around to the right pages).
Sadly, as it was set up, try_to_free_buffers() doesn't even get the
"urgency" flag, so right now it doesn't know whether it should wait for
previous write-outs or not. So it always does, even though for
non-critical allocations it should just ignore locked buffers.
Fixing these things suddenly made mmap002 behave quite well. I'll make the
change to pass in the priority to sync_page_buffers() so that I'll get the
increased performance from not waiting when I don't have to, but it starts
to look like pre7 is getting in shape.
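The first weak point boils down to an accounting rule: a page moved into the swap cache still occupies memory, so it must not be counted as freed. A toy user-space C model of that rule (invented names, not the real try_to_free_pages()):

```c
#include <stddef.h>

enum where { PAGE_MAPPED, PAGE_SWAP_CACHE, PAGE_FREE };

/* One shrink pass over a set of pages.  Unmapping a page into the swap
 * cache is progress, but only a page that actually reaches the free
 * list counts toward the return value -- counting the swap-cache moves
 * is what let the old code claim "success" with an empty free list. */
int shrink_pass(enum where *pages, size_t n)
{
    int freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == PAGE_MAPPED) {
            pages[i] = PAGE_SWAP_CACHE; /* unmapped, but still holds memory */
        } else if (pages[i] == PAGE_SWAP_CACHE) {
            pages[i] = PAGE_FREE;       /* written out and released */
            freed++;                    /* only this counts as freed */
        }
    }
    return freed;
}
```

Under the old accounting a first pass over mostly-mapped pages would have reported every swap-cache move as a success; here it reports only the pages that actually reached the free list.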
Linus
On Wed, 10 May 2000, Linus Torvalds wrote:
>
> Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
> actually how block fs dirty data is flushed out. Instead of just waking up
> kflushd and hoping for the best, we actually just write it out (and even
> wait on it, if absolutely required).
>
> Which makes the whole process much more streamlined, and makes the numbers
> more repeatable. It also fixes the problem with dirty buffer cache data
> much more efficiently than the kflushd approach, and mmap002 is not a
> problem any more. At least for me.
>
> [ I noticed that mmap002 finishes a whole lot faster if I never actually
> wait for the writes to complete, but that had some nasty behaviour under
> low memory circumstances, so it's not what pre7-9 actually does. I
> _suspect_ that I should start actually waiting for pages only when
> priority reaches 0 - comments welcomed, see fs/buffer.c and the
> sync_page_buffers() function ]
>
> kswapd is still quite aggressive, and will show higher CPU time than
> before. This is a tweaking issue - I suspect it is too aggressive right
> now, but it needs more testing and feedback.
>
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
>
> Linus
>
>
* Re: [PATCH] Recent VM fiasco - fixed (pre7-9)
2000-05-11 0:32 ` Linus Torvalds
@ 2000-05-11 16:36 ` Rajagopal Ananthanarayanan
0 siblings, 0 replies; 67+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-11 16:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, Rik van Riel, linux-mm
Dbench runs well on pre7-9. As far as I can tell,
there were NO failures in 15 hours of running,
the longest I've ever run this test. The performance has been
pretty good. Swapping was initially very low, although
it didn't affect performance. Later, presumably as more
periodic system processes started to run, the swap
level increased, but stayed at the "usual" levels observed
before ... the swap build-up was gradual, likely indicating
that the right things were swapped out only when necessary.
regards,
ananth.
--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
@ 2000-05-11 1:04 ` Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
3 siblings, 2 replies; 67+ messages in thread
From: Juan J. Quintela @ 2000-05-11 1:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
linus> Which makes the whole process much more streamlined, and makes the numbers
linus> more repeatable. It also fixes the problem with dirty buffer cache data
linus> much more efficiently than the kflushd approach, and mmap002 is not a
linus> problem any more. At least for me.
linus> [ I noticed that mmap002 finishes a whole lot faster if I never actually
linus> wait for the writes to complete, but that had some nasty behaviour under
linus> low memory circumstances, so it's not what pre7-9 actually does. I
linus> _suspect_ that I should start actually waiting for pages only when
linus> priority reaches 0 - comments welcomed, see fs/buffer.c and the
linus> sync_page_buffers() function ]
Hi
I have done my normal mmap002 test and this goes slower than
ever; it takes something like 3m50s to complete (pre7-8: 2m50,
andrea classzone: 2m8, and 2.2.15: 1m55, for reference). I have no more
time now to do more testing; I will continue late tomorrow. My
findings are:
real 3m41.403s
user 0m16.010s
sys 0m36.890s
It takes the same user time as earlier versions, but the system
time has increased a lot; it was ~10-12 seconds in pre7-8 and around 8
in classzone and 2.2.15.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
@ 2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
2000-05-11 5:10 ` Linus Torvalds
1 sibling, 2 replies; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 1:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]
On Thu, May 11, 2000 at 03:04:37AM +0200, Juan J. Quintela wrote:
> >>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
>
> linus> Which makes the whole process much more streamlined, and makes the numbers
> linus> more repeatable. It also fixes the problem with dirty buffer cache data
> linus> much more efficiently than the kflushd approach, and mmap002 is not a
> linus> problem any more. At least for me.
>...
> I have done my normal mmap002 test and this goes slower than
> ever; it takes something like 3m50s to complete (pre7-8: 2m50,
> andrea classzone: 2m8, and 2.2.15: 1m55, for reference). I have no more
> time now to do more testing; I will continue late tomorrow. My
> findings are:
>
> real 3m41.403s
> user 0m16.010s
> sys 0m36.890s
>
>
> It takes the same user time as earlier versions, but the system
> time has increased a lot; it was ~10-12 seconds in pre7-8 and around 8
> in classzone and 2.2.15.
I, too, see unbelievably slow writing now when uncompressing large data
files. 128 MB, dual processor, IDE drive. It seems to be synchronously
writing out data as it's dirtied, not grouping it into blocks at all like
it used to. This would probably increase seeking, no?
Trying now with Andrea's classzone-27 against pre7-8, the results are
much better.
I attached vmstat-1.txt (2.3.99pre7-8+classzone-27) and vmstat-2.txt
(2.3.99pre7-9), which are outputs from "vmstat 1" when uncompressing the
same thing. 2.3.99pre7-9 seems to be taking about twice as long (real
time). This is from and to a 4K EXT2 filesystem. Both seem to swap out
some, which I guess is arguably good or bad...
Is Andrea taking too dangerous an approach for the current kernel version,
or are you trying to get something extremely simple working instead?
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-1.txt --]
[-- Type: text/plain, Size: 13520 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 106132 1100 10048 0 0 63 1 70 48 2 3 95
0 0 0 0 106132 1100 10048 0 0 0 0 102 6 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 117 8 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 103 10 0 0 100
0 0 0 0 106132 1100 10048 0 0 1 0 102 16 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 103 10 0 0 99
0 0 0 0 106132 1100 10048 0 0 0 0 104 10 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 104 6 0 1 99
0 0 0 0 106132 1100 10048 0 0 0 0 105 16 0 0 100
0 1 0 0 105904 1116 10236 0 0 100 68 167 50 0 0 100
1 0 0 0 75028 1164 38844 0 0 3670 0 340 261 20 13 67
0 1 0 0 55968 1188 57268 0 0 2322 500 392 277 15 6 79
0 1 0 0 44000 1200 68856 0 0 1442 1500 624 176 7 7 86
0 1 0 0 31836 1216 80620 0 0 1488 1500 593 178 9 6 85
0 1 0 0 19936 1224 92140 0 0 1441 1500 611 170 9 6 85
1 0 0 0 9292 1240 102436 0 0 1296 1000 587 146 5 3 92
0 1 0 248 2880 900 109460 84 324 1239 1581 433 158 8 7 85
0 1 0 248 2672 260 110496 0 0 1474 1500 580 186 8 7 85
1 0 0 232 2748 260 110408 44 0 1308 1000 540 219 8 9 83
0 1 0 232 2400 268 110736 0 0 1251 1500 483 175 8 5 86
0 1 0 232 2392 260 110752 0 0 1441 1500 482 196 8 5 86
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 232 2052 268 111140 0 0 1489 1500 522 206 7 7 85
0 1 0 232 2900 264 110064 0 0 1313 1000 647 169 6 6 88
0 1 0 232 2496 264 110644 0 0 1232 1500 494 171 8 3 89
0 1 0 232 2612 268 110528 0 0 1474 1500 552 206 9 5 85
1 0 0 232 2544 264 110668 0 0 1153 1000 623 146 6 3 90
0 1 1 504 1792 276 111656 0 272 1072 2068 656 499 7 5 87
0 1 0 540 2876 268 110720 0 36 1378 1009 618 202 9 6 85
0 1 0 540 3196 256 110484 0 0 1345 1000 520 189 9 5 86
1 0 0 540 3112 264 110616 0 0 1170 1000 409 150 8 5 87
0 1 0 540 2708 264 110952 0 0 1346 1500 450 139 10 4 86
1 0 0 540 2500 272 111216 0 0 1315 1000 432 140 10 6 84
1 0 0 540 2172 268 111540 0 0 1199 1500 491 162 6 4 90
0 1 0 540 2284 272 111364 0 0 1443 1500 438 135 7 7 86
0 1 0 540 2852 272 110796 0 0 1089 1000 465 148 7 5 88
0 1 0 540 2980 264 110676 0 0 1423 1500 380 140 7 7 86
0 1 0 540 2584 272 111068 0 0 1059 1000 421 146 9 3 88
0 1 0 988 2556 276 111524 0 448 1473 1612 523 148 10 6 84
1 0 0 988 2704 264 111408 0 0 1135 1000 423 163 6 6 88
0 1 0 988 2616 272 111480 0 0 1443 1500 482 165 8 7 85
0 1 0 988 2796 268 111296 0 0 1392 1000 390 169 9 5 86
0 1 0 988 3048 272 111048 0 0 1122 1500 450 161 8 4 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 988 3160 268 110936 0 0 1441 1500 540 166 8 6 86
0 1 0 988 2844 268 111316 0 0 1200 1000 479 120 10 3 87
0 1 0 988 2676 284 111396 0 0 1319 1500 375 150 8 6 86
0 1 0 988 2816 280 111280 0 0 1057 1000 384 122 6 4 90
0 1 0 988 2536 284 111544 0 0 1456 1500 426 207 11 3 86
0 1 0 1052 2724 296 111416 0 64 1122 1516 577 102 6 6 88
1 1 0 1200 2164 284 112116 0 148 993 1037 555 93 5 5 89
0 1 0 1200 2884 280 111420 0 0 1488 1000 429 157 9 7 84
0 1 0 1200 2868 284 111428 0 0 1090 1500 521 119 5 5 90
1 0 0 1200 2148 284 112212 0 0 1698 1000 479 148 10 8 82
1 0 0 1200 2080 280 112216 0 0 1167 1500 462 130 8 4 88
1 0 0 1200 2328 284 111968 0 0 1282 1000 385 148 8 5 87
0 1 0 1200 3028 288 111276 0 0 1218 1500 552 129 7 5 87
0 1 0 1200 2704 264 111620 0 0 1487 1500 514 144 9 8 83
1 0 0 1200 2740 272 111628 0 0 1283 1000 477 125 6 8 86
0 1 0 1200 3176 276 111124 0 0 1250 1500 537 125 3 7 89
0 1 0 1388 2916 272 111568 0 188 911 1047 537 108 5 6 89
0 1 0 1388 2772 276 111712 0 0 1603 1500 463 145 10 6 84
1 0 0 1388 2964 272 111592 0 0 1424 1000 482 131 11 2 87
0 1 0 1388 2440 276 112048 0 0 1218 1500 462 130 9 4 87
0 1 0 1388 2936 272 111552 0 0 1377 1500 461 124 7 5 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 1388 2128 272 112388 0 0 1104 1000 400 112 3 9 88
0 1 0 1388 3116 276 111372 0 0 1442 1500 462 160 7 8 85
0 1 0 1388 2972 272 111516 0 0 1441 1500 441 138 8 6 86
0 1 0 1388 2268 276 112220 0 0 1104 1000 418 138 5 3 91
0 1 0 1388 2492 276 111976 0 0 1122 1500 559 160 7 5 88
1 0 0 1580 2436 284 112364 0 192 1634 1048 454 170 9 7 84
0 1 0 1756 3084 276 111768 0 176 1231 1544 490 140 9 5 85
1 0 0 1756 2552 276 112424 0 0 1379 1000 426 124 7 4 89
0 1 0 1756 2420 276 112432 0 0 1153 1500 496 142 9 3 87
1 0 0 1756 2640 276 112284 0 0 1510 1500 470 146 7 7 86
0 1 0 1756 2712 280 112132 0 0 1421 1500 467 132 8 5 87
0 1 0 1756 2880 280 111756 0 0 961 1000 516 91 6 3 91
0 1 0 1756 3148 280 111700 0 0 1585 1500 452 171 11 5 83
1 0 0 1756 2428 280 112480 0 0 1377 1000 518 152 7 6 87
0 1 0 1756 2428 280 112420 0 0 1168 1500 398 129 6 6 88
1 0 0 1756 2604 288 112296 0 0 1250 1000 402 149 7 6 86
1 0 0 1860 3032 280 111944 0 104 993 1526 541 104 6 7 87
0 1 0 1860 2096 280 112848 0 0 1712 1500 426 203 9 7 84
0 1 0 1860 2100 284 112844 0 0 1438 1000 479 172 7 7 85
0 1 0 1860 2336 280 112616 0 0 1126 1500 470 156 8 6 86
0 1 0 1860 2652 276 112300 0 0 1263 1000 450 185 8 7 85
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1860 2124 284 112820 0 0 1218 1500 491 169 7 4 89
0 1 0 1860 2108 284 112836 0 0 1443 1500 466 206 8 7 85
1 0 0 1860 2876 280 112200 0 0 943 1000 465 105 6 4 90
1 1 0 1860 2604 284 112340 0 0 1603 1500 458 146 9 6 84
0 1 0 1860 3108 284 111836 0 0 1089 1000 503 105 7 3 90
0 1 0 1860 3160 288 111780 0 0 1457 1500 525 152 8 7 85
1 0 0 1948 2868 284 112172 0 88 1345 1022 569 137 10 5 85
0 1 0 1948 3020 284 112008 0 0 1129 1500 509 101 7 6 87
0 1 0 1948 2592 288 112432 0 0 1513 1500 422 140 7 6 87
0 1 0 1948 3092 284 111936 0 0 1089 1000 429 126 8 3 88
0 1 0 1948 2832 280 112200 0 0 1328 1000 451 204 10 5 85
0 1 0 1948 2372 284 112656 0 0 1090 1500 468 168 5 5 90
0 1 0 1948 3000 284 112028 0 0 1505 1500 499 158 7 7 85
0 1 0 1948 2228 284 112840 0 0 1104 1000 468 129 8 3 88
0 1 0 1948 2408 284 112620 0 0 1154 1500 496 129 7 4 88
1 0 0 1948 2780 284 112312 0 0 1698 1000 448 161 12 4 84
0 1 0 2052 2100 284 113036 0 104 1135 2026 519 378 7 6 87
0 1 0 2052 2084 304 113032 0 0 1442 1000 522 146 8 4 88
0 1 0 2052 2956 300 112164 0 0 1030 1000 410 108 9 2 88
0 1 0 2052 2892 296 112232 0 0 1487 1500 404 150 9 7 84
1 0 0 2052 2736 320 112496 0 0 1192 1000 437 150 9 4 87
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 2052 2408 312 112704 0 0 1313 1500 475 138 10 3 87
0 1 0 2052 2280 308 112836 0 0 1039 1000 384 112 6 3 91
0 1 0 2052 2264 312 112848 0 0 1443 1500 441 160 11 5 84
0 1 0 2052 3024 312 112088 0 0 1057 1000 535 133 7 4 89
1 0 0 2052 2636 312 112540 0 0 1200 1000 413 171 7 5 88
0 1 0 2052 2808 324 112292 0 0 1283 1500 485 132 6 6 87
1 0 0 2132 2272 320 112912 0 80 1089 1020 460 121 9 4 86
0 1 0 2180 2224 308 113020 0 48 1327 1512 408 167 8 8 84
1 0 0 2180 2460 316 112840 0 0 1412 1000 406 131 7 8 85
0 1 0 2180 2576 316 112660 0 0 1281 1500 474 157 6 5 88
0 0 0 928 2736 308 112496 772 0 1045 500 307 153 4 3 93
0 0 0 928 2736 308 112496 0 0 0 0 101 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 105 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 133 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 131 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 129 14 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 105 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 6 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 928 2736 308 112496 0 0 0 0 101 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 103 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 7 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 106 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 103 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 102 8 1 0 99
0 0 0 928 2732 308 112500 0 0 0 0 101 6 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 110 12 0 0 100
0 0 0 928 2732 308 112500 0 0 0 3221 258 84 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 153 8 0 0 100
0 0 0 928 2720 312 112508 0 0 5 0 107 12 0 0 100
0 0 0 928 2720 312 112508 0 0 0 0 104 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 916 2680 312 112532 0 0 19 0 110 16 0 0 100
0 0 0 916 2680 312 112532 0 0 0 2300 257 12 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 113 14 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 107 10 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 108 6 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 102 14 0 0 100
[-- Attachment #3: vmstat-2.txt --]
[-- Type: text/plain, Size: 21840 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 105948 1240 10040 0 0 40 1 64 32 2 2 96
0 0 0 0 105948 1240 10040 0 0 0 0 103 10 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 109 6 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 8 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 101 6 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 14 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 101 8 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 8 0 0 100
1 0 0 0 102096 1264 12532 0 0 469 0 141 69 2 1 97
1 0 0 0 70620 1300 42988 0 0 3827 0 347 283 21 17 62
0 1 0 0 52628 1324 60332 0 0 2179 1000 425 271 13 7 80
1 0 0 0 40864 1336 71716 0 0 1424 1000 589 178 8 5 87
1 0 0 0 30348 1348 81956 0 0 1282 1500 645 143 8 3 89
0 1 0 0 19240 1356 92644 0 0 1345 1500 633 153 7 4 89
0 1 1 0 13348 976 98932 0 0 1168 1012 379 164 4 8 88
0 1 1 0 9240 968 102924 0 0 512 528 255 134 2 3 95
0 1 1 0 5932 632 106608 0 0 642 528 263 144 5 4 91
0 1 1 0 1688 588 110776 0 0 609 1026 381 131 7 3 90
0 1 1 0 2804 228 111064 0 0 512 528 241 104 2 20 78
0 1 1 0 2288 208 112316 0 0 609 514 295 94 3 25 71
0 1 1 0 2824 224 111608 0 0 625 515 374 122 3 16 81
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 0 2828 228 111440 0 0 608 524 348 112 3 11 86
0 1 1 0 2824 236 111312 0 0 482 532 256 94 2 8 90
1 0 1 0 2692 240 111448 0 0 480 536 354 86 2 8 89
1 0 1 0 2852 232 111492 0 0 481 527 257 95 4 9 87
0 1 1 0 2824 236 111316 0 0 544 528 255 103 2 3 94
0 1 1 0 2824 240 111180 0 0 513 537 340 96 3 5 91
0 1 1 0 2828 244 111040 0 0 497 520 311 112 1 5 94
0 1 1 0 2832 216 111848 0 0 478 530 345 83 3 21 76
0 1 1 0 2600 228 111920 0 0 578 522 224 122 3 3 94
0 1 1 0 2836 228 111536 0 0 576 525 238 120 5 13 82
0 1 1 0 2724 236 111516 0 0 481 526 326 94 4 6 89
1 0 1 36 2624 236 111552 0 36 640 1022 346 105 5 23 71
0 1 1 36 2892 244 111060 0 0 577 519 344 99 2 16 82
0 1 1 36 2892 248 110916 0 0 559 525 255 108 4 9 87
0 1 1 32 2936 252 110732 0 0 480 539 357 96 3 8 89
0 1 1 32 2528 260 110912 0 0 450 532 261 81 2 7 90
0 1 1 32 2896 268 110504 0 0 636 533 359 110 4 7 89
0 1 0 32 2572 244 110704 0 0 741 1037 297 160 4 11 85
0 1 0 32 3064 248 110704 0 0 449 62 364 164 3 16 81
0 1 1 32 2452 252 111184 0 0 438 534 356 101 2 11 87
0 1 1 32 3056 260 110492 0 0 392 542 323 93 2 14 84
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 32 1288 260 112128 0 0 512 536 354 96 4 5 91
1 0 1 32 2536 272 110744 0 0 354 540 242 100 1 9 90
1 0 1 32 3260 208 110884 0 0 416 547 252 175 3 22 75
0 1 1 32 2212 216 112424 0 4 609 527 239 112 5 8 87
0 1 1 32 3004 216 111472 0 4 608 540 327 120 5 7 87
0 1 1 32 3008 232 111252 0 4 768 1083 405 252 3 11 86
0 1 1 32 1964 240 112144 0 0 594 570 422 222 3 14 83
1 1 1 60 2932 248 111164 0 36 661 586 349 226 3 13 84
0 1 1 60 2752 256 111468 0 0 705 567 309 162 3 12 85
0 1 1 60 2480 260 111568 0 0 641 1066 436 197 3 6 91
0 1 1 64 3012 268 110940 0 4 674 580 329 232 5 12 83
0 1 1 64 2764 268 111224 0 0 783 1144 423 354 5 28 67
1 0 1 64 3016 272 110944 0 0 706 610 434 289 4 21 75
0 1 1 64 2508 268 111428 0 0 832 1154 435 379 5 29 66
0 1 1 64 3032 272 110768 0 0 833 628 414 338 5 28 67
0 1 1 64 2608 276 111040 0 0 769 1081 440 268 3 24 72
0 1 1 64 2684 284 110804 0 0 847 1089 370 280 6 12 82
0 1 0 64 2832 276 110796 0 0 706 558 384 193 7 11 82
0 1 1 136 1280 284 112440 0 72 911 1084 494 251 6 13 81
0 1 1 132 2632 292 111020 0 0 883 1137 478 353 3 27 70
0 1 1 132 2996 300 110464 0 0 687 622 344 306 2 32 65
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 132 1688 288 111708 0 0 834 1109 490 308 5 23 72
0 1 1 132 2400 244 111760 0 4 765 628 413 325 5 26 69
0 1 1 204 2688 244 111632 0 92 705 669 390 360 3 32 65
0 1 1 240 2624 248 111684 0 40 609 1105 383 259 3 20 77
0 1 1 276 2940 252 111256 0 36 623 589 451 229 1 15 84
1 0 1 276 2940 256 111064 0 0 736 586 332 244 5 17 77
0 1 1 276 1920 264 112024 0 0 674 1069 417 228 6 9 84
0 1 1 276 2832 264 111084 0 0 656 582 462 245 4 11 84
0 1 1 276 3044 252 111096 0 0 544 597 336 238 3 29 68
0 1 1 276 2708 248 111468 0 0 609 608 320 264 3 30 67
0 1 1 276 2992 256 111120 0 0 438 567 289 176 2 15 83
0 1 1 276 3000 256 110984 0 0 505 564 303 186 3 11 86
1 0 1 276 3060 268 110780 0 0 509 562 321 199 4 8 88
0 1 1 276 2844 268 110852 0 0 677 561 297 185 5 10 85
1 0 1 276 2964 260 110976 0 0 481 1044 400 141 3 8 88
0 1 1 276 1780 260 112016 0 0 608 563 251 179 3 11 86
0 1 1 276 2012 264 111700 0 0 513 557 324 155 3 5 92
0 1 1 276 3076 272 110528 0 0 783 577 328 244 5 13 82
0 1 1 276 1072 264 112812 0 0 674 1047 366 139 6 6 88
0 1 1 276 1960 264 111980 0 0 608 568 482 200 3 15 81
0 1 1 276 1140 268 112664 0 0 513 578 399 198 5 25 70
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 276 2404 272 111240 0 0 544 563 347 159 5 15 79
0 1 1 276 1792 252 112016 0 0 737 569 293 204 4 22 74
0 1 1 276 1116 264 112592 0 0 401 554 339 155 2 13 84
0 1 1 268 1876 264 111808 0 0 576 562 305 166 2 15 83
0 1 1 260 2804 244 111444 0 0 450 563 327 180 3 16 81
1 0 1 260 2700 240 111636 0 4 608 561 392 169 2 24 73
0 1 1 260 2172 244 112120 0 4 673 1041 401 103 3 11 86
0 1 1 260 2832 252 111296 0 4 609 556 427 155 6 8 86
1 0 1 260 2800 260 111208 0 0 687 571 319 210 5 12 83
0 1 1 260 1592 260 112180 0 0 672 1056 432 168 5 4 90
0 1 1 260 2948 256 111088 0 0 610 560 339 182 5 14 81
0 1 1 260 2824 248 111192 0 0 576 561 326 163 3 11 86
1 0 1 260 2864 256 111060 0 0 577 553 292 142 2 8 90
1 0 1 260 2572 264 111180 0 0 613 549 359 136 3 9 88
0 1 1 260 2100 264 111508 0 0 587 1056 305 163 3 7 90
1 0 1 260 2348 260 111584 0 0 608 527 384 94 5 9 86
1 0 1 260 2820 264 110832 0 0 561 531 366 105 4 5 92
0 1 1 260 2656 268 111072 0 0 672 549 378 129 4 10 86
0 1 1 260 2684 276 110920 0 0 450 550 278 148 2 4 93
0 1 2 260 1536 264 112156 0 0 802 1060 474 192 7 11 82
0 1 1 260 1568 260 112164 0 0 943 1084 378 258 7 12 81
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 296 2136 264 111824 0 36 898 602 400 275 7 15 78
1 0 1 296 1296 268 112564 0 0 640 1092 473 248 5 25 70
1 0 1 296 2992 272 110632 0 0 801 601 364 279 3 20 77
0 1 1 296 1540 284 111988 4 0 726 1051 389 200 5 11 84
0 1 1 368 2988 264 111444 0 72 687 610 390 244 2 25 73
0 1 1 368 3048 264 111216 0 0 770 636 386 339 2 28 69
0 1 1 368 2836 268 111392 0 0 640 1058 414 160 3 16 81
0 1 1 368 2952 276 111064 0 0 833 552 335 127 4 16 79
0 1 1 368 2016 280 111852 0 0 641 1054 425 169 3 8 88
1 0 0 368 2056 264 112296 0 0 815 602 359 275 6 26 68
0 1 1 368 2284 276 111816 0 0 674 1114 430 288 4 20 75
0 1 1 368 2768 280 111324 0 0 833 614 407 301 7 22 71
0 1 1 368 1952 276 112124 0 0 672 1068 465 196 4 14 81
0 1 1 368 2752 284 111360 0 0 705 581 326 229 6 19 74
0 1 1 368 3024 280 110956 0 0 655 576 352 225 6 15 79
0 1 1 368 2136 288 111688 0 0 578 1053 425 163 5 9 86
1 0 1 368 2652 288 111080 0 0 768 559 333 205 2 10 87
0 1 1 368 3068 280 110860 0 0 641 563 285 186 6 10 84
1 0 0 368 3064 280 110800 0 0 673 1040 391 149 6 9 85
0 1 1 368 1892 256 112248 0 0 783 571 385 204 4 15 81
0 1 1 368 2972 260 111024 0 0 512 572 286 195 3 26 71
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 368 1632 264 112364 0 0 514 584 339 202 2 29 69
0 1 1 368 1836 264 112032 0 0 448 554 270 125 3 14 83
1 0 1 368 1284 272 112492 0 0 545 568 337 181 4 13 83
0 1 1 368 2824 276 110792 0 0 590 1086 276 214 4 14 82
0 1 1 368 3036 264 110920 0 0 481 66 328 140 3 18 79
0 1 1 368 2428 272 111372 0 0 591 1060 407 185 5 7 88
1 0 1 368 2468 260 111608 0 0 512 577 302 218 3 8 89
0 1 1 368 1244 264 112780 0 0 514 550 288 158 3 7 90
0 1 1 368 1540 264 112456 0 0 544 547 381 138 5 9 86
1 0 1 368 2896 276 110960 0 0 482 561 340 173 1 10 88
0 1 1 368 2988 260 111172 0 0 897 597 382 219 3 25 71
0 1 1 368 1692 264 112332 0 0 687 1068 377 196 3 15 82
0 1 1 368 2752 244 111716 0 0 768 638 466 352 10 32 58
0 1 1 368 2200 256 112084 0 0 674 1110 420 286 3 23 73
0 1 1 368 2992 260 111148 0 0 673 620 392 315 5 33 61
0 1 1 368 2816 256 111312 0 0 736 1134 483 335 3 31 66
0 1 1 368 2916 268 111124 0 0 658 571 402 215 3 15 81
0 1 1 368 2024 256 112064 0 0 928 1102 480 299 8 20 72
0 1 1 368 2552 264 111400 0 0 834 602 445 293 6 20 74
0 1 1 368 1564 264 112420 0 0 705 1067 398 203 5 14 81
0 1 1 368 1196 264 112868 0 0 993 1076 415 240 7 14 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 368 3016 264 110964 0 0 847 616 419 331 5 28 67
0 1 1 368 1292 272 112612 0 0 674 1101 399 307 4 18 78
0 1 1 368 2712 272 111212 0 0 769 624 397 358 4 30 66
0 1 1 368 2064 264 111848 0 0 704 1060 399 215 5 14 81
0 1 1 368 2772 260 111304 0 0 673 585 358 223 4 20 76
1 0 1 368 2868 268 111136 0 0 687 579 362 228 1 13 86
0 1 1 368 2444 280 111332 0 0 610 1067 398 182 4 9 87
0 1 1 368 1808 264 112088 0 0 576 559 389 166 4 10 86
1 0 1 368 2812 260 111476 0 0 514 564 305 167 4 12 84
0 1 1 368 2868 268 111192 0 0 860 1041 305 152 5 22 73
0 1 1 368 2056 268 111856 0 0 589 554 396 195 3 11 86
0 1 1 368 2756 272 111080 0 0 807 564 393 215 7 15 78
0 1 1 368 1756 280 112064 0 0 752 1056 456 182 3 13 84
0 1 1 368 3020 268 110960 0 0 769 580 427 212 4 23 73
0 1 1 368 1672 268 112272 0 0 769 1046 430 123 6 17 77
0 1 1 368 1880 264 112212 0 0 879 1086 391 249 5 14 81
0 1 1 368 2944 268 111040 0 0 672 566 421 198 6 19 74
0 1 1 368 2168 272 111772 0 0 898 1059 439 218 4 17 78
0 1 1 368 2156 264 111924 0 0 929 1072 458 222 5 14 81
1 0 1 404 2600 268 111444 0 36 673 571 471 180 4 18 78
0 2 1 404 2100 268 112028 4 0 862 1090 374 269 6 32 62
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 404 2240 284 111816 0 0 616 549 488 168 3 17 80
0 1 1 404 2556 284 111604 0 0 929 1094 408 269 5 24 70
0 1 1 404 1736 284 112332 0 0 608 556 383 165 4 13 83
0 1 1 404 2688 284 111352 0 0 833 596 344 255 6 22 72
0 1 1 404 2892 280 111136 0 0 495 540 337 185 4 10 86
0 1 1 404 2640 280 111444 0 0 544 1045 310 109 3 9 88
0 1 1 404 1640 276 112584 0 0 610 560 420 171 2 11 86
0 1 1 404 2784 284 111312 0 0 449 539 312 86 3 9 88
0 1 1 404 2460 284 111576 0 0 512 541 316 75 3 8 89
0 1 1 404 2060 288 111844 0 0 508 541 307 114 4 7 89
0 1 1 404 2288 288 111416 0 0 549 548 291 175 4 9 86
0 1 1 404 2908 276 111268 0 0 751 572 397 217 4 17 78
0 1 0 404 2304 272 111976 0 0 642 1059 323 167 3 13 83
0 1 1 404 2072 268 112360 0 0 576 569 422 188 4 16 80
0 1 1 404 2396 268 112128 0 0 641 561 417 178 3 18 79
2 0 1 404 2672 272 111768 0 0 577 561 294 153 5 13 82
0 1 1 404 2868 280 111264 0 0 655 550 244 163 5 6 89
0 1 1 404 2796 284 111256 0 0 544 1056 387 158 3 13 83
0 1 1 404 1600 292 112344 0 0 642 536 441 132 4 5 90
0 1 0 404 2112 292 111940 0 0 869 1073 330 225 5 15 79
0 1 1 404 2180 292 111668 0 0 704 574 439 230 5 15 80
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 404 2280 280 111692 0 0 784 1081 404 243 5 18 77
0 1 1 404 1300 276 112736 0 0 654 585 348 223 4 18 78
0 1 1 404 1644 284 112356 0 0 898 1101 351 269 6 21 73
0 1 1 404 2936 280 111104 0 0 833 561 366 179 6 21 73
0 1 1 404 2464 272 111988 0 12 700 1121 425 304 5 33 62
0 1 1 404 2548 272 111776 0 16 517 588 344 220 1 27 71
0 1 1 404 1688 272 112548 0 0 591 577 399 217 3 20 76
0 1 1 404 1796 280 112280 0 0 898 1122 441 342 6 30 64
0 1 1 404 2824 268 111368 0 0 608 581 385 221 5 25 69
0 1 1 404 2688 276 111432 0 0 865 1120 398 314 2 26 72
0 1 1 404 3052 284 110884 0 0 673 567 496 200 1 21 78
0 1 0 404 2832 276 111284 0 0 816 1089 438 271 4 22 73
0 1 1 404 2968 280 111244 0 0 770 569 439 218 5 18 77
0 1 1 404 2364 288 111760 0 0 769 1077 429 242 2 15 82
0 1 1 404 3040 288 111032 0 0 640 569 373 196 3 14 82
0 1 0 404 2528 292 111452 0 0 848 1081 352 251 4 26 70
0 1 1 404 1400 296 112420 0 0 608 553 344 161 6 7 87
0 1 1 404 1508 280 112620 0 0 834 586 415 262 5 17 78
0 1 1 404 1416 276 112648 0 0 578 1061 377 182 4 11 85
1 0 1 404 2964 276 111140 0 0 768 587 392 234 4 26 70
0 1 1 404 1412 280 112620 0 0 769 1080 342 230 7 19 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 392 3296 292 111980 56 0 899 547 357 232 1 12 87
0 0 0 392 3296 292 111980 0 0 0 0 102 8 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 103 12 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 10 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 6 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 102 8 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 6 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 102 14 0 0 100
0 0 0 392 3268 292 112008 0 0 14 0 103 12 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 103 12 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 10 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3224 308 112036 4 0 21 0 112 32 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 104 12 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 101 8 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 102 6 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 392 3224 308 112036 0 0 0 0 101 8 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 103 12 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 103 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 102 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 3225 243 84 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 165 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 102 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 2238 243 12 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 116 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 106 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 106 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 111 14 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 108 16 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 118 20 0 0 100
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:53 ` Simon Kirby
@ 2000-05-11 7:23 ` Linus Torvalds
2000-05-11 14:17 ` Simon Kirby
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
1 sibling, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 7:23 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-mm, linux-kernel
Hmm..
Having tested some more, the "wait for locked buffer" logic in
fs/buffer.c (sync_page_buffers()) seems to serialize things a whole lot
more than I initially thought..
Does it act the way you expect if you change the
if (buffer_locked(p))
__wait_on_buffer(p);
else if (buffer_dirty(p))
ll_rw_block(..
to a simpler
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(..
which doesn't end up serializing the IO all the time?
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 7:23 ` Linus Torvalds
@ 2000-05-11 14:17 ` Simon Kirby
2000-05-11 23:38 ` Simon Kirby
0 siblings, 1 reply; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 14:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2449 bytes --]
On Thu, May 11, 2000 at 12:23:19AM -0700, Linus Torvalds wrote:
> Hmm..
>
> Having tested some more, the "wait for locked buffer" logic in
> fs/buffer.c (sync_page_buffers()) seems to serialize things a whole lot
> more than I initially thought..
> than I initially thought..
>
> Does it act the way you expect if you change the
>
> if (buffer_locked(p))
> __wait_on_buffer(p);
> else if (buffer_dirty(p))
> ll_rw_block(..
>
> to a simpler
>
> if (buffer_dirty(p) && !buffer_locked(p))
> ll_rw_block(..
>
> which doesn't end up serializing the IO all the time?
A little bit better! 203 vmstat-line-seconds before, now 155
vmstat-line-seconds to complete. It seems to be doing a better job like
this, but still doesn't write out in blocks like it used to:
2.3.99pre7-9 vanilla:
0 1 1 32 2212 216 112424 0 4 609 527 239 112 5 8 87
0 1 1 32 3004 216 111472 0 4 608 540 327 120 5 7 87
0 1 1 32 3008 232 111252 0 4 768 1083 405 252 3 11 86
0 1 1 32 1964 240 112144 0 0 594 570 422 222 3 14 83
1 1 1 60 2932 248 111164 0 36 661 586 349 226 3 13 84
2.3.99pre7-9 with above adjustment:
0 1 1 64 3032 272 110768 0 0 833 628 414 338 5 28 67
0 1 1 64 2608 276 111040 0 0 769 1081 440 268 3 24 72
0 1 1 64 2684 284 110804 0 0 847 1089 370 280 6 12 82
0 1 0 64 2832 276 110796 0 0 706 558 384 193 7 11 82
0 1 1 136 1280 284 112440 0 72 911 1084 494 251 6 13 81
Also, it's still not as fast as classzone-27 writing out, and CPU use is
still a bit higher:
2.3.99pre7-8 classzone-27:
0 1 0 540 2852 272 110796 0 0 1089 1000 465 148 7 5 88
0 1 0 540 2980 264 110676 0 0 1423 1500 380 140 7 7 86
0 1 0 540 2584 272 111068 0 0 1059 1000 421 146 9 3 88
0 1 0 988 2556 276 111524 0 448 1473 1612 523 148 10 6 84
1 0 0 988 2704 264 111408 0 0 1135 1000 423 163 6 6 88
(All from random areas, sorry... it might be a good idea to read all of
the output in the attachment.)
I attached vmstat-3.txt, the full output with "2.3.99pre7-9 with above
adjustment".
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-3.txt --]
[-- Type: text/plain, Size: 18880 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 106000 1144 10132 0 0 46 2 68 38 3 2 96
0 0 0 0 106000 1144 10132 0 0 0 0 102 6 0 0 100
0 0 0 0 106000 1144 10132 0 0 0 0 102 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 1 6 107 10 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 102 12 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 102 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 122 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 121 6 0 0 100
1 0 0 0 81272 1204 32784 0 0 3002 5 309 245 16 11 72
0 1 0 0 56364 1236 56840 0 0 3027 500 369 327 16 12 72
0 1 0 0 44196 1248 68616 0 0 1473 1500 581 196 8 7 85
0 1 0 0 33288 1264 79224 0 0 1329 1000 584 159 7 7 86
0 1 0 0 22708 1272 89408 0 0 1281 1500 551 144 8 5 87
0 1 0 0 14784 872 97684 0 0 1424 1500 566 173 10 12 78
1 0 1 0 13980 244 99444 0 0 770 525 349 109 3 14 83
0 1 0 0 12380 204 102208 0 0 800 1021 342 105 4 19 76
1 0 0 0 11624 200 103216 0 0 801 533 292 111 6 19 74
0 1 1 0 11276 192 103856 0 0 705 1027 318 93 3 19 77
0 1 1 4 10164 200 104812 0 4 816 534 414 122 5 17 78
0 1 1 4 9040 200 106044 0 0 706 1024 390 98 2 19 78
0 1 1 40 8080 208 107056 0 36 796 539 431 112 5 21 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 40 6936 200 108144 0 0 709 1024 366 92 5 25 69
0 1 1 40 6328 204 108556 0 0 737 533 339 105 3 18 79
0 1 0 40 5184 204 109588 0 0 719 1027 327 115 4 19 77
1 0 1 76 4448 208 110376 0 36 642 533 393 96 5 14 80
0 1 1 76 3300 204 111444 0 0 832 1030 303 117 8 20 72
0 1 1 256 3056 208 111760 0 180 513 578 340 74 3 36 60
0 1 0 292 2816 200 112200 0 36 448 42 325 57 2 40 58
0 1 1 328 2984 200 112184 0 36 545 1039 271 84 3 30 67
1 1 1 400 3028 204 112084 0 72 463 58 330 62 2 36 62
0 1 1 400 3012 208 111988 0 0 544 1031 360 79 5 29 66
0 1 1 400 2952 216 111896 0 0 546 527 330 74 4 30 65
0 1 0 400 2988 220 111720 0 0 636 524 461 81 3 28 69
0 1 0 400 2960 224 111704 0 0 741 530 269 105 3 22 75
0 1 0 400 2064 216 112660 0 0 705 1024 352 89 5 23 72
0 1 1 400 2664 224 111872 0 0 783 530 279 116 5 21 73
0 1 0 400 2824 236 111612 0 0 674 1021 429 99 4 22 74
1 0 1 400 3080 232 111172 0 0 736 524 387 136 4 19 77
0 1 0 400 2560 220 112020 0 0 737 1032 304 97 4 17 79
0 1 0 400 2276 228 112212 0 0 769 545 414 81 7 15 77
0 1 0 436 2616 216 112164 0 36 751 1069 325 155 5 20 75
0 1 0 432 2812 224 111748 0 0 866 554 445 88 5 20 75
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 468 2692 232 111632 0 36 705 1057 377 83 4 15 80
1 0 0 464 2396 240 111796 0 0 833 1068 342 107 8 12 80
0 1 0 464 2308 252 111644 0 0 1040 1053 420 122 7 17 76
0 1 0 460 2364 256 111524 0 0 1026 1059 411 124 4 18 77
0 1 0 460 2572 260 111212 0 0 961 1068 448 108 7 19 74
0 1 0 460 2812 252 110972 0 0 993 1068 427 114 7 20 73
0 1 0 460 2820 248 111104 0 0 1007 1078 426 108 5 20 74
1 0 1 460 3068 256 110896 0 0 817 555 487 97 7 14 79
1 0 0 460 2512 248 111568 0 0 929 1063 538 105 7 16 77
0 1 0 460 2572 232 111720 0 0 929 1059 574 99 6 21 73
0 1 1 492 2972 240 111244 0 36 1040 1071 421 115 6 16 78
0 1 1 492 3012 248 111112 0 0 1058 1062 462 122 3 17 79
0 1 0 492 3048 248 111028 0 0 929 1058 415 112 7 13 79
1 0 0 492 2124 260 111788 0 0 1040 1060 480 114 7 16 77
0 1 1 528 2296 252 111772 0 36 924 1071 439 141 5 15 80
0 1 0 564 2828 232 111648 0 36 966 1081 467 139 7 19 74
0 1 0 564 3064 236 111492 0 0 897 569 347 98 8 13 78
1 0 1 564 3072 236 111508 0 0 705 1045 398 79 3 14 82
0 1 0 564 2332 240 112200 0 0 879 1057 371 121 5 14 81
1 0 1 564 3068 240 111320 0 0 898 555 369 94 5 17 77
0 1 0 600 2196 244 112376 0 36 705 1052 375 104 3 18 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 636 2900 232 111728 0 36 865 1071 329 104 6 21 73
0 1 1 636 3068 244 111496 0 0 653 540 331 85 3 19 78
1 0 1 636 3056 244 111316 0 0 736 557 389 101 7 19 74
0 1 0 636 2224 236 112216 0 0 770 1045 431 79 5 15 80
0 1 0 672 2884 256 111760 24 36 763 554 327 118 7 16 77
0 1 0 672 2824 252 111672 0 0 769 1048 416 114 4 18 78
0 1 0 672 2832 260 111568 0 0 943 1077 329 104 7 20 73
1 0 0 672 3000 268 111284 0 0 674 546 430 71 5 13 82
0 1 0 672 2844 264 111616 0 0 865 1048 447 85 8 14 78
0 1 0 708 2896 260 111372 0 36 928 1082 348 112 3 19 78
0 1 1 708 3068 252 111752 0 0 720 547 528 85 4 17 78
0 1 0 708 2536 252 112056 0 0 896 1043 376 97 7 12 81
0 1 0 744 2880 264 111772 0 36 834 571 427 88 4 21 74
1 0 0 740 2540 268 112096 0 0 769 1041 420 81 5 18 77
0 1 0 772 2548 256 112152 0 36 865 1059 335 94 4 21 75
0 1 1 772 2964 252 111764 0 0 816 559 381 101 4 15 81
0 1 0 772 2764 248 112056 16 0 757 1040 443 95 4 13 82
0 1 0 768 2292 252 112348 0 0 929 1059 378 90 4 19 77
0 1 0 768 3072 252 111336 0 0 994 1060 442 104 7 23 70
0 1 0 768 2440 252 111944 0 0 1039 1059 417 114 5 18 76
0 1 0 768 2556 240 112224 0 0 1026 1068 464 109 6 22 72
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 768 3068 252 111584 0 0 990 1058 392 124 4 18 78
0 1 0 768 2696 256 111708 0 0 993 1059 427 123 8 19 73
0 1 1 768 3068 252 111604 0 0 975 1062 429 105 6 18 75
0 1 0 768 2544 252 111916 0 0 994 1063 498 101 6 17 77
0 1 0 768 3064 240 111648 0 0 1025 1065 452 104 7 19 74
0 1 0 840 2368 252 112160 0 72 976 1068 387 110 5 16 79
0 1 0 840 2876 264 111424 0 0 994 1056 395 104 6 14 79
0 1 0 836 2864 264 111464 0 0 1025 1056 473 98 7 15 78
1 0 1 836 3080 240 111340 0 0 993 1053 512 108 6 16 77
0 1 0 872 2804 240 112080 0 36 975 569 376 102 4 17 78
1 0 1 872 3056 240 111868 0 0 738 1047 505 89 4 18 77
0 1 0 872 2364 240 112448 0 0 865 1057 413 105 7 15 78
0 1 1 872 3068 240 111676 0 0 928 570 350 87 5 18 76
0 1 0 872 2712 236 112148 0 0 610 1042 349 82 4 13 83
0 1 0 872 2600 240 112236 0 0 942 1077 349 122 6 15 79
0 1 0 872 2888 244 112016 0 0 706 543 330 84 4 13 83
0 1 0 872 2736 248 111988 0 0 865 1048 390 114 6 13 81
0 1 0 872 2280 244 112448 0 0 929 1046 348 96 6 17 77
1 0 0 868 2688 244 111968 0 0 847 557 422 95 7 17 76
0 1 1 868 2536 256 111856 0 0 770 1048 411 99 5 14 81
0 1 0 868 2308 240 112716 0 0 895 1067 446 99 6 20 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 904 2812 248 112092 0 36 993 1067 485 125 5 18 77
0 1 0 940 2892 248 111932 0 36 879 561 463 96 3 15 81
1 0 1 940 2464 260 111992 0 0 1058 1060 429 104 6 15 79
0 1 0 940 2552 248 112304 0 0 1057 1077 509 107 5 21 73
0 1 0 976 2816 248 111972 12 36 1011 1061 424 103 7 16 77
1 0 0 976 2140 256 112300 0 0 975 1050 411 116 8 13 79
0 1 1 976 3068 256 111724 0 0 834 1066 375 94 2 17 81
0 1 1 976 3068 236 112080 0 0 929 1048 402 96 4 23 73
0 1 1 976 2960 244 111940 0 0 961 1054 394 131 5 17 78
0 1 0 1012 2828 240 112108 0 36 975 1068 490 136 8 22 70
0 1 0 1048 2828 240 112092 0 36 994 1081 438 156 5 19 76
1 0 0 1048 2992 248 111792 0 0 993 1053 414 121 6 16 77
0 1 0 1048 2824 252 111708 0 0 944 1057 382 111 7 15 78
0 1 0 1048 2896 256 111824 0 0 860 545 367 100 6 16 78
0 1 1 1048 3068 236 112060 0 0 838 1050 474 124 5 15 79
1 0 0 1048 3088 248 111864 0 0 898 1050 415 119 6 19 75
0 1 0 1084 2204 256 112596 0 36 919 1061 392 115 5 14 81
1 0 0 1084 2292 264 112316 0 0 1083 1060 445 113 10 14 76
1 0 1 1120 2420 252 112472 0 36 1025 1078 511 101 5 18 77
0 1 0 1192 2684 248 112392 0 72 993 1085 479 102 5 18 77
0 1 0 1192 3060 248 112040 0 0 1007 1060 398 118 5 15 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1228 2548 276 112316 88 36 940 1048 388 124 6 14 80
0 1 0 1228 2380 284 112220 0 0 1057 1056 525 124 10 14 76
0 1 1 1228 3044 284 111460 0 0 993 1035 530 96 5 16 79
0 1 0 1228 2576 260 111900 0 0 1007 1051 394 112 4 17 79
0 1 0 1228 2348 256 113036 0 0 1026 1061 456 104 4 23 73
0 1 0 1300 2276 264 112916 0 72 993 1066 441 108 6 13 81
0 1 0 1296 2324 276 112612 0 0 919 1060 469 114 7 14 79
0 1 1 1296 2336 276 112356 0 0 761 542 429 187 4 13 83
0 1 0 1296 2292 280 112656 0 0 866 1055 328 99 3 18 79
0 1 0 1296 2820 268 112140 0 0 865 1080 462 101 6 18 76
1 0 1 1296 3068 272 111868 0 0 673 539 441 112 2 15 82
0 1 1 1284 2612 268 112516 52 0 924 1065 424 109 4 20 76
0 1 1 1284 2708 280 112244 4 0 754 542 334 86 3 19 78
0 1 0 1280 2168 276 112712 0 0 832 1050 403 101 6 14 80
0 1 0 1280 2388 248 112688 0 0 897 1062 368 94 6 18 76
0 1 0 1280 2264 244 113180 0 0 705 539 428 86 5 15 80
0 1 0 1280 3032 248 112184 0 0 879 1049 337 101 6 18 76
0 1 0 1280 2764 268 112288 0 0 870 540 306 94 4 15 80
0 1 1 1288 3072 268 112028 0 8 1022 1062 398 99 6 23 71
0 1 1 1324 3072 272 111852 0 36 801 1039 520 85 5 9 85
0 1 0 1324 2868 280 111876 0 0 1135 1068 460 132 7 16 77
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1360 2804 264 112356 0 36 962 1061 489 110 4 18 78
0 1 1 1360 3032 260 112172 0 0 1057 1070 565 109 6 20 74
0 1 0 1360 2356 268 112708 0 0 702 1036 408 85 5 12 83
0 1 0 1396 2556 276 112428 0 36 1010 1057 444 106 4 17 78
0 1 1 1396 2852 272 112056 0 0 994 1056 420 105 6 15 79
0 1 0 1396 2300 272 112580 0 0 1025 1063 447 129 7 15 78
0 1 0 1396 2808 260 112632 0 0 977 1058 427 118 5 22 73
0 1 0 1396 2740 264 112612 0 0 994 1042 486 103 5 18 77
0 1 0 1396 2800 268 112388 0 0 993 1049 410 108 5 15 80
0 1 0 1396 2352 272 112712 0 0 993 1059 487 133 7 14 79
0 1 0 1396 3068 276 111908 0 0 975 1062 409 123 7 15 78
0 1 0 1396 2544 284 112260 0 0 642 534 455 66 5 8 87
0 1 0 1396 2176 280 112916 0 0 898 1045 353 110 6 17 77
0 1 0 1396 2760 276 112160 0 0 993 1059 475 114 6 20 74
0 1 0 1396 2888 276 112032 0 0 975 1037 399 112 5 14 80
0 0 0 1264 2976 292 112956 100 0 415 2 239 93 0 3 97
0 0 0 1264 2976 292 112956 0 0 0 0 101 10 0 0 100
0 0 0 1264 2916 292 113012 0 0 29 0 105 10 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 101 8 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 102 6 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 101 14 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2860 296 113056 16 0 19 0 107 16 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 105 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 12 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 10 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 14 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 12 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 10 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2732 312 113160 80 0 40 0 120 36 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 1174 177 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 1 1264 2728 312 113164 0 0 0 2601 101 10 0 0 99
0 0 0 1264 2728 312 113164 0 0 0 1673 356 240 0 1 99
0 0 0 1264 2728 312 113164 0 0 0 0 132 10 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 4 104 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 12 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 12 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 10 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2724 312 113168 4 0 1 0 110 16 0 0 99
0 0 0 1264 2552 332 113308 0 0 78 4 123 47 0 1 99
0 0 0 1264 2552 332 113308 0 0 0 0 102 8 0 0 100
0 0 0 1264 2552 332 113308 0 0 0 0 108 10 0 0 100
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 14:17 ` Simon Kirby
@ 2000-05-11 23:38 ` Simon Kirby
2000-05-12 0:09 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 23:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1576 bytes --]
On Thu, May 11, 2000 at 07:17:04AM -0700, Simon Kirby wrote:
> On Thu, May 11, 2000 at 12:23:19AM -0700, Linus Torvalds wrote:
> ...
> > which doesn't end up serializing the IO all the time?
>
> A little bit better! 203 vmstat-line-seconds before, now 155
> vmstat-line-seconds to complete. It seems to be doing a better job like
> this, but still doesn't write out in blocks like it used to:
Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
(yes, I know this isn't a very scientific testing method :)). Second try
was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
very far off! Also, it swapped much less this time, and used less CPU.
vmstat output attached.
Hmm...I don't know if this means anything, but this kernel and pre7-9
with the buffer.c modification seem to look a bit different than with
classzone and with 2.2. As the free memory is used and turned into cache
as the uncompression first starts, it seemed to kind of sweep down not in
a line but in a curve as it approached the minimum free, and during the
beginning it was writing out in groups of 500 blocks but then went back
to the continuous writing. It seems odd that when it starts it has no
problem going through the first 50 MB in two or three seconds but then
takes a long time to go through the next. Maybe not, though. Just
noticing. :)
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-4.txt --]
[-- Type: text/plain, Size: 13280 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 102848 1252 10652 0 0 27 1 67 32 1 1 98
0 0 0 0 102928 1252 10652 0 0 0 0 114 27 0 0 100
0 0 0 0 102848 1252 10652 0 0 0 0 123 34 0 0 99
0 0 0 0 102844 1252 10652 0 0 0 0 132 27 0 0 100
0 0 0 0 102844 1252 10652 0 0 0 0 133 30 0 0 100
0 0 0 0 102844 1252 10652 0 0 0 0 127 26 0 0 100
1 0 0 0 96160 1268 15896 0 0 820 0 162 109 3 4 93
1 0 0 0 65276 1308 45840 0 0 3763 0 353 313 23 13 64
0 1 0 0 50064 1324 60432 0 0 1827 1003 352 250 11 6 83
0 1 0 0 37832 1340 72264 0 0 1488 1500 591 193 8 5 87
0 1 0 0 25928 1352 83784 0 0 1442 1500 608 185 10 5 85
1 0 0 0 15760 1200 93968 0 0 1346 1000 621 165 8 7 85
0 1 0 0 15044 252 96420 0 0 1231 1503 451 193 10 4 86
1 0 0 0 13796 228 98736 0 0 1284 1000 578 205 7 7 86
0 1 0 0 13676 228 98872 0 0 705 1802 778 157 3 4 93
0 1 0 0 12536 232 100012 0 0 1522 1027 531 237 8 6 86
0 1 0 0 12024 224 100500 0 0 642 2150 748 224 5 4 91
0 1 0 0 11856 224 100696 0 0 704 1473 745 107 4 3 93
0 1 0 0 10220 204 102488 0 0 1698 932 550 244 8 7 85
1 0 0 0 9096 220 103804 0 0 1523 870 430 239 7 5 88
0 1 0 0 8284 220 104548 0 0 1025 1348 596 161 5 6 89
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 7940 204 104852 0 0 1183 1377 619 185 8 7 85
0 1 0 0 7184 212 105604 0 0 1269 1157 496 212 9 5 86
1 0 0 0 6052 220 106732 0 0 1059 1124 586 169 6 4 90
1 0 0 0 4928 216 107932 0 0 1426 1361 607 224 8 7 85
0 1 0 0 4464 220 108328 0 0 1218 1302 617 181 5 3 92
0 1 0 0 4016 212 108820 0 0 865 1445 698 178 4 6 89
0 1 0 0 3076 204 109800 0 0 1153 1156 545 174 8 4 88
0 1 0 0 2572 220 110280 0 0 1651 897 531 240 9 6 85
0 1 0 0 3008 220 109688 0 0 1762 1030 475 222 13 7 80
0 1 0 0 2820 224 109848 0 0 1008 1092 554 132 5 6 89
0 1 0 0 2448 232 110140 0 0 1250 1153 466 180 8 5 87
0 1 0 0 2580 232 109884 0 0 1437 1235 595 183 8 5 87
0 1 0 0 2884 228 109560 0 0 1590 1010 500 202 11 7 82
0 1 0 0 2760 228 109672 0 0 1155 1540 609 174 5 6 88
1 0 0 0 3072 232 109360 0 0 1089 1540 517 183 8 5 87
0 1 0 0 2768 216 109692 0 0 415 1605 677 285 2 3 94
0 1 0 0 2832 224 109704 0 0 1396 1155 547 188 7 5 88
1 0 0 0 2344 228 110336 0 0 1693 961 600 238 10 7 83
0 1 0 0 2684 240 109812 0 0 2038 868 500 266 14 9 77
1 0 0 0 2436 228 110184 0 0 1057 1446 576 156 5 3 91
0 1 0 0 2980 216 109556 0 0 416 1746 711 136 3 3 93
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 2564 224 109976 0 0 946 1108 624 173 5 6 89
0 1 0 0 2432 228 110180 0 0 1762 963 529 243 10 10 80
0 1 0 0 2336 236 110248 0 0 2097 1028 586 282 13 5 82
0 1 1 0 2812 220 109572 0 0 1250 1831 488 154 7 4 88
0 1 0 0 2592 212 109828 0 0 412 1539 763 361 2 4 93
0 1 1 0 2868 232 109560 0 0 1689 389 492 221 11 3 86
0 1 0 0 2564 232 109860 0 0 1282 1199 576 151 8 5 87
1 0 0 0 2492 236 109928 0 0 1153 1281 565 163 6 5 89
0 1 0 0 2828 224 109564 0 0 1456 1704 472 362 7 9 84
0 1 0 0 2796 220 109600 0 0 512 1156 754 280 5 1 94
1 0 0 0 2680 228 109784 0 0 994 1251 575 165 5 7 87
0 1 0 0 2580 236 109836 0 0 1634 1122 505 221 11 6 83
1 0 0 0 2256 232 110236 0 0 2097 739 434 229 18 6 76
0 1 0 0 2840 212 109612 0 0 1213 1319 540 145 9 5 86
0 1 0 0 2564 224 109880 0 0 1046 1378 592 156 4 4 92
0 1 1 0 2816 236 109568 0 0 964 1281 580 149 5 5 89
0 1 0 0 2564 236 109800 0 0 1153 1205 616 176 6 7 87
0 1 0 0 2500 240 109804 0 0 1712 996 456 200 13 8 79
0 1 0 0 2556 228 109796 0 0 1378 1267 515 150 10 6 83
0 1 0 0 2564 224 109840 0 0 961 1831 522 294 3 7 90
0 1 0 0 2236 224 110196 0 0 417 1249 741 245 3 1 96
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 2692 224 109828 0 0 1213 1252 602 188 8 6 86
0 1 0 0 2580 228 109996 0 0 1751 993 511 199 10 8 82
0 1 0 0 2576 232 110032 0 0 1392 1094 570 181 9 5 86
1 0 0 0 2604 232 110016 0 0 1731 1121 539 221 10 9 81
0 1 1 0 2796 236 109584 0 0 1411 1124 576 175 11 4 85
1 0 0 0 2924 236 109368 0 0 1169 1476 409 146 9 4 87
0 1 0 0 2564 236 109728 0 0 1185 1092 570 162 8 6 86
0 1 0 0 2568 220 109752 0 0 1281 1569 647 185 8 8 84
0 1 0 0 2820 228 109496 0 0 721 1381 558 112 5 3 91
0 1 0 0 2328 228 110020 0 0 706 1699 617 150 2 3 95
1 0 0 0 2884 208 109680 0 0 428 1188 691 218 3 5 92
0 1 1 0 3072 192 109720 0 0 738 2171 582 133 5 6 89
0 1 0 0 2948 188 110484 0 0 769 941 709 176 4 5 90
1 0 0 0 2264 220 110992 0 0 2613 261 382 293 16 11 72
0 1 0 36 2908 224 110112 0 36 1585 1023 441 216 9 9 82
0 1 0 36 2832 224 110092 0 0 1090 1716 552 192 6 5 89
1 0 0 36 2172 220 109876 0 0 608 1025 699 187 2 4 94
1 0 1 36 2896 220 110052 0 0 2291 708 492 285 13 8 79
1 0 0 36 3068 228 109684 0 0 1506 1414 470 237 9 8 82
0 2 0 36 2764 220 110032 0 0 446 1797 725 303 4 3 93
1 0 0 36 2260 216 110620 0 0 929 769 615 128 8 4 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 36 2316 232 110464 0 0 1715 1089 545 207 7 7 86
0 1 0 36 2752 224 110036 0 0 1346 1061 523 234 6 6 88
0 1 0 36 2584 224 110232 0 0 1296 1349 591 226 9 5 86
0 1 1 36 3080 216 109756 0 0 689 1988 612 212 3 6 91
0 1 0 36 2684 212 110204 0 0 1411 513 592 228 8 5 87
0 1 0 36 3064 224 109792 0 0 1586 1252 510 244 11 8 81
0 1 0 36 2900 232 109800 0 0 1858 978 533 213 11 9 80
0 1 0 36 2804 228 109800 0 0 1085 1256 495 202 6 7 87
0 1 0 36 2564 228 110040 0 0 1140 1186 534 136 7 5 87
0 1 0 36 2820 240 109632 0 0 1539 1252 583 184 9 5 86
0 1 0 36 2692 236 109772 0 0 1121 1026 567 165 11 3 86
0 1 0 36 2584 224 109912 0 0 911 1381 463 255 6 5 89
1 0 0 36 2424 220 110240 0 0 544 1026 723 130 3 3 94
0 1 0 36 2560 216 110036 0 0 1507 1445 605 170 7 10 83
0 1 0 36 2196 232 109548 0 0 1426 1378 572 191 10 5 85
0 1 0 36 2744 236 109828 0 0 2147 1110 537 252 13 12 75
0 1 0 72 2836 220 109764 0 36 739 1181 563 116 4 3 92
0 1 0 72 2228 232 110364 0 0 1071 1025 592 164 9 4 87
0 1 0 72 2632 248 109956 0 0 1221 1186 526 178 9 4 87
0 1 0 72 2824 236 109784 0 0 1121 1155 579 156 7 5 88
0 1 0 72 2572 236 110012 0 0 1264 1122 532 181 8 6 86
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 72 2984 224 109588 0 0 1442 1116 403 154 9 4 86
0 1 0 108 3020 216 109620 0 36 1123 1498 535 134 6 7 87
0 1 0 108 2580 216 110072 0 0 545 803 454 90 3 2 94
0 1 0 108 2568 228 110088 0 0 1101 1314 505 162 6 6 88
0 1 0 108 2732 232 109920 0 0 1223 1152 557 185 7 3 89
0 1 0 108 2820 228 109868 0 0 1153 1221 592 168 7 4 89
0 1 0 108 2564 240 110128 0 0 1426 901 503 254 8 5 86
0 1 0 108 2760 236 109948 0 0 1121 1090 536 141 6 6 88
0 1 0 108 2628 232 110020 0 0 1520 1347 579 200 13 5 82
0 1 0 108 2772 236 109868 0 0 1122 1187 557 157 8 5 87
0 1 0 108 2568 236 110072 0 0 1058 1219 573 178 5 6 89
1 0 0 108 2812 228 109900 0 0 1121 1412 656 168 7 3 90
0 0 0 108 3804 248 110140 0 0 1094 609 397 166 4 4 91
0 0 0 108 3804 248 110140 0 0 0 0 105 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 102 26 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 99
0 0 0 108 3804 248 110140 0 0 0 0 101 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 25 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 102 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 26 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 108 3804 248 110140 0 0 0 0 102 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 31 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 28 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 27 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 100
0 0 0 108 3764 264 110164 0 0 20 0 112 43 0 0 99
0 0 0 108 3764 264 110164 0 0 0 0 101 24 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 30 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 102 24 0 0 100
0 0 0 108 3764 264 110164 0 0 0 2 105 32 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 23 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 106 32 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 27 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 29 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 102 23 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 31 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 26 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 108 3760 264 110168 0 0 0 4834 313 332 0 1 99
0 0 0 108 3760 264 110168 0 0 0 0 211 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 27 0 0 100
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 23:38 ` Simon Kirby
@ 2000-05-12 0:09 ` Linus Torvalds
2000-05-12 2:51 ` [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed) Roger Larsson
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 0:09 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-mm, linux-kernel
On Thu, 11 May 2000, Simon Kirby wrote:
>
> Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
> (yes, I know this isn't a very scientific testing method :)). Second try
> was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
> very far off! Also, it swapped much less this time, and used less CPU.
> vmstat output attached.
The final pre7 did something that I'm not entirely excited about, but that
kind of makes sense at least from a CPU standpoint (as the SGI people have
repeated multiple times). What the real pre7 does is to just move any page
that has problems getting free'd to the head of the LRU list, so that we
won't try it immediately the next time. This way we don't test the same
pages over and over again when they are either shared, in the wrong zone,
or have dirty/locked buffers.
It means that the "LRU" is less LRU, but you could see it as a "how hard
do we want to free this" pressure-based system rather than really a least
recently _used_ system. And it avoids the "repeat the whole thing on the
same page" issue. And it looks like it behaves reasonably well, while
saving a lot of CPU.
Knock wood.
I'm still considering the pre7 as more a "ok, I tried to get rid of the
cruft" thing. Most of the special case code that has accumulated lately is
gone. We can start adding stuff back now, I'm happy that the basics are
reasonably clean.
I think Ingo already posted a very valid concern about high-memory
machines, and there are other issues we should look at. I just want to be
in a position where we can look at the code and say "we do X because Y",
rather than a collection of random tweaks that just happens to work.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 67+ messages in thread
* [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed)
2000-05-12 0:09 ` Linus Torvalds
@ 2000-05-12 2:51 ` Roger Larsson
0 siblings, 0 replies; 67+ messages in thread
From: Roger Larsson @ 2000-05-12 2:51 UTC (permalink / raw)
To: Linus Torvalds, Rik van Riel; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]
Hi,
I tried to find a way to walk the lru list without list_del.
Here is my patch:
- neither compiled nor run (low on HD...)
Could something like this be used?
If not, why not?
/RogerL
Linus Torvalds wrote:
>
> On Thu, 11 May 2000, Simon Kirby wrote:
> >
> > Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
> > (yes, I know this isn't a very scientific testing method :)). Second try
> > was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
> > very far off! Also, it swapped much less this time, and used less CPU.
> > vmstat output attached.
>
> The final pre7 did something that I'm not entirely excited about, but that
> kind of makes sense at least from a CPU standpoint (as the SGI people have
> repeated multiple times). What the real pre7 does is to just move any page
> that has problems getting free'd to the head of the LRU list, so that we
> won't try it immediately the next time. This way we don't test the same
> pages over and over again when they are either shared, in the wrong zone,
> or have dirty/locked buffers.
>
> It means that the "LRU" is less LRU, but you could see it as a "how hard
> do we want to free this" pressure-based system rather than really a least
> recently _used_ system. And it avoids the "repeat the whole thing on the
> same page" issue. And it looks like it behaves reasonably well, while
> saving a lot of CPU.
>
> Knock wood.
>
> I'm still considering the pre7 as more a "ok, I tried to get rid of the
> cruft" thing. Most of the special case code that has accumulated lately is
> gone. We can start adding stuff back now, I'm happy that the basics are
> reasonably clean.
>
> I think Ingo already posted a very valid concern about high-memory
> machines, and there are other issues we should look at. I just want to be
> in a position where we can look at the code and say "we do X because Y",
> rather than a collection of random tweaks that just happens to work.
>
> Linus
>
--
Home page:
http://www.norran.net/nra02596/
[-- Attachment #2: patch-2.3.99-pre7-9-shrink_mmap.1 --]
[-- Type: text/plain, Size: 3624 bytes --]
diff -Naur linux-2.3-pre9--/mm/filemap.c linux-2.3/mm/filemap.c
--- linux-2.3-pre9--/mm/filemap.c Fri May 12 02:42:19 2000
+++ linux-2.3/mm/filemap.c Fri May 12 04:28:30 2000
@@ -236,7 +236,6 @@
int shrink_mmap(int priority, int gfp_mask)
{
int ret = 0, count;
- LIST_HEAD(old);
struct list_head * page_lru, * dispose;
struct page * page = NULL;
@@ -244,26 +243,29 @@
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
- while (count > 0 && (page_lru = lru_cache.prev) != &lru_cache) {
+ page_lru = &lru_cache;
+ while (count > 0 && (page_lru = page_lru->prev) != &lru_cache) {
page = list_entry(page_lru, struct page, lru);
- list_del(page_lru);
dispose = &lru_cache;
if (PageTestandClearReferenced(page))
goto dispose_continue;
count--;
- dispose = &old;
+
+ dispose = NULL;
/*
* Avoid unscalable SMP locking for pages we can
* immediate tell are untouchable..
*/
if (!page->buffers && page_count(page) > 1)
- goto dispose_continue;
+ continue;
+ /* Lock this lru page, reentrant
+ * will be disposed correctly when unlocked */
if (TryLockPage(page))
- goto dispose_continue;
+ continue;
/* Release the pagemap_lru lock even if the page is not yet
queued in any lru queue since we have just locked down
@@ -281,7 +283,7 @@
*/
if (page->buffers) {
if (!try_to_free_buffers(page))
- goto unlock_continue;
+ goto page_unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
atomic_dec(&buffermem_pages);
@@ -336,27 +338,43 @@
cache_unlock_continue:
spin_unlock(&pagecache_lock);
-unlock_continue:
+page_unlock_continue:
spin_lock(&pagemap_lru_lock);
UnlockPage(page);
put_page(page);
+ continue;
+
dispose_continue:
- list_add(page_lru, dispose);
- }
- goto out;
+ /* have the pagemap_lru_lock, lru cannot change */
+ {
+ struct list_head * page_lru_to_move = page_lru;
+ page_lru = page_lru->next; /* continues with page_lru.prev */
+ list_del(page_lru_to_move);
+ list_add(page_lru_to_move, dispose);
+ }
+ continue;
made_inode_progress:
- page_cache_release(page);
+ page_cache_release(page);
made_buffer_progress:
- UnlockPage(page);
- put_page(page);
- ret = 1;
- spin_lock(&pagemap_lru_lock);
- /* nr_lru_pages needs the spinlock */
- nr_lru_pages--;
+ /* like to have the lru lock before UnlockPage */
+ spin_lock(&pagemap_lru_lock);
-out:
- list_splice(&old, lru_cache.prev);
+ UnlockPage(page);
+ put_page(page);
+ ret++;
+
+ /* lru manipulation needs the spin lock */
+ {
+ struct list_head * page_lru_to_free = page_lru;
+ page_lru = page_lru->next; /* continues with page_lru.prev */
+ list_del(page_lru_to_free);
+ }
+
+ /* nr_lru_pages needs the spinlock */
+ nr_lru_pages--;
+
+ }
spin_unlock(&pagemap_lru_lock);
diff -Naur linux-2.3-pre9--/mm/vmscan.c linux-2.3/mm/vmscan.c
--- linux-2.3-pre9--/mm/vmscan.c Fri May 12 02:42:19 2000
+++ linux-2.3/mm/vmscan.c Fri May 12 04:32:16 2000
@@ -443,10 +443,9 @@
priority = 6;
do {
- while (shrink_mmap(priority, gfp_mask)) {
- if (!--count)
- goto done;
- }
+ count -= shrink_mmap(priority, gfp_mask);
+ if (count <= 0)
+ goto done;
/* Try to get rid of some shared memory pages.. */
if (gfp_mask & __GFP_IO) {
@@ -481,10 +480,9 @@
} while (--priority >= 0);
/* Always end on a shrink_mmap.. */
- while (shrink_mmap(0, gfp_mask)) {
- if (!--count)
- goto done;
- }
+ count -= shrink_mmap(priority, gfp_mask);
+ if (count <= 0)
+ goto done;
return 0;
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
@ 2000-05-11 11:15 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-11 11:15 UTC (permalink / raw)
To: Simon Kirby; +Cc: Linus Torvalds, linux-mm, linux-kernel
On Wed, 10 May 2000, Simon Kirby wrote:
> Is Andrea taking a too dangerous approach for the current kernel
> version, or are you trying to get something extremely simple
> working instead?
You may want to read his patch before saying it does any good.
There probably are some good bits in the classzone patch, but
it also backs out bugfixes for bugs which have been proven to
exist and which those fixes demonstrably cured. ;(
It would be nice if Andrea could separate the good bits from
the bad bits and make a somewhat cleaner patch...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
@ 2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
` (2 more replies)
1 sibling, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 5:10 UTC (permalink / raw)
To: Juan J. Quintela; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
On 11 May 2000, Juan J. Quintela wrote:
>
> I have done my normal mmap002 test and this goes slower than
> ever, it takes something like 3m50 seconds to complete, (pre7-8 2m50,
> andrea classzone 2m8, and 2.2.15 1m55 for reference).
Note that the mmap002 test is a very bad performance test.
Why?
Because it's a classic "walk a large array in order" test, which means
that the worst possible order to page things out in is LRU.
So to really speed up mmap002, the best approach is to try to be as non-LRU
as possible, which is obviously the wrong thing to do in real life. So in
that sense optimizing mmap002 is a _bad_ thing.
What I found interesting was how the non-waiting version seemed to have
the actual _disk_ throughput a lot higher. That's much harder to measure,
and I don't have good numbers for it, the best I can say is that it causes
my ncr SCSI controller to complain about too deep queueing depths, which
is a sure sign that we're driving the IO layer hard. Which is a good
thing when you measure how efficiently you page things in and out..
But don't look at wall-clock times for mmap002.
Linus
^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 5:10 ` Linus Torvalds
@ 2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-11 10:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, linux-mm, linux-kernel
Tried the cp of the compiled kernel tree on 7-9. *Much* better than any of
the 99s I've tried. On the 4k ext2 ide drive:
# time cp -av linux-2.3.99-pre7-9 L
[...]
0.81user 8.95system 3:37.76elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (158major+199minor)pagefaults 0swaps
# time du -s L
137404 L
0.05user 0.42system 0:05.41elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+26minor)pagefaults 0swaps
kswapd did hit a peak of 50% cpu, but only *very* briefly; it hovered
in the 5% to 10% range for most of the 218 seconds.
On the 1k ext2 scsi drive, kswapd never exceeded 25% cpu, though the
cp took about twice as long for 2/3 the data (and no -v switch):
# time cp -a linux-2.3.99-pre7-8/ L
0.26user 6.80system 5:57.71elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (141major+180minor)pagefaults 0swaps
# time du -s L
88545 L
0.02user 0.59system 0:03.82elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+23minor)pagefaults 0swaps
Mem usage seems to be about 2:1 in favour of cache+buffer.
Another useful test I've found is to run realplay on large streams.
mediatrip.com has some useful ones, OTOO 22 minutes at 700 kbps.
Watching the four or five such streams which make up a given film in
the same realplay session will result in a segfault in any of the
previous 99s. At least if you watch the 700 kbps streams at double
resolution. That combo seems to have enough memory pressure.
I'd suggest someone w/ more bandwidth than my workstation try it, though.
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Check out TGC: <URL:http://jhcloos.com/go?tgc>
^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
@ 2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Juan J. Quintela @ 2000-05-11 17:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Hi
linus> On 11 May 2000, Juan J. Quintela wrote:
>>
>> I have done my normal mmap002 test and this goes slower than
>> ever, it takes something like 3m50 seconds to complete, (pre7-8 2m50,
>> andrea classzone 2m8, and 2.2.15 1m55 for reference).
linus> Note that the mmap002 test is a very bad performance test.
Yes, I know; I included it in the memtest suite not as a benchmark.
I put the time results only for comparison. The important thing is
that if we are running a memory hog like mmap002, we have very bad
interactive performance. We swap out the wrong applications (i.e. not
mmap002 data).
More to the point is the test mmap001: a test that *only*
mmaps a file the size of physical memory and writes to it (only one
pass), then closes the file. With that test on pre7-9 I got a load of 14
and dropouts in sound (MP3 playing) of more than one second. And the
interactive performance is *ugly*. The system is unresponsive while I
run it; I am *unable* to change desktops with the keyboard. You
don't want to know about the jumps of the mouse. I think that we need
to solve these problems. I don't mind that application going
slower, but it shouldn't get so much CPU/memory. My system here is an
Athlon 500MHz with 256MB of RAM. This system is unable to write an
mmapped file of 256MB char by char. That sounds bad from my point of
view.
The tests in memtest try to find problems like that. I am sorry if
it appeared that I was talking about raw clock time (re-reading my post
I see that I made that point very *unclear*; sorry for the confusion).
linus> Why?
linus> Because it's a classic "walk a large array in order" test, which means
linus> that the worst possible order to page things out in is LRU.
Yes, I know that we don't want to optimise for such things, but it is
also not good that one of them can bring our server to its knees.
linus> So to really speed up mmap002, the best approach is to try to be as non-LRU
linus> as possible, which is obviously the wrong thing to do in real life. So in
linus> that sense optimizing mmap002 is a _bad_ thing.
I don't want to optimize for mmap002, but mmap002 doesn't touch its
pages for a long time, so its pages should be swapped out and, when
touched again, swapped in. This is not what appears to happen here.
linus> What I found interesting was how the non-waiting version seemed to have
linus> the actual _disk_ throughput a lot higher. That's much harder to measure,
linus> and I don't have good numbers for it, the best I can say is that it causes
linus> my ncr SCSI controller to complain about too deep queueing depths, which
linus> is a sure sign that we're driving the IO layer hard. Which is a good
linus> thing when you measure how efficiently you page things in and out..
I think that the problem is that we are not aggressive enough in
swapping out pages that can be swapped, and then at some moment we are
unable to find *any* memory.
linus> But don't look at wall-clock times for mmap002.
Yes, I know, sorry again for the confusion. And thanks for all your
comments, I appreciate them very much.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* [patch] balanced highmem subsystem under pre7-9
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
@ 2000-05-11 23:25 ` Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
` (2 more replies)
2 siblings, 3 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-11 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: MM mailing list, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2431 bytes --]
IMO high memory should not be balanced. Stock pre7-9 tried to balance high
memory once it got below the threshold (causing very bad VM behavior and
high kswapd usage) - this is incorrect because there is nothing special
about the highmem zone; it's more like an 'extension' of the normal zone,
from which specific caches can draw. (patch attached)
another problem is that even during a mild test the DMA zone gets emptied
easily - but on a big-RAM box kswapd has to work _a lot_ to fill it up. In
fact on an 8GB box it's completely futile to fill up the DMA zone. What
worked for me is this zone-chainlist trick in the zone setup code:
		case ZONE_NORMAL:
			zone = pgdat->node_zones + ZONE_NORMAL;
			if (zone->size)
				zonelist->zones[j++] = zone;
++			break;
		case ZONE_DMA:
			zone = pgdat->node_zones + ZONE_DMA;
			if (zone->size)
				zonelist->zones[j++] = zone;
no 'normal' allocation chain leads to the ZONE_DMA zone, except GFP_DMA
and GFP_ATOMIC - both of them rightfully access the DMA zone.
this is a real-life problem: without the above, an 8GB box under load
crashes pretty quickly due to failed SCSI-layer DMA allocations. (i think
those allocations are silly in the first place.)
the above is suboptimal on boxes whose total RAM is within one order of
magnitude of 16MB (the DMA zone stays empty most of the time and is
inaccessible to various caches) - so maybe the following (not yet
implemented) solution would be generic and acceptable:
allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
method: e.g. it wastes 2MB RAM on an 8MB box. We could probably live with
64kb increments (there are 64kb ISA DMA constraints the sound drivers and
some SCSI drivers are hitting) - is this really true? If nobody objects
i'll implement this later on (together with the asymmetric allocation
chain trick) - there will be a 64kb DMA pool allocated on the smallest
boxes, which should be acceptable even on a 4MB box. We could turn off the
DMA zone altogether on most boxes, if it wasn't for the SCSI layer
allocating DMA pages even for PCI drivers ...
Comments?
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 642 bytes --]
--- linux/mm/page_alloc.c.orig Thu May 11 02:10:34 2000
+++ linux/mm/page_alloc.c Thu May 11 16:03:48 2000
@@ -553,9 +566,14 @@
mask = zone_balance_min[j];
else if (mask > zone_balance_max[j])
mask = zone_balance_max[j];
- zone->pages_min = mask;
- zone->pages_low = mask*2;
- zone->pages_high = mask*3;
+ if (j == ZONE_HIGHMEM) {
+ zone->pages_low = zone->pages_high =
+ zone->pages_min = 0;
+ } else {
+ zone->pages_min = mask;
+ zone->pages_low = mask*2;
+ zone->pages_high = mask*3;
+ }
zone->low_on_memory = 0;
zone->zone_wake_kswapd = 0;
zone->zone_mem_map = mem_map + offset;
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
@ 2000-05-11 23:46 ` Linus Torvalds
2000-05-12 0:08 ` Ingo Molnar
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 10:57 ` Andrea Arcangeli
2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 23:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>
> IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> memory once it got below the treshold (causing very bad VM behavior and
> high kswapd usage) - this is incorrect because there is nothing special
> about the highmem zone, it's more like an 'extension' of the normal zone,
> from which specific caches can turn. (patch attached)
Hmm.. I think the patch is wrong. It's much easier to make
zone_balance_max[HIGHMEM] = 0;
and that will do the same thing, no?
> another problem is that even during a mild test the DMA zone gets emptied
> easily - but on a big RAM box kswapd has to work _alot_ to fill it up. In
> fact on an 8GB box it's completely futile to fill up the DMA zone. What
> worked for me is this zone-chainlist trick in the zone setup code:
Ok. This is a real problem. My inclination would be to say that your patch
is right, but only for large-memory configurations. I.e. just say that if
the dang machine has more than half a gig of memory, we shouldn't touch
the 16 low megs at all unless explicitly asked for.
But the static thing ("never touch ZONE_DMA" when doing a normal
allocation) is obviously bogus on smaller-memory machines. So make it
conditional.
> allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
> on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
> method: eg. it wastes 2MB RAM on a 8MB box.
This may be part of the solution - make it more gradual than a complete
cut-off at some random point (eg half a gig).
After all, this is why we zoned memory in the first place, so I think it
makes sense to be much more dynamic with the zones.
Linus
2000-05-11 23:46 ` Linus Torvalds
@ 2000-05-12 0:08 ` Ingo Molnar
2000-05-12 0:15 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 0:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: MM mailing list, linux-kernel, Alan Cox
On Thu, 11 May 2000, Linus Torvalds wrote:
> > IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> > memory once it got below the treshold (causing very bad VM behavior and
> > high kswapd usage) - this is incorrect because there is nothing special
> > about the highmem zone, it's more like an 'extension' of the normal zone,
> > from which specific caches can turn. (patch attached)
>
> Hmm.. I think the patch is wrong. It's much easier to make
yep, it does work (and fixes the 'kswapd storm'), but it's wrong.
> zone_balance_max[HIGHMEM] = 0;
>
> and that will do the same thing, no?
yep - or in fact just changing the constant initialization to ', 0 } ',
right?
> > another problem is that even during a mild test the DMA zone gets emptied
> > easily - but on a big RAM box kswapd has to work _alot_ to fill it up. In
> > fact on an 8GB box it's completely futile to fill up the DMA zone. What
> > worked for me is this zone-chainlist trick in the zone setup code:
>
> Ok. This is a real problem. My inclination would be to say that your patch
> is right, but only for large-memory configurations. Ie just say that if
> the dang machine has more than half a gig of memory, we shouldn't touch
> the 16 low megs at all unless explicitly asked for.
i think there are two fundamental problems here:
1) highmem should not be balanced (period)
2) once all easily allocatable RAM is gone to some high-flux
allocator, the DMA zone is emptied at last and is never
refilled effectively, causing a pointless 'kswapd storm' again.
1) is more or less trivially solved by fixing zone_balance_max[]
initialization. 2):
> > allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
> > on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
> > method: eg. it wastes 2MB RAM on a 8MB box.
>
> This may be part of the solution - make it more gradual than a complete
> cut-off at some random point (eg half a gig).
>
> After all, this is why we zoned memory in the first place, so I think it
> makes sense to be much more dynamic with the zones.
ok, so the rule would be to put:
zone_dma_size := min(total_pages/32, 16MB) &~(64k-1) + 64k
pages into the DMA zone, and run the normal zone from this point up to
highmem. This gradually (linearly) increases the DMA zone's size from 64k
on 1MB boxes to 16MB on 512MB boxes and up (in steps of 64k). This not
only serves as a DMA pool, but as an atomic allocation pool as well (which
has been an ever-burning problem on low-memory NFS boxes).
i hope nothing relies on getting better than 64k physically aligned pages?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
@ 2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
2000-05-12 16:12 ` Linus Torvalds
2000-05-12 10:57 ` Andrea Arcangeli
2 siblings, 2 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-12 9:02 UTC (permalink / raw)
To: mingo; +Cc: Linus Torvalds, MM mailing list, linux-kernel
Hi Ingo,
Your patch breaks my tests again (which have been running fine for some
time now on pre7):
11 1 0 0 1631764 1796 12840 0 0 0 2 115 57045 4 95 1
10 3 0 0 1420616 1796 12840 0 0 0 0 120 55463 5 95 1
9 3 0 0 998032 1796 12840 0 0 0 2 111 49490 4 96 1
VM: killing process bash
VM: killing process ipctst
VM: killing process ipctst
Greetings
Christoph
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:02 ` Christoph Rohland
@ 2000-05-12 9:56 ` Ingo Molnar
2000-05-12 11:49 ` Christoph Rohland
2000-05-12 16:12 ` Linus Torvalds
1 sibling, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 9:56 UTC (permalink / raw)
To: Christoph Rohland; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On 12 May 2000, Christoph Rohland wrote:
> Hi Ingo,
>
> Your patch breaks my tests again (Which run fine for some time now on
> pre7):
>
> 11 1 0 0 1631764 1796 12840 0 0 0 2 115 57045 4 95 1
> 10 3 0 0 1420616 1796 12840 0 0 0 0 120 55463 5 95 1
> 9 3 0 0 998032 1796 12840 0 0 0 2 111 49490 4 96 1
> VM: killing process bash
> VM: killing process ipctst
> VM: killing process ipctst
hm, IMHO it really does nothing that should make memory balance worse.
Does the stock kernel work even after a long test?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:56 ` Ingo Molnar
@ 2000-05-12 11:49 ` Christoph Rohland
0 siblings, 0 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-12 11:49 UTC (permalink / raw)
To: mingo; +Cc: Linus Torvalds, MM mailing list, linux-kernel
Ingo Molnar <mingo@elte.hu> writes:
> > VM: killing process ipctst
>
> hm, IMHO it really does nothing that should make memory balance worse.
> Does the stock kernel work even after a long test?
No, I just ran a longer test. It does begin to swap out, but later I
also get the following messages. (Your version, though, does not swap out
at all and just kills processes):
7 9 1 558816 3844 100 13096 266 9400 102 2361 10000 1611 0 99 1
VM: killing process ipctst
3 11 1 589464 5724 120 13044 321 6340 88 1587 4414 1404 0 99 1
Woops: just this moment I also got:
exec.c:265: bad pte f1d4dff8(0000000000104025).
Greetings
Christoph
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
@ 2000-05-12 16:12 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 16:12 UTC (permalink / raw)
To: Christoph Rohland; +Cc: mingo, MM mailing list, linux-kernel
On 12 May 2000, Christoph Rohland wrote:
>
> Your patch breaks my tests again (Which run fine for some time now on
> pre7):
Not surprising, actually.
Never balancing highmem pages will also mean that they never get swapped
out. Which makes sense - why should we try to page anything out if we're
not interested in having any free pages for that zone?
So at some point the VM subsystem will just give up: 90% of the pages it
sees are unswappable, and it still cannot make room to free pages..
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
2000-05-12 9:02 ` Christoph Rohland
@ 2000-05-12 10:57 ` Andrea Arcangeli
2000-05-12 12:11 ` Ingo Molnar
2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 10:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>IMO high memory should not be balanced. Stock pre7-9 tried to balance high
>memory once it got below the treshold (causing very bad VM behavior and
>high kswapd usage) - this is incorrect because there is nothing special
>about the highmem zone, it's more like an 'extension' of the normal zone,
>from which specific caches can turn. (patch attached)
IMHO that is a hack to work around the currently broken design of the MM.
And it will also produce bad effects, since you won't age and recycle the
cache in the highmem zone correctly.
Without the classzone design you will always have kswapd and the page
allocator shrinking memory even when not necessary. Please check as
reference the very detailed explanation I posted around two weeks ago on
linux-mm in reply to Linus.
What you're trying to work around on the highmem part is exactly the same
problem you also have between the normal zone and the DMA zone. Why don't
you also just keep 3MB always free in the DMA zone and never shrink the
normal zone?
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 10:57 ` Andrea Arcangeli
@ 2000-05-12 12:11 ` Ingo Molnar
2000-05-12 12:57 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 12:11 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Andrea Arcangeli wrote:
> >IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> >memory once it got below the treshold (causing very bad VM behavior and
> >high kswapd usage) - this is incorrect because there is nothing special
> >about the highmem zone, it's more like an 'extension' of the normal zone,
> >from which specific caches can turn. (patch attached)
>
> IMHO that is an hack to workaround the currently broken design of the MM.
> And it will also produce bad effect since you won't age the recycle the
> cache in the highmem zone correctly.
what bad effects? the LRU list of the pagecache is a completely
independent mechanism. Highmem pages are LRU-freed just as effectively as
normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
global, so the particular zone of highmem pages is completely transparent
and irrelevant to the LRU mechanism. I cannot see any bad effects wrt. LRU
recycling and the highmem zone here. (let me know if you meant some
different recycling mechanism)
> What you're trying to workaround on the highmem part is exactly the
> same problem you also have between the normal zone and the dma zone.
> Why don't you also just take 3mbyte always free from the dma zone and
> you never shrink the normal zone?
i'm not working around anything. Highmem _should not be balanced_, period.
It's a superset of normal memory, and by just balancing normal memory (and
adding the highmem free count to the total) we are completely fine.
Highmem is also a temporary phenomenon; it will probably disappear in a
few years once 64-bit systems and proper 64-bit DMA become commonplace.
(and small devices will do 32-bit + 32-bit DMA.)
'balanced' means: 'keep X amount of highmem free'. What is your point in
keeping free highmem around?
the DMA zone resizing suggestion from yesterday is, i believe,
conceptually correct as well: we _want to_ isolate normal allocators from
these 'emergency pools'. IRQ handlers cannot wait for more free RAM.
about classzone. This was the initial idea of how to do balancing when the
zoned allocator was implemented (along with per-zone kswapd threads or
per-zone queues), but it just gets too complex IMHO. Why don't you give
the simpler suggestion from yesterday a thought? We have essentially only
one zone which has to be balanced, ZONE_NORMAL. ZONE_DMA is and should
become special because it also serves as an atomic pool for IRQ
allocations. (ZONE_HIGHMEM is special and uninteresting as far as memory
balance goes, as explained above.) So we only have ZONE_NORMAL to worry
about. Zone chains are a perfect way of defining fallback routes.
i've had a nicely balanced (heavily loaded) 8GB box for the past couple of
weeks, just by making (yesterday's) slight trivial changes to the
zone chains and watermarks. The default settings in the stock kernel were
not tuned, but all the mechanism is there. LRU is working, there was
always DMA RAM around, no classzones necessary here. So what exactly is
the case you are trying to balance?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 12:11 ` Ingo Molnar
@ 2000-05-12 12:57 ` Andrea Arcangeli
2000-05-12 13:20 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 12:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>what bad effects? the LRU list of the pagecache is a completely
>independent mechanizm. Highmem pages are LRU-freed just as effectively as
>normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
>global, so the particular zone of highmem pages is completely transparent
It shouldn't be global but per-NUMA-node as I have in the classzone patch.
>and irrelevant to the LRU mechanizm. I cannot see any bad effects wrt. LRU
>recycling and the highmem zone here. (let me know if you ment some
>different recycling mechanizm)
See line 320 of filemap.c in 2.3.99-pre7-pre9. (ignore the fact that it
will recycle 1 page; that's just because they didn't expect pages_high to
be zero)
>'balanced' means: 'keep X amount of highmem free'. What is your point in
>keeping free highmem around?
Assuming there is no point, you still want to free from the highmem
zone as well while doing LRU aging of the cache.
And if you don't keep X amount of highmem free, you'll break if an irq
does a GFP_HIGHMEM allocation.
Note also that with highmem I don't mean the memory between 1GB and
64GB, but the memory between 0 and 64GB. When you allocate with
GFP_HIGHUSER you ask the MM for a page between 0 and 64GB.
And in turn, what is the point of keeping X amount of normal/regular
memory free? You try to keep such an X amount of memory free in the DMA
zone, so why do you also try to keep it free in the normal zone? The
problem is the same.
Please read my emails on linux-mm from a few weeks ago about the classzone
approach. I can forward them to linux-kernel if there is interest (I don't
know if there's a web archive, but I guess there is).
If the current strict zone approach weren't broken, we could as well
choose to split ZONE_HIGHMEM into 10/20 zones to scale 10/20 times
better during allocations, no? Is this argument enough to at least ring a
bell for you that the current design is flawed? The flaw is that we pay
for it with drawbacks and by having a VM that does the wrong thing
because it doesn't have enough information (it only sees a little part of
the picture). You can't fix it without looking at the whole picture (the
classzone).
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 12:57 ` Andrea Arcangeli
@ 2000-05-12 13:20 ` Rik van Riel
2000-05-12 16:40 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 13:20 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Andrea Arcangeli wrote:
> On Fri, 12 May 2000, Ingo Molnar wrote:
>
> >what bad effects? the LRU list of the pagecache is a completely
> >independent mechanizm. Highmem pages are LRU-freed just as effectively as
> >normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
> >global, so the particular zone of highmem pages is completely transparent
>
> It shouldn't be global but per-NUMA-node as I have in the classzone patch.
*nod*
This change is in my source tree too (but the active/inactive
page list thing doesn't work yet).
> >and irrelevant to the LRU mechanizm. I cannot see any bad effects wrt. LRU
> >recycling and the highmem zone here. (let me know if you ment some
> >different recycling mechanizm)
>
> See line 320 of filemap.c in 2.3.99-pre7-pre9. (ignore the fact
> it will recycle 1 page, it's just because they didn't expected
> pages_high to be zero)
Indeed, pages_high for the highmem zone probably shouldn't be zero.
pages_min and pages_low: 0
pages_high: 128??? (free up to 512kB of high memory)
> >'balanced' means: 'keep X amount of highmem free'. What is your point in
> >keeping free highmem around?
>
> Assuming there is no point, you still want to free also from the
> highmem zone while doing LRU aging of the cache.
True, but this just involves setting the watermarks right. The
current code supports the balancing just fine.
> And if you don't keep X amount of highmem free you'll break if
> an irq will do a GFP_HIGHMEM allocation.
GFP_HIGHMEM will automatically fall back to the NORMAL zone.
There's no problem here.
> Note also that with highmem I don't mean not the memory between
> 1giga and 64giga, but the memory between 0 and 64giga.
Why do you keep insisting on meaning other things with words than
what everybody else means with them? ;)
> Please read my emails on linux-mm of a few weeks ago about
> classzone approch.
I've read them and it's overly complex and doesn't make much
sense for what we need.
> I can forward them to linux-kernel if there is interest (I don't
> know if there's a web archive but I guess there is).
http://mail.nl.linux.org/linux-mm/
http://www.linux.eu.org/Linux-MM/
> If the current strict zone approch wouldn't be broken we could
> as well choose to split the ZONE_HIGHMEM in 10/20 zones to
> scales 10/20 times better during allocations, no?
This would work just fine, except for the fact that we have
only one pagecache_lock ... maybe we want to have multiple
pagecache_locks based on a hash of the inode number? ;)
> Is this argulemnt enough to make you to at least ring a bell
> that the current design is flawed?
But we *can* split the HIGHMEM zone into a bunch of smaller
ones without affecting performance. Just set zone->pages_min
and zone->pages_low to 0 and zone->pages_high to some smallish
value. Then we can teach the allocator to skip the zone if:
1) no obscenely large amount of free pages
2) zone is locked by somebody else (TryLock(zone->lock))
This will work just fine with the current code (plus these
two minor tweaks). No big changes are needed to support this
idea.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 13:20 ` Rik van Riel
@ 2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 16:40 UTC (permalink / raw)
To: Rik van Riel
Cc: Andrea Arcangeli, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Rik van Riel wrote:
> But we *can* split the HIGHMEM zone into a bunch of smaller
> ones without affecting performance. Just set zone->pages_min
> and zone->pages_low to 0 and zone->pages_high to some smallish
> value. Then we can teach the allocator to skip the zone if:
> 1) no obscenely large amount of free pages
> 2) zone is locked by somebody else (TryLock(zone->lock))
what's the point of this splitup? (i suspect there is a point, i just
cannot see it now. thanks.)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
@ 2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
2000-05-19 1:58 ` Andrea Arcangeli
2 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 17:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrea Arcangeli, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
> On Fri, 12 May 2000, Rik van Riel wrote:
>
> > But we *can* split the HIGHMEM zone into a bunch of smaller
> > ones without affecting performance. Just set zone->pages_min
> > and zone->pages_low to 0 and zone->pages_high to some smallish
> > value. Then we can teach the allocator to skip the zone if:
> > 1) no obscenely large amount of free pages
> > 2) zone is locked by somebody else (TryLock(zone->lock))
>
> whats the point of this splitup? (i suspect there is a point, i
> just cannot see it now. thanks.)
There's not much point in doing so. This is basically
just a reply to Andrea's "but you can't do _this_ with
the current approach" remark ;)
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
@ 2000-05-12 18:15 ` Linus Torvalds
2000-05-12 18:53 ` Ingo Molnar
2000-05-19 1:58 ` Andrea Arcangeli
2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 18:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
Ingo, one thing struck me.. Have you actually tested unmodified 99-pre7?
You said that you've been running the "standard kernel with the highmem
modification" for a few weeks on a 8GB machine, and that makes me wonder
if you maybe didn't even try pre7 without your mod?
What _used_ to happen with multi-zone setups was that if one zone started
to need balancing, you got a lot of page-out activity in the other zones
too, because vmscan would _only_ look at the LRU information, and would
happily page stuff out from the zones that weren't affected at all. On a
highmem machine this means, for example, that if the regular memory zone
(or the DMA zone) got under pressure, we would start paging out highmem
pages too as we encountered them in vmscan.
With such a setup, your patch makes lots of sense - trying to decouple the
highmem zone as much as possible. But the more recent kernels should be
better at not touching zones that don't need touching (it will still
change the LRU information, though).
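The zone-aware scanning described above can be modelled in a few lines. The following is an illustrative userspace sketch, not the actual 2.3.99 code: the struct fields and function names are invented for the example. The idea is that the scanner walks the LRU but only evicts pages whose zone is actually below its low watermark.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative userspace model -- not 2.3.99 kernel code.  A zone is
 * "under pressure" when its free page count drops below pages_low. */
struct zone {
    long free_pages;
    long pages_low;
};

struct page {
    struct zone *zone;  /* which zone this page's memory belongs to */
    int evicted;
};

static int zone_under_pressure(const struct zone *z)
{
    return z->free_pages < z->pages_low;
}

/* Walk the LRU array (oldest first) and evict only pages that belong
 * to a pressured zone; pages in healthy zones are left alone, which
 * is the "don't touch zones that don't need touching" rule. */
static int shrink_lru(struct page *lru, size_t n)
{
    int reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!zone_under_pressure(lru[i].zone))
            continue;
        lru[i].evicted = 1;
        lru[i].zone->free_pages++;
        reclaimed++;
    }
    return reclaimed;
}
```

Under the old LRU-only behaviour the loop would evict every old page it met, regardless of zone; the single zone_under_pressure() test is what keeps a ZONE_NORMAL shortage from draining highmem.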
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 18:15 ` Linus Torvalds
@ 2000-05-12 18:53 ` Ingo Molnar
2000-05-12 19:06 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 18:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1186 bytes --]
On Fri, 12 May 2000, Linus Torvalds wrote:
> With such a setup, your patch makes lots of sense - trying to decouple
> the highmem zone as much as possible. But the more recent kernels
> should be better at not touching zones that don't need touching (it
> will still change the LRU information, though).
i initially tested pre7-9 and it showed bad behavior: high kswapd activity
trying to balance highmem, while the pagecache is primarily filled from
the highmem. I don't think this can be fixed without 'silencing'
ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
so it puts direct pressure on the highmem zone.
This had two effects: it wasted CPU time, and it also limited the
page-cache's maximum size to the size of highmem. I'll try the final
pre7-2.3.99 kernel as well in a minute to make sure. (i think the bad
behavior is still there, judging from the differences between pre9 and
the final patch.)
(i've attached a patch against final-pre7, which is not complete and which
i'm not yet happy about (the kernel shows bad behavior if lots of dirty
data is generated by many processes), but it shows e.g. the highmem.c
cleanup that is possible.)
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 5134 bytes --]
--- linux/mm/page_alloc.c.orig Fri May 12 08:45:17 2000
+++ linux/mm/page_alloc.c Fri May 12 09:14:58 2000
@@ -29,9 +29,9 @@
pg_data_t *pgdat_list = (pg_data_t *)0;
static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
-static int zone_balance_ratio[MAX_NR_ZONES] = { 128, 128, 128, };
-static int zone_balance_min[MAX_NR_ZONES] = { 10 , 10, 10, };
-static int zone_balance_max[MAX_NR_ZONES] = { 255 , 255, 255, };
+static int zone_balance_ratio[MAX_NR_ZONES] = { 128, 128, 1, };
+static int zone_balance_min[MAX_NR_ZONES] = { 10 , 10, 0, };
+static int zone_balance_max[MAX_NR_ZONES] = { 255 , 255, 0, };
/*
* Free_page() adds the page to the free lists. This is optimized for
@@ -271,7 +271,10 @@
if (!(current->flags & PF_MEMALLOC)) {
int gfp_mask = zonelist->gfp_mask;
if (!try_to_free_pages(gfp_mask)) {
- if (!(gfp_mask & __GFP_HIGH))
+ /*
+ * Non-highprio allocations fail here:
+ */
+ if (!(gfp_mask & __GFP_PRIO))
goto fail;
}
}
@@ -440,6 +443,9 @@
zone = pgdat->node_zones + ZONE_NORMAL;
if (zone->size)
zonelist->zones[j++] = zone;
+ if ((i && __GFP_WAIT) || !(i && __GFP_PRIO) ||
+ (i && __GFP_IO))
+ break;
case ZONE_DMA:
zone = pgdat->node_zones + ZONE_DMA;
if (zone->size)
--- linux/mm/highmem.c.orig Fri May 12 09:16:25 2000
+++ linux/mm/highmem.c Fri May 12 09:27:14 2000
@@ -66,6 +66,13 @@
return new_page;
}
+/*
+ * Special zonelist so we can just query the highmem pool and
+ * return immediately if there is no highmem page free.
+ */
+static zonelist_t high_zonelist =
+ { { NODE_DATA(0)->node_zones + ZONE_HIGHMEM, NULL, }, __GFP_HIGHMEM };
+
struct page * replace_with_highmem(struct page * page)
{
struct page *highpage;
@@ -74,13 +81,11 @@
if (PageHighMem(page) || !nr_free_highpages())
return page;
- highpage = alloc_page(GFP_ATOMIC|__GFP_HIGHMEM);
+ highpage = __alloc_pages(&high_zonelist, 0);
if (!highpage)
return page;
- if (!PageHighMem(highpage)) {
- __free_page(highpage);
- return page;
- }
+ if (!PageHighMem(highpage))
+ BUG();
vaddr = kmap(highpage);
copy_page((void *)vaddr, (void *)page_address(page));
--- linux/include/linux/mm.h.orig Fri May 12 08:46:55 2000
+++ linux/include/linux/mm.h Fri May 12 09:27:56 2000
@@ -471,33 +471,49 @@
* GFP bitmasks..
*/
#define __GFP_WAIT 0x01
-#define __GFP_HIGH 0x02
+#define __GFP_PRIO 0x02
#define __GFP_IO 0x04
+/*
+ * indicates that the buffer will be suitable for DMA. Ignored on some
+ * platforms, used as appropriate on others
+ */
#define __GFP_DMA 0x08
+
+/*
+ * indicates that the buffer can be taken from high memory,
+ * which is not permanently mapped by the kernel
+ */
#ifdef CONFIG_HIGHMEM
#define __GFP_HIGHMEM 0x10
#else
#define __GFP_HIGHMEM 0x0 /* noop */
#endif
-
-#define GFP_BUFFER (__GFP_HIGH | __GFP_WAIT)
-#define GFP_ATOMIC (__GFP_HIGH)
-#define GFP_USER (__GFP_WAIT | __GFP_IO)
-#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
-#define GFP_KERNEL (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
-#define GFP_NFS (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
-#define GFP_KSWAPD (__GFP_IO)
-
-/* Flag - indicates that the buffer will be suitable for DMA. Ignored on some
- platforms, used as appropriate on others */
-
-#define GFP_DMA __GFP_DMA
-
-/* Flag - indicates that the buffer can be taken from high memory which is not
- permanently mapped by the kernel */
-
-#define GFP_HIGHMEM __GFP_HIGHMEM
+/*
+ * The 5 GFP bits:
+ * ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | __GFP_DMA | __GFP_HIGHMEM )
+ *
+ * The most typical combinations:
+ */
+
+#define GFP_BUFFER \
+ ( __GFP_WAIT | __GFP_PRIO | 0 | 0 | 0 )
+#define GFP_ATOMIC \
+ ( 0 | __GFP_PRIO | 0 | 0 | 0 )
+#define GFP_USER \
+ ( __GFP_WAIT | 0 | __GFP_IO | 0 | 0 )
+#define GFP_HIGHUSER \
+ ( __GFP_WAIT | 0 | __GFP_IO | 0 | __GFP_HIGHMEM )
+#define GFP_KERNEL \
+ ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | 0 | 0 )
+#define GFP_NFS \
+ ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | 0 | 0 )
+#define GFP_KSWAPD \
+ ( 0 | 0 | __GFP_IO | 0 | 0 )
+#define GFP_DMA \
+ ( 0 | 0 | 0 | __GFP_DMA | 0 )
+#define GFP_HIGHMEM \
+ ( 0 | 0 | 0 | 0 | __GFP_HIGHMEM )
/* vma is the first one with address < vma->vm_end,
* and even address < vma->vm_start. Have to extend vma. */
--- linux/include/linux/slab.h.orig Fri May 12 09:05:15 2000
+++ linux/include/linux/slab.h Fri May 12 09:27:56 2000
@@ -22,7 +22,7 @@
#define SLAB_NFS GFP_NFS
#define SLAB_DMA GFP_DMA
-#define SLAB_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_HIGHMEM)
+#define SLAB_LEVEL_MASK (__GFP_WAIT|__GFP_PRIO|__GFP_IO|__GFP_HIGHMEM)
#define SLAB_NO_GROW 0x00001000UL /* don't grow a cache */
/* flags to pass to kmem_cache_create().
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 18:53 ` Ingo Molnar
@ 2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 19:06 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>
> i initially tested pre7-9 and it showed bad behavior: high kswapd activity
> trying to balance highmem, while the pagecache is primarily filled from
> the highmem. I don't think this can be fixed without 'silencing'
> ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
> so it puts direct pressure on the highmem zone.
If this is true, then that is a bug in the allocator.
I tried very hard (but must obviously have failed), to make the allocator
_always_ do the right thing - never allocating from a zone that causes
memory balancing if there is another zone that is preferable.
> This had two effects: it wasted CPU time, and it also limited the
> page-cache's maximum size to the size of highmem. I'll try the final
> pre7-2.3.99 kernel as well in a minute to make sure. (i think the bad
> behavior is still there, judging from the differences between pre9 and
> the final patch.)
Please fix the memory allocator instead. It should really go to the next
zone instead of allocating more from the highmem zone.
Actually, I think the real bug is kswapd - I thought the "for (;;)" loop
was a good idea, but I've since actually thought about it more, and in
real life we really just want to go to sleep when we need to re-schedule,
because if there is any _real_ memory pressure people _will_ wake us up
anyway. So before you touch the memory allocator logic, you might want to
change the
if (tsk->need_resched)
schedule();
to a
if (tsk->need_resched)
goto sleep;
(and add a "sleep:" thing to inside the if-statement that makes us go to
sleep). That way, if we end up scheduling away from kswapd, we won't waste
time scheduling back unless we really should.
But do check out __alloc_pages() too, maybe you see some obvious bug of
mine that I just never thought about.
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
@ 2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Linus Torvalds wrote:
> > i initially tested pre7-9 and it showed bad behavior: high kswapd activity
> > trying to balance highmem, while the pagecache is primarily filled from
> > the highmem. I don't think this can be fixed without 'silencing'
> > ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
> > so it puts direct pressure on the highmem zone.
>
> If this is true, then that is a bug in the allocator.
i just re-checked final pre7-2.3.99, and saw similar behavior. Once
ZONE_HIGHMEM is empty, kswapd eats ~6% CPU time (constantly running), the
highmem free count (in /proc/meminfo) fluctuates slightly above zero, but
the pagecache is not growing anymore - although there is still lots of
ZONE_NORMAL RAM around.
> anyway. So before you touch the memory allocator logic, you might want to
> change the
>
> if (tsk->need_resched)
> schedule();
>
> to a
>
> if (tsk->need_resched)
> goto sleep;
>
> (and add a "sleep:" thing to inside the if-statement that makes us go to
> sleep). That way, if we end up scheduling away from kswapd, we won't waste
> time scheduling back unless we really should.
ok, will try this, and will try to find where it fails.
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
@ 2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
note that now i'm running the 4GB variant of highmem (easier to fill up) -
so the physical memory layout goes like this:
1GB permanently mapped RAM
~2GB highmem
(only 2GB highmem because 5GB of RAM is above 4GB, so inaccessible to
normal 32-bit PTEs.)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
@ 2000-05-12 19:54 ` Ingo Molnar
2000-05-12 22:48 ` Rik van Riel
2 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
yes, this appears to have done the trick (patch attached). A 15MB/sec
stream of pure read activity started filling up highmem first. There was
still a light spike of kswapd activity once highmem got filled up, but it
stabilized after a few seconds. Then the pagecache filled up the normal
zone just as fast as it filled up the highmem zone, and now it's in steady
state, with kswapd using up ~5% CPU time [fluctuating, sometimes as high
as 15%, sometimes zero]. (it's recycling LRU pages?) Cool!
Ingo
--- linux/mm/vmscan.c.orig Fri May 12 12:28:58 2000
+++ linux/mm/vmscan.c Fri May 12 12:29:50 2000
@@ -543,13 +543,14 @@
something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
if (tsk->need_resched)
- schedule();
+ goto sleep;
}
run_task_queue(&tq_disk);
pgdat = pgdat->node_next;
} while (pgdat);
if (!something_to_do) {
+sleep:
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:54 ` Ingo Molnar
@ 2000-05-12 22:48 ` Rik van Riel
2000-05-13 11:57 ` Stephen C. Tweedie
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 22:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
> --- linux/mm/vmscan.c.orig Fri May 12 12:28:58 2000
> +++ linux/mm/vmscan.c Fri May 12 12:29:50 2000
> @@ -543,13 +543,14 @@
> something_to_do = 1;
> do_try_to_free_pages(GFP_KSWAPD);
> if (tsk->need_resched)
> - schedule();
> + goto sleep;
> }
> run_task_queue(&tq_disk);
> pgdat = pgdat->node_next;
> } while (pgdat);
>
> if (!something_to_do) {
> +sleep:
> tsk->state = TASK_INTERRUPTIBLE;
> interruptible_sleep_on(&kswapd_wait);
> }
This is wrong. It will make it much much easier for processes to
get killed (as demonstrated by quintela's VM test suite).
The correct fix probably is to have the _same_ watermark for
something_to_do *and* the "easy allocation" in __alloc_pages.
(very much untested patch versus pre7-9 below)
regards,
Rik
--- vmscan.c.orig Thu May 11 12:13:08 2000
+++ vmscan.c Fri May 12 19:46:49 2000
@@ -542,8 +542,9 @@
zone_t *zone = pgdat->node_zones+ i;
if (!zone->size || !zone->zone_wake_kswapd)
continue;
- something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
+ if (zone->free_pages < zone->pages_low)
+ something_to_do = 1;
if (tsk->need_resched)
schedule();
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 22:48 ` Rik van Riel
@ 2000-05-13 11:57 ` Stephen C. Tweedie
2000-05-13 12:03 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Stephen C. Tweedie @ 2000-05-13 11:57 UTC (permalink / raw)
To: Rik van Riel
Cc: Ingo Molnar, Linus Torvalds, Andrea Arcangeli, MM mailing list,
linux-kernel, Stephen Tweedie
Hi,
On Fri, May 12, 2000 at 07:48:45PM -0300, Rik van Riel wrote:
> > if (tsk->need_resched)
> > - schedule();
> > + goto sleep;
>
> This is wrong. It will make it much much easier for processes to
> get killed (as demonstrated by quintela's VM test suite).
It shouldn't. If tasks are getting killed, then the fix should be
in alloc_pages, not in kswapd. Tasks _should_ be quite able to wait
for memory, and if necessary, drop into try_to_free_pages themselves.
Linus, the fix above seems to be necessary. Without it, even a simple
playing of mp3 audio on 2.3 fails once memory is full on a 256MB box,
with kswapd consuming between 5% and 25% of CPU and locking things up
sufficiently to cause dropouts in the playback every second or more.
With that one-liner fix, mp3 is smooth even in the presence of other
background file activity.
--Stephen
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 11:57 ` Stephen C. Tweedie
@ 2000-05-13 12:03 ` Rik van Riel
2000-05-13 12:14 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-13 12:03 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Sat, 13 May 2000, Stephen C. Tweedie wrote:
> On Fri, May 12, 2000 at 07:48:45PM -0300, Rik van Riel wrote:
>
> > > if (tsk->need_resched)
> > > - schedule();
> > > + goto sleep;
> >
> > This is wrong. It will make it much much easier for processes to
> > get killed (as demonstrated by quintela's VM test suite).
>
> It shouldn't. If tasks are getting killed, then the fix should be
> in alloc_pages, not in kswapd. Tasks _should_ be quite able to wait
> for memory, and if necessary, drop into try_to_free_pages themselves.
Indeed, but waiting for memory or running
try_to_free_pages themselves is not without
problems either, as you describe below...
> Linus, the fix above seems to be necessary. Without it, even a
> simple playing of mp3 audio on 2.3 fails once memory is full on
> a 256MB box, with kswapd consuming between 5% and 25% of CPU and
> locking things up sufficiently to cause dropouts in the playback
> every second or more. With that one-liner fix, mp3 is smooth
> even in the presence of other background file activity.
Kswapd freeing pages in the background means that processes
in the foreground can proceed with their allocation without
waiting, leading to smoother VM performance. I guess we
want that ... ;)
Besides, kswapd will _only_ continue if there's a zone with
zone->free_pages < zone->pages_low ... I'm now running pre8
with the patch below and it works fine.
regards,
Rik
--- mm/vmscan.c.orig Fri May 12 20:13:08 2000
+++ mm/vmscan.c Fri May 12 20:15:24 2000
@@ -538,16 +538,19 @@
int i;
for(i = 0; i < MAX_NR_ZONES; i++) {
zone_t *zone = pgdat->node_zones+ i;
+ if (tsk->need_resched)
+ schedule();
if (!zone->size || !zone->zone_wake_kswapd)
continue;
- something_to_do = 1;
+ if (zone->free_pages < zone->pages_low)
+ something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
}
run_task_queue(&tq_disk);
pgdat = pgdat->node_next;
} while (pgdat);
- if (tsk->need_resched || !something_to_do) {
+ if (!something_to_do) {
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 12:03 ` Rik van Riel
@ 2000-05-13 12:14 ` Ingo Molnar
2000-05-13 14:23 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-13 12:14 UTC (permalink / raw)
To: Rik van Riel
Cc: Stephen C. Tweedie, Linus Torvalds, MM mailing list, linux-kernel
i've also seen a bit more frequent allocation failures on pre8, during
high (but non-thrashing) VM load. Will try your patch now.
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 12:14 ` Ingo Molnar
@ 2000-05-13 14:23 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-13 14:23 UTC (permalink / raw)
To: Rik van Riel
Cc: Stephen C. Tweedie, Linus Torvalds, MM mailing list, linux-kernel
> i've also seen a bit more frequent allocation failures on pre8, during
> high (but non-thrashing) VM load. Will try your patch now.
your patch has improved out-of-memory behavior, i have seen no allocation
failures so far. (stock pre8 was occasionally failing)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
@ 2000-05-19 1:58 ` Andrea Arcangeli
2000-05-19 15:03 ` Rik van Riel
2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-19 1:58 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Linus Torvalds, MM mailing list, linux-kernel
[ sorry for the late reply ]
On Fri, 12 May 2000, Ingo Molnar wrote:
>On Fri, 12 May 2000, Rik van Riel wrote:
>
>> But we *can* split the HIGHMEM zone into a bunch of smaller
>> ones without affecting performance. Just set zone->pages_min
>> and zone->pages_low to 0 and zone->pages_high to some smallish
>> value. Then we can teach the allocator to skip the zone if:
>> 1) no obscenely large amount of free pages
>> 2) zone is locked by somebody else (TryLock(zone->lock))
>
>what's the point of this split-up? (i suspect there is a point, i just
>cannot see it now. thanks.)
I quote email from Rik of 25 Apr 2000 23:10:56 on linux-mm:
-- Message-ID: <Pine.LNX.4.21.0004252240280.14340-100000@duckman.conectiva> --
We can do this just fine. Splitting a box into a dozen more
zones than what we have currently should work just fine,
except for (as you say) higher CPU use by kswapd.
If I get my balancing patch right, most of that disadvantage
should be gone as well. Maybe we *do* want to do this on
bigger SMP boxes so each processor can start out with a
separate zone and check the other zone later to avoid lock
contention?
--------------------------------------------------------------
I still strongly think that the current strict per-zone mem balancing design
is very broken (and I also think I'm right, since I believe I see
the whole picture), but I don't think I can explain my arguments
better and/or more extensively than I already did in linux-mm some weeks ago.
If you see anything wrong in my reasoning please let me know. The interesting
thread was "Re: 2.3.x mem balancing" (the start was off-list) in linux-mm.
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 1:58 ` Andrea Arcangeli
@ 2000-05-19 15:03 ` Rik van Riel
2000-05-19 16:08 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-19 15:03 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Thu, 18 May 2000, Andrea Arcangeli wrote:
> I still strongly think that the current strict per-zone mem
> balancing design is very broken (and I also think I'm right
> since I believe I see the whole picture) but I don't think I
> can explain my arguments better and/or more extensively than I
> already did in linux-mm some weeks ago.
The balancing as of pre9-2 works like this:
- LRU list per pgdat
- kswapd runs and makes sure every zone has > zone->pages_low
free pages, after that it stops
- kswapd frees pages up to zone->pages_high, depending on which
pages we encounter in the LRU queue; this makes sure that
the zone with the most least-recently-used pages will have more
free pages
- __alloc_pages() allocates from every zone down to zone->pages_low
before waking up kswapd; this makes sure more pages
from the least loaded zone will be used than from more loaded
zones, so balancing between zones happens
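The rules above can be condensed into a small model. This is a hedged userspace sketch with invented helper names (the real __alloc_pages() and kswapd are considerably more involved): the allocator takes from any zone still above pages_low, wakes kswapd only when every zone has been drained to its low watermark, and kswapd then refills each zone up to pages_high.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the pre9-2 balancing rules -- the field names
 * mirror the kernel's zone_t watermarks, but the code is a sketch. */
struct zone {
    long free_pages;
    long pages_low;   /* allocator stops dipping below this */
    long pages_high;  /* kswapd refills up to this */
};

static int kswapd_woken;

/* Take one page from the first zone in the fallback list that is
 * still above its low watermark; only when all zones are at or
 * below pages_low does the caller wake kswapd. */
static struct zone *alloc_one_page(struct zone **zonelist, int n)
{
    for (int i = 0; i < n; i++) {
        if (zonelist[i]->free_pages > zonelist[i]->pages_low) {
            zonelist[i]->free_pages--;
            return zonelist[i];
        }
    }
    kswapd_woken = 1;   /* all zones low: background reclaim needed */
    return NULL;
}

/* kswapd's side of the bargain: free pages in every low zone until
 * it reaches pages_high, then go back to sleep. */
static void kswapd_balance(struct zone **zonelist, int n)
{
    for (int i = 0; i < n; i++)
        while (zonelist[i]->free_pages < zonelist[i]->pages_high)
            zonelist[i]->free_pages++;  /* stands in for real reclaim */
    kswapd_woken = 0;
}
```

Because allocations drain each zone only to pages_low before falling through to the next one, a zone with many free pages absorbs proportionally more allocations, which is the inter-zone balancing being described.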
I'm curious what would be so "very broken" about this?
AFAICS it does most of what the classzone patch would achieve,
at lower complexity and better readability.
regards,
Rik
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 15:03 ` Rik van Riel
@ 2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
0 siblings, 2 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-19 16:08 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 19 May 2000, Rik van Riel wrote:
>I'm curious what would be so "very broken" about this?
You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
>AFAICS it does most of what the classzone patch would achieve,
>at lower complexity and better readability.
I disagree.
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 16:08 ` Andrea Arcangeli
@ 2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-19 17:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 19 May 2000, Andrea Arcangeli wrote:
> On Fri, 19 May 2000, Rik van Riel wrote:
>
> >I'm curious what would be so "very broken" about this?
>
> You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
What's wrong with this? We'll never go below zone->pages_low
in ZONE_DMA, so you don't have to worry about running out of
DMA pages.
> >AFAICS it does most of what the classzone patch would achieve,
> >at lower complexity and better readability.
>
> I disagree.
The classzone patches look like a bunch of magic to most of the
people who've read them and with whom I've spoken. There has been
almost no explanation of what the patch tries to achieve or why
it would work better than the normal code (nor is it visible in
the code).
Juan Quintela's patch, on the other hand, has received continuous
feedback from 7 kernel hackers, all of whom now understand how the
code works. This provides a lot more long-term maintainability of
the code.
regards,
Rik
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
@ 2000-05-19 22:28 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-19 22:28 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rik van Riel, Ingo Molnar, MM mailing list, linux-kernel
On Fri, 19 May 2000, Andrea Arcangeli wrote:
> On Fri, 19 May 2000, Rik van Riel wrote:
>
> >I'm curious what would be so "very broken" about this?
>
> You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
THIS IS NOT A BUG!
It's a feature. I don't see why you insist on calling this a problem.
We do NOT keep free memory around just for DMA allocations. We
fundamentally keep free memory around because the buddy allocator (_any_
allocator, in fact) needs some slop in order to do a reasonable job at
allocating contiguous page regions, for example. We keep free memory
around because that way we have a "buffer" to allocate from atomically, so
that when network traffic occurs or there is other behaviour that requires
memory without being able to free it on the spot, we have memory to give.
Keeping only DMA memory around would be =bad=. It would mean, for example,
that when a new packet comes in on the network, it would always be
allocated from the DMA region, because the normal zone hasn't even been
balanced ("why balance it when we still have DMA memory?"). And that would
be a huge mistake, because that would mean, for example, that by selecting
the right allocation patterns and by opening sockets without reading the
data they receive the right way, somebody could force all of DMA memory to
be used up by network allocations that wouldn't be free'd.
In short, your very fundamental premise is BROKEN, Andrea. We want to keep
normal memory around, even if there is low memory available. The same is
true of high memory, for similar reasons.
Face it. The original zone-only code had problems. One of the worst
problems was that it would try to free up a lot of "normal" memory if it
got low on DMA memory. Those problems have pretty much been fixed, and
they had _nothing_ to do with your "class" patches. They were bugs, plain
and simple, not design mistakes.
If you think you should have zero free normal pages, YOU have a design
mistake. We should not be that black-and-white. The whole point in having
the min/low/max stuff is to make memory allocation less susceptible to
border conditions, and turn a black-and-white situation into more of a
"levels of gray" situation.
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
@ 2000-05-11 11:12 ` Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
3 siblings, 0 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-11 11:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
[...]
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
# vmstat 5
9 3 0 0 921884 1796 12776 0 0 0 0 108 77813 2 90 8
11 1 1 12044 523248 1080 25232 0 2494 0 624 327 16323 1 97 3
13 0 1 16468 728120 720 29000 0 3818 0 955 364 17820 3 97 0
11 1 1 336 237340 720 13040 0 1114 0 278 200 10402 1 99 0
10 2 1 476 41628 720 13184 0 4066 0 1017 401 5792 1 99 0
VM: killing process ipctst
VM: killing process ipctst
VM: killing process ipctst
4 5 1 31872 2500 96 25592 22 13447 6 3362 983 10863 0 82 1
5 4 1 58708 675260 280 19024 0 5388 12 1355 2231 1558 0 77 23
0 0 0 58708 675260 280 19024 0 0 0 0 112 4 0 0 100
I still hate it ;-)
Greetings
Christoph
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
` (2 preceding siblings ...)
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
@ 2000-05-11 17:38 ` Steve Dodd
3 siblings, 0 replies; 67+ messages in thread
From: Steve Dodd @ 2000-05-11 17:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
On Wed, May 10, 2000 at 05:16:05PM -0700, Linus Torvalds wrote:
[..]
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
I definitely hate pre7-9.
For various reasons, I'm stuck on a 16Mb box right now. I just tried to start
dselect[0], and it got killed. It's completely repeatable, and running vmstat
shows that something demented is happening:
frodo:~$ vmstat 1 # and then start dselect on another terminal
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 2508 6544 196 5124 6 4 94 9 137 56 40 5 55
0 0 0 2508 6524 196 5140 0 0 16 0 134 4 0 2 98
0 0 0 2508 6524 196 5140 0 0 0 0 106 2 0 2 98
0 0 0 2508 6520 200 5140 0 0 1 0 112 6 0 2 98
0 0 0 2508 6200 204 5224 16 0 77 0 125 29 4 4 92
0 0 0 2508 6200 204 5224 0 0 0 0 103 2 0 2 98
1 0 0 2508 5332 212 5504 0 0 285 0 117 21 42 4 54
1 0 0 2508 3748 216 6004 0 0 501 0 119 24 83 7 11
1 0 0 2508 2664 220 6388 0 0 389 69 164 20 55 5 40
1 0 0 2508 964 224 7020 0 0 631 0 117 15 83 6 11
1 0 0 2508 364 216 6692 0 0 341 0 113 22 81 15 5
1 0 0 2504 288 208 5900 0 0 512 0 114 18 78 22 0
1 0 0 2504 364 112 5068 0 0 514 0 114 25 77 18 5
1 0 1 2504 252 72 4416 0 0 483 12 137 47 73 15 13
1 0 0 2504 264 68 4448 0 0 511 13 147 77 32 20 48
VM: killing process dselect
0 2 0 2504 8044 76 3960 176 0 803 0 220 137 16 23 61
0 0 0 2504 8032 76 3964 0 0 2 0 106 8 0 2 98
0 0 0 2504 8032 76 3964 0 0 1 0 105 6 0 2 98
I'm not an "mm person", but that doesn't look optimal to me.
The box does have a reasonable amount of swap:
frodo:~$ cat /proc/swaps
Filename Type Size Used Priority
/dev/hdc2 partition 18140 2480 -1
/dev/hdc4 partition 50396 0 -2
[0] so I could install the libbfd header files to compile kdb to poke at the
loop device lock-up stuff so I can use loop for testing ntfs stuff.. I'm
stuck in a maze of twisty kernel bugs, none alike..
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
@ 2000-05-09 10:21 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-09 10:21 UTC (permalink / raw)
To: Daniel Stone; +Cc: Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
On Tue, 9 May 2000, Daniel Stone wrote:
> That's astonishing, I'm sure, but think of us poor bastards who
> DON'T have an SMP machine with >1gig of RAM.
>
> This is a P120, 32meg.
The old zoned VM code will run that machine as efficiently
as if it had 16MB of ram. See my point now?
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* RE: [PATCH] Recent VM fiasco - fixed
@ 2000-05-11 11:26 Jones D (ISaCS)
2000-05-12 7:50 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Jones D (ISaCS) @ 2000-05-11 11:26 UTC (permalink / raw)
To: 'Rik van Riel', Simon Kirby
Cc: Linus Torvalds, linux-mm, linux-kernel
> There probably are some good bits in the classzone patch, but
> it also backs out bugfixes for bugs which have been proven to
> exist and fixed by those fixes. ;(
>
> It would be nice if Andrea could separate the good bits from
> the bad bits and make a somewhat cleaner patch...
As I've been playing with invalidate_inode_pages for the last few
days, this section of Andrea's classzone diff caught my eye.
I noticed that in Andrea's version, if a page is locked, then it is just
ignored, and never freed. He reduced the complexity of the function, and
sped it up immeasurably, but apparently at the expense of leaking pages.
I've not looked at the rest of the patch, so my judgement is on the basis
of this section alone.
Andrea, for an improved version of that function see the patch I sent
yesterday.
d.
* RE: [PATCH] Recent VM fiasco - fixed
2000-05-11 11:26 Jones D (ISaCS)
@ 2000-05-12 7:50 ` Andrea Arcangeli
0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 7:50 UTC (permalink / raw)
To: Jones D (ISaCS)
Cc: 'Rik van Riel',
Simon Kirby, Linus Torvalds, linux-mm, linux-kernel
On Thu, 11 May 2000, Jones D (ISaCS) wrote:
>As I've been playing with invalidate_inode_pages for the last few
>days, this section of Andrea's classzone diff caught my eye.
>
>I noticed that in Andrea's version, if a page is locked, then it is just
>ignored, and never freed. He reduced the complexity of the function, and
Note that the official kernel clearly ignores it too, so I'm not
reinserting any bug there, only avoiding a performance drop for no
good reason; that's why I intentionally backed out such a recent
change.
To avoid ignoring it you would have to wait_on_page() (you have no other
way), and according to Trond we can't do that because the caller doesn't
handle a blocking function.
Your patch ignores locked pages too from within
invalidate_inode_pages(), as far as I can tell.
Andrea
Thread overview: 67+ messages
2000-05-08 17:21 [PATCH] Recent VM fiasco - fixed Zlatko Calusic
2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
2000-05-08 18:53 ` Zlatko Calusic
2000-05-08 19:04 ` Rik van Riel
2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
2000-05-09 19:50 ` Linus Torvalds
2000-05-10 11:25 ` Christoph Rohland
2000-05-10 11:50 ` Zlatko Calusic
2000-05-11 23:40 ` Mark Hahn
2000-05-10 4:05 ` James H. Cloos Jr.
2000-05-10 7:29 ` James H. Cloos Jr.
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
2000-05-11 16:36 ` [PATCH] Recent VM fiasco - fixed (pre7-9) Rajagopal Ananthanarayanan
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
2000-05-11 14:17 ` Simon Kirby
2000-05-11 23:38 ` Simon Kirby
2000-05-12 0:09 ` Linus Torvalds
2000-05-12 2:51 ` [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed) Roger Larsson
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
2000-05-12 0:08 ` Ingo Molnar
2000-05-12 0:15 ` Ingo Molnar
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
2000-05-12 11:49 ` Christoph Rohland
2000-05-12 16:12 ` Linus Torvalds
2000-05-12 10:57 ` Andrea Arcangeli
2000-05-12 12:11 ` Ingo Molnar
2000-05-12 12:57 ` Andrea Arcangeli
2000-05-12 13:20 ` Rik van Riel
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
2000-05-12 18:53 ` Ingo Molnar
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2000-05-12 22:48 ` Rik van Riel
2000-05-13 11:57 ` Stephen C. Tweedie
2000-05-13 12:03 ` Rik van Riel
2000-05-13 12:14 ` Ingo Molnar
2000-05-13 14:23 ` Ingo Molnar
2000-05-19 1:58 ` Andrea Arcangeli
2000-05-19 15:03 ` Rik van Riel
2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
2000-05-09 10:21 ` Rik van Riel
2000-05-11 11:26 Jones D (ISaCS)
2000-05-12 7:50 ` Andrea Arcangeli