* [PATCH] Recent VM fiasco - fixed
@ 2000-05-08 17:21 Zlatko Calusic
2000-05-08 17:43 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 17:21 UTC (permalink / raw)
To: linux-mm, linux-kernel; +Cc: Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]
Hi to all!
After I _finally_ got tired of the constantly worsening VM behaviour
in recent kernels, I thought I could spare a few hours this weekend
just to see what's going on. I was quite surprised to see that the VM
subsystem, while in its worst shape so far (at least in 2.3.x), is
quite easily repairable even by unskilled hands... I compiled and
checked a few kernels back to 2.3.51, and found that new code was
constantly being added that only made things worse. Short history:
2.3.51 - mostly OK, but reading from disk takes too much CPU (kswapd)
2.3.99-pre1, 2 - as .51 + aggressive swap out during writing
2.3.99-pre3, 4, 5 - reading better
2.3.99-pre5, 6 - both reading and writing take 100% CPU!!!
I also tried some pre7-x (I forget which one) but that one was f****d
up beyond recognition (read: it was killing my processes, including
X11, like mad every time I started writing to disk). Thus the patch
that follows, which fixes all of the above-mentioned problems, was
made against pre6, sorry. I'll make another patch when pre7 gets out,
if things are still not properly fixed.
BTW, this patch mostly *removes* recently added cruft, and returns to
the known state of operation. Once that is achieved, it is easy to
selectively re-add the good things I might have removed and change
behaviour as desired, but I would like to urge people to test things
thoroughly before releasing patches this close to 2.4.
Then again, I might have introduced bugs in this patch, too. :)
But, I *tried* to break it (spent some time doing that), and testing
didn't reveal any bad behaviour.
Enjoy!
[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 15658 bytes --]
Index: 9906.2/include/linux/swap.h
--- 9906.2/include/linux/swap.h Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/C/b/20_swap.h 1.4.1.15.1.1 644)
+++ 9906.5/include/linux/swap.h Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/C/b/20_swap.h 1.4.1.15.1.1.1.1 644)
@@ -87,7 +87,6 @@
/* linux/mm/vmscan.c */
extern int try_to_free_pages(unsigned int gfp_mask, zone_t *zone);
-extern int swap_out(unsigned int gfp_mask, int priority);
/* linux/mm/page_io.c */
extern void rw_swap_page(int, struct page *, int);
Index: 9906.2/mm/vmscan.c
--- 9906.2/mm/vmscan.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/13_vmscan.c 1.5.1.22 644)
+++ 9906.5/mm/vmscan.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/13_vmscan.c 1.5.1.22.2.1 644)
@@ -48,7 +48,6 @@
if ((page-mem_map >= max_mapnr) || PageReserved(page))
goto out_failed;
- mm->swap_cnt--;
/* Don't look at this pte if it's been accessed recently. */
if (pte_young(pte)) {
/*
@@ -220,8 +219,6 @@
result = try_to_swap_out(mm, vma, address, pte, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address += PAGE_SIZE;
pte++;
} while (address && (address < end));
@@ -251,8 +248,6 @@
int result = swap_out_pmd(mm, vma, pmd, address, end, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address = (address + PMD_SIZE) & PMD_MASK;
pmd++;
} while (address && (address < end));
@@ -277,8 +272,6 @@
int result = swap_out_pgd(mm, vma, pgdir, address, end, gfp_mask);
if (result)
return result;
- if (!mm->swap_cnt)
- return 0;
address = (address + PGDIR_SIZE) & PGDIR_MASK;
pgdir++;
} while (address && (address < end));
@@ -328,7 +321,7 @@
* N.B. This function returns only 0 or 1. Return values != 1 from
* the lower level routines result in continued processing.
*/
-int swap_out(unsigned int priority, int gfp_mask)
+static int swap_out(unsigned int priority, int gfp_mask)
{
struct task_struct * p;
int counter;
@@ -363,7 +356,6 @@
p = init_task.next_task;
for (; p != &init_task; p = p->next_task) {
struct mm_struct *mm = p->mm;
- p->hog = 0;
if (!p->swappable || !mm)
continue;
if (mm->rss <= 0)
@@ -377,26 +369,9 @@
pid = p->pid;
}
}
- if (assign == 1) {
- /* we just assigned swap_cnt, normalise values */
- assign = 2;
- p = init_task.next_task;
- for (; p != &init_task; p = p->next_task) {
- int i = 0;
- struct mm_struct *mm = p->mm;
- if (!p->swappable || !mm || mm->rss <= 0)
- continue;
- /* small processes are swapped out less */
- while ((mm->swap_cnt << 2 * (i + 1) < max_cnt))
- i++;
- mm->swap_cnt >>= i;
- mm->swap_cnt += i; /* if swap_cnt reaches 0 */
- /* we're big -> hog treatment */
- if (!i)
- p->hog = 1;
- }
- }
read_unlock(&tasklist_lock);
+ if (assign == 1)
+ assign = 2;
if (!best) {
if (!assign) {
assign = 1;
@@ -437,14 +412,13 @@
{
int priority;
int count = SWAP_CLUSTER_MAX;
- int ret;
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
priority = 6;
do {
- while ((ret = shrink_mmap(priority, gfp_mask, zone))) {
+ while (shrink_mmap(priority, gfp_mask, zone)) {
if (!--count)
goto done;
}
@@ -467,9 +441,7 @@
}
}
- /* Then, try to page stuff out..
- * We use swapcount here because this doesn't actually
- * free pages */
+ /* Then, try to page stuff out.. */
while (swap_out(priority, gfp_mask)) {
if (!--count)
goto done;
@@ -497,10 +469,7 @@
*/
int kswapd(void *unused)
{
- int i;
struct task_struct *tsk = current;
- pg_data_t *pgdat;
- zone_t *zone;
tsk->session = 1;
tsk->pgrp = 1;
@@ -521,25 +490,38 @@
*/
tsk->flags |= PF_MEMALLOC;
- while (1) {
+ for (;;) {
+ int work_to_do = 0;
+
/*
* If we actually get into a low-memory situation,
* the processes needing more memory will wake us
* up on a more timely basis.
*/
- pgdat = pgdat_list;
- while (pgdat) {
- for (i = 0; i < MAX_NR_ZONES; i++) {
- zone = pgdat->node_zones + i;
- if (tsk->need_resched)
- schedule();
- if ((!zone->size) || (!zone->zone_wake_kswapd))
- continue;
- do_try_to_free_pages(GFP_KSWAPD, zone);
+ do {
+ pg_data_t *pgdat = pgdat_list;
+
+ while (pgdat) {
+ int i;
+
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ zone_t *zone = pgdat->node_zones + i;
+
+ if (!zone->size)
+ continue;
+ if (!zone->low_on_memory)
+ continue;
+ work_to_do = 1;
+ do_try_to_free_pages(GFP_KSWAPD, zone);
+ }
+ pgdat = pgdat->node_next;
}
- pgdat = pgdat->node_next;
- }
- run_task_queue(&tq_disk);
+ run_task_queue(&tq_disk);
+ if (tsk->need_resched)
+ break;
+ if (nr_free_pages() > freepages.high)
+ break;
+ } while (work_to_do);
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
Index: 9906.2/mm/filemap.c
--- 9906.2/mm/filemap.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/16_filemap.c 1.6.1.3.2.4.1.1.2.2.2.1.1.21.1.1 644)
+++ 9906.5/mm/filemap.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/16_filemap.c 1.6.1.3.2.4.1.1.2.2.2.1.1.21.1.1.2.1 644)
@@ -238,55 +238,41 @@
int shrink_mmap(int priority, int gfp_mask, zone_t *zone)
{
- int ret = 0, loop = 0, count;
+ int ret = 0, count;
LIST_HEAD(young);
LIST_HEAD(old);
LIST_HEAD(forget);
struct list_head * page_lru, * dispose;
- struct page * page = NULL;
- struct zone_struct * p_zone;
- int maxloop = 256 >> priority;
+ struct page * page;
if (!zone)
BUG();
- count = nr_lru_pages >> priority;
- if (!count)
- return ret;
+ count = nr_lru_pages / (priority+1);
spin_lock(&pagemap_lru_lock);
-again:
- /* we need pagemap_lru_lock for list_del() ... subtle code below */
+
while (count > 0 && (page_lru = lru_cache.prev) != &lru_cache) {
page = list_entry(page_lru, struct page, lru);
list_del(page_lru);
- p_zone = page->zone;
- /*
- * These two tests are there to make sure we don't free too
- * many pages from the "wrong" zone. We free some anyway,
- * they are the least recently used pages in the system.
- * When we don't free them, leave them in &old.
- */
- dispose = &old;
- if (p_zone != zone && (loop > (maxloop / 4) ||
- p_zone->free_pages > p_zone->pages_high))
+ dispose = &lru_cache;
+ if (test_and_clear_bit(PG_referenced, &page->flags))
+ /* Roll the page at the top of the lru list,
+ * we could also be more aggressive putting
+ * the page in the young-dispose-list, so
+ * avoiding to free young pages in each pass.
+ */
goto dispose_continue;
- /* The page is in use, or was used very recently, put it in
- * &young to make sure that we won't try to free it the next
- * time */
- dispose = &young;
-
- if (test_and_clear_bit(PG_referenced, &page->flags))
+ dispose = &old;
+ /* don't account passes over not DMA pages */
+ if (zone && (!memclass(page->zone, zone)))
goto dispose_continue;
count--;
- if (!page->buffers && page_count(page) > 1)
- goto dispose_continue;
- /* Page not used -> free it; if that fails -> &old */
- dispose = &old;
+ dispose = &young;
if (TryLockPage(page))
goto dispose_continue;
@@ -297,11 +283,22 @@
page locked down ;). */
spin_unlock(&pagemap_lru_lock);
+ /* avoid unscalable SMP locking */
+ if (!page->buffers && page_count(page) > 1)
+ goto unlock_noput_continue;
+
+ /* Take the pagecache_lock spinlock held to avoid
+ other tasks to notice the page while we are looking at its
+ page count. If it's a pagecache-page we'll free it
+ in one atomic transaction after checking its page count. */
+ spin_lock(&pagecache_lock);
+
/* avoid freeing the page while it's locked */
get_page(page);
/* Is it a buffer page? */
if (page->buffers) {
+ spin_unlock(&pagecache_lock);
if (!try_to_free_buffers(page))
goto unlock_continue;
/* page was locked, inode can't go away under us */
@@ -309,14 +306,9 @@
atomic_dec(&buffermem_pages);
goto made_buffer_progress;
}
+ spin_lock(&pagecache_lock);
}
- /* Take the pagecache_lock spinlock held to avoid
- other tasks to notice the page while we are looking at its
- page count. If it's a pagecache-page we'll free it
- in one atomic transaction after checking its page count. */
- spin_lock(&pagecache_lock);
-
/*
* We can't free pages unless there's just one user
* (count == 2 because we added one ourselves above).
@@ -325,6 +317,12 @@
goto cache_unlock_continue;
/*
+ * We did the page aging part.
+ */
+ if (nr_lru_pages < freepages.min * priority)
+ goto cache_unlock_continue;
+
+ /*
* Is it a page swap page? If so, we want to
* drop it if it is no longer used, even if it
* were to be marked referenced..
@@ -353,13 +351,21 @@
cache_unlock_continue:
spin_unlock(&pagecache_lock);
unlock_continue:
- spin_lock(&pagemap_lru_lock);
UnlockPage(page);
put_page(page);
+dispose_relock_continue:
+ /* even if the dispose list is local, a truncate_inode_page()
+ may remove a page from its queue so always
+ synchronize with the lru lock while accesing the
+ page->lru field */
+ spin_lock(&pagemap_lru_lock);
list_add(page_lru, dispose);
continue;
- /* we're holding pagemap_lru_lock, so we can just loop again */
+unlock_noput_continue:
+ UnlockPage(page);
+ goto dispose_relock_continue;
+
dispose_continue:
list_add(page_lru, dispose);
}
@@ -374,11 +380,6 @@
spin_lock(&pagemap_lru_lock);
/* nr_lru_pages needs the spinlock */
nr_lru_pages--;
-
- loop++;
- /* wrong zone? not looped too often? roll again... */
- if (page->zone != zone && loop < maxloop)
- goto again;
out:
list_splice(&young, &lru_cache);
Index: 9906.2/mm/page_alloc.c
--- 9906.2/mm/page_alloc.c Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/F/b/18_page_alloc 1.5.2.21 644)
+++ 9906.5/mm/page_alloc.c Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/F/b/18_page_alloc 1.5.2.21.2.1 644)
@@ -58,8 +58,6 @@
*/
#define BAD_RANGE(zone,x) (((zone) != (x)->zone) || (((x)-mem_map) < (zone)->offset) || (((x)-mem_map) >= (zone)->offset+(zone)->size))
-#if 0
-
static inline unsigned long classfree(zone_t *zone)
{
unsigned long free = 0;
@@ -73,8 +71,6 @@
return(free);
}
-#endif
-
/*
* Buddy system. Hairy. You really aren't expected to understand this
*
@@ -156,10 +152,8 @@
spin_unlock_irqrestore(&zone->lock, flags);
- if (zone->free_pages > zone->pages_high) {
- zone->zone_wake_kswapd = 0;
+ if (zone->free_pages > zone->pages_high)
zone->low_on_memory = 0;
- }
}
#define MARK_USED(index, order, area) \
@@ -186,8 +180,7 @@
return page;
}
-static FASTCALL(struct page * rmqueue(zone_t *zone, unsigned long order));
-static struct page * rmqueue(zone_t *zone, unsigned long order)
+static inline struct page * rmqueue(zone_t *zone, unsigned long order)
{
free_area_t * area = zone->free_area + order;
unsigned long curr_order = order;
@@ -227,115 +220,72 @@
return NULL;
}
-static int zone_balance_memory(zonelist_t *zonelist)
-{
- int tried = 0, freed = 0;
- zone_t **zone;
- int gfp_mask = zonelist->gfp_mask;
- extern wait_queue_head_t kswapd_wait;
-
- zone = zonelist->zones;
- for (;;) {
- zone_t *z = *(zone++);
- if (!z)
- break;
- if (z->free_pages > z->pages_low)
- continue;
-
- z->zone_wake_kswapd = 1;
- wake_up_interruptible(&kswapd_wait);
-
- /* Are we reaching the critical stage? */
- if (!z->low_on_memory) {
- /* Not yet critical, so let kswapd handle it.. */
- if (z->free_pages > z->pages_min)
- continue;
- z->low_on_memory = 1;
- }
- /*
- * In the atomic allocation case we only 'kick' the
- * state machine, but do not try to free pages
- * ourselves.
- */
- tried = 1;
- freed |= try_to_free_pages(gfp_mask, z);
- }
- if (tried && !freed) {
- if (!(gfp_mask & __GFP_HIGH))
- return 0;
- }
- return 1;
-}
-
/*
* This is the 'heart' of the zoned buddy allocator:
*/
struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
{
zone_t **zone = zonelist->zones;
- int gfp_mask = zonelist->gfp_mask;
- static int low_on_memory;
-
- /*
- * If this is a recursive call, we'd better
- * do our best to just allocate things without
- * further thought.
- */
- if (current->flags & PF_MEMALLOC)
- goto allocate_ok;
-
- /* If we're a memory hog, unmap some pages */
- if (current->hog && low_on_memory &&
- (gfp_mask & __GFP_WAIT))
- swap_out(4, gfp_mask);
/*
* (If anyone calls gfp from interrupts nonatomically then it
- * will sooner or later tripped up by a schedule().)
+ * will be sooner or later tripped up by a schedule().)
*
* We are falling back to lower-level zones if allocation
* in a higher zone fails.
*/
for (;;) {
zone_t *z = *(zone++);
+
if (!z)
break;
+
if (!z->size)
BUG();
- /* Are we supposed to free memory? Don't make it worse.. */
- if (!z->zone_wake_kswapd && z->free_pages > z->pages_low) {
+ /*
+ * If this is a recursive call, we'd better
+ * do our best to just allocate things without
+ * further thought.
+ */
+ if (!(current->flags & PF_MEMALLOC)) {
+ if (z->free_pages <= z->pages_high) {
+ unsigned long free = classfree(z);
+
+ if (free <= z->pages_low) {
+ extern wait_queue_head_t kswapd_wait;
+
+ z->low_on_memory = 1;
+ wake_up_interruptible(&kswapd_wait);
+ }
+
+ if (free <= z->pages_min) {
+ int gfp_mask = zonelist->gfp_mask;
+
+ if (!try_to_free_pages(gfp_mask, z)) {
+ if (!(gfp_mask & __GFP_HIGH))
+ return NULL;
+ }
+ }
+ }
+ }
+
+ /*
+ * This is an optimization for the 'higher order zone
+ * is empty' case - it can happen even in well-behaved
+ * systems, think the page-cache filling up all RAM.
+ * We skip over empty zones. (this is not exact because
+ * we do not take the spinlock and it's not exact for
+ * the higher order case, but will do it for most things.)
+ */
+ if (z->free_pages) {
struct page *page = rmqueue(z, order);
- low_on_memory = 0;
+
if (page)
return page;
}
}
-
- low_on_memory = 1;
- /*
- * Ok, no obvious zones were available, start
- * balancing things a bit..
- */
- if (zone_balance_memory(zonelist)) {
- zone = zonelist->zones;
-allocate_ok:
- for (;;) {
- zone_t *z = *(zone++);
- if (!z)
- break;
- if (z->free_pages) {
- struct page *page = rmqueue(z, order);
- if (page)
- return page;
- }
- }
- }
return NULL;
-
-/*
- * The main chunk of the balancing code is in this offline branch:
- */
}
/*
@@ -599,7 +549,6 @@
zone->pages_low = mask*2;
zone->pages_high = mask*3;
zone->low_on_memory = 0;
- zone->zone_wake_kswapd = 0;
zone->zone_mem_map = mem_map + offset;
zone->zone_start_mapnr = offset;
zone->zone_start_paddr = zone_start_paddr;
@@ -642,7 +591,8 @@
while (get_option(&str, &zone_balance_ratio[j++]) == 2);
printk("setup_mem_frac: ");
- for (j = 0; j < MAX_NR_ZONES; j++) printk("%d ", zone_balance_ratio[j]);
+ for (j = 0; j < MAX_NR_ZONES; j++)
+ printk("%d ", zone_balance_ratio[j]);
printk("\n");
return 1;
}
Index: 9906.2/include/linux/mmzone.h
--- 9906.2/include/linux/mmzone.h Thu, 27 Apr 2000 22:11:43 +0200 zcalusic (linux/u/c/2_mmzone.h 1.9 644)
+++ 9906.5/include/linux/mmzone.h Sun, 07 May 2000 20:39:35 +0200 zcalusic (linux/u/c/2_mmzone.h 1.10 644)
@@ -29,7 +29,6 @@
unsigned long offset;
unsigned long free_pages;
char low_on_memory;
- char zone_wake_kswapd;
unsigned long pages_min, pages_low, pages_high;
/*
[-- Attachment #3: Type: text/plain, Size: 12 bytes --]
--
Zlatko
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:21 [PATCH] Recent VM fiasco - fixed Zlatko Calusic
@ 2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
2000-05-09 7:56 ` Daniel Stone
0 siblings, 2 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 17:43 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> BTW, this patch mostly *removes* cruft recently added, and
> returns to the known state of operation.
Which doesn't work.
Think of a 1GB machine which has a 16MB DMA zone,
a 950MB normal zone and a very small HIGHMEM zone.
With the old VM code the HIGHMEM zone would be
swapping like mad while the other two zones are
idle.
It's Not That Kind Of Party(tm)
cheers,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:43 ` Rik van Riel
@ 2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
2000-05-09 7:56 ` Daniel Stone
1 sibling, 2 replies; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 18:16 UTC (permalink / raw)
To: riel; +Cc: linux-mm, linux-kernel, Linus Torvalds
Rik van Riel <riel@conectiva.com.br> writes:
> On 8 May 2000, Zlatko Calusic wrote:
>
> > BTW, this patch mostly *removes* cruft recently added, and
> > returns to the known state of operation.
>
> Which doesn't work.
>
> Think of a 1GB machine which has a 16MB DMA zone,
> a 950MB normal zone and a very small HIGHMEM zone.
>
> With the old VM code the HIGHMEM zone would be
> swapping like mad while the other two zones are
> idle.
>
> It's Not That Kind Of Party(tm)
>
OK, I see now what you have in mind, and I'll try to test it when I
get home (yes, I'm a late worker... it's my only connection to the Net :)).
If only I could buy 1GB to test it in the real setup. ;)
But still, optimizing for 1GB while at the same time completely
killing performance and even *usability* for 99% of users doesn't
look like a good solution, does it?
There were a lot of VM changes recently (>100K of patches), in which we
went further and further away from the (IMHO) mostly stable code base
trying to fix zone balancing. Maybe it's time we try again, fresh from
the "start"?
I'll admit I didn't understand most of the conversation about zone
balancing recently on linux-mm. And I know it's because I didn't have
much time lately to hack the kernel, unfortunately.
But after a few hours spent dealing with the horrible VM in pre6, I'm
not scared anymore. And I think the solution to all our problems with
zone balancing must be very simple. But it's probably hard to find, so
it will need lots of modeling and testing. I don't think adding a few
lines here and there all the time will take us anywhere.
Regards,
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:16 ` Zlatko Calusic
@ 2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-08 18:20 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: riel, linux-mm, linux-kernel
On 8 May 2000, Zlatko Calusic wrote:
>
> But still, optimizing for 1GB while at the same time completely
> killing performance and even *usability* for 99% of users doesn't
> look like a good solution, does it?
Oh, definitely. I'll make a new pre7 that has a lot of the simplifications
discussed here over the weekend, and that seems to work for me (tested
on both a 512MB setup and a 64MB setup for some sanity).
This pre7 almost certainly won't be all that perfect either, but it
gives a better starting point.
> But after a few hours spent dealing with the horrible VM in pre6,
> I'm not scared anymore.
Good. This is really not scary stuff. Much of it is quite straightforward,
and is mainly just getting the right "feel". It's really easy to make
mistakes here, but they tend to be mistakes that just make the system act
badly, not the kind of _really_ scary mistakes (the ones that make it
corrupt disks randomly ;)
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
@ 2000-05-08 18:46 ` Rik van Riel
2000-05-08 18:53 ` Zlatko Calusic
1 sibling, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 18:46 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
> > On 8 May 2000, Zlatko Calusic wrote:
> >
> > > BTW, this patch mostly *removes* cruft recently added, and
> > > returns to the known state of operation.
> >
> > Which doesn't work.
> >
> > Think of a 1GB machine which has a 16MB DMA zone,
> > a 950MB normal zone and a very small HIGHMEM zone.
> >
> > With the old VM code the HIGHMEM zone would be
> > swapping like mad while the other two zones are
> > idle.
> >
> > It's Not That Kind Of Party(tm)
>
> OK, I see now what you have in mind, and I'll try to test it when I
> get home (yes, I'm a late worker... it's my only connection to the Net :)).
> If only I could buy 1GB to test it in the real setup. ;)
>
> But still, optimizing for 1GB while at the same time completely
> killing performance and even *usability* for 99% of users doesn't
> look like a good solution, does it?
20MB and 24MB machines will be in the same situation, if
that's of any help to you ;)
> But after a few hours spent dealing with the horrible VM in pre6,
> I'm not scared anymore. And I think the solution to all our
> problems with zone balancing must be very simple.
It is. Linus is working on a conservative & simple solution
while I'm trying a bit more "far-out" code (active and inactive
list à la BSD, etc...). We should have at least one good VM
subsystem within the next few weeks ;)
regards,
Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:46 ` Rik van Riel
@ 2000-05-08 18:53 ` Zlatko Calusic
2000-05-08 19:04 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-08 18:53 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, linux-kernel, Linus Torvalds
Rik van Riel <riel@conectiva.com.br> writes:
> 20MB and 24MB machines will be in the same situation, if
> that's of any help to you ;)
>
Yes, you are right. And thanks for that tip (booting with mem=24m)
because that will be my first test case later tonight.
> > But after a few hours spent dealing with the horrible VM in pre6,
> > I'm not scared anymore. And I think the solution to all our
> > problems with zone balancing must be very simple.
>
> It is. Linus is working on a conservative & simple solution
> while I'm trying a bit more "far-out" code (active and inactive
> list à la BSD, etc...). We should have at least one good VM
> subsystem within the next few weeks ;)
>
Nice. I'm also in favour of some kind of active/inactive list
solution (looks promising), but that is probably 2.5.x stuff.
I would be happy to see 2.4 out ASAP. Later, when it stabilizes, we
will have lots of fun in 2.5, that's for sure.
Regards,
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 18:53 ` Zlatko Calusic
@ 2000-05-08 19:04 ` Rik van Riel
0 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-08 19:04 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: linux-mm, linux-kernel, Linus Torvalds
On 8 May 2000, Zlatko Calusic wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
>
> > > But after a few hours spent dealing with the horrible VM in pre6,
> > > I'm not scared anymore. And I think the solution to all our
> > > problems with zone balancing must be very simple.
> >
> > It is. Linus is working on a conservative & simple solution
> > while I'm trying a bit more "far-out" code (active and inactive
> > list à la BSD, etc...). We should have at least one good VM
> > subsystem within the next few weeks ;)
>
> Nice. I'm also in favour of some kind of active/inactive list
> solution (looks promising), but that is probably 2.5.x stuff.
I have it booting (against pre7-4) and it seems almost
stable ;) (with _low_ overhead)
> I would be happy to see 2.4 out ASAP. Later, when it stabilizes,
> we will have lots of fun in 2.5, that's for sure.
Of course, this has the highest priority.
regards,
Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
@ 2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
2000-05-09 10:21 ` Rik van Riel
1 sibling, 2 replies; 67+ messages in thread
From: Daniel Stone @ 2000-05-09 7:56 UTC (permalink / raw)
To: riel; +Cc: Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
Rik,
That's astonishing, I'm sure, but think of us poor bastards who DON'T have
an SMP machine with >1gig of RAM.
This is a P120 with 32MB. Lately, "fine" has degenerated into bad, then
into worse, then into absolutely obscene. It even kills my PGSQL
compiles. And I killed *EVERYTHING* there was to kill.
The only processes left were init, bash and gcc/cc1. The VM still wiped
it out.
d
On Mon, 8 May 2000, Rik van Riel wrote:
> On 8 May 2000, Zlatko Calusic wrote:
>
> > BTW, this patch mostly *removes* cruft recently added, and
> > returns to the known state of operation.
>
> Which doesn't work.
>
> Think of a 1GB machine which has a 16MB DMA zone,
> a 950MB normal zone and a very small HIGHMEM zone.
>
> With the old VM code the HIGHMEM zone would be
> swapping like mad while the other two zones are
> idle.
>
> It's Not That Kind Of Party(tm)
>
> cheers,
>
> Rik
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 7:56 ` Daniel Stone
@ 2000-05-09 8:25 ` Christoph Rohland
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 10:21 ` Rik van Riel
1 sibling, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-09 8:25 UTC (permalink / raw)
To: Daniel Stone; +Cc: riel, Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
Daniel Stone <tamriel@ductape.net> writes:
> That's astonishing, I'm sure, but think of us poor bastards who
> DON'T have an SMP machine with >1gig of RAM.
He has to care about us fortunate guys with e.g. 8GB of memory, too.
The recent kernels are broken for that as well.
Greetings
Christoph
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 8:25 ` Christoph Rohland
@ 2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-09 15:44 UTC (permalink / raw)
To: Christoph Rohland
Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
On 9 May 2000, Christoph Rohland wrote:
> Daniel Stone <tamriel@ductape.net> writes:
>
> > That's astonishing, I'm sure, but think of us poor bastards who
> > DON'T have an SMP machine with >1gig of RAM.
>
> He has to care about us fortunate guys with e.g. 8GB of memory, too.
> The recent kernels are broken for that as well.
Try out the really recent one - pre7-8. So far it has some good reviews,
and I've tested it on both a 20MB machine and a 512MB one..
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
@ 2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
2000-05-10 4:05 ` James H. Cloos Jr.
2 siblings, 0 replies; 67+ messages in thread
From: Simon Kirby @ 2000-05-09 16:12 UTC (permalink / raw)
To: linux-mm, linux-kernel
On Tue, May 09, 2000 at 08:44:43AM -0700, Linus Torvalds wrote:
> On 9 May 2000, Christoph Rohland wrote:
>
> > Daniel Stone <tamriel@ductape.net> writes:
> >
> > > That's astonishing, I'm sure, but think of us poor bastards who
> > > DON'T have an SMP machine with >1gig of RAM.
> >
> > He has to care about us fortunate guys with e.g. 8GB of memory, too.
> > The recent kernels are broken for that as well.
>
> Try out the really recent one - pre7-8. So far it has some good reviews,
> and I've tested it on both a 20MB machine and a 512MB one..
On my dual 450 MHz SMP box with 128 MB of RAM, there's still
definitely something broken (pre7-8). I notice it most with mutt
loading the linux-kernel folder... The folder is about 54 MB, and it
takes kswapd about 3 to 4 seconds of CPU time to clear out old stuff
when it loads. This is pretty bad considering mutt itself takes only
about 5 seconds of real time to load the folder.
The main thing that fills up my cache is playback of MP3s off disk,
which is pretty much running all the time. If I open the folder, quit,
let the MP3 playback eat up the free memory into cache, and then run
mutt again, kswapd's CPU use goes up another 3 or 4 seconds.
I never used to see this with 2.2 kernels...
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
@ 2000-05-09 17:42 ` Christoph Rohland
2000-05-09 19:50 ` Linus Torvalds
2000-05-10 4:05 ` James H. Cloos Jr.
2 siblings, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-09 17:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> Try out the really recent one - pre7-8. So far it has some good reviews,
> and I've tested it both on a 20MB machine and a 512MB one..
Nope, it more or less locks up after the first attempt to swap
something out. I can still run ls and free, but as soon as something
touches /proc it locks up. Also my test programs don't do anything
any more.
I append the mem and task info from sysrq. Mem info seems to not
change after lockup.
Greetings
Christoph
SysRq: Show Memory
Mem-info:
Free pages: 713756kB ( 2040kB HighMem)
( Free: 178439, lru_cache: 3149 (1024 2048 3072) )
DMA: 1*4kB 2*8kB 1*16kB 4*32kB 3*64kB 3*128kB 1*256kB 1*512kB 0*1024kB 6*2048kB = 13796kB)
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 340*2048kB = 697920kB)
HighMem: 2*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB = 2040kB)
Swap cache: add 0, delete 0, find 0/0
Free swap: 4048296kB
2162688 pages of RAM
1867776 pages of HIGHMEM
104332 reserved pages
868894 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory: 1340kB
CLEAN: 175 buffers, 700 kbyte, 3 used (last=47), 0 locked, 0 protected, 0 dirty
LOCKED: 217 buffers, 868 kbyte, 19 used (last=190), 0 locked, 0 protected, 0 dirty
SysRq: Show State
free sibling
task PC stack pid father child younger older
init R C1089F0C 0 1 0 612 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
kswapd D C4F154E8 0 2 1 (L-TLB) 3
sig: 0 0000000000000000 ffffffffffffffff : X
kflushd S C3FEC000 0 3 1 (L-TLB) 4 2
sig: 0 0000000000000000 ffffffffffffffff : X
kupdate R C3FEBFC4 0 4 1 (L-TLB) 278 3
sig: 0 0000000000000000 fffffffffff9ffff : X
portmap S 7FFFFFFF 2856 278 1 (NOTLB) 341 4
sig: 0 0000000000000000 0000000000000000 : X
syslogd R 7FFFFFFF 0 341 1 (NOTLB) 352 278
sig: 1 0000000000002000 0000000000000000 : 14 X
klogd R C3E78000 0 352 1 (NOTLB) 368 341
sig: 0 0000000000000000 0000000000000000 : X
atd S C3E49F78 2856 368 1 (NOTLB) 384 352
sig: 0 0000000000000000 0000000000010000 : X
crond R C3E3DF78 2856 384 1 (NOTLB) 404 368
sig: 0 0000000000000000 0000000000000000 : X
inetd S 7FFFFFFF 2856 404 1 (NOTLB) 413 384
sig: 0 0000000000000000 0000000000000000 : X
sshd S 7FFFFFFF 0 413 1 634 (NOTLB) 429 404
sig: 0 0000000000000000 0000000000000000 : X
lpd S 7FFFFFFF 0 429 1 (NOTLB) 469 413
sig: 0 0000000000000000 0000000000000000 : X
automount R C3EC56C0 0 469 1 (NOTLB) 471 429
sig: 1 0000000000002000 0000000000000000 : 14 X
automount R CD486AA0 4992 471 1 (NOTLB) 511 469
sig: 1 0000000000002000 0000000000000000 : 14 X
sendmail R C119FF0C 5956 511 1 (NOTLB) 528 471
sig: 0 0000000000000000 0000000000000000 : X
gpm S C117BF0C 0 528 1 (NOTLB) 544 511
sig: 0 0000000000000000 0000000000000000 : X
httpd R C1181F0C 0 544 1 557 (NOTLB) 571 528
sig: 0 0000000000000000 0000000000000000 : X
httpd S C117DF38 0 548 544 (NOTLB) 549
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1185F38 0 549 544 (NOTLB) 550 548
sig: 0 0000000000000000 0000000000000000 : X
httpd S 7FFFFFFF 0 550 544 (NOTLB) 551 549
sig: 0 0000000000000000 0000000000000000 : X
httpd S C113FF38 0 551 544 (NOTLB) 552 550
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1133F38 0 552 544 (NOTLB) 553 551
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1129F38 0 553 544 (NOTLB) 554 552
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1127F38 0 554 544 (NOTLB) 555 553
sig: 0 0000000000000000 0000000000000000 : X
httpd S C110FF38 0 555 544 (NOTLB) 556 554
sig: 0 0000000000000000 0000000000000000 : X
httpd S C1101F38 0 556 544 (NOTLB) 557 555
sig: 0 0000000000000000 0000000000000000 : X
httpd S F75F5F38 0 557 544 (NOTLB) 556
sig: 0 0000000000000000 0000000000000000 : X
xfs S F75B7F0C 0 571 1 (NOTLB) 606 544
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 5124 606 1 (NOTLB) 607 571
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 607 1 (NOTLB) 608 606
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 608 1 (NOTLB) 609 607
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 609 1 (NOTLB) 610 608
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 610 1 (NOTLB) 611 609
sig: 0 0000000000000000 0000000000000000 : X
mingetty S 7FFFFFFF 2856 611 1 (NOTLB) 612 610
sig: 0 0000000000000000 0000000000000000 : X
login S 00000000 2856 612 1 617 (NOTLB) 611
sig: 0 0000000000000000 0000000000000000 : X
bash S 00000000 0 617 612 633 (NOTLB)
sig: 0 0000000000000000 0000000000010000 : X
vmstat R F74E5F78 0 633 617 (NOTLB)
sig: 1 0000000000080000 0000000000000000 : 20 X
sshd R 7FFFFFFF 0 634 413 636 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
xterm S 7FFFFFFF 4900 636 634 639 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
bash S 7FFFFFFF 0 639 636 652 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F746A000 2856 642 639 651 (NOTLB) 652
sig: 0 0000000000000000 0000000000000000 : X
ipctst D F6AB52B4 2856 643 642 (NOTLB) 644
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7458000 0 644 642 (NOTLB) 645 643
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7448000 0 645 642 (NOTLB) 646 644
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F7436000 0 646 642 (NOTLB) 647 645
sig: 0 0000000000000000 0000000000000000 : X
ipctst R C01DCB90 0 647 642 (NOTLB) 648 646
sig: 0 0000000000000000 0000000000000000 : X
ipctst R current 0 648 642 (NOTLB) 649 647
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F746FCB4 0 649 642 (NOTLB) 650 648
sig: 0 0000000000000000 0000000000000000 : X
ipctst R C0123017 0 650 642 (NOTLB) 651 649
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F73E2000 0 651 642 (NOTLB) 650
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F678C000 0 652 639 653 (NOTLB) 642
sig: 0 0000000000000000 0000000000000000 : X
ipctst R F6784000 5612 653 652 (NOTLB)
sig: 0 0000000000000000 0000000000000000 : X
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 17:42 ` Christoph Rohland
@ 2000-05-09 19:50 ` Linus Torvalds
2000-05-10 11:25 ` Christoph Rohland
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-09 19:50 UTC (permalink / raw)
To: Christoph Rohland
Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
On 9 May 2000, Christoph Rohland wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > Try out the really recent one - pre7-8. So far it has some good reviews,
> > and I've tested it both on a 20MB machine and a 512MB one..
>
> Nope, does more or less lockup after the first attempt to swap
> something out. I can still run ls and free. but as soon as something
> touches /proc it locks up. Also my test programs do not do anything
> any more.
This may be due to an unrelated bug with the task_lock() fixing (see
separate patch from Manfred for that one).
> I append the mem and task info from sysrq. Mem info seems to not
> change after lockup.
I suspect that if you do right-alt + scrolllock, you'll see it looping on
a spinlock. Which is why the memory info isn't changing ;)
But I'll double-check the shm code (I didn't test anything that did any
shared memory, for example).
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 19:50 ` Linus Torvalds
@ 2000-05-10 11:25 ` Christoph Rohland
2000-05-10 11:50 ` Zlatko Calusic
0 siblings, 1 reply; 67+ messages in thread
From: Christoph Rohland @ 2000-05-10 11:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Daniel Stone, riel, Zlatko Calusic, linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> On 9 May 2000, Christoph Rohland wrote:
>
> > Linus Torvalds <torvalds@transmeta.com> writes:
> >
> > > Try out the really recent one - pre7-8. So far it has some good reviews,
> > > and I've tested it both on a 20MB machine and a 512MB one..
> > I append the mem and task info from sysrq. Mem info seems to not
> > change after lockup.
>
> I suspect that if you do right-alt + scrolllock, you'll see it looping on
> a spinlock. Which is why the memory info isn't changing ;)
>
> But I'll double-check the shm code (I didn't test anything that did any
> shared memory, for example).
Juan Quintela's patch fixes the lockup. shm paging locked up on the
page lock.
Now I can give more data about pre7-8. After a short run I can say the
following:
The machine seems to be stable, but VM is mainly unbalanced:
[root@ls3016 /root]# vmstat 5
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
[...]
9 3 0 0 1460016 1588 11284 0 0 0 0 109 23524 4 96 0
9 3 1 7552 557432 1004 19320 0 1607 0 402 186 42582 2 89 9
11 1 1 41972 111368 424 53740 0 6884 2 1721 277 25904 0 89 10
11 1 0 48084 11896 276 59404 0 1133 1 284 181 4439 0 95 5
13 2 2 48352 466952 180 52960 5 158 4 39 230 6381 2 98 0
10 3 1 53400 934204 248 59940 498 1442 128 363 272 3953 1 99 0
11 3 1 52624 878696 300 59820 248 50 81 13 148 971 0 100 0
11 1 0 4556 883852 316 16164 855 0 214 1 127 25188 3 97 0
12 0 0 3936 525620 316 15544 0 0 0 0 109 33969 4 96 0
12 0 0 3936 2029556 316 15544 0 0 0 0 123 19659 4 96 0
11 1 0 3936 686856 316 15544 0 0 0 0 117 14370 3 97 0
12 0 0 3936 388176 320 15544 0 0 0 0 121 7477 3 97 0
10 3 1 47660 5216 88 19992 0 9353 0 2341 757 1267 0 97 3
VM: killing process ipctst
6 6 1 36792 484880 152 26892 65 12307 21 3078 1619 2184 0 94 6
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
10 1 1 39620 66736 148 29364 8 494 2 125 327 1980 0 100 0
VM: killing process ipctst
9 2 1 46536 627356 116 31072 87 8675 23 2169 1784 1412 0 96 4
10 0 1 46664 617368 116 31200 0 26 0 6 258 112 0 100 0
10 0 1 47300 607184 116 31832 0 126 0 32 291 110 0 100 0
So we are swapping out with lots of free memory and killing random
processes. The machine also becomes quite unresponsive compared to
pre4 on the same tests.
Greetings
Christoph
--
Christoph Rohland Tel: +49 6227 748201
SAP AG Fax: +49 6227 758201
LinuxLab Email: cr@sap.com
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 11:25 ` Christoph Rohland
@ 2000-05-10 11:50 ` Zlatko Calusic
2000-05-11 23:40 ` Mark Hahn
0 siblings, 1 reply; 67+ messages in thread
From: Zlatko Calusic @ 2000-05-10 11:50 UTC (permalink / raw)
To: Christoph Rohland
Cc: Linus Torvalds, Daniel Stone, riel, linux-mm, linux-kernel
Christoph Rohland <cr@sap.com> writes:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > On 9 May 2000, Christoph Rohland wrote:
> >
> > > Linus Torvalds <torvalds@transmeta.com> writes:
> > >
> > > > Try out the really recent one - pre7-8. So far it has some good reviews,
> > > > and I've tested it both on a 20MB machine and a 512MB one..
>
> > > I append the mem and task info from sysrq. Mem info seems to not
> > > change after lockup.
> >
> > I suspect that if you do right-alt + scrolllock, you'll see it looping on
> > a spinlock. Which is why the memory info isn't changing ;)
> >
> > But I'll double-check the shm code (I didn't test anything that did any
> > shared memory, for example).
>
> Juan Quintela's patch fixes the lockup. shm paging locked up on the
> page lock.
>
> Now I can give more data about pre7-8. After a short run I can say the
> following:
>
> The machine seems to be stable, but VM is mainly unbalanced:
>
> [root@ls3016 /root]# vmstat 5
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
>
> [...]
>
> 9 3 0 0 1460016 1588 11284 0 0 0 0 109 23524 4 96 0
> 9 3 1 7552 557432 1004 19320 0 1607 0 402 186 42582 2 89 9
> 11 1 1 41972 111368 424 53740 0 6884 2 1721 277 25904 0 89 10
[ too many lines error, truncating... ]
> 9 2 1 46536 627356 116 31072 87 8675 23 2169 1784 1412 0 96 4
> 10 0 1 46664 617368 116 31200 0 26 0 6 258 112 0 100 0
> 10 0 1 47300 607184 116 31832 0 126 0 32 291 110 0 100 0
>
> So we are swapping out with lots of free memory and killing random
> processes. The machine also becomes quite unresponsive compared to
> pre4 on the same tests.
>
I'll second this!
I checked pre7-8 briefly, but I/O & MM interaction is bad. Lots of
swapping, lots of wasted CPU cycles and lots of dead writer processes
(write(2): out of memory, while there is 100MB in the page cache).
Back to my patch and working on the solution for the 20-24 MB & 1GB
machines. Anybody with spare 1GB RAM to help development? :)
--
Zlatko
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 11:50 ` Zlatko Calusic
@ 2000-05-11 23:40 ` Mark Hahn
0 siblings, 0 replies; 67+ messages in thread
From: Mark Hahn @ 2000-05-11 23:40 UTC (permalink / raw)
To: linux-mm
> I checked pre7-8 briefly, but I/O & MM interaction is bad. Lots of
> swapping, lots of wasted CPU cycles and lots of dead writer processes
> (write(2): out of memory, while there is 100MB in the page cache).
I've checked pre7-8 and -9 fairly extensively, and it works GREAT.
this is the first kernel since around 2.3.36 that passes my main criteria:
1. I have an app that sequentially traverses 12 40M chunks of data by
mmaping one, reading each u16, unmapping, on to the next. until
pre7-8, old 40M chunks would NOT be scavenged, and instead the ~10M
rss of the analysis program would be thrashed, over and over.
with pre7-8 and -9, there's only incidental swapping, and performance
is roughly 2.2x better than preceding kernels.
2. big compilations (kernel make -j2) seem to run fine:
under 2.3.99-7-8:
334.65user 20.28system 3:01.53elapsed 195%CPU (330186major+472843minor)pf
334.23user 20.28system 2:58.13elapsed 199%CPU (340672major+472770minor)pf
334.33user 20.28system 2:57.79elapsed 199%CPU (329202major+472769minor)pf
287.99user 17.51system 2:33.72elapsed 198%CPU (270411major+396913minor)pf
335.65user 20.31system 3:01.13elapsed 196%CPU (332370major+472770minor)pf
under 2.3.99-pre7 (somewhat hacked):
333.55user 20.37system 3:19.69elapsed 177%CPU (341428major+472709minor)
334.02user 19.53system 3:09.28elapsed 186%CPU (330283major+472709minor)
334.57user 18.98system 3:08.02elapsed 188%CPU (328941major+472709minor)
334.89user 18.97system 3:07.91elapsed 188%CPU (328941major+472709minor)
333.22user 20.36system 3:07.75elapsed 188%CPU (328941major+472709minor)
334.15user 19.42system 3:07.84elapsed 188%CPU (328941major+472709minor)
under 2.3.36:
332.59user 19.93system 3:38.24elapsed 161%CPU (331704major+468634minor)
332.16user 21.14system 3:07.62elapsed 188%CPU (328998major+468634minor)
296.87user 17.93system 2:39.25elapsed 197%CPU (284086major+408452minor)
332.48user 20.89system 3:07.80elapsed 188%CPU (328998major+468634minor)
296.28user 18.08system 2:39.04elapsed 197%CPU (283978major+408169minor)
under 2.3.99-7-9:
331.28user 21.01system 3:18.83elapsed 177%CPU (328941major+472703minor)
334.06user 19.17system 3:07.72elapsed 188%CPU (328941major+472703minor)
332.79user 20.59system 3:07.73elapsed 188%CPU (328941major+472703minor)
334.29user 19.22system 3:07.55elapsed 188%CPU (328941major+472703minor)
332.25user 20.96system 3:07.55elapsed 188%CPU (328941major+472703minor)
332.09user 21.45system 3:07.67elapsed 188%CPU (328941major+472703minor)
334.04user 19.62system 3:07.72elapsed 188%CPU (328941major+472703minor)
334.38user 18.98system 3:07.50elapsed 188%CPU (328941major+472703minor)
333.67user 19.54system 3:07.54elapsed 188%CPU (328941major+472703minor)
wow, those identical PF numbers are kinda eerie! the machine was otherwise
idle during these tests, but not single-user. I don't really understand
why 2.3.36 would sometimes perform *significantly* better.
3. disk bandwidth (bonnie) is excellent on 2.3.99-7-8 or -9
I usually use this machine remotely, so I can't comment on "feel".
big memory or IO load didn't seem to hurt the update latency of top/vmstat
type tools. machine is a dual celeron/550, bx, 128M, single udma.
I briefly tested a kernel build on an old 32M cyrix 166, and it
was a little slower than 2.3.36.
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
@ 2000-05-10 4:05 ` James H. Cloos Jr.
2000-05-10 7:29 ` James H. Cloos Jr.
2 siblings, 1 reply; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-10 4:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Linus> Try out the really recent one - pre7-8. So far it has some good
Linus> reviews, and I've tested it both on a 20MB machine and a 512MB
Linus> one..
pre7-8 still isn't completely fixed, but it is better than pre6.
Try doing something like 'cp -a linux-2.3.99-pre7-8 foobar' and
watching kswapd in top (or qps, et al.). On my dual-proc box, kswapd
still maxes out one of the cpus. Tar doesn't seem to show it, but
bzcat can get an occasional segfault on large files.
The filesystem, though, has 1k rather than 4k blocks. Yeah, just
tested again on a fs w/ 4k blocks. kswapd only used 50% to 65% of a
cpu, but that was an ide drive and the former was on a scsi drive.[1]
OTOH, in pre6 X would hit (or at least report) 2^32-1 major faults
after only a few hours of usage. That bug is gone in pre7-8.
[1] asus p2b-ds mb using onboard adaptec scsi and piix ide; drives are
all IBM ultrastars and deskstars.
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Save Trees: Get E-Gold! <URL:http://jhcloos.com/go?e-gold>
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 4:05 ` James H. Cloos Jr.
@ 2000-05-10 7:29 ` James H. Cloos Jr.
2000-05-11 0:16 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-10 7:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
Ok. Tried w/ Manfred's patch (i.e. the 2nd half). kswapd still uses a lot
of cpu doing recursive cp(1)s, but it is less than in virgin pre7-8. I
got about 10s of cpu for cp and 40s for kswapd doing a cp -a of the 7-8
tree (after compiling) on the ide drive (w/ 4k ext2 blocks). On the 1k
ext2 block scsi partition, it was 1m50s for kswapd and 20s for cp to cp
three such trees. kswapd %cpu never exceeded 65% on the latter and 50%
on the former; substantially better than in virgin 7-8, but not as good
as earlier kernels (though I don't have any numbers to back that up). I
did this test in single user mode w/ only top running (on another vc).
Hope the datapoint helps!
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Save Trees: Get E-Gold! <URL:http://jhcloos.com/go?e-gold>
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-10 7:29 ` James H. Cloos Jr.
@ 2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
` (3 more replies)
0 siblings, 4 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 0:16 UTC (permalink / raw)
To: James H. Cloos Jr.; +Cc: linux-mm, linux-kernel
Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
actually how block fs dirty data is flushed out. Instead of just waking up
kflushd and hoping for the best, we actually just write it out (and even
wait on it, if absolutely required).
Which makes the whole process much more streamlined, and makes the numbers
more repeatable. It also fixes the problem with dirty buffer cache data
much more efficiently than the kflushd approach, and mmap002 is not a
problem any more. At least for me.
[ I noticed that mmap002 finishes a whole lot faster if I never actually
wait for the writes to complete, but that had some nasty behaviour under
low memory circumstances, so it's not what pre7-9 actually does. I
_suspect_ that I should start actually waiting for pages only when
priority reaches 0 - comments welcomed, see fs/buffer.c and the
sync_page_buffers() function ]
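The policy in the bracketed note can be modeled in a few lines of user-space C. This is only a toy illustration of the idea (block on locked buffers only once allocation priority reaches 0); the struct and function names are invented and this is not the actual fs/buffer.c code:

```c
#include <stdbool.h>
#include <stddef.h>

struct buf {
    bool dirty;   /* needs writing out */
    bool locked;  /* I/O already in flight */
};

/* Write out dirty buffers; skip locked ones unless the allocation is
 * critical (priority 0), in which case "wait" for their I/O first.
 * Returns the number of buffers written out. */
int sync_buffers(struct buf *bufs, size_t n, int priority)
{
    bool wait = (priority == 0);    /* only block for critical allocations */
    int written = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].locked && !wait)
            continue;               /* in-flight I/O: don't stall, move on */
        if (bufs[i].locked)
            bufs[i].locked = false; /* stand-in for waiting on the I/O */
        if (bufs[i].dirty) {
            bufs[i].dirty = false;  /* stand-in for submitting the write */
            written++;
        }
    }
    return written;
}
```

At high priority a locked-and-dirty buffer is simply skipped; at priority 0 the same buffer is waited on and then written, which is the "wait on it, if absolutely required" behaviour described above.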
kswapd is still quite aggressive, and will show higher CPU time than
before. This is a tweaking issue - I suspect it is too aggressive right
now, but it needs more testing and feedback.
Just the dirty buffer handling made quite an enormous difference, so
please do test this if you hated earlier pre7 kernels.
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
@ 2000-05-11 0:32 ` Linus Torvalds
2000-05-11 16:36 ` [PATCH] Recent VM fiasco - fixed (pre7-9) Rajagopal Ananthanarayanan
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
` (2 subsequent siblings)
3 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 0:32 UTC (permalink / raw)
To: Rajagopal Ananthanarayanan, Juan J. Quintela, Rik van Riel; +Cc: linux-mm
Some more explanations on the differences between pre7-8 and pre7-9..
Basically pre7-9 survives mmap002 quite gracefully, and I think it does so
for all the right reasons. It's not tuned for that load at all, it's just
that mmap002 was really good at showing two weak points of the mm layer:
- try_to_free_pages() could actually return success without freeing a
single page (just moving pages around to the swap cache). This was bad,
because it could cause us to get into a situation where we
"successfully" free'd pages without ever adding any to the list. Which
would, for all the obvious reasons, cause problems later when we
couldn't allocate a page after all..
- The "sync_page_buffers()" thing to sync pages directly to disk rather
than wait for bdflush to do it for us (and have people run out of
memory before bdflush got around to the right pages).
Sadly, as it was set up, try_to_free_buffers() doesn't even get the
"urgency" flag, so right now it doesn't know whether it should wait for
previous write-outs or not. So it always does, even though for
non-critical allocations it should just ignore locked buffers.
Fixing these things suddenly made mmap002 behave quite well. I'll make the
change to pass in the priority to sync_page_buffers() so that I'll get the
increased performance from not waiting when I don't have to, but it starts
to look like pre7 is getting in shape.
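The first weak point boils down to an accounting rule: a page moved into the swap cache still occupies memory, so it must not be counted as freed. A toy user-space C model of that rule (invented names, not the real try_to_free_pages()):

```c
#include <stddef.h>

enum where { PAGE_MAPPED, PAGE_SWAP_CACHE, PAGE_FREE };

/* One shrink pass over a set of pages.  Unmapping a page into the swap
 * cache is progress, but only a page that actually reaches the free
 * list counts toward the return value -- counting the swap-cache moves
 * is what let the old code claim "success" with an empty free list. */
int shrink_pass(enum where *pages, size_t n)
{
    int freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == PAGE_MAPPED) {
            pages[i] = PAGE_SWAP_CACHE; /* unmapped, but still holds memory */
        } else if (pages[i] == PAGE_SWAP_CACHE) {
            pages[i] = PAGE_FREE;       /* written out and released */
            freed++;                    /* only this counts as freed */
        }
    }
    return freed;
}
```

Under the old accounting a first pass over mostly-mapped pages would have reported every swap-cache move as a success; here it reports only the pages that actually reached the free list.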
Linus
On Wed, 10 May 2000, Linus Torvalds wrote:
>
> Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
> actually how block fs dirty data is flushed out. Instead of just waking up
> kflushd and hoping for the best, we actually just write it out (and even
> wait on it, if absolutely required).
>
> Which makes the whole process much more streamlined, and makes the numbers
> more repeatable. It also fixes the problem with dirty buffer cache data
> much more efficiently than the kflushd approach, and mmap002 is not a
> problem any more. At least for me.
>
> [ I noticed that mmap002 finishes a whole lot faster if I never actually
> wait for the writes to complete, but that had some nasty behaviour under
> low memory circumstances, so it's not what pre7-9 actually does. I
> _suspect_ that I should start actually waiting for pages only when
> priority reaches 0 - comments welcomed, see fs/buffer.c and the
> sync_page_buffers() function ]
>
> kswapd is still quite aggressive, and will show higher CPU time than
> before. This is a tweaking issue - I suspect it is too aggressive right
> now, but it needs more testing and feedback.
>
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
>
> Linus
>
>
* Re: [PATCH] Recent VM fiasco - fixed (pre7-9)
2000-05-11 0:32 ` Linus Torvalds
@ 2000-05-11 16:36 ` Rajagopal Ananthanarayanan
0 siblings, 0 replies; 67+ messages in thread
From: Rajagopal Ananthanarayanan @ 2000-05-11 16:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, Rik van Riel, linux-mm
Dbench runs well on pre7-9. As far as I can tell,
there were NO failures in 15 hours of running,
the longest I've ever run this test. The performance has been
pretty good. Swapping was initially very low, although
it didn't affect performance. Later, presumably as more
periodic system processes started to run, the swap
level increased, but stayed at the "usual" levels observed
before ... the swap build-up was gradual, likely indicating
that the right things were swapped out only when necessary.
regards,
ananth.
--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
@ 2000-05-11 1:04 ` Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
3 siblings, 2 replies; 67+ messages in thread
From: Juan J. Quintela @ 2000-05-11 1:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
linus> Which makes the whole process much more streamlined, and makes the numbers
linus> more repeatable. It also fixes the problem with dirty buffer cache data
linus> much more efficiently than the kflushd approach, and mmap002 is not a
linus> problem any more. At least for me.
linus> [ I noticed that mmap002 finishes a whole lot faster if I never actually
linus> wait for the writes to complete, but that had some nasty behaviour under
linus> low memory circumstances, so it's not what pre7-9 actually does. I
linus> _suspect_ that I should start actually waiting for pages only when
linus> priority reaches 0 - comments welcomed, see fs/buffer.c and the
linus> sync_page_buffers() function ]
Hi
I have done my normal mmap002 test and this goes slower than
ever; it takes something like 3m50s to complete (pre7-8: 2m50,
andrea classzone: 2m8, and 2.2.15: 1m55, for reference). I have no more
time now to do more testing; I will continue late tomorrow. My
findings are:
real 3m41.403s
user 0m16.010s
sys 0m36.890s
It takes the same user time as earlier versions, but the system
time has increased a lot; it was ~10-12 seconds in pre7-8 and around 8
in classzone and 2.2.15.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
@ 2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
2000-05-11 5:10 ` Linus Torvalds
1 sibling, 2 replies; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 1:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]
On Thu, May 11, 2000 at 03:04:37AM +0200, Juan J. Quintela wrote:
> >>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
>
> linus> Which makes the whole process much more streamlined, and makes the numbers
> linus> more repeatable. It also fixes the problem with dirty buffer cache data
> linus> much more efficiently than the kflushd approach, and mmap002 is not a
> linus> problem any more. At least for me.
>...
> I have done my normal mmap002 test and this goes slower than
> ever; it takes something like 3m50s to complete (pre7-8: 2m50,
> andrea classzone: 2m8, and 2.2.15: 1m55, for reference). I have no more
> time now to do more testing; I will continue late tomorrow. My
> findings are:
>
> real 3m41.403s
> user 0m16.010s
> sys 0m36.890s
>
>
> It takes the same user time as earlier versions, but the system
> time has increased a lot; it was ~10-12 seconds in pre7-8 and around 8
> in classzone and 2.2.15.
I, too, see unbelievably slow writing now when uncompressing large data
files. 128 MB, dual processor, IDE drive. It seems to be synchronously
writing out data as it's dirtied, not grouping it into blocks at all like
it used to. This would probably increase seeking, no?
Trying now with Andrea's classzone-27 against pre7-8, the results are
much better.
I attached vmstat-1.txt (2.3.99pre7-8+classzone-27) and vmstat-2.txt
(2.3.99pre7-9), which are outputs from "vmstat 1" when uncompressing the
same thing. 2.3.99pre7-9 seems to be taking about twice as long (real
time). This is from and to a 4K EXT2 filesystem. Both seem to swap out
some, which I guess is arguably good or bad...
Is Andrea taking too dangerous an approach for the current kernel version,
or are you trying to get something extremely simple working instead?
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-1.txt --]
[-- Type: text/plain, Size: 13520 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 106132 1100 10048 0 0 63 1 70 48 2 3 95
0 0 0 0 106132 1100 10048 0 0 0 0 102 6 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 117 8 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 103 10 0 0 100
0 0 0 0 106132 1100 10048 0 0 1 0 102 16 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 103 10 0 0 99
0 0 0 0 106132 1100 10048 0 0 0 0 104 10 0 0 100
0 0 0 0 106132 1100 10048 0 0 0 0 104 6 0 1 99
0 0 0 0 106132 1100 10048 0 0 0 0 105 16 0 0 100
0 1 0 0 105904 1116 10236 0 0 100 68 167 50 0 0 100
1 0 0 0 75028 1164 38844 0 0 3670 0 340 261 20 13 67
0 1 0 0 55968 1188 57268 0 0 2322 500 392 277 15 6 79
0 1 0 0 44000 1200 68856 0 0 1442 1500 624 176 7 7 86
0 1 0 0 31836 1216 80620 0 0 1488 1500 593 178 9 6 85
0 1 0 0 19936 1224 92140 0 0 1441 1500 611 170 9 6 85
1 0 0 0 9292 1240 102436 0 0 1296 1000 587 146 5 3 92
0 1 0 248 2880 900 109460 84 324 1239 1581 433 158 8 7 85
0 1 0 248 2672 260 110496 0 0 1474 1500 580 186 8 7 85
1 0 0 232 2748 260 110408 44 0 1308 1000 540 219 8 9 83
0 1 0 232 2400 268 110736 0 0 1251 1500 483 175 8 5 86
0 1 0 232 2392 260 110752 0 0 1441 1500 482 196 8 5 86
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 232 2052 268 111140 0 0 1489 1500 522 206 7 7 85
0 1 0 232 2900 264 110064 0 0 1313 1000 647 169 6 6 88
0 1 0 232 2496 264 110644 0 0 1232 1500 494 171 8 3 89
0 1 0 232 2612 268 110528 0 0 1474 1500 552 206 9 5 85
1 0 0 232 2544 264 110668 0 0 1153 1000 623 146 6 3 90
0 1 1 504 1792 276 111656 0 272 1072 2068 656 499 7 5 87
0 1 0 540 2876 268 110720 0 36 1378 1009 618 202 9 6 85
0 1 0 540 3196 256 110484 0 0 1345 1000 520 189 9 5 86
1 0 0 540 3112 264 110616 0 0 1170 1000 409 150 8 5 87
0 1 0 540 2708 264 110952 0 0 1346 1500 450 139 10 4 86
1 0 0 540 2500 272 111216 0 0 1315 1000 432 140 10 6 84
1 0 0 540 2172 268 111540 0 0 1199 1500 491 162 6 4 90
0 1 0 540 2284 272 111364 0 0 1443 1500 438 135 7 7 86
0 1 0 540 2852 272 110796 0 0 1089 1000 465 148 7 5 88
0 1 0 540 2980 264 110676 0 0 1423 1500 380 140 7 7 86
0 1 0 540 2584 272 111068 0 0 1059 1000 421 146 9 3 88
0 1 0 988 2556 276 111524 0 448 1473 1612 523 148 10 6 84
1 0 0 988 2704 264 111408 0 0 1135 1000 423 163 6 6 88
0 1 0 988 2616 272 111480 0 0 1443 1500 482 165 8 7 85
0 1 0 988 2796 268 111296 0 0 1392 1000 390 169 9 5 86
0 1 0 988 3048 272 111048 0 0 1122 1500 450 161 8 4 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 988 3160 268 110936 0 0 1441 1500 540 166 8 6 86
0 1 0 988 2844 268 111316 0 0 1200 1000 479 120 10 3 87
0 1 0 988 2676 284 111396 0 0 1319 1500 375 150 8 6 86
0 1 0 988 2816 280 111280 0 0 1057 1000 384 122 6 4 90
0 1 0 988 2536 284 111544 0 0 1456 1500 426 207 11 3 86
0 1 0 1052 2724 296 111416 0 64 1122 1516 577 102 6 6 88
1 1 0 1200 2164 284 112116 0 148 993 1037 555 93 5 5 89
0 1 0 1200 2884 280 111420 0 0 1488 1000 429 157 9 7 84
0 1 0 1200 2868 284 111428 0 0 1090 1500 521 119 5 5 90
1 0 0 1200 2148 284 112212 0 0 1698 1000 479 148 10 8 82
1 0 0 1200 2080 280 112216 0 0 1167 1500 462 130 8 4 88
1 0 0 1200 2328 284 111968 0 0 1282 1000 385 148 8 5 87
0 1 0 1200 3028 288 111276 0 0 1218 1500 552 129 7 5 87
0 1 0 1200 2704 264 111620 0 0 1487 1500 514 144 9 8 83
1 0 0 1200 2740 272 111628 0 0 1283 1000 477 125 6 8 86
0 1 0 1200 3176 276 111124 0 0 1250 1500 537 125 3 7 89
0 1 0 1388 2916 272 111568 0 188 911 1047 537 108 5 6 89
0 1 0 1388 2772 276 111712 0 0 1603 1500 463 145 10 6 84
1 0 0 1388 2964 272 111592 0 0 1424 1000 482 131 11 2 87
0 1 0 1388 2440 276 112048 0 0 1218 1500 462 130 9 4 87
0 1 0 1388 2936 272 111552 0 0 1377 1500 461 124 7 5 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 1388 2128 272 112388 0 0 1104 1000 400 112 3 9 88
0 1 0 1388 3116 276 111372 0 0 1442 1500 462 160 7 8 85
0 1 0 1388 2972 272 111516 0 0 1441 1500 441 138 8 6 86
0 1 0 1388 2268 276 112220 0 0 1104 1000 418 138 5 3 91
0 1 0 1388 2492 276 111976 0 0 1122 1500 559 160 7 5 88
1 0 0 1580 2436 284 112364 0 192 1634 1048 454 170 9 7 84
0 1 0 1756 3084 276 111768 0 176 1231 1544 490 140 9 5 85
1 0 0 1756 2552 276 112424 0 0 1379 1000 426 124 7 4 89
0 1 0 1756 2420 276 112432 0 0 1153 1500 496 142 9 3 87
1 0 0 1756 2640 276 112284 0 0 1510 1500 470 146 7 7 86
0 1 0 1756 2712 280 112132 0 0 1421 1500 467 132 8 5 87
0 1 0 1756 2880 280 111756 0 0 961 1000 516 91 6 3 91
0 1 0 1756 3148 280 111700 0 0 1585 1500 452 171 11 5 83
1 0 0 1756 2428 280 112480 0 0 1377 1000 518 152 7 6 87
0 1 0 1756 2428 280 112420 0 0 1168 1500 398 129 6 6 88
1 0 0 1756 2604 288 112296 0 0 1250 1000 402 149 7 6 86
1 0 0 1860 3032 280 111944 0 104 993 1526 541 104 6 7 87
0 1 0 1860 2096 280 112848 0 0 1712 1500 426 203 9 7 84
0 1 0 1860 2100 284 112844 0 0 1438 1000 479 172 7 7 85
0 1 0 1860 2336 280 112616 0 0 1126 1500 470 156 8 6 86
0 1 0 1860 2652 276 112300 0 0 1263 1000 450 185 8 7 85
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1860 2124 284 112820 0 0 1218 1500 491 169 7 4 89
0 1 0 1860 2108 284 112836 0 0 1443 1500 466 206 8 7 85
1 0 0 1860 2876 280 112200 0 0 943 1000 465 105 6 4 90
1 1 0 1860 2604 284 112340 0 0 1603 1500 458 146 9 6 84
0 1 0 1860 3108 284 111836 0 0 1089 1000 503 105 7 3 90
0 1 0 1860 3160 288 111780 0 0 1457 1500 525 152 8 7 85
1 0 0 1948 2868 284 112172 0 88 1345 1022 569 137 10 5 85
0 1 0 1948 3020 284 112008 0 0 1129 1500 509 101 7 6 87
0 1 0 1948 2592 288 112432 0 0 1513 1500 422 140 7 6 87
0 1 0 1948 3092 284 111936 0 0 1089 1000 429 126 8 3 88
0 1 0 1948 2832 280 112200 0 0 1328 1000 451 204 10 5 85
0 1 0 1948 2372 284 112656 0 0 1090 1500 468 168 5 5 90
0 1 0 1948 3000 284 112028 0 0 1505 1500 499 158 7 7 85
0 1 0 1948 2228 284 112840 0 0 1104 1000 468 129 8 3 88
0 1 0 1948 2408 284 112620 0 0 1154 1500 496 129 7 4 88
1 0 0 1948 2780 284 112312 0 0 1698 1000 448 161 12 4 84
0 1 0 2052 2100 284 113036 0 104 1135 2026 519 378 7 6 87
0 1 0 2052 2084 304 113032 0 0 1442 1000 522 146 8 4 88
0 1 0 2052 2956 300 112164 0 0 1030 1000 410 108 9 2 88
0 1 0 2052 2892 296 112232 0 0 1487 1500 404 150 9 7 84
1 0 0 2052 2736 320 112496 0 0 1192 1000 437 150 9 4 87
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 2052 2408 312 112704 0 0 1313 1500 475 138 10 3 87
0 1 0 2052 2280 308 112836 0 0 1039 1000 384 112 6 3 91
0 1 0 2052 2264 312 112848 0 0 1443 1500 441 160 11 5 84
0 1 0 2052 3024 312 112088 0 0 1057 1000 535 133 7 4 89
1 0 0 2052 2636 312 112540 0 0 1200 1000 413 171 7 5 88
0 1 0 2052 2808 324 112292 0 0 1283 1500 485 132 6 6 87
1 0 0 2132 2272 320 112912 0 80 1089 1020 460 121 9 4 86
0 1 0 2180 2224 308 113020 0 48 1327 1512 408 167 8 8 84
1 0 0 2180 2460 316 112840 0 0 1412 1000 406 131 7 8 85
0 1 0 2180 2576 316 112660 0 0 1281 1500 474 157 6 5 88
0 0 0 928 2736 308 112496 772 0 1045 500 307 153 4 3 93
0 0 0 928 2736 308 112496 0 0 0 0 101 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 105 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 133 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 131 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 129 14 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 105 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 6 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 928 2736 308 112496 0 0 0 0 101 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 103 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 7 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 106 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 103 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 8 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 10 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 102 12 0 0 100
0 0 0 928 2736 308 112496 0 0 0 0 101 6 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 102 8 1 0 99
0 0 0 928 2732 308 112500 0 0 0 0 101 6 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 110 12 0 0 100
0 0 0 928 2732 308 112500 0 0 0 3221 258 84 0 0 100
0 0 0 928 2732 308 112500 0 0 0 0 153 8 0 0 100
0 0 0 928 2720 312 112508 0 0 5 0 107 12 0 0 100
0 0 0 928 2720 312 112508 0 0 0 0 104 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 916 2680 312 112532 0 0 19 0 110 16 0 0 100
0 0 0 916 2680 312 112532 0 0 0 2300 257 12 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 113 14 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 107 10 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 108 6 0 0 100
0 0 0 904 2680 312 112520 0 0 0 0 102 14 0 0 100
[-- Attachment #3: vmstat-2.txt --]
[-- Type: text/plain, Size: 21840 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 105948 1240 10040 0 0 40 1 64 32 2 2 96
0 0 0 0 105948 1240 10040 0 0 0 0 103 10 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 109 6 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 8 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 101 6 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 14 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 101 8 0 0 100
0 0 0 0 105948 1240 10040 0 0 0 0 102 8 0 0 100
1 0 0 0 102096 1264 12532 0 0 469 0 141 69 2 1 97
1 0 0 0 70620 1300 42988 0 0 3827 0 347 283 21 17 62
0 1 0 0 52628 1324 60332 0 0 2179 1000 425 271 13 7 80
1 0 0 0 40864 1336 71716 0 0 1424 1000 589 178 8 5 87
1 0 0 0 30348 1348 81956 0 0 1282 1500 645 143 8 3 89
0 1 0 0 19240 1356 92644 0 0 1345 1500 633 153 7 4 89
0 1 1 0 13348 976 98932 0 0 1168 1012 379 164 4 8 88
0 1 1 0 9240 968 102924 0 0 512 528 255 134 2 3 95
0 1 1 0 5932 632 106608 0 0 642 528 263 144 5 4 91
0 1 1 0 1688 588 110776 0 0 609 1026 381 131 7 3 90
0 1 1 0 2804 228 111064 0 0 512 528 241 104 2 20 78
0 1 1 0 2288 208 112316 0 0 609 514 295 94 3 25 71
0 1 1 0 2824 224 111608 0 0 625 515 374 122 3 16 81
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 0 2828 228 111440 0 0 608 524 348 112 3 11 86
0 1 1 0 2824 236 111312 0 0 482 532 256 94 2 8 90
1 0 1 0 2692 240 111448 0 0 480 536 354 86 2 8 89
1 0 1 0 2852 232 111492 0 0 481 527 257 95 4 9 87
0 1 1 0 2824 236 111316 0 0 544 528 255 103 2 3 94
0 1 1 0 2824 240 111180 0 0 513 537 340 96 3 5 91
0 1 1 0 2828 244 111040 0 0 497 520 311 112 1 5 94
0 1 1 0 2832 216 111848 0 0 478 530 345 83 3 21 76
0 1 1 0 2600 228 111920 0 0 578 522 224 122 3 3 94
0 1 1 0 2836 228 111536 0 0 576 525 238 120 5 13 82
0 1 1 0 2724 236 111516 0 0 481 526 326 94 4 6 89
1 0 1 36 2624 236 111552 0 36 640 1022 346 105 5 23 71
0 1 1 36 2892 244 111060 0 0 577 519 344 99 2 16 82
0 1 1 36 2892 248 110916 0 0 559 525 255 108 4 9 87
0 1 1 32 2936 252 110732 0 0 480 539 357 96 3 8 89
0 1 1 32 2528 260 110912 0 0 450 532 261 81 2 7 90
0 1 1 32 2896 268 110504 0 0 636 533 359 110 4 7 89
0 1 0 32 2572 244 110704 0 0 741 1037 297 160 4 11 85
0 1 0 32 3064 248 110704 0 0 449 62 364 164 3 16 81
0 1 1 32 2452 252 111184 0 0 438 534 356 101 2 11 87
0 1 1 32 3056 260 110492 0 0 392 542 323 93 2 14 84
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 32 1288 260 112128 0 0 512 536 354 96 4 5 91
1 0 1 32 2536 272 110744 0 0 354 540 242 100 1 9 90
1 0 1 32 3260 208 110884 0 0 416 547 252 175 3 22 75
0 1 1 32 2212 216 112424 0 4 609 527 239 112 5 8 87
0 1 1 32 3004 216 111472 0 4 608 540 327 120 5 7 87
0 1 1 32 3008 232 111252 0 4 768 1083 405 252 3 11 86
0 1 1 32 1964 240 112144 0 0 594 570 422 222 3 14 83
1 1 1 60 2932 248 111164 0 36 661 586 349 226 3 13 84
0 1 1 60 2752 256 111468 0 0 705 567 309 162 3 12 85
0 1 1 60 2480 260 111568 0 0 641 1066 436 197 3 6 91
0 1 1 64 3012 268 110940 0 4 674 580 329 232 5 12 83
0 1 1 64 2764 268 111224 0 0 783 1144 423 354 5 28 67
1 0 1 64 3016 272 110944 0 0 706 610 434 289 4 21 75
0 1 1 64 2508 268 111428 0 0 832 1154 435 379 5 29 66
0 1 1 64 3032 272 110768 0 0 833 628 414 338 5 28 67
0 1 1 64 2608 276 111040 0 0 769 1081 440 268 3 24 72
0 1 1 64 2684 284 110804 0 0 847 1089 370 280 6 12 82
0 1 0 64 2832 276 110796 0 0 706 558 384 193 7 11 82
0 1 1 136 1280 284 112440 0 72 911 1084 494 251 6 13 81
0 1 1 132 2632 292 111020 0 0 883 1137 478 353 3 27 70
0 1 1 132 2996 300 110464 0 0 687 622 344 306 2 32 65
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 132 1688 288 111708 0 0 834 1109 490 308 5 23 72
0 1 1 132 2400 244 111760 0 4 765 628 413 325 5 26 69
0 1 1 204 2688 244 111632 0 92 705 669 390 360 3 32 65
0 1 1 240 2624 248 111684 0 40 609 1105 383 259 3 20 77
0 1 1 276 2940 252 111256 0 36 623 589 451 229 1 15 84
1 0 1 276 2940 256 111064 0 0 736 586 332 244 5 17 77
0 1 1 276 1920 264 112024 0 0 674 1069 417 228 6 9 84
0 1 1 276 2832 264 111084 0 0 656 582 462 245 4 11 84
0 1 1 276 3044 252 111096 0 0 544 597 336 238 3 29 68
0 1 1 276 2708 248 111468 0 0 609 608 320 264 3 30 67
0 1 1 276 2992 256 111120 0 0 438 567 289 176 2 15 83
0 1 1 276 3000 256 110984 0 0 505 564 303 186 3 11 86
1 0 1 276 3060 268 110780 0 0 509 562 321 199 4 8 88
0 1 1 276 2844 268 110852 0 0 677 561 297 185 5 10 85
1 0 1 276 2964 260 110976 0 0 481 1044 400 141 3 8 88
0 1 1 276 1780 260 112016 0 0 608 563 251 179 3 11 86
0 1 1 276 2012 264 111700 0 0 513 557 324 155 3 5 92
0 1 1 276 3076 272 110528 0 0 783 577 328 244 5 13 82
0 1 1 276 1072 264 112812 0 0 674 1047 366 139 6 6 88
0 1 1 276 1960 264 111980 0 0 608 568 482 200 3 15 81
0 1 1 276 1140 268 112664 0 0 513 578 399 198 5 25 70
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 276 2404 272 111240 0 0 544 563 347 159 5 15 79
0 1 1 276 1792 252 112016 0 0 737 569 293 204 4 22 74
0 1 1 276 1116 264 112592 0 0 401 554 339 155 2 13 84
0 1 1 268 1876 264 111808 0 0 576 562 305 166 2 15 83
0 1 1 260 2804 244 111444 0 0 450 563 327 180 3 16 81
1 0 1 260 2700 240 111636 0 4 608 561 392 169 2 24 73
0 1 1 260 2172 244 112120 0 4 673 1041 401 103 3 11 86
0 1 1 260 2832 252 111296 0 4 609 556 427 155 6 8 86
1 0 1 260 2800 260 111208 0 0 687 571 319 210 5 12 83
0 1 1 260 1592 260 112180 0 0 672 1056 432 168 5 4 90
0 1 1 260 2948 256 111088 0 0 610 560 339 182 5 14 81
0 1 1 260 2824 248 111192 0 0 576 561 326 163 3 11 86
1 0 1 260 2864 256 111060 0 0 577 553 292 142 2 8 90
1 0 1 260 2572 264 111180 0 0 613 549 359 136 3 9 88
0 1 1 260 2100 264 111508 0 0 587 1056 305 163 3 7 90
1 0 1 260 2348 260 111584 0 0 608 527 384 94 5 9 86
1 0 1 260 2820 264 110832 0 0 561 531 366 105 4 5 92
0 1 1 260 2656 268 111072 0 0 672 549 378 129 4 10 86
0 1 1 260 2684 276 110920 0 0 450 550 278 148 2 4 93
0 1 2 260 1536 264 112156 0 0 802 1060 474 192 7 11 82
0 1 1 260 1568 260 112164 0 0 943 1084 378 258 7 12 81
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 296 2136 264 111824 0 36 898 602 400 275 7 15 78
1 0 1 296 1296 268 112564 0 0 640 1092 473 248 5 25 70
1 0 1 296 2992 272 110632 0 0 801 601 364 279 3 20 77
0 1 1 296 1540 284 111988 4 0 726 1051 389 200 5 11 84
0 1 1 368 2988 264 111444 0 72 687 610 390 244 2 25 73
0 1 1 368 3048 264 111216 0 0 770 636 386 339 2 28 69
0 1 1 368 2836 268 111392 0 0 640 1058 414 160 3 16 81
0 1 1 368 2952 276 111064 0 0 833 552 335 127 4 16 79
0 1 1 368 2016 280 111852 0 0 641 1054 425 169 3 8 88
1 0 0 368 2056 264 112296 0 0 815 602 359 275 6 26 68
0 1 1 368 2284 276 111816 0 0 674 1114 430 288 4 20 75
0 1 1 368 2768 280 111324 0 0 833 614 407 301 7 22 71
0 1 1 368 1952 276 112124 0 0 672 1068 465 196 4 14 81
0 1 1 368 2752 284 111360 0 0 705 581 326 229 6 19 74
0 1 1 368 3024 280 110956 0 0 655 576 352 225 6 15 79
0 1 1 368 2136 288 111688 0 0 578 1053 425 163 5 9 86
1 0 1 368 2652 288 111080 0 0 768 559 333 205 2 10 87
0 1 1 368 3068 280 110860 0 0 641 563 285 186 6 10 84
1 0 0 368 3064 280 110800 0 0 673 1040 391 149 6 9 85
0 1 1 368 1892 256 112248 0 0 783 571 385 204 4 15 81
0 1 1 368 2972 260 111024 0 0 512 572 286 195 3 26 71
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 368 1632 264 112364 0 0 514 584 339 202 2 29 69
0 1 1 368 1836 264 112032 0 0 448 554 270 125 3 14 83
1 0 1 368 1284 272 112492 0 0 545 568 337 181 4 13 83
0 1 1 368 2824 276 110792 0 0 590 1086 276 214 4 14 82
0 1 1 368 3036 264 110920 0 0 481 66 328 140 3 18 79
0 1 1 368 2428 272 111372 0 0 591 1060 407 185 5 7 88
1 0 1 368 2468 260 111608 0 0 512 577 302 218 3 8 89
0 1 1 368 1244 264 112780 0 0 514 550 288 158 3 7 90
0 1 1 368 1540 264 112456 0 0 544 547 381 138 5 9 86
1 0 1 368 2896 276 110960 0 0 482 561 340 173 1 10 88
0 1 1 368 2988 260 111172 0 0 897 597 382 219 3 25 71
0 1 1 368 1692 264 112332 0 0 687 1068 377 196 3 15 82
0 1 1 368 2752 244 111716 0 0 768 638 466 352 10 32 58
0 1 1 368 2200 256 112084 0 0 674 1110 420 286 3 23 73
0 1 1 368 2992 260 111148 0 0 673 620 392 315 5 33 61
0 1 1 368 2816 256 111312 0 0 736 1134 483 335 3 31 66
0 1 1 368 2916 268 111124 0 0 658 571 402 215 3 15 81
0 1 1 368 2024 256 112064 0 0 928 1102 480 299 8 20 72
0 1 1 368 2552 264 111400 0 0 834 602 445 293 6 20 74
0 1 1 368 1564 264 112420 0 0 705 1067 398 203 5 14 81
0 1 1 368 1196 264 112868 0 0 993 1076 415 240 7 14 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 368 3016 264 110964 0 0 847 616 419 331 5 28 67
0 1 1 368 1292 272 112612 0 0 674 1101 399 307 4 18 78
0 1 1 368 2712 272 111212 0 0 769 624 397 358 4 30 66
0 1 1 368 2064 264 111848 0 0 704 1060 399 215 5 14 81
0 1 1 368 2772 260 111304 0 0 673 585 358 223 4 20 76
1 0 1 368 2868 268 111136 0 0 687 579 362 228 1 13 86
0 1 1 368 2444 280 111332 0 0 610 1067 398 182 4 9 87
0 1 1 368 1808 264 112088 0 0 576 559 389 166 4 10 86
1 0 1 368 2812 260 111476 0 0 514 564 305 167 4 12 84
0 1 1 368 2868 268 111192 0 0 860 1041 305 152 5 22 73
0 1 1 368 2056 268 111856 0 0 589 554 396 195 3 11 86
0 1 1 368 2756 272 111080 0 0 807 564 393 215 7 15 78
0 1 1 368 1756 280 112064 0 0 752 1056 456 182 3 13 84
0 1 1 368 3020 268 110960 0 0 769 580 427 212 4 23 73
0 1 1 368 1672 268 112272 0 0 769 1046 430 123 6 17 77
0 1 1 368 1880 264 112212 0 0 879 1086 391 249 5 14 81
0 1 1 368 2944 268 111040 0 0 672 566 421 198 6 19 74
0 1 1 368 2168 272 111772 0 0 898 1059 439 218 4 17 78
0 1 1 368 2156 264 111924 0 0 929 1072 458 222 5 14 81
1 0 1 404 2600 268 111444 0 36 673 571 471 180 4 18 78
0 2 1 404 2100 268 112028 4 0 862 1090 374 269 6 32 62
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 404 2240 284 111816 0 0 616 549 488 168 3 17 80
0 1 1 404 2556 284 111604 0 0 929 1094 408 269 5 24 70
0 1 1 404 1736 284 112332 0 0 608 556 383 165 4 13 83
0 1 1 404 2688 284 111352 0 0 833 596 344 255 6 22 72
0 1 1 404 2892 280 111136 0 0 495 540 337 185 4 10 86
0 1 1 404 2640 280 111444 0 0 544 1045 310 109 3 9 88
0 1 1 404 1640 276 112584 0 0 610 560 420 171 2 11 86
0 1 1 404 2784 284 111312 0 0 449 539 312 86 3 9 88
0 1 1 404 2460 284 111576 0 0 512 541 316 75 3 8 89
0 1 1 404 2060 288 111844 0 0 508 541 307 114 4 7 89
0 1 1 404 2288 288 111416 0 0 549 548 291 175 4 9 86
0 1 1 404 2908 276 111268 0 0 751 572 397 217 4 17 78
0 1 0 404 2304 272 111976 0 0 642 1059 323 167 3 13 83
0 1 1 404 2072 268 112360 0 0 576 569 422 188 4 16 80
0 1 1 404 2396 268 112128 0 0 641 561 417 178 3 18 79
2 0 1 404 2672 272 111768 0 0 577 561 294 153 5 13 82
0 1 1 404 2868 280 111264 0 0 655 550 244 163 5 6 89
0 1 1 404 2796 284 111256 0 0 544 1056 387 158 3 13 83
0 1 1 404 1600 292 112344 0 0 642 536 441 132 4 5 90
0 1 0 404 2112 292 111940 0 0 869 1073 330 225 5 15 79
0 1 1 404 2180 292 111668 0 0 704 574 439 230 5 15 80
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 404 2280 280 111692 0 0 784 1081 404 243 5 18 77
0 1 1 404 1300 276 112736 0 0 654 585 348 223 4 18 78
0 1 1 404 1644 284 112356 0 0 898 1101 351 269 6 21 73
0 1 1 404 2936 280 111104 0 0 833 561 366 179 6 21 73
0 1 1 404 2464 272 111988 0 12 700 1121 425 304 5 33 62
0 1 1 404 2548 272 111776 0 16 517 588 344 220 1 27 71
0 1 1 404 1688 272 112548 0 0 591 577 399 217 3 20 76
0 1 1 404 1796 280 112280 0 0 898 1122 441 342 6 30 64
0 1 1 404 2824 268 111368 0 0 608 581 385 221 5 25 69
0 1 1 404 2688 276 111432 0 0 865 1120 398 314 2 26 72
0 1 1 404 3052 284 110884 0 0 673 567 496 200 1 21 78
0 1 0 404 2832 276 111284 0 0 816 1089 438 271 4 22 73
0 1 1 404 2968 280 111244 0 0 770 569 439 218 5 18 77
0 1 1 404 2364 288 111760 0 0 769 1077 429 242 2 15 82
0 1 1 404 3040 288 111032 0 0 640 569 373 196 3 14 82
0 1 0 404 2528 292 111452 0 0 848 1081 352 251 4 26 70
0 1 1 404 1400 296 112420 0 0 608 553 344 161 6 7 87
0 1 1 404 1508 280 112620 0 0 834 586 415 262 5 17 78
0 1 1 404 1416 276 112648 0 0 578 1061 377 182 4 11 85
1 0 1 404 2964 276 111140 0 0 768 587 392 234 4 26 70
0 1 1 404 1412 280 112620 0 0 769 1080 342 230 7 19 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 392 3296 292 111980 56 0 899 547 357 232 1 12 87
0 0 0 392 3296 292 111980 0 0 0 0 102 8 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 103 12 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 10 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 6 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 102 8 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 101 6 0 0 100
0 0 0 392 3296 292 111980 0 0 0 0 102 14 0 0 100
0 0 0 392 3268 292 112008 0 0 14 0 103 12 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 103 12 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 10 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 102 8 0 0 100
0 0 0 392 3268 292 112008 0 0 0 0 101 6 0 0 100
0 0 0 392 3224 308 112036 4 0 21 0 112 32 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 104 12 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 101 8 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 102 6 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 392 3224 308 112036 0 0 0 0 101 8 0 0 100
0 0 0 392 3224 308 112036 0 0 0 0 103 12 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 103 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 102 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 3225 243 84 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 165 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 102 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 101 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 2238 243 12 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 116 10 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 106 6 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 106 8 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 111 14 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 108 16 0 0 100
0 0 0 392 3220 308 112040 0 0 0 0 118 20 0 0 100
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:53 ` Simon Kirby
@ 2000-05-11 7:23 ` Linus Torvalds
2000-05-11 14:17 ` Simon Kirby
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
1 sibling, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 7:23 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-mm, linux-kernel
Hmm..
Having tested some more, the "wait for locked buffer" logic in
fs/buffer.c (sync_page_buffers()) seems to serialize things a whole lot
more than I initially thought..
Does it act the way you expect if you change the
if (buffer_locked(p))
__wait_on_buffer(p);
else if (buffer_dirty(p))
ll_rw_block(..
to a simpler
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(..
which doesn't end up serializing the IO all the time?
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 7:23 ` Linus Torvalds
@ 2000-05-11 14:17 ` Simon Kirby
2000-05-11 23:38 ` Simon Kirby
0 siblings, 1 reply; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 14:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2449 bytes --]
On Thu, May 11, 2000 at 12:23:19AM -0700, Linus Torvalds wrote:
> Hmm..
>
> Having tested some more, the "wait for locked buffer" logic in
> fs/buffer.c (sync_page_buffers()) seems to serialize things a whole lot
> more than I initially thought..
> than I initially thought..
>
> Does it act the way you expect if you change the
>
> if (buffer_locked(p))
> __wait_on_buffer(p);
> else if (buffer_dirty(p))
> ll_rw_block(..
>
> to a simpler
>
> if (buffer_dirty(p) && !buffer_locked(p))
> ll_rw_block(..
>
> which doesn't end up serializing the IO all the time?
A little bit better! 203 vmstat-line-seconds before, now 155
vmstat-line-seconds to complete. It seems to be doing a better job like
this, but still doesn't write out in blocks like it used to:
2.3.99pre7-9 vanilla:
0 1 1 32 2212 216 112424 0 4 609 527 239 112 5 8 87
0 1 1 32 3004 216 111472 0 4 608 540 327 120 5 7 87
0 1 1 32 3008 232 111252 0 4 768 1083 405 252 3 11 86
0 1 1 32 1964 240 112144 0 0 594 570 422 222 3 14 83
1 1 1 60 2932 248 111164 0 36 661 586 349 226 3 13 84
2.3.99pre7-9 with above adjustment:
0 1 1 64 3032 272 110768 0 0 833 628 414 338 5 28 67
0 1 1 64 2608 276 111040 0 0 769 1081 440 268 3 24 72
0 1 1 64 2684 284 110804 0 0 847 1089 370 280 6 12 82
0 1 0 64 2832 276 110796 0 0 706 558 384 193 7 11 82
0 1 1 136 1280 284 112440 0 72 911 1084 494 251 6 13 81
Also, it's still not as fast as classzone-27 writing out, and CPU use is
still a bit higher:
2.3.99pre7-8 classzone-27:
0 1 0 540 2852 272 110796 0 0 1089 1000 465 148 7 5 88
0 1 0 540 2980 264 110676 0 0 1423 1500 380 140 7 7 86
0 1 0 540 2584 272 111068 0 0 1059 1000 421 146 9 3 88
0 1 0 988 2556 276 111524 0 448 1473 1612 523 148 10 6 84
1 0 0 988 2704 264 111408 0 0 1135 1000 423 163 6 6 88
(All from random areas, sorry... it might be a good idea to read all of
the output in the attachment.)
I attached vmstat-3.txt, the full output with "2.3.99pre7-9 with above
adjustment".
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-3.txt --]
[-- Type: text/plain, Size: 18880 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 106000 1144 10132 0 0 46 2 68 38 3 2 96
0 0 0 0 106000 1144 10132 0 0 0 0 102 6 0 0 100
0 0 0 0 106000 1144 10132 0 0 0 0 102 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 1 6 107 10 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 102 12 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 102 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 122 8 0 0 100
0 0 0 0 105996 1148 10132 0 0 0 0 121 6 0 0 100
1 0 0 0 81272 1204 32784 0 0 3002 5 309 245 16 11 72
0 1 0 0 56364 1236 56840 0 0 3027 500 369 327 16 12 72
0 1 0 0 44196 1248 68616 0 0 1473 1500 581 196 8 7 85
0 1 0 0 33288 1264 79224 0 0 1329 1000 584 159 7 7 86
0 1 0 0 22708 1272 89408 0 0 1281 1500 551 144 8 5 87
0 1 0 0 14784 872 97684 0 0 1424 1500 566 173 10 12 78
1 0 1 0 13980 244 99444 0 0 770 525 349 109 3 14 83
0 1 0 0 12380 204 102208 0 0 800 1021 342 105 4 19 76
1 0 0 0 11624 200 103216 0 0 801 533 292 111 6 19 74
0 1 1 0 11276 192 103856 0 0 705 1027 318 93 3 19 77
0 1 1 4 10164 200 104812 0 4 816 534 414 122 5 17 78
0 1 1 4 9040 200 106044 0 0 706 1024 390 98 2 19 78
0 1 1 40 8080 208 107056 0 36 796 539 431 112 5 21 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 40 6936 200 108144 0 0 709 1024 366 92 5 25 69
0 1 1 40 6328 204 108556 0 0 737 533 339 105 3 18 79
0 1 0 40 5184 204 109588 0 0 719 1027 327 115 4 19 77
1 0 1 76 4448 208 110376 0 36 642 533 393 96 5 14 80
0 1 1 76 3300 204 111444 0 0 832 1030 303 117 8 20 72
0 1 1 256 3056 208 111760 0 180 513 578 340 74 3 36 60
0 1 0 292 2816 200 112200 0 36 448 42 325 57 2 40 58
0 1 1 328 2984 200 112184 0 36 545 1039 271 84 3 30 67
1 1 1 400 3028 204 112084 0 72 463 58 330 62 2 36 62
0 1 1 400 3012 208 111988 0 0 544 1031 360 79 5 29 66
0 1 1 400 2952 216 111896 0 0 546 527 330 74 4 30 65
0 1 0 400 2988 220 111720 0 0 636 524 461 81 3 28 69
0 1 0 400 2960 224 111704 0 0 741 530 269 105 3 22 75
0 1 0 400 2064 216 112660 0 0 705 1024 352 89 5 23 72
0 1 1 400 2664 224 111872 0 0 783 530 279 116 5 21 73
0 1 0 400 2824 236 111612 0 0 674 1021 429 99 4 22 74
1 0 1 400 3080 232 111172 0 0 736 524 387 136 4 19 77
0 1 0 400 2560 220 112020 0 0 737 1032 304 97 4 17 79
0 1 0 400 2276 228 112212 0 0 769 545 414 81 7 15 77
0 1 0 436 2616 216 112164 0 36 751 1069 325 155 5 20 75
0 1 0 432 2812 224 111748 0 0 866 554 445 88 5 20 75
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 468 2692 232 111632 0 36 705 1057 377 83 4 15 80
1 0 0 464 2396 240 111796 0 0 833 1068 342 107 8 12 80
0 1 0 464 2308 252 111644 0 0 1040 1053 420 122 7 17 76
0 1 0 460 2364 256 111524 0 0 1026 1059 411 124 4 18 77
0 1 0 460 2572 260 111212 0 0 961 1068 448 108 7 19 74
0 1 0 460 2812 252 110972 0 0 993 1068 427 114 7 20 73
0 1 0 460 2820 248 111104 0 0 1007 1078 426 108 5 20 74
1 0 1 460 3068 256 110896 0 0 817 555 487 97 7 14 79
1 0 0 460 2512 248 111568 0 0 929 1063 538 105 7 16 77
0 1 0 460 2572 232 111720 0 0 929 1059 574 99 6 21 73
0 1 1 492 2972 240 111244 0 36 1040 1071 421 115 6 16 78
0 1 1 492 3012 248 111112 0 0 1058 1062 462 122 3 17 79
0 1 0 492 3048 248 111028 0 0 929 1058 415 112 7 13 79
1 0 0 492 2124 260 111788 0 0 1040 1060 480 114 7 16 77
0 1 1 528 2296 252 111772 0 36 924 1071 439 141 5 15 80
0 1 0 564 2828 232 111648 0 36 966 1081 467 139 7 19 74
0 1 0 564 3064 236 111492 0 0 897 569 347 98 8 13 78
1 0 1 564 3072 236 111508 0 0 705 1045 398 79 3 14 82
0 1 0 564 2332 240 112200 0 0 879 1057 371 121 5 14 81
1 0 1 564 3068 240 111320 0 0 898 555 369 94 5 17 77
0 1 0 600 2196 244 112376 0 36 705 1052 375 104 3 18 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 636 2900 232 111728 0 36 865 1071 329 104 6 21 73
0 1 1 636 3068 244 111496 0 0 653 540 331 85 3 19 78
1 0 1 636 3056 244 111316 0 0 736 557 389 101 7 19 74
0 1 0 636 2224 236 112216 0 0 770 1045 431 79 5 15 80
0 1 0 672 2884 256 111760 24 36 763 554 327 118 7 16 77
0 1 0 672 2824 252 111672 0 0 769 1048 416 114 4 18 78
0 1 0 672 2832 260 111568 0 0 943 1077 329 104 7 20 73
1 0 0 672 3000 268 111284 0 0 674 546 430 71 5 13 82
0 1 0 672 2844 264 111616 0 0 865 1048 447 85 8 14 78
0 1 0 708 2896 260 111372 0 36 928 1082 348 112 3 19 78
0 1 1 708 3068 252 111752 0 0 720 547 528 85 4 17 78
0 1 0 708 2536 252 112056 0 0 896 1043 376 97 7 12 81
0 1 0 744 2880 264 111772 0 36 834 571 427 88 4 21 74
1 0 0 740 2540 268 112096 0 0 769 1041 420 81 5 18 77
0 1 0 772 2548 256 112152 0 36 865 1059 335 94 4 21 75
0 1 1 772 2964 252 111764 0 0 816 559 381 101 4 15 81
0 1 0 772 2764 248 112056 16 0 757 1040 443 95 4 13 82
0 1 0 768 2292 252 112348 0 0 929 1059 378 90 4 19 77
0 1 0 768 3072 252 111336 0 0 994 1060 442 104 7 23 70
0 1 0 768 2440 252 111944 0 0 1039 1059 417 114 5 18 76
0 1 0 768 2556 240 112224 0 0 1026 1068 464 109 6 22 72
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 768 3068 252 111584 0 0 990 1058 392 124 4 18 78
0 1 0 768 2696 256 111708 0 0 993 1059 427 123 8 19 73
0 1 1 768 3068 252 111604 0 0 975 1062 429 105 6 18 75
0 1 0 768 2544 252 111916 0 0 994 1063 498 101 6 17 77
0 1 0 768 3064 240 111648 0 0 1025 1065 452 104 7 19 74
0 1 0 840 2368 252 112160 0 72 976 1068 387 110 5 16 79
0 1 0 840 2876 264 111424 0 0 994 1056 395 104 6 14 79
0 1 0 836 2864 264 111464 0 0 1025 1056 473 98 7 15 78
1 0 1 836 3080 240 111340 0 0 993 1053 512 108 6 16 77
0 1 0 872 2804 240 112080 0 36 975 569 376 102 4 17 78
1 0 1 872 3056 240 111868 0 0 738 1047 505 89 4 18 77
0 1 0 872 2364 240 112448 0 0 865 1057 413 105 7 15 78
0 1 1 872 3068 240 111676 0 0 928 570 350 87 5 18 76
0 1 0 872 2712 236 112148 0 0 610 1042 349 82 4 13 83
0 1 0 872 2600 240 112236 0 0 942 1077 349 122 6 15 79
0 1 0 872 2888 244 112016 0 0 706 543 330 84 4 13 83
0 1 0 872 2736 248 111988 0 0 865 1048 390 114 6 13 81
0 1 0 872 2280 244 112448 0 0 929 1046 348 96 6 17 77
1 0 0 868 2688 244 111968 0 0 847 557 422 95 7 17 76
0 1 1 868 2536 256 111856 0 0 770 1048 411 99 5 14 81
0 1 0 868 2308 240 112716 0 0 895 1067 446 99 6 20 74
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 904 2812 248 112092 0 36 993 1067 485 125 5 18 77
0 1 0 940 2892 248 111932 0 36 879 561 463 96 3 15 81
1 0 1 940 2464 260 111992 0 0 1058 1060 429 104 6 15 79
0 1 0 940 2552 248 112304 0 0 1057 1077 509 107 5 21 73
0 1 0 976 2816 248 111972 12 36 1011 1061 424 103 7 16 77
1 0 0 976 2140 256 112300 0 0 975 1050 411 116 8 13 79
0 1 1 976 3068 256 111724 0 0 834 1066 375 94 2 17 81
0 1 1 976 3068 236 112080 0 0 929 1048 402 96 4 23 73
0 1 1 976 2960 244 111940 0 0 961 1054 394 131 5 17 78
0 1 0 1012 2828 240 112108 0 36 975 1068 490 136 8 22 70
0 1 0 1048 2828 240 112092 0 36 994 1081 438 156 5 19 76
1 0 0 1048 2992 248 111792 0 0 993 1053 414 121 6 16 77
0 1 0 1048 2824 252 111708 0 0 944 1057 382 111 7 15 78
0 1 0 1048 2896 256 111824 0 0 860 545 367 100 6 16 78
0 1 1 1048 3068 236 112060 0 0 838 1050 474 124 5 15 79
1 0 0 1048 3088 248 111864 0 0 898 1050 415 119 6 19 75
0 1 0 1084 2204 256 112596 0 36 919 1061 392 115 5 14 81
1 0 0 1084 2292 264 112316 0 0 1083 1060 445 113 10 14 76
1 0 1 1120 2420 252 112472 0 36 1025 1078 511 101 5 18 77
0 1 0 1192 2684 248 112392 0 72 993 1085 479 102 5 18 77
0 1 0 1192 3060 248 112040 0 0 1007 1060 398 118 5 15 79
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1228 2548 276 112316 88 36 940 1048 388 124 6 14 80
0 1 0 1228 2380 284 112220 0 0 1057 1056 525 124 10 14 76
0 1 1 1228 3044 284 111460 0 0 993 1035 530 96 5 16 79
0 1 0 1228 2576 260 111900 0 0 1007 1051 394 112 4 17 79
0 1 0 1228 2348 256 113036 0 0 1026 1061 456 104 4 23 73
0 1 0 1300 2276 264 112916 0 72 993 1066 441 108 6 13 81
0 1 0 1296 2324 276 112612 0 0 919 1060 469 114 7 14 79
0 1 1 1296 2336 276 112356 0 0 761 542 429 187 4 13 83
0 1 0 1296 2292 280 112656 0 0 866 1055 328 99 3 18 79
0 1 0 1296 2820 268 112140 0 0 865 1080 462 101 6 18 76
1 0 1 1296 3068 272 111868 0 0 673 539 441 112 2 15 82
0 1 1 1284 2612 268 112516 52 0 924 1065 424 109 4 20 76
0 1 1 1284 2708 280 112244 4 0 754 542 334 86 3 19 78
0 1 0 1280 2168 276 112712 0 0 832 1050 403 101 6 14 80
0 1 0 1280 2388 248 112688 0 0 897 1062 368 94 6 18 76
0 1 0 1280 2264 244 113180 0 0 705 539 428 86 5 15 80
0 1 0 1280 3032 248 112184 0 0 879 1049 337 101 6 18 76
0 1 0 1280 2764 268 112288 0 0 870 540 306 94 4 15 80
0 1 1 1288 3072 268 112028 0 8 1022 1062 398 99 6 23 71
0 1 1 1324 3072 272 111852 0 36 801 1039 520 85 5 9 85
0 1 0 1324 2868 280 111876 0 0 1135 1068 460 132 7 16 77
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 1360 2804 264 112356 0 36 962 1061 489 110 4 18 78
0 1 1 1360 3032 260 112172 0 0 1057 1070 565 109 6 20 74
0 1 0 1360 2356 268 112708 0 0 702 1036 408 85 5 12 83
0 1 0 1396 2556 276 112428 0 36 1010 1057 444 106 4 17 78
0 1 1 1396 2852 272 112056 0 0 994 1056 420 105 6 15 79
0 1 0 1396 2300 272 112580 0 0 1025 1063 447 129 7 15 78
0 1 0 1396 2808 260 112632 0 0 977 1058 427 118 5 22 73
0 1 0 1396 2740 264 112612 0 0 994 1042 486 103 5 18 77
0 1 0 1396 2800 268 112388 0 0 993 1049 410 108 5 15 80
0 1 0 1396 2352 272 112712 0 0 993 1059 487 133 7 14 79
0 1 0 1396 3068 276 111908 0 0 975 1062 409 123 7 15 78
0 1 0 1396 2544 284 112260 0 0 642 534 455 66 5 8 87
0 1 0 1396 2176 280 112916 0 0 898 1045 353 110 6 17 77
0 1 0 1396 2760 276 112160 0 0 993 1059 475 114 6 20 74
0 1 0 1396 2888 276 112032 0 0 975 1037 399 112 5 14 80
0 0 0 1264 2976 292 112956 100 0 415 2 239 93 0 3 97
0 0 0 1264 2976 292 112956 0 0 0 0 101 10 0 0 100
0 0 0 1264 2916 292 113012 0 0 29 0 105 10 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 101 8 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 102 6 0 0 100
0 0 0 1264 2916 292 113012 0 0 0 0 101 14 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2860 296 113056 16 0 19 0 107 16 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 105 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 12 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 10 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 103 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 14 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 8 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 12 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 101 10 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 6 0 0 100
0 0 0 1264 2860 296 113056 0 0 0 0 102 8 0 0 100
0 0 0 1264 2732 312 113160 80 0 40 0 120 36 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 1174 177 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 1 1264 2728 312 113164 0 0 0 2601 101 10 0 0 99
0 0 0 1264 2728 312 113164 0 0 0 1673 356 240 0 1 99
0 0 0 1264 2728 312 113164 0 0 0 0 132 10 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 4 104 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 12 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 12 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 10 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 6 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 14 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 102 8 0 0 100
0 0 0 1264 2728 312 113164 0 0 0 0 101 8 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 1264 2724 312 113168 4 0 1 0 110 16 0 0 99
0 0 0 1264 2552 332 113308 0 0 78 4 123 47 0 1 99
0 0 0 1264 2552 332 113308 0 0 0 0 102 8 0 0 100
0 0 0 1264 2552 332 113308 0 0 0 0 108 10 0 0 100
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 14:17 ` Simon Kirby
@ 2000-05-11 23:38 ` Simon Kirby
2000-05-12 0:09 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: Simon Kirby @ 2000-05-11 23:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1576 bytes --]
On Thu, May 11, 2000 at 07:17:04AM -0700, Simon Kirby wrote:
> On Thu, May 11, 2000 at 12:23:19AM -0700, Linus Torvalds wrote:
> ...
> > which doesn't end up serializing the IO all the time?
>
> A little bit better! 203 vmstat-line-seconds before, now 155
> vmstat-line-seconds to complete. It seems to be doing a better job like
> this, but still doesn't write out in blocks like it used to:
Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
(yes, I know this isn't a very scientific testing method :)). Second try
was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
very far off! Also, it swapped much less this time, and used less CPU.
vmstat output attached.
Hmm...I don't know if this means anything, but this kernel and pre7-9
with the buffer.c modification seem to look a bit different than with
classzone and with 2.2. As the free memory is used and turned into cache
as the uncompression first starts, it seemed to kind of sweep down not in
a line but in a curve as it approached the minimum free, and during the
beginning it was writing out in groups of 500 blocks but then went back
to the continuous writing. It seems odd that when it starts it has no
problem going through the first 50 MB in two or three seconds but then
takes a long time to go through the next. Maybe not, though. Just
noticing. :)
Simon-
[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
[-- Attachment #2: vmstat-4.txt --]
[-- Type: text/plain, Size: 13280 bytes --]
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 102848 1252 10652 0 0 27 1 67 32 1 1 98
0 0 0 0 102928 1252 10652 0 0 0 0 114 27 0 0 100
0 0 0 0 102848 1252 10652 0 0 0 0 123 34 0 0 99
0 0 0 0 102844 1252 10652 0 0 0 0 132 27 0 0 100
0 0 0 0 102844 1252 10652 0 0 0 0 133 30 0 0 100
0 0 0 0 102844 1252 10652 0 0 0 0 127 26 0 0 100
1 0 0 0 96160 1268 15896 0 0 820 0 162 109 3 4 93
1 0 0 0 65276 1308 45840 0 0 3763 0 353 313 23 13 64
0 1 0 0 50064 1324 60432 0 0 1827 1003 352 250 11 6 83
0 1 0 0 37832 1340 72264 0 0 1488 1500 591 193 8 5 87
0 1 0 0 25928 1352 83784 0 0 1442 1500 608 185 10 5 85
1 0 0 0 15760 1200 93968 0 0 1346 1000 621 165 8 7 85
0 1 0 0 15044 252 96420 0 0 1231 1503 451 193 10 4 86
1 0 0 0 13796 228 98736 0 0 1284 1000 578 205 7 7 86
0 1 0 0 13676 228 98872 0 0 705 1802 778 157 3 4 93
0 1 0 0 12536 232 100012 0 0 1522 1027 531 237 8 6 86
0 1 0 0 12024 224 100500 0 0 642 2150 748 224 5 4 91
0 1 0 0 11856 224 100696 0 0 704 1473 745 107 4 3 93
0 1 0 0 10220 204 102488 0 0 1698 932 550 244 8 7 85
1 0 0 0 9096 220 103804 0 0 1523 870 430 239 7 5 88
0 1 0 0 8284 220 104548 0 0 1025 1348 596 161 5 6 89
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 7940 204 104852 0 0 1183 1377 619 185 8 7 85
0 1 0 0 7184 212 105604 0 0 1269 1157 496 212 9 5 86
1 0 0 0 6052 220 106732 0 0 1059 1124 586 169 6 4 90
1 0 0 0 4928 216 107932 0 0 1426 1361 607 224 8 7 85
0 1 0 0 4464 220 108328 0 0 1218 1302 617 181 5 3 92
0 1 0 0 4016 212 108820 0 0 865 1445 698 178 4 6 89
0 1 0 0 3076 204 109800 0 0 1153 1156 545 174 8 4 88
0 1 0 0 2572 220 110280 0 0 1651 897 531 240 9 6 85
0 1 0 0 3008 220 109688 0 0 1762 1030 475 222 13 7 80
0 1 0 0 2820 224 109848 0 0 1008 1092 554 132 5 6 89
0 1 0 0 2448 232 110140 0 0 1250 1153 466 180 8 5 87
0 1 0 0 2580 232 109884 0 0 1437 1235 595 183 8 5 87
0 1 0 0 2884 228 109560 0 0 1590 1010 500 202 11 7 82
0 1 0 0 2760 228 109672 0 0 1155 1540 609 174 5 6 88
1 0 0 0 3072 232 109360 0 0 1089 1540 517 183 8 5 87
0 1 0 0 2768 216 109692 0 0 415 1605 677 285 2 3 94
0 1 0 0 2832 224 109704 0 0 1396 1155 547 188 7 5 88
1 0 0 0 2344 228 110336 0 0 1693 961 600 238 10 7 83
0 1 0 0 2684 240 109812 0 0 2038 868 500 266 14 9 77
1 0 0 0 2436 228 110184 0 0 1057 1446 576 156 5 3 91
0 1 0 0 2980 216 109556 0 0 416 1746 711 136 3 3 93
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 2564 224 109976 0 0 946 1108 624 173 5 6 89
0 1 0 0 2432 228 110180 0 0 1762 963 529 243 10 10 80
0 1 0 0 2336 236 110248 0 0 2097 1028 586 282 13 5 82
0 1 1 0 2812 220 109572 0 0 1250 1831 488 154 7 4 88
0 1 0 0 2592 212 109828 0 0 412 1539 763 361 2 4 93
0 1 1 0 2868 232 109560 0 0 1689 389 492 221 11 3 86
0 1 0 0 2564 232 109860 0 0 1282 1199 576 151 8 5 87
1 0 0 0 2492 236 109928 0 0 1153 1281 565 163 6 5 89
0 1 0 0 2828 224 109564 0 0 1456 1704 472 362 7 9 84
0 1 0 0 2796 220 109600 0 0 512 1156 754 280 5 1 94
1 0 0 0 2680 228 109784 0 0 994 1251 575 165 5 7 87
0 1 0 0 2580 236 109836 0 0 1634 1122 505 221 11 6 83
1 0 0 0 2256 232 110236 0 0 2097 739 434 229 18 6 76
0 1 0 0 2840 212 109612 0 0 1213 1319 540 145 9 5 86
0 1 0 0 2564 224 109880 0 0 1046 1378 592 156 4 4 92
0 1 1 0 2816 236 109568 0 0 964 1281 580 149 5 5 89
0 1 0 0 2564 236 109800 0 0 1153 1205 616 176 6 7 87
0 1 0 0 2500 240 109804 0 0 1712 996 456 200 13 8 79
0 1 0 0 2556 228 109796 0 0 1378 1267 515 150 10 6 83
0 1 0 0 2564 224 109840 0 0 961 1831 522 294 3 7 90
0 1 0 0 2236 224 110196 0 0 417 1249 741 245 3 1 96
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 2692 224 109828 0 0 1213 1252 602 188 8 6 86
0 1 0 0 2580 228 109996 0 0 1751 993 511 199 10 8 82
0 1 0 0 2576 232 110032 0 0 1392 1094 570 181 9 5 86
1 0 0 0 2604 232 110016 0 0 1731 1121 539 221 10 9 81
0 1 1 0 2796 236 109584 0 0 1411 1124 576 175 11 4 85
1 0 0 0 2924 236 109368 0 0 1169 1476 409 146 9 4 87
0 1 0 0 2564 236 109728 0 0 1185 1092 570 162 8 6 86
0 1 0 0 2568 220 109752 0 0 1281 1569 647 185 8 8 84
0 1 0 0 2820 228 109496 0 0 721 1381 558 112 5 3 91
0 1 0 0 2328 228 110020 0 0 706 1699 617 150 2 3 95
1 0 0 0 2884 208 109680 0 0 428 1188 691 218 3 5 92
0 1 1 0 3072 192 109720 0 0 738 2171 582 133 5 6 89
0 1 0 0 2948 188 110484 0 0 769 941 709 176 4 5 90
1 0 0 0 2264 220 110992 0 0 2613 261 382 293 16 11 72
0 1 0 36 2908 224 110112 0 36 1585 1023 441 216 9 9 82
0 1 0 36 2832 224 110092 0 0 1090 1716 552 192 6 5 89
1 0 0 36 2172 220 109876 0 0 608 1025 699 187 2 4 94
1 0 1 36 2896 220 110052 0 0 2291 708 492 285 13 8 79
1 0 0 36 3068 228 109684 0 0 1506 1414 470 237 9 8 82
0 2 0 36 2764 220 110032 0 0 446 1797 725 303 4 3 93
1 0 0 36 2260 216 110620 0 0 929 769 615 128 8 4 88
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 36 2316 232 110464 0 0 1715 1089 545 207 7 7 86
0 1 0 36 2752 224 110036 0 0 1346 1061 523 234 6 6 88
0 1 0 36 2584 224 110232 0 0 1296 1349 591 226 9 5 86
0 1 1 36 3080 216 109756 0 0 689 1988 612 212 3 6 91
0 1 0 36 2684 212 110204 0 0 1411 513 592 228 8 5 87
0 1 0 36 3064 224 109792 0 0 1586 1252 510 244 11 8 81
0 1 0 36 2900 232 109800 0 0 1858 978 533 213 11 9 80
0 1 0 36 2804 228 109800 0 0 1085 1256 495 202 6 7 87
0 1 0 36 2564 228 110040 0 0 1140 1186 534 136 7 5 87
0 1 0 36 2820 240 109632 0 0 1539 1252 583 184 9 5 86
0 1 0 36 2692 236 109772 0 0 1121 1026 567 165 11 3 86
0 1 0 36 2584 224 109912 0 0 911 1381 463 255 6 5 89
1 0 0 36 2424 220 110240 0 0 544 1026 723 130 3 3 94
0 1 0 36 2560 216 110036 0 0 1507 1445 605 170 7 10 83
0 1 0 36 2196 232 109548 0 0 1426 1378 572 191 10 5 85
0 1 0 36 2744 236 109828 0 0 2147 1110 537 252 13 12 75
0 1 0 72 2836 220 109764 0 36 739 1181 563 116 4 3 92
0 1 0 72 2228 232 110364 0 0 1071 1025 592 164 9 4 87
0 1 0 72 2632 248 109956 0 0 1221 1186 526 178 9 4 87
0 1 0 72 2824 236 109784 0 0 1121 1155 579 156 7 5 88
0 1 0 72 2572 236 110012 0 0 1264 1122 532 181 8 6 86
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 72 2984 224 109588 0 0 1442 1116 403 154 9 4 86
0 1 0 108 3020 216 109620 0 36 1123 1498 535 134 6 7 87
0 1 0 108 2580 216 110072 0 0 545 803 454 90 3 2 94
0 1 0 108 2568 228 110088 0 0 1101 1314 505 162 6 6 88
0 1 0 108 2732 232 109920 0 0 1223 1152 557 185 7 3 89
0 1 0 108 2820 228 109868 0 0 1153 1221 592 168 7 4 89
0 1 0 108 2564 240 110128 0 0 1426 901 503 254 8 5 86
0 1 0 108 2760 236 109948 0 0 1121 1090 536 141 6 6 88
0 1 0 108 2628 232 110020 0 0 1520 1347 579 200 13 5 82
0 1 0 108 2772 236 109868 0 0 1122 1187 557 157 8 5 87
0 1 0 108 2568 236 110072 0 0 1058 1219 573 178 5 6 89
1 0 0 108 2812 228 109900 0 0 1121 1412 656 168 7 3 90
0 0 0 108 3804 248 110140 0 0 1094 609 397 166 4 4 91
0 0 0 108 3804 248 110140 0 0 0 0 105 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 102 26 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 99
0 0 0 108 3804 248 110140 0 0 0 0 101 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 25 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 102 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 26 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 108 3804 248 110140 0 0 0 0 102 29 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 31 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 24 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 28 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 101 27 0 0 100
0 0 0 108 3804 248 110140 0 0 0 0 103 30 0 0 100
0 0 0 108 3764 264 110164 0 0 20 0 112 43 0 0 99
0 0 0 108 3764 264 110164 0 0 0 0 101 24 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 30 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 102 24 0 0 100
0 0 0 108 3764 264 110164 0 0 0 2 105 32 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 23 0 0 100
0 0 0 108 3764 264 110164 0 0 0 0 101 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 106 32 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 27 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 29 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 102 23 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 31 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 101 26 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 108 3760 264 110168 0 0 0 4834 313 332 0 1 99
0 0 0 108 3760 264 110168 0 0 0 0 211 26 0 0 100
0 0 0 108 3760 264 110168 0 0 0 0 103 27 0 0 100
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 23:38 ` Simon Kirby
@ 2000-05-12 0:09 ` Linus Torvalds
2000-05-12 2:51 ` [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed) Roger Larsson
0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 0:09 UTC (permalink / raw)
To: Simon Kirby; +Cc: linux-mm, linux-kernel
On Thu, 11 May 2000, Simon Kirby wrote:
>
> Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
> (yes, I know this isn't a very scientific testing method :)). Second try
> was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
> very far off! Also, it swapped much less this time, and used less CPU.
> vmstat output attached.
The final pre7 did something that I'm not entirely excited about, but that
kind of makes sense at least from a CPU standpoint (as the SGI people have
repeated multiple times). What the real pre7 does is to just move any page
that has problems getting free'd to the head of the LRU list, so that we
won't try it immediately the next time. This way we don't test the same
pages over and over again when they are either shared, in the wrong zone,
or have dirty/locked buffers.
It means that the "LRU" is less LRU, but you could see it as a "how hard
do we want to free this" pressure-based system rather than really a least
recently _used_ system. And it avoids the "repeat the whole thing on the
same page" issue. And it looks like it behaves reasonably well, while
saving a lot of CPU.
Knock wood.
I'm still considering the pre7 as more a "ok, I tried to get rid of the
cruft" thing. Most of the special case code that has accumulated lately is
gone. We can start adding stuff back now, I'm happy that the basics are
reasonably clean.
I think Ingo already posted a very valid concern about high-memory
machines, and there are other issues we should look at. I just want to be
in a position where we can look at the code and say "we do X because Y",
rather than a collection of random tweaks that just happens to work.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 67+ messages in thread
* [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed)
2000-05-12 0:09 ` Linus Torvalds
@ 2000-05-12 2:51 ` Roger Larsson
0 siblings, 0 replies; 67+ messages in thread
From: Roger Larsson @ 2000-05-12 2:51 UTC (permalink / raw)
To: Linus Torvalds, Rik van Riel; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]
Hi,
I tried to find a way to walk the lru list without list_del.
Here is my patch:
- neither compiled nor run (low on HD...)
Could something like this be used?
If not, why not?
/RogerL
Linus Torvalds wrote:
>
> On Thu, 11 May 2000, Simon Kirby wrote:
> >
> > Hrm! pre7 release seems to be even better. 113 vmstat-line-seconds now
> > (yes, I know this isn't a very scientific testing method :)). Second try
> > was 114 vmstat-line-seconds. classzone-27 did it in 107, so that's not
> > very far off! Also, it swapped much less this time, and used less CPU.
> > vmstat output attached.
>
> The final pre7 did something that I'm not entirely excited about, but that
> kind of makes sense at least from a CPU standpoint (as the SGI people have
> repeated multiple times). What the real pre7 does is to just move any page
> that has problems getting free'd to the head of the LRU list, so that we
> won't try it immediately the next time. This way we don't test the same
> pages over and over again when they are either shared, in the wrong zone,
> or have dirty/locked buffers.
>
> It means that the "LRU" is less LRU, but you could see it as a "how hard
> do we want to free this" pressure-based system rather than really a least
> recently _used_ system. And it avoids the "repeat the whole thing on the
> same page" issue. And it looks like it behaves reasonably well, while
> saving a lot of CPU.
>
> Knock wood.
>
> I'm still considering the pre7 as more a "ok, I tried to get rid of the
> cruft" thing. Most of the special case code that has accumulated lately is
> gone. We can start adding stuff back now, I'm happy that the basics are
> reasonably clean.
>
> I think Ingo already posted a very valid concern about high-memory
> machines, and there are other issues we should look at. I just want to be
> in a position where we can look at the code and say "we do X because Y",
> rather than a collection of random tweaks that just happens to work.
>
> Linus
>
--
Home page:
http://www.norran.net/nra02596/
[-- Attachment #2: patch-2.3.99-pre7-9-shrink_mmap.1 --]
[-- Type: text/plain, Size: 3624 bytes --]
diff -Naur linux-2.3-pre9--/mm/filemap.c linux-2.3/mm/filemap.c
--- linux-2.3-pre9--/mm/filemap.c Fri May 12 02:42:19 2000
+++ linux-2.3/mm/filemap.c Fri May 12 04:28:30 2000
@@ -236,7 +236,6 @@
int shrink_mmap(int priority, int gfp_mask)
{
int ret = 0, count;
- LIST_HEAD(old);
struct list_head * page_lru, * dispose;
struct page * page = NULL;
@@ -244,26 +243,29 @@
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
- while (count > 0 && (page_lru = lru_cache.prev) != &lru_cache) {
+ page_lru = &lru_cache;
+ while (count > 0 && (page_lru = page_lru->prev) != &lru_cache) {
page = list_entry(page_lru, struct page, lru);
- list_del(page_lru);
dispose = &lru_cache;
if (PageTestandClearReferenced(page))
goto dispose_continue;
count--;
- dispose = &old;
+
+ dispose = NULL;
/*
* Avoid unscalable SMP locking for pages we can
* immediate tell are untouchable..
*/
if (!page->buffers && page_count(page) > 1)
- goto dispose_continue;
+ continue;
+ /* Lock this lru page, reentrant
+ * will be disposed correctly when unlocked */
if (TryLockPage(page))
- goto dispose_continue;
+ continue;
/* Release the pagemap_lru lock even if the page is not yet
queued in any lru queue since we have just locked down
@@ -281,7 +283,7 @@
*/
if (page->buffers) {
if (!try_to_free_buffers(page))
- goto unlock_continue;
+ goto page_unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
atomic_dec(&buffermem_pages);
@@ -336,27 +338,43 @@
cache_unlock_continue:
spin_unlock(&pagecache_lock);
-unlock_continue:
+page_unlock_continue:
spin_lock(&pagemap_lru_lock);
UnlockPage(page);
put_page(page);
+ continue;
+
dispose_continue:
- list_add(page_lru, dispose);
- }
- goto out;
+ /* have the pagemap_lru_lock, lru cannot change */
+ {
+ struct list_head * page_lru_to_move = page_lru;
+ page_lru = page_lru->next; /* continues with page_lru.prev */
+ list_del(page_lru_to_move);
+ list_add(page_lru_to_move, dispose);
+ }
+ continue;
made_inode_progress:
- page_cache_release(page);
+ page_cache_release(page);
made_buffer_progress:
- UnlockPage(page);
- put_page(page);
- ret = 1;
- spin_lock(&pagemap_lru_lock);
- /* nr_lru_pages needs the spinlock */
- nr_lru_pages--;
+ /* like to have the lru lock before UnlockPage */
+ spin_lock(&pagemap_lru_lock);
-out:
- list_splice(&old, lru_cache.prev);
+ UnlockPage(page);
+ put_page(page);
+ ret++;
+
+ /* lru manipulation needs the spin lock */
+ {
+ struct list_head * page_lru_to_free = page_lru;
+ page_lru = page_lru->next; /* continues with page_lru.prev */
+ list_del(page_lru_to_free);
+ }
+
+ /* nr_lru_pages needs the spinlock */
+ nr_lru_pages--;
+
+ }
spin_unlock(&pagemap_lru_lock);
diff -Naur linux-2.3-pre9--/mm/vmscan.c linux-2.3/mm/vmscan.c
--- linux-2.3-pre9--/mm/vmscan.c Fri May 12 02:42:19 2000
+++ linux-2.3/mm/vmscan.c Fri May 12 04:32:16 2000
@@ -443,10 +443,9 @@
priority = 6;
do {
- while (shrink_mmap(priority, gfp_mask)) {
- if (!--count)
- goto done;
- }
+ count -= shrink_mmap(priority, gfp_mask);
+ if (count <= 0)
+ goto done;
/* Try to get rid of some shared memory pages.. */
if (gfp_mask & __GFP_IO) {
@@ -481,10 +480,9 @@
} while (--priority >= 0);
/* Always end on a shrink_mmap.. */
- while (shrink_mmap(0, gfp_mask)) {
- if (!--count)
- goto done;
- }
+ count -= shrink_mmap(priority, gfp_mask);
+ if (count <= 0)
+ goto done;
return 0;
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
@ 2000-05-11 11:15 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-11 11:15 UTC (permalink / raw)
To: Simon Kirby; +Cc: Linus Torvalds, linux-mm, linux-kernel
On Wed, 10 May 2000, Simon Kirby wrote:
> Is Andrea taking a too dangerous approach for the current kernel
> version, or are you trying to get something extremely simple
> working instead?
You may want to read his patch before saying it does any good.
There probably are some good bits in the classzone patch, but
it also backs out bugfixes for bugs which have been proven to
exist and which those fixes demonstrably cured. ;(
It would be nice if Andrea could separate the good bits from
the bad bits and make a somewhat cleaner patch...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
@ 2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
` (2 more replies)
1 sibling, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 5:10 UTC (permalink / raw)
To: Juan J. Quintela; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
On 11 May 2000, Juan J. Quintela wrote:
>
> I have done my normal mmap002 test and this goes slower than
> ever, it takes something like 3m50 seconds to complete, (pre7-8 2m50,
> andrea classzone 2m8, and 2.2.15 1m55 for reference).
Note that the mmap002 test is a very bad performance test.
Why?
Because it's a classic "walk a large array in order" test, which means
that the worst possible order to page things out in is LRU.
So to really speed up mmap002, the best approach is to try to be as non-LRU
as possible, which is obviously the wrong thing to do in real life. So in
that sense optimizing mmap002 is a _bad_ thing.
What I found interesting was how the non-waiting version seemed to have
the actual _disk_ throughput a lot higher. That's much harder to measure,
and I don't have good numbers for it, the best I can say is that it causes
my ncr SCSI controller to complain about too deep queueing depths, which
is a sure sign that we're driving the IO layer hard. Which is a good
thing when you measure how efficiently you page things in and out..
But don't look at wall-clock times for mmap002.
Linus
^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 5:10 ` Linus Torvalds
@ 2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: James H. Cloos Jr. @ 2000-05-11 10:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, linux-mm, linux-kernel
Tried the cp of the compiled kernel tree on 7-9. *Much* better than any of
the 99s I've tried. On the 4k ext2 ide drive:
# time cp -av linux-2.3.99-pre7-9 L
[...]
0.81user 8.95system 3:37.76elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (158major+199minor)pagefaults 0swaps
# time du -s L
137404 L
0.05user 0.42system 0:05.41elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+26minor)pagefaults 0swaps
kswapd did hit a peak of 50% cpu, but only *very* briefly; it hovered
in the 5% to 10% range for most of the 218 seconds.
On the 1k ext2 scsi drive, kswapd never exceeded 25% cpu, though the
cp took about twice as long for 2/3 the data (and no -v switch):
# time cp -a linux-2.3.99-pre7-8/ L
0.26user 6.80system 5:57.71elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (141major+180minor)pagefaults 0swaps
# time du -s L
88545 L
0.02user 0.59system 0:03.82elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (105major+23minor)pagefaults 0swaps
Mem usage seems to be about 2:1 in favour of cache+buffer.
Another useful test I've found is to run realplay on large streams.
mediatrip.com has some useful ones, OTOO 22 minutes at 700 kbps.
Watching the four or five such streams which make up a given film in
the same realplay session will result in a segfault in any of the
previous 99s. At least if you watch the 700 kbps streams at double
resolution. That combo seems to have enough memory pressure.
I'd suggest someone w/ more bandwidth than my workstation try it, though.
-JimC
--
James H. Cloos, Jr. <URL:http://jhcloos.com/public_key> 1024D/ED7DAEA6
<cloos@jhcloos.com> E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Check out TGC: <URL:http://jhcloos.com/go?tgc>
^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
@ 2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Juan J. Quintela @ 2000-05-11 17:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Hi
linus> On 11 May 2000, Juan J. Quintela wrote:
>>
>> I have done my normal mmap002 test and this goes slower than
>> ever, it takes something like 3m50 seconds to complete, (pre7-8 2m50,
>> andrea classzone 2m8, and 2.2.15 1m55 for reference).
linus> Note that the mmap002 test is a very bad performance test.
Yes, I know; I included it in the memtest suite not as a benchmark.
I put the time results only for comparison. The important thing is
that if we are running a memory hog like mmap002, we have very bad
interactive performance. We swap out the wrong applications (i.e. not
mmap002 data).
More to the point is the test mmap001: a test that *only*
mmaps a file the size of physical memory and writes to it (only one
pass), then closes the file. With that test on pre7-9 I got a load of 14
and dropouts in sound (MP3 playing) of more than one second. And the
interactive performance is *ugly*. The system is unresponsive while I
run it; I am *unable* to change desktops with the keyboard. You
don't want to know about the jumps of the mouse. I think that we need
to solve these problems. I don't mind that application going
slower, but it shouldn't get so much CPU/memory. My system here is an
Athlon 500MHz with 256MB of RAM. This system is unable to write an
mmapped file of 256MB char by char. That sounds bad from my point of
view.
The tests in memtest try to find problems like that. I am sorry if
it appeared that I was talking about raw clock time (re-reading my post
I see that I made that point very *unclear*; sorry for the confusion).
linus> Why?
linus> Because it's a classic "walk a large array in order" test, which means
linus> that the worst possible order to page things out in is LRU.
Yes, I know that we don't want to optimise for such things, but it is
also not good that one of them can bring our server to its knees.
linus> So to really speed up mmap002, the best approach is to try to be as non-LRU
linus> as possible, which is obviously the wrong thing to do in real life. So in
linus> that sense optimizing mmap002 is a _bad_ thing.
I don't want to optimize for mmap002, but mmap002 doesn't touch its
pages for a long time, so its pages should be swapped out and, when
touched again, swapped in. This is not what appears to happen here.
linus> What I found interesting was how the non-waiting version seemed to have
linus> the actual _disk_ throughput a lot higher. That's much harder to measure,
linus> and I don't have good numbers for it, the best I can say is that it causes
linus> my ncr SCSI controller to complain about too deep queueing depths, which
linus> is a sure sign that we're driving the IO layer hard. Which is a good
linus> thing when you measure how efficiently you page things in and out..
I think that the problem is that we are not aggressive enough in
swapping out pages that can be swapped, and then at some moment we are
unable to find *any* memory.
linus> But don't look at wall-clock times for mmap002.
Yes, I know, sorry again for the confusion. And thanks for all your
comments, I appreciate them very much.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* [patch] balanced highmem subsystem under pre7-9
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
@ 2000-05-11 23:25 ` Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
` (2 more replies)
2 siblings, 3 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-11 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: MM mailing list, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2431 bytes --]
IMO high memory should not be balanced. Stock pre7-9 tried to balance high
memory once it got below the threshold (causing very bad VM behavior and
high kswapd usage) - this is incorrect because there is nothing special
about the highmem zone; it's more like an 'extension' of the normal zone,
from which specific caches can draw. (patch attached)
another problem is that even during a mild test the DMA zone gets emptied
easily - but on a big-RAM box kswapd has to work _a lot_ to fill it up. In
fact on an 8GB box it's completely futile to fill up the DMA zone. What
worked for me is this zone-chainlist trick in the zone setup code:
		case ZONE_NORMAL:
			zone = pgdat->node_zones + ZONE_NORMAL;
			if (zone->size)
				zonelist->zones[j++] = zone;
++			break;
		case ZONE_DMA:
			zone = pgdat->node_zones + ZONE_DMA;
			if (zone->size)
				zonelist->zones[j++] = zone;
no 'normal' allocation chain leads to the ZONE_DMA zone, except GFP_DMA
and GFP_ATOMIC - both of them rightfully access the DMA zone.
this is a real-life problem: without the above, an 8GB box under load
crashes pretty quickly due to failed SCSI-layer DMA allocations. (i think
those allocations are silly in the first place.)
the above is suboptimal on boxes whose total RAM is within one order of
magnitude of 16MB (the DMA zone stays empty most of the time and is
inaccessible to various caches) - so maybe the following (not yet
implemented) solution would be generic and acceptable:
allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
method: e.g. it wastes 2MB RAM on an 8MB box. We could probably live with
64kb increments (there are 64kb ISA DMA constraints the sound drivers and
some SCSI drivers are hitting) - is this really true? If nobody objects
i'll implement this later on (together with the asymmetric allocation
chain trick) - there will be a 64kb DMA pool allocated on the smallest
boxes, which should be acceptable even on a 4MB box. We could turn off the
DMA zone altogether on most boxes, if it wasn't for the SCSI layer
allocating DMA pages even for PCI drivers ...
Comments?
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 642 bytes --]
--- linux/mm/page_alloc.c.orig Thu May 11 02:10:34 2000
+++ linux/mm/page_alloc.c Thu May 11 16:03:48 2000
@@ -553,9 +566,14 @@
mask = zone_balance_min[j];
else if (mask > zone_balance_max[j])
mask = zone_balance_max[j];
- zone->pages_min = mask;
- zone->pages_low = mask*2;
- zone->pages_high = mask*3;
+ if (j == ZONE_HIGHMEM) {
+ zone->pages_low = zone->pages_high =
+ zone->pages_min = 0;
+ } else {
+ zone->pages_min = mask;
+ zone->pages_low = mask*2;
+ zone->pages_high = mask*3;
+ }
zone->low_on_memory = 0;
zone->zone_wake_kswapd = 0;
zone->zone_mem_map = mem_map + offset;
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
@ 2000-05-11 23:46 ` Linus Torvalds
2000-05-12 0:08 ` Ingo Molnar
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 10:57 ` Andrea Arcangeli
2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-11 23:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>
> IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> memory once it got below the treshold (causing very bad VM behavior and
> high kswapd usage) - this is incorrect because there is nothing special
> about the highmem zone, it's more like an 'extension' of the normal zone,
> from which specific caches can turn. (patch attached)
Hmm.. I think the patch is wrong. It's much easier to make
zone_balance_max[HIGHMEM] = 0;
and that will do the same thing, no?
> another problem is that even during a mild test the DMA zone gets emptied
> easily - but on a big RAM box kswapd has to work _alot_ to fill it up. In
> fact on an 8GB box it's completely futile to fill up the DMA zone. What
> worked for me is this zone-chainlist trick in the zone setup code:
Ok. This is a real problem. My inclination would be to say that your patch
is right, but only for large-memory configurations. I.e. just say that if
the dang machine has more than half a gig of memory, we shouldn't touch
the 16 low megs at all unless explicitly asked for.
But the static thing ("never touch ZONE_DMA" when doing a normal
allocation) is obviously bogus on smaller-memory machines. So make it
conditional.
> allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
> on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
> method: eg. it wastes 2MB RAM on a 8MB box.
This may be part of the solution - make it more gradual than a complete
cut-off at some random point (eg half a gig).
After all, this is why we zoned memory in the first place, so I think it
makes sense to be much more dynamic with the zones.
Linus
2000-05-11 23:46 ` Linus Torvalds
@ 2000-05-12 0:08 ` Ingo Molnar
2000-05-12 0:15 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 0:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: MM mailing list, linux-kernel, Alan Cox
On Thu, 11 May 2000, Linus Torvalds wrote:
> > IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> > memory once it got below the treshold (causing very bad VM behavior and
> > high kswapd usage) - this is incorrect because there is nothing special
> > about the highmem zone, it's more like an 'extension' of the normal zone,
> > from which specific caches can turn. (patch attached)
>
> Hmm.. I think the patch is wrong. It's much easier to make
yep, it does work (and fixes the 'kswapd storm'), but it's wrong.
> zone_balance_max[HIGHMEM] = 0;
>
> and that will do the same thing, no?
yep - or in fact just changing the constant initialization to ', 0 } ',
right?
> > another problem is that even during a mild test the DMA zone gets emptied
> > easily - but on a big RAM box kswapd has to work _alot_ to fill it up. In
> > fact on an 8GB box it's completely futile to fill up the DMA zone. What
> > worked for me is this zone-chainlist trick in the zone setup code:
>
> Ok. This is a real problem. My inclination would be to say that your patch
> is right, but only for large-memory configurations. Ie just say that if
> the dang machine has more than half a gig of memory, we shouldn't touch
> the 16 low megs at all unless explicitly asked for.
i think there are two fundamental problems here:
1) highmem should not be balanced (period)
2) once all easily allocatable RAM is gone to some high-flux
allocator, the DMA zone is emptied at last and is never
refilled effectively, causing a pointless 'kswapd storm' again.
1) is more or less trivially solved by fixing zone_balance_max[]
initialization. 2):
> > allocate 5% of total RAM or 16MB to the DMA zone (via fixing up zone sizes
> > on bootup), whichever is smaller, in 2MB increments. Disadvantage of this
> > method: eg. it wastes 2MB RAM on a 8MB box.
>
> This may be part of the solution - make it more gradual than a complete
> cut-off at some random point (eg half a gig).
>
> After all, this is why we zoned memory in the first place, so I think it
> makes sense to be much more dynamic with the zones.
ok, so the rule would be to put:
zone_dma_size := min(total_pages/32, 16MB) &~(64k-1) + 64k
pages into the DMA zone, and run the normal zone from this point up to
highmem. This gradually (linearly) increases the DMA zone's size from 64k
on 1MB boxes to 16MB on 512MB boxes and up (in steps of 64k). This not
only serves as a DMA pool, but as an atomic allocation pool as well (which
has been an ever-burning problem on low-memory NFS boxes).
i hope nothing relies on getting better than 64k physically aligned pages?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
@ 2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
2000-05-12 16:12 ` Linus Torvalds
2000-05-12 10:57 ` Andrea Arcangeli
2 siblings, 2 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-12 9:02 UTC (permalink / raw)
To: mingo; +Cc: Linus Torvalds, MM mailing list, linux-kernel
Hi Ingo,
Your patch breaks my tests again (which have been running fine for some
time now on pre7):
11 1 0 0 1631764 1796 12840 0 0 0 2 115 57045 4 95 1
10 3 0 0 1420616 1796 12840 0 0 0 0 120 55463 5 95 1
9 3 0 0 998032 1796 12840 0 0 0 2 111 49490 4 96 1
VM: killing process bash
VM: killing process ipctst
VM: killing process ipctst
Greetings
Christoph
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:02 ` Christoph Rohland
@ 2000-05-12 9:56 ` Ingo Molnar
2000-05-12 11:49 ` Christoph Rohland
2000-05-12 16:12 ` Linus Torvalds
1 sibling, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 9:56 UTC (permalink / raw)
To: Christoph Rohland; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On 12 May 2000, Christoph Rohland wrote:
> Hi Ingo,
>
> Your patch breaks my tests again (Which run fine for some time now on
> pre7):
>
> 11 1 0 0 1631764 1796 12840 0 0 0 2 115 57045 4 95 1
> 10 3 0 0 1420616 1796 12840 0 0 0 0 120 55463 5 95 1
> 9 3 0 0 998032 1796 12840 0 0 0 2 111 49490 4 96 1
> VM: killing process bash
> VM: killing process ipctst
> VM: killing process ipctst
hm, IMHO it really does nothing that should make memory balance worse.
Does the stock kernel work even after a long test?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:56 ` Ingo Molnar
@ 2000-05-12 11:49 ` Christoph Rohland
0 siblings, 0 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-12 11:49 UTC (permalink / raw)
To: mingo; +Cc: Linus Torvalds, MM mailing list, linux-kernel
Ingo Molnar <mingo@elte.hu> writes:
> > VM: killing process ipctst
>
> hm, IMHO it really does nothing that should make memory balance worse.
> Does the stock kernel work even after a long test?
No, I just ran a longer test. It does begin to swap out, but later I
also get the following messages. (Your version, though, does not swap out
at all and just kills processes):
7 9 1 558816 3844 100 13096 266 9400 102 2361 10000 1611 0 99 1
VM: killing process ipctst
3 11 1 589464 5724 120 13044 321 6340 88 1587 4414 1404 0 99 1
Woops: just this moment I also got:
exec.c:265: bad pte f1d4dff8(0000000000104025).
Greetings
Christoph
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
@ 2000-05-12 16:12 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 16:12 UTC (permalink / raw)
To: Christoph Rohland; +Cc: mingo, MM mailing list, linux-kernel
On 12 May 2000, Christoph Rohland wrote:
>
> Your patch breaks my tests again (Which run fine for some time now on
> pre7):
Not surprising, actually.
Never balancing highmem pages will also mean that they never get swapped
out. Which makes sense - why should we try to page anything out if we're
not interested in having any free pages for that zone?
So at some point the VM subsystem will just give up: 90% of the pages it
sees are unswappable, and it still cannot make room to free pages..
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
2000-05-12 9:02 ` Christoph Rohland
@ 2000-05-12 10:57 ` Andrea Arcangeli
2000-05-12 12:11 ` Ingo Molnar
2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 10:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>IMO high memory should not be balanced. Stock pre7-9 tried to balance high
>memory once it got below the treshold (causing very bad VM behavior and
>high kswapd usage) - this is incorrect because there is nothing special
>about the highmem zone, it's more like an 'extension' of the normal zone,
>from which specific caches can turn. (patch attached)
IMHO that is a hack to work around the currently broken design of the MM.
And it will also produce bad effects, since you won't age and recycle the
cache in the highmem zone correctly.
Without the classzone design you will always have kswapd and the page
allocator shrinking memory even when not necessary. Please check as
reference the very detailed explanation I posted around two weeks ago on
linux-mm in reply to Linus.
What you're trying to work around on the highmem part is exactly the same
problem you also have between the normal zone and the DMA zone. Why don't
you also just keep 3MB always free in the DMA zone and never shrink the
normal zone?
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 10:57 ` Andrea Arcangeli
@ 2000-05-12 12:11 ` Ingo Molnar
2000-05-12 12:57 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 12:11 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Andrea Arcangeli wrote:
> >IMO high memory should not be balanced. Stock pre7-9 tried to balance high
> >memory once it got below the treshold (causing very bad VM behavior and
> >high kswapd usage) - this is incorrect because there is nothing special
> >about the highmem zone, it's more like an 'extension' of the normal zone,
> >from which specific caches can turn. (patch attached)
>
> IMHO that is an hack to workaround the currently broken design of the MM.
> And it will also produce bad effect since you won't age the recycle the
> cache in the highmem zone correctly.
what bad effects? the LRU list of the pagecache is a completely
independent mechanism. Highmem pages are LRU-freed just as effectively as
normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
global, so the particular zone of highmem pages is completely transparent
and irrelevant to the LRU mechanism. I cannot see any bad effects wrt. LRU
recycling and the highmem zone here. (let me know if you meant some
different recycling mechanism)
> What you're trying to workaround on the highmem part is exactly the
> same problem you also have between the normal zone and the dma zone.
> Why don't you also just take 3mbyte always free from the dma zone and
> you never shrink the normal zone?
i'm not working around anything. Highmem _should not be balanced_, period.
It's a superset of normal memory, and by just balancing normal memory (and
adding the highmem free count to the total) we are completely fine.
Highmem is also a temporary phenomenon; it will probably disappear in a
few years once 64-bit systems and proper 64-bit DMA become commonplace.
(and small devices will do 32-bit + 32-bit DMA.)
'balanced' means: 'keep X amount of highmem free'. What is your point in
keeping free highmem around?
the DMA zone resizing suggestion from yesterday is, i believe,
conceptually correct as well: we _want to_ isolate normal allocators from
these 'emergency pools'. IRQ handlers cannot wait for more free RAM.
about classzone. This was the initial idea of how to do balancing when the
zoned allocator was implemented (along with per-zone kswapd threads or
per-zone queues), but it just gets too complex IMHO. Why don't you give
the simpler suggestion from yesterday a thought? We have essentially only
one zone which has to be balanced, ZONE_NORMAL. ZONE_DMA is and should
become special because it also serves as an atomic pool for IRQ
allocations. (ZONE_HIGHMEM is special and uninteresting as far as memory
balance goes, as explained above.) So we only have ZONE_NORMAL to worry
about. Zone chains are a perfect way of defining fallback routes.
i've had a nicely balanced (heavily loaded) 8GB box for the past couple of
weeks, just by making (yesterday's) slight trivial changes to the
zone chains and watermarks. The default settings in the stock kernel were
not tuned, but all the mechanism is there. LRU is working, there was
always DMA RAM around, no classzones necessary here. So what exactly is
the case you are trying to balance?
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 12:11 ` Ingo Molnar
@ 2000-05-12 12:57 ` Andrea Arcangeli
2000-05-12 13:20 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 12:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>what bad effects? the LRU list of the pagecache is a completely
>independent mechanizm. Highmem pages are LRU-freed just as effectively as
>normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
>global, so the particular zone of highmem pages is completely transparent
It shouldn't be global but per-NUMA-node as I have in the classzone patch.
>and irrelevant to the LRU mechanizm. I cannot see any bad effects wrt. LRU
>recycling and the highmem zone here. (let me know if you ment some
>different recycling mechanizm)
See line 320 of filemap.c in 2.3.99-pre7-pre9. (ignore the fact that it
will recycle 1 page; that's just because they didn't expect pages_high to
be zero)
>'balanced' means: 'keep X amount of highmem free'. What is your point in
>keeping free highmem around?
Assuming there is no point, you still want to free from the highmem
zone as well while doing LRU aging of the cache.
And if you don't keep X amount of highmem free, you'll break if an irq
does a GFP_HIGHMEM allocation.
Note also that with highmem I don't mean the memory between 1GB and
64GB, but the memory between 0 and 64GB. When you allocate with
GFP_HIGHUSER you ask the MM for a page between 0 and 64GB.
And in turn, what is the point of keeping X amount of normal/regular
memory free? You try to keep such an X amount of memory free in the DMA
zone, so why do you also try to keep it free in the normal zone? The
problem is the same.
Please read my emails on linux-mm from a few weeks ago about the classzone
approach. I can forward them to linux-kernel if there is interest (I don't
know if there's a web archive, but I guess there is).
If the current strict zone approach weren't broken, we could as well
choose to split ZONE_HIGHMEM into 10/20 zones to scale 10/20 times
better during allocations, no? Is this argument enough to at least ring a
bell for you that the current design is flawed? The flaw is that we pay
for it with drawbacks and by having a VM that does the wrong thing
because it doesn't have enough information (it only sees a little part of
the picture). You can't fix it without looking at the whole picture (the
classzone).
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 12:57 ` Andrea Arcangeli
@ 2000-05-12 13:20 ` Rik van Riel
2000-05-12 16:40 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 13:20 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Andrea Arcangeli wrote:
> On Fri, 12 May 2000, Ingo Molnar wrote:
>
> >what bad effects? the LRU list of the pagecache is a completely
> >independent mechanizm. Highmem pages are LRU-freed just as effectively as
> >normal pages. The pagecache LRU list is not per-zone but (IMHO correctly)
> >global, so the particular zone of highmem pages is completely transparent
>
> It shouldn't be global but per-NUMA-node as I have in the classzone patch.
*nod*
This change is in my source tree too (but the active/inactive
page list thing doesn't work yet).
> >and irrelevant to the LRU mechanizm. I cannot see any bad effects wrt. LRU
> >recycling and the highmem zone here. (let me know if you ment some
> >different recycling mechanizm)
>
> See line 320 of filemap.c in 2.3.99-pre7-pre9. (ignore the fact
> it will recycle 1 page, it's just because they didn't expected
> pages_high to be zero)
Indeed, pages_high for the highmem zone probably shouldn't be zero.
pages_min and pages_low: 0
pages_high: 128??? (free up to 512kB of high memory)
> >'balanced' means: 'keep X amount of highmem free'. What is your point in
> >keeping free highmem around?
>
> Assuming there is no point, you still want to free also from the
> highmem zone while doing LRU aging of the cache.
True, but this just involves setting the watermarks right. The
current code supports the balancing just fine.
> And if you don't keep X amount of highmem free you'll break if
> an irq will do a GFP_HIGHMEM allocation.
GFP_HIGHMEM will automatically fall back to the NORMAL zone.
There's no problem here.
> Note also that with highmem I don't mean not the memory between
> 1giga and 64giga, but the memory between 0 and 64giga.
Why do you keep insisting on meaning other things with words than
what everybody else means with them? ;)
> Please read my emails on linux-mm of a few weeks ago about
> classzone approch.
I've read them and it's overly complex and doesn't make much
sense for what we need.
> I can forward them to linux-kernel if there is interest (I don't
> know if there's a web archive but I guess there is).
http://mail.nl.linux.org/linux-mm/
http://www.linux.eu.org/Linux-MM/
> If the current strict zone approch wouldn't be broken we could
> as well choose to split the ZONE_HIGHMEM in 10/20 zones to
> scales 10/20 times better during allocations, no?
This would work just fine, except for the fact that we have
only one pagecache_lock ... maybe we want to have multiple
pagecache_locks based on a hash of the inode number? ;)
> Is this argulemnt enough to make you to at least ring a bell
> that the current design is flawed?
But we *can* split the HIGHMEM zone into a bunch of smaller
ones without affecting performance. Just set zone->pages_min
and zone->pages_low to 0 and zone->pages_high to some smallish
value. Then we can teach the allocator to skip the zone if:
1) no obscenely large amount of free pages
2) zone is locked by somebody else (TryLock(zone->lock))
This will work just fine with the current code (plus these
two minor tweaks). No big changes are needed to support this
idea.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 13:20 ` Rik van Riel
@ 2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 16:40 UTC (permalink / raw)
To: Rik van Riel
Cc: Andrea Arcangeli, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Rik van Riel wrote:
> But we *can* split the HIGHMEM zone into a bunch of smaller
> ones without affecting performance. Just set zone->pages_min
> and zone->pages_low to 0 and zone->pages_high to some smallish
> value. Then we can teach the allocator to skip the zone if:
> 1) no obscenely large amount of free pages
> 2) zone is locked by somebody else (TryLock(zone->lock))
what's the point of this splitup? (i suspect there is a point, i just
cannot see it now. thanks.)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
@ 2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
2000-05-19 1:58 ` Andrea Arcangeli
2 siblings, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 17:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrea Arcangeli, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
> On Fri, 12 May 2000, Rik van Riel wrote:
>
> > But we *can* split the HIGHMEM zone into a bunch of smaller
> > ones without affecting performance. Just set zone->pages_min
> > and zone->pages_low to 0 and zone->pages_high to some smallish
> > value. Then we can teach the allocator to skip the zone if:
> > 1) no obscenely large amount of free pages
> > 2) zone is locked by somebody else (TryLock(zone->lock))
>
> whats the point of this splitup? (i suspect there is a point, i
> just cannot see it now. thanks.)
There's not much point in doing so. This is basically
just a reply to Andrea's "but you can't do _this_ with
the current approach" remark ;)
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
@ 2000-05-12 18:15 ` Linus Torvalds
2000-05-12 18:53 ` Ingo Molnar
2000-05-19 1:58 ` Andrea Arcangeli
2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 18:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
Ingo, one thing struck me.. Have you actually tested unmodified 99-pre7?
You said that you've been running the "standard kernel with the highmem
modification" for a few weeks on a 8GB machine, and that makes me wonder
if you maybe didn't even try pre7 without your mod?
What _used_ to happen with multi-zone setups was that if one zone started
to need balancing, you got a lot of page-out activity in the other zones
too, because vmscan would _only_ look at the LRU information, and would
happily page stuff out from the zones that weren't affected at all. On a
highmem machine this means, for example, that if the regular memory zone
(or the DMA zone) got under pressure, we would start paging out highmem
pages too as we encountered them in vmscan.
With such a setup, your patch makes lots of sense - trying to decouple the
highmem zone as much as possible. But the more recent kernels should be
better at not touching zones that don't need touching (it will still
change the LRU information, though).
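The zone-aware scanning described above can be modelled in a few lines. The following is an illustrative userspace sketch, not the actual 2.3.99 code: the struct fields and function names are invented for the example. The idea is that the scanner walks the LRU but only evicts pages whose zone is actually below its low watermark.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative userspace model -- not 2.3.99 kernel code.  A zone is
 * "under pressure" when its free page count drops below pages_low. */
struct zone {
    long free_pages;
    long pages_low;
};

struct page {
    struct zone *zone;  /* which zone this page's memory belongs to */
    int evicted;
};

static int zone_under_pressure(const struct zone *z)
{
    return z->free_pages < z->pages_low;
}

/* Walk the LRU array (oldest first) and evict only pages that belong
 * to a pressured zone; pages in healthy zones are left alone, which
 * is the "don't touch zones that don't need touching" rule. */
static int shrink_lru(struct page *lru, size_t n)
{
    int reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!zone_under_pressure(lru[i].zone))
            continue;
        lru[i].evicted = 1;
        lru[i].zone->free_pages++;
        reclaimed++;
    }
    return reclaimed;
}
```

Under the old LRU-only behaviour the loop would evict every old page it met, regardless of zone; the single zone_under_pressure() test is what keeps a ZONE_NORMAL shortage from draining highmem.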
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 18:15 ` Linus Torvalds
@ 2000-05-12 18:53 ` Ingo Molnar
2000-05-12 19:06 ` Linus Torvalds
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 18:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1186 bytes --]
On Fri, 12 May 2000, Linus Torvalds wrote:
> With such a setup, your patch makes lots of sense - trying to decouple
> the highmem zone as much as possible. But the more recent kernels
> should be better at not touching zones that don't need touching (it
> will still change the LRU information, though).
i initially tested pre7-9 and it showed bad behavior: high kswapd activity
trying to balance highmem, while the pagecache is primarily filled from
the highmem. I don't think this can be fixed without 'silencing'
ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
so it puts direct pressure on the highmem zone.
This had two effects: it wasted CPU time, and it also limited the
page-cache's maximum size to the size of highmem. I'll try the final
pre7-2.3.99 kernel as well in a minute to make sure. (i think the bad
behavior is still there, judging from the differences between pre9 and
the final patch.)
(i've attached a patch against final-pre7, which is not complete and which
i'm not yet happy about (the kernel shows bad behavior if lots of dirty
data is generated by many processes), but it shows e.g. the highmem.c
cleanup that is possible.)
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 5134 bytes --]
--- linux/mm/page_alloc.c.orig Fri May 12 08:45:17 2000
+++ linux/mm/page_alloc.c Fri May 12 09:14:58 2000
@@ -29,9 +29,9 @@
pg_data_t *pgdat_list = (pg_data_t *)0;
static char *zone_names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };
-static int zone_balance_ratio[MAX_NR_ZONES] = { 128, 128, 128, };
-static int zone_balance_min[MAX_NR_ZONES] = { 10 , 10, 10, };
-static int zone_balance_max[MAX_NR_ZONES] = { 255 , 255, 255, };
+static int zone_balance_ratio[MAX_NR_ZONES] = { 128, 128, 1, };
+static int zone_balance_min[MAX_NR_ZONES] = { 10 , 10, 0, };
+static int zone_balance_max[MAX_NR_ZONES] = { 255 , 255, 0, };
/*
* Free_page() adds the page to the free lists. This is optimized for
@@ -271,7 +271,10 @@
if (!(current->flags & PF_MEMALLOC)) {
int gfp_mask = zonelist->gfp_mask;
if (!try_to_free_pages(gfp_mask)) {
- if (!(gfp_mask & __GFP_HIGH))
+ /*
+ * Non-highprio allocations fail here:
+ */
+ if (!(gfp_mask & __GFP_PRIO))
goto fail;
}
}
@@ -440,6 +443,9 @@
zone = pgdat->node_zones + ZONE_NORMAL;
if (zone->size)
zonelist->zones[j++] = zone;
+ if ((i && __GFP_WAIT) || !(i && __GFP_PRIO) ||
+ (i && __GFP_IO))
+ break;
case ZONE_DMA:
zone = pgdat->node_zones + ZONE_DMA;
if (zone->size)
--- linux/mm/highmem.c.orig Fri May 12 09:16:25 2000
+++ linux/mm/highmem.c Fri May 12 09:27:14 2000
@@ -66,6 +66,13 @@
return new_page;
}
+/*
+ * Special zonelist so we can just query the highmem pool and
+ * return immediately if there is no highmem page free.
+ */
+static zonelist_t high_zonelist =
+ { { NODE_DATA(0)->node_zones + ZONE_HIGHMEM, NULL, }, __GFP_HIGHMEM };
+
struct page * replace_with_highmem(struct page * page)
{
struct page *highpage;
@@ -74,13 +81,11 @@
if (PageHighMem(page) || !nr_free_highpages())
return page;
- highpage = alloc_page(GFP_ATOMIC|__GFP_HIGHMEM);
+ highpage = __alloc_pages(&high_zonelist, 0);
if (!highpage)
return page;
- if (!PageHighMem(highpage)) {
- __free_page(highpage);
- return page;
- }
+ if (!PageHighMem(highpage))
+ BUG();
vaddr = kmap(highpage);
copy_page((void *)vaddr, (void *)page_address(page));
--- linux/include/linux/mm.h.orig Fri May 12 08:46:55 2000
+++ linux/include/linux/mm.h Fri May 12 09:27:56 2000
@@ -471,33 +471,49 @@
* GFP bitmasks..
*/
#define __GFP_WAIT 0x01
-#define __GFP_HIGH 0x02
+#define __GFP_PRIO 0x02
#define __GFP_IO 0x04
+/*
+ * indicates that the buffer will be suitable for DMA. Ignored on some
+ * platforms, used as appropriate on others
+ */
#define __GFP_DMA 0x08
+
+/*
+ * indicates that the buffer can be taken from high memory,
+ * which is not permanently mapped by the kernel
+ */
#ifdef CONFIG_HIGHMEM
#define __GFP_HIGHMEM 0x10
#else
#define __GFP_HIGHMEM 0x0 /* noop */
#endif
-
-#define GFP_BUFFER (__GFP_HIGH | __GFP_WAIT)
-#define GFP_ATOMIC (__GFP_HIGH)
-#define GFP_USER (__GFP_WAIT | __GFP_IO)
-#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM)
-#define GFP_KERNEL (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
-#define GFP_NFS (__GFP_HIGH | __GFP_WAIT | __GFP_IO)
-#define GFP_KSWAPD (__GFP_IO)
-
-/* Flag - indicates that the buffer will be suitable for DMA. Ignored on some
- platforms, used as appropriate on others */
-
-#define GFP_DMA __GFP_DMA
-
-/* Flag - indicates that the buffer can be taken from high memory which is not
- permanently mapped by the kernel */
-
-#define GFP_HIGHMEM __GFP_HIGHMEM
+/*
+ * The 5 GFP bits:
+ * ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | __GFP_DMA | __GFP_HIGHMEM )
+ *
+ * The most typical combinations:
+ */
+
+#define GFP_BUFFER \
+ ( __GFP_WAIT | __GFP_PRIO | 0 | 0 | 0 )
+#define GFP_ATOMIC \
+ ( 0 | __GFP_PRIO | 0 | 0 | 0 )
+#define GFP_USER \
+ ( __GFP_WAIT | 0 | __GFP_IO | 0 | 0 )
+#define GFP_HIGHUSER \
+ ( __GFP_WAIT | 0 | __GFP_IO | 0 | __GFP_HIGHMEM )
+#define GFP_KERNEL \
+ ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | 0 | 0 )
+#define GFP_NFS \
+ ( __GFP_WAIT | __GFP_PRIO | __GFP_IO | 0 | 0 )
+#define GFP_KSWAPD \
+ ( 0 | 0 | __GFP_IO | 0 | 0 )
+#define GFP_DMA \
+ ( 0 | 0 | 0 | __GFP_DMA | 0 )
+#define GFP_HIGHMEM \
+ ( 0 | 0 | 0 | 0 | __GFP_HIGHMEM )
/* vma is the first one with address < vma->vm_end,
* and even address < vma->vm_start. Have to extend vma. */
--- linux/include/linux/slab.h.orig Fri May 12 09:05:15 2000
+++ linux/include/linux/slab.h Fri May 12 09:27:56 2000
@@ -22,7 +22,7 @@
#define SLAB_NFS GFP_NFS
#define SLAB_DMA GFP_DMA
-#define SLAB_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_HIGHMEM)
+#define SLAB_LEVEL_MASK (__GFP_WAIT|__GFP_PRIO|__GFP_IO|__GFP_HIGHMEM)
#define SLAB_NO_GROW 0x00001000UL /* don't grow a cache */
/* flags to pass to kmem_cache_create().
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 18:53 ` Ingo Molnar
@ 2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-12 19:06 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
>
> i initially tested pre7-9 and it showed bad behavior: high kswapd activity
> trying to balance highmem, while the pagecache is primarily filled from
> the highmem. I don't think this can be fixed without 'silencing'
> ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
> so it puts direct pressure on the highmem zone.
If this is true, then that is a bug in the allocator.
I tried very hard (but must obviously have failed), to make the allocator
_always_ do the right thing - never allocating from a zone that causes
memory balancing if there is another zone that is preferable.
> This had two effects: it wasted CPU time, and it also limited the
> page-cache's maximum size to the size of highmem. I'll try the final
> pre7-2.3.99 kernel as well in a minute to make sure. (i think the bad
> behavior is still there, judging from the differences between pre9 and
> the final patch.)
Please fix the memory allocator instead. It should really go to the next
zone instead of allocating more from the highmem zone.
Actually, I think the real bug is kswapd - I thought the "for (;;)" loop
was a good idea, but I've since actually thought about it more, and in
real life we really just want to go to sleep when we need to re-schedule,
because if there is any _real_ memory pressure people _will_ wake us up
anyway. So before you touch the memory allocator logic, you might want to
change the
if (tsk->need_resched)
schedule();
to a
if (tsk->need_resched)
goto sleep;
(and add a "sleep:" thing to inside the if-statement that makes us go to
sleep). That way, if we end up scheduling away from kswapd, we won't waste
time scheduling back unless we really should.
But do check out __alloc_pages() too, maybe you see some obvious bug of
mine that I just never thought about.
Linus
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
@ 2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Linus Torvalds wrote:
> > i initially tested pre7-9 and it showed bad behavior: high kswapd activity
> > trying to balance highmem, while the pagecache is primarily filled from
> > the highmem. I don't think this can be fixed without 'silencing'
> > ZONE_HIGHMEM's balancing activities: the pagecache allocates from highmem
> > so it puts direct pressure on the highmem zone.
>
> If this is true, then that is a bug in the allocator.
i just re-checked final pre7-2.3.99, and saw similar behavior. Once
ZONE_HIGHMEM is empty, kswapd eats ~6% CPU time (constantly running), the
highmem free count (in /proc/meminfo) fluctuates slightly above zero, but
the pagecache is not growing anymore - although there is still lots of
ZONE_NORMAL RAM around.
> anyway. So before you touch the memory allocator logic, you might want to
> change the
>
> if (tsk->need_resched)
> schedule();
>
> to a
>
> if (tsk->need_resched)
> goto sleep;
>
> (and add a "sleep:" thing to inside the if-statement that makes us go to
> sleep). That way, if we end up scheduling away from kswapd, we won't waste
> time scheduling back unless we really should.
ok, will try this, and will try to find where it fails.
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
@ 2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
note that now i'm running the 4GB variant of highmem (easier to fill up) -
so the physical memory layout goes like this:
1GB permanently mapped RAM
~2GB highmem
(only 2GB highmem because 5GB of RAM is above 4GB, so inaccessible to
normal 32-bit PTEs.)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
@ 2000-05-12 19:54 ` Ingo Molnar
2000-05-12 22:48 ` Rik van Riel
2 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-12 19:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, MM mailing list, linux-kernel
yes, this appears to have done the trick (patch attached). A 15MB/sec
stream of pure read activity started filling up highmem first. There was
still a light spike of kswapd activity once highmem got filled up, but it
stabilized after a few seconds. Then the pagecache filled up the normal
zone just as fast as it filled up the highmem zone, and now it's in steady
state, with kswapd using up ~5% CPU time [fluctuating, sometimes as high
as 15%, sometimes zero]. (it's recycling LRU pages?) Cool!
Ingo
--- linux/mm/vmscan.c.orig Fri May 12 12:28:58 2000
+++ linux/mm/vmscan.c Fri May 12 12:29:50 2000
@@ -543,13 +543,14 @@
something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
if (tsk->need_resched)
- schedule();
+ goto sleep;
}
run_task_queue(&tq_disk);
pgdat = pgdat->node_next;
} while (pgdat);
if (!something_to_do) {
+sleep:
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 19:54 ` Ingo Molnar
@ 2000-05-12 22:48 ` Rik van Riel
2000-05-13 11:57 ` Stephen C. Tweedie
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-12 22:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Andrea Arcangeli, MM mailing list, linux-kernel
On Fri, 12 May 2000, Ingo Molnar wrote:
> --- linux/mm/vmscan.c.orig Fri May 12 12:28:58 2000
> +++ linux/mm/vmscan.c Fri May 12 12:29:50 2000
> @@ -543,13 +543,14 @@
> something_to_do = 1;
> do_try_to_free_pages(GFP_KSWAPD);
> if (tsk->need_resched)
> - schedule();
> + goto sleep;
> }
> run_task_queue(&tq_disk);
> pgdat = pgdat->node_next;
> } while (pgdat);
>
> if (!something_to_do) {
> +sleep:
> tsk->state = TASK_INTERRUPTIBLE;
> interruptible_sleep_on(&kswapd_wait);
> }
This is wrong. It will make it much much easier for processes to
get killed (as demonstrated by quintela's VM test suite).
The correct fix probably is to have the _same_ watermark for
something_to_do *and* the "easy allocation" in __alloc_pages.
(very much untested patch versus pre7-9 below)
regards,
Rik
--- vmscan.c.orig Thu May 11 12:13:08 2000
+++ vmscan.c Fri May 12 19:46:49 2000
@@ -542,8 +542,9 @@
zone_t *zone = pgdat->node_zones+ i;
if (!zone->size || !zone->zone_wake_kswapd)
continue;
- something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
+ if (zone->free_pages < zone->pages_low)
+ something_to_do = 1;
if (tsk->need_resched)
schedule();
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 22:48 ` Rik van Riel
@ 2000-05-13 11:57 ` Stephen C. Tweedie
2000-05-13 12:03 ` Rik van Riel
0 siblings, 1 reply; 67+ messages in thread
From: Stephen C. Tweedie @ 2000-05-13 11:57 UTC (permalink / raw)
To: Rik van Riel
Cc: Ingo Molnar, Linus Torvalds, Andrea Arcangeli, MM mailing list,
linux-kernel, Stephen Tweedie
Hi,
On Fri, May 12, 2000 at 07:48:45PM -0300, Rik van Riel wrote:
> > if (tsk->need_resched)
> > - schedule();
> > + goto sleep;
>
> This is wrong. It will make it much much easier for processes to
> get killed (as demonstrated by quintela's VM test suite).
It shouldn't. If tasks are getting killed, then the fix should be
in alloc_pages, not in kswapd. Tasks _should_ be quite able to wait
for memory, and if necessary, drop into try_to_free_pages themselves.
Linus, the fix above seems to be necessary. Without it, even a simple
playing of mp3 audio on 2.3 fails once memory is full on a 256MB box,
with kswapd consuming between 5% and 25% of CPU and locking things up
sufficiently to cause dropouts in the playback every second or more.
With that one-liner fix, mp3 is smooth even in the presence of other
background file activity.
--Stephen
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 11:57 ` Stephen C. Tweedie
@ 2000-05-13 12:03 ` Rik van Riel
2000-05-13 12:14 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-13 12:03 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Sat, 13 May 2000, Stephen C. Tweedie wrote:
> On Fri, May 12, 2000 at 07:48:45PM -0300, Rik van Riel wrote:
>
> > > if (tsk->need_resched)
> > > - schedule();
> > > + goto sleep;
> >
> > This is wrong. It will make it much much easier for processes to
> > get killed (as demonstrated by quintela's VM test suite).
>
> It shouldn't. If tasks are getting killed, then the fix should be
> in alloc_pages, not in kswapd. Tasks _should_ be quite able to wait
> for memory, and if necessary, drop into try_to_free_pages themselves.
Indeed, but waiting for memory or running
try_to_free_pages themselves is not without
problems either, as you describe below...
> Linus, the fix above seems to be necessary. Without it, even a
> simple playing of mp3 audio on 2.3 fails once memory is full on
> a 256MB box, with kswapd consuming between 5% and 25% of CPU and
> locking things up sufficiently to cause dropouts in the playback
> every second or more. With that one-liner fix, mp3 is smooth
> even in the presence of other background file activity.
Kswapd freeing pages in the background means that processes
in the foreground can proceed with their allocation without
waiting, leading to smoother VM performance. I guess we
want that ... ;)
Besides, kswapd will _only_ continue if there's a zone with
zone->free_pages < zone->pages_low ... I'm now running pre8
with the patch below and it works fine.
regards,
Rik
--- mm/vmscan.c.orig Fri May 12 20:13:08 2000
+++ mm/vmscan.c Fri May 12 20:15:24 2000
@@ -538,16 +538,19 @@
int i;
for(i = 0; i < MAX_NR_ZONES; i++) {
zone_t *zone = pgdat->node_zones+ i;
+ if (tsk->need_resched)
+ schedule();
if (!zone->size || !zone->zone_wake_kswapd)
continue;
- something_to_do = 1;
+ if (zone->free_pages < zone->pages_low)
+ something_to_do = 1;
do_try_to_free_pages(GFP_KSWAPD);
}
run_task_queue(&tq_disk);
pgdat = pgdat->node_next;
} while (pgdat);
- if (tsk->need_resched || !something_to_do) {
+ if (!something_to_do) {
tsk->state = TASK_INTERRUPTIBLE;
interruptible_sleep_on(&kswapd_wait);
}
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 12:03 ` Rik van Riel
@ 2000-05-13 12:14 ` Ingo Molnar
2000-05-13 14:23 ` Ingo Molnar
0 siblings, 1 reply; 67+ messages in thread
From: Ingo Molnar @ 2000-05-13 12:14 UTC (permalink / raw)
To: Rik van Riel
Cc: Stephen C. Tweedie, Linus Torvalds, MM mailing list, linux-kernel
i've also seen a bit more frequent allocation failures on pre8, during
high (but non-thrashing) VM load. Will try your patch now.
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-13 12:14 ` Ingo Molnar
@ 2000-05-13 14:23 ` Ingo Molnar
0 siblings, 0 replies; 67+ messages in thread
From: Ingo Molnar @ 2000-05-13 14:23 UTC (permalink / raw)
To: Rik van Riel
Cc: Stephen C. Tweedie, Linus Torvalds, MM mailing list, linux-kernel
> i've also seen a bit more frequent allocation failures on pre8, during
> high (but non-thrashing) VM load. Will try your patch now.
your patch has improved out-of-memory behavior, i have seen no allocation
failures so far. (stock pre8 was occasionally failing)
Ingo
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
@ 2000-05-19 1:58 ` Andrea Arcangeli
2000-05-19 15:03 ` Rik van Riel
2 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-19 1:58 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Rik van Riel, Linus Torvalds, MM mailing list, linux-kernel
[ sorry for the late reply ]
On Fri, 12 May 2000, Ingo Molnar wrote:
>On Fri, 12 May 2000, Rik van Riel wrote:
>
>> But we *can* split the HIGHMEM zone into a bunch of smaller
>> ones without affecting performance. Just set zone->pages_min
>> and zone->pages_low to 0 and zone->pages_high to some smallish
>> value. Then we can teach the allocator to skip the zone if:
>> 1) no obscenely large amount of free pages
>> 2) zone is locked by somebody else (TryLock(zone->lock))
>
>what's the point of this split-up? (i suspect there is a point, i just
>cannot see it now. thanks.)
I quote email from Rik of 25 Apr 2000 23:10:56 on linux-mm:
-- Message-ID: <Pine.LNX.4.21.0004252240280.14340-100000@duckman.conectiva> --
We can do this just fine. Splitting a box into a dozen more
zones than what we have currently should work just fine,
except for (as you say) higher CPU use by kswapd.
If I get my balancing patch right, most of that disadvantage
should be gone as well. Maybe we *do* want to do this on
bigger SMP boxes so each processor can start out with a
separate zone and check the other zone later to avoid lock
contention?
--------------------------------------------------------------
I still strongly think that the current strict per-zone mem balancing design
is very broken (and I also think I'm right, since I believe I see
the whole picture), but I don't think I can explain my arguments
better and/or more extensively than I already did in linux-mm some weeks ago.
If you see anything wrong in my reasoning please let me know. The interesting
thread was "Re: 2.3.x mem balancing" (the start was off-list) in linux-mm.
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 1:58 ` Andrea Arcangeli
@ 2000-05-19 15:03 ` Rik van Riel
2000-05-19 16:08 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Rik van Riel @ 2000-05-19 15:03 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Thu, 18 May 2000, Andrea Arcangeli wrote:
> I still strongly think that the current strict per-zone mem
> balancing design is very broken (and I also think I'm right
> since I believe I see the whole picture) but I don't think I
> can explain my arguments better and/or more extensively than I
> already did in linux-mm some weeks ago.
The balancing as of pre9-2 works like this:
- LRU list per pgdat
- kswapd runs and makes sure every zone has > zone->pages_low
free pages, after that it stops
- kswapd frees pages up to zone->pages_high, depending on which
pages we encounter in the LRU queue; this makes sure that
the zone with the most least-recently-used pages will have more
free pages
- __alloc_pages() allocates from every zone down to zone->pages_low
before waking up kswapd; this makes sure more pages
from the least loaded zone will be used than from more loaded
zones, so balancing between zones happens
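The rules above can be condensed into a small model. This is a hedged userspace sketch with invented helper names (the real __alloc_pages() and kswapd are considerably more involved): the allocator takes from any zone still above pages_low, wakes kswapd only when every zone has been drained to its low watermark, and kswapd then refills each zone up to pages_high.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the pre9-2 balancing rules -- the field names
 * mirror the kernel's zone_t watermarks, but the code is a sketch. */
struct zone {
    long free_pages;
    long pages_low;   /* allocator stops dipping below this */
    long pages_high;  /* kswapd refills up to this */
};

static int kswapd_woken;

/* Take one page from the first zone in the fallback list that is
 * still above its low watermark; only when all zones are at or
 * below pages_low does the caller wake kswapd. */
static struct zone *alloc_one_page(struct zone **zonelist, int n)
{
    for (int i = 0; i < n; i++) {
        if (zonelist[i]->free_pages > zonelist[i]->pages_low) {
            zonelist[i]->free_pages--;
            return zonelist[i];
        }
    }
    kswapd_woken = 1;   /* all zones low: background reclaim needed */
    return NULL;
}

/* kswapd's side of the bargain: free pages in every low zone until
 * it reaches pages_high, then go back to sleep. */
static void kswapd_balance(struct zone **zonelist, int n)
{
    for (int i = 0; i < n; i++)
        while (zonelist[i]->free_pages < zonelist[i]->pages_high)
            zonelist[i]->free_pages++;  /* stands in for real reclaim */
    kswapd_woken = 0;
}
```

Because allocations drain each zone only to pages_low before falling through to the next one, a zone with many free pages absorbs proportionally more allocations, which is the inter-zone balancing being described.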
I'm curious what would be so "very broken" about this?
AFAICS it does most of what the classzone patch would achieve,
at lower complexity and better readability.
regards,
Rik
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 15:03 ` Rik van Riel
@ 2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
0 siblings, 2 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-19 16:08 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 19 May 2000, Rik van Riel wrote:
>I'm curious what would be so "very broken" about this?
You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
>AFAICS it does most of what the classzone patch would achieve,
>at lower complexity and better readability.
I disagree.
Andrea
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 16:08 ` Andrea Arcangeli
@ 2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-19 17:05 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ingo Molnar, Linus Torvalds, MM mailing list, linux-kernel
On Fri, 19 May 2000, Andrea Arcangeli wrote:
> On Fri, 19 May 2000, Rik van Riel wrote:
>
> >I'm curious what would be so "very broken" about this?
>
> You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
What's wrong with this? We'll never go below zone->pages_low
in ZONE_DMA, so you don't have to worry about running out of
DMA pages.
> >AFAICS it does most of what the classzone patch would achieve,
> >at lower complexity and better readability.
>
> I disagree.
The classzone patches look like a bunch of magic to most of the
people who've read them and with whom I've spoken. There has been
almost no explanation of what the patch tries to achieve or why
it would work better than the normal code (nor is it visible in
the code).
Juan Quintela's patch, on the other hand, has received continuous
feedback from 7 kernel hackers, all of whom now understand how the
code works. This provides a lot more long-term maintainability of
the code.
regards,
Rik
* Re: [patch] balanced highmem subsystem under pre7-9
2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
@ 2000-05-19 22:28 ` Linus Torvalds
1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2000-05-19 22:28 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rik van Riel, Ingo Molnar, MM mailing list, linux-kernel
On Fri, 19 May 2000, Andrea Arcangeli wrote:
> On Fri, 19 May 2000, Rik van Riel wrote:
>
> >I'm curious what would be so "very broken" about this?
>
> You start eating from ZONE_DMA before you made empty ZONE_NORMAL.
THIS IS NOT A BUG!
It's a feature. I don't see why you insist on calling this a problem.
We do NOT keep free memory around just for DMA allocations. We
fundamentally keep free memory around because the buddy allocator (_any_
allocator, in fact) needs some slop in order to do a reasonable job at
allocating contiguous page regions, for example. We keep free memory
around because that way we have a "buffer" to allocate from atomically, so
that when network traffic occurs or there is other behaviour that requires
memory without being able to free it on the spot, we have memory to give.
Keeping only DMA memory around would be =bad=. It would mean, for example,
that when a new packet comes in on the network, it would always be
allocated from the DMA region, because the normal zone hasn't even been
balanced ("why balance it when we still have DMA memory?"). And that would
be a huge mistake, because that would mean, for example, that by selecting
the right allocation patterns and by opening sockets without reading the
data they receive the right way, somebody could force all of DMA memory to
be used up by network allocations that wouldn't be free'd.
In short, your very fundamental premise is BROKEN, Andrea. We want to keep
normal memory around, even if there is low memory available. The same is
true of high memory, for similar reasons.
Face it. The original zone-only code had problems. One of the worst
problems was that it would try to free up a lot of "normal" memory if it
got low on DMA memory. Those problems have pretty much been fixed, and
they had _nothing_ to do with your "class" patches. They were bugs, plain
and simple, not design mistakes.
If you think you should have zero free normal pages, YOU have a design
mistake. We should not be that black-and-white. The whole point in having
the min/low/max stuff is to make memory allocation less susceptible to
border conditions, and turn a black-and-white situation into more of a
"levels of gray" situation.
Linus
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
@ 2000-05-11 11:12 ` Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
3 siblings, 0 replies; 67+ messages in thread
From: Christoph Rohland @ 2000-05-11 11:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
Linus Torvalds <torvalds@transmeta.com> writes:
> Ok, there's a pre7-9 out there, and the biggest change versus pre7-8 is
[...]
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
# vmstat 5
9 3 0 0 921884 1796 12776 0 0 0 0 108 77813 2 90 8
11 1 1 12044 523248 1080 25232 0 2494 0 624 327 16323 1 97 3
13 0 1 16468 728120 720 29000 0 3818 0 955 364 17820 3 97 0
11 1 1 336 237340 720 13040 0 1114 0 278 200 10402 1 99 0
10 2 1 476 41628 720 13184 0 4066 0 1017 401 5792 1 99 0
VM: killing process ipctst
VM: killing process ipctst
VM: killing process ipctst
4 5 1 31872 2500 96 25592 22 13447 6 3362 983 10863 0 82 1
5 4 1 58708 675260 280 19024 0 5388 12 1355 2231 1558 0 77 23
0 0 0 58708 675260 280 19024 0 0 0 0 112 4 0 0 100
I still hate it ;-)
Greetings
Christoph
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-11 0:16 ` Linus Torvalds
` (2 preceding siblings ...)
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
@ 2000-05-11 17:38 ` Steve Dodd
3 siblings, 0 replies; 67+ messages in thread
From: Steve Dodd @ 2000-05-11 17:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: James H. Cloos Jr., linux-mm, linux-kernel
On Wed, May 10, 2000 at 05:16:05PM -0700, Linus Torvalds wrote:
[..]
> Just the dirty buffer handling made quite an enormous difference, so
> please do test this if you hated earlier pre7 kernels.
I definitely hate pre7-9.
For various reasons, I'm stuck on a 16Mb box right now. I just tried to start
dselect[0], and it got killed. It's completely repeatable, and running vmstat
shows that something demented is happening:
frodo:~$ vmstat 1 # and then start dselect on another terminal
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 2508 6544 196 5124 6 4 94 9 137 56 40 5 55
0 0 0 2508 6524 196 5140 0 0 16 0 134 4 0 2 98
0 0 0 2508 6524 196 5140 0 0 0 0 106 2 0 2 98
0 0 0 2508 6520 200 5140 0 0 1 0 112 6 0 2 98
0 0 0 2508 6200 204 5224 16 0 77 0 125 29 4 4 92
0 0 0 2508 6200 204 5224 0 0 0 0 103 2 0 2 98
1 0 0 2508 5332 212 5504 0 0 285 0 117 21 42 4 54
1 0 0 2508 3748 216 6004 0 0 501 0 119 24 83 7 11
1 0 0 2508 2664 220 6388 0 0 389 69 164 20 55 5 40
1 0 0 2508 964 224 7020 0 0 631 0 117 15 83 6 11
1 0 0 2508 364 216 6692 0 0 341 0 113 22 81 15 5
1 0 0 2504 288 208 5900 0 0 512 0 114 18 78 22 0
1 0 0 2504 364 112 5068 0 0 514 0 114 25 77 18 5
1 0 1 2504 252 72 4416 0 0 483 12 137 47 73 15 13
1 0 0 2504 264 68 4448 0 0 511 13 147 77 32 20 48
VM: killing process dselect
0 2 0 2504 8044 76 3960 176 0 803 0 220 137 16 23 61
0 0 0 2504 8032 76 3964 0 0 2 0 106 8 0 2 98
0 0 0 2504 8032 76 3964 0 0 1 0 105 6 0 2 98
I'm not an "mm person", but that doesn't look optimal to me.
The box does have a reasonable amount of swap:
frodo:~$ cat /proc/swaps
Filename Type Size Used Priority
/dev/hdc2 partition 18140 2480 -1
/dev/hdc4 partition 50396 0 -2
[0] so I could install the libbfd header files to compile kdb to poke at the
loop device lock-up stuff so I can use loop for testing ntfs stuff.. I'm
stuck in a maze of twisty kernel bugs, none alike..
* Re: [PATCH] Recent VM fiasco - fixed
2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
@ 2000-05-09 10:21 ` Rik van Riel
1 sibling, 0 replies; 67+ messages in thread
From: Rik van Riel @ 2000-05-09 10:21 UTC (permalink / raw)
To: Daniel Stone; +Cc: Zlatko Calusic, linux-mm, linux-kernel, Linus Torvalds
On Tue, 9 May 2000, Daniel Stone wrote:
> That's astonishing, I'm sure, but think of us poor bastards who
> DON'T have an SMP machine with >1gig of RAM.
>
> This is a P120, 32meg.
The old zoned VM code will run that machine as efficiently
as if it had 16MB of ram. See my point now?
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* RE: [PATCH] Recent VM fiasco - fixed
@ 2000-05-11 11:26 Jones D (ISaCS)
2000-05-12 7:50 ` Andrea Arcangeli
0 siblings, 1 reply; 67+ messages in thread
From: Jones D (ISaCS) @ 2000-05-11 11:26 UTC (permalink / raw)
To: 'Rik van Riel', Simon Kirby
Cc: Linus Torvalds, linux-mm, linux-kernel
> There probably are some good bits in the classzone patch, but
> it also backs out bugfixes for bugs which have been proven to
> exist and fixed by those fixes. ;(
>
> It would be nice if Andrea could separate the good bits from
> the bad bits and make a somewhat cleaner patch...
As I've been playing with invalidate_inode_pages for the last few
days, this section of Andrea's classzone diff caught my eye.
I noticed that in Andrea's version, if a page is locked, then it is just
ignored, and never freed. He reduced the complexity of the function, and
sped it up immeasurably, but apparently at the expense of leaking pages.
I've not looked at the rest of the patch, so my judgement is on the basis
of this section alone.
Andrea, for an improved version of that function see the patch I sent
yesterday.
d.
* RE: [PATCH] Recent VM fiasco - fixed
2000-05-11 11:26 Jones D (ISaCS)
@ 2000-05-12 7:50 ` Andrea Arcangeli
0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2000-05-12 7:50 UTC (permalink / raw)
To: Jones D (ISaCS)
Cc: 'Rik van Riel',
Simon Kirby, Linus Torvalds, linux-mm, linux-kernel
On Thu, 11 May 2000, Jones D (ISaCS) wrote:
>As I've been playing with invalidate_inode_pages for the last few
>days, this section of Andrea's classzone diff caught my eye.
>
>I noticed that in Andrea's version, if a page is locked, then it is just
>ignored, and never freed. He reduced the complexity of the function, and
Note that the official kernel clearly ignores it too, so I'm not
reinserting any bug there, only avoiding a performance drop for no
good reason; that's why I intentionally backed out such a recent
change.
To avoid ignoring it you would have to wait_on_page() (you have no other
way), and according to Trond we can't do that because the caller doesn't
handle a blocking function.
Your patch ignores locked pages too from within
invalidate_inode_pages(), as far as I can tell.
Andrea
Thread overview: 67+ messages
2000-05-08 17:21 [PATCH] Recent VM fiasco - fixed Zlatko Calusic
2000-05-08 17:43 ` Rik van Riel
2000-05-08 18:16 ` Zlatko Calusic
2000-05-08 18:20 ` Linus Torvalds
2000-05-08 18:46 ` Rik van Riel
2000-05-08 18:53 ` Zlatko Calusic
2000-05-08 19:04 ` Rik van Riel
2000-05-09 7:56 ` Daniel Stone
2000-05-09 8:25 ` Christoph Rohland
2000-05-09 15:44 ` Linus Torvalds
2000-05-09 16:12 ` Simon Kirby
2000-05-09 17:42 ` Christoph Rohland
2000-05-09 19:50 ` Linus Torvalds
2000-05-10 11:25 ` Christoph Rohland
2000-05-10 11:50 ` Zlatko Calusic
2000-05-11 23:40 ` Mark Hahn
2000-05-10 4:05 ` James H. Cloos Jr.
2000-05-10 7:29 ` James H. Cloos Jr.
2000-05-11 0:16 ` Linus Torvalds
2000-05-11 0:32 ` Linus Torvalds
2000-05-11 16:36 ` [PATCH] Recent VM fiasco - fixed (pre7-9) Rajagopal Ananthanarayanan
2000-05-11 1:04 ` [PATCH] Recent VM fiasco - fixed Juan J. Quintela
2000-05-11 1:53 ` Simon Kirby
2000-05-11 7:23 ` Linus Torvalds
2000-05-11 14:17 ` Simon Kirby
2000-05-11 23:38 ` Simon Kirby
2000-05-12 0:09 ` Linus Torvalds
2000-05-12 2:51 ` [RFC][PATCH] shrink_mmap avoid list_del (Was: Re: [PATCH] Recent VM fiasco - fixed) Roger Larsson
2000-05-11 11:15 ` [PATCH] Recent VM fiasco - fixed Rik van Riel
2000-05-11 5:10 ` Linus Torvalds
2000-05-11 10:09 ` James H. Cloos Jr.
2000-05-11 17:25 ` Juan J. Quintela
2000-05-11 23:25 ` [patch] balanced highmem subsystem under pre7-9 Ingo Molnar
2000-05-11 23:46 ` Linus Torvalds
2000-05-12 0:08 ` Ingo Molnar
2000-05-12 0:15 ` Ingo Molnar
2000-05-12 9:02 ` Christoph Rohland
2000-05-12 9:56 ` Ingo Molnar
2000-05-12 11:49 ` Christoph Rohland
2000-05-12 16:12 ` Linus Torvalds
2000-05-12 10:57 ` Andrea Arcangeli
2000-05-12 12:11 ` Ingo Molnar
2000-05-12 12:57 ` Andrea Arcangeli
2000-05-12 13:20 ` Rik van Riel
2000-05-12 16:40 ` Ingo Molnar
2000-05-12 17:15 ` Rik van Riel
2000-05-12 18:15 ` Linus Torvalds
2000-05-12 18:53 ` Ingo Molnar
2000-05-12 19:06 ` Linus Torvalds
2000-05-12 19:36 ` Ingo Molnar
2000-05-12 19:40 ` Ingo Molnar
2000-05-12 19:54 ` Ingo Molnar
2000-05-12 22:48 ` Rik van Riel
2000-05-13 11:57 ` Stephen C. Tweedie
2000-05-13 12:03 ` Rik van Riel
2000-05-13 12:14 ` Ingo Molnar
2000-05-13 14:23 ` Ingo Molnar
2000-05-19 1:58 ` Andrea Arcangeli
2000-05-19 15:03 ` Rik van Riel
2000-05-19 16:08 ` Andrea Arcangeli
2000-05-19 17:05 ` Rik van Riel
2000-05-19 22:28 ` Linus Torvalds
2000-05-11 11:12 ` [PATCH] Recent VM fiasco - fixed Christoph Rohland
2000-05-11 17:38 ` Steve Dodd
2000-05-09 10:21 ` Rik van Riel
2000-05-11 11:26 Jones D (ISaCS)
2000-05-12 7:50 ` Andrea Arcangeli