* [dirtypatch] quickhack to make pre8/9 behave (fwd)
@ 2000-05-16 19:32 Rik van Riel
2000-05-17 0:28 ` PATCH: less dirty (Re: [dirtypatch] quickhack to make pre8/9 behave (fwd)) Juan J. Quintela
0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2000-05-16 19:32 UTC (permalink / raw)
To: linux-mm; +Cc: Linus Torvalds, Stephen C. Tweedie
[ARGHHH, this time -with- patch, thanks RogerL]
Hi,
with the quick&dirty patch below the system:
- gracefully (more or less) survives mmap002
- has good performance on mmap002
To me this patch shows that we really want to wait
for dirty page IO to finish before randomly evicting
the (wrong) clean pages and dying horribly.
This is a dirty hack which should be replaced by whatever
solution people think should be implemented to make the
allocator wait for dirty pages to be flushed out.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--- fs/buffer.c.orig Mon May 15 09:49:46 2000
+++ fs/buffer.c Tue May 16 14:53:08 2000
@@ -2124,11 +2124,16 @@
static void sync_page_buffers(struct buffer_head *bh)
{
struct buffer_head * tmp;
+ static int rand = 0;
+ if (++rand > 64)
+ rand = 0;
tmp = bh;
do {
struct buffer_head *p = tmp;
tmp = tmp->b_this_page;
+ if (buffer_locked(p) && !rand)
+ __wait_on_buffer(p);
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(WRITE, 1, &p);
} while (tmp != bh);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* PATCH: less dirty (Re: [dirtypatch] quickhack to make pre8/9 behave (fwd))
2000-05-16 19:32 [dirtypatch] quickhack to make pre8/9 behave (fwd) Rik van Riel
@ 2000-05-17 0:28 ` Juan J. Quintela
2000-05-17 20:45 ` PATCH: Possible solution to VM problems Juan J. Quintela
0 siblings, 1 reply; 11+ messages in thread
From: Juan J. Quintela @ 2000-05-17 0:28 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, Linus Torvalds, Stephen C. Tweedie
>>>>> "rik" == Rik van Riel <riel@conectiva.com.br> writes:
rik> [ARGHHH, this time -with- patch, thanks RogerL]
rik> Hi,
rik> with the quick&dirty patch below the system:
rik> - gracefully (more or less) survives mmap002
rik> - has good performance on mmap002
rik> To me this patch shows that we really want to wait
rik> for dirty page IO to finish before randomly evicting
rik> the (wrong) clean pages and dying horribly.
rik> This is a dirty hack which should be replaced by whichever
rik> solution people thing should be implemented to have the
rik> allocator waiting for dirty pages to be flushed out.
Hi,
after discussing several designs with Rik I have written the
patch below; it indeed behaves better than Rik's patch. The patch is
against pre9-2. It also removes the previous patch that locked
try_to_free_pages by zones.
I have added one argument to try_to_free_buffers indicating whether we
want to wait for the page. Note also that we only wait if we are
allowed to do so by the gfp_mask.
Basically, in shrink_mmap I count how many dirty buffers I have found
(magic value 10), and if I find 10 dirty buffers before freeing a
page, I wait on the next dirty buffer to obtain a free page. The
value 10 needs tuning, but here performance is very good for
mmap002; could people test it with other workloads? It looks
rock solid and stable. I need to refine the patch a bit more,
but it is beginning to look promising.
Comments?
Later, Juan.
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/fs/buffer.c testing/fs/buffer.c
--- pre9-2/fs/buffer.c Fri May 12 23:46:45 2000
+++ testing/fs/buffer.c Wed May 17 01:26:55 2000
@@ -1324,7 +1324,7 @@
* instead.
*/
if (!offset) {
- if (!try_to_free_buffers(page)) {
+ if (!try_to_free_buffers(page, 0)) {
atomic_inc(&buffermem_pages);
return 0;
}
@@ -2121,14 +2121,14 @@
* This all is required so that we can free up memory
* later.
*/
-static void sync_page_buffers(struct buffer_head *bh)
+static void sync_page_buffers(struct buffer_head *bh, int wait)
{
- struct buffer_head * tmp;
-
- tmp = bh;
+ struct buffer_head * tmp = bh;
do {
struct buffer_head *p = tmp;
tmp = tmp->b_this_page;
+ if (buffer_locked(p) && wait)
+ __wait_on_buffer(p);
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(WRITE, 1, &p);
} while (tmp != bh);
@@ -2151,7 +2151,7 @@
* obtain a reference to a buffer head within a page. So we must
* lock out all of these paths to cleanly toss the page.
*/
-int try_to_free_buffers(struct page * page)
+int try_to_free_buffers(struct page * page, int wait)
{
struct buffer_head * tmp, * bh = page->buffers;
int index = BUFSIZE_INDEX(bh->b_size);
@@ -2201,7 +2201,7 @@
spin_unlock(&free_list[index].lock);
write_unlock(&hash_table_lock);
spin_unlock(&lru_list_lock);
- sync_page_buffers(bh);
+ sync_page_buffers(bh, wait);
return 0;
}
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/include/linux/fs.h testing/include/linux/fs.h
--- pre9-2/include/linux/fs.h Tue May 16 01:01:20 2000
+++ testing/include/linux/fs.h Wed May 17 02:22:34 2000
@@ -900,7 +900,7 @@
extern int fs_may_remount_ro(struct super_block *);
-extern int try_to_free_buffers(struct page *);
+extern int try_to_free_buffers(struct page *, int);
extern void refile_buffer(struct buffer_head * buf);
#define BUF_CLEAN 0
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/include/linux/mmzone.h testing/include/linux/mmzone.h
--- pre9-2/include/linux/mmzone.h Tue May 16 01:01:20 2000
+++ testing/include/linux/mmzone.h Tue May 16 15:36:20 2000
@@ -70,7 +70,6 @@
typedef struct zonelist_struct {
zone_t * zones [MAX_NR_ZONES+1]; // NULL delimited
int gfp_mask;
- atomic_t free_before_allocate;
} zonelist_t;
#define NR_GFPINDEX 0x100
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/filemap.c testing/mm/filemap.c
--- pre9-2/mm/filemap.c Fri May 12 23:46:46 2000
+++ testing/mm/filemap.c Wed May 17 02:23:42 2000
@@ -246,12 +246,13 @@
int shrink_mmap(int priority, int gfp_mask)
{
- int ret = 0, count;
+ int ret = 0, count, nr_dirty;
LIST_HEAD(old);
struct list_head * page_lru, * dispose;
struct page * page = NULL;
count = nr_lru_pages / (priority + 1);
+ nr_dirty = 10; /* magic number */
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
@@ -303,8 +304,11 @@
* of zone - it's old.
*/
if (page->buffers) {
- if (!try_to_free_buffers(page))
- goto unlock_continue;
+ int wait = ((gfp_mask & __GFP_IO) && (nr_dirty > 0));
+ nr_dirty--;
+
+ if (!try_to_free_buffers(page, wait))
+ goto unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
atomic_dec(&buffermem_pages);
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/page_alloc.c testing/mm/page_alloc.c
--- pre9-2/mm/page_alloc.c Tue May 16 00:36:11 2000
+++ testing/mm/page_alloc.c Tue May 16 15:36:20 2000
@@ -243,9 +243,6 @@
if (page)
return page;
}
- /* Somebody else is freeing pages? */
- if (atomic_read(&zonelist->free_before_allocate))
- try_to_free_pages(zonelist->gfp_mask);
}
/*
@@ -273,11 +270,7 @@
*/
if (!(current->flags & PF_MEMALLOC)) {
int gfp_mask = zonelist->gfp_mask;
- int result;
- atomic_inc(&zonelist->free_before_allocate);
- result = try_to_free_pages(gfp_mask);
- atomic_dec(&zonelist->free_before_allocate);
- if (!result) {
+ if (!try_to_free_pages(gfp_mask)) {
if (!(gfp_mask & __GFP_HIGH))
goto fail;
}
@@ -421,7 +414,6 @@
zonelist = pgdat->node_zonelists + i;
memset(zonelist, 0, sizeof(*zonelist));
- atomic_set(&zonelist->free_before_allocate, 0);
zonelist->gfp_mask = i;
j = 0;
k = ZONE_NORMAL;
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/vmscan.c testing/mm/vmscan.c
--- pre9-2/mm/vmscan.c Tue May 16 00:36:11 2000
+++ testing/mm/vmscan.c Wed May 17 02:01:20 2000
@@ -439,7 +439,7 @@
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
- priority = 6;
+ priority = 64;
do {
while (shrink_mmap(priority, gfp_mask)) {
if (!--count)
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* PATCH: Possible solution to VM problems
2000-05-17 0:28 ` PATCH: less dirty (Re: [dirtypatch] quickhack to make pre8/9 behave (fwd)) Juan J. Quintela
@ 2000-05-17 20:45 ` Juan J. Quintela
2000-05-17 23:31 ` PATCH: Possible solution to VM problems (take 2) Juan J. Quintela
0 siblings, 1 reply; 11+ messages in thread
From: Juan J. Quintela @ 2000-05-17 20:45 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, Linus Torvalds, Stephen C. Tweedie, linux-kernel
Hi
I have made the following modifications to yesterday's patch:
- I didn't remove the free_before_allocate patch from Rik; it appears
to work well here.
- I *fixed* the nr_dirty test in yesterday's patch; the sense of
the test was reversed.
- I have changed the calculation of counter in do_try_to_free_pages;
the suggestion came from Rik.
- I have changed the priority to 64.
The change of priority means that we call shrink_mmap with a
smaller count and that we also try to swap things out sooner; this
gives the system smoother behaviour. I have measured the priority
at which we obtained each page. The values are for a UP K6-300 with
98MB of RAM running mmap002 in a loop. The data was collected over all
the calls to do_try_to_free_pages during two and a half hours.
- do_try_to_free_pages failed to free pages 5 times
- it killed 2 processes (mmap002)
- number of calls to do_try_to_free_pages: 137k
- calls that succeeded at priority = 64: 58k
- calls that succeeded at priority = 63: 40k
- calls that succeeded at priority = 6x: 125k
- calls that succeeded at priority = 5x: 9k
- calls that succeeded at priority = 4x: 1.5k
- calls that succeeded at priority = 3x: 0.8k
- calls that succeeded at priority = 2x: 0.4k
- calls that succeeded at priority = 1x: 16
- calls that succeeded at priority < 10: 6
- calls that failed: 5
This shows that we almost always have "freeable" pages, even with a
memory *hog* like mmap002. With this patch the killed-processes
problem is _almost_ solved.
The slowdown problem remains: the vmstat 1 output stalls from time
to time for over 8 seconds. Typical output is this:
procs                  memory     swap         io     system          cpu
 r  b  w   swpd  free  buff  cache  si   so    bi    bo   in    cs  us  sy  id
 0  1  1   4096  1628    92  89744   0    4  2968  2233  332   176   3  12  86
 1  0  0   4184  1520   316  83652 596  112  1160 84735 6963  4814   0  14  85
 1  0  0   4196  1696   332  80852  12   36  2949     9  185   153  15  17  68
The stall is between the 1st and 2nd lines; notice that we have freed
3MB of page cache, but we have also read a bit (1160) and written a
lot (84k). I have noticed that almost all the stalls are of size
~80k or ~40k. This is easy to reproduce: the first time you run
mmap002 it stops the vmstat output just at the moment it begins to
swap (no more free memory). After that it happens from time to time,
but not often, i.e. every 4-5 runs of mmap002.
Another thing that appears to be solved is kswapd using too much
CPU. Here it uses 1m55s during a 45-minute mmap002 test.
Could the people having memory problems with the 2.3.99-preX kernels
test this patch and report their experiences? Thanks.
Comments?
Later, Juan.
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/fs/buffer.c testing/fs/buffer.c
--- pre9-2/fs/buffer.c Fri May 12 23:46:45 2000
+++ testing/fs/buffer.c Wed May 17 19:17:27 2000
@@ -1324,7 +1324,7 @@
* instead.
*/
if (!offset) {
- if (!try_to_free_buffers(page)) {
+ if (!try_to_free_buffers(page, 0)) {
atomic_inc(&buffermem_pages);
return 0;
}
@@ -2121,14 +2121,14 @@
* This all is required so that we can free up memory
* later.
*/
-static void sync_page_buffers(struct buffer_head *bh)
+static void sync_page_buffers(struct buffer_head *bh, int wait)
{
- struct buffer_head * tmp;
-
- tmp = bh;
+ struct buffer_head * tmp = bh;
do {
struct buffer_head *p = tmp;
tmp = tmp->b_this_page;
+ if (buffer_locked(p) && wait)
+ __wait_on_buffer(p);
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(WRITE, 1, &p);
} while (tmp != bh);
@@ -2151,7 +2151,7 @@
* obtain a reference to a buffer head within a page. So we must
* lock out all of these paths to cleanly toss the page.
*/
-int try_to_free_buffers(struct page * page)
+int try_to_free_buffers(struct page * page, int wait)
{
struct buffer_head * tmp, * bh = page->buffers;
int index = BUFSIZE_INDEX(bh->b_size);
@@ -2201,7 +2201,7 @@
spin_unlock(&free_list[index].lock);
write_unlock(&hash_table_lock);
spin_unlock(&lru_list_lock);
- sync_page_buffers(bh);
+ sync_page_buffers(bh, wait);
return 0;
}
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/include/linux/fs.h testing/include/linux/fs.h
--- pre9-2/include/linux/fs.h Wed May 17 19:11:51 2000
+++ testing/include/linux/fs.h Wed May 17 19:20:05 2000
@@ -900,7 +900,7 @@
extern int fs_may_remount_ro(struct super_block *);
-extern int try_to_free_buffers(struct page *);
+extern int try_to_free_buffers(struct page *, int);
extern void refile_buffer(struct buffer_head * buf);
#define BUF_CLEAN 0
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/filemap.c testing/mm/filemap.c
--- pre9-2/mm/filemap.c Fri May 12 23:46:46 2000
+++ testing/mm/filemap.c Wed May 17 20:03:02 2000
@@ -246,12 +246,13 @@
int shrink_mmap(int priority, int gfp_mask)
{
- int ret = 0, count;
+ int ret = 0, count, nr_dirty;
LIST_HEAD(old);
struct list_head * page_lru, * dispose;
struct page * page = NULL;
count = nr_lru_pages / (priority + 1);
+ nr_dirty = 10; /* magic number */
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
@@ -303,8 +304,10 @@
* of zone - it's old.
*/
if (page->buffers) {
- if (!try_to_free_buffers(page))
- goto unlock_continue;
+ int wait = ((gfp_mask & __GFP_IO) && (nr_dirty < 0));
+ nr_dirty--;
+ if (!try_to_free_buffers(page, wait))
+ goto unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
atomic_dec(&buffermem_pages);
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/vmscan.c testing/mm/vmscan.c
--- pre9-2/mm/vmscan.c Tue May 16 00:36:11 2000
+++ testing/mm/vmscan.c Wed May 17 21:03:30 2000
@@ -363,7 +363,7 @@
* Think of swap_cnt as a "shadow rss" - it tells us which process
* we want to page out (always try largest first).
*/
- counter = (nr_threads << 1) >> (priority >> 1);
+ counter = (nr_threads << 2) >> (priority >> 2);
if (counter < 1)
counter = 1;
@@ -435,11 +435,12 @@
{
int priority;
int count = FREE_COUNT;
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
- priority = 6;
+ priority = 64;
do {
while (shrink_mmap(priority, gfp_mask)) {
if (!--count)
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-17 20:45 ` PATCH: Possible solution to VM problems Juan J. Quintela
@ 2000-05-17 23:31 ` Juan J. Quintela
2000-05-18 0:12 ` Juan J. Quintela
0 siblings, 1 reply; 11+ messages in thread
From: Juan J. Quintela @ 2000-05-17 23:31 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, Linus Torvalds, Stephen C. Tweedie, linux-kernel
Hi
after discussions with Rik, we have arrived at the following
conclusions about the previous patch:
1- nr_dirty should be initialised with the priority value; that means
that at high priorities we start *quite a lot* of async writes
before waiting for one page, and at low priorities we wait for
almost any page, as we need memory at any cost.
2- We changed do_try_to_free_pages to return success if it has freed
some pages, not only when it has freed count pages; with that change
mmap002 never gets killed (30-minute test).
The interactive response of the system looks better, but I need to
do more testing on that. System time has also been reduced.
Please, can somebody with highmem test this patch? I am very
interested in knowing whether the default values also work well there.
They should, but who knows.
As always, comments are welcome.
Later, Juan.
PS. You can get my kernel patches from:
http://carpanta.dc.fi.udc.es/~quintela/kernel/
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/fs/buffer.c testing/fs/buffer.c
--- pre9-2/fs/buffer.c Fri May 12 23:46:45 2000
+++ testing/fs/buffer.c Wed May 17 19:17:27 2000
@@ -1324,7 +1324,7 @@
* instead.
*/
if (!offset) {
- if (!try_to_free_buffers(page)) {
+ if (!try_to_free_buffers(page, 0)) {
atomic_inc(&buffermem_pages);
return 0;
}
@@ -2121,14 +2121,14 @@
* This all is required so that we can free up memory
* later.
*/
-static void sync_page_buffers(struct buffer_head *bh)
+static void sync_page_buffers(struct buffer_head *bh, int wait)
{
- struct buffer_head * tmp;
-
- tmp = bh;
+ struct buffer_head * tmp = bh;
do {
struct buffer_head *p = tmp;
tmp = tmp->b_this_page;
+ if (buffer_locked(p) && wait)
+ __wait_on_buffer(p);
if (buffer_dirty(p) && !buffer_locked(p))
ll_rw_block(WRITE, 1, &p);
} while (tmp != bh);
@@ -2151,7 +2151,7 @@
* obtain a reference to a buffer head within a page. So we must
* lock out all of these paths to cleanly toss the page.
*/
-int try_to_free_buffers(struct page * page)
+int try_to_free_buffers(struct page * page, int wait)
{
struct buffer_head * tmp, * bh = page->buffers;
int index = BUFSIZE_INDEX(bh->b_size);
@@ -2201,7 +2201,7 @@
spin_unlock(&free_list[index].lock);
write_unlock(&hash_table_lock);
spin_unlock(&lru_list_lock);
- sync_page_buffers(bh);
+ sync_page_buffers(bh, wait);
return 0;
}
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/include/linux/fs.h testing/include/linux/fs.h
--- pre9-2/include/linux/fs.h Wed May 17 19:11:51 2000
+++ testing/include/linux/fs.h Thu May 18 00:44:24 2000
@@ -900,7 +900,7 @@
extern int fs_may_remount_ro(struct super_block *);
-extern int try_to_free_buffers(struct page *);
+extern int try_to_free_buffers(struct page *, int);
extern void refile_buffer(struct buffer_head * buf);
#define BUF_CLEAN 0
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/filemap.c testing/mm/filemap.c
--- pre9-2/mm/filemap.c Fri May 12 23:46:46 2000
+++ testing/mm/filemap.c Thu May 18 01:00:39 2000
@@ -246,12 +246,13 @@
int shrink_mmap(int priority, int gfp_mask)
{
- int ret = 0, count;
+ int ret = 0, count, nr_dirty;
LIST_HEAD(old);
struct list_head * page_lru, * dispose;
struct page * page = NULL;
count = nr_lru_pages / (priority + 1);
+ nr_dirty = priority;
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
@@ -303,8 +304,10 @@
* of zone - it's old.
*/
if (page->buffers) {
- if (!try_to_free_buffers(page))
- goto unlock_continue;
+ int wait = ((gfp_mask & __GFP_IO) && (nr_dirty < 0));
+ nr_dirty--;
+ if (!try_to_free_buffers(page, wait))
+ goto unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
atomic_dec(&buffermem_pages);
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-2/mm/vmscan.c testing/mm/vmscan.c
--- pre9-2/mm/vmscan.c Tue May 16 00:36:11 2000
+++ testing/mm/vmscan.c Thu May 18 01:20:20 2000
@@ -363,7 +363,7 @@
* Think of swap_cnt as a "shadow rss" - it tells us which process
* we want to page out (always try largest first).
*/
- counter = (nr_threads << 1) >> (priority >> 1);
+ counter = (nr_threads << 2) >> (priority >> 2);
if (counter < 1)
counter = 1;
@@ -435,11 +435,12 @@
{
int priority;
int count = FREE_COUNT;
+ int swap_count;
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
- priority = 6;
+ priority = 64;
do {
while (shrink_mmap(priority, gfp_mask)) {
if (!--count)
@@ -471,12 +472,10 @@
* put in the swap cache), so we must not count this
* as a "count" success.
*/
- {
- int swap_count = SWAP_COUNT;
- while (swap_out(priority, gfp_mask))
- if (--swap_count < 0)
- break;
- }
+ swap_count = SWAP_COUNT;
+ while (swap_out(priority, gfp_mask))
+ if (--swap_count < 0)
+ break;
} while (--priority >= 0);
/* Always end on a shrink_mmap.. */
@@ -485,7 +484,7 @@
goto done;
}
- return 0;
+ return (count != FREE_COUNT);
done:
return 1;
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-17 23:31 ` PATCH: Possible solution to VM problems (take 2) Juan J. Quintela
@ 2000-05-18 0:12 ` Juan J. Quintela
2000-05-18 1:07 ` Rik van Riel
2000-05-21 8:14 ` Linus Torvalds
0 siblings, 2 replies; 11+ messages in thread
From: Juan J. Quintela @ 2000-05-18 0:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm, Linus Torvalds, Stephen C. Tweedie, linux-kernel
Hi
after some more testing we found that:
1- the patch also works with mem=32MB (i.e. it is a winner for
low-memory machines as well)
2- Interactive performance looks great; I can run mmap002 with size
96MB on a 32MB machine and use an ssh session on the same machine
to do ls/vi/... without dropouts; no way I could do that with the
previous pre-* kernels
3- The system looks really stable now: no more processes killed with
OOM errors, and we no longer see any failures in do_try_to_free_pages.
Later, Juan.
PS. I will comment on the patch tomorrow; I have no more time today,
sorry about that.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-18 0:12 ` Juan J. Quintela
@ 2000-05-18 1:07 ` Rik van Riel
2000-05-21 8:14 ` Linus Torvalds
1 sibling, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2000-05-18 1:07 UTC (permalink / raw)
To: Juan J. Quintela
Cc: linux-mm, Linus Torvalds, Stephen C. Tweedie, linux-kernel
On 18 May 2000, Juan J. Quintela wrote:
> after some more testing we found that:
> 1- the patch works also with mem=32MB (i.e. it is a winner also for
> low mem machines)
> 2- Interactive performance looks great, I can run an mmap002 with size
> 96MB in an 32MB machine and use an ssh session in the same machine
> to do ls/vi/... without dropouts, no way I can do that with
> previous pre-*
> 3- The system looks really stable now, no more processes killed for
> OOM error, and we don't see any more fails in do_try_to_free_page.
I am now testing the patch on my small test machine and must
say that things look just *great*. I can start up a gimp while
bonnie is running without having much impact on the speed of
either.
Interactive performance is nice and stability seems to be
great as well.
I'll test it on my 512MB test machine as well and will have
more test results tomorrow. This patch is most likely good
enough to include in the kernel this night ;)
(and even if it isn't, it's a hell of a lot better than
anything we had before)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-18 0:12 ` Juan J. Quintela
2000-05-18 1:07 ` Rik van Riel
@ 2000-05-21 8:14 ` Linus Torvalds
2000-05-21 16:01 ` Rik van Riel
1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2000-05-21 8:14 UTC (permalink / raw)
To: Juan J. Quintela; +Cc: Rik van Riel, linux-mm
I'm back from Canada, and finally have DSL at home, so I tried to sync up
with the patches I had in my in-queue.
The mm patches in particular didn't apply any more, because my tree did
some of the same stuff, so I did only a very very partial merge, much of
it to just make a full merge later simpler. I made it available under
testing as pre9-3, would you mind taking a look?
Linus
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-21 8:14 ` Linus Torvalds
@ 2000-05-21 16:01 ` Rik van Riel
2000-05-21 17:15 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2000-05-21 16:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, linux-mm
On Sun, 21 May 2000, Linus Torvalds wrote:
> The mm patches in particular didn't apply any more, because my
> tree did some of the same stuff, so I did only a very very
> partial merge, much of it to just make a full merge later
> simpler. I made it available under testing as pre9-3, would you
> mind taking a look?
Looking good (well, I've only *read* the code, not
booted it).
The only change we may want to make is to completely drop
the priority argument from swap_out since:
- if we fail through to swap_out we *must* unmap some pages
- swap_out isn't balanced against anything else, so failing
it doesn't make much sense (IMHO)
- we really want do_try_to_free_pages to succeed every time
Of course I may have overlooked something ... please tell me
what :)
BTW, I'll soon go to work with some of davem's code and will
try to make a system with active/inactive lists. I believe the
fact that we don't have those now is responsible for the
fragility of the current "balance" between the different memory
freeing functions... (but to be honest this too is mostly a
hunch)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-21 16:01 ` Rik van Riel
@ 2000-05-21 17:15 ` Linus Torvalds
2000-05-21 19:02 ` Rik van Riel
2000-05-22 11:27 ` PATCH: Balancing patch against pre9-3 Quintela Carreira Juan J.
0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2000-05-21 17:15 UTC (permalink / raw)
To: Rik van Riel; +Cc: Juan J. Quintela, linux-mm
On Sun, 21 May 2000, Rik van Riel wrote:
>
> The only change we may want to do is completely drop
> the priority argument from swap_out since:
> - if we fail through to swap_out we *must* unmap some pages
I don't agree.
It's a balancing act. We go from door to door, and we say "can you spare a
dime?" The fact that shrink_mmap() said "I don't have anything for you
right now" doesn't mean that swap_out() _has_ to give us memory. If nobody
gives us anything the first time through, we should just try again. A bit
more forcefully this time.
> - swap_out isn't balanced against anything else, so failing
> it doesn't make much sense (IMHO)
This is not how I see the balancing act at all.
Think of the priority as something everybody we ask uses to judge how
badly he wants to release memory. NOBODY balances against "somebody else".
Everybody balances its own heap of memory, and there is no "global"
balance. Think of it as the same thing as "per-zone" and "class-aware"
logic all over again.
A global balance would take the other allocators into account, and say "I
only have X pages, and they have Y pages, so _they_ should pay". A global
balancing algorithm is based on envy of each others pages.
The local balance is more a "Oh, since he asks me with priority 10, I'll
just see if I can quickly look through 1% of my oldest pages, and if I
find something that I'm comfortable giving you, I'll make it available".
It doesn't take other memory users into account - it is purely selfless,
and knows that somebody asks for help.
Getting rid of the priority argument to swap_out() would mean that
swap_out() can no longer make any decisions of its own. Suddenly
swap_out() is a slave to shrink_mmap(), and is not allowed to say "there's
a lot of pressure on the VM system right now, I can't free anything up at
this moment, maybe there could be some dirty buffers you could write out
instead?".
> - we really want do_try_to_free_pages to succeed every time
Well, we do want that, but at the same time we also do want it to
recognize when it really isn't making any progress.
When our priority level turns to "Give me some pages or I'll rape your
wife and kill your children", and _still_ nobody gives us memory, we
should just realize that we should give up.
Linus
* Re: PATCH: Possible solution to VM problems (take 2)
2000-05-21 17:15 ` Linus Torvalds
@ 2000-05-21 19:02 ` Rik van Riel
2000-05-22 11:27 ` PATCH: Balancing patch against pre9-3 Quintela Carreira Juan J.
1 sibling, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2000-05-21 19:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Juan J. Quintela, linux-mm
On Sun, 21 May 2000, Linus Torvalds wrote:
> On Sun, 21 May 2000, Rik van Riel wrote:
> >
> > The only change we may want to do is completely drop
> > the priority argument from swap_out since:
> > - if we fail through to swap_out we *must* unmap some pages
>
> Getting rid of the priority argument to swap_out() would mean
> that swap_out() can no longer make any decisions of its own.
> Suddenly swap_out() is a slave to shrink_mmap(), and is not
> allowed to say "there's a lot of pressure on the VM system right
> now, I can't free anything up at this moment, maybe there could
> be some dirty buffers you could write out instead?".
OK, you're right here.
> > - we really want do_try_to_free_pages to succeed every time
>
> Well, we do want that, but at the same time we also do want it to
> recognize when it really isn't making any progress.
>
> When our priority level turns to "Give me some pages or I'll
> rape your wife and kill your children", and _still_ nobody gives
> us memory, we should just realize that we should give up.
Problem is that the current code seems to give up way
before that. We should be able to free memory from mmap002
no matter what, because we *can* (the backing store for
the data exists).
IMHO it is not acceptable that do_try_to_free_pages() can
fail on mmap002, but you are completely right that my
quick and dirty idea is wrong.
(I'll steal davem's code and split the current lru queue
in active, inactive and laundry, then the system will
know which page to steal, how to do effective async IO
- don't wait for pages if we have inactive pages left,
but wait for laundry pages instead of stealing active
ones - and when it *has* to call swap_out)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* PATCH: Balancing patch against pre9-3
2000-05-21 17:15 ` Linus Torvalds
2000-05-21 19:02 ` Rik van Riel
@ 2000-05-22 11:27 ` Quintela Carreira Juan J.
1 sibling, 0 replies; 11+ messages in thread
From: Quintela Carreira Juan J. @ 2000-05-22 11:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, linux-mm, linux-kernel
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Hi
I have finished porting the patch to pre9-3. My stability
problems yesterday were related to a bad patch applied by hand :(
(Yes, I am wearing a brown paper bag over my head.)
Linus, you have already applied almost all of the patch; the remaining parts are:
- the use of the nr_dirty counter. It counts the number of dirty
pages that we are allowed to write asynchronously before we wait for
completion. Notice also that we wait only if __GFP_IO is
set. We initialize that variable to the priority, which means that
at high priorities we will almost never wait, while at lower
priorities (high VM pressure) we will wait for almost all pages.
This looks like the correct behavior to me.
- We start do_try_to_free_pages with a *high* priority. My tests
here (I have posted the numbers in previous posts) showed that with
a priority of 64 we almost always allocate at that priority, so
we need to wait for fewer pages, which means lower latency. The other
advantage of starting with a high priority is that we push
shrink_mmap less hard and call swap_out sooner. The latter is not
bad: we do call swap_out sooner, but at a high priority, which means
it will only try to free a few pages.
- Rik suggested (and I agree) that swap_out must *try* harder to
unmap pages at low priorities. That is the effect of
the change to the counter calculation in swap_out.
- In do_try_to_free_pages, we return success if we have freed any
pages, not only if we have freed FREE_COUNT pages. That solved the
last problem of mmap002 being killed in pre9-2. I have also changed
SWAP_COUNT to 16; some of my experiments showed that merely
increasing the value of SWAP_COUNT improved the behavior of the
system. I think the problem here is that under high VM
pressure we try to swap out the same number of pages that we try
to free, and people reference those pages while we are waiting, or
other processes steal our pages. Other thoughts on that?
- I have been studying which processes get swapped out, and with this
patch all processes except the ones in use are swapped. Here xfs
(the font server) is swapped out (it keeps only one 4 kB page in memory).
Ben, could you test whether it also swaps out xfs-tt pages?
- The problem of the stalls continues: we still get stalls from time
to time, basically when all of memory is used by the page
cache with dirty pages (i.e. mmap002 running alone on the machine).
I know that this is a pathological case, but the stalls are
sometimes as long as 5 seconds. The case also appears not to be as
pathological as I thought: people doing multimedia are reporting
similar problems, and people doing big writes (dd'ing a cdrom and
the like) are noticing similar problems. I am studying this problem.
Comments?
Later, Juan.
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-3/mm/filemap.c testing/mm/filemap.c
--- pre9-3/mm/filemap.c Sun May 21 17:38:02 2000
+++ testing/mm/filemap.c Mon May 22 13:05:36 2000
@@ -244,13 +244,19 @@
spin_unlock(&pagecache_lock);
}
+/*
+ * nr_dirty represents the number of dirty pages that we will write async
+ * before doing sync writes. We can only do sync writes if we can
+ * wait for IO (__GFP_IO set).
+ */
int shrink_mmap(int priority, int gfp_mask)
{
- int ret = 0, count;
+ int ret = 0, count, nr_dirty;
struct list_head * page_lru;
struct page * page = NULL;
count = nr_lru_pages / (priority + 1);
+ nr_dirty = priority;
/* we need pagemap_lru_lock for list_del() ... subtle code below */
spin_lock(&pagemap_lru_lock);
@@ -287,7 +293,8 @@
* of zone - it's old.
*/
if (page->buffers) {
- if (!try_to_free_buffers(page, 1))
+ int wait = ((gfp_mask & __GFP_IO) && (nr_dirty-- < 0));
+ if (!try_to_free_buffers(page, wait))
goto unlock_continue;
/* page was locked, inode can't go away under us */
if (!page->mapping) {
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-3/mm/vmscan.c testing/mm/vmscan.c
--- pre9-3/mm/vmscan.c Sun May 21 17:38:03 2000
+++ testing/mm/vmscan.c Mon May 22 13:10:38 2000
@@ -363,7 +363,7 @@
* Think of swap_cnt as a "shadow rss" - it tells us which process
* we want to page out (always try largest first).
*/
- counter = (nr_threads << 1) >> (priority >> 1);
+ counter = (nr_threads << 2) >> (priority >> 2);
if (counter < 1)
counter = 1;
@@ -430,16 +430,17 @@
* latency.
*/
#define FREE_COUNT 8
-#define SWAP_COUNT 8
+#define SWAP_COUNT 16
static int do_try_to_free_pages(unsigned int gfp_mask)
{
int priority;
int count = FREE_COUNT;
+ int swap_count;
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
- priority = 32;
+ priority = 64;
do {
while (shrink_mmap(priority, gfp_mask)) {
if (!--count)
@@ -471,12 +472,11 @@
* put in the swap cache), so we must not count this
* as a "count" success.
*/
- {
- int swap_count = SWAP_COUNT;
- while (swap_out(priority, gfp_mask))
- if (--swap_count < 0)
- break;
- }
+ swap_count = SWAP_COUNT;
+ while (swap_out(priority, gfp_mask))
+ if (--swap_count < 0)
+ break;
+
} while (--priority >= 0);
/* Always end on a shrink_mmap.. */
@@ -484,8 +484,8 @@
if (!--count)
goto done;
}
-
- return 0;
+ /* Return 1 if we freed any pages */
+ return (count != FREE_COUNT);
done:
return 1;
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
Thread overview: 11+ messages
2000-05-16 19:32 [dirtypatch] quickhack to make pre8/9 behave (fwd) Rik van Riel
2000-05-17 0:28 ` PATCH: less dirty (Re: [dirtypatch] quickhack to make pre8/9 behave (fwd)) Juan J. Quintela
2000-05-17 20:45 ` PATCH: Possible solution to VM problems Juan J. Quintela
2000-05-17 23:31 ` PATCH: Possible solution to VM problems (take 2) Juan J. Quintela
2000-05-18 0:12 ` Juan J. Quintela
2000-05-18 1:07 ` Rik van Riel
2000-05-21 8:14 ` Linus Torvalds
2000-05-21 16:01 ` Rik van Riel
2000-05-21 17:15 ` Linus Torvalds
2000-05-21 19:02 ` Rik van Riel
2000-05-22 11:27 ` PATCH: Balancing patch against pre9-3 Quintela Carreira Juan J.