* pre8: where has the anti-hog code gone?
From: Rik van Riel @ 2000-05-12 23:37 UTC (permalink / raw)
To: linux-mm; +Cc: Linus Torvalds
Hi Linus,
I'm reading the pre8 code now and I see that the anti-hog
code is gone. I'm still busy developing the active/inactive
list thing, but was just doing a short test with pre8 and
noticed a *sharp* increase in the amount of filesystem IO
when a big memory hog is swapping ...
In addition, I'm seeing smaller processes blocked on disk;
this didn't happen as often when the anti-hog code was still
in, and it drastically reduces throughput for the memory hog
(who now has to wait in line for disk accesses).
I'm curious ... why was the anti-hog code taken out?
It helps quite a bit on systems which are more or less
low on memory (ie. not your normal working environment,
but common in universities and lots of countries all
around the world).
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: pre8: where has the anti-hog code gone?
From: Linus Torvalds @ 2000-05-13 15:28 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm
On Fri, 12 May 2000, Rik van Riel wrote:
>
> I'm reading the pre8 code now and I see that the anti-hog
> code is gone. I'm still busy developing the active/inactive
> list thing, but was just doing a short test with pre8 and
> noticed a *sharp* increase in the amount of filesystem IO
> when a big memory hog is swapping ...
I removed _all_ the special-case code. This included not just the hog
stuff, but pretty much all the new logic in later 2.3.x that couldn't be
sufficiently explained.
And I'm not going to add it back in before the "out of memory" condition
has been clearly understood - it's obvious right now that the system
depends critically on kswapd in order to not return out of memory, and
that is wrong. kswapd should smooth things out, it should not be a
critical bottle-neck.
[ You may ask "why?". The reason is two-fold: (a) I don't like having a
fragile system that depends on something like kswapd/kflushd for correct
operation. So Linux _will_ work without bdflush, for example, and it's
actually a common mode for laptops that want to avoid spinning up just
to flush more smoothly. The same should be true of kswapd. And (b)
kswapd is a regular process, as it should be, and is bound by the regular
scheduling rules. Which may, quite validly, mean that kswapd may have to
wait for other, more important processes. We should still handle
low-memory circumstances gracefully. ]
So pre-8 with your suggested fix for kswapd() looks pretty good, actually,
but still has this issue that try_to_free_pages() seems to give up too
easily and return failure when it shouldn't. I'll happily apply patches
that make for nicer behaviour once this is clearly fixed, but not before
(unless the "nicer behaviour" patch _also_ fixes the "pathological
behaviour" case ;)
Linus
* Re: pre8: where has the anti-hog code gone?
From: Juan J. Quintela @ 2000-05-13 18:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, linux-mm
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Hi
linus> So pre-8 with your suggested fix for kswapd() looks pretty good, actually,
linus> but still has this issue that try_to_free_pages() seems to give up too
linus> easily and return failure when it shouldn't. I'll happily apply patches
linus> that make for nicer behaviour once this is clearly fixed, but not before
linus> (unless the "nicer behaviour" patch _also_ fixes the "pathological
linus> behaviour" case ;)
Here pre8, pre8 with any of Rik's patches, and pre9-1 look bad. If I
run mmap002 on those machines it always gets killed, now a lot of
the time in around 30 seconds (in previous kernels the test lasted around
3 min before being killed). The system keeps killing processes until
init dies, then the whole system freezes: no net, no ping answer, no
keyboard response (sysrq didn't work). No information in the logs, except
that some processes have been killed, and no messages on the console
either. The effect is easy to reproduce: here, in less than
5 min of the mmap002 test, the system is frozen.
If you need more information, let me know.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: pre8: where has the anti-hog code gone?
From: Arjan van de Ven @ 2000-05-13 21:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm
[snip stuff about "first make it work, then make it nice/fast"]
> So pre-8 with your suggested fix for kswapd() looks pretty good, actually,
> but still has this issue that try_to_free_pages() seems to give up too
> easily and return failure when it shouldn't.
I have been looking at it right now, and I think there are a few issues:
1) shrink_[id]node_memory always return 0, even if they free memory
2) shrink_inode_memory is broken for priority == 0
Issue 2) is easily fixable, but even with that fixed, my traces show that, for the
mmap002 test, shrink_mmap fails just before the OOM.
My idea is (but I have not tested this) that for priority == 0 (aka "Uh oh")
shrink_mmap or do_try_to_free_pages have to block while waiting for pages to
be committed to disk. As far as I can see, shrink_mmap just skips pages that
are being committed to disk, while these could be freed when they are waited
upon.
Greetings,
Arjan van de Ven
* Re: pre8: where has the anti-hog code gone?
From: Juan J. Quintela @ 2000-05-13 21:59 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Linus Torvalds, linux-mm
>>>>> "arjan" == Arjan van de Ven <arjan@fenrus.demon.nl> writes:
Hi
arjan> I have been looking at it right now, and I think there are a few issues:
arjan> 1) shrink_[id]node_memory always return 0, even if they free memory
arjan> 2) shrink_inode_memory is broken for priority == 0
arjan> 2) is easily fixable, but even with that fixed, my traces show that, for the
arjan> mmap002 test, shrink_mmap fails just before the OOM.
After discussing those changes with Arjan, and later discussing with riel
that we _need_ to swap_out more pages than we scan (because some
of the pages can be reclaimed), I made the following patch. Now things
go better - not well, but better.
Now mmap002 finishes sometimes (where "sometimes" is a low number).
The important part of the patch is the change to SWAP_COUNT; only
changing that number, I get better behaviour (thanks riel for
suggesting that).
The other change makes shrink_[di]cache behave like the rest of
the shrink* functions and do the *maximum* effort when priority == 0,
not when priority == 1. This last change improves things only a bit.
Comments?
Later, Juan.
diff -u -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-1/fs/dcache.c testing/fs/dcache.c
--- pre9-1/fs/dcache.c Fri May 12 01:11:40 2000
+++ testing/fs/dcache.c Sat May 13 21:58:41 2000
@@ -497,10 +497,8 @@
*/
int shrink_dcache_memory(int priority, unsigned int gfp_mask)
{
- int count = 0;
+ int count = dentry_stat.nr_unused / (priority + 1);
lock_kernel();
- if (priority)
- count = dentry_stat.nr_unused / priority;
prune_dcache(count);
unlock_kernel();
/* FIXME: kmem_cache_shrink here should tell us
diff -u -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-1/fs/inode.c testing/fs/inode.c
--- pre9-1/fs/inode.c Fri May 12 01:11:40 2000
+++ testing/fs/inode.c Sat May 13 22:35:51 2000
@@ -411,11 +411,10 @@
(((inode)->i_state | (inode)->i_data.nrpages) == 0)
#define INODE(entry) (list_entry(entry, struct inode, i_list))
-void prune_icache(int goal)
+void prune_icache(int count)
{
LIST_HEAD(list);
struct list_head *entry, *freeable = &list;
- int count = 0;
struct inode * inode;
spin_lock(&inode_lock);
@@ -440,11 +439,10 @@
INIT_LIST_HEAD(&inode->i_hash);
list_add(tmp, freeable);
inode->i_state |= I_FREEING;
- count++;
- if (!--goal)
+ inodes_stat.nr_unused--;
+ if (!--count)
break;
}
- inodes_stat.nr_unused -= count;
spin_unlock(&inode_lock);
dispose_list(freeable);
@@ -452,10 +450,7 @@
int shrink_icache_memory(int priority, int gfp_mask)
{
- int count = 0;
-
- if (priority)
- count = inodes_stat.nr_unused / priority;
+ int count = inodes_stat.nr_unused / (priority + 1);
prune_icache(count);
/* FIXME: kmem_cache_shrink here should tell us
the number of pages freed, and it should
diff -u -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude pre9-1/mm/vmscan.c testing/mm/vmscan.c
--- pre9-1/mm/vmscan.c Sat May 13 19:30:06 2000
+++ testing/mm/vmscan.c Sat May 13 23:17:22 2000
@@ -430,7 +430,7 @@
* latency.
*/
#define FREE_COUNT 8
-#define SWAP_COUNT 8
+#define SWAP_COUNT 16
static int do_try_to_free_pages(unsigned int gfp_mask)
{
int priority;
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
* Re: pre8: where has the anti-hog code gone?
From: Linus Torvalds @ 2000-05-14 3:41 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-mm
[ Thanks for looking at this. ]
On Sat, 13 May 2000, Arjan van de Ven wrote:
>
> My idea is (but I have not tested this) that for priority == 0 (aka "Uh oh")
> shrink_mmap or do_try_to_free_pages have to block while waiting for pages to
> be committed to disk. As far as I can see, shrink_mmap just skips pages that
> are being committed to disk, while these could be freed when they are waited
> upon.
That's what I did in one of the pre-7's, and it ended up being quite bad
for performance. But that was before I put sync-out pages at the head of
the LRU queue, so what ended up happening is that that particular pre-7
tried to write out the block, and then the next time shrink_mmap()
rolled around, the page was still at the end of the LRU queue, so
we immediately ended up synchronously waiting for it.
With the current behaviour, which always moves a page to the front of the
LRU list if it cannot be free'd, the synchronous wait in shrink_mmap() is
probably fine, and you could try to just change "sync_page_buffers()" back
to the code that did
if (buffer_locked(p))
__wait_on_buffer(p);
else if (buffer_dirty(p))
ll_rw_block(WRITE, 1, &p);
(instead of the current "buffer_dirty(p) && !buffer_locked(p)" test that
only starts the IO).
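For reference, sync_page_buffers() walks the buffers attached to the page,
so the restored version would look roughly like this (a sketch of the shape
of the change, not the exact pre-8 source):

	struct buffer_head *p = bh;

	do {
		if (buffer_locked(p))
			__wait_on_buffer(p);		/* block until the write-out completes */
		else if (buffer_dirty(p))
			ll_rw_block(WRITE, 1, &p);	/* just start the write-out */
	} while ((p = p->b_this_page) != bh);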
It's clear that at some point we _have_ to wait for pages to actually get
written out, whether they were written for paging or just because they
were dirty data buffers.
Does the above make it ok? How does it feel performance-wise?
Linus
* Re: pre8: where has the anti-hog code gone?
From: Arjan van de Ven @ 2000-05-14 8:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm
> > My idea is (but I have not tested this) that for priority == 0 (aka "Uh oh")
> > shrink_mmap or do_try_to_free_pages have to block while waiting for pages to
> > be committed to disk. As far as I can see, shrink_mmap just skips pages that
> > are being committed to disk, while these could be freed when they are waited
> > upon.
> probably fine, and you could try to just change "sync_page_buffers()" back
> to the code that did
>
> if (buffer_locked(p))
> __wait_on_buffer(p);
> else if (buffer_dirty(p))
> ll_rw_block(WRITE, 1, &p);
>
> (instead of the current "buffer_dirty(p) && !buffer_locked(p)" test that
> only starts the IO).
I changed this a bit, so that the __wait_on_buffer only gets called for
do_try_to_free_pages priority 0. With this, mmap002 doesn't OOM anymore.
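Concretely, that means knowing the priority down in the buffer sync (a sketch
only; how the priority gets plumbed down is illustrative, not the exact diff
I tested):

	/* sketch: block on locked buffers only at the last-ditch priority */
	if (buffer_locked(p)) {
		if (priority == 0)
			__wait_on_buffer(p);
	} else if (buffer_dirty(p))
		ll_rw_block(WRITE, 1, &p);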
> How does it feel performance-wise?
This is a bit hard to say, as my testbox is headless. However, I started
Netscape on it (over a 100Mbit network) and did a "make -j2 bzImage" at
the same time. Netscape didn't seem to suffer, but there was usually about
30 megabytes[1] of RAM free (according to "top"), so maybe it is too aggressive
in freeing memory.
Greetings,
Arjan van de Ven
[1] The machine has 96 Mb total ram
* Re: pre8: where has the anti-hog code gone?
From: Linus Torvalds @ 2000-05-15 1:37 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-mm
On Sun, 14 May 2000, Arjan van de Ven wrote:
>
> > How does it feel performance-wise?
>
> This is a bit hard to say, as my testbox is headless. However, I started
> Netscape on it (over a 100Mbit network) and did a "make -j2 bzImage" at
> the same time. Netscape didn't seem to suffer, but there was usually about
> 30 megabytes[1] of RAM free (according to "top"), so maybe it is too aggressive
> in freeing memory.
No, it's probably not too aggressive in freeing up memory, it's just that
a kernel make is a very "well-behaved" benchmark MM-wise.
Why? Because the kernel make will start up a lot of processes that are
short-lived in comparison to the whole build (I bet this is the first time
anybody called gcc "short-lived" - it's one slow compiler - but
comparatively it is).
So the kernel make will actually keep noticeable amounts of memory free
"on average", simply because of processes exiting.
Linus
* Re: pre8: where has the anti-hog code gone?
From: Ingo Molnar @ 2000-05-14 10:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, linux-mm
On Sat, 13 May 2000, Linus Torvalds wrote:
> So pre-8 with your suggested for for kswapd() looks pretty good, actually,
> but still has this issue that try_to_free_pages() seems to give up too
> easily and return failure when it shouldn't. [...]
i believe the reason for gfp-NULL failures is the following:
do_try_to_free_pages() _does_ free pages, but we do the sync in the
writeback case _after_ releasing a particular page. This means other
processes can steal our freshly freed pages - rmqueue fails easily. So i'd
suggest the following workaround:
if (try_to_free_pages() was successful && final rmqueue() failed)
goto repeat;
we could as well do the page_cache_release of the buffer-mapped cache
after sync_page_buffers(), but this only saves a single page - multipage
allocations will still have a big window to fail. The problem is that
freed RAM is anonymous right now. We can fundamentally solve this by
manipulating zone->free_pages the following way:
a __free_pages variant that does not increase zone->free_pages. this is
then later on done by the allocator (ie. __alloc_pages). This 'free page
transport' mechanism guarantees that the non-atomic allocation path does
not 'lose' free pages along the way.
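As a rough illustration of that goto-repeat workaround at the end of the
allocation slow path (names and structure simplified, not a real patch):

	struct page *page;
	int freed;

try_again:
	page = rmqueue(zone, order);
	if (page)
		return page;

	freed = do_try_to_free_pages(gfp_mask);

	page = rmqueue(zone, order);
	if (page)
		return page;

	/* we did free pages, but another allocator grabbed them before
	 * our rmqueue() ran - retry instead of returning NULL */
	if (freed)
		goto try_again;

	return NULL;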
Ingo
* Re: pre8: where has the anti-hog code gone?
From: Ingo Molnar @ 2000-05-14 10:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, linux-mm
On Sun, 14 May 2000, Ingo Molnar wrote:
> a __free_pages variant that does not increase zone->free_pages. this is
> then later on done by the allocator (ie. __alloc_pages). This 'free page
> transport' mechanism guarantees that the non-atomic allocation path does
> not 'lose' free pages along the way.
'normal' (non- __alloc_pages()-driven) __free_pages() still increases
zone->free_pages just like before.
Ingo
* Re: pre8: where has the anti-hog code gone?
From: Ingo Molnar @ 2000-05-14 11:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rik van Riel, MM mailing list, linux-kernel
> i believe the reason for gfp-NULL failures is the following:
> do_try_to_free_pages() _does_ free pages, but we do the sync in the
> writeback case _after_ releasing a particular page. This means other
> processes can steal our freshly freed pages - rmqueue fails easily. So i'd
> suggest the following workaround:
>
> if (try_to_free_pages() was successful && final rmqueue() failed)
> goto repeat;
this seems to have done the trick here - no more NULL gfps. Any better
generic suggestion than the explicit 'page transport' path between freeing
and allocation points?
Ingo
* Re: pre8: where has the anti-hog code gone?
From: Rik van Riel @ 2000-05-14 12:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Sun, 14 May 2000, Ingo Molnar wrote:
> this seems to have done the trick here - no more NULL gfps. Any
> better generic suggestion than the explicit 'page transport'
> path between freeing and allocation points?
Mark the zone as a "steal-before-allocate" zone while
one user process is in the page stealer because it
could not find an easy page.
if (couldn't find an easy page) {
atomic_inc(&zone->steal_before_allocate);
try_to_free_pages();
blah blah blah;
atomic_dec(&zone->steal_before_allocate);
}
And the allocation path can be changed to always call
try_to_free_pages() if zone->steal_before_allocate is
set.
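In the allocation path that check could look roughly like this (illustrative
only; the field and the calls follow the pseudocode above, not existing code):

	/* sketch: steal a page first if someone is already in the stealer */
	if (atomic_read(&zone->steal_before_allocate))
		try_to_free_pages(gfp_mask);
	page = rmqueue(zone, order);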
This way we won't just guarantee that we can keep the page
we just freed, but also that _other_ processes won't get
false hopes and/or run out of memory. Furthermore, by going
into try_to_free_pages() a bit more aggressively we could
reduce memory fragmentation a bit (but I'm not sure if this
effect would be significant or not).
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: pre8: where has the anti-hog code gone?
From: Ingo Molnar @ 2000-05-14 12:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Sun, 14 May 2000, Rik van Riel wrote:
> Mark the zone as a "steal-before-allocate" zone while
> one user process is in the page stealer because it
> could not find an easy page.
this i believe is fundamentally single-threaded (and now with the latest
Linus VM we have massively parallel allocation points). The problem is not
to notice low memory situations (we already have the low_on_memory flag),
the problem is to un-anonymize resulting free pages. Anonymous freeing ==
unfairness, and that unfairness ultimately leads to NULL gfp and bad
allocation latency.
Ingo
* Re: pre8: where has the anti-hog code gone?
From: Ingo Molnar @ 2000-05-14 12:19 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linus Torvalds, MM mailing list, linux-kernel
On Sun, 14 May 2000, Rik van Riel wrote:
> if (couldn't find an easy page) {
> atomic_inc(&zone->steal_before_allocate);
> try_to_free_pages();
> blah blah blah;
> atomic_dec(&zone->steal_before_allocate);
> }
ignore my previous comment about single-threadedness. Yes, this could
solve the problem, but might have other problems. There are some
differences: the above method is 'global', ie. it penalizes all
allocations if a try_to_free_pages() is blocked. [think about
try_to_free_pages() blocking for a _long_ time due to some reason - every
allocation will do a try_to_free_pages even though the original low memory
situation is long gone.] Am i correct?
The fundamental point would be to shield the result of a
try_to_free_pages() from other allocation points.
Ingo
* Re: pre8: where has the anti-hog code gone?
From: Mark_H_Johnson.RTS @ 2000-05-15 14:50 UTC (permalink / raw)
To: Juan J. Quintela; +Cc: linux-mm, riel, torvalds
I guess I have a "philosophy question" - one where I can't quite understand the
situation that we are in.
What is the problem that killing processes is curing?
I understand that the code that [has been/still is?] killing processes is doing
so because there is no "free physical memory" - right now. Yet we have had code
to do a schedule() instead of killing the job, giving the system the chance to
"fix" the lack of free physical memory problem (e.g., by writing dirty pages to
a mapped file or swap space on disk). From what I read from Juan's message
below, I guess this code has been lost or replaced by something more hostile to
user applications.
The problem as I see it is that we are seeing a situation where the system can
"generate" dirty pages far faster than the dirty pages make it to disk. The
relationship

  [extremely fast CPU] -- is much faster than --> [relatively slow disk]
    -- results in --> [no free physical memory]
    -- system kills job --> [killed process]

is causing the system to trigger the process-killing code. The alternative I'm
suggesting is

  [no free memory] -- system does reschedule --> [dirty pages written & free physical memory]
    -- resume suspended job --> [job runs to completion, and no jobs are killed]

to give the system time to act on the situation.
If you are truly out of memory [physical memory and ALL swap space], then SOME
job has to free up memory. I think we would all agree with this premise. I
suggest we remove automatic job killing as a solution. If it must remain as a
solution, there must be several other attempts tried first.
If this is an interactive system, the user should be able to close a window or
otherwise kill a job [preferably the rogue job] to make some space available. If
this is a standalone system (say a server), the long term solution is likely
"get more memory or swap space". However, that doesn't fix the problem "right
now". In this case, give the developer or operator of that system an opportunity
to make the choice on which job to kill. Perhaps reserving a small amount of
memory [just like the disk reserve] for privileged users is a solution. Making
the job killing choice at a low level of the kernel, based on "what's currently
running" does not appear to be the "right" answer. Making this choice in the
kernel and killing "init" (as Juan notes below) is almost certainly the "wrong"
answer.
I see a choice in alternatives...
[1] replace the raise SIGKILL code with schedule(). I've tried this in older
kernels (2.2.14) & it helps preserve system operation with mapped files, but
doesn't help when the swap file is full. [this may fix Juan's symptoms]
[2] return "out of memory" when swap is full - let application code handle it.
If no action in "X" time, then kill a job. Add to kswapd?
[3] long term - add a "reserve" to physical memory for root (or privileged
code). [not sure how to implement]
[4] Protect init (could be as simple as if pid==1, then schedule() & kill
something else)
[5] Long term - reduce resident set sizes to slow the generation of dirty
pages. Let me use a "file copy" as an example. I can use "cp A B" to do this. I
can also write a program that maps files "A" and "B" into memory, copies the
contents of "A" into "B", and then unmaps the two files [Multics used to do
something like this for all file accesses]; a sketch of that mmap approach
follows below. These two methods SHOULD have similar characteristics in terms
of CPU time, memory used, elapsed time, etc. The current VM system in Linux
handles the "cp" example much better than the memory mapped example - I think
this is due to overhead in memory management with large resident set sizes.
[make code active to enforce RSSLIM].
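For what it's worth, a minimal userspace sketch of the mmap version (error
handling trimmed; the details are illustrative):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	/* copy file a to file b by mapping both and doing one big memcpy */
	int mmap_copy(const char *a, const char *b)
	{
		int fda = open(a, O_RDONLY);
		int fdb = open(b, O_RDWR | O_CREAT | O_TRUNC, 0644);
		struct stat st;
		void *src, *dst;

		if (fda < 0 || fdb < 0 || fstat(fda, &st) < 0)
			return -1;
		if (ftruncate(fdb, st.st_size) < 0)
			return -1;

		src = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fda, 0);
		dst = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdb, 0);
		if (src == MAP_FAILED || dst == MAP_FAILED)
			return -1;

		memcpy(dst, src, st.st_size);	/* touches every page of both files */

		munmap(src, st.st_size);
		munmap(dst, st.st_size);
		close(fda);
		close(fdb);
		return 0;
	}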
As a system administrator and user of Linux, I am concerned about jobs getting
"killed" - please make this a "last resort". Do it only after giving me and my
users time to "fix" the problem. Thanks.
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
[Juan J. Quintela's message of 2000-05-13 18:14 quoted in full - snipped; see earlier in the thread.]
* Re: pre8: where has the anti-hog code gone?
From: Rik van Riel @ 2000-05-15 15:58 UTC (permalink / raw)
To: Mark_H_Johnson.RTS; +Cc: Juan J. Quintela, linux-mm, torvalds
On Mon, 15 May 2000 Mark_H_Johnson.RTS@raytheon.com wrote:
> What is the problem that killing processes is curing?
> I understand that the code that [has been/still is?] killing
> processes is doing so because there is no "free physical memory"
> - right now. Yet we have had code to do a schedule() instead of
> killing the job, and gave the system the chance to "fix" the
> lack of free physical memory problem
The problem was that while applications were busy freeing
memory themselves, other applications could happily "eat"
the pages that one application was freeing, leaving the
page-freeing application with no memory after the page
freeing was done.
With the patch I posted to linux-mm about an hour (??) ago,
this problem seems to be fixed.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
* Re: pre8: where has the anti-hog code gone?
From: Linus Torvalds @ 2000-05-15 16:01 UTC (permalink / raw)
To: Mark_H_Johnson.RTS; +Cc: Juan J. Quintela, linux-mm, riel
On Mon, 15 May 2000 Mark_H_Johnson.RTS@raytheon.com wrote:
>
> I guess I have a "philosophy question" - one where I can't quite understand the
> situation that we are in.
> What is the problem that killing processes is curing?
> I understand that the code that [has been/still is?] killing processes is doing
> so because there is no "free physical memory" - right now. Yet we have had code
> to do a schedule() instead of killing the job, and gave the system the chance to
> "fix" the lack of free physical memory problem (e.g., by writing dirty pages to
> a mapped file or swap space on disk). From what I read from Juan's message
> below, I guess this code has been lost or replaced by something more hostile to
> user applications.
This is actually how Linux _used_ to work, a long long time ago. It is
very simple, and it actually worked very well indeed.
Until somebody _really_ starts to eat up memory, at which point it results
in a machine that is completely dead to the world, doing nothing but
swapping pages in and out again.
The "wait until memory is free" approach works very well under many loads,
it's just that it has some rather unfortunate pathological behaviour that
is completely unacceptable. At some point you just have to say "Enough!",
and start killing something.
The bug, of course, is that we have been quite a bit too eager to do so ;)
Linus