[patch] VM stable again?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [patch] VM stable again?
@ 2000-05-15 15:12 Rik van Riel
  2000-05-15 15:31 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Rik van Riel @ 2000-05-15 15:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

Hi Linus,

the patch below makes sure processes won't "eat" the pages
another process is freeing and seems to avoid the nasty
out of memory situations that people have seen.

With this patch performance isn't quite what it should be,
but I have some ideas on making performance fly (without
impacting stability, of course).

With this patch kswapd uses extremely little cpu, compared
to other kernel versions. This is probably a sign that the
apps will be able to manage VM by themselves without help
from kswapd ... except for performance of course ;)

The patch works in a very simple way:
- keep track of whether some process is critically low on
  memory and needs to call try_to_free_pages()
- if another allocation starts while the other app is in
  try_to_free_pages(), free some memory ourselves
- (skip point 2 if there is enough free memory, but that's
  just a minor performance optimisation)

This way we won't "eat" the free memory 

I'd appreciate it if some people could try it and see if
it fixes all the OOM situations.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/



--- mm/page_alloc.c.orig	Fri May 12 20:13:08 2000
+++ mm/page_alloc.c	Mon May 15 11:00:17 2000
@@ -216,6 +216,7 @@
 struct page * __alloc_pages(zonelist_t *zonelist, unsigned long order)
 {
 	zone_t **zone = zonelist->zones;
+	static atomic_t free_before_allocate = ATOMIC_INIT(0);
 	extern wait_queue_head_t kswapd_wait;
 
 	/*
@@ -243,6 +244,9 @@
 			if (page)
 				return page;
 		}
+		/* Somebody else is freeing pages? */
+		if (atomic_read(&free_before_allocate))
+			try_to_free_pages(zonelist->gfp_mask);
 	}
 
 	/*
@@ -270,10 +274,12 @@
 	 */
 	if (!(current->flags & PF_MEMALLOC)) {
 		int gfp_mask = zonelist->gfp_mask;
+		atomic_inc(&free_before_allocate);
 		if (!try_to_free_pages(gfp_mask)) {
 			if (!(gfp_mask & __GFP_HIGH))
 				goto fail;
 		}
+		atomic_dec(&free_before_allocate);
 	}
 
 	/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:31 ` Ingo Molnar
@ 2000-05-15 15:27   ` Rik van Riel
  2000-05-15 16:15     ` Ingo Molnar
  2000-05-15 18:36   ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2000-05-15 15:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-mm

On Mon, 15 May 2000, Ingo Molnar wrote:
> On Mon, 15 May 2000, Rik van Riel wrote:
> 
> > - keep track of whether some process is critically low on
> >   memory and needs to call try_to_free_pages()
> > - if another allocation starts while the other app is in
> >   try_to_free_pages(), free some memory ourselves
> > - (skip point 2 if there is enough free memory, but that's
> >   just a minor performance optimisation)
> 
> yep, this should work. A minor comment:
> 
> > +		if (atomic_read(&free_before_allocate))
> 
> i believe this needs to be per-zone and should preferably be
> read within the zone spinlock - not atomic operations. Updating
> a global counter is a big time problem on SMP.

It should be per-pgdat. Per-zone probably won't work since
having this counter per-zone may interfere with both fallback
and balancing between the zones.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:12 [patch] VM stable again? Rik van Riel
@ 2000-05-15 15:31 ` Ingo Molnar
  2000-05-15 15:27   ` Rik van Riel
  2000-05-15 18:36   ` Linus Torvalds
  2000-05-15 18:29 ` Linus Torvalds
  2000-05-15 19:01 ` Stephen C. Tweedie
  2 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2000-05-15 15:31 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linus Torvalds, linux-mm

On Mon, 15 May 2000, Rik van Riel wrote:

> - keep track of whether some process is critically low on
>   memory and needs to call try_to_free_pages()
> - if another allocation starts while the other app is in
>   try_to_free_pages(), free some memory ourselves
> - (skip point 2 if there is enough free memory, but that's
>   just a minor performance optimisation)

yep, this should work. A minor comment:

> +		if (atomic_read(&free_before_allocate))

i believe this needs to be per-zone and should preferably be read within
the zone spinlock - not atomic operations. Updating a global counter is a
big time problem on SMP.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:27   ` Rik van Riel
@ 2000-05-15 16:15     ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2000-05-15 16:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linus Torvalds, linux-mm

On Mon, 15 May 2000, Rik van Riel wrote:

> It should be per-pgdat. Per-zone probably won't work since having this
> counter per-zone may interfere with both fallback and balancing
> between the zones.

per-pgdat essentially means 'global counter' on a typical SMP system.
Anyway, it's not in the hot path so it doesnt matter that much. (the the
atomic_read() generates a shared cacheline in the typical case)

	Ingo







--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:12 [patch] VM stable again? Rik van Riel
  2000-05-15 15:31 ` Ingo Molnar
@ 2000-05-15 18:29 ` Linus Torvalds
  2000-05-15 19:01 ` Stephen C. Tweedie
  2 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2000-05-15 18:29 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

On Mon, 15 May 2000, Rik van Riel wrote:
> 
> the patch below makes sure processes won't "eat" the pages
> another process is freeing and seems to avoid the nasty
> out of memory situations that people have seen.

Hmm.. The patch has an obvious leak: if the allocation ever fails, every
single allocator ever afterwards will be forced to try to free stuff,
simply because "free_before_allocate" wasn't decremented correctly. Which
is certainly not the right behaviour.

Also, this seems to assume that the regular reason for "out of memory" is
that the free lists emptied up while we were paging stuff out, which I do
not think is necessarily the full truth. I've definitely seen the simpler
case: just an overly eager failure from "try_to_free_pages()", which this
patch does not address.

That said, I think this patch is definitely conceptually the right thing:
it just says that while somebody else (not kswapd) is trying to free up
memory, nobody else should starve him out. I like the concept. 

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:31 ` Ingo Molnar
  2000-05-15 15:27   ` Rik van Riel
@ 2000-05-15 18:36   ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2000-05-15 18:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Rik van Riel, linux-mm

On Mon, 15 May 2000, Ingo Molnar wrote:
> 
> yep, this should work. A minor comment:
> 
> > +		if (atomic_read(&free_before_allocate))
> 
> i believe this needs to be per-zone and should preferably be read within
> the zone spinlock - not atomic operations. Updating a global counter is a
> big time problem on SMP.

Nope.

It can't be per zone, because there is no "zone". There is only a generic
balance between different zones.

And the critical path actually only reads the counter, which is fine on
SMP: most of the time the counter should be quiescent, with every CPU just
having a shared copy in their caches. 

However, I do think that it might make sense to make this per-zonelist, so
that if a DMA request (or a request on another node in a NUMA environment)
causes another zone-list to be low-on-memory, that should not affect the
other zone-lists.

(The per-zonelist version should have pretty much the same behaviour as a
global one in the normal cases, it's just that it doesn't have the bad
behaviour in the uncommon cases).

Rik, mind cleaning that up, and fixing the leak? After that it looks
fine..

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 15:12 [patch] VM stable again? Rik van Riel
  2000-05-15 15:31 ` Ingo Molnar
  2000-05-15 18:29 ` Linus Torvalds
@ 2000-05-15 19:01 ` Stephen C. Tweedie
  2000-05-15 19:10   ` Rik van Riel
  2000-05-15 19:22   ` Ingo Molnar
  2 siblings, 2 replies; 11+ messages in thread
From: Stephen C. Tweedie @ 2000-05-15 19:01 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linus Torvalds, linux-mm, Stephen Tweedie

Hi,

On Mon, May 15, 2000 at 12:12:03PM -0300, Rik van Riel wrote:
> 
> the patch below makes sure processes won't "eat" the pages
> another process is freeing and seems to avoid the nasty
> out of memory situations that people have seen.

One other thought here --- there is another way to achieve this.
Make try_to_free_pages() return a struct page *.  That will not
only achieve some measure of SMP locality, it also guarantees that
the page freed will be reacquired by the task which did the work to
free it.

--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 19:01 ` Stephen C. Tweedie
@ 2000-05-15 19:10   ` Rik van Riel
  2000-05-15 19:30     ` Ingo Molnar
  2000-05-15 19:22   ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2000-05-15 19:10 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Linus Torvalds, linux-mm

On Mon, 15 May 2000, Stephen C. Tweedie wrote:
> On Mon, May 15, 2000 at 12:12:03PM -0300, Rik van Riel wrote:
> > 
> > the patch below makes sure processes won't "eat" the pages
> > another process is freeing and seems to avoid the nasty
> > out of memory situations that people have seen.
> 
> One other thought here --- there is another way to achieve this.
> Make try_to_free_pages() return a struct page *.  That will not
> only achieve some measure of SMP locality, it also guarantees
> that the page freed will be reacquired by the task which did the
> work to free it.

I've thought about this but it doesn't seem worth the extra
complexity to me. Just making sure that while our task is
freeing pages nobody else will grab those pages without having
also freed some pages seems to be enough to me.

Furthermore, the "SMP locality" you talk about will probably
be completely overshadowed by the non-locality of the VM
freeing code anyway...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 19:01 ` Stephen C. Tweedie
  2000-05-15 19:10   ` Rik van Riel
@ 2000-05-15 19:22   ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2000-05-15 19:22 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Rik van Riel, Linus Torvalds, linux-mm

On Mon, 15 May 2000, Stephen C. Tweedie wrote:

> One other thought here --- there is another way to achieve this.
> Make try_to_free_pages() return a struct page *.  That will not
> only achieve some measure of SMP locality, it also guarantees that
> the page freed will be reacquired by the task which did the work to
> free it.

i suggested this as well, but this is not always possible. Eg. the dentry
and inode cache does a slab-free, and there is no good (existing)
mechanizm to do it and recover the page freed. And singling out
shrink_mmap() is not generic enough. (although it's the most common source
of free pages)

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 19:30     ` Ingo Molnar
@ 2000-05-15 19:27       ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2000-05-15 19:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen C. Tweedie, Linus Torvalds, linux-mm

On Mon, 15 May 2000, Ingo Molnar wrote:
> On Mon, 15 May 2000, Rik van Riel wrote:
> 
> > I've thought about this but it doesn't seem worth the extra complexity
> > to me. Just making sure that while our task is freeing pages nobody
> > else will grab those pages without having also freed some pages seems
> > to be enough to me.
> 
> actually wouldnt it be simpler to always call
> try_to_free_pages() when the zone is low on memory? This will
> keep the pressure on the system to recover from the low memory
> situation, and it reuses the low_on_memory flag. The new
> free_before_allocate flag is a 'now we are really low on memory'
> flag.

This would disturb the balancing between zones. We do not
want to have this flag per-zone since it would return us
to the "16/64MB unused" problem (and I believe you've seen
it too with highmem).

> > Furthermore, the "SMP locality" you talk about will probably be
> > completely overshadowed by the non-locality of the VM freeing code
> > anyway...
> 
> But it would be a performance optimization for sure, a
> __free_pages() + __alloc_pages() is saved - this can make a big
> difference if (a mostly clean) pagecache is shrunk.

But we shrink 'count' pages at the same time. It would only save
2 operations out of a *lot*. A much bigger win could be had if
the list operations in shrink_mmap() would be made simpler or
lower overhead...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] VM stable again?
  2000-05-15 19:10   ` Rik van Riel
@ 2000-05-15 19:30     ` Ingo Molnar
  2000-05-15 19:27       ` Rik van Riel
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2000-05-15 19:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Stephen C. Tweedie, Linus Torvalds, linux-mm

On Mon, 15 May 2000, Rik van Riel wrote:

> I've thought about this but it doesn't seem worth the extra complexity
> to me. Just making sure that while our task is freeing pages nobody
> else will grab those pages without having also freed some pages seems
> to be enough to me.

actually wouldnt it be simpler to always call try_to_free_pages() when the
zone is low on memory? This will keep the pressure on the system to
recover from the low memory situation, and it reuses the low_on_memory
flag. The new free_before_allocate flag is a 'now we are really low on
memory' flag.

> Furthermore, the "SMP locality" you talk about will probably be
> completely overshadowed by the non-locality of the VM freeing code
> anyway...

But it would be a performance optimization for sure, a __free_pages() +
__alloc_pages() is saved - this can make a big difference if (a mostly
clean) pagecache is shrunk.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2000-05-15 19:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-05-15 15:12 [patch] VM stable again? Rik van Riel
2000-05-15 15:31 ` Ingo Molnar
2000-05-15 15:27   ` Rik van Riel
2000-05-15 16:15     ` Ingo Molnar
2000-05-15 18:36   ` Linus Torvalds
2000-05-15 18:29 ` Linus Torvalds
2000-05-15 19:01 ` Stephen C. Tweedie
2000-05-15 19:10   ` Rik van Riel
2000-05-15 19:30     ` Ingo Molnar
2000-05-15 19:27       ` Rik van Riel
2000-05-15 19:22   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox