* [RFC] RSS guarantees and limits
@ 2000-06-21 22:29 Rik van Riel
2000-06-22 18:00 ` John Fremlin
0 siblings, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2000-06-21 22:29 UTC (permalink / raw)
To: linux-mm; +Cc: Stephen C. Tweedie
Hi,
I think I have an idea to solve the following two problems:
- RSS guarantees and limits to protect applications from
each other
- make sure streaming IO doesn't cause the RSS of the application
to grow too large
- protect smaller apps from bigger memory hogs
The idea revolves around two concepts. The first idea is to
have an RSS guarantee and an RSS limit per application, which
is recalculated periodically. A process' RSS will not be shrunk
to under the guarantee and cannot be grown to over the limit.
The ratio between the guarantee and the limit is fixed (eg.
limit = 4 x guarantee).
The second concept is the keeping of statistics per mm. We will
keep statistics of both the number of page steals per mm and the
number of re-faults per mm. A page steal is when we forcefully
shrink the RSS of the mm, by swap_out. A re-fault is pretty similar
to a page fault, with the difference that re-faults only count the
pages that are 1) faulted in and 2) were just stolen from the
application (and are still in the lru cache).
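A minimal sketch of the per-mm bookkeeping this implies (the struct and
field names below are purely illustrative; nothing like them exists in
the current mm_struct):

struct mm_rss_stats {
	unsigned long rss;		/* current resident set size, in pages */
	unsigned long rss_guarantee;	/* RSS is never shrunk below this */
	unsigned long rss_limit;	/* RSS is never grown above this */
	unsigned long steals;		/* pages forcefully unmapped by swap_out */
	unsigned long refaults;		/* stolen pages faulted straight back in */
};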
Every second (??) we walk the list of all tasks (mms?) and do
something very much like this:
	/* More than half of the stolen pages came straight back: grow the
	 * guarantee by ~1/16 (the +1 keeps a tiny guarantee from sticking at 0). */
	if (mm->refaults * 2 > mm->steals) {
		mm->rss_guarantee += (mm->rss_guarantee >> 4) + 1;
	} else {
		mm->rss_guarantee -= (mm->rss_guarantee >> 4) + 1;
	}
	/* Age the statistics so that recent behaviour dominates. */
	mm->refaults >>= 1;
	mm->steals >>= 1;
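For completeness, a rough model of the fault-side accounting, continuing
the mm_rss_stats sketch above (the "in cache" and "recently stolen" tests
are stand-ins for whatever the real implementation would check, e.g.
finding the page in the swap cache with a steal mark still set):

struct page_model {
	int in_lru_cache;	/* page still sits in the page/swap cache */
	int recently_stolen;	/* page was unmapped from this mm by swap_out */
};

static void account_fault(struct mm_rss_stats *mm, struct page_model *page)
{
	/* A re-fault is only counted when the page comes straight back
	 * from the cache, i.e. the earlier steal turned out to be wrong. */
	if (page->in_lru_cache && page->recently_stolen) {
		mm->refaults++;
		page->recently_stolen = 0;	/* count each stolen page once */
	}
	mm->rss++;
}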
This will have different effects on different kinds of tasks.
For example, an application which has a fixed working set will
fault *all* its pages back in and get a big rss_guarantee (and
rss_limit).
However, an application which is streaming tons of data (and
using the data only once) will find itself in the situation
where it does not reclaim most of the pages that get stolen from
it. This means that the RSS of a data streaming application will
remain limited to its working set. This should reduce the bad
effects this app has on the rest of the system. Also, when the
app hits its RSS limit and the page it releases from its VM is
dirty, we can apply write throttling.
One extra protection is needed in this scheme. We must make sure
that the RSS guarantees combined never get too big. We can do this
by simply making sure that all the RSS guarantees combined never
get bigger than 1/2 of physical memory. If we "need" more than that,
we can simply decrease the biggest RSS guarantees until we get below
1/2 of physical memory.
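A sketch of that cap, written over a plain array for clarity (the kernel
would walk its list of mms instead, and the 1/8 shrink step is an
arbitrary choice for the example):

static void cap_guarantees(unsigned long *guarantee, int nr_mms,
			   unsigned long total_ram_pages)
{
	unsigned long budget = total_ram_pages / 2;
	unsigned long total = 0, cut;
	int i, big;

	for (i = 0; i < nr_mms; i++)
		total += guarantee[i];

	while (total > budget) {
		/* find the biggest guarantee ... */
		big = 0;
		for (i = 1; i < nr_mms; i++)
			if (guarantee[i] > guarantee[big])
				big = i;
		/* ... and shave 1/8th of it (plus one page) off */
		cut = (guarantee[big] >> 3) + 1;
		guarantee[big] -= cut;
		total -= cut;
	}
}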
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-21 22:29 [RFC] RSS guarantees and limits Rik van Riel
@ 2000-06-22 18:00 ` John Fremlin
2000-06-22 19:12 ` Rik van Riel
2000-06-22 21:19 ` Stephen Tweedie
0 siblings, 2 replies; 30+ messages in thread
From: John Fremlin @ 2000-06-22 18:00 UTC (permalink / raw)
To: linux-mm
Rik van Riel <riel@conectiva.com.br> writes:
> I think I have an idea to solve the following two problems:
> - RSS guarantees and limits to protect applications from
> each other
I think that this principle should be queried. Taking the base unit to
be the process, while reasonable, is not IMHO a good idea.
For multiuser systems the obvious unit is the user; that is, it is
clearly necessary to stop one user hogging system memory, whether
they've got 5 or 500 processes.
For workstations, often the user is only working with one or two
processes, which are extremely large (window system and math package
for example) and a series of smaller processes (xeyes and window
manager). Taking resources away from a coffee making simulator just so
some mailnotifier can keep a stupid animation in memory doesn't make
sense.
For special boxes with only one process running, keeping others in
cache will only be harmful.
[ Perhaps some system analogous to "nice" would be helpful. I think that
the user can directly give a lot of very useful information to the VM
(for example, the hint that when memory runs out, netscape should be
killed before other processes). ]
It would be better to treat all memory objects as equals; for example,
when emacs is taking up a huge amount of memory because it's acting
alternately as a news client and a tetris game, your system would only
count it as one process, which is unfair -- logically it is being
used as two. If the page were taken as the basic unit, all of these
problems would be solved (or why not? after all, we're supposed to be
converting to the per-page system).
> - make sure streaming IO doesn't cause the RSS of the application
> to grow too large
This problem could be more generally stated: make sure that streaming
IO does not chuck stuff which will be looked at again out of cache.
As I explained above, I think that the process is a bad basic
unit.
> - protect smaller apps from bigger memory hogs
Why? Yes, it's very altruistic, very sportsmanlike, but giving small,
rarely used processes a form of social security is only going to
increase bureaucracy ;-)
I don't follow the idea that processes should be squashed if they're
large, and my three examples above demonstrate that this is a bad idea.
> The idea revolves around two concepts. The first idea is to
> have an RSS guarantee and an RSS limit per application, which
> is recalculated periodically. A process' RSS will not be shrunk
> to under the guarantee and cannot be grown to over the limit.
> The ratio between the guarantee and the limit is fixed (eg.
> limit = 4 x guarantee).
This is complex and arbitrary; the concept of a guarantee is not
naturally occurring, so (looking at the current state of the mm
code) it will become detuned if it ever gets tuned (like the priority
argument to vmscan::swap_out, which is almost always between 60 and 64
on my box) and merely cause more performance trouble, because the
complexity isn't helping any.
I do agree that looking at and adjusting to processes' memory access
patterns is a good idea, if it can be done right. I disagree with this
particular way of doing it; it feels too arbitrary and I don't think
it will do any good.
[...]
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 18:00 ` John Fremlin
@ 2000-06-22 19:12 ` Rik van Riel
2000-06-22 21:19 ` Stephen Tweedie
1 sibling, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 19:12 UTC (permalink / raw)
To: John Fremlin; +Cc: linux-mm
On 22 Jun 2000, John Fremlin wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
>
> > I think I have an idea to solve the following two problems:
> > - RSS guarantees and limits to protect applications from
> > each other
>
> I think that this principle should be queried. Taking the base
> unit to be the process, while reasonable, is not IMHO a good
> idea.
>
> For multiuser systems the obvious unit is the user; that is, it
> is clearly necessary to stop one user hogging system memory,
> whether they've got 5 or 500 processes.
Once userbeans is in place, this whole scheme can simply be
extended to work at the level of both users and processes.
> > - make sure streaming IO doesn't cause the RSS of the application
> > to grow too large
>
> This problem could be more generally stated: make sure that
> streaming IO does not chuck stuff which will be looked at again
> out of cache.
Which is exactly what my code will do. ;)
(you may want to try to understand my code before you flame)
> > The idea revolves around two concepts. The first idea is to
> > have an RSS guarantee and an RSS limit per application, which
> > is recalculated periodically. A process' RSS will not be shrunk
> > to under the guarantee and cannot be grown to over the limit.
> > The ratio between the guarantee and the limit is fixed (eg.
> > limit = 4 x guarantee).
>
> This is complex and arbitrary;
> I do agree that looking at and adjusting to processes' memory
> access patterns is a good idea, if it can be done right.
*sigh*
You may want to read my idea again and try to do another
response when you understand it. I'm sorry I have to flame
you like this, but you really don't seem to grasp the concept.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 18:00 ` John Fremlin
2000-06-22 19:12 ` Rik van Riel
@ 2000-06-22 21:19 ` Stephen Tweedie
2000-06-22 21:37 ` Rik van Riel
2000-06-22 22:39 ` John Fremlin
1 sibling, 2 replies; 30+ messages in thread
From: Stephen Tweedie @ 2000-06-22 21:19 UTC (permalink / raw)
To: John Fremlin; +Cc: linux-mm
Hi,
On Thu, Jun 22, 2000 at 07:00:54PM +0100, John Fremlin wrote:
>
> > - protect smaller apps from bigger memory hogs
>
> Why? Yes, it's very altruistic, very sportsmanlike, but giving small,
> rarely used processes a form of social security is only going to
> increase bureaucracy ;-)
It is critically important that when under memory pressure, a
system administrator can still log in and kill any runaway
processes. The smaller apps in question here are system daemons
such as init, inetd and telnetd, and user apps such as bash and
ps. We _must_ be able to allow them to make at least some
progress while the VM is under load.
Cheers,
Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 21:19 ` Stephen Tweedie
@ 2000-06-22 21:37 ` Rik van Riel
2000-06-22 22:48 ` John Fremlin
2000-06-22 22:39 ` John Fremlin
1 sibling, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 21:37 UTC (permalink / raw)
To: Stephen Tweedie; +Cc: John Fremlin, linux-mm
On Thu, 22 Jun 2000, Stephen Tweedie wrote:
> On Thu, Jun 22, 2000 at 07:00:54PM +0100, John Fremlin wrote:
> >
> > > - protect smaller apps from bigger memory hogs
> >
> > Why? Yes, it's very altruistic, very sportsmanlike, but giving small,
> > rarely used processes a form of social security is only going to
> > increase bureaucracy ;-)
>
> It is critically important that when under memory pressure, a
> system administrator can still log in and kill any runaway
> processes. The smaller apps in question here are system daemons
> such as init, inetd and telnetd, and user apps such as bash and
> ps. We _must_ be able to allow them to make at least some
> progress while the VM is under load.
Also, the memory space used by these small apps is usually
negligible compared to the memory used by the big program.
What is 2% memory use for the big program can be the difference
between running and crawling for something like bash...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 21:37 ` Rik van Riel
@ 2000-06-22 22:48 ` John Fremlin
2000-06-22 23:59 ` Stephen Tweedie
0 siblings, 1 reply; 30+ messages in thread
From: John Fremlin @ 2000-06-22 22:48 UTC (permalink / raw)
To: Rik van Riel; +Cc: Stephen Tweedie, linux-mm
Rik van Riel <riel@conectiva.com.br> writes:
> Also, the memory space used by these small apps is usually
> negligible compared to the memory used by the big program.
>
> What is 2% memory use for the big program can be the difference
> between running and crawling for something like bash...
I agree that this is usually true. But that is only because the big
program actually isn't using the memory; in that case the memory
should be freed anyway. If a big program were to actually use all its
memory, then this system would destroy its performance, as all the
gettys on the system and silly luser tweaks which aren't actually
being used at all would take away useful memory.
I booted up with mem=8M today, and found that even small things like
bash took about 20% of system RAM. If a single big process (about the
biggest that'd fit was emacs) weren't allowed to take most of the memory
away from the various junk that wasn't being used, the system would be
completely unusable rather than merely a little slow.
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 22:48 ` John Fremlin
@ 2000-06-22 23:59 ` Stephen Tweedie
2000-06-23 16:08 ` John Fremlin
0 siblings, 1 reply; 30+ messages in thread
From: Stephen Tweedie @ 2000-06-22 23:59 UTC (permalink / raw)
To: John Fremlin; +Cc: Rik van Riel, Stephen Tweedie, linux-mm
Hi,
On Thu, Jun 22, 2000 at 11:48:18PM +0100, John Fremlin wrote:
>
> I booted up with mem=8M today, and found that even small things like
> bash took about 20% of system RAM. If a single big process (about the
> biggest that'd fit was emacs) weren't allowed to take most of the memory
> away from the various junk that wasn't being used, the system would be
> completely unusable rather than merely a little slow.
The RSS bounds are *DYNAMIC*. If there is contention for memory ---
if lots of other processes want the memory that that emacs is
holding --- then absolutely you want to cut back on the emacs RSS.
If there is no competition, and emacs is the only active process, then
there is no need to prune its RSS.
--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 23:59 ` Stephen Tweedie
@ 2000-06-23 16:08 ` John Fremlin
0 siblings, 0 replies; 30+ messages in thread
From: John Fremlin @ 2000-06-23 16:08 UTC (permalink / raw)
To: linux-mm
Stephen Tweedie <sct@redhat.com> writes:
[...]
> The RSS bounds are *DYNAMIC*. If there is contention for memory ---
> if lots of other processes want the memory that that emacs is
> holding --- then absolutely you want to cut back on the emacs RSS.
> If there is no competition, and emacs is the only active process, then
> there is no need to prune its RSS.
Yes, I agree with both parts. The second part is what I was trying to
get across with the example because I thought that case was being
ignored.
I thought part of the proposal was to control its RSS and give the
surplus to the little processes, so that when the admin tried to telnet
in to kill it, inetd would be in memory and nicely responsive.
You (Stephen) said earlier:
> It is critically important that when under memory pressure, a
> system administrator can still log in and kill any runaway
> processes. [...]
I took that to imply that inetd would have to be kept in memory. Sorry
for the confusion caused.
[...]
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 21:19 ` Stephen Tweedie
2000-06-22 21:37 ` Rik van Riel
@ 2000-06-22 22:39 ` John Fremlin
2000-06-22 23:27 ` Rik van Riel
2000-06-24 11:22 ` Andrey Savochkin
1 sibling, 2 replies; 30+ messages in thread
From: John Fremlin @ 2000-06-22 22:39 UTC (permalink / raw)
To: Stephen Tweedie; +Cc: linux-mm
Stephen Tweedie <sct@redhat.com> writes:
> On Thu, Jun 22, 2000 at 07:00:54PM +0100, John Fremlin wrote:
> >
> > > - protect smaller apps from bigger memory hogs
> >
> > Why? Yes, it's very altruistic, very sportsmanlike, but giving small,
> > rarely used processes a form of social security is only going to
> > increase bureaucracy ;-)
>
> It is critically important that when under memory pressure, a
> system administrator can still log in and kill any runaway
> processes. The smaller apps in question here are system daemons
> such as init, inetd and telnetd, and user apps such as bash and
> ps. We _must_ be able to allow them to make at least some
> progress while the VM is under load.
I agree completely. It was one of the reasons I suggested that a
syscall like nice but giving info to the mm layer would be useful. In
general, small apps (xeyes,biff,gpm) don't deserve any special
treatment.
I also said that on a multiuser system it is important that one user
can't hog the system. In the case where it is impossible for a large
app to drop root privileges, being root wouldn't help unless an
exception were made for admin caps.
The only general solution I can see is to give some process (groups) a
higher MM priority, by analogy with nice.
It is critically important that an admin can log in to kill a swarm of
tiny runaway processes. A tiny program that forks every few seconds
can bring down a machine just as effectively as, if not more effectively
than, a couple of large runaways.
[...]
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 22:39 ` John Fremlin
@ 2000-06-22 23:27 ` Rik van Riel
2000-06-23 0:49 ` Ed Tomlinson
2000-06-23 15:52 ` John Fremlin
2000-06-24 11:22 ` Andrey Savochkin
1 sibling, 2 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 23:27 UTC (permalink / raw)
To: John Fremlin; +Cc: linux-mm
On 22 Jun 2000, John Fremlin wrote:
> Stephen Tweedie <sct@redhat.com> writes:
> > It is critically important that when under memory pressure, a
> > system administrator can still log in and kill any runaway
> > processes. The smaller apps in question here are system daemons
> > such as init, inetd and telnetd, and user apps such as bash and
> > ps. We _must_ be able to allow them to make at least some
> > progress while the VM is under load.
>
> I agree completely. It was one of the reasons I suggested that a
> syscall like nice but giving info to the mm layer would be
> useful. In general, small apps (xeyes,biff,gpm) don't deserve
> any special treatment.
Why not? In scheduling, processes which use less CPU get
a better response time. Why not do the same for memory
use? The less memory you use, the less aggressive we'll be
in trying to take it away from you.
Of course a small app should be removed from memory when
it's sleeping, but there's no reason to not apply some
degree of fairness in memory allocation and memory stealing.
> I also said that on a multiuser system it is important that one
> user can't hog the system.
*nod*
> The only general solution I can see is to give some process
> (groups) a higher MM priority, by analogy with nice.
That you can't see anything better doesn't mean it
isn't possible ;)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 23:27 ` Rik van Riel
@ 2000-06-23 0:49 ` Ed Tomlinson
2000-06-23 13:45 ` Rik van Riel
2000-06-23 15:52 ` John Fremlin
1 sibling, 1 reply; 30+ messages in thread
From: Ed Tomlinson @ 2000-06-23 0:49 UTC (permalink / raw)
To: linux-mm
Hi,
Just wondering what will happen with java applications? These beasts
typically have working sets of 16M or more and use 10-20 threads. When
using native threads linux sees each one as a process. They all share
the same memory though.
--
Ed Tomlinson <tomlins@cam.org>
http://www.cam.org/~tomlins/njpipes.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-23 0:49 ` Ed Tomlinson
@ 2000-06-23 13:45 ` Rik van Riel
2000-06-23 15:36 ` volodya
0 siblings, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2000-06-23 13:45 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: linux-mm
On Thu, 22 Jun 2000, Ed Tomlinson wrote:
> Just wondering what will happen with java applications? These
> beasts typically have working sets of 16M or more and use 10-20
> threads. When using native threads linux sees each one as a
> process. They all share the same memory though.
Ahh, but these limits are of course applied per _MM_, not
per thread ;)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-23 13:45 ` Rik van Riel
@ 2000-06-23 15:36 ` volodya
0 siblings, 0 replies; 30+ messages in thread
From: volodya @ 2000-06-23 15:36 UTC (permalink / raw)
To: Rik van Riel; +Cc: Ed Tomlinson, linux-mm
What about making some userspace hooks available and leaving the task to a
daemon ?
* pseudo-single mode: under memory pressure, reserve a fixed amount of
memory for a fixed list of root-owned processes
* simple swapout algorithm (like in 2.0.x) by default
* hooks to allow a userspace program to do all the clever things as needed
(a partially mlocked userspace program?)
why:
* this discussion has been going on for a while; a userspace solution will
allow more room for experimentation without the risk of corrupting kernel
data
* isolate the data collection and memory reclaim interfaces (I admit I am
vague on this part...) from the logic that makes the decisions
* swapping data out is expensive anyway (but reclaiming read-only
mmapped files is not...)
* userspace daemons can differ for different setups. What is more, one
can direct them to do something specific when, say, running squid,
apache or something very particular...
* when we know what to do and what works, merge it back into the kernel
(perhaps as kmod, or perhaps as khttpd)
Vladimir Dergachev
On Fri, 23 Jun 2000, Rik van Riel wrote:
> On Thu, 22 Jun 2000, Ed Tomlinson wrote:
>
> > Just wondering what will happen with java applications? These
> > beasts typically have working sets of 16M or more and use 10-20
> > threads. When using native threads linux sees each one as a
> > process. They all share the same memory though.
>
> Ahh, but these limits are of course applied per _MM_, not
> per thread ;)
>
> regards,
>
> Rik
> --
> The Internet is not a network of computers. It is a network
> of people. That is its real strength.
>
> Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
> http://www.conectiva.com/ http://www.surriel.com/
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 23:27 ` Rik van Riel
2000-06-23 0:49 ` Ed Tomlinson
@ 2000-06-23 15:52 ` John Fremlin
1 sibling, 0 replies; 30+ messages in thread
From: John Fremlin @ 2000-06-23 15:52 UTC (permalink / raw)
To: linux-mm
Rik van Riel <riel@conectiva.com.br> writes:
[...]
> > I agree completely. It was one of the reasons I suggested that a
> > syscall like nice but giving info to the mm layer would be
> > useful. In general, small apps (xeyes,biff,gpm) don't deserve
> > any special treatment.
>
> Why not? In scheduling, processes which use less CPU get
> a better response time. Why not do the same for memory
> use? The less memory you use, the less aggressive we'll be
> in trying to take it away from you.
CPU != memory.
Quick reasons:
(1) A sleeping process still takes memory.
(2) Take away 10% of a program's CPU and it runs at about 90% of its
former speed. Take away 10% of a program's memory and it might only run
at 5-10% of its former speed due to having to wait for disk IO.
>
> Of course a small app should be removed from memory when
> it's sleeping, but there's no reason to not apply some
> degree of fairness in memory allocation and memory stealing.
[...]
You say you can't see why small processes like shells etc. shouldn't
be specially treated (your first paragraph). Folding double negative,
you say there should be positive discrimination for these processes,
i.e. fairer distribution of memory (your second paragraph).
If you think I'm not qualified to disagree, reread what Matthew Dillon
said to you while discussing VM changes in May:
Well, I have a pretty strong opinion on trying to rationalize
penalizing big processes simply because they are big. It's a bad
idea for several reasons, not the least of which being that by
making such a rationalization you are assuming a particular system
topology -- you are assuming, for example, that the system may
contain a few large less-important processes and a reasonable
number of small processes. But if the system contains hundreds of
small processes or if some of the large processes turn out to be
important, the rationalization fails.
Also if the large process in question happens to really need the
pages (is accessing them all the time), trying to page those pages
out gratuitously does nothing but create a massive paging load on
the system. Unless you have a mechanism (such as FreeBSD has)
to impose a 20-second forced sleep under extreme memory loads, any
focus on large processes will simply result in thrashing (read:
screw up the system).
[...]
> > The only general solution I can see is to give some process
> > (groups) a higher MM priority, by analogy with nice.
>
> That you can't see anything better doesn't mean it isn't possible ;)
Indeed, I wait anxiously for someone to propose a better solution.
[...]
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: RSS guarantees and limits
2000-06-22 22:39 ` John Fremlin
2000-06-22 23:27 ` Rik van Riel
@ 2000-06-24 11:22 ` Andrey Savochkin
2000-06-27 3:26 ` Stephen C. Tweedie
1 sibling, 1 reply; 30+ messages in thread
From: Andrey Savochkin @ 2000-06-24 11:22 UTC (permalink / raw)
To: John Fremlin; +Cc: linux-mm, Stephen Tweedie
Hello John,
On Thu, Jun 22, 2000 at 11:39:44PM +0100, John Fremlin wrote:
> Stephen Tweedie <sct@redhat.com> writes:
> > It is critically important that when under memory pressure, a
> > system administrator can still log in and kill any runaway
> > processes. The smaller apps in question here are system daemons
> > such as init, inetd and telnetd, and user apps such as bash and
> > ps. We _must_ be able to allow them to make at least some
> > progress while the VM is under load.
>
> I agree completely. It was one of the reasons I suggested that a
> syscall like nice but giving info to the mm layer would be useful. In
> general, small apps (xeyes,biff,gpm) don't deserve any special
> treatment.
>
> I also said that on a multiuser system it is important that one user
> can't hog the system. In the case where it is impossible for a large
> app to drop root privileges, being root wouldn't help unless an
> exception were made for admin caps.
Those are exactly my reasons for addressing memory management in the user
beancounter patch:
- users (and the administrator) should have protection against misbehavior of
other users' processes;
- we really care about certain processes which we need for system management
under memory pressure, rather than about small applications.
Small applications are not always good, just as big ones are not always bad.
We just want good memory service for those applications which we want to have
it for :-) It sounds like a tautology, but that's it. It's entirely an
administrator policy decision.
> The only general solution I can see is to give some process (groups) a
> higher MM priority, by analogy with nice.
Considering the problem, I stated it in the form of a guarantee rather than
a priority. Consider the nice analogy: you can ruin the latency of a
high-priority process by spawning a huge number of lower-priority ones.
A guarantee-like approach gives you a configured amount of resources
independently of the behavior (or misbehavior) of other processes and users.
> It is critically important that an admin can login to kill a swarm of
> tiny runaway processes. A tiny program that forks every few seconds
> can bring down a machine just as, if not more effectively than, a
> couple of large runaways.
Best regards
Andrey V. Savochkin
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: RSS guarantees and limits
2000-06-24 11:22 ` Andrey Savochkin
@ 2000-06-27 3:26 ` Stephen C. Tweedie
0 siblings, 0 replies; 30+ messages in thread
From: Stephen C. Tweedie @ 2000-06-27 3:26 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: John Fremlin, linux-mm, Stephen Tweedie
Hi,
On Sat, Jun 24, 2000 at 07:22:45PM +0800, Andrey Savochkin wrote:
>
> Small applications are not always good, just as big ones are not always bad.
> We just want good memory service for those applications which we want to have
> it for :-) It sounds like a tautology, but that's it. It's entirely an
> administrator policy decision.
Somewhat, but not entirely.
Remember that we were talking about both RSS limits and RSS guarantees
being dynamic. RSS guarantees for small processes (based on their
fault activity, of course, so that idle small tasks can still be
swapped out) are perhaps dependent on what those tasks are actually
doing if the object is to have them compete against each other more
fairly.
However, RSS limits on the largest tasks in the system have an
entirely different effect --- they prevent swap storms from
overwhelming small tasks entirely, by placing more of the burden of
the swapping on the large task.
If a task is so large that it is thrashing, then removing a few hundred K
from its RSS doesn't usually have all that dramatic an effect on its
performance. Remember, we'll only be doing this pruning if there is
continuing memory pressure. If that large task becomes the only task
wanting more memory again, we can let its RSS limit creep up again.
That way, processes which just fit into memory on an idle system will
continue to work just fine, but once we get memory contention, they
won't stop the rest of the system from getting going again.
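A sketch of that back-and-forth, with the contention test and the step
sizes invented for illustration:

/* Under contention the limit of a large task is pulled toward its current
 * RSS, pushing the swap burden onto it; once the pressure is gone the
 * limit is allowed to creep back up. */
static void adjust_rss_limit(unsigned long *rss_limit, unsigned long rss,
			     int memory_contention)
{
	if (memory_contention) {
		if (*rss_limit > rss)
			*rss_limit -= (*rss_limit - rss) >> 2;
	} else {
		*rss_limit += (*rss_limit >> 4) + 1;
	}
}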
Cheers,
Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
@ 2000-06-22 14:41 frankeh
2000-06-22 15:31 ` Rik van Riel
0 siblings, 1 reply; 30+ messages in thread
From: frankeh @ 2000-06-22 14:41 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-mm
Seems like a good idea, for ensuring some decent response time.
This seems similar to what WinNT is doing.
Do you envision that the "RSS guarantees" decay over time? I am concerned
that some daemons hanging out there which might be executed very rarely
(e.g. inetd) might hog too much memory (cumulatively speaking). I think NT
at some point pages out the entire working set of such apps.
-- Hubertus Franke
IBM T.J.Watson Research Center
Rik van Riel <riel@conectiva.com.br>@kvack.org on 06/21/2000 06:59:44 PM
Sent by: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
cc: "Stephen C. Tweedie" <sct@redhat.com>
Subject: [RFC] RSS guarantees and limits
Hi,
I think I have an idea to solve the following two problems:
- RSS guarantees and limits to protect applications from
each other
- make sure streaming IO doesn't cause the RSS of the application
to grow too large
- protect smaller apps from bigger memory hogs
The idea revolves around two concepts. The first idea is to
have an RSS guarantee and an RSS limit per application, which
is recalculated periodically. A process' RSS will not be shrunk
to under the guarantee and cannot be grown to over the limit.
The ratio between the guarantee and the limit is fixed (eg.
limit = 4 x guarantee).
The second concept is the keeping of statistics per mm. We will
keep statistics of both the number of page steals per mm and the
number of re-faults per mm. A page steal is when we forcefully
shrink the RSS of the mm, by swap_out. A re-fault is pretty similar
to a page fault, with the difference that re-faults only count the
pages that are 1) faulted in and 2) were just stolen from the
application (and are still in the lru cache).
Every second (??) we walk the list of all tasks (mms?) and do
something very much like this:
	/* More than half of the stolen pages came straight back: grow the
	 * guarantee by ~1/16 (the +1 keeps a tiny guarantee from sticking at 0). */
	if (mm->refaults * 2 > mm->steals) {
		mm->rss_guarantee += (mm->rss_guarantee >> 4) + 1;
	} else {
		mm->rss_guarantee -= (mm->rss_guarantee >> 4) + 1;
	}
	/* Age the statistics so that recent behaviour dominates. */
	mm->refaults >>= 1;
	mm->steals >>= 1;
This will have different effects on different kinds of tasks.
For example, an application which has a fixed working set will
fault *all* its pages back in and get a big rss_guarantee (and
rss_limit).
However, an application which is streaming tons of data (and
using the data only once) will find itself in the situation
where it does not reclaim most of the pages that get stolen from
it. This means that the RSS of a data streaming application will
remain limited to its working set. This should reduce the bad
effects this app has on the rest of the system. Also, when the
app hits its RSS limit and the page it releases from its VM is
dirty, we can apply write throttling.
One extra protection is needed in this scheme. We must make sure
that the RSS guarantees combined never get too big. We can do this
by simply making sure that all the RSS guarantees combined never
get bigger than 1/2 of physical memory. If we "need" more than that,
we can simply decrease the biggest RSS guarantees until we get below
1/2 of physical memory.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 14:41 [RFC] " frankeh
@ 2000-06-22 15:31 ` Rik van Riel
0 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 15:31 UTC (permalink / raw)
To: frankeh; +Cc: linux-mm
On Thu, 22 Jun 2000 frankeh@us.ibm.com wrote:
> Seems like a good idea, for ensuring some decent response time.
> This seems similar to what WinNT is doing.
There's a big difference here. I plan on making the RSS limit system
such that most applications should be somewhere between their limit
and their guarantee when the system is under "normal" levels of
memory pressure.
That is, I want to keep global page replacement as the primary page
replacement strategy and only use the RSS guarantees and limits to
guide global page replacement and protect the system from the impact
of memory hogs.
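One possible shape for that bias, purely as a sketch (the weights and
the function name are made up):

/* Returned value is a "willingness to steal from this mm" weight that the
 * existing global swap_out() scan could fold into its own priorities. */
static int steal_weight(unsigned long rss, unsigned long rss_guarantee,
			unsigned long rss_limit)
{
	if (rss <= rss_guarantee)
		return 0;	/* never shrink an mm below its guarantee */
	if (rss >= rss_limit)
		return 4;	/* over the limit: steal much more eagerly */
	return 1;		/* in between: plain global replacement */
}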
> Do you envision that the "RSS guarantees" decay over time? I am
> concerned that some daemons hanging out there which might be
> executed very rarely (e.g. inetd) might hog too much memory
> (cumulatively speaking). I think NT at some point pages out the
> entire working set of such apps.
This is what I want to avoid. Of course, if a task is really
sleeping it should be completely removed from
memory, but a _periodic_ task like top or atd may as well be
protected a bit if memory pressure is low enough.
I know I will have to adjust my rough draft quite a bit to
achieve the wanted effects...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
@ 2000-06-22 15:49 frankeh
2000-06-22 16:05 ` Rik van Riel
0 siblings, 1 reply; 30+ messages in thread
From: frankeh @ 2000-06-22 15:49 UTC (permalink / raw)
To: linux-mm
I assume that in the <workstation> scenario, where there are a limited number
of processes, your approach will work just fine.
In a server scenario where you might have lots of processes (with limited
resource requirements) this might have different effects. This will inevitably
happen when we move Linux to NUMA or large scale SMP systems and we apply
images like that to webhosting.
Do you think that the resulting RSS guarantees (function of
<mem_size/2*process_count>) will be sufficient? Or is your assumption that,
for this kind of server app with lots of running processes, you'd
better not overextend your memory and start paging (an acceptable
assumption)?
-- Hubertus
Rik van Riel <riel@conectiva.com.br> on 06/22/2000 12:01:18 PM
To: Hubertus Franke/Watson/IBM@IBMUS
cc: linux-mm@kvack.org
Subject: Re: [RFC] RSS guarantees and limits
On Thu, 22 Jun 2000 frankeh@us.ibm.com wrote:
> Seems like a good idea, for ensuring some decent response time.
> This seems similar to what WinNT is doing.
There's a big difference here. I plan on making the RSS limit system
such that most applications should be somewhere between their limit
and their guarantee when the system is under "normal" levels of
memory pressure.
That is, I want to keep global page replacement as the primary page
replacement strategy and only use the RSS guarantees and limits to
guide global page replacement and protect the system from the impact
of memory hogs.
> Do you envision that the "RSS guarantees" decay over time? I am
> concerned that some daemons hanging out there which might be
> executed very rarely (e.g. inetd) might hog too much memory
> (cumulatively speaking). I think NT at some point pages out the
> entire working set of such apps.
This is what I want to avoid. Of course, if a task is really
sleeping it should be completely removed from
memory, but a _periodic_ task like top or atd may as well be
protected a bit if memory pressure is low enough.
I know I will have to adjust my rough draft quite a bit to
achieve the wanted effects...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 15:49 frankeh
@ 2000-06-22 16:05 ` Rik van Riel
0 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 16:05 UTC (permalink / raw)
To: frankeh; +Cc: linux-mm
On Thu, 22 Jun 2000 frankeh@us.ibm.com wrote:
> I assume that in the <workstation> scenario, where there are a
> limited number of processes, your approach will work just fine.
>
> In a server scenario where you might have lots of processes
> (with limited resource requirements) this might have different
> effects. This will inevitably happen when we move Linux to NUMA
> or large scale SMP systems and we apply images like that to
> webhosting.
This is exactly why I want to have the RSS guarantees and
limits auto-tune themselves, depending on the ratio between
re-faults (where we have stolen a page from the working set
of a process) and page steals (these pages were not from the
working set).
If we steal a lot of pages from a process and the process
doesn't take these same pages back, we should continue stealing
from that process since obviously it isn't using all its pages.
(or it only uses the pages once)
Also, stolen pages will stay around in memory, outside of the
working set of the process, but in one of the various caches.
If they are faulted back very quickly no disk IO is needed at
all ... and faulting them back quickly is an indication that
we're stealing too many pages from the process.
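A rough model of the steal side of that bookkeeping (names invented; the
point is only that the stolen page stays in a cache with a mark on it,
so a quick re-fault needs no disk IO and can be recognised as such):

struct stolen_page {
	int in_cache;	/* still resident in the page/swap cache */
	int stolen;	/* unmapped from its mm by swap_out */
};

static void account_steal(unsigned long *steals, unsigned long *rss,
			  struct stolen_page *page)
{
	(*steals)++;
	(*rss)--;
	page->stolen = 1;
	page->in_cache = 1;	/* not freed yet, just unmapped */
}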
> Do you think that the resulting RSS guarantees (function of
> <mem_size/2*process_count>) will be sufficient ?
The RSS guarantee is just that, a guarantee. We guarantee that
the RSS of the process will not be shrunk below its guarantee,
but that doesn't stop any process from having a larger RSS (up
to its RSS limit).
> Or is your assumption that, for this kind of server app with
> lots of running processes, you'd better not overextend your
> memory and start paging (an acceptable assumption)?
If we recycle memory pages _before_ the application can re-fault
them in from the page/swap cache, it won't be able to make the
re-fault and its RSS guarantee and limit will be shrunk...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
@ 2000-06-22 16:22 frankeh
2000-06-22 16:38 ` Rik van Riel
2000-06-22 19:48 ` Jamie Lokier
0 siblings, 2 replies; 30+ messages in thread
From: frankeh @ 2000-06-22 16:22 UTC (permalink / raw)
To: linux-mm
Now I understand this much better. The RSS guarantee is a function of the
refault-rate <clever>.
This in principle implements a decay of the limit based on usage.... I like
that approach.
Is there a hard-stop RSS limit below which you will not evict pages from a
process (e.g. mem_size / MAX_PROCESSES ?) to give some interactivity for
processes that haven't executed for a while, or do you just let it go down
based on the refault-rate...
-- Hubertus
Rik van Riel <riel@conectiva.com.br>@kvack.org on 06/22/2000 12:35:06 PM
Sent by: owner-linux-mm@kvack.org
To: Hubertus Franke/Watson/IBM@IBMUS
cc: linux-mm@kvack.org
Subject: Re: [RFC] RSS guarantees and limits
On Thu, 22 Jun 2000 frankeh@us.ibm.com wrote:
> I assume that in the <workstation> scenario, where there are a
> limited number of processes, your approach will work just fine.
>
> In a server scenario where you might have lots of processes
> (with limited resource requirements) this might have different
> effects. This will inevitably happen when we move Linux to NUMA
> or large scale SMP systems and we apply images like that to
> webhosting.
This is exactly why I want to have the RSS guarantees and
limits auto-tune themselves, depending on the ratio between
re-faults (where we have stolen a page from the working set
of a process) and page steals (these pages were not from the
working set).
If we steal a lot of pages from a process and the process
doesn't take these same pages back, we should continue stealing
from that process since obviously it isn't using all its pages.
(or it only uses the pages once)
Also, stolen pages will stay around in memory, outside of the
working set of the process, but in one of the various caches.
If they are faulted back very quickly no disk IO is needed at
all ... and faulting them back quickly is an indication that
we're stealing too many pages from the process.
> Do you think that the resulting RSS guarantees (function of
> <mem_size/2*process_count>) will be sufficient ?
The RSS guarantee is just that, a guarantee. We guarantee that
the RSS of the process will not be shrunk below its guarantee,
but that doesn't stop any process from having a larger RSS (up
to its RSS limit).
> Or is your assumption that, for this kind of server app with
> lots of running processes, you'd better not overextend your
> memory and start paging (an acceptable assumption)?
If we recycle memory pages _before_ the application can re-fault
them in from the page/swap cache, it won't be able to make the
re-fault and its RSS guarantee and limit will be shrunk...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 16:22 frankeh
@ 2000-06-22 16:38 ` Rik van Riel
2000-06-22 19:48 ` Jamie Lokier
1 sibling, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 16:38 UTC (permalink / raw)
To: frankeh; +Cc: linux-mm
On Thu, 22 Jun 2000 frankeh@us.ibm.com wrote:
> Now I understand this much better. The RSS guarantee is a
> function of the refault-rate <clever>. This in principle
> implements a decay of the limit based on usage.... I like that
> approach.
My previous anti-hog code (it even seemed to work) was to
"push" big processes harder than small processes. If, for
example, process A is N times bigger than process B, every
page in process A would get sqrt(N) times the memory pressure
a page in process B would get. This promotes fairness between
memory hogs.
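As a sketch of that older rule (the integer square root below is a naive
helper written out for the example, not a reference to any particular
kernel function):

static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	/* naive loop; plenty for a sketch */
	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* If this mm is N times bigger than the smallest interesting mm, each of
 * its pages gets roughly sqrt(N) times the normal memory pressure. */
static unsigned long page_pressure(unsigned long rss, unsigned long smallest_rss)
{
	unsigned long n = rss / (smallest_rss ? smallest_rss : 1);

	return n ? isqrt(n) : 1;	/* always at least 1x pressure */
}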
This code will adjust the guarantee and the limit to the
type of memory usage, so a process which streams over a huge
amount of data just once will be restricted to maybe a few
times its window size so it'll be unable to push other processes
out of memory by simply accessing all the data quickly (but just
once).
For a fair VM we probably want a combination of this new idea
*and* some fairness measures. Preferably in such a way that
we don't interfere too much with the strategy of global page
replacement...
> Is there a hard-stop RSS limit below which you will not evict pages
> from a process (e.g. mem_size / MAX_PROCESSES ?) to give some
> interactivity for processes that haven't executed for a while,
> or do you just let it go down based on the refault-rate...
There is none, but maybe we should have the RSS guarantee just
go down slower and slower depending on the size of the process?
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 16:22 frankeh
2000-06-22 16:38 ` Rik van Riel
@ 2000-06-22 19:48 ` Jamie Lokier
2000-06-22 19:52 ` Rik van Riel
1 sibling, 1 reply; 30+ messages in thread
From: Jamie Lokier @ 2000-06-22 19:48 UTC (permalink / raw)
To: frankeh; +Cc: linux-mm
frankeh@us.ibm.com wrote:
> Now I understand this much better. The RSS guarantee is a function of the
> refault-rate <clever>.
> This in principle implements a decay of the limit based on usage.... I like
> that approach.
Be careful with refault rate. If a process is unable to progress
because of memory pressure, it will have a low refault rate even though
it's _trying_ to fault in lots of pages at high speed.
-- Jamie
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 19:48 ` Jamie Lokier
@ 2000-06-22 19:52 ` Rik van Riel
2000-06-22 20:00 ` Jamie Lokier
0 siblings, 1 reply; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 19:52 UTC (permalink / raw)
To: Jamie Lokier; +Cc: frankeh, linux-mm
On Thu, 22 Jun 2000, Jamie Lokier wrote:
> frankeh@us.ibm.com wrote:
> > Now I understand this much better. The RSS guarantee is a function of the
> > refault-rate <clever>.
> > This in principle implements a decay of the limit based on usage.... I like
> > that approach.
>
> Be careful with refault rate. If a process is unable to
> progress because of memory pressure, it will have a low refault
> rate even though it's _trying_ to fault in lots of pages at high
> speed.
*nod*
We probably want to use fault rate and memory size too in
order to promote fairness.
All of this may sound complicated, but as long as we make
sure that the feedback cycles are short (and negative ;))
it should all work out...
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 19:52 ` Rik van Riel
@ 2000-06-22 20:00 ` Jamie Lokier
2000-06-22 20:07 ` Rik van Riel
0 siblings, 1 reply; 30+ messages in thread
From: Jamie Lokier @ 2000-06-22 20:00 UTC (permalink / raw)
To: Rik van Riel; +Cc: frankeh, linux-mm
Rik van Riel wrote:
> > Be careful with refault rate. If a process is unable to
> > progress because of memory pressure, it will have a low refault
> > rate even though it's _trying_ to fault in lots of pages at high
> > speed.
>
> We probably want to use fault rate and memory size too in
> order to promote fairness.
The number of global memory events between the process getting one page
and requesting the next may indicate how much page activity the
process is trying to do. (Relative to other memory users).
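A sketch of that bookkeeping (everything here is invented for
illustration): bump a global counter on every allocation or steal, and
remember where it stood at each mm's previous fault; a small gap means
the mm is faulting eagerly relative to everyone else.

static unsigned long global_memory_events;	/* bumped on every alloc/steal */

struct mm_fault_gap {
	unsigned long last_event;	/* global counter at the previous fault */
	unsigned long shortest_gap;	/* start at ~0UL; smaller == more eager */
};

static void note_fault(struct mm_fault_gap *f)
{
	unsigned long gap = global_memory_events - f->last_event;

	if (gap < f->shortest_gap)
		f->shortest_gap = gap;
	f->last_event = global_memory_events;
}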
> All of this may sound complicated, but as long as we make
> sure that the feedback cycles are short (and negative ;))
> it should all work out...
Keeping them negative is tricky :-)
-- Jamie
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
2000-06-22 20:00 ` Jamie Lokier
@ 2000-06-22 20:07 ` Rik van Riel
0 siblings, 0 replies; 30+ messages in thread
From: Rik van Riel @ 2000-06-22 20:07 UTC (permalink / raw)
To: Jamie Lokier; +Cc: frankeh, linux-mm
On Thu, 22 Jun 2000, Jamie Lokier wrote:
> Rik van Riel wrote:
> > > Be careful with refault rate. If a process is unable to
> > > progress because of memory pressure, it will have a low refault
> > > rate even though it's _trying_ to fault in lots of pages at high
> > > speed.
> >
> > We probably want to use fault rate and memory size too in
> > order to promote fairness.
>
> The number of global memory events between the process getting one page
> and requesting the next may indicate how much page activity the
> process is trying to do. (Relative to other memory users).
Oh, there are lots of possible things we could look at here.
The main thing to keep in mind is to always look at _ratios_
and not at pure magic numbers ...
> > All of this may sound complicated, but as long as we make
> > sure that the feedback cycles are short (and negative ;))
> > it should all work out...
>
> Keeping them negative is tricky :-)
Hehe, tell me all about it ;)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
* Re: [RFC] RSS guarantees and limits
@ 2000-06-22 23:02 Mark_H_Johnson
0 siblings, 0 replies; 30+ messages in thread
From: Mark_H_Johnson @ 2000-06-22 23:02 UTC (permalink / raw)
To: riel; +Cc: linux-mm, sct
Controls on resident set size have been one of the items I really want to
see established. I have some concerns about what is suggested here and have
a few suggestions. I prefer user-settable RSS limits that are enforced by
the kernel, with automated methods used when the user doesn't set any such
limits.
My situation is this. I'm looking at deploying a large "real time"
simulation on a cluster of Linux machines. The main application will be
locked in memory and must have predictable execution patterns. To aid in
development, we will have a number of workstations. I want to be able to
run the main application at "slower than real time" on those workstations -
using paging & swapping as needed.
[1] Our real time application(s) will lock lots [perhaps 1/2 to 3/4] of
physical memory.
- The RSS for our application must be at least large enough to cover
the "locked" memory plus some additional space for TBD purposes.
- The RSS for remaining processes must be "reasonable" - take into
consideration the locked memory as unavailable until released.
- The transition from lots of memory is "free" to lots of memory is
"locked" has to be managed in some way.
We know in advance what "reasonable" values are for RLIMIT_RSS & can
set them appropriately. I doubt an automatic system can do well in this
case.
[2] On the workstation, we want good performance from the program under
test.
- The RSS of our application must be large relative to the rest of the
system applications
- There needs to be some balance between our application and other
applications - to run gdb, X, and other tools used during test
This is a similar situation to the one above, where I really do want a "memory
hog" to use most of the system memory. I think user-settable RSS limits
would still be better than an automatic system.
Using the existing RSS limits would go a long way to enabling us to set the
system up and meet these diverse needs. At this time, I absolutely prefer
to initiate swapping of tasks to preserve the RSS of the application we're
delivering to our customer. On our development machines, some automatic
tuning would be OK, but I don't see how it will run "better" (as measured
by page fault rates) than with carefully selected values based on the
applications being run. If there's plenty of space available, I don't mind
automatic methods letting a process have more than its RSS limit [if
swapping isn't necessary]. If all [or most] of the processes have
"unlimited" for the RSS limit, doing something reasonable in an automated
way is fine as well. But if the user has specified RSS limits [via the
RLIMIT_RSS setting in setrlimit(2)], please abide by them. Thanks.
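For reference, a minimal sketch of a process requesting such a limit for
itself via setrlimit(2). The values are purely illustrative, and whether
the kernel actually enforces RLIMIT_RSS is exactly what this thread is
discussing:

        #include <stdio.h>
        #include <sys/resource.h>

        int main(void)
        {
                struct rlimit rl;

                rl.rlim_cur = 64 * 1024 * 1024;   /* soft limit: 64 MB resident */
                rl.rlim_max = 128 * 1024 * 1024;  /* hard limit: 128 MB resident */

                if (setrlimit(RLIMIT_RSS, &rl) != 0)
                        perror("setrlimit(RLIMIT_RSS)");

                if (getrlimit(RLIMIT_RSS, &rl) == 0)
                        printf("RSS limit: cur=%lu max=%lu\n",
                               (unsigned long)rl.rlim_cur,
                               (unsigned long)rl.rlim_max);
                return 0;
        }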
--Mark H Johnson
<mailto:Mark_H_Johnson@raytheon.com>
Rik van Riel <riel@conectiva.com.br> wrote on 06/21/00 05:29 PM
To: linux-mm@kvack.org
cc: "Stephen C. Tweedie" <sct@redhat.com> (bcc: Mark H Johnson/RTS/Raytheon/US)
Subject: [RFC] RSS guarantees and limits
[...]
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC] RSS guarantees and limits
@ 2000-06-23 14:01 frankeh
2000-06-23 17:56 ` Stephen Tweedie
0 siblings, 1 reply; 30+ messages in thread
From: frankeh @ 2000-06-23 14:01 UTC (permalink / raw)
To: linux-mm
How is shared memory accounted for?
Options are:
(a) Creator is charged
(b) Prorated per number of users
Do any other options come to mind?
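A trivial sketch of how the two options would differ in what gets charged
to a given user of a shared segment (illustrative only, not kernel code):

        /* (a) the creator is charged for the whole resident page */
        static unsigned long charge_creator(unsigned long page_size)
        {
                return page_size;
        }

        /* (b) the charge is prorated across everyone mapping the page */
        static unsigned long charge_prorated(unsigned long page_size,
                                             unsigned int nr_users)
        {
                return page_size / (nr_users ? nr_users : 1);
        }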
-- Hubertus Franke
IBM T.J.Watson Research Center
Rik van Riel <riel@conectiva.com.br>@kvack.org on 06/23/2000 10:15:46 AM
Sent by: owner-linux-mm@kvack.org
To: Ed Tomlinson <tomlins@cam.org>
cc: linux-mm@kvack.org
Subject: Re: [RFC] RSS guarantees and limits
On Thu, 22 Jun 2000, Ed Tomlinson wrote:
> Just wondering what will happen with java applications? These
> beasts typically have working sets of 16M or more and use 10-20
> threads. When using native threads linux sees each one as a
> process. They all share the same memory though.
Ahh, but these limits are of course applied per _MM_, not
per thread ;)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC] RSS guarantees and limits
2000-06-23 14:01 frankeh
@ 2000-06-23 17:56 ` Stephen Tweedie
0 siblings, 0 replies; 30+ messages in thread
From: Stephen Tweedie @ 2000-06-23 17:56 UTC (permalink / raw)
To: frankeh; +Cc: linux-mm
Hi,
On Fri, Jun 23, 2000 at 10:01:14AM -0400, frankeh@us.ibm.com wrote:
> How is shared memory accounted for?
Shared memory has nothing to do with RSSes --- the RSS is strictly
a per-process concept. If a process exhausts its RSS, then pages are
removed from that process's working set, but these pages are not
immediately evicted from physical memory. If the page is faulted back
in before being finally evicted from memory, then there is no disk IO
involved.
The advantage of the RSS limit here is that the pages which are
evicted from working set but not from memory are MUCH easier for the
VM to evict later if we run out of physical free pages. If we are
under memory pressure, then the RSS limit causes us to prefer to page
out the pages of processes that are above their RSS limit.
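A rough sketch of that preference; the struct fields and function below
are made up for illustration and are not the real VM code:

        struct mm_sketch {
                unsigned long rss;        /* current resident set size, in pages */
                unsigned long rss_limit;  /* soft ceiling for this mm */
        };

        /* Prefer stealing from address spaces that are over their RSS
         * limit.  Pages stolen this way stay in the LRU cache, so a
         * quick re-touch is a cheap refault rather than disk IO. */
        static int prefer_as_victim(const struct mm_sketch *mm, int memory_pressure)
        {
                if (mm->rss > mm->rss_limit)
                        return 1;               /* over its limit: steal from it first */
                return memory_pressure;         /* otherwise only when memory is tight */
        }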
--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] RSS guarantees and limits
@ 2000-06-23 18:07 frankeh
0 siblings, 0 replies; 30+ messages in thread
From: frankeh @ 2000-06-23 18:07 UTC (permalink / raw)
To: linux-mm
John, ...
I thought Rik actually takes care of that. He doesn't necessarily penalize
a process because it is big.
He penalizes a process if its working set size is substantially smaller
than its memory footprint.
His measure for that is the refault rate. If he takes away pages that are
shortly thereafter faulted back in, then, as he stated, he is being too
aggressive. Since refaulting is cheap while the page is still in the
cache, the overhead should be reasonably small.
-- Hubertus
"John Fremlin" <vii@penguinpowered.com>@kvack.org on 06/23/2000 11:52:59 AM
Sent by: owner-linux-mm@kvack.org
To: <linux-mm@kvack.org>
cc:
Subject: Re: [RFC] RSS guarantees and limits
Rik van Riel <riel@conectiva.com.br> writes:
[...]
> > I agree completely. It was one of the reasons I suggested that a
> > syscall like nice but giving info to the mm layer would be
> > useful. In general, small apps (xeyes,biff,gpm) don't deserve
> > any special treatment.
>
> Why not? In scheduling, processes which use less CPU get
> a better response time. Why not do the same for memory
> use? The less memory you use, the less aggressive we'll be
> in trying to take it away from you.
CPU != memory.
Quick reasons:
(1) A sleeping process still takes memory.
(2) Take away 10% of the CPU from a program and it runs at about 90% of
its former speed. Take away 10% of its memory and it might run at only
5-10% of its former speed because it has to wait for disk IO.
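A back-of-envelope calculation for point (2), with purely illustrative
numbers:

        #include <stdio.h>

        int main(void)
        {
                double mem_access_ns = 100.0;   /* in-RAM access */
                double disk_fault_ns = 10e6;    /* ~10 ms page fault from disk */
                double miss_fraction = 0.001;   /* 0.1% of accesses fault */

                double avg_ns = (1.0 - miss_fraction) * mem_access_ns
                              + miss_fraction * disk_fault_ns;

                printf("average access cost: %.0f ns (%.0fx slowdown)\n",
                       avg_ns, avg_ns / mem_access_ns);
                return 0;
        }

Even a fault rate of one access in a thousand pushes the average access
cost up by roughly two orders of magnitude, which is why losing a slice
of memory can hurt far more than losing the same slice of CPU.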
>
> Of course a small app should be removed from memory when
> it's sleeping, but there's no reason to not apply some
> degree of fairness in memory allocation and memory stealing.
[...]
You say you can't see why small processes like shells etc. shouldn't
be specially treated (your first paragraph). Folding the double negative,
you say there should be positive discrimination in favour of these
processes, i.e. a fairer distribution of memory (your second paragraph).
If you think I'm not qualified to disagree, reread what Matthew Dillon
said to you while discussing VM changes in May:
Well, I have a pretty strong opinion on trying to rationalize
penalizing big processes simply because they are big. It's a bad
idea for several reasons, not the least of which being that by
making such a rationalization you are assuming a particular system
topology -- you are assuming, for example, that the system may
contain a few large less-important processes and a reasonable
number of small processes. But if the system contains hundreds of
small processes or if some of the large processes turn out to be
important, the rationalization fails.
Also if the large process in question happens to really need the
pages (is accessing them all the time), trying to page those pages
out gratuitously does nothing but create a massive paging load on
the system. Unless you have a mechanism (such as FreeBSD has)
to impose a 20-second forced sleep under extreme memory loads, any
focus on large processes will simply result in thrashing (read:
screw up the system).
[...]
> > The only general solution I can see is to give some process
> > (groups) a higher MM priority, by analogy with nice.
>
> That you can't see anything better doesn't mean it isn't possible ;)
Indeed, I wait anxiously for someone to propose a better solution.
[...]
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2000-06-27 3:26 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-06-21 22:29 [RFC] RSS guarantees and limits Rik van Riel
2000-06-22 18:00 ` John Fremlin
2000-06-22 19:12 ` Rik van Riel
2000-06-22 21:19 ` Stephen Tweedie
2000-06-22 21:37 ` Rik van Riel
2000-06-22 22:48 ` John Fremlin
2000-06-22 23:59 ` Stephen Tweedie
2000-06-23 16:08 ` John Fremlin
2000-06-22 22:39 ` John Fremlin
2000-06-22 23:27 ` Rik van Riel
2000-06-23 0:49 ` Ed Tomlinson
2000-06-23 13:45 ` Rik van Riel
2000-06-23 15:36 ` volodya
2000-06-23 15:52 ` John Fremlin
2000-06-24 11:22 ` Andrey Savochkin
2000-06-27 3:26 ` Stephen C. Tweedie
2000-06-22 14:41 [RFC] " frankeh
2000-06-22 15:31 ` Rik van Riel
2000-06-22 15:49 frankeh
2000-06-22 16:05 ` Rik van Riel
2000-06-22 16:22 frankeh
2000-06-22 16:38 ` Rik van Riel
2000-06-22 19:48 ` Jamie Lokier
2000-06-22 19:52 ` Rik van Riel
2000-06-22 20:00 ` Jamie Lokier
2000-06-22 20:07 ` Rik van Riel
2000-06-22 23:02 Mark_H_Johnson
2000-06-23 14:01 frankeh
2000-06-23 17:56 ` Stephen Tweedie
2000-06-23 18:07 frankeh