Re: Load control (was: Re: 2.4.9-ac16 good perfomer?)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: Load control  (was: Re: 2.4.9-ac16 good perfomer?)
@ 2001-10-01 14:49 Jesse Pollard
  2001-10-01 15:51 ` Daniel Phillips
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse Pollard @ 2001-10-01 14:49 UTC (permalink / raw)
  To: riel, Daniel Phillips; +Cc: Mike Fedyk, linux-kernel, linux-mm

Rik van Riel <riel@conectiva.com.br>:
> On Mon, 1 Oct 2001, Daniel Phillips wrote:
>
> > Nice.  With this under control, another feature of his memory manager
> > you could look at is the variable deactivation threshold, which makes
> > a whole lot more sense now that the aging is linear.
>
> Actually, when we get to the point where deactivating enough
> pages is hard, we know the working set is large and we should
> be _more careful_ in chosing what to page out...
>
> When we go one step further, where the working set approaches
> the size of physical memory, we should probably start doing
> load control FreeBSD-style ... pick a process and deactivate
> as many of its pages as possible. By introducing unfairness
> like this we'll be sure that only one or two processes will
> slow down on the next VM load spike, instead of all processes.
>
> Once we reach permanent heavy overload, we should start doing
> process scheduling, restricting the active processes to a
> subset of all processes in such a way that the active processes
> are able to make progress. After a while, give other processes
> their chance to run.

Just a comment:
This begins to sound like the old VMS handling:
1. When not loaded down, all processes allocate freely.
2. When getting tight, trim all processes down some amount, until enough is
   free (balanced by page fault rate measure - process with the lowest fault
   rate gets trimmed first).
3. Continue triming until required space available or all processes are at
   their working set minimum.
4. if still tight, swap a process completely (determined by length of time
   since last IO wait - larger CPU bound jobs/processes got swaped first),
   reclaim memory. Note, at this point OOM may occur.
5. If swap full, do not start new processes (ENOMEM)
6. When a process exits, reclaim memory - if working set minimum available
   then swapin a process.

I also vaguely remember something about processes spawning new processes -
if memory wasn't immediately available (working set minimum for the new
process) then the process attempting the spawn is put to sleep (or swapped,
or both - this may have only occured if there was room in swap for the
process, if not - ENOMEM on the fork, in case that causes the parent to
exit and free more memory).

The trimming action did not immediately cause a pageout - all that was
needed was to reduce the working set size. The process that needed memory
would then cause the system to scan memory for pages that could be freed.
The first process examined (may have been the process asking for memory)
would have the excess pages paged out. (I believe they were chosen by a
LRU mechanism)

There was also a scheduling fairness rule about swapped processes geting
a schedule increment of 1, in memory processes got incremented 4, IO wait
processes got +6. When they were selected for run: if previous state was IO,
then decrement by 2, if state run, decrement by 2. If a swapped process
schedule value > in memory process, swap the memory resident process out,
swapin  the swaped process. (Oviously this isn't quite right :-)

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Load control  (was: Re: 2.4.9-ac16 good perfomer?)
  2001-10-01 14:49 Load control (was: Re: 2.4.9-ac16 good perfomer?) Jesse Pollard
@ 2001-10-01 15:51 ` Daniel Phillips
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Phillips @ 2001-10-01 15:51 UTC (permalink / raw)
  To: Jesse Pollard, riel; +Cc: Mike Fedyk, linux-kernel, linux-mm

On October 1, 2001 04:49 pm, Jesse Pollard wrote:
> 5. If swap full, do not start new processes (ENOMEM)

I was going to pounce on this one, but then I read the rest of your post...

> I also vaguely remember something about processes spawning new processes -
> if memory wasn't immediately available (working set minimum for the new
> process) then the process attempting the spawn is put to sleep (or swapped,
> or both - this may have only occured if there was room in swap for the
> process, if not - ENOMEM on the fork, in case that causes the parent to
> exit and free more memory).

Yes, here it should degrade gracefully as well.  Child-spawning tasks should
should be made to wait an increasingly long time as pressure increases before
they start seeing a lot of ENOMEM's.  Also, such penalties must be carefully
targetted so as not to prevent, for example, a new root login for
administrative purposes.  Under tight memory conditions we would want to
target any task that spawns children rapidly, which would constitute a sane
form of fork bomb control: its ok to spawn many tasks rapidly a long as
memory is lightly loaded.

Another weapon we can add to our arsenal is the possibility of suspending
tasks to non-swap storage, which would effectively add a second level of swap
space as large as all the free space on your disk.  Equivalently but perhaps
more usefully, we could allow swap files to grow dynamically.

Implementing such complex policy seems a distant goal considering that we are
still far from even being able to make an accurate OOM determination.
However, I have a suggestion.  Such policy is exactly that, policy, and as
such should be implemented outside the kernel.  We just need to expose the
relevant statistics and vm/scheduler control hooks, taking care that the task
responsible for scheduling policy never becomes its own victim.  This is a
much smaller and more clearly defined task than actually implementing the
task control policy.

> The trimming action did not immediately cause a pageout - all that was
> needed was to reduce the working set size. The process that needed memory
> would then cause the system to scan memory for pages that could be freed.
> The first process examined (may have been the process asking for memory)
> would have the excess pages paged out. (I believe they were chosen by a
> LRU mechanism)
>
> There was also a scheduling fairness rule about swapped processes geting
> a schedule increment of 1, in memory processes got incremented 4, IO wait
> processes got +6. When they were selected for run: if previous state was IO,
> then decrement by 2, if state run, decrement by 2. If a swapped process
> schedule value > in memory process, swap the memory resident process out,
> swapin  the swaped process. (Oviously this isn't quite right :-)

Wouldn't you love to be able to tweak this policy from user space, in a
language of your choice, on a running system? ;-)

--
Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Load control  (was: Re: 2.4.9-ac16 good perfomer?)
  2001-10-01 13:57 ` Rik van Riel
@ 2001-10-01 16:05   ` Daniel Phillips
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Phillips @ 2001-10-01 16:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Mike Fedyk, linux-kernel, linux-mm

On October 1, 2001 03:57 pm, Rik van Riel wrote:
> On Mon, 1 Oct 2001, Daniel Phillips wrote:
>
> > Nice.  With this under control, another feature of his memory manager
> > you could look at is the variable deactivation threshold, which makes
> > a whole lot more sense now that the aging is linear.
>
> Actually, when we get to the point where deactivating enough
> pages is hard, we know the working set is large and we should
> be _more careful_ in chosing what to page out...

Naturally.  However, this is orthogonal.  Consider the case where you've hit
the wall and the inactive list has suffered sudden depletion.  At this point
you have to deactivate a large number of pages and you will have few or no
intervening age-up events (because you hit the wall and nobody's moving).
It's a useless waste of CPU and real time to cycle through the active list 5
times to deactivate enough pages.  You should cycle through at most twice,
once to age up any pages with Ref set and the second time to deactivate the
required number of pages according to a threshold you estimated on the first
pass.

This is just the first common example that came to mind where a variable
deactivation threshold is obviously desirable, I'm sure there are others.

> When we go one step further, where the working set approaches
> the size of physical memory, we should probably start doing
> load control FreeBSD-style ... pick a process and deactivate
> as many of its pages as possible. By introducing unfairness
> like this we'll be sure that only one or two processes will
> slow down on the next VM load spike, instead of all processes.
>
> Once we reach permanent heavy overload, we should start doing
> process scheduling, restricting the active processes to a
> subset of all processes in such a way that the active processes
> are able to make progress. After a while, give other processes
> their chance to run.

No question about the need for higher level process control, but the low
level machinery could still be improved, don't you think?

--
Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Load control  (was: Re: 2.4.9-ac16 good perfomer?)
       [not found] <20011001111435Z16281-2757+2605@humbolt.nl.linux.org>
@ 2001-10-01 13:57 ` Rik van Riel
  2001-10-01 16:05   ` Daniel Phillips
  0 siblings, 1 reply; 4+ messages in thread
From: Rik van Riel @ 2001-10-01 13:57 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Mike Fedyk, linux-kernel, linux-mm

On Mon, 1 Oct 2001, Daniel Phillips wrote:

> Nice.  With this under control, another feature of his memory manager
> you could look at is the variable deactivation threshold, which makes
> a whole lot more sense now that the aging is linear.

Actually, when we get to the point where deactivating enough
pages is hard, we know the working set is large and we should
be _more careful_ in chosing what to page out...

When we go one step further, where the working set approaches
the size of physical memory, we should probably start doing
load control FreeBSD-style ... pick a process and deactivate
as many of its pages as possible. By introducing unfairness
like this we'll be sure that only one or two processes will
slow down on the next VM load spike, instead of all processes.

Once we reach permanent heavy overload, we should start doing
process scheduling, restricting the active processes to a
subset of all processes in such a way that the active processes
are able to make progress. After a while, give other processes
their chance to run.

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2001-10-01 16:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-01 14:49 Load control (was: Re: 2.4.9-ac16 good perfomer?) Jesse Pollard
2001-10-01 15:51 ` Daniel Phillips
     [not found] <20011001111435Z16281-2757+2605@humbolt.nl.linux.org>
2001-10-01 13:57 ` Rik van Riel
2001-10-01 16:05   ` Daniel Phillips

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox