From: Rik van Riel <riel@conectiva.com.br>
To: arch@freebsd.org
Cc: linux-mm@kvack.org, Matt Dillon <dillon@earth.backplane.com>,
sfkaplan@cs.amherst.edu
Subject: on load control / process swapping
Date: Mon, 7 May 2001 18:16:57 -0300 (BRST)
Message-ID: <Pine.LNX.4.21.0105061924160.582-100000@imladris.rielhome.conectiva>
Hi,
after staring at the code for a long long time, I finally
figured out exactly why FreeBSD's load control code (the
process swapping in vm_glue.c) cannot work in many
scenarios.
In short, the process suspension / wake up code only does
load control in the sense that system load is reduced, but
absolutely no effort is made to ensure that individual
programs can run without thrashing. This, of course, kind of
defeats the purpose of doing load control in the first place.
To see this situation in some more detail, let's first look
at how the current process suspension code has evolved over
time. Early paging Unixes, including earlier BSDs, had a
rate-limited clock algorithm for the pageout code, where
the VM subsystem would only scan (and page) memory out at
a rate of fastscan pages per second.
Whenever the paging system wasn't able to keep up, free
memory would get below a certain threshold and memory load
control (in the form of process suspension) kicked in. As
soon as free memory (averaged over a few seconds) got over
this threshold, processes got swapped in again. Because of
the fixed "speed limit" on the paging code, this would give
a slow rotation of memory-resident processes at a paging rate
well below the thrashing threshold.
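A minimal sketch of the classical scheme described above, in Python. The
threshold name MINFREE, the numeric values, and the exponential average
standing in for "averaged over a few seconds" are all hypothetical; only
the idea of a per-second fastscan budget plus a free-memory threshold comes
from the text.

```python
from dataclasses import dataclass

# Hypothetical tunables, for illustration only (not historical BSD values).
FASTSCAN = 200   # maximum pages the clock hand may scan per second
MINFREE = 64     # free-memory threshold that triggers load control


@dataclass
class VMState:
    free_pages: int
    avg_free: float           # free memory averaged over recent seconds
    scan_budget: int = FASTSCAN

    def tick(self) -> None:
        """Once per second: refill the scan budget, update the average."""
        self.scan_budget = FASTSCAN
        # Assumed smoothing: 3/4 old average, 1/4 current free memory.
        self.avg_free = 0.75 * self.avg_free + 0.25 * self.free_pages

    def pageout(self, wanted: int) -> int:
        """Free at most scan_budget pages: the pageout "speed limit"."""
        freed = min(wanted, self.scan_budget)
        self.scan_budget -= freed
        self.free_pages += freed   # assume every scanned page is freeable
        return freed

    def should_suspend(self) -> bool:
        """Classical load control: suspend while averaged memory is short."""
        return self.avg_free < MINFREE
```

Because pageout can never free more than FASTSCAN pages per second, a
workload that needs more than that keeps the average below MINFREE, and
processes stay suspended until the rotation catches up.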
More modern Unixes, like FreeBSD, NetBSD or Linux, however,
don't have the artificial speed limit on pageout. This means
the pageout code can go on freeing memory until well beyond
the thrashing point of the system. It also means that the
amount of free memory is no longer any indication of whether
the system is thrashing or not.
Add to that the fact that the classical load control in BSD
resumes a suspended task whenever the system is above the
(now not very meaningful) free memory threshold, regardless
of whether the resident tasks have had the opportunity to
make any progress ... which of course only encourages more
thrashing instead of letting the system work itself out of
the overload situation.
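A toy model of why this combination goes wrong, as a hedged Python sketch.
It assumes every task has the same working-set size (wss) and that pageout,
with no speed limit, can always steal pages from resident tasks until free
memory is back above the threshold; both functions are hypothetical.

```python
def classic_should_swapin(avg_free: float, minfree: int) -> bool:
    """Classical rule: resume a task whenever free memory exceeds minfree.

    Note what is missing: no check of whether resident tasks progressed.
    """
    return avg_free > minfree


def simulate_classic(total_pages: int, wss: int, nprocs: int,
                     minfree: int, ticks: int) -> tuple[int, bool]:
    """Toy model; returns (resident task count, memory overcommitted?)."""
    resident, suspended = 1, nprocs - 1
    for _ in range(ticks):
        # Without a pageout speed limit, pageout steals pages from resident
        # tasks until free memory is above the threshold again, however far
        # the combined working sets exceed physical memory.
        free = max(total_pages - resident * wss, minfree + 1)
        if suspended > 0 and classic_should_swapin(free, minfree):
            suspended -= 1
            resident += 1   # resumed whether or not anyone made progress
    overcommitted = resident * wss > total_pages - minfree
    return resident, overcommitted
```

For example, simulate_classic(1000, 300, 8, 64, 10) returns (8, True):
all eight tasks end up resident although only (1000 - 64) // 300 = 3
working sets fit, because the free-memory test never fails.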
Any solution will have to address the following points:
1) allow the resident processes to stay resident long
enough to make progress
2) make sure the resident processes aren't thrashing,
that is, don't let new processes back in memory if
none of the currently resident processes is "ready"
to be suspended
3) have a mechanism to detect thrashing in a VM
subsystem which isn't rate-limited (hard?)
and, for extra brownie points:
4) fairness, small processes can be paged in and out
faster, so we can suspend and resume them faster; this
has the side effect of leaving the proverbial root
shell more usable
5) make sure already resident processes cannot create
a situation that'll keep the swapped out tasks out
of memory forever ... but don't kill performance either,
since bad performance means we cannot get out of the
bad situation we're in
Points 1), 2) and 4) are relatively easy to address by simply
keeping resident tasks unswappable for a long enough time that
they've been able to do real work in an environment where
3) indicates we're not thrashing.
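One way points 1), 2) and 4) might fit together, as a minimal Python sketch.
The constants BASE_RESIDENCY and PER_PAGE, the linear size scaling, and the
Task/account/may_swap_in names are all assumptions for illustration; the
thrashing flag is taken as an input, since point 3) is still open.

```python
from dataclasses import dataclass

# Hypothetical tuning constants, chosen only for illustration.
BASE_RESIDENCY = 2.0   # seconds every task is protected once resident
PER_PAGE = 0.001       # extra protected seconds per resident page


@dataclass
class Task:
    name: str
    rss_pages: int
    useful_time: float = 0.0   # resident time accrued while not thrashing

    def min_residency(self) -> float:
        # Point 4: small tasks page in and out cheaply, so they get a
        # shorter protected window and can be rotated faster.
        return BASE_RESIDENCY + PER_PAGE * self.rss_pages

    def ready_to_suspend(self) -> bool:
        # Point 1: only time spent doing real work counts.
        return self.useful_time >= self.min_residency()


def account(resident: list[Task], dt: float, thrashing: bool) -> None:
    """Credit resident tasks with dt seconds, unless the system thrashed."""
    if not thrashing:
        for t in resident:
            t.useful_time += dt


def may_swap_in(resident: list[Task]) -> bool:
    # Point 2: admit a swapped-out task only if some resident task has
    # had its chance to run and can be suspended in exchange.
    return any(t.ready_to_suspend() for t in resident)
```

Time spent thrashing earns no credit, so an overloaded system stops
admitting tasks instead of admitting more.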
3) is the hard part. We know we're not thrashing when we don't
have ongoing page faults all the time, but (say) only 50% of the
time. However, I still have no idea how to determine when we _are_
thrashing, since a system which always has 10 ongoing page faults
may still be functioning without thrashing... This is the part
where I cannot hand a ready solution but where we have to figure
out a solution together.
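The half of the test that is known can be sketched, assuming some periodic
sampler reports the number of page faults in progress at each tick (the
sampling source and function names are hypothetical). In keeping with the
open question above, the sketch only concludes "not thrashing" and otherwise
answers "unknown".

```python
from enum import Enum


class FaultVerdict(Enum):
    NOT_THRASHING = "not thrashing"
    UNKNOWN = "unknown"   # the unsolved part of the problem


def fault_busy_fraction(samples: list[int]) -> float:
    """Fraction of samples with at least one page fault in progress."""
    if not samples:
        return 0.0
    return sum(1 for n in samples if n > 0) / len(samples)


def classify(samples: list[int], busy_limit: float = 0.5) -> FaultVerdict:
    # Faults outstanding only part of the time (say 50%): tasks are
    # running between faults, so the system is not thrashing.
    if fault_busy_fraction(samples) <= busy_limit:
        return FaultVerdict.NOT_THRASHING
    # Faults outstanding all the time is not proof of thrashing: a system
    # with 10 concurrent faults may still be making progress.
    return FaultVerdict.UNKNOWN
```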
(and it's also the reason I cannot "send a patch" ... I know the
current scheme cannot possibly work all the time, I understand why,
but I just don't have a solution to the problem ... yet)
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to aardvark@nl.linux.org (spam digging piggy)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/