From: Matt Dillon <dillon@earth.backplane.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: arch@freebsd.org, linux-mm@kvack.org, sfkaplan@cs.amherst.edu
Subject: Re: on load control / process swapping
Date: Sat, 12 May 2001 16:58:14 -0700 (PDT)
Message-ID: <200105122358.f4CNwEr20137@earth.backplane.com>
In-Reply-To: <Pine.LNX.4.21.0105121109210.5468-100000@imladris.rielhome.conectiva>
Consider the case where you have one large process and many small
processes. If you were to skew things to allow the large process to
run at the cost of all the small processes, you have just inconvenienced
98% of your users so one ozob can run a big job. Not only that, but
there is no guarantee that the 'big job' will ever finish (a topic of
many a paper on scheduling, BTW)... what if it's been running for hours
and still has hours to go? Do we blow away the rest of the system to
let it run?
What if there are several big jobs? If you skew things in favor of
one, the others could take 60 seconds *just* to recover their RSS when
they are finally allowed to run. So much for timesharing... you
would have to run each job exclusively for 5-10 minutes at a time
to get any sort of efficiency, which is not practical in a timeshare
system. So there is really very little that you can do.
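A back-of-the-envelope model of the argument above. It assumes, as the
text does, that a big job spends about 60 seconds of every exclusive
slice just faulting its RSS back in, and that the big jobs are run
round-robin. The function names are illustrative only.

```python
# Toy model: each big job gets an exclusive slice, but the first
# recover_s seconds of every slice go to paging its RSS back in.


def slice_efficiency(slice_s: float, recover_s: float) -> float:
    """Fraction of a slice spent doing useful work."""
    if slice_s <= recover_s:
        return 0.0
    return (slice_s - recover_s) / slice_s


def wait_between_turns(n_jobs: int, slice_s: float) -> float:
    """Seconds each job sits idle while the other n_jobs - 1 run."""
    return (n_jobs - 1) * slice_s


if __name__ == "__main__":
    for minutes in (1, 2, 5, 10):
        eff = slice_efficiency(minutes * 60, recover_s=60)
        wait = wait_between_turns(3, minutes * 60) / 60
        print(f"{minutes:>2} min slice: {eff:.0%} useful, "
              f"{wait:.0f} min wait with 3 big jobs")
```

With 60 seconds of recovery, a 5 minute slice is 80% useful and a 10
minute slice 90%, but with three big jobs each one then sits idle for
10 to 20 minutes between turns, which is the timesharing problem
described above.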
:Indeed, the speed limiting of the pageout scanning takes care of
:this. But still, having the swapout threshold defined as being
:short of inactive pages while the swapin threshold uses the number
:of free+cache pages as an indication could lead to the situation
:where you suspend and wake up processes while it isn't needed.
:
:Or worse, suspending one process which easily fit in memory and
:then waking up another process, which cannot be swapped in because
:the first process' memory is still sitting in RAM and cannot be
:removed yet due to the pageout scan speed limiting (and also cannot
:be used, because we suspended the process).
We don't suspend running processes, but I do believe FreeBSD is still
vulnerable to this issue. Suspending the marked process when it hits the
vm_fault code is a good idea and would solve the problem. If the process
never takes an allocation fault, it probably doesn't have to be swapped
out. The normal pageout would suffice for that process.
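A toy model of that idea: load control only *marks* a candidate, and
the process is actually suspended the next time it takes a fault that
needs a new page. `Proc`, `swap_marked` and `toy_vm_fault()` are
hypothetical stand-ins, not the FreeBSD structures or the real
vm_fault() interface.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    pid: int
    # Hypothetical flag set by load control when it picks this process
    # as a swap-out candidate. Setting it does not stop the process.
    swap_marked: bool = False
    suspended: bool = False
    resident: set = field(default_factory=set)


def toy_vm_fault(proc: Proc, page: int) -> bool:
    """Handle a fault that needs a new page for proc.

    Returns True if the page was allocated, False if the process was
    suspended instead of being given more memory.
    """
    if proc.swap_marked:
        # The marked process is asking for *more* memory: suspending it
        # here stops it from growing while the system is short.
        proc.suspended = True
        return False
    proc.resident.add(page)
    return True


if __name__ == "__main__":
    grower = Proc(pid=1, swap_marked=True)
    idler = Proc(pid=2, swap_marked=True, resident={10, 11})
    print(toy_vm_fault(grower, 42), grower.suspended)  # False True
    # idler never faults, so it is never suspended; the ordinary
    # pageout scan can still reclaim its pages if they go idle.
    print(idler.suspended)  # False
```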
:> The pagein and pageout rates have nothing to do with thrashing, per se,
:> and should never be arbitrarily limited.
:
:But they are, with the pageout daemon going to sleep for half a
:second if it doesn't succeed in freeing enough memory at once.
:It even does this if a large part of the memory on the active
:list belongs to a process which has just been suspended because
:of thrashing...
No. I did say the code was complex. A process which has been
suspended for thrashing gets all of its pages depressed in priority.
The page daemon would have no problem recovering the pages. See
line 1458 of vm_pageout.c. This code also enforces the 'memoryuse'
resource limit (which is perhaps even more important). It is not
necessary to try to launder the pages immediately. Simply depressing
their priority is sufficient and it allows for quicker recovery when
the thrashing goes away. It also allows us to implement the
vm.swap_idle_{threshold1,threshold2,enabled} sysctls trivially, which
results in proactive swapping that is extremely useful in certain
situations (like shell machines with lots of idle users).
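A sketch of "depress rather than launder", under simplified
assumptions: each page carries an activity counter and sits on an
active or inactive queue, and deactivating a page does no I/O. The
same routine covers a process suspended for thrashing (keep nothing
active) and a 'memoryuse' limit (keep only the limit). `Page`,
`depress_pages()` and `touch()` are illustrative names, not the code
in vm_pageout.c.

```python
from dataclasses import dataclass

ACTIVE, INACTIVE = "active", "inactive"


@dataclass
class Page:
    owner: int            # pid that maps the page
    act_count: int = 5    # illustrative activity counter
    queue: str = ACTIVE
    dirty: bool = False


def depress_pages(pages: list, pid: int, keep: int = 0) -> int:
    """Move all but `keep` of pid's active pages to the inactive queue.

    keep=0 models a process suspended for thrashing; keep=N models a
    'memoryuse' limit of N pages. Nothing is written out here: dirty
    pages stay dirty until the page daemon gets to them.
    Returns the number of pages deactivated.
    """
    mine = [p for p in pages if p.owner == pid and p.queue == ACTIVE]
    mine.sort(key=lambda p: p.act_count)  # least active first
    excess = max(len(mine) - keep, 0)
    for p in mine[:excess]:
        p.act_count = 0
        p.queue = INACTIVE
    return excess


def touch(page: Page) -> None:
    """A reference to a page that is still resident just reactivates
    it, so a process that resumes before reclamation needs no I/O."""
    page.queue = ACTIVE
    page.act_count += 1


if __name__ == "__main__":
    demo = [Page(owner=1) for _ in range(4)] + [Page(owner=2) for _ in range(3)]
    print(depress_pages(demo, pid=1))          # suspended: 4 deactivated
    print(depress_pages(demo, pid=2, keep=2))  # 2-page limit: 1 deactivated
```

Because the pages are only requeued, the page daemon can reclaim them
whenever it needs to, while a process that wakes up first simply
touches them back onto the active queue.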
The pagedaemon gets behind when there are too many
active pages in the system and the pagedaemon is unable to move them
to the inactive queue due to the pages still being very active... that is,
when the active resident set for all processes in the system exceeds
available memory. This is what triggers thrashing. Swapping has the
side effect of reducing the total active resident set for the system
as a whole, fixing the thrashing problem.
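The condition above can be written down directly: the system thrashes
when the combined active resident sets exceed available memory, and
taking whole processes out of that sum (swapping them out) is what
brings it back under. The greedy largest-first victim choice below is
purely an assumption for illustration; it is not how the FreeBSD
swapper selects processes.

```python
def is_thrashing(active_rss: dict, avail_pages: int) -> bool:
    """True when the active resident sets of all processes, in pages,
    add up to more than the memory available to hold them."""
    return sum(active_rss.values()) > avail_pages


def pick_swapouts(active_rss: dict, avail_pages: int) -> list:
    """Hypothetical policy: swap out the largest active sets until the
    remaining total fits. Returns the chosen pids, in order."""
    remaining = dict(active_rss)
    victims = []
    while remaining and is_thrashing(remaining, avail_pages):
        pid = max(remaining, key=remaining.get)
        victims.append(pid)
        del remaining[pid]
    return victims


if __name__ == "__main__":
    rss = {101: 600, 102: 300, 103: 200}   # pid -> active pages
    print(is_thrashing(rss, 800))          # True: 1100 > 800
    print(pick_swapouts(rss, 800))         # [101]: 500 <= 800
```

No paging rate appears in the test, which matches the point made
earlier that pagein/pageout rates are not by themselves a measure of
thrashing.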
-Matt
:> I don't think it's possible to write a nice neat thrash-handling
:> algorithm. It's a bunch of algorithms all working together, all
:> closely tied to the VM page cache. Each taken alone is fairly easy
:> to describe and understand. All of them together result in complex
:> interactions that are very easy to break if you make a mistake.
:
:Heheh, certainly true ;)
:
:cheers,
:
:Rik