From: Terry Lambert <tlambert2@mindspring.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: Matt Dillon <dillon@earth.backplane.com>,
arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu
Subject: Re: on load control / process swapping
Date: Mon, 14 May 2001 23:38:07 -0700
Message-ID: <3B00CECF.9A3DEEFA@mindspring.com>
In-Reply-To: <Pine.LNX.4.21.0105131417550.5468-100000@imladris.rielhome.conectiva>
Rik van Riel wrote:
> So we should not allow just one single large job to take all
> of memory, but we should allow some small jobs in memory too.
Historically, this problem is solved with a "working set
quota".
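
As a minimal sketch of the idea (all names and numbers here are
illustrative, not taken from any real kernel): each process carries
a quota on its resident set, and a fault only grows the resident
set while the process is under quota; at or over quota, the process
must give up one of its own pages first, so it cannot push anyone
else out of memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process accounting; field names are illustrative. */
struct ws_state {
	size_t resident_pages;	/* pages currently resident */
	size_t ws_quota;	/* working set quota, in pages */
};

/* On a page fault: under quota, grow the resident set from the free
 * list; at or over quota, return false so the caller evicts a page
 * from this process's own LRU instead. */
static bool fault_in_page(struct ws_state *p)
{
	if (p->resident_pages < p->ws_quota) {
		p->resident_pages++;	/* take a free page */
		return true;		/* grew the working set */
	}
	return false;			/* steal from own pages */
}
```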
> If you don't do this very slow swapping, NONE of the big tasks
> will have the opportunity to make decent progress and the system
> will never get out of thrashing.
>
> If we simply make the "swap time slices" for larger processes
> larger than for smaller processes we:
>
> 1) have a better chance of the large jobs getting any work done
> 2) won't have the large jobs artificially increase memory load,
> because all time will be spent removing each other's RSS
> 3) can have more small jobs in memory at once, due to 2)
> 4) can be better for interactive performance due to 3)
> 5) have a better chance of getting out of the overload situation
> sooner
>
> I realise this would make the scheduling algorithm slightly
> more complex and I'm not convinced doing this would be worth
> it myself, but we may want to do some brainstorming over this ;)
A per-vnode working set quota, adjusted by the vnode's use
count, would resolve most load-thrashing issues. Programs
with large working sets can either be granted a case-by-case
exception (via an rlimit) or, more likely, just have their
pages thrashed out more often.
You only ever need to do this when you have exhausted
memory to the point that you are swapping, and then only
when you want to reap cached clean pages; when all you have
left is dirty pages in memory and swap, you are well and
truly thrashing -- for the right reason: your system load
is too high.
It's also relatively easy to implement something like a
per-vnode working set quota, which can be self-enforced,
without making the scheduler so ugly that you could never
do things like per-CPU run queues for a very efficient SMP
implementation -- one that deals with the cache locality
issue naturally and easily, merely by setting migration
policies for moving from one run queue to another, and by
giving threads in a thread group negative affinity for
each other's CPUs, to maximize real concurrency.
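
The negative-affinity placement can be sketched in a few
lines; the structure and the fixed CPU count are assumptions
for illustration, not any real scheduler's interface:

```c
#include <assert.h>

#define NCPU 4	/* illustrative fixed CPU count */

/* How many members of one thread group are running on each CPU. */
struct thread_group {
	int on_cpu[NCPU];
};

/* "Negative affinity": place a new thread of the group on the CPU
 * running the fewest of its siblings, maximizing real concurrency. */
static int pick_cpu(const struct thread_group *g)
{
	int best = 0;
	for (int c = 1; c < NCPU; c++)
		if (g->on_cpu[c] < g->on_cpu[best])
			best = c;
	return best;
}
```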
Pseudo-code:
IF THRASH_CONDITIONS
    IF (COPY_ON_WRITE_FAULT OR
        PAGE_FILL_OF_SBRKED_PAGE_FAULT)
        IF VNODE_OVER_WORKING_SET_QUOTA
            STEAL_PAGE_FROM_VNODE_LRU
        ELSE
            GET_PAGE_FROM_SYSTEM
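
The same decision, rendered as a C sketch (all identifiers
here are made up for illustration; nothing below is an
existing FreeBSD or Linux interface):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_kind { COW_FAULT, SBRK_FILL_FAULT, OTHER_FAULT };

struct vnode_ws {
	size_t resident_pages;	/* pages this vnode has resident */
	size_t ws_quota;	/* this vnode's working set quota */
};

/* True when the fault must be satisfied by stealing a page from the
 * faulting vnode's own LRU rather than from the system free list;
 * the check only ever fires under thrash conditions. */
static bool steal_from_vnode_lru(bool thrashing, enum fault_kind kind,
				 const struct vnode_ws *v)
{
	if (!thrashing)
		return false;
	if (kind != COW_FAULT && kind != SBRK_FILL_FAULT)
		return false;
	return v->resident_pages >= v->ws_quota;
}
```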
Obviously, this would work for vnodes that were acting as
backing store for programs, just as it would prevent a
large mmap() with a traversal from thrashing everyone else's
data and code out of core (which is, I think, a much worse
and much more common problem).
Doing extremely complicated things is only going to get
you into trouble... in particular, you don't want to
have policy in effect to deal with borderline load
conditions unless you are under those conditions in the
first place. The current scheduling algorithms are quite
simple, relatively speaking, and it makes much more sense
to make the thrashers fight with themselves, rather than
letting them pee in everyone's pool.
I think that badly written programs taking more time, as
a result, is not a problem; if it is, it's one I could
live with much more easily than cache-busting for no good
reason, and slowing well behaved code down. You need to
penalize the culprit.
It's possible to do a more complicated working set quota
that actually applies to a process's working set, instead
of to vnodes, out of context with the process; but I think
the vnode approach -- particularly when you bump the
working set quota up for each additional opener, using the
count I suggested, to ensure proper locality of reference --
is good enough to solve the problem.
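
The per-opener bump amounts to scaling the vnode's quota by
its use count, so heavily shared vnodes (shared libraries,
common binaries) keep proportionally more pages resident.
A sketch, with made-up names and a made-up base quota:

```c
#include <assert.h>
#include <stddef.h>

/* Quota grows with each additional opener; a vnode with no current
 * openers still gets the base quota. Values are illustrative. */
static size_t vnode_ws_quota(size_t base_pages, unsigned use_count)
{
	if (use_count < 1)
		use_count = 1;
	return base_pages * use_count;
}
```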
At the very least, the system would not appear to "freeze"
with this approach, even in situations from which it could
eventually recover on its own.
-- Terry
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/