Re: pressuring dirty pages (2.3.99-pre6)


From: Mark_H_Johnson.RTS@raytheon.com
To: "Eric W. Biederman" <ebiederman@uswest.net>
Cc: linux-mm@kvack.org, riel@nl.linux.org, sct@redhat.com
Subject: Re: pressuring dirty pages (2.3.99-pre6)
Date: Tue, 25 Apr 2000 09:27:57 -0500
Message-ID: <852568CC.004F0BB1.00@raylex-gh01.eo.ray.com>


Re: "RSS limits"

It would be great to have a dynamic max limit. However, I can see a lot of
complexity in doing so. May I make a few suggestions?
 - take a few moments to model the system operation under load. If the model
says RSS limits would help, by all means let's do it. If not, fix what we have.
If RSS limits are what we need, then
 - implement the RSS limit using the current mechanism [e.g., ulimit]
 - use a simple page removal algorithm to start with [e.g., "oldest page first"
or "address space order"]. The only caution I might add is to check that
the page you are removing isn't the one with the instruction you are executing
[else you page fault again on returning to the process].
 - get measurements under load to validate the model and determine whether the
solution is "good enough"
Then add the bells & whistles once the basic capability is proven.
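
As a starting point for the "model first" step, here is a minimal sketch of a
fixed RSS limit with "oldest page first" removal. It is a toy model, not kernel
code: the function name, the trace format, and the single fixed code_page
argument are all assumptions made for illustration.

```python
from collections import deque

def simulate_rss_limit(trace, rss_limit, code_page=None):
    """Count page faults for one process under a fixed RSS limit.

    trace     -- page numbers touched by the process, in order (hypothetical input)
    rss_limit -- maximum number of resident pages
    code_page -- page holding the executing instruction; it is never evicted
    """
    if rss_limit < 2:
        raise ValueError("this sketch needs rss_limit >= 2")
    resident = deque()              # oldest resident page sits at the left
    faults = 0
    for page in trace:
        if page in resident:
            continue                # hit: "oldest first" ignores accesses
        faults += 1
        if len(resident) >= rss_limit:
            if resident[0] == code_page:
                resident.rotate(-1) # skip the protected page, keep it resident
            resident.popleft()      # remove the oldest unprotected page
        resident.append(page)
    return faults

trace = [0, 1, 2, 0, 3, 0, 4]
print(simulate_rss_limit(trace, rss_limit=3))               # no protection
print(simulate_rss_limit(trace, rss_limit=3, code_page=0))  # page 0 protected
```

On this small trace, protecting the code page avoids one fault, which is the
effect the caution above is about. Real measurements under load would still be
needed to say whether that matters in practice.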

Yes, it would be nice to remove the "least recently used" page - however, for
many applications this is quite similar to "oldest page". If I remember correctly
from a DECUS meeting (a talk about VMS's virtual memory system), they saw perhaps
5-10% improvement using LRU, with a lot of extra overhead in the kernel. [You have
to remember that taking the "wrong page" out of the process will result in a low
cost page fault - that page didn't actually go into the swap area.]
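
The bracketed point can be illustrated with a toy comparison of "oldest page
first" against LRU. In this sketch, pages removed from the process go to a
small standby list that is still in memory; faulting one back in counts as a
cheap "soft" fault, and only pages that have left the standby list cost a
"hard" fault. The standby list, its size, and all names here are assumptions
for illustration, not a description of any particular kernel.

```python
from collections import OrderedDict

def count_faults(trace, rss_limit, standby_size, policy="fifo"):
    """Return (soft_faults, hard_faults) for a page reference trace."""
    resident = OrderedDict()    # first key is the next eviction candidate
    standby = OrderedDict()     # recently removed pages, still in memory
    soft = hard = 0
    for page in trace:
        if page in resident:
            if policy == "lru":
                resident.move_to_end(page)  # LRU pays this cost on every hit
            continue
        if page in standby:
            del standby[page]
            soft += 1           # low cost: page never went to the swap area
        else:
            hard += 1           # high cost: page must come from disk
        if len(resident) >= rss_limit:
            victim, _ = resident.popitem(last=False)
            standby[victim] = True
            if len(standby) > standby_size:
                standby.popitem(last=False)  # oldest standby page leaves memory
        resident[page] = True
    return soft, hard

trace = [1, 2, 3, 1, 4, 1, 5, 1, 2]
print("fifo:", count_faults(trace, 3, 2, "fifo"))
print("lru: ", count_faults(trace, 3, 2, "lru"))
```

On this trace both policies take the same number of hard faults; "oldest page
first" only adds one extra soft fault. A different trace can of course give a
different answer, which is why a model or measurements are needed.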

Yes, a dynamic max limit would be good. But even with a highly dynamic load on
the system [cycles of a burst of activity, then a quiet period], small RSS
sizes may also be "good enough". You can't tell without a model of system
performance or real measurements.

If we get to the point of implementing a dynamic RSS limit, let's make sure it
gets done with the right information and at the "right time". I suggest it not
be done at page fault time - give it to a process like kswapd where you can
review page fault rates and memory sizes and make a global adjustment.
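
A rough sketch of what such a periodic, global adjustment could look like. The
policy shown (a guaranteed minimum per process, with the remaining pages shared
in proportion to recent fault counts) is purely an assumption for illustration;
the function and field names are hypothetical.

```python
def rebalance_rss_limits(fault_counts, total_pages, min_limit):
    """Compute new per-process RSS limits in one global pass.

    fault_counts -- dict: process name -> page faults since the last pass
    total_pages  -- pages available to share among all processes
    min_limit    -- guaranteed minimum RSS limit per process
    """
    spare = total_pages - min_limit * len(fault_counts)
    if spare < 0:
        raise ValueError("not enough memory for the minimum limits")
    total_faults = sum(fault_counts.values())
    limits = {}
    for name, faults in fault_counts.items():
        if total_faults == 0:
            share = spare // len(fault_counts)      # quiet period: split evenly
        else:
            share = spare * faults // total_faults  # busy processes get more
        limits[name] = min_limit + share
    return limits

# Would be called from a periodic background pass, not from the fault path.
print(rebalance_rss_limits({"a": 30, "b": 10}, total_pages=1000, min_limit=100))
```

Because the pass sees every process at once, it can trade memory between them,
which a decision made inside a single process's page fault cannot do.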
--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>


From:    ebiederman@uswest.net (Eric W. Biederman)
Date:    04/25/00 08:58 AM
To:      riel@nl.linux.org
cc:      "Stephen C. Tweedie" <sct@redhat.com>, linux-mm@kvack.org,
         (bcc: Mark H Johnson/RTS/Raytheon/US)
Subject: Re: pressuring dirty pages (2.3.99-pre6)



Rik van Riel <riel@conectiva.com.br> writes:

> On Mon, 24 Apr 2000, Stephen C. Tweedie wrote:
> > On Mon, Apr 24, 2000 at 04:54:38PM -0300, Rik van Riel wrote:
> > >
> > > I've been trying to fix the VM balance for a week or so now,
> > > and things are mostly fixed except for one situation.
> > >
> > > If there is a *heavy* write going on and the data is in the
> > > page cache only .. ie. no buffer heads available, then the
> > > page cache will grow almost without bounds and kswapd and
> > > the rest of the system will basically spin in shrink_mmap()...
> >
> > shrink_mmap is the problem then -- it should be giving up sooner
> > and letting try_to_swap_out() deal with the pages.  mmap()ed
> > dirty pages can only be freed through swapper activity, not via
> > shrink_mmap().
>
> That will not work. The problem isn't that kswapd eats cpu,
> but the problem is that the dirty pages completely dominate
> physical memory.
>
> I've tried the "giving up earlier" option in shrink_mmap(),
> but that leads to memory filling up just as badly and giving
> us the same kind of trouble.
>
> I guess what we want is the kind of callback that we do in
> the direction of the buffer cache, using something like the
> bdflush wakeup call done in try_to_free_buffers() ...
>
> Maybe a "special" return value from shrink_mmap() telling
> do_try_to_free_pages() to run swap_out() unconditionally
> after this successful shrink_mmap() call?  Maybe even with
> severity levels?
>
> Eg. more calls to swap_out() if we encountered a lot of
> dirty pages in shrink_mmap() ???

I suspect the simplest thing we could do would be to actually implement
an RSS limit per struct mm.  Roughly: in handle_pte_fault, if the page isn't
present and we are at our RSS limit, call swap_out_mm until we are
below the limit.

This won't hurt much in the uncontended case, because the page
cache will still keep everything anyway; some dirty pages
will just get buffer_heads, and bdflush might clean those pages.

In the contended case, it removes some of the burden from swap_out,
and it should give shrink_mmap some pages to work with...

How we can approach the ideal of dynamically managed max RSS
sizes is another question...

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/




