linux-mm.kvack.org archive mirror
* pressuring dirty pages (2.3.99-pre6)
@ 2000-04-24 19:54 Rik van Riel
  2000-04-24 21:27 ` Stephen C. Tweedie
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2000-04-24 19:54 UTC (permalink / raw)
  To: linux-mm

[-- Attachment #1: Type: TEXT/PLAIN, Size: 934 bytes --]

Hi,

I've been trying to fix the VM balance for a week or so now,
and things are mostly sorted out except for one situation.

If there is a *heavy* write going on and the data is in the
page cache only ... i.e. no buffer heads attached, then the
page cache will grow almost without bound, and kswapd and
the rest of the system will basically spin in shrink_mmap()...

What mechanism do we use to flush back dirty pages from,
e.g., mmap()s?  How could I push those pages to disk the way
we do with buffers (by waking up bdflush)?
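
For reference, the buffer-cache pattern meant here looks roughly like
the following. This is a sketch only, assuming the 2.3-era
wakeup_bdflush() interface, not a verbatim quote of fs/buffer.c:

        /*
         * When try_to_free_buffers() runs into dirty buffers it
         * cannot free, it kicks bdflush to write them back
         * asynchronously instead of spinning on them.
         */
        if (buffer_dirty(bh))
                wakeup_bdflush(0);  /* 0: kick, don't wait for the flush */

The question above is whether any analogous kick exists for dirty
page-cache-only pages.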

(yes, this is a big bug; please try the attached program by
Juan Quintela and set the #defines as needed ... it'll make it
painfully clear that this bug exists and should be fixed)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

[-- Attachment #2: qmtest.c --]
[-- Type: TEXT/PLAIN, Size: 1153 bytes --]

/*
 * Memory tester by Quintela.
 */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define FILENAME "/tmp/testing_file"
/* Set this to roughly twice your physical memory, or less. */
#define SIZE     (128 * 1024 * 1024)


void error_string(char *msg)
{
        perror(msg);
        exit(EXIT_FAILURE);
}

int main(int argc, char *argv[])
{
        char *array;
        size_t i;
        int fd = open(FILENAME, O_RDWR | O_CREAT, 0666);

        if (fd == -1)
                error_string("Problems opening the file");

        /* Extend the file so the whole mapping is backed by it. */
        if (lseek(fd, SIZE, SEEK_SET) != SIZE)
                error_string("Problems doing the lseek");

        if (write(fd, "\0", 1) != 1)
                error_string("Problems writing");

        array = mmap(0, SIZE, PROT_WRITE, MAP_SHARED, fd, 0);
        if (array == MAP_FAILED)
                error_string("The mmap has failed");

        /* Dirty every page of the shared mapping. */
        for (i = 0; i < SIZE; i++)
                array[i] = i;

        /* Force writeback of all the dirty pages at once. */
        if (msync(array, SIZE, MS_SYNC) == -1)
                error_string("The msync has failed");

        munmap(array, SIZE);
        close(fd);
        exit(EXIT_SUCCESS);
}
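
A typical invocation, for anyone wanting to reproduce this (illustrative
commands, assuming gcc; adjust SIZE for the machine under test first):

        $ gcc -O2 -o qmtest qmtest.c
        $ ./qmtest

While it runs, kswapd and the growth of the page cache can be watched
from another terminal with top or vmstat.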

* Re: pressuring dirty pages (2.3.99-pre6)
@ 2000-04-25 14:27 Mark_H_Johnson.RTS
  2000-04-25 16:30 ` Stephen C. Tweedie
  0 siblings, 1 reply; 12+ messages in thread
From: Mark_H_Johnson.RTS @ 2000-04-25 14:27 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-mm, riel, sct


Re: "RSS limits"

It would be great to have a dynamic max limit. However, I can see a lot of
complexity in doing so. May I make a few suggestions:
 - Take a few moments to model the system operation under load. If the model
says RSS limits would help, by all means let's do it. If not, fix what we have.
If RSS limits are what we need, then:
 - Implement the RSS limit using the current mechanism [e.g., ulimit; see the
userspace sketch after this list].
 - Use a simple page removal algorithm to start with [e.g., "oldest page first"
or "address space order"]. The only caution I might add is to check that the
page you are removing isn't the one with the instruction you are executing
[else you page fault again on returning to the process].
 - Get measurements under load to validate the model and determine whether the
solution is "good enough".
Then add the bells & whistles once the basic capability is proven.
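
For concreteness, here is what the "current mechanism" looks like from
userspace. This is a minimal sketch; note that Linux accepts RLIMIT_RSS
but does not actually enforce it, which is exactly what the proposal
above would change:

        #include <sys/resource.h>
        #include <stdio.h>

        int main(void)
        {
                struct rlimit rl;

                /* Ask for a 32 MB resident set limit for this process
                 * and its children.  Accepted by the kernel, but not
                 * enforced by the VM at the time of this discussion. */
                rl.rlim_cur = rl.rlim_max = 32 * 1024 * 1024;
                if (setrlimit(RLIMIT_RSS, &rl) == -1)
                        perror("setrlimit(RLIMIT_RSS)");
                return 0;
        }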

Yes, it would be nice to remove the "least recently used" page - however, for
many applications this is quite similar to "oldest page". If I remember right
from a DECUS meeting (a talk about VMS's virtual memory system), they saw
perhaps a 5-10% improvement using LRU, with a lot of extra overhead in the
kernel. [You have to remember that taking the "wrong" page out of the process
results in a low-cost page fault - that page didn't actually go into the swap
area.]

Yes, a dynamic max limit would be good. But even with a highly dynamic
load on the system [cycles of a burst of activity, then a quiet
period], small RSS sizes may still be "good enough". You can't tell
without a model of system performance or real measurements.

If we get to the point of implementing a dynamic RSS limit, let's make sure it
gets done with the right information and at the "right time". I suggest it not
be done at page fault time; give the job to a process like kswapd, where you
can review page fault rates and memory sizes and make a global adjustment
(roughly as sketched below).
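
Something along these lines, purely as illustration; every name in this
sketch (for_each_mm, rss_limit, fault_rate, the pressure test) is
invented, none of it exists in the kernel:

        /* Hypothetical periodic pass, run from something like kswapd. */
        void adjust_rss_limits(void)
        {
                struct mm_struct *mm;

                for_each_mm(mm) {
                        if (global_memory_pressure_high())
                                /* global pressure: shrink everyone */
                                mm->rss_limit -= mm->rss_limit / 8;
                        else if (mm->fault_rate > FAULT_RATE_HIGH)
                                /* faulting hard: let it grow */
                                mm->rss_limit += mm->rss_limit / 8;
                }
        }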
--Mark H Johnson
  <mailto:Mark_H_Johnson@raytheon.com>


From: ebiederman@uswest.net (Eric W. Biederman), 04/25/00 08:58 AM
To: riel@nl.linux.org
Cc: "Stephen C. Tweedie" <sct@redhat.com>, linux-mm@kvack.org
Bcc: Mark H Johnson/RTS/Raytheon/US
Subject: Re: pressuring dirty pages (2.3.99-pre6)

Rik van Riel <riel@conectiva.com.br> writes:

> On Mon, 24 Apr 2000, Stephen C. Tweedie wrote:
> > On Mon, Apr 24, 2000 at 04:54:38PM -0300, Rik van Riel wrote:
> > >
> > > I've been trying to fix the VM balance for a week or so now,
> > > and things are mostly fixed except for one situation.
> > >
> > > If there is a *heavy* write going on and the data is in the
> > > page cache only .. ie. no buffer heads available, then the
> > > page cache will grow almost without bounds and kswapd and
> > > the rest of the system will basically spin in shrink_mmap()...
> >
> > shrink_mmap is the problem then -- it should be giving up sooner
> > and letting try_to_swap_out() deal with the pages.  mmap()ed
> > dirty pages can only be freed through swapper activity, not via
> > shrink_mmap().
>
> That will not work. The problem isn't that kswapd eats CPU;
> it's that the dirty pages completely dominate physical
> memory.
>
> I've tried the "giving up earlier" option in shrink_mmap(),
> but that leads to memory filling up just as badly, giving
> us the same kind of trouble.
>
> I guess what we want is the kind of callback that we do in
> the direction of the buffer cache, using something like the
> bdflush wakeup call done in try_to_free_buffers() ...
>
> Maybe a "special" return value from shrink_mmap() telling
> do_try_to_free_pages() to run swap_out() unconditionally
> after this successful shrink_mmap() call?  Maybe even with
> severity levels?
>
> E.g. more calls to swap_out() if we encountered a lot of
> dirty pages in shrink_mmap()?

I suspect the simplest thing we could do would be to actually implement
an RSS limit per struct mm.  Roughly: in handle_pte_fault(), if the page
isn't present and we are at our RSS limit, call swap_out_mm() until we
are below the limit (see the sketch below).
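
A minimal sketch of that idea.  The assumptions: rss_limit is an
invented field on struct mm_struct, and the swap_out_mm() name and
signature are taken on faith from the 2.3.99-era mm/vmscan.c (where it
is static, so it would have to be made callable from the fault path):

        /* Called from handle_pte_fault() before faulting in a new
         * page.  mm->rss is real; mm->rss_limit is hypothetical. */
        static inline void enforce_rss_limit(struct mm_struct *mm)
        {
                while (mm->rss > mm->rss_limit) {
                        /* Unmap some of our own pages; stop if there
                         * is nothing left to take from this mm. */
                        if (!swap_out_mm(mm, GFP_KERNEL))
                                break;
                }
        }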

This won't hurt much in the uncontended case, because the page
cache will still keep everything anyway; some dirty pages
will just gain buffer_heads, and bdflush might clean those pages.

In the contended case, it removes some of the burden from swap_out,
and it should give shrink_mmap some pages to work with...

How we can approach the ideal of dynamically managed max RSS
sizes is another question...

Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/





Thread overview: 12+ messages

2000-04-24 19:54 pressuring dirty pages (2.3.99-pre6) Rik van Riel
2000-04-24 21:27 ` Stephen C. Tweedie
2000-04-24 22:42   ` Rik van Riel
2000-04-25  9:35     ` Stephen C. Tweedie
2000-04-25 15:25       ` Rik van Riel
2000-04-25 13:58     ` Eric W. Biederman
2000-04-25 14:27 Mark_H_Johnson.RTS
2000-04-25 16:30 ` Stephen C. Tweedie
2000-04-25 19:14   ` Eric W. Biederman
2000-04-25 19:47     ` Rik van Riel
2000-04-26 11:43       ` Stephen C. Tweedie
2000-04-26 11:06     ` Stephen C. Tweedie
