* RE: on load control / process swapping
From: Charles Randall @ 2001-05-16 15:17 UTC
To: 'Matt Dillon', Roger Larsson
Cc: Rik van Riel, arch, linux-mm, sfkaplan
On a related note, we have a process (currently on Solaris, but possibly
moving to FreeBSD) that reads a 26 GB file just once for a database load. On
Solaris, we use the directio() function call to tell the filesystem to
bypass the buffer cache for this file descriptor.
From the Solaris directio() man page:
DIRECTIO_ON
The system behaves as though the application is not
going to reuse the file data in the near future. In
other words, the file data is not cached in the
system's memory pages.
We found that without this, Solaris was aggressively trying to cache the
huge input file at the expense of database load performance (but we knew
that we'd never access it again). For some applications this is a huge win
(random I/O on a file much larger than memory seems to be another case).
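For illustration, a minimal sketch of the Solaris side, using the
directio(3C) interface quoted above (the path and chunk size are
invented, and error handling is abbreviated):

    #include <sys/types.h>
    #include <sys/fcntl.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/data/load.dump", O_RDONLY); /* hypothetical path */
        char *buf;
        ssize_t n;

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Advise the filesystem not to cache data read via this fd. */
        if (directio(fd, DIRECTIO_ON) != 0)
            perror("directio");  /* non-fatal: falls back to cached I/O */

        buf = malloc(1 << 20);   /* read in 1 MB chunks */
        while ((n = read(fd, buf, 1 << 20)) > 0)
            ;                    /* feed the n bytes to the loader here */

        free(buf);
        close(fd);
        return 0;
    }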
Would there be an advantage to having a similar feature in FreeBSD (if not
already present)?
-Charles
-----Original Message-----
From: Matt Dillon [mailto:dillon@earth.backplane.com]
Sent: Tuesday, May 15, 2001 6:17 PM
To: Roger Larsson
Cc: Rik van Riel; arch@FreeBSD.ORG; linux-mm@kvack.org;
sfkaplan@cs.amherst.edu
Subject: Re: on load control / process swapping
:Are the heuristics persistent?
:Or will the first use after boot use the rough prediction?
:For how long will the heuristic stick? Suppose it is suddenly used in
:a slightly different way. Like two sequential readers instead of one...
:
:/RogerL
:Roger Larsson
:Skelleftea
:Sweden
It's based on the VM page cache, so it's adaptive over time. I wouldn't
call it persistent; it is nothing more than a simple heuristic that
'normally' throws a page away but 'sometimes' caches it. In other words,
you lose some performance on the front end in order to gain some later
on. If you loop through a file enough times, most of the file
winds up getting cached. It's still experimental, so it is only
lightly tied into the system. It seems to work, though, so at some
point in the future I'll probably try to put in some significant
prediction. But as I said, it's a very difficult thing to predict. You
can't just put your foot down and say 'I'll cache X amount of file Y'.
That doesn't work at all.
-Matt
* Re: RE: on load control / process swapping
From: Matt Dillon @ 2001-05-16 17:14 UTC
To: Charles Randall; +Cc: Roger Larsson, Rik van Riel, arch, linux-mm, sfkaplan
We've talked about implementing O_DIRECT. I think it's a good idea.
With regard to the particular case of scanning a huge multi-gigabyte
file, FreeBSD has a sequential detection heuristic which does a
pretty good job preventing cache blow-aways by depressing the priority
of the data as it is read or written. FreeBSD will still try to cache
a good chunk, but it won't sacrifice all available memory. If you
access the data via the VM system, through mmap, you get even more
control through the madvise() syscall.
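For concreteness, a minimal sketch of that route, assuming a standard
mmap()/madvise() interface (the path is hypothetical, and a 26 GB file
would of course need a 64-bit address space):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;
        int fd = open("/data/bigfile", O_RDONLY);  /* hypothetical path */
        if (fd < 0 || fstat(fd, &st) != 0) {
            perror("open/fstat");
            return 1;
        }

        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* MADV_SEQUENTIAL enables aggressive read-ahead and tells the
         * VM it may drop pages behind the scan. */
        madvise(p, st.st_size, MADV_SEQUENTIAL);
        volatile char sum = 0;
        for (off_t off = 0; off < st.st_size; off += 4096)
            sum += ((const char *)p)[off];  /* one sequential pass */

        /* Done with the data: let the kernel reclaim the pages now. */
        madvise(p, st.st_size, MADV_DONTNEED);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }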
-Matt
* Re: RE: on load control / process swapping
From: Rik van Riel @ 2001-05-16 17:41 UTC
To: Matt Dillon; +Cc: Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
On Wed, 16 May 2001, Matt Dillon wrote:
> With regard to the particular case of scanning a huge multi-gigabyte
> file, FreeBSD has a sequential detection heuristic which does a
> pretty good job preventing cache blow-aways by depressing the priority
> of the data as it is read or written. FreeBSD will still try to cache
> a good chunk, but it won't sacrifice all available memory. If you
> access the data via the VM system, through mmap, you get even more
> control through the madvise() syscall.
There's one thing "wrong" with the drop-behind idea though;
it penalises data even when it's still in core and we're
reading it for the second or third time.
Maybe it would be better to only do drop-behind when we're
actually allocating new memory for the vnode in question and
let re-use of already present memory go "unpunished" ?
Hmmm, now that I think about this more, it _could_ introduce
some different fairness issues. Darn ;)
regards,
Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/
* Re: RE: on load control / process swapping
From: Matt Dillon @ 2001-05-16 17:54 UTC
To: Rik van Riel; +Cc: Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
It's not dropping the data, it's dropping the priority. And yes, it
does penalize the data somewhat. On the other hand, if the data happens
to still be in the cache and you scan it a second time, the page priority
gets bumped up relative to what it already was, so the net effect is
that the data becomes high priority after a few passes.
:Maybe it would be better to only do drop-behind when we're
:actually allocating new memory for the vnode in question and
:let re-use of already present memory go "unpunished" ?
You get an equivalent effect even without dropping the priority,
because you blow away prior pages when reading a file that is
larger than main memory, so they don't exist at all when you re-read.
But you do not get the expected 'recycling' characteristics versus
the rest of the system if you do not make a distinction between
sequential and random access. You want to slightly depress the priority
behind a sequential access because the 'cost' of re-reading the disk
sequentially is nothing compared to the cost of re-reading the disk
randomly (by about a 30:1 ratio!). So keeping randomly seek/read data
is more important by degrees than keeping sequentially read data.
This isn't to say that it isn't important to try to cache sequentially
read data, just that the cost of throwing away sequentially read data
is much lower than the cost of throwing away randomly read data on
a general purpose machine.
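To make the ~30:1 figure concrete: a back-of-envelope calculation with
assumed circa-2001 disk numbers (roughly 6 ms of seek plus rotational
delay, 40 MB/sec sequential transfer, 8K pages; these are assumptions
for illustration, not measurements from the thread):

    #include <stdio.h>

    int main(void)
    {
        double seek_ms  = 6.0;   /* assumed avg seek + rotational delay */
        double rate_mbs = 40.0;  /* assumed sequential transfer, MB/s   */
        double page_kb  = 8.0;   /* one 8K page                         */

        double xfer_ms   = page_kb / 1024.0 / rate_mbs * 1000.0; /* ~0.2 */
        double random_ms = seek_ms + xfer_ms;                    /* ~6.2 */

        printf("sequential re-read: %.2f ms/page\n", xfer_ms);
        printf("random re-read:     %.2f ms/page\n", random_ms);
        printf("ratio:              %.0f:1\n", random_ms / xfer_ms);
        return 0;
    }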
Terry's description of 'ld' mmap()ing and doing all sorts of random
seeking, causing most UNIXes, including FreeBSD, to have a brainfart
when the dataset is too big to fit in the cache, is true as far as it
goes, but there really isn't much we can do about that situation
'automatically'. Without hints, the system can't predict the fact that
it should be trying to cache the whole of the object files being accessed
randomly. A hint could make performance much better... a simple
madvise(... MADV_SEQUENTIAL) on the mapped memory inside LD would
probably be beneficial, as would madvise(... MADV_WILLNEED).
-Matt
:Hmmm, now that I think about this more, it _could_ introduce
:some different fairness issues. Darn ;)
:
:regards,
:
:Rik
* Re: on load control / process swapping
From: Alfred Perlstein @ 2001-05-16 17:57 UTC
To: Rik van Riel
Cc: Matt Dillon, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
* Rik van Riel <riel@conectiva.com.br> [010516 13:42] wrote:
> On Wed, 16 May 2001, Matt Dillon wrote:
>
> > With regard to the particular case of scanning a huge multi-gigabyte
> > file, FreeBSD has a sequential detection heuristic which does a
> > pretty good job preventing cache blow-aways by depressing the priority
> > of the data as it is read or written. FreeBSD will still try to cache
> > a good chunk, but it won't sacrifice all available memory. If you
> > access the data via the VM system, through mmap, you get even more
> > control through the madvise() syscall.
>
> There's one thing "wrong" with the drop-behind idea though;
> it penalises data even when it's still in core and we're
> reading it for the second or third time.
>
> Maybe it would be better to only do drop-behind when we're
> actually allocating new memory for the vnode in question and
> let re-use of already present memory go "unpunished" ?
>
> Hmmm, now that I think about this more, it _could_ introduce
> some different fairness issues. Darn ;)
Both of you guys are missing the point.
The directio interface is meant to reduce the stress of a large
sequential operation on a file where caching is of no use.
Even if you depress the worthiness of the pages, you've still
blown rather large amounts of unrelated data out of the cache
in order to allocate new cacheable pages.
A simple solution would involve passing along flags such that if
the IO occurs to a non-previously-cached page, the buf/page is
immediately placed on the free list upon completion. That way the
next IO can pull the now-useless buffer space from the free list.
Basically you add another buffer queue for "throw away" data that
exists as a "barely cached" queue. This way your normal data
doesn't compete on the LRU with non-cached data.
As a hack, it looks like one could use the QUEUE_EMPTYKVA
buffer queue under FreeBSD for this; however, I think one might
lose the minimal amount of caching that could be done.
If the direct IO happens to a page that's previously cached,
you adhere to the previous behavior.
A fancier approach might map user pages into the kernel to
do the IO directly; however, on large MP systems this may cause
pain because the VM may need to issue IPIs to invalidate TLB entries.
It's quite simple in theory; the hard part is the code.
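A user-space toy model of the two-queue idea, to show the intended
recycling behavior (all names and sizes are invented for illustration;
this is not FreeBSD buffer cache code):

    #include <stdio.h>
    #include <string.h>

    #define NPAGES 4

    static int freeq[NPAGES], nfree;   /* truly free pages              */
    static int lru[NPAGES], nlru;      /* normal pages, coldest first   */
    static int throwq[NPAGES], nthrow; /* "barely cached" direct pages  */

    static int alloc_page(void)
    {
        if (nfree > 0)
            return freeq[--nfree];
        if (nthrow > 0)                /* drain throw-away pages first  */
            return throwq[--nthrow];
        {                              /* last resort: evict from LRU   */
            int victim = lru[0];
            memmove(lru, lru + 1, --nlru * sizeof(int));
            return victim;
        }
    }

    static void cache_read(int direct)
    {
        int pg = alloc_page();
        if (direct)
            throwq[nthrow++] = pg;     /* doesn't compete on the LRU    */
        else
            lru[nlru++] = pg;
    }

    int main(void)
    {
        int i;
        for (nfree = 0; nfree < NPAGES; nfree++)
            freeq[nfree] = nfree;

        cache_read(0);                 /* two pages of "normal" data    */
        cache_read(0);
        for (i = 0; i < 10; i++)       /* big sequential direct scan    */
            cache_read(1);

        /* The scan recycled its own pages rather than evicting the
         * normal data: prints "normal pages kept: 2, throw-away: 2" */
        printf("normal pages kept: %d, throw-away: %d\n", nlru, nthrow);
        return 0;
    }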
-Alfred Perlstein
--
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.
http://www.egr.unlv.edu/~slumos/on-netbsd.html
* Re: on load control / process swapping
From: Matt Dillon @ 2001-05-16 18:01 UTC
To: Alfred Perlstein
Cc: Rik van Riel, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
:Both of you guys are missing the point.
:
:The directio interface is meant to reduce the stress of a large
:sequential operation on a file where caching is of no use.
:
:Even if you depress the worthiness of the pages, you've still
:blown rather large amounts of unrelated data out of the cache
:in order to allocate new cacheable pages.
:
:A simple solution would involve passing along flags such that if
:the IO occurs to a non-previously-cached page, the buf/page is
:immediately placed on the free list upon completion. That way the
:next IO can pull the now-useless buffer space from the free list.
:
:Basically you add another buffer queue for "throw away" data that
:exists as a "barely cached" queue. This way your normal data
:doesn't compete on the LRU with non-cached data.
:
:As a hack, it looks like one could use the QUEUE_EMPTYKVA
:buffer queue under FreeBSD for this; however, I think one might
:lose the minimal amount of caching that could be done.
:
:If the direct IO happens to a page that's previously cached,
:you adhere to the previous behavior.
:
:A fancier approach might map user pages into the kernel to
:do the IO directly; however, on large MP systems this may cause
:pain because the VM may need to issue IPIs to invalidate TLB entries.
:
:It's quite simple in theory; the hard part is the code.
:
:-Alfred Perlstein
I think someone tried to implement O_DIRECT a while back, but it
was fairly complex to try to do away with caching entirely.
I think our best bet to 'start' an implementation of O_DIRECT is
to support the flag in open() and fcntl(), and have it simply
modify the sequential detection heuristic to throw away pages
and buffers rather than simply depressing their priority.
Eventually we can implement the direct-I/O piece of the equation.
I could do this first part in an hour, I think. When I get home....
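Seen from user space, the proposed interface would presumably look
something like this (a sketch assuming a system that defines O_DIRECT;
the path is invented):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Request direct I/O at open time... */
        int fd = open("/data/load.dump", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* ...or toggle it later on an already-open descriptor. */
        int flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_DIRECT);

        close(fd);
        return 0;
    }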
-Matt
* Re: on load control / process swapping
From: Alfred Perlstein @ 2001-05-16 18:10 UTC
To: Matt Dillon
Cc: Rik van Riel, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
* Matt Dillon <dillon@earth.backplane.com> [010516 14:01] wrote:
>
> I think someone tried to implement O_DIRECT a while back, but it
> was fairly complex to try to do away with caching entirely.
>
> I think our best bet to 'start' an implementation of O_DIRECT is
> to support the flag in open() and fcntl(), and have it simply
> modify the sequential detection heuristic to throw away pages
> and buffers rather than simply depressing their priority.
Yes, as I said:
> :A simple solution would involve passing along flags such that if
> :the IO occurs to a non-previously-cached page, the buf/page is
> :immediately placed on the free list upon completion. That way the
> :next IO can pull the now-useless buffer space from the free list.
> :
> :Basically you add another buffer queue for "throw away" data that
> :exists as a "barely cached" queue. This way your normal data
> :doesn't compete on the LRU with non-cached data.
>
> Eventually we can implement the direct-I/O piece of the equation.
>
> I could do this first part in an hour, I think. When I get home....
Thank you.
-Alfred
* Re: RE: on load control / process swapping
From: Rik van Riel @ 2001-05-16 19:59 UTC
To: Matt Dillon; +Cc: Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
On Wed, 16 May 2001, Matt Dillon wrote:
> :There's one thing "wrong" with the drop-behind idea though;
> :it penalises data even when it's still in core and we're
> :reading it for the second or third time.
>
> It's not dropping the data, it's dropping the priority. And yes, it
> does penalize the data somewhat. On the other hand, if the data happens
> to still be in the cache and you scan it a second time, the page priority
> gets bumped up
But doesn't it get pushed _down_ again after the process has read
the data? Or is this a part of the code outside of vm/* which I
haven't read yet?
regards,
Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/
* Re: RE: on load control / process swapping
From: Matt Dillon @ 2001-05-16 20:41 UTC
To: Rik van Riel; +Cc: Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
Well, I was going to answer, but I can't find the code. I'll have to
look at it more closely.
-Matt
* Re: on load control / process swapping
From: Terry Lambert @ 2001-05-18 5:58 UTC
To: Matt Dillon
Cc: Rik van Riel, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
Matt Dillon wrote:
> Terry's description of 'ld' mmap()ing and doing all
> sorts of random seeking, causing most UNIXes, including
> FreeBSD, to have a brainfart when the dataset is too big
> to fit in the cache, is true as far as it goes, but
> there really isn't much we can do about that situation
> 'automatically'. Without hints, the system can't predict
> the fact that it should be trying to cache the whole of
> the object files being accessed randomly. A hint could
> make performance much better... a simple madvise(...
> MADV_SEQUENTIAL) on the mapped memory inside LD would
> probably be beneficial, as would madvise(... MADV_WILLNEED).
I don't understand how either of those things could do
anything but make overall performance worse.
The problem is the program in question is seeking all
over the place, potentially multiple times, in order
to avoid building the table in memory itself.
For many symbols, like "printf", it will hit the area
of the library containing their addresses many, many
times.
The problem in this case is _truly_ that the program in
question is _really_ trying to optimize its performance
at the expense of other programs in the system.
The system _needs_ to make page-ins by this program come
_at the expense of this program_, rather than thrashing
all other programs out of core, only to have the quanta
given to these (now higher priority) programs used to
thrash the pages back in, instead of doing real work.
The problem is what to do about this badly behaved program,
so that the system itself doesn't spend unnecessary time
undoing its evil, and so that other (well behaved) programs
are not unfairly penalized.
Cutler suggested a working set quota (first in VMS, later
in NT) to deal with these programs.
-- Terry
* Re: on load control / process swapping
From: Matt Dillon @ 2001-05-18 6:20 UTC
To: Terry Lambert
Cc: Rik van Riel, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
:I don't understand how either of those things could do
:anything but make overall performance worse.
:
:The problem is the program in question is seeking all
:over the place, potentially multiple times, in order
:to avoid building the table in memory itself.
:
:For many symbols, like "printf", it will hit the area
:of the library containing their addresses many, many
:times.
:
:The problem in this case is _truly_ that the program in
:question is _really_ trying to optimize its performance
:at the expense of other programs in the system.
The linker is seeking randomly as a side effect of
the linking algorithm. It is not doing it on purpose to try
to save memory. Forcing the VM system to think it's
sequential causes the VM system to perform read-aheads,
generally reducing the actual amount of physical seeking
that must occur by increasing the size of the chunks
read from disk. Even if the linker's dataset is huge,
increasing the chunk size is beneficial because linkers
ultimately access the entire object file anyway. Trying
to save a few seeks is far more important than reading
extra data and having to throw half of it away.
:The problem is what to do about this badly behaved program,
:so that the system itself doesn't spend unnecessary time
:undoing its evil, and so that other (well behaved) programs
:are not unfairly penalized.
:
:Cutler suggested a working set quota (first in VMS, later
:in NT) to deal with these programs.
:
:-- Terry
The problem is not the resident set size, it's the
seeking that the program is causing as a matter of
course. Be that as it may, the resident set size
can be limited with the 'memoryuse' resource limit. The
system imposes the specified limit only when the memory
subsystem is under pressure.
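For reference, the same sort of cap can be requested programmatically
with setrlimit(2) and RLIMIT_RSS, which is what the csh 'memoryuse'
limit maps to (a sketch, with an arbitrary 64 MB figure):

    #include <sys/resource.h>
    #include <stdio.h>

    int main(void)
    {
        struct rlimit rl;

        rl.rlim_cur = 64UL * 1024 * 1024;   /* arbitrary 64 MB cap */
        rl.rlim_max = 64UL * 1024 * 1024;
        if (setrlimit(RLIMIT_RSS, &rl) != 0)
            perror("setrlimit");

        /* exec the badly behaved program here; it inherits the limit */
        return 0;
    }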
You can also reduce the amount of random seeking the
linker does by ordering the object modules within the
library to forward-reference the dependencies.
-Matt
* Re: on load control / process swapping
From: Andrew Reilly @ 2001-05-18 10:00 UTC
To: Matt Dillon
Cc: Terry Lambert, Rik van Riel, Charles Randall, Roger Larsson,
arch, linux-mm, sfkaplan
On Thu, May 17, 2001 at 11:20:23PM -0700, Matt Dillon wrote:
>Terry wrote:
> :The problem in this case is _truly_ that the program in
> :question is _really_ trying to optimize its performance
> :at the expense of other programs in the system.
>
> The linker is seeking randomly as a side effect of
> the linking algorithm. It is not doing it on purpose to try
> to save memory. Forcing the VM system to think it's
> sequential causes the VM system to perform read-aheads,
> generally reducing the actual amount of physical seeking
> that must occur by increasing the size of the chunks
> read from disk. Even if the linker's dataset is huge,
> increasing the chunk size is beneficial because linkers
> ultimately access the entire object file anyway. Trying
> to save a few seeks is far more important than reading
> extra data and having to throw half of it away.
I know that this problem is real in the case of database index
accesses---databases have data sets larger than RAM almost by
definition---and that the problem (of dealing with "randomly"
accessed memory mapped files) should be neatly solved in
general.
But is this issue of linking really the linchpin?
Are there _any_ programs and library sets where the union of the
code sizes is larger than physical memory?
I haven't looked at the problem myself, but (on the surface)
it doesn't seem too likely. There is a grand total of 90M of .a
files on my system (/usr/lib, /usr/X11/lib, and /usr/local/lib),
and I doubt that even a majority of them would be needed at
once.
--
Andrew
* Re: on load control / process swapping
From: Jonathan Morton @ 2001-05-18 13:49 UTC
To: Matt Dillon, Terry Lambert
Cc: Rik van Riel, Charles Randall, Roger Larsson, arch, linux-mm, sfkaplan
> The problem is not the resident set size, it's the
> seeking that the program is causing as a matter of
> course.
The RSS of 'ld' isn't the problem, no. However, the working-set idea would
place an effective and sensible limit on the size of the disk cache, by
ensuring that other apps aren't being paged out beyond their non-working
sets. Does this make sense?
FWIW, I've been running with a 2-line hack in my kernel for some weeks now,
which essentially prevents the RSS of each process from being forced below
some arbitrary "fair share" of the physical memory available. It's not a
very clean hack, but it improves performance by a very large margin under a
thrashing load. The only problem I'm seeing is a deadlock when I run out
of VM completely, but I think that's a separate issue that others are
already working on.
To others: is there already a means whereby we can (almost) calculate the
WS of a given process? The "accessed" flag isn't a good one, but maybe the
'age' value is better. However, I haven't quite clicked on how the 'age'
value is affected in either direction.
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@cyberspace.org (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk
The key to knowledge is not to rely on people to teach you it.
Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/
-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----
* Re: on load control / process swapping
From: Rik van Riel @ 2001-05-19 2:18 UTC
To: Jonathan Morton
Cc: Matt Dillon, Terry Lambert, Charles Randall, Roger Larsson, arch,
linux-mm, sfkaplan
On Fri, 18 May 2001, Jonathan Morton wrote:
> FWIW, I've been running with a 2-line hack in my kernel for some weeks
> now, which essentially prevents the RSS of each process from being
> forced below some arbitrary "fair share" of the physical memory
> available. It's not a very clean hack, but it improves performance by a
> very large margin under a thrashing load. The only problem I'm seeing
> is a deadlock when I run out of VM completely, but I think that's a
> separate issue that others are already working on.
I'm pretty sure I know what you're running into.
Say you guarantee a minimum of 3% of memory for each process;
now when you have 30 processes running your memory is full and
you cannot reclaim any pages when one of the processes runs
into a page fault.
The minimum RSS guarantee is a really nice thing to prevent the
proverbial root shell from thrashing, but it really only works
if you drop such processes every once in a while and swap them
out completely. You especially need to do this when you're
getting tight on memory and you have idle processes sitting around
using their minimum RSS worth of RAM ;)
It'd work great together with load control though. I guess I should
post a patch for simple & naive load control code once I've got
the inode and dirty page writeout balancing fixed.
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to aardvark@nl.linux.org (spam digging piggy)
* Re: on load control / process swapping
From: Jonathan Morton @ 2001-05-19 2:56 UTC
To: Rik van Riel
Cc: Matt Dillon, Terry Lambert, Charles Randall, Roger Larsson, arch,
linux-mm, sfkaplan
>> FWIW, I've been running with a 2-line hack in my kernel for some weeks
>> now, which essentially prevents the RSS of each process from being
>> forced below some arbitrary "fair share" of the physical memory
>> available. It's not a very clean hack, but it improves performance by a
>> very large margin under a thrashing load. The only problem I'm seeing
>> is a deadlock when I run out of VM completely, but I think that's a
>> separate issue that others are already working on.
>
>I'm pretty sure I know what you're running into.
>
>Say you guarantee a minimum of 3% of memory for each process;
>now when you have 30 processes running your memory is full and
>you cannot reclaim any pages when one of the processes runs
>into a page fault.
Actually I already thought of that one, and made it a "fair share" of the
system rather than a fixed amount. IOW, the guaranteed amount is something
like (total_memory / nr_processes). I think I was even sane enough to
lower this value slightly to allow for some buffer/cache memory, but I
didn't allow for locked pages (including the kernel itself).
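In other words, something like this sketch (the names and the headroom
figure are invented; this is not the actual 2-line patch):

    #include <stdio.h>

    static unsigned long rss_floor(unsigned long total_pages,
                                   unsigned nr_procs)
    {
        unsigned long share = total_pages / nr_procs;
        return share - share / 8;  /* headroom for buffer/cache memory */
    }

    int main(void)
    {
        /* e.g. 128 MB of 4K pages shared among 30 processes */
        printf("floor: %lu pages\n", rss_floor(32768, 30)); /* 956 */
        return 0;
    }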
The deadlock happened when the swap ran out, not the physical RAM, and is
independent of this particular hack - remember I'm running with some
out_of_memory() fixes and some other hackery I did a month or so ago
(remember that massive "OOM killer" thread?). I should try to figure those
out and present cleaned-up versions for further perusal...
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@cyberspace.org (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk
The key to knowledge is not to rely on people to teach you it.
Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/
-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----