Re: Thread implementations...

linux-mm.kvack.org archive mirror

* Re: Thread implementations...
@ 1998-06-30 19:30 Larry McVoy
  1998-07-01  8:50 ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Larry McVoy @ 1998-06-30 19:30 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Eric W. Biederman, Christoph Rohland, linux-kernel, linux-mm

: Not for very large files: the forget-behind is absolutely critical in
: that case.

SunOS' local file system, UFS, implements the following algorithm for
forget-behind on all types of accesses (SunOS has a unified page cache: all
accesses are mmap based, and read/write are implemented by the kernel doing
an mmap and then a bcopy):

	if ((free_memory < we_will_start_paging_soon) &&
	    (offset is clust_size multiple) &&
	    (offset > small_file) &&
	    (access is sequential)) {
	    	free_behind(vp, offset - clust_size, clust_size);
	}

in the ufs_getpage() code.
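
A minimal sketch of the four-way test above as plain C, for experimenting outside a kernel. The struct, its field names, and the function name are assumptions made for illustration; only the four conditions and the "free the previous cluster" result come from the pseudocode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; the names mirror the pseudocode, the values are up to the caller. */
struct fb_params {
    size_t clust_size;   /* I/O cluster size in bytes */
    size_t small_file;   /* offsets at or below this are never freed behind */
    size_t paging_soon;  /* free-memory level below which paging is imminent */
};

/* Returns true and stores the start of the cluster to drop in *start
 * when all four conditions of the pseudocode hold. */
bool free_behind_range(const struct fb_params *p, size_t free_memory,
                       size_t offset, bool sequential, size_t *start)
{
    assert(p->clust_size > 0);
    if (free_memory >= p->paging_soon) return false; /* no memory pressure yet */
    if (offset % p->clust_size != 0)   return false; /* not on a cluster boundary */
    if (offset <= p->small_file)       return false; /* small files are left alone */
    if (!sequential)                   return false; /* random access wants caching */
    *start = offset - p->clust_size;                 /* the cluster just read past */
    return true;
}
```

Because the test only fires on cluster boundaries, memory is released a whole cluster at a time, which is the property point 4 below relies on.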

I'll admit this was a hack, but it had some nice attributes that you might
want to consider:

	1) it was nice that I/O took care of itself.  The pageout daemon is
	   pretty costly (Stephen, we talked about this at Linux Expo - this
	   is why I want a pageout daemon that works on files, not on pages).
	
	2) Small files aren't worth the trouble and aren't the cause of the
	   trouble.  
	
	3) Random access frequently wants caching and random accesses are
	   expensive to bring in.
	
	4) I/O is freed in large chunks, not a page at a time.  It's about
	   as costly to bring in one page as to bring in 64-256K these days.

* Re: Thread implementations...
  1998-06-30 19:30 Thread implementations Larry McVoy
@ 1998-07-01  8:50 ` Stephen C. Tweedie
  1998-07-03 15:21   ` Rik van Riel
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-07-01  8:50 UTC (permalink / raw)
  To: Larry McVoy
  Cc: Stephen C. Tweedie, Eric W. Biederman, Christoph Rohland,
	linux-kernel, linux-mm

Hi,

On Tue, 30 Jun 1998 12:30:45 -0700, lm@bitmover.com (Larry McVoy)
said:

> 	if ((free_memory < we_will_start_paging_soon) &&
> 	    (offset is clust_size multiple) &&
> 	    (offset > small_file) &&
> 	    (access is sequential)) {
> 	    	free_behind(vp, offset - clust_size, clust_size);
> 	}

Looks entirely reasonable.  I've been thinking of something very
similar but just a little more complex, so that we can also cleanly
handle the case of sequential mmap()ed reads, both of mapped files and
potentially of anonymous datasets.  The difference there is that if we
are dealing with tiled data, then we may need to allow a larger window
between the current pagein cursor and the forget-behind cursor.
Again, if we just unmap the pages and place them on a high-priority
reuse queue, then getting the guess wrong just results in a minor
fault unless we do actually reuse the memory before accessing the data
again.
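
The two-cursor idea can be sketched in a few lines of C. This is only an illustration under assumptions: the struct, the function name, and the use of page indices are invented here, and the "unmap and queue for reuse" step is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-mapping state, in page indices. */
struct seq_window {
    size_t forget;  /* first page that has not been unmapped yet */
    size_t window;  /* pages to keep mapped behind the pagein cursor */
};

/* Advance the forget-behind cursor after a fault at page `pagein`.
 * Returns how many pages, starting at the old cursor, the caller should
 * unmap and place on the reuse queue. */
size_t advance_forget_cursor(struct seq_window *w, size_t pagein)
{
    assert(w != NULL);
    size_t target = pagein > w->window ? pagein - w->window : 0;
    if (target <= w->forget)
        return 0;               /* still inside the window: nothing to do */
    size_t n = target - w->forget;
    w->forget = target;
    return n;
}
```

A larger `window` gives tiled access patterns room to revisit recent pages before they are unmapped; a wrong guess costs only a minor fault as long as the page has not been reused.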

> 	1) it was nice that I/O took care of itself.  The pageout daemon is
> 	   pretty costly (Stephen, we talked about this at Linux Expo - this
> 	   is why I want a pageout daemon that works on files, not on pages).

Yes, and Ingo and I have been talking about ways of doing it.

> 	2) Small files aren't worth the trouble and aren't the cause of the
> 	   trouble.  

Small files benefit from a similar scheme.  For small
sequentially-accessed files, as they age, we want to remove the entire
file from cache at once.  Repopulating a sequential file's fragmented
cache is expensive anyway, so it may in fact be _cheaper_ to do this
than to just throw out one page at a time.  

As long as we have the concept of a virtual extent, where we define
that extent as the natural readahead pattern for the workload, then we
want to uncache the same units we readahead.  That's normally
sequential clusters, but if we have things like Ingo's random swap
stats-based prediction logic, then we can use exactly the same extent
concept there too.
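
For the common sequential-cluster case, "uncache the same units we readahead" might look like the sketch below. It assumes an extent is simply an aligned run of `extent_pages` pages, clipped at end of file; the names are hypothetical and a stats-based predictor would define its extents differently.

```c
#include <assert.h>
#include <stddef.h>

struct extent {
    size_t first;  /* first page of the extent */
    size_t count;  /* number of pages in it */
};

/* Extent that contains `page`, used as the unit for both readahead and eviction. */
struct extent extent_containing(size_t page, size_t extent_pages, size_t file_pages)
{
    assert(extent_pages > 0 && page < file_pages);
    size_t first = page - page % extent_pages;
    size_t left = file_pages - first;
    return (struct extent){ first, left < extent_pages ? left : extent_pages };
}
```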

--Stephen

* Re: Thread implementations...
  1998-07-01  8:50 ` Stephen C. Tweedie
@ 1998-07-03 15:21   ` Rik van Riel
  1998-07-03 20:05     ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 1998-07-03 15:21 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Linux MM

On Wed, 1 Jul 1998, Stephen C. Tweedie wrote:

> sequential clusters, but if we have things like Ingo's random swap
> stats-based prediction logic, then we can use exactly the same extent
> concept there too.

Hmm, it appears this was the legendary swap readahead code I
was looking for a while ago :)

But, ehhh, just what _is_ this random swap stats-based prediction
algorithm, and how far from implementation is it?
(and if it isn't implemented yet, what should I do to make
it implemented; swapin readahead is very wanted on my
memory-starved box...)

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

* Re: Thread implementations...
  1998-07-03 15:21   ` Rik van Riel
@ 1998-07-03 20:05     ` Stephen C. Tweedie
  1998-07-03 20:36       ` Rik van Riel
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-07-03 20:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Stephen C. Tweedie, Linux MM

Hi,

On Fri, 3 Jul 1998 17:21:51 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> On Wed, 1 Jul 1998, Stephen C. Tweedie wrote:
>> sequential clusters, but if we have things like Ingo's random swap
>> stats-based prediction logic, then we can use exactly the same extent
>> concept there too.

> Hmm, it appears this was the legendary swap readahead code I
> was looking for a while ago :)

> But, ehhh, just what _is_ this random swap stats-based prediction
> algorithm, 

It's a per-swap-page readahead predictor which observes the access
patterns for vmas.  

> and how far from implementation is it?

It is implemented.  It is not in the main kernels, nor does it take
advantage of the potential for swap readahead in the 2.1.86+ kernels.

--Stephen

* Re: Thread implementations...
  1998-07-03 20:05     ` Stephen C. Tweedie
@ 1998-07-03 20:36       ` Rik van Riel
  1998-07-04 16:37         ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 1998-07-03 20:36 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Linux MM

On Fri, 3 Jul 1998, Stephen C. Tweedie wrote:
> On Fri, 3 Jul 1998 17:21:51 +0200 (CEST), Rik van Riel
> <H.H.vanRiel@phys.uu.nl> said:
> 
> > But, ehhh, just what _is_ this random swap stats-based prediction
> > algorithm, 
> It's a per-swap-page readahead predictor which observes the access
> patterns for vmas.  
> 
> > and how far from implementation is it?
> It is implemented.  It is not in the main kernels, nor does it take
> advantage of the potential for swap readahead in the 2.1.86+ kernels.

Then where is it? It would be great to test and it
would make an excellent link with description for
the Linux MM homepage...

Besides, I'm currently somewhat memory starved and
I would really like to test and possibly improve
or integrate this piece of code with the main kernel.

I know it's too late for inclusion now, but I'm willing
to keep the patch up-to-date with the kernel up to the
date of inclusion.

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

* Re: Thread implementations...
  1998-07-03 20:36       ` Rik van Riel
@ 1998-07-04 16:37         ` Stephen C. Tweedie
  0 siblings, 0 replies; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-07-04 16:37 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Stephen C. Tweedie, Linux MM

Hi,

On Fri, 3 Jul 1998 22:36:06 +0200 (CEST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> Then where is it? It would be great to test and it
> would make an excellent link with description for
> the Linux MM homepage...

You'd have to ask Ingo for it.

--Stephen

* Re: Thread implementations...
       [not found]   ` <199806241213.WAA10661@vindaloo.atnf.CSIRO.AU>
@ 1998-06-24 22:00     ` Eric W. Biederman
  1998-06-24 23:41       ` Richard Gooch
  1998-06-25  4:12       ` Dean Gaudet
  0 siblings, 2 replies; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-24 22:00 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Dean Gaudet, linux-kernel, linux-mm

>>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:

RG> If we get madvise(2) right, we don't need sendfile(2), correct?

It looks like it from here.  As far as madvise goes, I think we need
to implement madvise(2) as:

enum madvise_strategy {
        MADV_NORMAL,
        MADV_RANDOM,
        MADV_SEQUENTIAL,
        MADV_WILLNEED,
        MADV_DONTNEED,
};
struct madvise_struct {
	caddr_t addr;
	size_t size;
	size_t strategy;
};
int sys_madvise(struct madvise_struct *, int count);

With that in place, a madvise(3) following the traditional format with
only one advisement can be done easily.  The reason I suggest multiple
arguments is that apps that have random but predictable access
patterns will want to use MADV_WILLNEED & MADV_DONTNEED to implement an
optimum swapping algorithm.
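
A rough user-space sketch of that layering, assuming the vectored call only validates its arguments and otherwise ignores the hints. `sys_madvise()` here is a stand-in, not a real system call, `madvise_one()` is a hypothetical name for the libc wrapper, and `void *` replaces `caddr_t` to keep the example portable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum madvise_strategy {
    MADV_NORMAL,
    MADV_RANDOM,
    MADV_SEQUENTIAL,
    MADV_WILLNEED,
    MADV_DONTNEED,
};

struct madvise_struct {
    void  *addr;      /* caddr_t in the proposal */
    size_t size;
    size_t strategy;  /* one of enum madvise_strategy */
};

/* Stand-in for the proposed vectored call: reject malformed input,
 * accept (and ignore) every well-formed hint. */
int sys_madvise(const struct madvise_struct *vec, int count)
{
    if (vec == NULL || count < 0) {
        errno = EINVAL;
        return -1;
    }
    for (int i = 0; i < count; i++) {
        if (vec[i].strategy > (size_t)MADV_DONTNEED) {
            errno = EINVAL;
            return -1;
        }
    }
    return 0;
}

/* Traditional single-range form, built on a one-element vector. */
int madvise_one(void *addr, size_t len, int strategy)
{
    struct madvise_struct one = { addr, len, (size_t)strategy };
    return sys_madvise(&one, 1);
}
```

An application with a predictable but non-sequential pattern would instead fill an array with alternating MADV_WILLNEED and MADV_DONTNEED entries and submit it in one call.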

And for that you will probably need multiple address ranges.  The
clustering community has a similar syscall implemented for programs
whose working set size exceeds available memory.  Except it has
strategy hardwired to MADV_WILLNEED.

However someone needs to look at actual programs to see which form
is more practical to implement in the kernel.

Of course all I know about madvise I just read in the kernel source so
I may be totally off...

Eric

* Re: Thread implementations...
  1998-06-24 22:00     ` Eric W. Biederman
@ 1998-06-24 23:41       ` Richard Gooch
  1998-06-25  4:45         ` Eric W. Biederman
  1998-06-25  4:12       ` Dean Gaudet
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Gooch @ 1998-06-24 23:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-mm

Eric W. Biederman writes:
> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
> 
> RG> If we get madvise(2) right, we don't need sendfile(2), correct?
> 
> It looks like it from here.  As far as madvise goes, I think we need
> to implement madvise(2) as:
> 
> enum madvise_strategy {
>         MADV_NORMAL,
>         MADV_RANDOM,
>         MADV_SEQUENTIAL,
>         MADV_WILLNEED,
>         MADV_DONTNEED,
> };
> struct madvise_struct {
> 	caddr_t addr;
> 	size_t size;
> 	size_t strategy;
> };
> int sys_madvise(struct madvise_struct *, int count);
> 
> With madvise(3) following the traditional format with only one
               ^
Don't you mean 2?

> advisement can be done easily.  The reason I suggest multiple
> arguments is that apps that have random but predictable access
> patterns will want to use MADV_WILLNEED & MADV_DONTNEED to implement an
> optimum swapping algorithm.

I'm not aware of madvise() being a POSIX standard. I've appended the
man page from alpha_OSF1, which looks reasonable. It would be nice to
be compatible with something.

				Regards,

					Richard....
===============================================================================
madvise(2)							   madvise(2)



NAME

  madvise - Advise the system of the expected paging behavior of a process

SYNOPSIS

  #include <sys/types.h>
  #include <sys/mman.h>
  int madvise (
          caddr_t addr,
          size_t len,
          int behav );

PARAMETERS

  addr      Specifies the address of the region to which the advice refers.

  len       Specifies the length in bytes of the region specified by the addr
            parameter.

  behav     Specifies the behavior of the region.  The following values for
            the behav parameter are defined in the sys/mman.h header file:

            MADV_NORMAL
                      No further special treatment

            MADV_RANDOM
                      Expect random page references

            MADV_SEQUENTIAL
                      Expect sequential references

            MADV_WILLNEED
                      Will need these pages

            MADV_DONTNEED
                      Do not need these pages

                      The system will free any resident pages that are
                      allocated to the region.  All modifications will be
                      lost and any swapped out pages will be discarded.
                      Subsequent access to the region will result in a
                      zero-fill-on-demand fault as though it is being
                      accessed for the first time.  Reserved swap space is
                      not affected by this call.

            MADV_SPACEAVAIL
                      Ensure that resources are reserved

DESCRIPTION

  The madvise() function permits a process to advise the system about its
  expected future behavior in referencing a mapped file or shared memory
  region.

NOTES

  Only a few values of the behav parameter values are operational on Digital
  UNIX systems.  Non-operational values cause the system to always return
  success (zero).

RETURN VALUES

  Upon successful completion, the madvise() function returns zero.
  Otherwise, -1 is returned and errno is set to indicate the error.

ERRORS

  If the madvise() function fails, errno may be set to one of the following
  values:

  [EINVAL]  The behav parameter is invalid.

  [ENOSPC]  The behav parameter specifies MADV_SPACEAVAIL and resources can
            not be reserved.

RELATED INFORMATION

  Functions: mmap(2)

* Re: Thread implementations...
  1998-06-24 23:41       ` Richard Gooch
@ 1998-06-25  4:45         ` Eric W. Biederman
  1998-06-25 17:14           ` Todd Larason
  1998-06-26  7:53           ` Christoph Rohland
  0 siblings, 2 replies; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-25  4:45 UTC (permalink / raw)
  To: Richard Gooch; +Cc: linux-kernel, linux-mm

>>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:

RG> Eric W. Biederman writes:
>> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:

>> With madvise(3) following the traditional format with only one
RG>                ^
RG> Don't you mean 2?

My suggestion:
madvise(2)(struct madvise_struct *, int number_of_structs);
madvise(3)(caddr_t addr, size_t len, size_t strategy);

madvise(3) being in libc...

>> advisement can be done easily.  The reason I suggest multiple
>> arguments is that apps that have random but predictable access
>> patterns will want to use MADV_WILLNEED & MADV_DONTNEED to implement an
>> optimum swapping algorithm.

RG> I'm not aware of madvise() being a POSIX standard. I've appended the
RG> man page from alpha_OSF1, which looks reasonable. It would be nice to
RG> be compatible with something.

According to the kernel source it is available on:
the alpha, mips, and sparc.  And the mips code thinks there is a posix
version somewhere.

Does someone have the Sun/sparc man page?  Besides what is in the
kernel source I mean.

> 	    MADV_WILLNEED
	This needs to start an asynchronous pagein if necessary.

> 	    MADV_DONTNEED
> 		      Do not need these	pages

> 		      The system will free any resident	pages that are allo-
> 		      cated to the region.  All	modifications will be lost
> 		      and any swapped out pages	will be	discarded.  Subse-
> 		      quent access to the region will result in	a zero-fill-
> 		      on-demand	fault as though	it is being accessed for the
> 		      first time.  Reserved swap space is not affected by
> 		      this call.

This one is broken, for 3 reasons.
1) madvise should only give advice.
2) This can be done with mmap(start, len, PROT..., MAP_ANON, -1, 0)
3) There is a more reasonable interpretation from IRIX:

     MADV_DONTNEED    informs the system that the address range	from addr to
		      addr + len will likely not be referenced in the near
		      future.  The memory to which the indicated addresses are
		      mapped will be the first to be reclaimed when memory is
		      needed by	the system.

Which means that with a smart programmer you can implement the optimal
swapping algorithm for your process with MADV_DONTNEED and
MADV_WILLNEED and be relatively portable.

Of course MADV_SEQUENTIAL should handle the case of sending a file out
a socket, for a userspace sendfile.

> 	    MADV_SPACEAVAIL
> 		      Ensure that resources are	reserved

This one also does more than advise and for that reason I don't like it.

Anyhow this looks like something to keep in mind for 2.3.
Currently I have too many projects in the air to do more than think
the interface through.  The mapping type could easily be stored in the
vma as a hint though.  Perhaps it could be ready for 2.2 but I
couldn't do it.

Eric

* Re: Thread implementations...
  1998-06-25  4:45         ` Eric W. Biederman
@ 1998-06-25 17:14           ` Todd Larason
  1998-06-26  7:53           ` Christoph Rohland
  1 sibling, 0 replies; 26+ messages in thread
From: Todd Larason @ 1998-06-25 17:14 UTC (permalink / raw)
  To: linux-kernel, linux-mm

On Wed, Jun 24, 1998 at 11:45:52PM -0500, Eric W. Biederman wrote:
> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
> 
> RG> Eric W. Biederman writes:
> >> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
> 
> Does someone have the Sun/sparc man page?




C Library Functions                                    madvise(3)



NAME
     madvise - provide advice to VM system

SYNOPSIS
     #include <sys/types.h>
     #include <sys/mman.h>

     int madvise(caddr_t addr, size_t len, int advice);

DESCRIPTION
     madvise() advises the kernel that a region  of  user  mapped
     memory in the range [addr, addr + len) will be accessed following
     a type of pattern.  The kernel uses this  information
     to  optimize  the procedure for manipulating and maintaining
     the resources associated with the specified mapping range.

     Values for advice are defined in <sys/mman.h> as:

     #define MADV_NORMAL        0x0     /* No further special treatment */
     #define MADV_RANDOM        0x1     /* Expect random page references */
     #define MADV_SEQUENTIAL    0x2     /* Expect sequential page references */
     #define MADV_WILLNEED      0x3     /* Will need these pages */
     #define MADV_DONTNEED      0x4     /* Don't need these pages */

     MADV_NORMAL
          The  default  system  characteristic  where   accessing
          memory  within  the  address range causes the system to
          read data from the mapped file.  The kernel  reads  all
          data  from  files  into  pages which are retained for a
          period of time as a "cache."  System  pages  can  be  a
          scarce  resource, so the kernel steals pages from other
          mappings when needed.  This is a likely occurrence, but
          adversely  affects  system  performance only if a large
          amount of memory is accessed.

     MADV_RANDOM
          Tells the kernel to read in a minimum  amount  of  data
          from a mapped file on any single particular access.  If
          MADV_NORMAL is in effect when an address  of  a  mapped
          file  is  accessed, the system tries to read in as much
          data from the file as reasonable,  in  anticipation  of
          other accesses within a certain locality.

     MADV_SEQUENTIAL
          Tells the system  that  addresses  in  this  range  are
          likely  to  be  accessed  only once, so the system will
          free the resources mapping the address range as quickly
          as  possible.   This  is  used  in the cat(1) and cp(1)
          utilities.

     MADV_WILLNEED
          Tells the  system  that  a  certain  address  range  is
          definitely  needed so the kernel will start reading the
          specified range into memory.  This can benefit programs
          wanting  to  minimize  the time needed to access memory
          the first time, as the kernel would  need  to  read  in
          from the file.

     MADV_DONTNEED
          Tells the kernel that the specified address range is no
          longer  needed,  so  the  system  starts  to  free  the
          resources associated with the address range.

     madvise() should be used by programs with specific knowledge
     of  their  access  patterns  over a memory object, such as a
     mapped file, to increase system performance.

RETURN VALUES
     madvise() returns:

     0    on success.

     -1   on failure and sets errno to indicate the error.

ERRORS
     EINVAL         addr is not a multiple of the  page  size  as
                    returned by sysconf(3C).

                    The length of the specified address range  is
                    less  than  or  equal to 0, or the advice was
                    invalid.

     EIO            An I/O error occurred while reading  from  or
                    writing to the file system.

     ENOMEM         Addresses in the range [addr, addr + len) are
                    outside the valid range for the address space
                    of a process, or specify one  or  more  pages
                    that are not mapped.

     ESTALE         Stale nfs file handle.

ATTRIBUTES
     See attributes(5) for descriptions of the  following  attributes:

     __________________________________
    | ATTRIBUTE TYPE|  ATTRIBUTE VALUE|
    |_______________|_________________|
    | MT-Level      |  MT-Safe        |
    |_______________|_________________|

SEE ALSO
     cat(1), cp(1), mmap(2), sysconf(3C), attributes(5)



SunOS 5.6           Last change: 29 Dec 1996                    2


No mention of conforming to any standard here.  HP-UX 10.20's manpage
claims conformance with AES and SVID3.  It defines a MADV_SPACEAVAIL
behavior too, but notes that it isn't implemented.
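
Since the SunOS page lists EINVAL for an addr that is not a multiple of the page size, a caller holding an arbitrary byte range has to round it to page boundaries first. A small sketch of that rounding, with hypothetical names and the page size passed in by the caller (for example from sysconf()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uintptr_t addr;  /* page-aligned start */
    size_t    len;   /* length, a multiple of the page size */
};

/* Widen [addr, addr + len) outward to page boundaries.
 * `page` must be a power of two. */
struct range page_align(uintptr_t addr, size_t len, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    uintptr_t mask  = ~(uintptr_t)(page - 1);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + page - 1) & mask;
    return (struct range){ start, (size_t)(end - start) };
}
```

Widening is harmless for hints such as MADV_WILLNEED, but for a destructive MADV_DONTNEED like the OSF one it would be safer to shrink the range inward instead.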

* Re: Thread implementations...
  1998-06-25  4:45         ` Eric W. Biederman
  1998-06-25 17:14           ` Todd Larason
@ 1998-06-26  7:53           ` Christoph Rohland
  1998-06-26 14:16             ` Eric W. Biederman
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Rohland @ 1998-06-26  7:53 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-mm

ebiederm+eric@npwt.net (Eric W. Biederman) writes:

> > 	    MADV_DONTNEED
> > 		      Do not need these	pages
> 
> > 		      The system will free any resident	pages that are allo-
> > 		      cated to the region.  All	modifications will be lost
> > 		      and any swapped out pages	will be	discarded.  Subse-
> > 		      quent access to the region will result in	a zero-fill-
> > 		      on-demand	fault as though	it is being accessed for the
> > 		      first time.  Reserved swap space is not affected by
> > 		      this call.
> 
> This one is broken, for 3 reasons.
> 1) madvise should only give advice.
> 2) This can be done with mmap(start, len, PROT..., MAP_ANON, -1, 0)
> 3) There is a more reasonable interpretation from IRIX:
> 
>      MADV_DONTNEED    informs the system that the address range	from addr to
> 		      addr + len will likely not be referenced in the near
> 		      future.  The memory to which the indicated addresses are
> 		      mapped will be the first to be reclaimed when memory is
> 		      needed by	the system.

I do not agree:

1) why should madvise only advise. O.K. it is a naming thing, but I
   think you can find more terms which went far from the original
   meaning.
2) Would not work on shared pages.
3) Why is IRIX more reasonable than any other implementation?

The functionality described in the OSF manpage greatly helps
transactional programs, which use loads of memory for single
transactions. I do not know if it should be done with madvise, but
there is at least one OS which thinks it is the right place and I
would look for this functionality exactly there.

Cheers
      Christoph
--
#include <stddisclaimer.h>

* Re: Thread implementations...
  1998-06-26  7:53           ` Christoph Rohland
@ 1998-06-26 14:16             ` Eric W. Biederman
  1998-06-29 10:19               ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-26 14:16 UTC (permalink / raw)
  To: Christoph Rohland; +Cc: Eric W. Biederman, linux-kernel, linux-mm

>>>>> "CR" == Christoph Rohland <hans-christoph.rohland@sap-ag.de> writes:

CR> I do not agree:

CR> 1) why should madvise only advise. O.K. it is a naming thing, but I
CR>    think you can find more terms which went far from the original
CR>    meaning.

Because if it only advises, you can ignore it and return success.
If it does more than advise you have to do much more error checking
and error handling.  If it turns out we want to give lots of advice in
one syscall, instead of just one piece of advice, this could be
important.

CR> 2) Would not work on shared pages.
Not perfectly.  That does appear to be the Achilles heel currently of
madvise.  Multiple users of the same memory.

CR> 3) Why is IRIX more reasonable than any other implementation?
Well, IRIX also agrees with the Sun man page and my intuition.
I am thinking in terms of swapping hints, and specific functionality
doesn't fit into that category.

CR> The functionality described in the OSF manpage greatly helps
CR> transactional programs, which use loads of memory for single
CR> transactions. I do not know if it should be done with madvise, but
CR> there is at least one OS which thinks it is the right place and I
CR> would look for this functionality exactly there.

I hadn't considered the transaction case.  In fact I haven't
considered most cases. That's partly why I'm still talking.

But still there are other more portable methods to achieve a memory
reset, as I mentioned earlier.   And there isn't another even semi
portable method to achieve swapping hints.
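
The memory reset referred to here can be sketched with a fixed anonymous mapping over the same range. This assumes a private mapping and a page-aligned `start`; the function name is hypothetical, and, as noted earlier in the thread, the approach does not carry over to shared pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Replace [start, start + len) with fresh zero-fill-on-demand pages.
 * Old contents of the range are discarded. Returns 0 on success, -1 on error. */
int reset_region(void *start, size_t len)
{
    assert(len > 0);
    void *p = mmap(start, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == start ? 0 : -1;
}
```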

Eric

* Re: Thread implementations...
  1998-06-26 14:16             ` Eric W. Biederman
@ 1998-06-29 10:19               ` Stephen C. Tweedie
  1998-06-30  6:19                 ` Eric W. Biederman
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-06-29 10:19 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Christoph Rohland, linux-kernel, linux-mm

Hi,

On 26 Jun 1998 09:16:14 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

>>>>>> "CR" == Christoph Rohland <hans-christoph.rohland@sap-ag.de> writes:

CR> 1) why should madvise only advise. 

> Because if it only advises, you can ignore it and return success.
> If it does more than advise you have to do much more error checking
> and error handling.  

Not necessarily; even if we do take immediate action on the advice,
within the madvise system call, we don't have to do any extra layers of
error handling.   It's more a case of "Please try to do this now / OK, I
tried."

CR> 2) Would not work on shared pages.
> Not perfectly.  That does appear to be the Achilles heel currently of
> madvise.  Multiple users of the same memory.

Again, madvise is the application telling us that it KNOWS what the
access pattern is.  If the app is wrong, and the page is shared, big
deal; throw away the advice, it was duff. :)

--Stephen

* Re: Thread implementations...
  1998-06-29 10:19               ` Stephen C. Tweedie
@ 1998-06-30  6:19                 ` Eric W. Biederman
  1998-06-30 13:10                   ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-30  6:19 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Christoph Rohland, linux-kernel, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,
ST> On 26 Jun 1998 09:16:14 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

>>>>>>> "CR" == Christoph Rohland <hans-christoph.rohland@sap-ag.de> writes:

CR> 1) why should madvise only advise. 

>> Because if it only advises, you can ignore it and return success.
>> If it does more than advise you have to do much more error checking
>> and error handling.  

ST> Not necessarily; even if we do take immediate action on the advice,
ST> within the madvise system call, we don't have to do any extra layers of
ST> error handling.   It's more a case of "Please try to do this now / OK, I
ST> tried."

The semantics of one or two of the implementation-specific madvise
options were much more like mlock...  And for that you need extra
error checking to confirm that success occurred.

The "try this now" approach I see as a totally appropriate implementation.

CR> 2) Would not work on shared pages.
>> Not perfectly.  That does appear to be the Achilles heel currently of
>> madvise.  Multiple users of the same memory.

ST> Again, madvise is the application telling us that it KNOWS what the
ST> access pattern is.  If the app is wrong, and the page is shared, big
ST> deal; throw away the advice, it was duff. :)

Again the case was: I have a multithreaded web server serving up
files.  The web server mmaps each file, and calls 
madvise(file_start, file_len, MADV_SEQUENTIAL).    The trick is that
it may be serving the same file to two different clients
simultaneously.
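
For concreteness, one serving path of such a server might look like the sketch below: map the file, advise sequential access, and write the mapping to the client descriptor. The function name is hypothetical, error handling is minimal, and the madvise() call is treated as advice whose failure is ignored.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the file at `path` to out_fd through a mapping advised as sequential.
 * Returns the number of bytes written, or -1 on error. */
long send_mapped(const char *path, int out_fd)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                                /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;
    (void)madvise(p, len, MADV_SEQUENTIAL);   /* a hint only: ignore failure */

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out_fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        done += (size_t)n;
    }
    munmap(p, len);
    return done == len ? (long)done : -1;
}
```

Two threads calling this for the same file each get their own mapping and their own hint, while the underlying cached pages are shared, which is exactly the situation under discussion.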

MADV_SEQUENTIAL implies readahead, and forget behind, but for a simple
process.

The forget behind is tricky and difficult to get right, but if we
concentrate on aggressive readahead (in this case we will probably be
o.k.)

And some readahead we already have implemented in filemap_nopage.
Getting it general for the whole mm layer could be fun but it is
certainly doable.  Though at the moment putting hint information in
the vm_area_struct, and keeping the implementation in the nopage functions
sounds like the way to go.  

Eric

* Re: Thread implementations...
  1998-06-30  6:19                 ` Eric W. Biederman
@ 1998-06-30 13:10                   ` Stephen C. Tweedie
  1998-06-30 19:35                     ` Dean Gaudet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-06-30 13:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stephen C. Tweedie, Christoph Rohland, linux-kernel, linux-mm

Hi,

On 30 Jun 1998 01:19:18 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

> Again the case was: I have a multithreaded web server serving up
> files.  The web server mmaps each file, and calls madvise(file_start,
> file_len, MADV_SEQUENTIAL).  The trick is that it may be serving the
> same file to two different clients simultaneously.

The actual sharing is not a problem; the cache is already safe against
that even when doing readahead.

> MADV_SEQUENTIAL implies readahead, and forget behind, but for a simple
> process.

Yep, the forget behind is the important stuff to get right, but all we
need to do there is to unmap the pages from the process's address space:
we don't need to actually flush the page cache.  As long as the page
cache can find these pages quickly if it needs to reuse the memory for
something else, then there's no reason to actually forget the data there
and then.
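
For illustration only, here is a userspace approximation of that
forget-behind, not the in-kernel mechanism discussed in this thread.  It
assumes a current Linux, where madvise(MADV_DONTNEED) on a file-backed
mapping drops the process's mappings for the range while the data can
still be repopulated from the file on the next access.  The helper name
scan_with_forget_behind and the window parameter are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum the bytes of a page-aligned, file-backed
 * mapping sequentially.  Each time a full window lies behind the cursor,
 * advise the kernel that the window is no longer needed by this process.
 */
unsigned long scan_with_forget_behind(unsigned char *map, size_t len,
                                      size_t window)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Round the window up to a whole number of pages (at least one). */
    window = ((window + page - 1) / page) * page;
    if (window == 0)
        window = page;

    unsigned long sum = 0;
    size_t behind = 0;          /* start of the not-yet-forgotten region */

    for (size_t i = 0; i < len; i++) {
        sum += map[i];
        /* A full window has been consumed: drop its mappings. */
        if (i + 1 - behind >= window) {
            /* Advice only, so a failure is ignored. */
            (void)madvise(map + behind, window, MADV_DONTNEED);
            behind += window;
        }
    }
    return sum;
}
```

Because only the mapping is dropped, a second pass over the same range
still sees the same data; it just takes page faults again.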

> The forget behind is tricky and difficult to get right, but if we
> concentrate on aggressive readahead (in this case we will probably be
> o.k.)

Not for very large files: the forget-behind is absolutely critical in
that case.

--Stephen

* Re: Thread implementations...
  1998-06-30 13:10                   ` Stephen C. Tweedie
@ 1998-06-30 19:35                     ` Dean Gaudet
  1998-07-01  9:09                       ` Stephen C. Tweedie
  0 siblings, 1 reply; 26+ messages in thread
From: Dean Gaudet @ 1998-06-30 19:35 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Eric W. Biederman, Christoph Rohland, linux-kernel, linux-mm

On Tue, 30 Jun 1998, Stephen C. Tweedie wrote:

> Not for very large files: the forget-behind is absolutely critical in
> that case.

I dunno why you're thinking of unmapping pages though... isn't an mmap
cache the best way to amortize the extra cost of mmap()ing?  In that case
you don't want the forget-behind pages to be unmapped.  But you do want
them to be dropped from memory when appropriate.

Another thought re: sendfile.  The network layer could hint to sendfile as
to the speed of the socket it's delivering to.  With that hint and some
suitable queueing theory someone should be able to get a nifty little
algorithm that will "synchronize" sockets as much as possible without
noticeable delays to the user.  By "synchronize" I mean getting them going
from the same, or nearby pages.  That way on larger than memory data sets
the kernel can sacrifice some latency on a few connections in order to
improve the total throughput. 

I won't pretend to have a good heuristic for it ;) 

applications:  multimedia servers -- audio/video streaming.  These boxes
can be limited by disk bandwidth because their data sets are typically
much larger than RAM. 

Dean

* Re: Thread implementations...
  1998-06-30 19:35                     ` Dean Gaudet
@ 1998-07-01  9:09                       ` Stephen C. Tweedie
  0 siblings, 0 replies; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-07-01  9:09 UTC (permalink / raw)
  To: Dean Gaudet
  Cc: Stephen C. Tweedie, Eric W. Biederman, Christoph Rohland,
	linux-kernel, linux-mm

Hi,

On Tue, 30 Jun 1998 12:35:35 -0700 (PDT), Dean Gaudet
<dgaudet-list-linux-kernel@arctic.org> said:

> On Tue, 30 Jun 1998, Stephen C. Tweedie wrote:

>> Not for very large files: the forget-behind is absolutely critical in
>> that case.

> I dunno why you're thinking of unmapping pages though...  But you do
> want them to be dropped from memory when appropriate.

We want to *physically* unmap them from the page tables.  You can't
evict the pages from cache if they are still physically mapped!

--Stephen

* Re: Thread implementations...
  1998-06-24 22:00     ` Eric W. Biederman
  1998-06-24 23:41       ` Richard Gooch
@ 1998-06-25  4:12       ` Dean Gaudet
  1998-06-25  3:53         ` Richard Gooch
  1998-06-25  4:56         ` Eric W. Biederman
  1 sibling, 2 replies; 26+ messages in thread
From: Dean Gaudet @ 1998-06-25  4:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Richard Gooch, linux-kernel, linux-mm

On 24 Jun 1998, Eric W. Biederman wrote:

> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
> 
> RG> If we get madvise(2) right, we don't need sendfile(2), correct?
> 
> It looks like it from here.  As far as madvise goes, I think we need
> to implement madvise(2) as:

... note that mmap() requires a bunch of kernel structures set up to map
things into the program's memory space... when in reality the program
doesn't care at all about the bytes.  (And then there's process address
space limitations...)  sendfile() and such don't have these problems, and
it may be far more simple to implement sendfile() than it would be to put
all the hints and such into the mm layer to get mmap() performance up to
the same level. 
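
As a point of comparison, a minimal sketch of the sendfile() style,
assuming the sendfile(2) interface that Linux later shipped in
<sys/sendfile.h> (not anything that existed when this was written).  The
helper name send_file_direct is hypothetical.  Note that the file is
never mapped into the process at all.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the whole file at `path` to `out_fd` with
 * sendfile(2).  Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_file_direct(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    off_t offset = 0;           /* advanced by sendfile() on each call */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0) {           /* error or unexpected end of file */
            close(fd);
            return -1;
        }
    }
    close(fd);
    return (ssize_t)offset;
}
```

A server would call send_file_direct(path, client_socket) once per
request; no vma, page tables, or address space are consumed per file.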

Dean

* Re: Thread implementations...
  1998-06-25  4:12       ` Dean Gaudet
@ 1998-06-25  3:53         ` Richard Gooch
  1998-06-25 11:32           ` Stephen C. Tweedie
  1998-06-25  4:56         ` Eric W. Biederman
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Gooch @ 1998-06-25  3:53 UTC (permalink / raw)
  To: Dean Gaudet; +Cc: Eric W. Biederman, linux-kernel, linux-mm

Dean Gaudet writes:
> 
> 
> On 24 Jun 1998, Eric W. Biederman wrote:
> 
> > >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
> > 
> > RG> If we get madvise(2) right, we don't need sendfile(2), correct?
> > 
> > It looks like it from here.  As far as madvise goes, I think we need
> > to implement madvise(2) as:
> 
> ... note that mmap() requires a bunch of kernel structures set up to map
> things into the program's memory space... when in reality the program
> doesn't care at all about the bytes.  (And then there's process address
> space limitations...)  sendfile() and such don't have these problems, and
> it may be far more simple to implement sendfile() than it would be to put
> all the hints and such into the mm layer to get mmap() performance up to
> the same level. 

This may be true, but my point is that we *need* a decent madvise(2)
implementation. It will be of use to a greater range of applications than
sendfile(2).

				Regards,

					Richard....

* Re: Thread implementations...
  1998-06-25  3:53         ` Richard Gooch
@ 1998-06-25 11:32           ` Stephen C. Tweedie
  1998-06-25 21:24             ` Chris Wedgwood
  1998-06-25 22:16             ` Richard Gooch
  0 siblings, 2 replies; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-06-25 11:32 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Dean Gaudet, Eric W. Biederman, linux-kernel, linux-mm

Hi,

On Thu, 25 Jun 1998 13:53:36 +1000, Richard Gooch
<Richard.Gooch@atnf.CSIRO.AU> said:

> This may be true, but my point is that we *need* a decent madvise(2)
> implementation. It will be of use to a greater range of applications than
> sendfile(2).

Not necessarily; we may be able to detect a lot of the relevant access
patterns ourselves.  Ingo has had a swap prediction algorithm for a
while, and we talked at Usenix about a number of other things we can do
to tune vm performance automatically.  2.3 ought to be a great deal
better.  madvise() may still have merit, but we really ought to be
aiming at making the vm system as self-tuning as possible.

--Stephen

* Re: Thread implementations...
  1998-06-25 11:32           ` Stephen C. Tweedie
@ 1998-06-25 21:24             ` Chris Wedgwood
  1998-06-25 22:16             ` Richard Gooch
  1 sibling, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 1998-06-25 21:24 UTC (permalink / raw)
  To: Stephen C. Tweedie, Richard Gooch
  Cc: Dean Gaudet, Eric W. Biederman, linux-kernel, linux-mm

> Not necessarily; we may be able to detect a lot of the relevant access
> patterns ourselves.  Ingo has had a swap prediction algorithm for a
> while, and we talked at Usenix about a number of other things we can do
> to tune vm performance automatically.  2.3 ought to be a great deal
> better.  madvise() may still have merit, but we really ought to be
> aiming at making the vm system as self-tuning as possible.

madvise(2) will _always_ have some uses.

Large database applications and stuff can know in advance how to tune mmap
regions and stuff. The kernel will always be second guessing here, and
making suboptimal decisions, whereas the application can and probably does
know better.

The same argument also applies to raw devices (but let's not start that
thread again).



-Chris

* Re: Thread implementations...
  1998-06-25 11:32           ` Stephen C. Tweedie
  1998-06-25 21:24             ` Chris Wedgwood
@ 1998-06-25 22:16             ` Richard Gooch
  1 sibling, 0 replies; 26+ messages in thread
From: Richard Gooch @ 1998-06-25 22:16 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm

Stephen C. Tweedie writes:
> Hi,
> 
> On Thu, 25 Jun 1998 13:53:36 +1000, Richard Gooch
> <Richard.Gooch@atnf.CSIRO.AU> said:
> 
> > This may be true, but my point is that we *need* a decent madvise(2)
> > implementation. It will be of use to a greater range of applications than
> > sendfile(2).
> 
> Not necessarily; we may be able to detect a lot of the relevant access
> patterns ourselves.  Ingo has had a swap prediction algorithm for a
> while, and we talked at Usenix about a number of other things we can do
> to tune vm performance automatically.  2.3 ought to be a great deal
> better.  madvise() may still have merit, but we really ought to be
> aiming at making the vm system as self-tuning as possible.

Including when I access my tiled data?

				Regards,

					Richard....

* Re: Thread implementations...
  1998-06-25  4:12       ` Dean Gaudet
  1998-06-25  3:53         ` Richard Gooch
@ 1998-06-25  4:56         ` Eric W. Biederman
  1998-06-25 11:35           ` Stephen C. Tweedie
  1 sibling, 1 reply; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-25  4:56 UTC (permalink / raw)
  To: Dean Gaudet; +Cc: Eric W. Biederman, Richard Gooch, linux-kernel, linux-mm

>>>>> "DG" == Dean Gaudet <dgaudet-list-linux-kernel@arctic.org> writes:

DG> On 24 Jun 1998, Eric W. Biederman wrote:

>> >>>>> "RG" == Richard Gooch <Richard.Gooch@atnf.CSIRO.AU> writes:
>> 
RG> If we get madvise(2) right, we don't need sendfile(2), correct?
>> 
>> It looks like it from here.  As far as madvise goes, I think we need
>> to implement madvise(2) as:

DG> ... note that mmap() requires a bunch of kernel structures set up to map
DG> things into the program's memory space... when in reality the program
DG> doesn't care at all about the bytes.  (And then there's process address
DG> space limitations...)  sendfile() and such don't have these problems, and
DG> it may be far more simple to implement sendfile() than it would be to put
DG> all the hints and such into the mm layer to get mmap() performance up to
DG> the same level. 

mmap, madvise(SEQUENTIAL),write 
is easy to implement.  The mmap layer already does readahead, all we
do is tell it not to be so conservative.

Meanwhile to write sendfile, you need to do all of the same work
(except the page tables) without an interface to do it with.
madvise looks simpler from here.
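
A minimal sketch of that mmap, madvise(SEQUENTIAL), write sequence as
seen from userspace, assuming the POSIX mmap()/write() calls and the
MADV_SEQUENTIAL hint as found on current Linux and BSD systems.  The
helper name send_file_mmap is hypothetical, and error handling is kept
to the bare minimum.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send the whole file at `path` to `out_fd` using
 * mmap + madvise(MADV_SEQUENTIAL) + write.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t send_file_mmap(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mmap fails; nothing to send */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file referenced */
    if (map == MAP_FAILED)
        return -1;

    /* Advice only: the kernel may ignore it, so the result is not checked. */
    (void)madvise(map, len, MADV_SEQUENTIAL);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out_fd, map + done, len - done);
        if (n <= 0) {
            munmap(map, len);
            return -1;
        }
        done += (size_t)n;
    }
    munmap(map, len);
    return (ssize_t)done;
}
```

The madvise() call is the only place the access pattern is declared;
everything else is an ordinary mmap-and-write loop.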

Eric

* Re: Thread implementations...
  1998-06-25  4:56         ` Eric W. Biederman
@ 1998-06-25 11:35           ` Stephen C. Tweedie
  1998-06-25 20:31             ` Dean Gaudet
  1998-06-30  6:40             ` Eric W. Biederman
  0 siblings, 2 replies; 26+ messages in thread
From: Stephen C. Tweedie @ 1998-06-25 11:35 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Dean Gaudet, Richard Gooch, linux-kernel, linux-mm

Hi,

On 24 Jun 1998 23:56:28 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:

> mmap, madvise(SEQUENTIAL),write 
> is easy to implement.  The mmap layer already does readahead, all we
> do is tell it not to be so conservative.

Swap readahead is also now possible.  However, madvise(SEQUENTIAL) needs
to do much more than this; it needs to aggressively track what region of
the vma is being actively used, and to unmap those areas no longer in
use.  (They can remain in cache until the memory is needed for something
else, of course.)  The madvise is only going to be important if the
whole file / vma does not fit into memory, so having advice that a piece
of memory not recently accessed is unlikely to be accessed again until
the next sequential pass is going to be very valuable.  It will prevent
us from having to swap out more useful stuff.

--Stephen

* Re: Thread implementations...
  1998-06-25 11:35           ` Stephen C. Tweedie
@ 1998-06-25 20:31             ` Dean Gaudet
  1998-06-30  6:40             ` Eric W. Biederman
  1 sibling, 0 replies; 26+ messages in thread
From: Dean Gaudet @ 1998-06-25 20:31 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Eric W. Biederman, Richard Gooch, linux-kernel, linux-mm



On Thu, 25 Jun 1998, Stephen C. Tweedie wrote:

> Hi,
> 
> On 24 Jun 1998 23:56:28 -0500, ebiederm+eric@npwt.net (Eric
> W. Biederman) said:
> 
> > mmap, madvise(SEQUENTIAL),write 
> > is easy to implement.  The mmap layer already does readahead, all we
> > do is tell it not to be so conservative.
> 
> Swap readahead is also now possible.  However, madvise(SEQUENTIAL) needs
> to do much more than this; it needs to aggressively track what region of
> the vma is being actively used, and to unmap those areas no longer in
> use.

Remember it's *regions* not just a region.  An http/ftp server sends the
same file over and over and over.  There are many cursors moving
sequentially within the same file.  A threaded http/ftp server will have a
single mmap, and multiple users of that mmap. 
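
To make the "many cursors" point concrete, a small sketch of the
bookkeeping it implies.  This is purely illustrative: struct shared_map
and forget_limit are hypothetical names, and nothing like this exists in
the mm layer.  With several sequential readers on one mapping, only the
pages below the slowest cursor are candidates for forget-behind.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for one mapping shared by several readers. */
struct shared_map {
    size_t len;             /* length of the mapping in bytes */
    size_t ncursors;        /* number of active readers */
    const size_t *cursor;   /* per-reader byte offsets into the mapping */
};

/*
 * Returns the page-aligned offset below which no *current* reader will
 * read again.  With no readers nothing is known, so nothing is forgotten.
 */
size_t forget_limit(const struct shared_map *m, size_t page_size)
{
    if (m->ncursors == 0)
        return 0;

    size_t lo = m->len;     /* cursors past the end are clamped to len */
    for (size_t i = 0; i < m->ncursors; i++) {
        if (m->cursor[i] < lo)
            lo = m->cursor[i];
    }
    return lo - (lo % page_size);   /* round down to a page boundary */
}
```

Even this is optimistic: a new client starting at offset 0 pulls the
limit straight back to the beginning of the file.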

Dean

* Re: Thread implementations...
  1998-06-25 11:35           ` Stephen C. Tweedie
  1998-06-25 20:31             ` Dean Gaudet
@ 1998-06-30  6:40             ` Eric W. Biederman
  1 sibling, 0 replies; 26+ messages in thread
From: Eric W. Biederman @ 1998-06-30  6:40 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Dean Gaudet, Richard Gooch, linux-kernel, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,
ST> On 24 Jun 1998 23:56:28 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:

>> mmap, madvise(SEQUENTIAL),write 
>> is easy to implement.  The mmap layer already does readahead, all we
>> do is tell it not to be so conservative.

ST> Swap readahead is also now possible.  However, madvise(SEQUENTIAL) needs
ST> to do much more than this; 

In the long term I agree.  We can get a close approximation to the
proper behavior by simply doing aggressive readahead.  This is doable
now, and should work in the presence of multiple readers.

ST> it needs to aggressively track what region of
ST> the vma is being actively used, and to unmap those areas no longer in
ST> use.  (They can remain in cache until the memory is needed for something
ST> else, of course.)  The madvise is only going to be important if the
ST> whole file / vma does not fit into memory, 

Actually it will be important if the whole working set of data (which
in a web server would be _all_ of its files) is too large to fit into
memory.  Each file / vma may fit in fine.

ST> so having advice that a piece
ST> of memory not recently accessed is unlikely to be accessed again until
ST> the next sequential pass is going to be very valuable.  It will prevent
ST> us from having to swap out more useful stuff.

Agreed.

Eric

Thread overview: 26+ messages
1998-06-30 19:30 Thread implementations Larry McVoy
1998-07-01  8:50 ` Stephen C. Tweedie
1998-07-03 15:21   ` Rik van Riel
1998-07-03 20:05     ` Stephen C. Tweedie
1998-07-03 20:36       ` Rik van Riel
1998-07-04 16:37         ` Stephen C. Tweedie
     [not found] <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>
     [not found] ` <Pine.LNX.3.96dg4.980624025515.26983E-100000@twinlark.arctic.org>
     [not found]   ` <199806241213.WAA10661@vindaloo.atnf.CSIRO.AU>
1998-06-24 22:00     ` Eric W. Biederman
1998-06-24 23:41       ` Richard Gooch
1998-06-25  4:45         ` Eric W. Biederman
1998-06-25 17:14           ` Todd Larason
1998-06-26  7:53           ` Christoph Rohland
1998-06-26 14:16             ` Eric W. Biederman
1998-06-29 10:19               ` Stephen C. Tweedie
1998-06-30  6:19                 ` Eric W. Biederman
1998-06-30 13:10                   ` Stephen C. Tweedie
1998-06-30 19:35                     ` Dean Gaudet
1998-07-01  9:09                       ` Stephen C. Tweedie
1998-06-25  4:12       ` Dean Gaudet
1998-06-25  3:53         ` Richard Gooch
1998-06-25 11:32           ` Stephen C. Tweedie
1998-06-25 21:24             ` Chris Wedgwood
1998-06-25 22:16             ` Richard Gooch
1998-06-25  4:56         ` Eric W. Biederman
1998-06-25 11:35           ` Stephen C. Tweedie
1998-06-25 20:31             ` Dean Gaudet
1998-06-30  6:40             ` Eric W. Biederman
