journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code)

* journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
       [not found]           ` <20000606205447.T23701@redhat.com>
@ 2000-06-06 23:06             ` Rik van Riel
  2000-06-07  1:19               ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
  2000-06-07 10:10               ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not " Stephen C. Tweedie
       [not found]             ` <393DACC8.5DB60A81@reiser.to>
  1 sibling, 2 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-06 23:06 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hans Reiser, bert hubert, linux-kernel, Chris Mason, linux-mm

On Tue, 6 Jun 2000, Stephen C. Tweedie wrote:

> It wasn't a journaling API we were talking about for this.  The
> problem is much more central to the VM than that --- basically,
> the VM currently assumes that any existing page can be evicted
> from memory with very little extra work.  It just isn't prepared
> for the situation that you have with transactions,


> journaling itself, but the transactional requirements which are
> the problem --- basically the VM cannot do _anything_ about
> individual pages which are pinned by a transaction, but rather
> we need a way to trigger a filesystem flush, AND to prevent more
> dirtying of pages by the filesystem (these are two distinct
> problems), or we just lock up under load on lower memory boxes.

This is especially tricky in the case of a large mmap()ed
file. We'll have to restrict the maximum number of read-write
mapped pages from such a file in order to keep the system
stable...

(try mmap002 from quintela's MM test suite with a journaling
FS for a nice change...)

> A reservation API which lets all transactional filesystems
> reserve the right to dirty a certain number of pages in advance
> of actually needing them is really needed to avoid such lockups.  
> The reservation call can stall if the memory limit has been
> reached, providing flow control to the filesystem; and a
> notification list can start committing and flushing older
> transactions when that happens.

Indeed we need this. Since I seem to be coordinating the VM
changes at the moment anyway, I'd love to work together with
the journaling folks on solving this problem...

It will require some changes in the page fault path and some
other areas...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-06 23:06             ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code) Rik van Riel
@ 2000-06-07  1:19               ` Hans Reiser
  2000-06-07  1:46                 ` Quintela Carreira Juan J.
  2000-06-07 11:12                 ` Stephen C. Tweedie
  2000-06-07 10:10               ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not " Stephen C. Tweedie
  1 sibling, 2 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07  1:19 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, bert hubert, linux-kernel, Chris Mason,
	linux-mm, Alexander Zarochentcev

Rik van Riel wrote:
> 
> On Tue, 6 Jun 2000, Stephen C. Tweedie wrote:
> 
> > It wasn't a journaling API we were talking about for this.  The
> > problem is much more central to the VM than that --- basically,
> > the VM currently assumes that any existing page can be evicted
> > from memory with very little extra work.  It just isn't prepared
> > for the situation that you have with transactions,
> 
> > journaling itself, but the transactional requirements which are
> > the problem --- basically the VM cannot do _anything_ about
> > individual pages which are pinned by a transaction, but rather
> > we need a way to trigger a filesystem flush, AND to prevent more
> > dirtying of pages by the filesystem (these are two distinct
> > problems), or we just lock up under load on lower memory boxes.
> 
> This is especially tricky in the case of a large mmap()ed
> file. We'll have to restrict the maximum number of read-write
> mapped pages from such a file in order to keep the system
> stable...
> 
> (try mmap002 from quintela's MM test suite with a journaling
> FS for a nice change...)
> 
> > A reservation API which lets all transactional filesystems
> > reserve the right to dirty a certain number of pages in advance
> > of actually needing them is really needed to avoid such lockups.
> > The reservation call can stall if the memory limit has been
> > reached, providing flow control to the filesystem; and a
> > notification list can start committing and flushing older
> > transactions when that happens.
> 
> Indeed we need this. Since I seem to be coordinating the VM
> changes at the moment anyway, I'd love to work together with
> the journaling folks on solving this problem...
> 
> It will require some changes in the page fault path and some
> other areas...
> 
> regards,
> 
> Rik
> --
> The Internet is not a network of computers. It is a network
> of people. That is its real strength.
> 
> Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
> http://www.conectiva.com/               http://www.surriel.com/


Quite happy to see you drive it.  I suggest checking with zam, as he has some code
in progress.

There are two issues to address:

1) If a buffer needs to be flushed to disk, how do we let the FS flush
everything else that it is optimal to flush at the same time as that buffer?
zam's allocate on flush code addresses that issue for reiserfs, and he has some
general hooks implemented also.  My guess is that he is two weeks away.

2) If multiple kernel subsystem page pinners pin memory, how do we keep them
from deadlocking?  Chris, as you know, is the reiserfs guy for that.

Hans

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07  1:19               ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
@ 2000-06-07  1:46                 ` Quintela Carreira Juan J.
  2000-06-07  3:45                   ` Hans Reiser
  2000-06-07 11:12                 ` Stephen C. Tweedie
  1 sibling, 1 reply; 60+ messages in thread
From: Quintela Carreira Juan J. @ 2000-06-07  1:46 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Rik van Riel, Stephen C. Tweedie, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "hans" == Hans Reiser <hans@reiser.to> writes:

Hi

hans> Quite happy to see you drive it.  I suggest checking with zam, as he has some code
hans> in progress.

hans> There are two issues to address:

hans> 1) If a buffer needs to be flushed to disk, how do we let the FS flush
hans> everything else that it is optimal to flush at the same time as that buffer?
hans> zam's allocate on flush code addresses that issue for reiserfs, and he has some
hans> general hooks implemented also.  My guess is that he is two weeks away.

Ok, register a cache function and it will receive the _priority_ (also
known as _how hard_ it should try to free memory).  Once that memory is
freed, put those pages on the LRU list.  There is no need to have them there
before, because there is no way that shrink_mmap would be able to free
them anyway.

This is the reason why I think that one operation in the
address space makes no sense.  It makes no sense because it can't be called
from the page.

hans> 2) If multiple kernel subsystem page pinners pin memory, how do we keep them
hans> from deadlocking?  Chris, as you know, is the reiserfs guy for that.

I think that Riel is also working on that just now.  I think that it is
better to find one API that is good for everybody.

I would also like to see some common API for this kind of allocation
of memory.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07  1:46                 ` Quintela Carreira Juan J.
@ 2000-06-07  3:45                   ` Hans Reiser
  2000-06-07 11:15                     ` Stephen C. Tweedie
  0 siblings, 1 reply; 60+ messages in thread
From: Hans Reiser @ 2000-06-07  3:45 UTC (permalink / raw)
  To: Quintela Carreira Juan J.
  Cc: Rik van Riel, Stephen C. Tweedie, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

"Quintela Carreira Juan J." wrote:
> 
> >>>>> "hans" == Hans Reiser <hans@reiser.to> writes:
> 
> Hi
> 
> hans> Quite happy to see you drive it.  I suggest checking with zam, as he has some code
> hans> in progress.
> 
> hans> There are two issues to address:
> 
> hans> 1) If a buffer needs to be flushed to disk, how do we let the FS flush
> hans> everything else that it is optimal to flush at the same time as that buffer?
> hans> zam's allocate on flush code addresses that issue for reiserfs, and he has some
> hans> general hooks implemented also.  My guess is that he is two weeks away.
> 
> Ok, register a cache function and it will receive the _priority_ (also
> known as _how hard_ it should try to free memory).  Once that memory is
> freed, put those pages on the LRU list.  There is no need to have them there
> before, because there is no way that shrink_mmap would be able to free
> them anyway.
> 
> This is the reason why I think that one operation in the
> address space makes no sense.  It makes no sense because it can't be called
> from the page.

What do you think of my argument that each of the subcaches should register
currently_consuming counters which are the number of pages that subcache
currently takes up in memory, plus register an integer "preciousness" value, and
that the pressure API should pressure according to the formula:

pressure equals currently_consuming squared times preciousness

Further, that the equation above should be a nice one line formula in one place
in the kernel so that we can easily play with variations on it and benchmark the
results.
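
A minimal sketch of what such a one-line formula could look like in C.
The structure and function names are hypothetical and chosen only to
mirror the wording above; they are not an existing kernel interface,
and the choice of 64-bit counters is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical per-subcache record.  The field names follow the
 * wording of this mail; this is not an existing kernel structure.
 */
struct subcache_sketch {
    uint64_t currently_consuming;  /* pages this subcache holds in memory */
    uint64_t preciousness;         /* registered integer weight */
};

/* The formula in one place: currently_consuming squared times preciousness. */
uint64_t subcache_pressure(const struct subcache_sketch *c)
{
    return c->currently_consuming * c->currently_consuming * c->preciousness;
}
```

Keeping the arithmetic in a single function is what would make it easy
to try variations (a different exponent, say) and benchmark them.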

I don't like the current scheme of priorities of caches, it seems wrong to me
intuitively.

> 
> hans> 2) If multiple kernel subsystem page pinners pin memory, how do we keep them
> hans> from deadlocking?  Chris, as you know, is the reiserfs guy for that.
> 
> I think that Riel is also working on that just now.  I think that it is
> better to find one API that is good for everybody.

I think the issue is not who can do it well, but would somebody finally just do
it?  We have discussed it for 9 months now on fsdevel....:-)

> 
> I would also like to see some common API for this kind of allocation
> of memory.
> 
> Later, Juan.
> 
> --
> In theory, practice and theory are the same, but in practice they
> are different -- Larry McVoy

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-06 23:06             ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code) Rik van Riel
  2000-06-07  1:19               ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
@ 2000-06-07 10:10               ` Stephen C. Tweedie
  1 sibling, 0 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 10:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm

Hi,

On Tue, Jun 06, 2000 at 08:06:38PM -0300, Rik van Riel wrote:
> 
> > journaling itself, but the transactional requirements which are
> > the problem --- basically the VM cannot do _anything_ about
> > individual pages which are pinned by a transaction, but rather
> > we need a way to trigger a filesystem flush, AND to prevent more
> > dirtying of pages by the filesystem (these are two distinct
> > problems), or we just lock up under load on lower memory boxes.
> 
> This is especially tricky in the case of a large mmap()ed
> file. We'll have to restrict the maximum number of read-write
> mapped pages from such a file in order to keep the system
> stable...

We need to restrict *all* pinned pages.  That includes writable
pages on a transactional filesystem, but also includes metadata
being used as part of an existing transaction, as well as any
potential metadata which *might* be used in the future by that
transaction.

> Indeed we need this. Since I seem to be coordinating the VM
> changes at the moment anyway, I'd love to work together with
> the journaling folks on solving this problem...

OK, I'll look up the old writeups I did with Chris about this.

Cheers,
 Stephen

* Re: reiserfs being part of the kernel: it's not just the code
       [not found]             ` <393DACC8.5DB60A81@reiser.to>
@ 2000-06-07 11:00               ` Stephen C. Tweedie
  2000-06-07 17:11                 ` Rik van Riel
  2000-06-07 17:46                 ` Hans Reiser
  0 siblings, 2 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 11:00 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, bert hubert, linux-kernel, Chris Mason, linux-mm

Hi,

On Tue, Jun 06, 2000 at 07:00:40PM -0700, Hans Reiser wrote:
> 
> Am I missing a fine point, or can this reservation API be as simple as using an
> agreed-on counter for total system pinned pages which is constrained to some
> percentage of memory?  I think we all discussed all of this last year, and the
> workshop Riel tried to organize sadly never happened.

It's a good bit more complex than that.  We need not only that reservation
layer, but also a new notification mechanism to invoke early commit if 
we exhaust the reservation limit, and a way of interacting with dirty
pages (which are not yet part of any transaction, but which may not be
flushable to disk without a new transaction being incurred).  The dirty
mmaped data case is particularly nasty: we have very little VM 
infrastructure right now which is suitable for fixing that.

> Perhaps we should do a
> workshop July 5 at the Libre Software conference in France?  Probably this issue
> will already be solved by then, but there are plenty of other discussions to
> have in the vicinity of this problem.

Who will be at Usenix in San Diego in a couple of weeks' time?  There
will certainly be some of the XFS and GFS people there, and I'll be 
around all week.

Cheers, 
 Stephen

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07  1:19               ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
  2000-06-07  1:46                 ` Quintela Carreira Juan J.
@ 2000-06-07 11:12                 ` Stephen C. Tweedie
  2000-06-07 16:35                   ` journaling & VM John Fremlin
  2000-06-07 17:48                   ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
  1 sibling, 2 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 11:12 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Rik van Riel, Stephen C. Tweedie, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Tue, Jun 06, 2000 at 06:19:22PM -0700, Hans Reiser wrote:
> 
> There are two issues to address:
> 
> 1) If a buffer needs to be flushed to disk, how do we let the FS flush
> everything else that it is optimal to flush at the same time as that buffer?
> zam's allocate on flush code addresses that issue for reiserfs, and he has some
> general hooks implemented also.  My guess is that he is two weeks away.

That's easy to deal with using address_space callbacks from shrink_mmap.
shrink_mmap just calls into the filesystem to tell it that something
needs to be done.  The filesystem can, in response, flush as much data
as it wants to in addition to the page requested --- or can flush none
at all if the page is pinned.  The address_space callbacks should be
thought of as hints from the VM that the filesystem needs to do 
something.  shrink_mmap will keep on trying until it finds something
to free if nothing happens on the first call.
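
The callback-as-hint idea can be sketched roughly as follows.  This is a
user-space illustration only: every name in it (sketch_page,
sketch_mapping, sketch_shrink, sketch_fs_pressure) is hypothetical, and
it does not reproduce the real shrink_mmap or address_space code.

```c
#include <stddef.h>

struct sketch_page;

/* Hypothetical per-filesystem hook, standing in for an address_space callback. */
struct sketch_mapping {
    /* Hint from the VM: "something needs to be done about this page".
     * Returns how many pages the filesystem cleaned; 0 is allowed. */
    int (*pressure)(struct sketch_page *page);
};

struct sketch_page {
    struct sketch_mapping *mapping;
    int dirty;   /* must be written before it can be freed */
    int pinned;  /* held by a running transaction */
    int freed;   /* already reclaimed by an earlier scan */
};

/* Scan until `want` pages are freed or the list is exhausted. */
int sketch_shrink(struct sketch_page *pages, size_t n, int want)
{
    int freed = 0;

    for (size_t i = 0; i < n && freed < want; i++) {
        struct sketch_page *p = &pages[i];

        if (p->freed)
            continue;
        if (p->dirty && p->mapping && p->mapping->pressure)
            p->mapping->pressure(p);  /* only a hint; the fs may do nothing */
        if (!p->dirty && !p->pinned) {
            p->freed = 1;
            freed++;
        }
        /* otherwise keep scanning for something else to free */
    }
    return freed;
}

/* Example filesystem response: write the page unless it is pinned. */
int sketch_fs_pressure(struct sketch_page *page)
{
    if (page->pinned)
        return 0;
    page->dirty = 0;
    return 1;
}
```

A filesystem that wants to cluster IO could clean neighbouring pages in
its callback as well; the scanning loop does not need to know.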

> 2) If multiple kernel subsystem page pinners pin memory, how do we keep them
> from deadlocking?  Chris, as you know, is the reiserfs guy for that.

Use reservations.  That's the point --- you reserve in advance, so that 
the VM can *guarantee* that you can continue to pin more pages up to
the maximum you have reserved.  You take a reservation before starting
a fs operation, so that if you need to block, it doesn't prevent the
running transaction from being committed.
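
As a rough, single-threaded sketch of the accounting behind such a
reservation scheme (the names pin_budget, pin_reserve and pin_release
are hypothetical, and a real implementation would block the caller and
trigger commits instead of returning an error):

```c
#include <stddef.h>

/* Hypothetical global accounting for pinnable pages. */
struct pin_budget {
    size_t limit;     /* maximum pages that may be promised at once */
    size_t reserved;  /* pages promised to callers and not yet released */
};

/*
 * Ask for the right to pin up to `pages` pages before starting a
 * filesystem operation.  Returns 0 on success.  Returns -1 at the
 * point where a real implementation would stall the caller and start
 * committing older transactions; this sketch only reports it.
 */
int pin_reserve(struct pin_budget *b, size_t pages)
{
    if (pages > b->limit - b->reserved)
        return -1;
    b->reserved += pages;
    return 0;
}

/* Give a reservation back once the transaction's pages are on disk. */
void pin_release(struct pin_budget *b, size_t pages)
{
    b->reserved -= (pages < b->reserved) ? pages : b->reserved;
}
```

Because the reservation is taken before the operation starts, a caller
that has to wait is not yet holding pages inside the running
transaction, so the wait cannot block that transaction's commit.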

Cheers,
 Stephen

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07  3:45                   ` Hans Reiser
@ 2000-06-07 11:15                     ` Stephen C. Tweedie
  2000-06-07 13:23                       ` Rik van Riel
  2000-06-07 13:40                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Chris Mason
  0 siblings, 2 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 11:15 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, Stephen C. Tweedie, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Tue, Jun 06, 2000 at 08:45:08PM -0700, Hans Reiser wrote:
> > 
> > This is the reason why I think that one operation in the
> > address space makes no sense.  It makes no sense because it can't be called
> > from the page.
> 
> What do you think of my argument that each of the subcaches should register
> currently_consuming counters which are the number of pages that subcache
> currently takes up in memory,

There is no need for subcaches at all if all of the pages can be
represented on the page cache LRU lists.  That would certainly make
balancing between caches easier.  However, there may be caches which
don't fit that model --- how would it work for ReiserFS if the cache 
balancing was all done through the page cache?  There is a lot of 
work being done on the VM to balance the page cache properly right  
now, and if we can use that work for journaling filesystems too, it
will make our final VM a lot less fragile over extreme load conditions.

Cheers, 
Stephen

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07 11:15                     ` Stephen C. Tweedie
@ 2000-06-07 13:23                       ` Rik van Riel
  2000-06-07 13:41                         ` Stephen C. Tweedie
  2000-06-07 19:02                         ` journaling & VM (was: Re: reiserfs being part of the kernel:it'snot " Hans Reiser
  2000-06-07 13:40                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Chris Mason
  1 sibling, 2 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 13:23 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hans Reiser, Quintela Carreira Juan J.,
	bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> On Tue, Jun 06, 2000 at 08:45:08PM -0700, Hans Reiser wrote:
> > > 
> > > This is the reason why I think that one operation in the
> > > address space makes no sense.  It makes no sense because it can't be called
> > > from the page.
> > 
> > What do you think of my argument that each of the subcaches should register
> > currently_consuming counters which are the number of pages that subcache
> > currently takes up in memory,
> 
> There is no need for subcaches at all if all of the pages can be
> represented on the page cache LRU lists.  That would certainly
> make balancing between caches easier.

Wouldn't this mean we could end up with an LRU cache full of
unfreeable pages?

Then we would scan the LRU cache and apply pressure on all of
the filesystems, but then the filesystem could decide it wants
to flush *other* pages than the ones we have on the LRU queue.

This could get particularly nasty when we have a VM with
active / inactive / scavenge lists... (like what I'm working
on now)

Then again, if the filesystem knows which pages we want to
push, it could base the order in which it is going to flush
its blocks on that memory pressure. Then your scheme will
undoubtedly be the more robust one.

Question is, are the filesystems ready to play this game?

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07 11:15                     ` Stephen C. Tweedie
  2000-06-07 13:23                       ` Rik van Riel
@ 2000-06-07 13:40                       ` Chris Mason
  2000-06-07 13:47                         ` Stephen C. Tweedie
  1 sibling, 1 reply; 60+ messages in thread
From: Chris Mason @ 2000-06-07 13:40 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hans Reiser, Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, linux-mm,
	Alexander Zarochentcev


On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:

> Hi,
> 
> On Tue, Jun 06, 2000 at 08:45:08PM -0700, Hans Reiser wrote:
> > > 
> > > This is the reason why I think that one operation in the
> > > address space makes no sense.  It makes no sense because it can't be called
> > > from the page.
> > 
> > What do you think of my argument that each of the subcaches should register
> > currently_consuming counters which are the number of pages that subcache
> > currently takes up in memory,
> 
> There is no need for subcaches at all if all of the pages can be
> represented on the page cache LRU lists.  That would certainly make
> balancing between caches easier.  However, there may be caches which
> don't fit that model --- how would it work for ReiserFS if the cache 
> balancing was all done through the page cache?  

Right now, almost all of the pinned pages will be buffer cache pages, and only
metadata is logged.  But, sometimes a data block must be flushed before
transaction commit, and those pages are pinned, but can be written at any
time.  I'm not sure I fully understand the issues with doing all the
balancing through the page cache...

Allocate on flush will be different, and the address_space->pressure()
method makes even more sense there.  Those pages will be on the LRU lists,
and you want the pressure function to be called on each page.

-chris

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07 13:23                       ` Rik van Riel
@ 2000-06-07 13:41                         ` Stephen C. Tweedie
  2000-06-07 14:27                           ` Rik van Riel
  2000-06-07 19:02                         ` journaling & VM (was: Re: reiserfs being part of the kernel:it'snot " Hans Reiser
  1 sibling, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 13:41 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Hans Reiser, Quintela Carreira Juan J.,
	bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 10:23:35AM -0300, Rik van Riel wrote:
> > 
> > There is no need for subcaches at all if all of the pages can be
> > represented on the page cache LRU lists.  That would certainly
> > make balancing between caches easier.
> 
> Wouldn't this mean we could end up with an LRU cache full of
> unfreeable pages?

Rik, we need the VM to track dirty pages anyway, precisely so that
we can obtain some degree of write throttling to avoid having the
whole of memory full of dirty pages.

If we get short of memory, we really need to start flushing dirty
pages to disk independently of the task of finding free pages.  
Interrupts cannot wait for IO to complete --- they need the free 
memory immediately.  Page cleaning needs to be identified as a 
very different job from page reclaiming.  Whatever list we use to
track dirty pages can equally well be used for callbacks to 
transactional filesystems.

> This could get particularly nasty when we have a VM with
> active / inactive / scavenge lists... (like what I'm working
> on now)

Right, we definitely need a better distinction between different
lists and different types of page activity before we can do this.

> Question is, are the filesystems ready to play this game?

With an address_space callback, yes --- ext3 can certainly find
a transaction covering a given page.  I'd imagine reiserfs can do
something similar, but even if not, it's not important if the
filesystem can't do its lookup by page.  The mere fact that the
filesystem sees the VM trying to scavenge dirty pages can trigger
it into starting to flush its oldest transactions, and that is 
something that all filesystems should be able to do easily.

Cheers,
 Stephen

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07 13:40                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Chris Mason
@ 2000-06-07 13:47                         ` Stephen C. Tweedie
  0 siblings, 0 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 13:47 UTC (permalink / raw)
  To: Chris Mason
  Cc: Stephen C. Tweedie, Hans Reiser, Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, linux-mm,
	Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 06:40:24AM -0700, Chris Mason wrote:
> 
> Right now, almost all of the pinned pages will be buffer cache pages, and only
> metadata is logged.  But, sometimes a data block must be flushed before
> transaction commit, and those pages are pinned, but can be written at any
> time.  I'm not sure I fully understand the issues with doing all the
> balancing through the page cache...

In 2.4, it's not a problem in principle to keep the buffer cache pages
on the page cache LRUs, even if they are not on the page cache hash 
lists.

> Allocate on flush will be different, and the address_space->pressure()
> method makes even more sense there.  Those pages will be on the LRU lists,
> and you want the pressure function to be called on each page.

Absolutely.

Cheers,
 Stephen

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it's not just the code)
  2000-06-07 13:41                         ` Stephen C. Tweedie
@ 2000-06-07 14:27                           ` Rik van Riel
  2000-06-07 14:46                             ` Stephen C. Tweedie
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 14:27 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hans Reiser, Quintela Carreira Juan J.,
	bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> On Wed, Jun 07, 2000 at 10:23:35AM -0300, Rik van Riel wrote:
> > > 
> > > There is no need for subcaches at all if all of the pages can be
> > > represented on the page cache LRU lists.  That would certainly
> > > make balancing between caches easier.
> > 
> > Wouldn't this mean we could end up with an LRU cache full of
> > unfreeable pages?
> 
> Rik, we need the VM to track dirty pages anyway, precisely so
> that we can obtain some degree of write throttling to avoid
> having the whole of memory full of dirty pages.

*nod*

> If we get short of memory, we really need to start flushing dirty
> pages to disk independently of the task of finding free pages.  

Indeed, page replacement and page flushing need to be
pretty much independent of each other, with the only
gotcha that page replacement is able to trigger page
flushing...

> > This could get particularly nasty when we have a VM with
> > active / inactive / scavenge lists... (like what I'm working
> > on now)
> 
> Right, we definitely need a better distinction between different
> lists and different types of page activity before we can do this.

I think we'll want something like what FreeBSD has, mainly
because their feedback loop is really simple and has proven
to be robust.

1) active list
	This list contains the pages which are active, we
	age the pages, they can be mapped in processes
2) inactive list
	All pages here are ready to be reclaimed. We are
	free to reclaim the clean inactive pages before the
	dirty ones (to delay/minimise IO) because no page
	ends up here unless we want to reclaim it anyway.
3) scavenge list   (BSD calls this cache list, -EOVERLOADEDWORD)
	All pages here are clean and can be reclaimed for
	all page allocations which have __GFP_WAIT set. We
	keep only a minimal amount of free pages and most
	times __alloc_pages() is called we'll take a scavenge
	page instead.
4) free list
	Not much of a list, the current free page structure.
	We use the pages here for atomic allocations and, when
	we have too many free pages, for normal allocations.
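
The allocation rule implied by the scavenge and free lists above could
be sketched like this.  All names are hypothetical, can_wait stands in
for __GFP_WAIT being set, and free_high is an assumed "too many free
pages" threshold that the description above does not quantify.

```c
/* The four lists described above, plus "nothing available right now". */
enum sketch_list {
    LIST_ACTIVE,
    LIST_INACTIVE,
    LIST_SCAVENGE,
    LIST_FREE,
    LIST_NONE
};

struct sketch_counts {
    unsigned long nr_scavenge;  /* clean pages, reclaimable at once */
    unsigned long nr_free;      /* pages on the free list */
    unsigned long free_high;    /* assumed "too many free pages" mark */
};

/* Decide which list an allocation should take its page from. */
enum sketch_list sketch_alloc_source(const struct sketch_counts *c, int can_wait)
{
    if (!can_wait)                      /* atomic: free list only */
        return c->nr_free ? LIST_FREE : LIST_NONE;
    if (c->nr_free > c->free_high)      /* surplus free pages */
        return LIST_FREE;
    if (c->nr_scavenge)                 /* the usual case */
        return LIST_SCAVENGE;
    return LIST_NONE;                   /* caller would wait for reclaim */
}
```

Returning LIST_NONE for a waiting allocation when no scavenge pages are
left is a choice made for this sketch; it keeps the remaining free
pages available for atomic allocations.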

The filesystem callbacks would be made for pages on the
inactive list; the filesystem (or shm, or swap subsystem)
is free to cluster any "eligible" pages with the page we
requested to be freed.

So if, eg., we request ext3 to flush page X, the filesystem
can make its own decision on whether it wants to also flush some
other inactive (or even active) pages which are contiguous
on disk with the block page X is written to.
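
A rough sketch of that clustering decision, assuming the filesystem
keeps its pages sorted by on-disk block number. The struct and the
function flush_cluster are hypothetical, and clearing the dirty flag
stands in for actually submitting the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor, for illustration only. */
struct toy_page {
	unsigned long block;	/* on-disk block backing this page */
	bool dirty;		/* needs writing before it can be freed */
};

/*
 * Flush pages[target] together with any dirty neighbours whose blocks
 * are contiguous on disk. The array must be sorted by block number.
 * Returns the number of pages written.
 */
static unsigned flush_cluster(struct toy_page *pages, size_t n, size_t target)
{
	size_t lo = target, hi = target;
	unsigned flushed = 0;

	assert(target < n);
	if (!pages[target].dirty)
		return 0;

	/* Grow the cluster downwards while blocks stay contiguous. */
	while (lo > 0 && pages[lo - 1].dirty &&
	       pages[lo - 1].block + 1 == pages[lo].block)
		lo--;

	/* Grow the cluster upwards in the same way. */
	while (hi + 1 < n && pages[hi + 1].dirty &&
	       pages[hi + 1].block == pages[hi].block + 1)
		hi++;

	for (size_t i = lo; i <= hi; i++) {
		pages[i].dirty = false;	/* stands in for the real write */
		flushed++;
	}
	return flushed;
}
```

The VM only asked for one page, but the filesystem gets to turn the
request into one larger sequential write.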

> > Question is, are the filesystems ready to play this game?
> 
> With an address_space callback, yes --- ext3 can certainly find
> a transaction covering a given page. 

This is what we need...

> I'd imagine reiserfs can do something similar, but even if not,
> it's not important if the filesystem can't do its lookup by
> page.

I don't necessarily agree on this point. What if our
inactive list is filled with pages the filesystem somehow
regards as new, and the filesystem will be busy flushing
the "wrong" (in the eyes of the page stealer) pages?

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 14:27                           ` Rik van Riel
@ 2000-06-07 14:46                             ` Stephen C. Tweedie
  2000-06-07 14:51                               ` bert hubert
  2000-06-07 15:20                               ` Quintela Carreira Juan J.
  0 siblings, 2 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 14:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Hans Reiser, Quintela Carreira Juan J.,
	bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 11:27:56AM -0300, Rik van Riel wrote:
> 
> > I'd imagine reiserfs can do something similar, but even if not,
> > it's not important if the filesystem can't do its lookup by
> > page.
> 
> I don't necessarily agree on this point. What if our
> inactive list is filled with pages the filesystem somehow
> regards as new, and the filesystem will be busy flushing
> the "wrong" (in the eyes of the page stealer) pages?

It doesn't matter.  *If* the filesystem knows better than the 
page cleaner what progress can be made, then let the filesystem
make progress where it can.  There are likely to be transaction
dependencies which mean we have to clean some pages in a specific
order.  As soon as the page cleaner starts exerting back pressure
on the filesystem, the filesystem needs to start clearing stuff,
and if that means we have to start cleaning things that shrink_
mmap didn't expect us to, then that's fine.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 14:46                             ` Stephen C. Tweedie
@ 2000-06-07 14:51                               ` bert hubert
  2000-06-07 15:20                               ` Quintela Carreira Juan J.
  1 sibling, 0 replies; 60+ messages in thread
From: bert hubert @ 2000-06-07 14:51 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

On Wed, Jun 07, 2000 at 03:46:20PM +0100, Stephen C. Tweedie wrote:

> It doesn't matter.  *If* the filesystem knows better than the 
> page cleaner what progress can be made, then let the filesystem
> make progress where it can.  There are likely to be transaction

I'm happy to see you talking to each other in a productive way. Once I said
it wasn't just about the code, all you guys have been talking about is
design :-)

But the main point of this message is that you can stop CC'ing me, as this
is all far over my head. 

Thanks.

Regards,

Bert Hubert.

-- 
                       |              http://www.rent-a-nerd.nl
                       |                     - U N I X -
                       |          Inspice et cautus eris - D11T'95

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 14:46                             ` Stephen C. Tweedie
  2000-06-07 14:51                               ` bert hubert
@ 2000-06-07 15:20                               ` Quintela Carreira Juan J.
  2000-06-07 15:35                                 ` Stephen C. Tweedie
  2000-06-07 20:16                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
  1 sibling, 2 replies; 60+ messages in thread
From: Quintela Carreira Juan J. @ 2000-06-07 15:20 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "stephen" == Stephen C Tweedie <sct@redhat.com> writes:

Hi

stephen> It doesn't matter.  *If* the filesystem knows better than the 
stephen> page cleaner what progress can be made, then let the filesystem
stephen> make progress where it can.  There are likely to be transaction
stephen> dependencies which mean we have to clean some pages in a specific
stephen> order.  As soon as the page cleaner starts exerting back pressure
stephen> on the filesystem, the filesystem needs to start clearing stuff,
stephen> and if that means we have to start cleaning things that shrink_
stephen> mmap didn't expect us to, then that's fine.

I don't like that, if you put some page in the LRU cache, that means
that you think that _this_ page is freeable.  Yes, sometimes that can
fail, but in the _normal_ case things just work that way.  It doesn't
make sense to have pages in the LRU cache that are unfreeable, so that
each time we ask the filesystem code to free them it tells us:
     - Well, that page is actually busy, but I have this other free
       one instead.
What we really need is a notifier to the relevant fs that tells it: we
are short of memory, please free as much memory as possible.  Here "as
much as possible" is an amount related to the priority number (or any
other number).

I like the idea of having pages of Journaled FS in the cache if I can
ask the FS:  free this page, and the fs will write/free that page and
*possibly* more pages, but I am not *interested* in that detail.

If you need pages in the LRU cache only for getting notifications,
then change the system to send notifications each time that we are
short of memory.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:20                               ` Quintela Carreira Juan J.
@ 2000-06-07 15:35                                 ` Stephen C. Tweedie
  2000-06-07 15:41                                   ` Rik van Riel
                                                     ` (3 more replies)
  2000-06-07 20:16                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
  1 sibling, 4 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 15:35 UTC (permalink / raw)
  To: Quintela Carreira Juan J.
  Cc: Stephen C. Tweedie, Rik van Riel, Hans Reiser, bert hubert,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 05:20:41PM +0200, Quintela Carreira Juan J. wrote:
> 
> stephen> It doesn't matter.  *If* the filesystem knows better than the 
> stephen> page cleaner what progress can be made, then let the filesystem
> stephen> make progress where it can.  There are likely to be transaction
> stephen> dependencies which mean we have to clean some pages in a specific
> stephen> order.  As soon as the page cleaner starts exerting back pressure
> stephen> on the filesystem, the filesystem needs to start clearing stuff,
> stephen> and if that means we have to start cleaning things that shrink_
> stephen> mmap didn't expect us to, then that's fine.
> 
> I don't like that, if you put some page in the LRU cache, that means
> that you think that _this_ page is freeable.

Remember that Rik is talking about multiple LRUs.  Pages can only
be on the inactive LRU if they are clean and unpinned, yes, but we
still need a way of tracking pages which are in a more difficult
state.

> If you need pages in the LRU cache only for getting notifications,
> then change the system to send notifications each time that we are
> short of memory.

It's a matter of pressure.  The filesystem with most pages in the LRU
cache, or with the oldest pages there, should stand the greatest chance
of being the first one told to clean up its act.

Cheers, 
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:35                                 ` Stephen C. Tweedie
@ 2000-06-07 15:41                                   ` Rik van Riel
  2000-06-07 15:44                                   ` Juan J. Quintela
                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 15:41 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Hans Reiser, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> On Wed, Jun 07, 2000 at 05:20:41PM +0200, Quintela Carreira Juan J. wrote:
> > 
> > stephen> It doesn't matter.  *If* the filesystem knows better than the 
> > stephen> page cleaner what progress can be made, then let the filesystem
> > stephen> make progress where it can.  There are likely to be transaction
> > stephen> dependencies which mean we have to clean some pages in a specific
> > stephen> order.  As soon as the page cleaner starts exerting back pressure
> > stephen> on the filesystem, the filesystem needs to start clearing stuff,
> > stephen> and if that means we have to start cleaning things that shrink_
> > stephen> mmap didn't expect us to, then that's fine.
> > 
> > I don't like that, if you put some page in the LRU cache, that means
> > that you think that _this_ page is freeable.
> 
> Remember that Rik is talking about multiple LRUs.  Pages can
> only be on the inactive LRU if they are clean and unpinned, yes,
> but we still need a way of tracking pages which are in a more
> difficult state.

That's the scavenge list ;)

The inactive list contains unmapped pages with age 0. From
the inactive list I want to clean the pages whenever there's
demand for memory.

This could potentially mean that the pinned buffers from one
fs would be spread over both the active and the inactive list.

> > If you need pages in the LRU cache only for getting notifications,
> > then change the system to send notifications each time that we are
> > short of memory.
> 
> It's a matter of pressure.  The filesystem with most pages in
> the LRU cache, or with the oldest pages there, should stand the
> greatest chance of being the first one told to clean up its act.

Indeed, the more I think of it the more I think any other
approach than shared-lru is the wrong one.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:35                                 ` Stephen C. Tweedie
  2000-06-07 15:41                                   ` Rik van Riel
@ 2000-06-07 15:44                                   ` Juan J. Quintela
  2000-06-07 17:10                                   ` Jeff V. Merkey
  2000-06-07 20:16                                   ` Hans Reiser
  3 siblings, 0 replies; 60+ messages in thread
From: Juan J. Quintela @ 2000-06-07 15:44 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "stephen" == Stephen C Tweedie <sct@redhat.com> writes:

Hi

stephen> Remember that Rik is talking about multiple LRUs.  Pages can only
stephen> be on the inactive LRU if they are clean and unpinned, yes, but we
stephen> still need a way of tracking pages which are in a more difficult
stephen> state.

erhhh, if I have understood Rik correctly, pages in the inactive queue
can be dirty; they need to be unmapped, but not clean.  Rik, clarify
here, please.  And yes, if you put only unpinned pages in the inactive
queues, I withdraw all my objections :)  But I think that all the
unpinned pages are freeable after a (possibly needed) write.

>> If you need pages in the LRU cache only for getting notifications,
>> then change the system to send notifications each time that we are
>> short of memory.

stephen> It's a matter of pressure.  The filesystem with most pages in the LRU
stephen> cache, or with the oldest pages there, should stand the greatest chance
stephen> of being the first one told to clean up its act.

Then if the 10 oldest pages in the LRU are from that subsystem, we
call a notifier 10 times.  That means that that subsystem will try to
free pages 10 times.  As each time it does its own clustering, etc,
etc, it has freed a *lot* of pages, when we expected to free only
10 pages.  That seems a bit unfair to me.

Thanks a lot for your comments.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM
  2000-06-07 11:12                 ` Stephen C. Tweedie
@ 2000-06-07 16:35                   ` John Fremlin
  2000-06-07 17:11                     ` Stephen C. Tweedie
  2000-06-07 17:48                   ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
  1 sibling, 1 reply; 60+ messages in thread
From: John Fremlin @ 2000-06-07 16:35 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm

"Stephen C. Tweedie" <sct@redhat.com> writes:
[...]
> > There are two issues to address:
> > 
> > 1) If a buffer needs to be flushed to disk, how do we let the FS flush
> > everything else that it is optimal to flush at the same time as that buffer. 
> > zam's allocate on flush code addresses that issue for reiserfs, and he has some
> > general hooks implemented also.  He is guessed to be two weeks away.
> 
> That's easy to deal with using address_space callbacks from shrink_mmap.
> shrink_mmap just calls into the filesystem to tell it that something
> needs to be done.  The filesystem can, in response, flush as much data
> as it wants to in addition to the page requested --- or can flush none
> at all if the page is pinned.  The address_space callbacks should be
> thought of as hints from the VM that the filesystem needs to do 
> something.  shrink_mmap will keep on trying until it finds something
> to free if nothing happens on the first call.
> 
I don't understand the idea behind this. (Clueless newbie alert.)

You are saying, that the MM system maintains a list of pages, then
when it wants to free some memory it goes down the list seeing which
subsystem owns each page, and asks it to free some memory. (Correct me
if I am wrong).
That is, each filesystem or whatever can basically implement its own
MM. If so, why not simply have a list of subsystems that own memory
with some sort of measure of how much space they're wasting, and ask
the ones with a lot to free some?

-- 

	http://altern.org/vii

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:35                                 ` Stephen C. Tweedie
  2000-06-07 15:41                                   ` Rik van Riel
  2000-06-07 15:44                                   ` Juan J. Quintela
@ 2000-06-07 17:10                                   ` Jeff V. Merkey
  2000-06-07 17:14                                     ` Stephen C. Tweedie
  2000-06-07 20:16                                   ` Hans Reiser
  3 siblings, 1 reply; 60+ messages in thread
From: Jeff V. Merkey @ 2000-06-07 17:10 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Stephen,

When will the journalling subsystem you are working on be available, and
where can I get it to start integration work.  It sounds like you will
be "bundling"  associated LRU meta-data blocks in the buffer cache for
journal commits?  What Alan described to me sounds fairly decent.  I am
wondering when you will have this posted so the rest of us can
instrument your journalling code into our FS's.

Please advise.

:-)

Jeff 

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 07, 2000 at 05:20:41PM +0200, Quintela Carreira Juan J. wrote:
> >
> > stephen> It doesn't matter.  *If* the filesystem knows better than the
> > stephen> page cleaner what progress can be made, then let the filesystem
> > stephen> make progress where it can.  There are likely to be transaction
> > stephen> dependencies which mean we have to clean some pages in a specific
> > stephen> order.  As soon as the page cleaner starts exerting back pressure
> > stephen> on the filesystem, the filesystem needs to start clearing stuff,
> > stephen> and if that means we have to start cleaning things that shrink_
> > stephen> mmap didn't expect us to, then that's fine.
> >
> > I don't like that, if you put some page in the LRU cache, that means
> > that you think that _this_ page is freeable.
> 
> Remember that Rik is talking about multiple LRUs.  Pages can only
> be on the inactive LRU if they are clean and unpinned, yes, but we
> still need a way of tracking pages which are in a more difficult
> state.
> 
> > If you need pages in the LRU cache only for getting notifications,
> > then change the system to send notifications each time that we are
> > short of memory.
> 
> It's a matter of pressure.  The filesystem with most pages in the LRU
> cache, or with the oldest pages there, should stand the greatest chance
> of being the first one told to clean up its act.
> 
> Cheers,
>  Stephen
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: reiserfs being part of the kernel: it's not just the code
  2000-06-07 11:00               ` reiserfs being part of the kernel: it's not just the code Stephen C. Tweedie
@ 2000-06-07 17:11                 ` Rik van Riel
  2000-06-07 17:13                   ` Stephen C. Tweedie
  2000-06-07 17:46                 ` Hans Reiser
  1 sibling, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 17:11 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Hans Reiser, bert hubert, linux-kernel, Chris Mason, linux-mm

On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:

> Who will be at Usenix in San Diego in a couple of weeks' time?  
> There will certainly be some of the XFS and GFS people there,
> and I'll be around all week.

I won't be there.

Maybe OLS would be a more suitable event to discuss these
matters?  Most of the people involved seem to be speaking
at OLS anyway (and the GFS people are relatively near, at
car or train distance, almost).

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM
  2000-06-07 16:35                   ` journaling & VM John Fremlin
@ 2000-06-07 17:11                     ` Stephen C. Tweedie
       [not found]                       ` <20000608114435.A15433@uni-koblenz.de>
  0 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 17:11 UTC (permalink / raw)
  To: John Fremlin; +Cc: Stephen C. Tweedie, linux-kernel, linux-mm

Hi,

On Wed, Jun 07, 2000 at 05:35:13PM +0100, John Fremlin wrote:
> 
> You are saying, that the MM system maintains a list of pages, then
> when it wants to free some memory it goes down the list seeing which
> subsystem owns each page, and asks it to free some memory. (Correct me
> if I am wrong).
> That is, each filesystem or whatever can basically implement its own
> MM. If so, why not simply have a list of subsystems that own memory
> with some sort of measure of how much space they're wasting, and ask
> the ones with a lot to free some?

Because you want to have some idea of the usage patterns of the 
pages, too, so that you can free pages which haven't been accessed 
recently regardless of who owns them.
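
A small sketch of the difference, assuming each page records an owner
and a last-use timestamp (both fields and the function pick_victim are
hypothetical). A per-subsystem page count would point at the owner
with the most pages; a shared LRU points at the page that has gone
unused longest, whoever owns it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor, for illustration only. */
struct toy_page {
	int owner;			/* which subsystem owns the page */
	unsigned long last_used;	/* smaller means older */
};

/*
 * Return the index of the least recently used page, ignoring who owns
 * it. The array must contain at least one page.
 */
static size_t pick_victim(const struct toy_page *pages, size_t n)
{
	size_t oldest = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (pages[i].last_used < pages[oldest].last_used)
			oldest = i;
	}
	return oldest;
}
```

If owner 0 holds three recently used pages and owner 1 holds a single
stale one, counting pages would blame owner 0, while the scan above
picks owner 1's page.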

--Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: reiserfs being part of the kernel: it's not just the code
  2000-06-07 17:11                 ` Rik van Riel
@ 2000-06-07 17:13                   ` Stephen C. Tweedie
  0 siblings, 0 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 17:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm

Hi,

On Wed, Jun 07, 2000 at 02:11:42PM -0300, Rik van Riel wrote:

> Maybe OLS would be a more suitable event to discuss these
> matters?  Most of the people involved seem to be speaking
> at OLS anyway (and the GFS people are relatively near, at
> car or train distance, almost).

Sure, I can do OLS too.

--Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 17:10                                   ` Jeff V. Merkey
@ 2000-06-07 17:14                                     ` Stephen C. Tweedie
  2000-06-07 17:21                                       ` Jeff V. Merkey
  0 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 17:14 UTC (permalink / raw)
  To: Jeff V. Merkey
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 11:10:28AM -0600, Jeff V. Merkey wrote:
> 
> When will the journalling subsystem you are working on be available, and
> where can I get it to start integration work.  It sounds like you will
> be "bundling"  associated LRU meta-data blocks in the buffer cache for
> journal commits?  What Alan described to me sounds fairly decent.  I am
> wondering when you will have this posted so the rest of us can
> instrument your journalling code into our FS's.

Have a look at the fs/jfs directory in ext3 if you want to see
what I've been implementing.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 17:14                                     ` Stephen C. Tweedie
@ 2000-06-07 17:21                                       ` Jeff V. Merkey
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff V. Merkey @ 2000-06-07 17:21 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Stephen,

I will go look at it.

Thanks 

:-) :-) :-) :-)

Jeff

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 07, 2000 at 11:10:28AM -0600, Jeff V. Merkey wrote:
> >
> > When will the journalling subsystem you are working on be available, and
> > where can I get it to start integration work.  It sounds like you will
> > be "bundling"  associated LRU meta-data blocks in the buffer cache for
> > journal commits?  What Alan described to me sounds fairly decent.  I am
> > wondering when you will have this posted so the rest of us can
> > instrument your journalling code into our FS's.
> 
> Have a look at the fs/jfs directory in ext3 if you want to see
> what I've been implementing.
> 
> Cheers,
>  Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: reiserfs being part of the kernel: it's not just the code
  2000-06-07 11:00               ` reiserfs being part of the kernel: it's not just the code Stephen C. Tweedie
  2000-06-07 17:11                 ` Rik van Riel
@ 2000-06-07 17:46                 ` Hans Reiser
  2000-06-07 19:53                   ` Stephen C. Tweedie
  1 sibling, 1 reply; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 17:46 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: bert hubert, linux-kernel, Chris Mason, linux-mm

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Tue, Jun 06, 2000 at 07:00:40PM -0700, Hans Reiser wrote:
> >
> > Do I miss a finepoint, or can this reservation API be as simple as using an
> > agreed on counter for total system pinned pages which is constrained to some
> > percentage of memory?  I think we all discussed all of this last year, and the
> > workshop Riel tried to organize sadly never happened.
> 
> It's a good bit more complex than that.  We need not only that reservation
> layer, but also a new notification mechanism to invoke early commit if
> we exhaust the reservation limit, and a way of interacting with dirty
> pages (which are not yet part of any transaction, but which may not be
> flushable to disk without a new transaction being incurred).  The dirty
> mmaped data case is particularly nasty: we have very little VM
> infrastructure right now which is suitable for fixing that.

Have the FS stall if the limit is reached, and if the limit is reached, increase
memory pressure invoking the mechanism that will drive allocate on flush.

The FS needs a lot of code, VFS needs something around ten lines, yes?
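
A minimal sketch of the counter being discussed, assuming a single
system-wide budget of pinnable pages. The names pin_budget,
reserve_pages and release_pages are hypothetical; this is not an
existing VFS interface. A failed reservation is where the filesystem
would stall and memory pressure would be raised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical system-wide budget of pages that may be pinned. */
struct pin_budget {
	unsigned long limit;	/* e.g. some percentage of memory */
	unsigned long reserved;	/* pages currently reserved */
};

/*
 * Reserve n pages before starting a filesystem operation. Returns
 * false if the budget would be exceeded; the caller is then expected
 * to stall and push for a flush before retrying.
 */
static bool reserve_pages(struct pin_budget *b, unsigned long n)
{
	/* reserved never exceeds limit, so this cannot underflow. */
	if (n > b->limit - b->reserved)
		return false;
	b->reserved += n;
	return true;
}

/* Give back n pages once the transaction has been committed. */
static void release_pages(struct pin_budget *b, unsigned long n)
{
	assert(n <= b->reserved);
	b->reserved -= n;
}
```

The sketch leaves out locking, and it does not address the dirty
mmap()ed pages that are not yet part of any transaction.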

> 
> > Perhaps we should do a
> > workshop July 5 at the Libre Software conference in France?  Probably this issue
> > will already be solved by then, but there are plenty of other discussions to
> > have in the vicinity of this problem.
> 
> Who will be at Usenix in San Diego in a couple of weeks' time?  There
> will certainly be some of the XFS and GFS people there, and I'll be
> around all week.
> 
> Cheers,
>  Stephen

None of the ReiserFS team will be there.  I can see you at the UK thing; I know
you are planning on going there.  The thing is, if it is the Libre conference,
I can probably get zam and any other key people flown there.  You don't have to
attend the whole conference.... I can't because I am speaking at the UK one
also....  If you have a different conference you prefer, see if you can get
people flown there....

Hans

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 11:12                 ` Stephen C. Tweedie
  2000-06-07 16:35                   ` journaling & VM John Fremlin
@ 2000-06-07 17:48                   ` Hans Reiser
  2000-06-07 18:01                     ` Rik van Riel
  1 sibling, 1 reply; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 17:48 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

"Stephen C. Tweedie" wrote:

> Use reservations.  That's the point --- you reserve in advance, so that
> the VM can *guarantee* that you can continue to pin more pages up to
> the maximum you have reserved.  You take a reservation before starting
> a fs operation, so that if you need to block, it doesn't prevent the
> running transaction from being committed.
> 
> Cheers,
>  Stephen

Ok, let's admit it, we have been agreeing on this with you for 9 months and no
code has been written by any of us.:-/

Hans

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 17:48                   ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
@ 2000-06-07 18:01                     ` Rik van Riel
  2000-06-07 19:58                       ` Stephen C. Tweedie
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 18:01 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, bert hubert, linux-kernel, Chris Mason,
	linux-mm, Alexander Zarochentcev

On Wed, 7 Jun 2000, Hans Reiser wrote:
> "Stephen C. Tweedie" wrote:
> 
> > Use reservations.  That's the point --- you reserve in advance, so that
> > the VM can *guarantee* that you can continue to pin more pages up to
> > the maximum you have reserved.  You take a reservation before starting
> > a fs operation, so that if you need to block, it doesn't prevent the
> > running transaction from being committed.
> 
> Ok, let's admit it, we have been agreeing on this with you for 9
> months and no code has been written by any of us.:-/

I'd like to be able to keep stuff simple in the shrink_mmap
"equivalent" I'm working on. Something like:

if (PageDirty(page) && page->mapping && page->mapping->flush)
	maxlaunder -= page->mapping->flush();

Where the flush() function would return the amount of _inactive_
pages that were flushed at the time we called this function...
(we should not decrease maxlaunder if we flushed active pages
since that would imply we didn't make any progress)
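
(Sketch, not code from this thread: a user-space model of the loop described
above. The struct layouts are stand-ins for the kernel ones, launder_pages()
and the two example callbacks are hypothetical, and passing the page to
flush() is an assumption; the snippet above calls it without arguments.)

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Stand-in for the kernel's address_space; only the hook matters here. */
struct address_space {
	/* Returns the number of INACTIVE pages that were written out. */
	int (*flush)(struct page *page);
};

struct page {
	int dirty;                      /* models PageDirty(page) */
	struct address_space *mapping;  /* NULL if nobody owns the page */
};

/* Walk candidate pages and ask each owner to flush until the laundering
 * budget is used up. Returns the number of inactive pages reported flushed. */
static int launder_pages(struct page **pages, size_t n, int maxlaunder)
{
	int flushed = 0;

	for (size_t i = 0; i < n && maxlaunder > 0; i++) {
		struct page *page = pages[i];

		if (page->dirty && page->mapping && page->mapping->flush) {
			int done = page->mapping->flush(page);

			assert(done >= 0);
			maxlaunder -= done;  /* only real progress uses up budget */
			flushed += done;
		}
	}
	return flushed;
}

/* Example callback: wrote this page plus two inactive neighbours. */
static int flush_cluster_of_three(struct page *page)
{
	page->dirty = 0;
	return 3;
}

/* Example callback: only active pages were written, so no progress. */
static int flush_active_only(struct page *page)
{
	page->dirty = 0;
	return 0;
}
```

A callback that returns 0 does not shrink maxlaunder, so the loop keeps
looking for an owner that can actually clean inactive pages.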

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel:it'snot just the code)
  2000-06-07 13:23                       ` Rik van Riel
  2000-06-07 13:41                         ` Stephen C. Tweedie
@ 2000-06-07 19:02                         ` Hans Reiser
  1 sibling, 0 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 19:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

Rik van Riel wrote:
> 
> On Wed, 7 Jun 2000, Stephen C. Tweedie wrote:
> > On Tue, Jun 06, 2000 at 08:45:08PM -0700, Hans Reiser wrote:
> > > >
> > > > This is the reason because of what I think that one operation in the
> > > > address space makes no sense.  No sense because it can't be called
> > > > from the page.
> > >
> > > What do you think of my argument that each of the subcaches should register
> > > currently_consuming counters which are the number of pages that subcache
> > > currently takes up in memory,
> >
> > There is no need for subcaches at all if all of the pages can be
> > represented on the page cache LRU lists.  That would certainly
> > make balancing between caches easier.
> 
> Wouldn't this mean we could end up with an LRU cache full of
> unfreeable pages?
> 
> Then we would scan the LRU cache and apply pressure on all of
> the filesystems, but then the filesystem could decide it wants
> to flush *other* pages from the ones we have on the LRU queue.

And we intend to do exactly that with allocate on flush.  Eventually we will
even repack on flush.

> 
> This could get particularly nasty when we have a VM with
> active / inactive / scavenge lists... (like what I'm working
> on now)
> 
> Then again, if the filesystem knows which pages we want to
> push, it could base the order in which it is going to flush
> its blocks on that memory pressure. Then your scheme will
> undoubtedly be the more robust one.
> 
> Question is, are the filesystems ready to play this game? 

Yes, we are eager to play, but you do intend that the filesystem will be
pressured to age, not to flush, yes?

That is, if aging causes something to get flushed, it gets flushed, but if not
then not.
The filesystems should get passed some notion of how much of their cache to age
so that you MM guys can have fun varying this.

You might want us to return how much got scheduled for flushing as a result of
the aging, so that you know when to stop pressuring caches.
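
(Sketch, not code from this thread: one way the "age, don't flush" interface
described above could look. fs_age_cache(), struct fs_cache and the clock-hand
aging are all assumptions made for illustration. The caller passes how many
objects to age, and gets back how many were scheduled for flushing as a side
effect.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached object inside a filesystem's own cache. */
struct cached_obj {
	unsigned age;         /* 0 means "old enough to go" */
	int dirty;
	int flush_scheduled;
};

/* Hypothetical per-filesystem cache with a clock hand. */
struct fs_cache {
	struct cached_obj *objs;
	size_t count;
	size_t hand;          /* next object to age */
};

/* Age up to nr objects. Anything that reaches age 0 while dirty gets
 * scheduled for flushing. Returns how many objects were newly scheduled. */
static size_t fs_age_cache(struct fs_cache *c, size_t nr)
{
	size_t scheduled = 0;

	if (c->count == 0)
		return 0;
	if (nr > c->count)
		nr = c->count;  /* never age an object twice in one pass */

	for (size_t i = 0; i < nr; i++) {
		struct cached_obj *o = &c->objs[c->hand];

		c->hand = (c->hand + 1) % c->count;
		if (o->age > 0)
			o->age--;
		if (o->age == 0 && o->dirty && !o->flush_scheduled) {
			o->flush_scheduled = 1;
			scheduled++;
		}
	}
	assert(scheduled <= nr);
	return scheduled;
}
```

The MM side can then vary nr freely and stop pressuring a cache once the
returned count covers what it needs.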

> 
> regards,
> 
> Rik
> --
> The Internet is not a network of computers. It is a network
> of people. That is its real strength.
> 
> Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
> http://www.conectiva.com/               http://www.surriel.com/

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: reiserfs being part of the kernel: it's not just the code
  2000-06-07 17:46                 ` Hans Reiser
@ 2000-06-07 19:53                   ` Stephen C. Tweedie
  0 siblings, 0 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 19:53 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, bert hubert, linux-kernel, Chris Mason, linux-mm

Hi,

On Wed, Jun 07, 2000 at 10:46:16AM -0700, Hans Reiser wrote:
> 
> Have the FS stall if the limit is reached, and if the limit is reached, increase
> memory pressure invoking the mechanism that will drive allocate on flush.
> 
> The FS needs a lot of code, VFS needs something around ten lines, yes?

It's not the VFS as much as the VM which needs the work.  For example,
currently we have no way of exerting flow control on processes generating
dirty pages via mmap(), and fixing that requires work in the page fault
path.

> None of the ReiserFS team will be there.  I can see you at the UK thing, I know
> you are planning on going there.

OK, will you be bringing any other folks there?  

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 18:01                     ` Rik van Riel
@ 2000-06-07 19:58                       ` Stephen C. Tweedie
  2000-06-07 20:56                         ` Juan J. Quintela
  0 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 19:58 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hans Reiser, Stephen C. Tweedie, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 03:01:22PM -0300, Rik van Riel wrote:
> 
> I'd like to be able to keep stuff simple in the shrink_mmap
> "equivalent" I'm working on. Something like:
> 
> if (PageDirty(page) && page->mapping && page->mapping->flush)
> 	maxlaunder -= page->mapping->flush();

That looks ideal.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:20                               ` Quintela Carreira Juan J.
  2000-06-07 15:35                                 ` Stephen C. Tweedie
@ 2000-06-07 20:16                                 ` Hans Reiser
  2000-06-07 20:54                                   ` Stephen C. Tweedie
  1 sibling, 1 reply; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 20:16 UTC (permalink / raw)
  To: Quintela Carreira Juan J.
  Cc: Stephen C. Tweedie, Rik van Riel, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

"Quintela Carreira Juan J." wrote:

> 
> If you need pages in the LRU cache only for getting notifications,
> then change the system to send notifications each time that we are
> short of memory.

I think the right thing is for the filesystems to use the LRU code as a
template, from which they may or may not deviate, in implementing their
subcaches with their own lists.  I say this for intuitive, not concrete,
reasons.  In other words, I agree with Juan.

Hans

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 15:35                                 ` Stephen C. Tweedie
                                                     ` (2 preceding siblings ...)
  2000-06-07 17:10                                   ` Jeff V. Merkey
@ 2000-06-07 20:16                                   ` Hans Reiser
  2000-06-07 21:20                                     ` Rik van Riel
  3 siblings, 1 reply; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 20:16 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

"Stephen C. Tweedie" wrote:

> 
> It's a matter of pressure.  The filesystem with most pages in the LRU
> cache, or with the oldest pages there, should stand the greatest chance
> of being the first one told to clean up its act.
> 
> Cheers,
>  Stephen
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

The new "age one 64th of your objects" scheme causes pressure to be
proportional...

I am looking forward to reading the new 2.4 mm code during my next Aeroflot
experience this Sunday...

Hans

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 20:16                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
@ 2000-06-07 20:54                                   ` Stephen C. Tweedie
  2000-06-07 21:29                                     ` Hans Reiser
  0 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 20:54 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Quintela Carreira Juan J.,
	Stephen C. Tweedie, Rik van Riel, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 01:16:04PM -0700, Hans Reiser wrote:

> "Quintela Carreira Juan J." wrote:
> > If you need pages in the LRU cache only for getting notifications,
> > then change the system to send notifications each time that we are
> > short of memory.
> 
> I think the right thing is for the filesystems to use the LRU code as templates
> from which they may vary or not from in implementing their subcaches with their
> own lists.  I say this for intuitive not concrete reasons.

Every time we have tried to keep the caches completely separate, we 
have ended up losing the ability to balance the various caches against 
each other.  The major advantage of a common set of LRU lists is that
it gives us a basis for a balanced VM.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 19:58                       ` Stephen C. Tweedie
@ 2000-06-07 20:56                         ` Juan J. Quintela
  2000-06-07 21:14                           ` Rik van Riel
  2000-06-07 21:24                           ` Stephen C. Tweedie
  0 siblings, 2 replies; 60+ messages in thread
From: Juan J. Quintela @ 2000-06-07 20:56 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "sct" == Stephen C Tweedie <sct@redhat.com> writes:

Hi

>> I'd like to be able to keep stuff simple in the shrink_mmap
>> "equivalent" I'm working on. Something like:
>> 
>> if (PageDirty(page) && page->mapping && page->mapping->flush)
>> maxlaunder -= page->mapping->flush();

sct> That looks ideal.

But this is supposed to flush that _page_, at least in the normal
case.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 20:56                         ` Juan J. Quintela
@ 2000-06-07 21:14                           ` Rik van Riel
  2000-06-07 21:24                           ` Stephen C. Tweedie
  1 sibling, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 21:14 UTC (permalink / raw)
  To: Juan J. Quintela
  Cc: Stephen C. Tweedie, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

On 7 Jun 2000, Juan J. Quintela wrote:
> >>>>> "sct" == Stephen C Tweedie <sct@redhat.com> writes:
> 
> >> I'd like to be able to keep stuff simple in the shrink_mmap
> >> "equivalent" I'm working on. Something like:
> >> 
> >> if (PageDirty(page) && page->mapping && page->mapping->flush)
> >> maxlaunder -= page->mapping->flush();
> 
> sct> That looks ideal.
> 
> But this is supposed to flush that _page_, at least in the
> normal case.

But not *just* that page ...

In the ideal case the flush() function will search around
memory for objects to cluster and write out together with
this page.

I'll probably write an example page->mapping->flush()
function for swap. The function will do the following:
- find other swap pages to cluster with this page,
  those must be:
	- contiguous with this page
	- inactive or seldom-used active pages
	- dirty (duh)
- flush out the collection of pages
- return the number of INACTIVE pages we flushed,
  ignoring the number of active pages

That last point is very important because:
- if we mainly flushed active pages, we should not give
  shrink_mmap (or similar) the illusion that we cleared
  up the inactive list ... don't pretend we made a lot
  of progress cleaning inactive pages if we didn't
- since we wrote the pages in the same disk seek, writing
  the active pages was essentially for free so it doesn't
  matter that we don't report having written them ...
  (having written that page and potentially saving some IO
  later, otoh, definitely does matter)
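
(Sketch, not code from this thread: a user-space model of the swap flush
steps listed above. struct swap_page, can_cluster() and flush_swap_cluster()
are hypothetical names, and "contiguous" is modelled simply as adjacent array
slots.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one swap slot's page. */
struct swap_page {
	int dirty;
	int active;       /* on the active list */
	int seldom_used;  /* active, but rarely touched */
};

/* A neighbour joins the cluster if it is dirty and either inactive
 * or a seldom-used active page. */
static int can_cluster(const struct swap_page *p)
{
	return p->dirty && (!p->active || p->seldom_used);
}

/* Write out the page at idx together with its contiguous clusterable
 * neighbours. Returns the number of INACTIVE pages written; active
 * pages that came along for free are deliberately not counted. */
static int flush_swap_cluster(struct swap_page *slots, size_t n, size_t idx)
{
	size_t lo = idx, hi = idx;
	int inactive = 0;

	if (idx >= n || !slots[idx].dirty)
		return 0;

	while (lo > 0 && can_cluster(&slots[lo - 1]))
		lo--;
	while (hi + 1 < n && can_cluster(&slots[hi + 1]))
		hi++;
	assert(lo <= idx && idx <= hi);

	for (size_t i = lo; i <= hi; i++) {
		slots[i].dirty = 0;  /* stands in for the actual write */
		if (!slots[i].active)
			inactive++;
	}
	return inactive;
}
```

In the test data used to check this sketch, a cluster of four pages is
written but only three are reported, because one of them was an active page.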

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 20:16                                   ` Hans Reiser
@ 2000-06-07 21:20                                     ` Rik van Riel
  2000-06-07 21:52                                       ` journaling & VM Hans Reiser
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 21:20 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

On Wed, 7 Jun 2000, Hans Reiser wrote:

> The new age one 64th of your objects scheme causes pressure to
> be proportional.....

Which is wrong, unless the oldest pages from each zone happen
to be the same age ;)

Suppose a 5MB SHM segment gets detached and not used for a
long time. In this situation it makes little sense to round-robin
free from the different caches if the other caches are under more
pressure.

> I am looking forward to reading the new 2.4 mm code during my
> next aeroflot experience this sunday....

I'm working on it, but I can't promise to have all of the
active/inactive/scavenge list framework ready by then ;)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 20:56                         ` Juan J. Quintela
  2000-06-07 21:14                           ` Rik van Riel
@ 2000-06-07 21:24                           ` Stephen C. Tweedie
  2000-06-07 21:40                             ` Juan J. Quintela
  1 sibling, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 21:24 UTC (permalink / raw)
  To: Juan J. Quintela
  Cc: Stephen C. Tweedie, Rik van Riel, Hans Reiser, bert hubert,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 10:56:17PM +0200, Juan J. Quintela wrote:
> 
> >> I'd like to be able to keep stuff simple in the shrink_mmap
> >> "equivalent" I'm working on. Something like:
> >> 
> >> if (PageDirty(page) && page->mapping && page->mapping->flush)
> >> maxlaunder -= page->mapping->flush();
> 
> sct> That looks ideal.
> 
> But this is supposed to flush that _page_, at least in the normal
> case.

All transactional filesystems will have ordering constraints which
the core VM cannot know about.  In that case, the filesystem may
simply have no choice about cleaning and unpinning pages in a given
order.  For actually removing a page from memory, evicting precisely
the right page is far more important, but for writeback, it's
controlling the amount of dirty/pinned data from the various different
sources which counts.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 20:54                                   ` Stephen C. Tweedie
@ 2000-06-07 21:29                                     ` Hans Reiser
  2000-06-07 21:31                                       ` Rik van Riel
                                                         ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 21:29 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 07, 2000 at 01:16:04PM -0700, Hans Reiser wrote:
> 
> > "Quintela Carreira Juan J." wrote:
> > > If you need pages in the LRU cache only for getting notifications,
> > > then change the system to send notifications each time that we are
> > > short of memory.
> >
> > I think the right thing is for the filesystems to use the LRU code as templates
> > from which they may vary or not from in implementing their subcaches with their
> > own lists.  I say this for intuitive not concrete reasons.
> 
> Every time we have tried to keep the caches completely separate, we
> have ended up losing the ability to balance the various caches against
> each other.  The major advantage of a common set of LRU lists is that
> it gives us a basis for a balanced VM.
> 
> Cheers,
>  Stephen

If I understand Juan correctly, they fixed this issue.  Aging 1/64th of the
cache for every cache evenly at every round of trying to free pages should be an
excellent fix.  It should do just fine at the task of handling a system with
both ext3 and reiserfs running.

Was this Juan's code that did this?  If so, kudos to him.

Hans

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:29                                     ` Hans Reiser
@ 2000-06-07 21:31                                       ` Rik van Riel
  2000-06-07 21:33                                       ` Stephen C. Tweedie
  2000-06-07 21:50                                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Juan J. Quintela
  2 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 21:31 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

On Wed, 7 Jun 2000, Hans Reiser wrote:
> "Stephen C. Tweedie" wrote:
> > On Wed, Jun 07, 2000 at 01:16:04PM -0700, Hans Reiser wrote:
> > > "Quintela Carreira Juan J." wrote:
> > > > If you need pages in the LRU cache only for getting notifications,
> > > > then change the system to send notifications each time that we are
> > > > short of memory.
> > >
> > > I think the right thing is for the filesystems to use the LRU code as templates
> > > from which they may vary or not from in implementing their subcaches with their
> > > own lists.  I say this for intuitive not concrete reasons.
> > 
> > Every time we have tried to keep the caches completely separate, we
> > have ended up losing the ability to balance the various caches against
> > each other.  The major advantage of a common set of LRU lists is that
> > it gives us a basis for a balanced VM.
> 
> If I understand Juan correctly, they fixed this issue.  Aging
> 1/64th of the cache for every cache evenly at every round of
> trying to free pages should be an excellent fix.  It should do
> just fine at the task of handling a system with both ext3 and
> reiserfs running.

Unfortunately it doesn't...

> Was this Juan's code that did this?  If so, kudos to him.

I believe Stephen wrote this code for 2.2.  It has served us
well, but we've determined that having separate LRU queues just
isn't the way to go.

(explanation not repeated to avoid reader boredom)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:29                                     ` Hans Reiser
  2000-06-07 21:31                                       ` Rik van Riel
@ 2000-06-07 21:33                                       ` Stephen C. Tweedie
  2000-06-07 22:20                                         ` journaling & VM Hans Reiser
  2000-06-07 21:50                                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Juan J. Quintela
  2 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 21:33 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 02:29:25PM -0700, Hans Reiser wrote:
> 
> If I understand Juan correctly, they fixed this issue.  Aging 1/64th of the
> cache for every cache evenly at every round of trying to free pages should be an
> excellent fix.  It should do just fine at the task of handling a system with
> both ext3 and reiserfs running.

That is _exactly_ what breaks the VM balance!  The net result of
an algorithm like that is that all caches are shrunk at the same
rate regardless of which ones are busy.  The "shrink everything
at once" principle is what used to cause large filesystem scans 
(such as find|grep over a large source tree) to swap all our
running processes out.

There _has_ to be a way to allow the relative ages of the different
pages to influence the reclamation of pages from different sources.
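
(Sketch, not code from this thread: the contrast drawn above, in user-space C.
struct page_source and both functions are hypothetical. proportional_share()
is the "shrink everything at once" rule; pick_reclaim_source() lets the age of
each source's oldest page decide where to reclaim. Timestamp wrap-around is
ignored.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one source of pages. */
struct page_source {
	const char *name;
	unsigned long oldest_last_use;  /* tick when its oldest page was last used */
	size_t nr_pages;
};

/* "Age 1/64th of every cache": the share ignores how busy the source is. */
static size_t proportional_share(const struct page_source *src)
{
	assert(src != NULL);
	return src->nr_pages / 64;
}

/* Age-based choice: reclaim from whichever non-empty source holds the
 * least recently used page. Returns n if every source is empty. */
static size_t pick_reclaim_source(const struct page_source *src, size_t n)
{
	size_t best = n;

	for (size_t i = 0; i < n; i++) {
		if (src[i].nr_pages == 0)
			continue;
		if (best == n || src[i].oldest_last_use < src[best].oldest_last_use)
			best = i;
	}
	return best;
}
```

With the proportional rule, busy process pages lose their 1/64th every round
no matter what; with the age-based rule, a stale file cache left behind by a
large scan is the one that gets picked.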

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:24                           ` Stephen C. Tweedie
@ 2000-06-07 21:40                             ` Juan J. Quintela
  2000-06-07 21:49                               ` Stephen C. Tweedie
  0 siblings, 1 reply; 60+ messages in thread
From: Juan J. Quintela @ 2000-06-07 21:40 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "stephen" == Stephen C Tweedie <sct@redhat.com> writes:

Hi

stephen> All transactional filesystems will have ordering constraints which
stephen> the core VM cannot know about.  In that case, the filesystem may
stephen> simply have no choice about cleaning and unpinning pages in a given
stephen> order.  For actually removing a page from memory, evicting precisely
stephen> the right page is far more important, but for writeback, it's
stephen> controlling the amount of dirty/pinned data from the various different
stephen> sources which counts.

Fair enough, then don't put pinned pages in the LRU.  *Why* would you
want to put pages in the LRU if you can't free them when the LRU says:
free that page?  OK, new example.  You have the 10 (put any number
here) oldest pages in the LRU.  Those pages are pinned in memory, i.e.
you can't remove them.  You will call the ->flush() function on each of
them (pick any name for the method).  Now, the same fs has a lot of new
pages in the LRU that are being actively used, but are not pinned at
this precise instant.  Each time we call the flush method, we will
free some dirty pages, evidently not the pinned ones.  We will call
that flush function 10 times in a row.  Possibly we will flush all the
pages from the cache for that fs, and for no good reason.  The only
reason was that those were the 10 oldest pages in the LRU, nothing
else.  Yes, I know that this is a pathological case, but I think that
we should work OK in that case too.

I would also be very happy with only one place doing the aging,
cleaning, ... of _all_ the pages, but for that place we need a policy,
and that policy _must_ be honored (almost) always, or it doesn't make
sense and we will end up in unstable/unfair situations.

I am working right now on a patch that will allow the write of mmaped
pages to be deferred from the swap_out function to shrink_mmap time.
It is the same thing we already do with swap pages, but for fs pages
mmaped in processes.  That would help here.  But note that in this
case, I put pages in the LRU that can be freed.  I can't understand
putting in pages that are not freeable.  I mention this to show that I
support the idea of only one LRU queue (or multiqueue, which is the
same).

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:40                             ` Juan J. Quintela
@ 2000-06-07 21:49                               ` Stephen C. Tweedie
  2000-06-07 22:00                                 ` Juan J. Quintela
                                                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-07 21:49 UTC (permalink / raw)
  To: Juan J. Quintela
  Cc: Stephen C. Tweedie, Rik van Riel, Hans Reiser, bert hubert,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

Hi,

On Wed, Jun 07, 2000 at 11:40:47PM +0200, Juan J. Quintela wrote:
> Hi
> Fair enough, don't put pinned pages in the LRU, *why* do you want put
> pages in the LRU if you can't freed it when the LRU told it: free that
> page?

Because even if the information about which page is least recently
used doesn't help you, the information about which filesystems are
least active _does_ help.

> Ok. New example.  You have the 10 (put here any number) older
> pages in the LRU.  That pages are pinned in memory, i.e. you can't
> remove them.  You will call the ->flush() function in each of them
> (put it any name for the method).  Now, the same fs has a lot of new
> pages in the LRU that are being used actively, but are not pinned in
> this precise instant.  Each time that we call the flush method, we
> will free some dirty pages, not the pinned ones, evidently. We will
> call that flush function 10 times consecutively.  Posibly we will
> flush all the pages from the cache for that fs, and for not good
> reason.

No, Rik was explicitly allowing the per-fs flush functions to 
indicate how much progress was being made, to avoid this.

> I will be also very happy with only one place where doing the aging,
> cleaning, ... of _all_ the pages, but for that place we need a policy,
> and that policy _must_ be honored (almost) always or it doesn't make
> sense and we will arrive to unstable/unfair situations.

We _have_ to have separate mechanisms for page cleaning and for page
reclaim.  Interrupt load requires that we free pages rapidly on 
demand, regardless of whether the page cleaner is stalled in the 
middle of a write operation or not.

> I am working just now in a patch that will allow pages to be defered
> the write of mmaped pages from the swap_out function to shrink_mmap
> time.  The same that we do with swap pages actually, but for fs pages
> mmaped in processes.  That would help that.  But note that in this
> case, I put in the LRU pages that can be freed.  I can't understand
> putting pages that are not freeable.

We are talking about separate queues for the different page types ---
you obviously don't want to pollute the clean (inactive?) list with
pinned pages.  Within the list of pinned pages (or dirty pages), we
still want to maintain enough ordering so that we go to the filesystems
in the right order when we start cleaning pages.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel:  it'snot just the code)
  2000-06-07 21:29                                     ` Hans Reiser
  2000-06-07 21:31                                       ` Rik van Riel
  2000-06-07 21:33                                       ` Stephen C. Tweedie
@ 2000-06-07 21:50                                       ` Juan J. Quintela
  2 siblings, 0 replies; 60+ messages in thread
From: Juan J. Quintela @ 2000-06-07 21:50 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Stephen C. Tweedie, Rik van Riel, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "hans" == Hans Reiser <hans@reiser.to> writes:

Hi

>> Every time we have tried to keep the caches completely separate, we
>> have ended up losing the ability to balance the various caches against
>> each other.  The major advantage of a common set of LRU lists is that
>> it gives us a basis for a balanced VM.
>> 
>> Cheers,
>> Stephen

hans> If I understand Juan correctly, they fixed this issue.  Aging 1/64th of the
hans> cache for every cache evenly at every round of trying to free pages should be an
hans> excellent fix.  It should do just fine at the task of handling a system with
hans> both ext3 and reiserfs running.

hans> Was this Juan's code that did this?  If so, kudos to him.

I am working on that too, but by merging all the caches whenever
possible.  I.e. Rik and I did the deferred swap patch, I am
finishing the deferred mmap page write, and after that I will try the
deferred shm code (I need to read the shm code first :()

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

* Re: journaling & VM
  2000-06-07 21:20                                     ` Rik van Riel
@ 2000-06-07 21:52                                       ` Hans Reiser
  2000-06-07 22:11                                         ` James Sutherland
  2000-06-08  1:11                                         ` Neil Schemenauer
  0 siblings, 2 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 21:52 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen C. Tweedie, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

Let me convey an aspect of its rightness.

Caches have a declining marginal utility.  It is a good idea to keep at least a
little bit of each cache around.  The classic problem is when you switch usage
patterns back and forth, and one of the caches has been completely flushed by,
say, a large file read.  If just 3% of the amount of cache remained from when it
was being used that 3% might give you a lot of speedup when the usage pattern
flipped back.

Hans

Rik van Riel wrote:
> 
> On Wed, 7 Jun 2000, Hans Reiser wrote:
> 
> > The new age one 64th of your objects scheme causes pressure to
> > be proportional.....
> 
> Which is wrong, unless the oldest pages from each zone happen
> to be the same age ;)
> 
> Suppose a 5MB SHM segment gets detached and not used for a
> long time. In this situation it makes little sense to round-robin
> free from the different caches if the other caches are under more
> pressure.
> 
> > I am looking forward to reading the new 2.4 mm code during my
> > next aeroflot experience this sunday....
> 
> I'm working on it, but I can't promise to have all of the
> active/inactive/scavenge list framework ready by then ;)
> 
> regards,
> 
> Rik
> --
> The Internet is not a network of computers. It is a network
> of people. That is its real strength.
> 
> Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
> http://www.conectiva.com/               http://www.surriel.com/

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:49                               ` Stephen C. Tweedie
@ 2000-06-07 22:00                                 ` Juan J. Quintela
  2000-06-07 22:22                                 ` Manfred Spraul
  2000-06-07 22:28                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
  2 siblings, 0 replies; 60+ messages in thread
From: Juan J. Quintela @ 2000-06-07 22:00 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Rik van Riel, Hans Reiser, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

>>>>> "sct" == Stephen C Tweedie <sct@redhat.com> writes:

sct> Hi,
sct> On Wed, Jun 07, 2000 at 11:40:47PM +0200, Juan J. Quintela wrote:
>> Hi
>> Fair enough, don't put pinned pages in the LRU.  *Why* do you want to
>> put pages in the LRU if you can't free them when the LRU says: free
>> that page?

sct> Because even if the information about which page is least recently
sct> used doesn't help you, the information about which filesystems are
sct> least active _does_ help.

OK, I see your point here.

>> Ok. New example.  You have the 10 (put any number here) oldest
>> pages in the LRU.  Those pages are pinned in memory, i.e. you can't
>> remove them.  You will call the ->flush() function on each of them
>> (use any name you like for the method).  Now, the same fs has a lot of
>> new pages in the LRU that are being used actively, but are not pinned
>> at this precise instant.  Each time we call the flush method, we
>> will free some dirty pages, not the pinned ones, evidently.  We will
>> call that flush function 10 times consecutively.  Possibly we will
>> flush all the pages from the cache for that fs, and for no good
>> reason.

sct> No, Rik was explicitly allowing the per-fs flush functions to 
sct> indicate how much progress was being made, to avoid this.

That doesn't avoid this: the next time you scan that list, a page
from the same filesystem will appear, and you will flush pages
from that filesystem again.  And so on.

>> I would also be very happy with only one place that does the aging,
>> cleaning, ... of _all_ the pages, but for that place we need a policy,
>> and that policy _must_ be honored (almost) always or it doesn't make
>> sense and we will end up in unstable/unfair situations.

sct> We _have_ to have separate mechanisms for page cleaning and for page
sct> reclaim.  Interrupt load requires that we free pages rapidly on 
sct> demand, regardless of whether the page cleaner is stalled in the 
sct> middle of a write operation or not.

I agree on that also, I have offered my help to Rik to implement
that.  That means that I also like that idea.

[Rest of the mail deleted, I also agree on that].

Thanks a lot for your comments on this topic.  I appreciate
everybody's comments a lot.

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

* Re: journaling & VM
  2000-06-07 21:52                                       ` journaling & VM Hans Reiser
@ 2000-06-07 22:11                                         ` James Sutherland
  2000-06-07 22:29                                           ` Rik van Riel
  2000-06-08  1:11                                         ` Neil Schemenauer
  1 sibling, 1 reply; 60+ messages in thread
From: James Sutherland @ 2000-06-07 22:11 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Rik van Riel, Stephen C. Tweedie, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

On Wed, 7 Jun 2000, Hans Reiser wrote:

> Let me convey an aspect of its rightness.
> 
> Caches have a declining marginal utility.  It is a good idea to keep
> at least a little bit of each cache around.  The classic problem is
> when you switch usage patterns back and forth, and one of the caches
> has been completely flushed by, say, a large file read.  If just 3% of
> the amount of cache remained from when it was being used that 3% might
> give you a lot of speedup when the usage pattern flipped back.

Incidentally, this effect comes up in Andrew Schulman's book, Unauthorized
Windows '95, in the section where he compares raw DOS, SmartDrive, Windows
3.1 with 32 bit disk access, and WfWG/Win95 with 32 bit file and disk
access. One of his test sets illustrates this beautifully, as well as
showing the performance gains from each; he runs a text search on varying
sizes of text file, and there is a huge speed increase on the second
run-through - up until the file is larger than the cache, at which point
there is almost no difference between the first and second runs in some
configurations, IIRC...

On a related note, any chance of making some caches swappable? The
application I have in mind is for much slower block devices (floppy/CD
media); using the free swap space as a cache for the CD ROM drive could be
quite an improvement in some cases. (Actually, even for hard drives it
could help: imagine an extremely busy disk on /dev/sda, with an
almost-idle swap disk on /dev/hdb. Much more difficult to code, though,
and probably not worth it...)

Thoughts??

James.

* Re: journaling & VM
  2000-06-07 21:33                                       ` Stephen C. Tweedie
@ 2000-06-07 22:20                                         ` Hans Reiser
  0 siblings, 0 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 22:20 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Quintela Carreira Juan J.,
	Rik van Riel, bert hubert, linux-kernel, Chris Mason, linux-mm,
	Alexander Zarochentcev

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 07, 2000 at 02:29:25PM -0700, Hans Reiser wrote:
> >
> > If I understand Juan correctly, they fixed this issue.  Aging 1/64th of the
> > cache for every cache evenly at every round of trying to free pages should be an
> > excellent fix.  It should do just fine at the task of handling a system with
> > both ext3 and reiserfs running.
> 
> That is _exactly_ what breaks the VM balance!  The net result of
> an algorithm like that is that all caches are shrunk at the same
> rate regardless of which ones are busy.  The "shrink everything
> at once" principle is what used to cause large filesystem scans
> (such as find|grep over a large source tree) to swap all our
> running processes out.
> 
> There _has_ to be a way to allow the relative ages of the different
> pages to influence the reclamation of pages from different sources.
> 
> Cheers,
>  Stephen

I am confused, if a page is accessed the aging is undone.  Aging 1/64th is not
the same as flushing 1/64th.  If cache A is not used the aging process gradually
shrinks it to nothing because its pages aren't unaged, if cache B is heavily
used the aging process doesn't age fast enough to overcome the unaging and new
pages get added and it grows.  I am missing something....
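
A toy model of the distinction drawn above, aging a slice of a cache versus
flushing a slice. This is a minimal sketch in C, not code from any kernel
tree: the struct, the function names, the cache size and the starting age
are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES    128          /* toy cache size (assumption) */
#define AGE_START 3            /* age a page gets when touched (assumption) */

struct toy_cache {
    int    present[NPAGES];    /* 1 while the page is still cached */
    int    age[NPAGES];        /* remaining "youth" of each page */
    size_t cursor;             /* where the next aging round resumes */
};

static void cache_init(struct toy_cache *c)
{
    for (size_t i = 0; i < NPAGES; i++) {
        c->present[i] = 1;
        c->age[i] = AGE_START;
    }
    c->cursor = 0;
}

/* An access "unages" the page: its age is reset to the starting value. */
static void cache_touch(struct toy_cache *c, size_t i)
{
    assert(i < NPAGES);
    c->present[i] = 1;
    c->age[i] = AGE_START;
}

/* One round: visit 1/64th of the cache.  A visited page is only freed
 * if its age has already dropped to 0; otherwise it just gets older. */
static size_t cache_age_round(struct toy_cache *c)
{
    size_t freed = 0;

    for (size_t n = 0; n < NPAGES / 64; n++) {
        size_t i = c->cursor;

        c->cursor = (c->cursor + 1) % NPAGES;
        if (!c->present[i])
            continue;
        if (c->age[i] == 0) {
            c->present[i] = 0;
            freed++;
        } else {
            c->age[i]--;
        }
    }
    return freed;
}

/* Number of pages still cached. */
static size_t cache_resident(const struct toy_cache *c)
{
    size_t n = 0;

    for (size_t i = 0; i < NPAGES; i++)
        n += (size_t)c->present[i];
    return n;
}
```

In this model an idle cache drains completely after a few full sweeps, while
a cache whose pages are touched between rounds never loses a page. It says
nothing about how the caches should be balanced against each other, which is
the point under dispute in this thread.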

Hans

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:49                               ` Stephen C. Tweedie
  2000-06-07 22:00                                 ` Juan J. Quintela
@ 2000-06-07 22:22                                 ` Manfred Spraul
  2000-06-09 15:08                                   ` Rik van Riel
  2000-06-07 22:28                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
  2 siblings, 1 reply; 60+ messages in thread
From: Manfred Spraul @ 2000-06-07 22:22 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Juan J. Quintela, Rik van Riel, Hans Reiser, bert hubert,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, Jun 07, 2000 at 11:40:47PM +0200, Juan J. Quintela wrote:
> > Hi
> > Fair enough, don't put pinned pages in the LRU.  *Why* do you want to
> > put pages in the LRU if you can't free them when the LRU says: free
> > that page?
> 
> Because even if the information about which page is least recently
> used doesn't help you, the information about which filesystems are
> least active _does_ help.
> 

What about using a time based approach for pinned pages?

* only individually freeable pages are added into the LRU.
* everyone else registers callbacks.
* shrink_mmap estimates (*) the age (in jiffies) of the oldest entry in
the LRU, and then it calls the pressure callbacks with that time.

(*) nr_of_lru_pages/lru_reclaimed_pages_during_last_jiffies. Another
field in "struct page" is too expensive.
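
The estimate and the callback scheme above could be sketched roughly as
follows. This is a hedged illustration in C, not a proposed kernel interface:
register_pressure_callback(), apply_pressure(), record_age() and the explicit
window parameter are hypothetical names, and returning ULONG_MAX when nothing
was reclaimed is a choice made only for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback type: owners of pinned pages are told how old
 * the oldest LRU entry is estimated to be, in jiffies. */
typedef void (*pressure_cb_t)(unsigned long oldest_age, void *owner);

#define MAX_PRESSURE_CBS 8

static struct {
    pressure_cb_t cb;
    void *owner;
} pressure_cbs[MAX_PRESSURE_CBS];
static size_t nr_pressure_cbs;

/* Returns 0 on success, -1 if the callback is NULL or the table is full. */
static int register_pressure_callback(pressure_cb_t cb, void *owner)
{
    if (cb == NULL || nr_pressure_cbs == MAX_PRESSURE_CBS)
        return -1;
    pressure_cbs[nr_pressure_cbs].cb = cb;
    pressure_cbs[nr_pressure_cbs].owner = owner;
    nr_pressure_cbs++;
    return 0;
}

/* age ~= nr_lru_pages / (reclaimed / window): the time the LRU needs to
 * turn over once at the recent reclaim rate.  No per-page timestamp is
 * needed.  With no reclaim at all the estimate is unbounded. */
static unsigned long estimate_oldest_age(unsigned long nr_lru_pages,
                                         unsigned long reclaimed,
                                         unsigned long window)
{
    assert(window > 0);
    if (reclaimed == 0)
        return ULONG_MAX;
    return (unsigned long)((unsigned long long)nr_lru_pages * window
                           / reclaimed);
}

/* Pass the estimated age to every registered owner. */
static void apply_pressure(unsigned long nr_lru_pages,
                           unsigned long reclaimed,
                           unsigned long window)
{
    unsigned long age = estimate_oldest_age(nr_lru_pages, reclaimed, window);

    for (size_t i = 0; i < nr_pressure_cbs; i++)
        pressure_cbs[i].cb(age, pressure_cbs[i].owner);
}

/* Example subscriber: just records the age it was handed. */
static void record_age(unsigned long oldest_age, void *owner)
{
    *(unsigned long *)owner = oldest_age;
}
```

For example, 6000 LRU pages with 200 pages reclaimed during the last 100
jiffies gives an estimated age of 3000 jiffies; a real subscriber would then
release whatever it holds that is older than that.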

--
	Manfred

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 21:49                               ` Stephen C. Tweedie
  2000-06-07 22:00                                 ` Juan J. Quintela
  2000-06-07 22:22                                 ` Manfred Spraul
@ 2000-06-07 22:28                                 ` Hans Reiser
  2 siblings, 0 replies; 60+ messages in thread
From: Hans Reiser @ 2000-06-07 22:28 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Juan J. Quintela, Rik van Riel, bert hubert, linux-kernel,
	Chris Mason, linux-mm, Alexander Zarochentcev

Juan, while the FS cannot immediately unpin the pages, if pushed into doing so
it can start up the mechanisms to unpin them.  The pressure to start those
mechanisms should be proportional to the number of pages it is hogging.

Memory pressure should have a central pusher and decentralized FS delegated
response to the pushing.
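
One way to read "proportional" here, as a minimal sketch in C: a central
pusher splits a total flush target among the filesystems according to how
many pages each one has pinned, and each filesystem decides for itself how to
meet its share. split_pressure() and its parameters are hypothetical names,
not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Split `total` pages of pressure among n filesystems in proportion to
 * the number of pages each currently has pinned.  Integer division
 * rounds each share down, so the shares may sum to slightly less than
 * `total`. */
static void split_pressure(const unsigned long *pinned,
                           unsigned long *target,
                           size_t n, unsigned long total)
{
    unsigned long long sum = 0;

    assert(pinned != NULL && target != NULL);
    for (size_t i = 0; i < n; i++)
        sum += pinned[i];
    for (size_t i = 0; i < n; i++)
        target[i] = sum
            ? (unsigned long)((unsigned long long)total * pinned[i] / sum)
            : 0;
}
```

With 300 and 100 pinned pages and a total target of 40, the two filesystems
would be asked to unpin 30 and 10 pages respectively.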

Hans

* Re: journaling & VM
  2000-06-07 22:11                                         ` James Sutherland
@ 2000-06-07 22:29                                           ` Rik van Riel
  0 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-07 22:29 UTC (permalink / raw)
  To: James Sutherland
  Cc: Hans Reiser, Stephen C. Tweedie, Quintela Carreira Juan J.,
	linux-kernel, Chris Mason, linux-mm, Alexander Zarochentcev

On Wed, 7 Jun 2000, James Sutherland wrote:
> On Wed, 7 Jun 2000, Hans Reiser wrote:
> 
> > Let me convey an aspect of its rightness.
> > 
> > Caches have a declining marginal utility.
> 
> Incidentally, this effect comes up in Andrew Schulman's book,
> Unauthorized Windows '95, in the section where he compares raw
> DOS, SmartDrive, Windows 3.1 with 32 bit disk access,

	[SNIP]

The difference here is that those systems do NOT have a
unified VM. Also, mmap() isn't used for program data and
lots of other stuff we're doing isn't done on those
systems.

In a world where you mmap() your executables and major
parts of your program data, properly managing all the
caches *is* important...

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

* Re: journaling & VM
  2000-06-07 21:52                                       ` journaling & VM Hans Reiser
  2000-06-07 22:11                                         ` James Sutherland
@ 2000-06-08  1:11                                         ` Neil Schemenauer
  2000-06-08  1:29                                           ` Rik van Riel
  1 sibling, 1 reply; 60+ messages in thread
From: Neil Schemenauer @ 2000-06-08  1:11 UTC (permalink / raw)
  To: linux-mm

[recipient list brutally slashed]

On Wed, Jun 07, 2000 at 02:52:10PM -0700, Hans Reiser wrote:
> Caches have a declining marginal utility. It is a good idea to
> keep at least a little bit of each cache around. The classic
> problem is when you switch usage patterns back and forth, and
> one of the caches has been completely flushed by, say, a large
> file read. If just 3% of the amount of cache remained from when
> it was being used that 3% might give you a lot of speedup when
> the usage pattern flipped back.

I'm not sure about this.  The problem is that things like file
reads break the LRU heuristic.  If the new pages read will be
accessed sooner than the cache pages (instead of being just
accessed once) then the cache pages should be paged out.  Am I
missing something?

    Neil

-- 
Real Life? I played that game. The plot sucks but the graphics are
awesome.

* Re: journaling & VM
  2000-06-08  1:11                                         ` Neil Schemenauer
@ 2000-06-08  1:29                                           ` Rik van Riel
  0 siblings, 0 replies; 60+ messages in thread
From: Rik van Riel @ 2000-06-08  1:29 UTC (permalink / raw)
  To: Neil Schemenauer; +Cc: linux-mm

On Wed, 7 Jun 2000, Neil Schemenauer wrote:

> I'm not sure about this.  The problem is that things like file
> reads break the LRU heuristic.  If the new pages read will be
> accessed sooner than the cache pages (instead of being just
> accessed once) then the cache pages should be paged out.  Am I
> missing something?

No. You just described why LRU is not the algorithm we want
to use for page aging ;)

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

* Re: journaling & VM
       [not found]                       ` <20000608114435.A15433@uni-koblenz.de>
@ 2000-06-08 21:29                         ` Stephen C. Tweedie
  2000-06-09 11:53                           ` Ralf Baechle
  0 siblings, 1 reply; 60+ messages in thread
From: Stephen C. Tweedie @ 2000-06-08 21:29 UTC (permalink / raw)
  To: Ralf Baechle, linux-mm; +Cc: Stephen C. Tweedie

Hi,

On Thu, Jun 08, 2000 at 11:44:35AM +0200, Ralf Baechle wrote:
> On Wed, Jun 07, 2000 at 06:11:44PM +0100, Stephen C. Tweedie wrote:
> 
> > Because you want to have some idea of the usage patterns of the 
> > pages, too, so that you can free pages which haven't been accessed 
> > recently regardless of who owns them.
> 
> some device drivers may also collect relatively large amounts of memory.
> In case of my HIPPI cards this may be in the range of megabytes.  So I'd
> like to see a hook for freeing device memory.

Rik, here's yet another item for the wishlist on your new VM. :)

Device drivers really are a special case because they typically need
their memory at short notice, and at awkward times (such as in the 
middle of interrupts).  What sort of flexibility do you have regarding
the allocation/release of the buffer pool in your driver?

Cheers,
 Stephen

* Re: journaling & VM
  2000-06-08 21:29                         ` Stephen C. Tweedie
@ 2000-06-09 11:53                           ` Ralf Baechle
  0 siblings, 0 replies; 60+ messages in thread
From: Ralf Baechle @ 2000-06-09 11:53 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm

On Thu, Jun 08, 2000 at 10:29:50PM +0100, Stephen C. Tweedie wrote:

> On Thu, Jun 08, 2000 at 11:44:35AM +0200, Ralf Baechle wrote:

> > some device drivers may also collect relatively large amounts of memory.
> > In case of my HIPPI cards this may be in the range of megabytes.  So I'd
> > like to see a hook for freeing device memory.

> Device drivers really are a special case because they typically need
> their memory at short notice, and at awkward times (such as in the 
> middle of interrupts).  What sort of flexibility do you have regarding
> the allocation/release of the buffer pool in your driver?

I can release those buffers immediately.  The driver only holds them for
some while since it delays cleaning the tx ring; depending on the various
interrupt avoidance strategies we might use, even indefinitely.  Allocation
on rx is done at interrupt time but that's no big deal: if we fail to
allocate memory we just drop the packet and try again later.  Such
interrupt avoidance is actually a very common thing for a lot of NICs.  The
(rare...) HIPPI case is worst because HIPPI has the largest MTU with 64kB.

  Ralf

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-07 22:22                                 ` Manfred Spraul
@ 2000-06-09 15:08                                   ` Rik van Riel
  2000-06-09 16:52                                     ` Manfred Spraul
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-09 15:08 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel, linux-mm

On Thu, 8 Jun 2000, Manfred Spraul wrote:
> "Stephen C. Tweedie" wrote:
> > On Wed, Jun 07, 2000 at 11:40:47PM +0200, Juan J. Quintela wrote:
> > > Hi
> > > Fair enough, don't put pinned pages in the LRU.  *Why* do you want to
> > > put pages in the LRU if you can't free them when the LRU says: free
> > > that page?
> > 
> > Because even if the information about which page is least recently
> > used doesn't help you, the information about which filesystems are
> > least active _does_ help.
> 
> What about using a time based approach for pinned pages?
> 
> * only individually freeable pages are added into the LRU.
> * everyone else registers callbacks.
> * shrink_mmap estimates (*) the age (in jiffies) of the oldest entry in
> the LRU, and then it calls the pressure callbacks with that time.

This is exactly what one global LRU will achieve, at less
cost and with better readable code.

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-09 15:08                                   ` Rik van Riel
@ 2000-06-09 16:52                                     ` Manfred Spraul
  2000-06-09 17:23                                       ` Rik van Riel
  0 siblings, 1 reply; 60+ messages in thread
From: Manfred Spraul @ 2000-06-09 16:52 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

>
> This is exactly what one global LRU will achieve, at less
> cost and with better readable code.
>
You are right, but what will you do with pinned pages once they reach the
end of the LRU? Will you drop them from the LRU, or will you add them to the
beginning?
AFAICS a few global LRU lists [your inactive, active, scavenge (sp?) lists]
should work, but I don't understand yet how you want to prevent that one
grep over the kernel tree will push everyone else into swap.

Is the active list also a LRU list? AFAICS we don't have the reverse
mapping "struct page ->all pte's", so we cannot push a page once it reaches
the end of the LRU. AFAIK BSD has that reverse mapping (Please correct me if
I'm wrong). IMHO an LRU won't help us.

--
    Manfred

P.S.: You could ignore the rest of the mail, just a few random thoughts.

Level 1 (your active list): the page users such as
* mmapped pages, anon pages, mapped shm pages: they are unmapped by
mm/vmscan.c. vma->swapout() should add them to the level 2 list.

* a tiny hotlist for the page & buffer cache, otherwise we have
"spin_lock();list_del(page);list_add(page,list_head);spin_unlock()" during
every operation. Clock algorithm with a referenced bit.

Level 2: (your inactive list)
* unmapped pages LRU list 1 [pages can be dirty or clean]. At the end of
this list, page->a_ops->?? is called, and the page is dropped from the list.
The memory owner adds it to the level 3 list once it's clean.

Level 3: (your scavenge list)
* LRU list of clean pages, ready for immediate reclamation. gfp(GFP_WAIT)
takes the oldest entry from this list.

Level 4:
free pages in the buddy. for GFP_ATOMIC allocations, and for multi page
allocations.

Pages in Level 2 and 3 are never "in use", i.e. never reachable from user
space, or read/written by generic_file_{read,write}. The page owner can
still reclaim them if a soft pagefault occurs. File pages are still in the
page cache hash table, shm & anon pages are reachable through the swap
cache.

Level 2 could be split in 2 halves, clean pages are added in the middle.
[reduces IO]

The selection between the Level 1 page holders could be made on their
"reanimate rate": if one owner often requests pages from Level 2 or 3 back,
then we reap him too often.

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel: it'snot just the code)
  2000-06-09 16:52                                     ` Manfred Spraul
@ 2000-06-09 17:23                                       ` Rik van Riel
  2000-06-09 18:26                                         ` journaling & VM (was: Re: reiserfs being part of the kernel:it'snot " Manfred Spraul
  0 siblings, 1 reply; 60+ messages in thread
From: Rik van Riel @ 2000-06-09 17:23 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-mm

On Fri, 9 Jun 2000, Manfred Spraul wrote:

> > This is exactly what one global LRU will achieve, at less
> > cost and with better readable code.
>
> You are right, but what will you do with pinned pages once they
> reach the end of the LRU? Will you drop them from the LRU, or
> will you add them to the beginning?

We will ask the filesystem to write out data and unpin this
block. If it doesn't, we'll ask again next time, ....

Note that this is essentially harmless since we only ask the
filesystem to clean up pages so they can be unpinned, we are
in no way asking the filesystem to free used pages...

> AFAICS a few global LRU lists [your inactive, active, scavenge
> (sp?) lists] should work, but I don't understand yet how you
> want to prevent that one grep over the kernel tree will push
> everyone else into swap.

Ahh, but the swap and filesystem IO will be triggered from the
end of the _inactive_ list. We will unmap pages and allocate
swap earlier on, but we won't actually do any of the IO...

> Is the active list also a LRU list? AFAICS we don't have the
> reverse mapping "struct page ->all pte's", so we cannot push a
> page once it reaches the end of the LRU. AFAIK BSD has that
> reverse mapping (Please correct me if I'm wrong). IMHO an LRU
> won't help us.

The active list will probably have to be what our current
swap_out/shrink_mmap combo does. In 2.5 we can add the
changes needed to do reverse mapping, but until then we'll
probably have to leave this kludge ;(

> Level 1 (your active list): the page users such as * mmapped
> pages, anon pages, mapped shm pages: they are unmapped by
> mm/vmscan.c. vma->swapout() should add them to the level 2 list.
>
> * a tiny hotlist for the page & buffer cache, otherwise we have
> "spin_lock();list_del(page);list_add(page,list_head);spin_unlock()"
> during every operation. Clock algorithm with a referenced bit.

Not so fast ... this is the only level where we do page aging, so
we don't want to move the pages to the inactive list too fast. When
we first unmap a page, it'll get added to the list and start out
with a certain page age, after which aging has to happen for it to
be moved to the inactive list...

> Level 2: (your inactive list)
> * unmapped pages LRU list 1 [pages can be dirty or clean]. At
> the end of this list, page->a_ops->?? is called, and the page is
> dropped from the list. The memory owner adds it to the level 3
> list once it's clean.

The operation we call is basically only there to get the page
cleaned and the buffers removed. We try to keep a certain number
of inactive pages around so we'll always have something to reclaim
and page aging is balanced.

> Level 3: (your scavenge list)
> * LRU list of clean pages, ready for immediate reclamation.
> gfp(GFP_WAIT) takes the oldest entry from this list.

*nod*

> Level 4:
> free pages in the buddy. for GFP_ATOMIC allocations, and for
> multi page allocations.

*nod*  (and for PF_MEMALLOC allocations)

> Pages in Level 2 and 3 are never "in use", i.e. never reachable
> from user space, or read/written by generic_file_{read,write}.
> The page owner can still reclaim them if a soft pagefault
> occurs. File pages are still in the page cache hash table, shm &
> anon pages are reachable through the swap cache.

Yes.

> Level 2 could be split in 2 halves, clean pages are added in the
> middle. [reduces IO]

We do something like this, but splitting the list in half is,
IMHO not a good idea. What we do instead is:
- walk the list, reclaiming free pages
- if we didn't get enough, walk the list again and start
  (async?) IO on a number of dirty pages
- if we didn't get enough free pages after the second run
  (unlikely at the moment, but some page->mapping->flush()
  functions we may want to make synchronous later...) we
  kick bdflush/kflushd in the nuts so we'll have enough
  free pages next time
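
The three steps above could be sketched as follows. This is a toy model in C
under stated assumptions, not the actual 2.4 code: the inactive list is a
plain array, "free pages" in the first step is read as "clean pages that can
be reclaimed at once", IO is purely asynchronous (so the second pass never
frees anything by itself), and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_CLEAN, TOY_DIRTY, TOY_WRITEBACK, TOY_RECLAIMED };

struct scan_result {
    size_t reclaimed;      /* pages freed by the first pass */
    size_t io_started;     /* dirty pages queued for async writeout */
    int    kick_flusher;   /* set when we were still short afterwards */
};

static struct scan_result scan_inactive(enum toy_state *pages, size_t n,
                                        size_t want, size_t io_batch)
{
    struct scan_result r = { 0, 0, 0 };

    assert(pages != NULL);

    /* Pass 1: reclaim pages that are already clean. */
    for (size_t i = 0; i < n && r.reclaimed < want; i++) {
        if (pages[i] == TOY_CLEAN) {
            pages[i] = TOY_RECLAIMED;
            r.reclaimed++;
        }
    }
    if (r.reclaimed >= want)
        return r;

    /* Pass 2: not enough, so start async IO on a limited number of
     * dirty pages.  They become reclaimable only once the IO is done. */
    for (size_t i = 0; i < n && r.io_started < io_batch; i++) {
        if (pages[i] == TOY_DIRTY) {
            pages[i] = TOY_WRITEBACK;
            r.io_started++;
        }
    }

    /* Pass 3: still short, so ask the flush daemon to work harder and
     * have more clean pages ready for the next scan. */
    r.kick_flusher = 1;
    return r;
}
```

Because the IO in this model is asynchronous, the flusher is kicked whenever
the first pass falls short; with synchronous flush functions the second pass
could free pages itself and the kick would become the rare case.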

> The selection between the Level 1 page holders could be made on
> their "reanimate rate": if one owner often requests pages from
> Level 2 or 3 back, then we reap him too often.

That's what page aging is for.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

* Re: journaling & VM  (was: Re: reiserfs being part of the kernel:it'snot just the code)
  2000-06-09 17:23                                       ` Rik van Riel
@ 2000-06-09 18:26                                         ` Manfred Spraul
  0 siblings, 0 replies; 60+ messages in thread
From: Manfred Spraul @ 2000-06-09 18:26 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

Is it correct that you want to use 5 levels?

* "mapped" or "hot file cache" / "hot buffer cache"
* active [here your page aging is performed]
* inactive list
* scavenge list
* gfp buddy list.

I thought that unmapping the last externally visible mapping would move a
page onto the inactive list, and that the LRU nature of that list would
perform the aging. Is your inactive list a usually short, clock-like list?
My inactive list is a long LRU list. If the scavenge list becomes empty,
the last few dozen entries are spliced out of the inactive list, and
page->a_op->we_need_memory__unpin_yourself_and_add_yourself_to_the_scavenge_list()
is called for each of them.
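
A rough user-space sketch of that refill step, under assumptions: the
array-backed `plist`, the `lpage` struct, `refill_scavenge()`, and the
short `unpin` callback are hypothetical stand-ins, and the callback here
reports success so that the caller does the list insertion, which
simplifies the "add yourself" behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define MAXP 64

struct lpage {
    int id;
    int pinned;            /* e.g. held by a running transaction */
    int flush_requested;   /* owner has been asked to write it out */
    /* Hypothetical stand-in for the long a_op callback named above:
     * returns 1 if the page can go on the scavenge list right now. */
    int (*unpin)(struct lpage *);
};

/* p[0] is the head (most recently added), p[n-1] the tail (oldest). */
struct plist { struct lpage *p[MAXP]; size_t n; };

int unpin_ready(struct lpage *pg)    { pg->pinned = 0;          return 1; }
int unpin_needs_io(struct lpage *pg) { pg->flush_requested = 1; return 0; }

/* Refill the scavenge list from the tail of the inactive list. */
size_t refill_scavenge(struct plist *inactive, struct plist *scavenge,
                       size_t batch)
{
    size_t moved = 0;

    if (scavenge->n > 0)
        return 0;          /* only refill once the scavenge list is empty */

    for (size_t i = 0; i < batch && inactive->n > 0; i++) {
        struct lpage *pg = inactive->p[--inactive->n];  /* splice off tail */

        assert(scavenge->n < MAXP);
        if (pg->unpin(pg)) {
            scavenge->p[scavenge->n++] = pg;
            moved++;
        }
        /* Otherwise the page is now on neither list; in this sketch its
         * owner is expected to re-add it once the write-out completes. */
    }
    return moved;
}
```

Each page is asked exactly once per refill, which keeps the inactive list a
plain LRU rather than something that is scanned repeatedly.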

From: "Rik van Riel" <riel@conectiva.com.br>
> > You are right, but what will you do with pinned pages once they
> > reach the end of the LRU? Will you drop them from the LRU, or
> > will you add them to the beginning?
>
> We will ask the filesystem to write out data and unpin this
> block. If it doesn't, we'll ask again next time, ....
>

Why? E.g. you have a box with a fast RAID array and a slow parallel-port
Zip drive. I'd give the filesystem one "flush now" call for the page, and
remove the page from the inactive list immediately. If you walk in circles
and keep asking, then it's a clock-like algorithm, not an LRU-like one.

>
> Ahh, but the swap and filesystem IO will be triggered from the
> end of the _inactive_ list. We will unmap pages and allocate
> swap earlier on, but we won't actually do any of the IO...
>
Hey, I only have 192 MB. One kernel tree is ~90 MB, so a diff between 2
trees touches 180 MB. One diff will push everything past the end of the
inactive list.

> > The selection between the Level 1 page holders could be made on
> > their "reanimate rate": if one owner often request pages from
> > Level 2 or 3 back, then we reap him too often.
>
> That's what page aging is for.
>
If a subsystem requests a page back from the inactive/scavenge list, then
we must remove the page from these lists. We could use these function calls
to calculate accurate hit/miss rates for the memory users, and use these
stats for the page aging without a special aging level.

We could go one step further and assign these stats to each address space
[file data, shm] or each process [anon pages, mmap]. Playing a DVD while
running a database could then auto-tune into "discard the DVD data
immediately, don't touch the database data".
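
A minimal sketch of such per-owner statistics, assuming one counter pair
per memory user. The names `owner_stats`, `reanimate_permille()` and
`pick_victim()` are hypothetical, and treating an owner with no eviction
history as the cheapest victim is an arbitrary choice made for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* One record per memory user (address space, process, ...). */
struct owner_stats {
    const char *name;
    unsigned long evicted;     /* pages moved to inactive/scavenge */
    unsigned long reanimated;  /* pages requested back before reuse */
};

void note_evict(struct owner_stats *o)     { o->evicted++; }
void note_reanimate(struct owner_stats *o) { o->reanimated++; }

/* "Reanimate rate" in per mille; 0 if nothing was evicted yet. */
unsigned long reanimate_permille(const struct owner_stats *o)
{
    assert(o->reanimated <= o->evicted);
    if (o->evicted == 0)
        return 0;
    return o->reanimated * 1000UL / o->evicted;
}

/* Prefer to reap the owner that asks for its pages back least often. */
const struct owner_stats *pick_victim(const struct owner_stats *owners,
                                      size_t n)
{
    const struct owner_stats *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!best ||
            reanimate_permille(&owners[i]) < reanimate_permille(best))
            best = &owners[i];
    }
    return best;
}
```

In the DVD-plus-database case, streaming data is almost never requested
back, so its rate stays near zero and it is reaped first, while the
database's high rate protects its pages.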

--
    Manfred


Thread overview: 60+ messages
     [not found] <Pine.LNX.4.10.10006060811120.15888-100000@dax.joh.cam.ac.uk>
     [not found] ` <393CA40C.648D3261@reiser.to>
     [not found]   ` <20000606114851.A30672@home.ds9a.nl>
     [not found]     ` <393CBBB8.554A0D2A@reiser.to>
     [not found]       ` <20000606172606.I25794@redhat.com>
     [not found]         ` <393D37D1.1BC61DC3@reiser.to>
     [not found]           ` <20000606205447.T23701@redhat.com>
2000-06-06 23:06             ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code) Rik van Riel
2000-06-07  1:19               ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
2000-06-07  1:46                 ` Quintela Carreira Juan J.
2000-06-07  3:45                   ` Hans Reiser
2000-06-07 11:15                     ` Stephen C. Tweedie
2000-06-07 13:23                       ` Rik van Riel
2000-06-07 13:41                         ` Stephen C. Tweedie
2000-06-07 14:27                           ` Rik van Riel
2000-06-07 14:46                             ` Stephen C. Tweedie
2000-06-07 14:51                               ` bert hubert
2000-06-07 15:20                               ` Quintela Carreira Juan J.
2000-06-07 15:35                                 ` Stephen C. Tweedie
2000-06-07 15:41                                   ` Rik van Riel
2000-06-07 15:44                                   ` Juan J. Quintela
2000-06-07 17:10                                   ` Jeff V. Merkey
2000-06-07 17:14                                     ` Stephen C. Tweedie
2000-06-07 17:21                                       ` Jeff V. Merkey
2000-06-07 20:16                                   ` Hans Reiser
2000-06-07 21:20                                     ` Rik van Riel
2000-06-07 21:52                                       ` journaling & VM Hans Reiser
2000-06-07 22:11                                         ` James Sutherland
2000-06-07 22:29                                           ` Rik van Riel
2000-06-08  1:11                                         ` Neil Schemenauer
2000-06-08  1:29                                           ` Rik van Riel
2000-06-07 20:16                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
2000-06-07 20:54                                   ` Stephen C. Tweedie
2000-06-07 21:29                                     ` Hans Reiser
2000-06-07 21:31                                       ` Rik van Riel
2000-06-07 21:33                                       ` Stephen C. Tweedie
2000-06-07 22:20                                         ` journaling & VM Hans Reiser
2000-06-07 21:50                                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Juan J. Quintela
2000-06-07 19:02                         ` journaling & VM (was: Re: reiserfs being part of the kernel:it'snot " Hans Reiser
2000-06-07 13:40                       ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Chris Mason
2000-06-07 13:47                         ` Stephen C. Tweedie
2000-06-07 11:12                 ` Stephen C. Tweedie
2000-06-07 16:35                   ` journaling & VM John Fremlin
2000-06-07 17:11                     ` Stephen C. Tweedie
     [not found]                       ` <20000608114435.A15433@uni-koblenz.de>
2000-06-08 21:29                         ` Stephen C. Tweedie
2000-06-09 11:53                           ` Ralf Baechle
2000-06-07 17:48                   ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot just the code) Hans Reiser
2000-06-07 18:01                     ` Rik van Riel
2000-06-07 19:58                       ` Stephen C. Tweedie
2000-06-07 20:56                         ` Juan J. Quintela
2000-06-07 21:14                           ` Rik van Riel
2000-06-07 21:24                           ` Stephen C. Tweedie
2000-06-07 21:40                             ` Juan J. Quintela
2000-06-07 21:49                               ` Stephen C. Tweedie
2000-06-07 22:00                                 ` Juan J. Quintela
2000-06-07 22:22                                 ` Manfred Spraul
2000-06-09 15:08                                   ` Rik van Riel
2000-06-09 16:52                                     ` Manfred Spraul
2000-06-09 17:23                                       ` Rik van Riel
2000-06-09 18:26                                         ` journaling & VM (was: Re: reiserfs being part of the kernel:it'snot " Manfred Spraul
2000-06-07 22:28                                 ` journaling & VM (was: Re: reiserfs being part of the kernel: it'snot " Hans Reiser
2000-06-07 10:10               ` journaling & VM (was: Re: reiserfs being part of the kernel: it's not " Stephen C. Tweedie
     [not found]             ` <393DACC8.5DB60A81@reiser.to>
2000-06-07 11:00               ` reiserfs being part of the kernel: it's not just the code Stephen C. Tweedie
2000-06-07 17:11                 ` Rik van Riel
2000-06-07 17:13                   ` Stephen C. Tweedie
2000-06-07 17:46                 ` Hans Reiser
2000-06-07 19:53                   ` Stephen C. Tweedie
