From: Rik van Riel <riel@conectiva.com.br>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: linux-mm@kvack.org
Subject: Re: journaling & VM (was: Re: reiserfs being part of the kernel: it's not just the code)
Date: Fri, 9 Jun 2000 14:23:29 -0300 (BRST)
Message-ID: <Pine.LNX.4.21.0006091410100.31358-100000@duckman.distro.conectiva>
In-Reply-To: <007501bfd233$288827c0$0a1e17ac@local>
On Fri, 9 Jun 2000, Manfred Spraul wrote:
> > This is exactly what one global LRU will achieve, at less
> > cost and with more readable code.
>
> You are right, but what will you do with pinned pages once they
> reach the end of the LRU? Will you drop them from the LRU, or
> will you add them to the beginning?
We will ask the filesystem to write out data and unpin this
block. If it doesn't, we'll ask again next time, and so on.
Note that this is essentially harmless since we only ask the
filesystem to clean up pages so they can be unpinned, we are
in no way asking the filesystem to free used pages...
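
A minimal sketch of that "ask, and ask again next time" behaviour,
as a toy userspace model rather than kernel code. Every name here
(struct fs_ops, flush, scan_tail_page, the demo filesystem) is
hypothetical and made up for illustration; the assumption is that
the filesystem callback clears the pin itself once it has written
its data.

```c
#include <stdbool.h>
#include <stddef.h>

struct page;

/* Hypothetical per-filesystem hook: try to write out and unpin. */
struct fs_ops {
    bool (*flush)(struct page *pg);   /* true once the page is unpinned */
};

struct page {
    bool pinned;                      /* filesystem still holds the page */
    int flush_requests;               /* how often the VM has asked so far */
    const struct fs_ops *ops;
};

enum scan_result { PAGE_RECLAIMABLE, PAGE_RETRY_LATER };

/*
 * Called when a page reaches the end of the LRU. The VM only asks
 * the filesystem to clean and unpin; it never frees a pinned page.
 */
enum scan_result scan_tail_page(struct page *pg)
{
    if (!pg->pinned)
        return PAGE_RECLAIMABLE;

    pg->flush_requests++;
    if (pg->ops && pg->ops->flush)
        pg->ops->flush(pg);

    /* Still pinned: leave it alone and ask again on the next scan. */
    return pg->pinned ? PAGE_RETRY_LATER : PAGE_RECLAIMABLE;
}

/* Demo filesystem (assumption): needs two requests before it can unpin. */
static bool demo_flush(struct page *pg)
{
    if (pg->flush_requests >= 2)
        pg->pinned = false;
    return !pg->pinned;
}

static const struct fs_ops demo_fs_ops = { .flush = demo_flush };
```

A refused request costs nothing but a later retry, which is why
asking repeatedly is harmless.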
> AFAICS a few global LRU lists [your inactive, active, scavenge
> (sp?) lists] should work, but I don't understand yet how you
> want to prevent that one grep over the kernel tree will push
> everyone else into swap.
Ahh, but the swap and filesystem IO will be triggered from the
end of the _inactive_ list. We will unmap pages and allocate
swap earlier on, but we won't actually do any of the IO...
> Is the active list also a LRU list? AFAICS we don't have the
> reverse mapping "struct page ->all pte's", so we cannot push a
> page once it reaches the end of the LRU. AFAIK BSD has that
> reverse mapping (Please correct me if I'm wrong). IMHO an LRU
> won't help us.
The active list will probably have to be what our current
swap_out/shrink_mmap combo does. In 2.5 we can add the
changes needed to do reverse mapping, but until then we'll
probably have to leave this kludge ;(
> Level 1 (your active list): the page users, such as
> * mmapped pages, anon pages, mapped shm pages: they are unmapped
>   by mm/vmscan.c. vma->swapout() should add them to the level 2
>   list.
> * a tiny hotlist for the page & buffer cache, otherwise we have
>   "spin_lock();list_del(page);list_add(page,list_head);spin_unlock()"
>   during every operation. Clock algorithm with a referenced bit.
Not so fast ... this is the only level where we do page aging, so
we don't want to move the pages to the inactive list too fast. When
we first unmap a page, it'll get added to the list and start out
with a certain page age, after which aging has to happen for it to
be moved to the inactive list...
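
To make the aging step concrete, here is a rough userspace sketch.
The constants, the halving rule and all names (AGE_START, AGE_ADV,
AGE_MAX, struct aged_page, age_page) are assumptions picked for
illustration, not a statement of what the final code will use.

```c
#include <stdbool.h>

/* Illustrative tuning values (assumptions). */
enum { AGE_START = 2, AGE_ADV = 3, AGE_MAX = 64 };

struct aged_page {
    unsigned age;        /* current page age */
    bool referenced;     /* touched since the last scan */
    bool inactive;       /* already moved to the inactive list */
};

/* A freshly unmapped page starts out with a certain age. */
void page_activate(struct aged_page *pg)
{
    pg->age = AGE_START;
    pg->referenced = false;
    pg->inactive = false;
}

/*
 * One aging pass over one page. Referenced pages gain age (capped),
 * idle pages lose half of it. Only a page whose age has dropped to
 * zero is moved to the inactive list. Returns true if that happened.
 */
bool age_page(struct aged_page *pg)
{
    if (pg->referenced) {
        pg->referenced = false;
        pg->age += AGE_ADV;
        if (pg->age > AGE_MAX)
            pg->age = AGE_MAX;
    } else {
        pg->age /= 2;
    }

    if (pg->age == 0)
        pg->inactive = true;
    return pg->inactive;
}
```

With these values an untouched page survives one scan and is
deactivated on the second; a single reference buys it a few more
scans, so a page cannot fall off the active list too quickly.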
> Level 2: (your inactive list)
> * unmapped pages LRU list 1 [pages can be dirty or clean]. At
> the end of this list, page->a_ops->?? is called, and the page is
> dropped from the list. The memory owner adds it to the level 3
> list once it's clean.
The operation we call is basically only there to get the page
cleaned and the buffers removed. We try to keep a certain number
of inactive pages around so we'll always have something to reclaim
and page aging is balanced.
> Level 3: (your scavenge list)
> * LRU list of clean pages, ready for immediate reclamation.
> gfp(GFP_WAIT) takes the oldest entry from this list.
*nod*
> Level 4:
> free pages in the buddy. for GFP_ATOMIC allocations, and for
> multi page allocations.
*nod* (and for PF_MEMALLOC allocations)
> Pages in Level 2 and 3 are never "in use", i.e. never reachable
> from user space, or read/written by generic_file_{read,write}.
> The page owner can still reclaim them if a soft pagefault
> occurs. File pages are still in the page cache hash table, shm &
> anon pages are reachable through the swap cache.
Yes.
> Level 2 could be split in 2 halves, clean pages are added in the
> middle. [reduces IO]
We do something like this, but splitting the list in half is,
IMHO, not a good idea. What we do instead is:
- walk the list, reclaiming free pages
- if we didn't get enough, walk the list again and start
  (async?) IO on a number of dirty pages
- if we didn't get enough free pages after the second run
  (unlikely at the moment, but some page->mapping->flush()
  functions we may want to make synchronous later...) we
  kick bdflush/kflushd in the nuts so we'll have enough
  free pages next time
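
The pass order above could be sketched roughly like this. It is a
toy userspace model over an array, not kernel code: struct ipage,
reclaim_inactive(), the max_io limit and the sync_io switch are all
hypothetical names introduced for the example, and "kicking the
flusher" is reduced to setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

struct ipage {
    bool dirty;        /* needs writeout before it can be reclaimed */
    bool io_started;   /* writeout has been queued */
    bool reclaimed;    /* page has been freed */
};

struct reclaim_result {
    size_t freed;          /* pages reclaimed in this call */
    size_t io_started;     /* dirty pages we started IO on */
    bool kicked_flusher;   /* stands in for waking bdflush/kflushd */
};

struct reclaim_result reclaim_inactive(struct ipage *list, size_t n,
                                       size_t target, size_t max_io,
                                       bool sync_io)
{
    struct reclaim_result r = { 0, 0, false };
    size_t i;

    /* Pass 1: reclaim pages that are already clean. */
    for (i = 0; i < n && r.freed < target; i++) {
        if (!list[i].reclaimed && !list[i].dirty) {
            list[i].reclaimed = true;
            r.freed++;
        }
    }
    if (r.freed >= target)
        return r;

    /* Pass 2: start IO on a limited number of dirty pages. */
    for (i = 0; i < n && r.io_started < max_io && r.freed < target; i++) {
        if (list[i].reclaimed || !list[i].dirty || list[i].io_started)
            continue;
        list[i].io_started = true;
        r.io_started++;
        if (sync_io) {
            /* Synchronous flush: page is clean now, reclaim it. */
            list[i].dirty = false;
            list[i].reclaimed = true;
            r.freed++;
        }
    }

    /* Pass 3: still short, so make sure more pages are clean next time. */
    if (r.freed < target)
        r.kicked_flusher = true;
    return r;
}
```

Clean pages are always taken first, so IO is only started when the
cheap pass did not free enough, which gets the IO reduction without
splitting the list.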
> The selection between the Level 1 page holders could be made on
> their "reanimate rate": if one owner often requests pages back
> from Level 2 or 3, then we reap him too often.
That's what page aging is for.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/ http://www.surriel.com/