From: "Stephen C. Tweedie" <sct@redhat.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Rik van Riel <riel@nl.linux.org>,
ak@muc.de, ebiederm+eric@ccr.net, linux-kernel@vger.rutgers.edu,
linux-mm@kvack.org
Subject: Re: Q: PAGE_CACHE_SIZE?
Date: Fri, 28 May 1999 21:46:01 +0100 (BST)
Message-ID: <14159.137.169623.500547@dukat.scot.redhat.com>
In-Reply-To: <E10n8Ic-0003h9-00@the-village.bc.nu>
Hi,
On Thu, 27 May 1999 23:06:48 +0100 (BST), Alan Cox
<alan@lxorguk.ukuu.org.uk> said:
>> A larger page size is no compensation for the lack of a decent
>> read-{ahead,back,anywhere} I/O clustering algorithm in the OS.
> It isnt compensating for that. If you have 4Gig of memory and a high
> performance I/O controller the constant cost per page for VM
> management begins to dominate the equation. Its also a win for other
> CPU related reasons (reduced tlb misses and the like), and with 4Gig
> of RAM the argument is a larger page size isnt a problem.
That's still only half of the issue. Remember, there isn't any one
block size in the kernel.
We can happily use 1k or 2k blocks in the buffer cache. We use 8k
chunks for stacks. Nothing is forcing us to use the hardware page size
for all of our operations.
In short, I think the real answer as far as the VM is concerned is
indeed to use clustering, not just for IO (we already do that for
mmap paging and for sequential reads), but for pageout too. I really
believe that the best unit in which to do pageout is whatever unit we
used for the pagein in the first place.
This has a lot of really nice properties. If we record sequential
accesses when setting up data in the first place, then we can
automatically optimise for that when doing the pageout again. For swap,
it reduces fragmentation: we can allocate in multi-page chunks and keep
that allocation persistent.
There are very few places where doing such clustering falls short of
what we'd get by increasing the page size. COW is one: retaining
per-page COW granularity is an easy way to fragment swap, but increasing
the COW chunk size changes the semantics unless we actually export a
larger pagesize to user space (because we end up destroying the VM
backing store sharing for pages which haven't actually been touched by
the user).
Finally, there's one other thing worth considering: if we use a larger
chunk size for dividing up memory (say, set the buddy heap up in a basic
unit of 32k or more), then the slab allocator is actually a very good
way of getting page allocations out of that.
If we still used the 4k minipage for all kernel get_free_page()
allocations as usual, but used the larger 32k buddy heap pages for all
VM allocations, then 8k kernel allocations (e.g. stack allocations and
large NFS packets) would become trivial to deal with.
The biggest problem we have had with these multi-page allocations up to
now is fragmentation in the VM. If we populate the _entire_ VM in
multiples of 8k or more then we can never see such fragmentation at all.
8k might actually be a reasonable pagesize even on low memory machines:
we found with 2.2 that the kernel's increased size was compensated for
by more efficient swapping, so that things still went faster in low
memory than under 2.0, and larger pages may well show the same tradeoff.
--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org. For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/
Thread overview: 15+ messages
1999-05-18 14:03 Eric W. Biederman
1999-05-18 15:04 ` Andi Kleen
1999-05-19 23:29 ` Chris Wedgwood
1999-05-20 17:12 ` Andrea Arcangeli
1999-05-25 16:29 ` Alan Cox
1999-05-25 20:16 ` Rik van Riel
1999-05-25 22:17 ` Matti Aarnio
1999-05-27 22:06 ` Alan Cox
1999-05-28 20:46 ` Stephen C. Tweedie [this message]
1999-05-28 21:33 ` Rik van Riel
1999-05-29 1:59 ` Stephen C. Tweedie
1999-05-30 23:12 ` Andrea Arcangeli
1999-06-01 0:01 ` Stephen C. Tweedie
1999-06-01 14:23 ` Andrea Arcangeli
1999-05-29 15:07 ` Ralf Baechle