From: Christoph Lameter <clameter@sgi.com>
To: linux-mm@kvack.org
Cc: William Lee Irwin III <wli@holomorphy.com>,
Badari Pulavarty <pbadari@gmail.com>, David Chinner <dgc@sgi.com>,
Jens Axboe <jens.axboe@oracle.com>,
Adam Litke <aglitke@gmail.com>,
Christoph Lameter <clameter@sgi.com>,
Dave Hansen <hansendc@us.ibm.com>, Mel Gorman <mel@skynet.ie>,
Avi Kivity <avi@argo.co.il>
Subject: [RFC 00/16] Variable Order Page Cache Patchset V2
Date: Sun, 22 Apr 2007 23:48:45 -0700 (PDT)
Message-ID: <20070423064845.5458.2190.sendpatchset@schroedinger.engr.sgi.com>
Sorry for the earlier mail. quilt and exim not cooperating.
RFC V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.
This patchset modifies the Linux kernel so that higher order page cache
pages become possible. The higher order page cache pages are compound pages
and can be handled in the same way as regular pages.
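To make "higher order" concrete: an order-n page cache page spans 2^n base
pages. The following is only a userspace sketch of that relationship, not
code from the patchset. It assumes a 4k base page (PAGE_SHIFT = 12, as on
x86_64), and the helper names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4k base pages, as on x86_64


def page_cache_shift(order: int) -> int:
    """log2 of the size of an order-`order` page cache page (hypothetical helper)."""
    return PAGE_SHIFT + order


def page_cache_size(order: int) -> int:
    """Size in bytes of an order-`order` page cache page (hypothetical helper)."""
    return 1 << page_cache_shift(order)


def order_for_blocksize(blocksize: int) -> int:
    """Smallest order whose page covers one block of `blocksize` bytes."""
    if blocksize <= 0 or blocksize & (blocksize - 1):
        raise ValueError("blocksize must be a positive power of two")
    # Blocks no larger than a base page fit into an order-0 page.
    return max(0, blocksize.bit_length() - 1 - PAGE_SHIFT)


if __name__ == "__main__":
    for bs in (4096, 16384, 32768, 65536):
        order = order_for_blocksize(bs)
        print(f"blocksize {bs:6d} -> order {order}, page size {page_cache_size(order)}")
```

Under these assumptions a 32k block needs an order-3 page and a 64k block
an order-4 page.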
Rationales:
1. We have problems supporting devices with a higher blocksize than
page size. This is for example important to support CDs and DVDs that
can only read and write 32k or 64k blocks. We currently have a shim
layer in there to deal with this situation which limits the speed
of I/O. The developers are currently looking for ways to completely
bypass the page cache because of this deficiency.
2. 32/64k blocksize is also used in flash devices. Same issues.
3. Future hard disks will support bigger block sizes.
4. Performance. If we look at IA64 vs. x86_64 then it seems that the
faster interrupt handling on x86_64 compensates for the speed loss due to
a smaller page size (4k vs 16k on IA64). Having larger page sizes on all
platforms allows a significant reduction in I/O overhead and increases the
size of I/O that can be performed by hardware in a single request,
since the number of scatter gather entries is typically limited for
one request. This is going to become increasingly important to support
the ever growing memory sizes since we may have to handle excessively
large numbers of 4k requests for data sizes that may become common
soon. For example, to write a 1 terabyte file the kernel would have to
handle on the order of 256 million 4k chunks (2^28 if a terabyte is
taken as 2^40 bytes).
5. Cross arch compatibility: It is currently not possible to mount
a 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
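The arithmetic behind rationale 4 can be checked with a few lines. This is
a back-of-the-envelope sketch only: it takes a terabyte as 2^40 bytes,
assumes the worst case where each scatter gather entry describes exactly
one page, and uses a hypothetical limit of 128 entries per request. None
of these figures come from a specific driver.

```python
TIB = 1 << 40   # assumption: 1 terabyte counted as 2^40 bytes
SG_LIMIT = 128  # hypothetical per-request scatter gather limit


def chunks_needed(nbytes: int, page_size: int) -> int:
    """Number of page-sized chunks needed to cover nbytes (rounded up)."""
    return -(-nbytes // page_size)


def max_request_bytes(sg_entries: int, page_size: int) -> int:
    """Largest single request if every scatter gather entry holds one page."""
    return sg_entries * page_size


if __name__ == "__main__":
    for page_size in (4096, 16384, 65536):
        print(
            f"page size {page_size:6d}: "
            f"{chunks_needed(TIB, page_size):>11,d} chunks per terabyte, "
            f"max request {max_request_bytes(SG_LIMIT, page_size):>9,d} bytes"
        )
```

With 4k pages this gives 268,435,456 chunks (2^28) per terabyte; with 64k
pages it drops to 16,777,216, and the largest request under the assumed
limit grows from 512k to 8M.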
The support here is currently only for buffered I/O and only for two
filesystems: ramfs and ext2.
Note that the higher order pages are subject to reclaim. This works in general
since we are always operating on a single page struct. Reclaim is fooled into
thinking that it is touching page sized objects (there are likely issues to be
fixed there if we want to go down this road).
What is currently not supported:
- Mmapping higher order pages
- Direct I/O (there are some fundamental issues with direct I/O
putting compound pages that have to be treated as single pages
on the pagevecs and the variable order page cache putting higher
order compound pages that have to be treated as a single large page
onto pagevecs).
Breakage:
- Reclaim does not work for some reason. Compound pages on the active
list get lost somehow.
- Disk data is corrupted when writing ext2fs data. There is likely
still a lot of work to do in the block layer.
- There is a lot of incomplete work. There are numerous places
where the kernel can no longer assume that the page cache consists
of PAGE_SIZE pages that have not been fixed yet.
Future:
- Expect several more RFCs
- We hope for XFS support soon
- There are filesystem layer and lower layer issues here that I am not
that familiar with. If you can, then please enhance my patches.
- Mmap support could be done in a way that makes the mmap page size
independent from the page cache order. There is no problem with mapping a
4k section of a larger page cache page. This should leave mmap as is.
- Let's try to keep the scope as small as possible.
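The mmap idea above, mapping a 4k section of a larger page cache page,
comes down to splitting a file offset into a page cache index and a 4k
subpage within that page. The following is a userspace sketch of that
split, not kernel code. It assumes 4k base pages, and the function and
field names are hypothetical.

```python
from typing import NamedTuple

PAGE_SHIFT = 12              # assumption: 4k base pages
PAGE_SIZE = 1 << PAGE_SHIFT


class Location(NamedTuple):
    index: int    # which page cache page of the given order holds the offset
    subpage: int  # which 4k section inside that page
    byte: int     # byte offset inside that 4k section


def locate(offset: int, order: int) -> Location:
    """Split a file offset for a page cache that uses order-`order` pages."""
    shift = PAGE_SHIFT + order
    within = offset & ((1 << shift) - 1)  # offset inside the large page
    return Location(
        index=offset >> shift,
        subpage=within >> PAGE_SHIFT,
        byte=within & (PAGE_SIZE - 1),
    )


if __name__ == "__main__":
    # Order 4 means 64k page cache pages under the 4k base page assumption.
    print(locate(70000, order=4))
```

For offset 70000 with order 4 this yields index 1, subpage 1, byte 368.
A 4k mapping would then only need the (index, subpage) pair, whatever the
page cache order is.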