linux-mm.kvack.org archive mirror
* [RFC 00/16] Variable Order Page Cache Patchset V2
@ 2007-04-23  6:48 Christoph Lameter
  2007-04-23  6:48 ` [RFC 01/16] Free up page->private for compound pages Christoph Lameter
                   ` (17 more replies)
  0 siblings, 18 replies; 40+ messages in thread
From: Christoph Lameter @ 2007-04-23  6:48 UTC
  To: linux-mm
  Cc: William Lee Irwin III, Badari Pulavarty, David Chinner,
	Jens Axboe, Adam Litke, Christoph Lameter, Dave Hansen,
	Mel Gorman, Avi Kivity

Sorry for the earlier mail; quilt and exim were not cooperating.

RFC V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.

This patchset modifies the Linux kernel so that higher order page cache
pages become possible. The higher order page cache pages are compound pages
and can be handled in the same way as regular pages.
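As a minimal sketch of the size arithmetic involved (not code from the patchset): a page cache page of order n is a compound page of 2^n base pages, so it covers PAGE_SIZE << n bytes. The struct and the pc_* helper names below are hypothetical, loosely in the spirit of the page cache macros mentioned in the changelog, and 4k base pages are assumed.

```c
#include <assert.h>

#define PAGE_SHIFT 12                    /* assumption: 4k base pages (x86_64) */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define MAX_PC_ORDER 4                   /* hypothetical cap for this sketch: 64k */

/* Hypothetical stand-in for struct address_space: only a per-mapping
 * page cache order is modelled here. Order 0 = classic PAGE_SIZE pages. */
struct mapping_sketch {
	unsigned int order;
};

/* Bytes covered by one page cache page of this mapping. */
static unsigned long pc_size(const struct mapping_sketch *m)
{
	assert(m->order <= MAX_PC_ORDER);
	return PAGE_SIZE << m->order;
}

/* Shift that converts a byte offset into a page cache index. */
static unsigned int pc_shift(const struct mapping_sketch *m)
{
	return PAGE_SHIFT + m->order;
}

/* Number of base pages grouped into one compound page of this order. */
static unsigned long pc_base_pages(const struct mapping_sketch *m)
{
	return 1UL << m->order;
}
```

For example, an order-4 mapping on a 4k-page system has 64k page cache pages, each made of 16 base pages, and converts offsets to indices with a shift of 16 instead of 12.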

Rationales:

1. We have problems supporting devices with a block size larger than the
   page size. This is important, for example, to support CDs and DVDs that
   can only read and write 32k or 64k blocks. We currently have a shim
   layer to deal with this situation, which limits the speed of I/O.
   The developers are currently looking for ways to bypass the page
   cache completely because of this deficiency.

2. 32k/64k block sizes are also used in flash devices, with the same issues.

3. Future hard disks will support larger block sizes.

4. Performance. If we look at IA64 vs. x86_64, it seems that the
   faster interrupt handling on x86_64 compensates for the speed loss due
   to a smaller page size (4k vs 16k on IA64). Having larger page sizes on
   all platforms allows a significant reduction in I/O overhead and
   increases the size of I/O that can be performed by hardware in a single
   request, since the number of scatter/gather entries per request is
   typically limited. This is going to become increasingly important as
   memory sizes keep growing, since we may have to handle excessively
   large numbers of 4k requests for data sizes that may soon become
   common. For example, to write a 1 TiB (2^40 byte) file the kernel would
   have to handle 2^28 (about 268 million) 4k chunks.

5. Cross-arch compatibility: it is currently not possible to mount
   a 16k block size ext2 filesystem created on IA64 on an x86_64 system.
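The arithmetic behind rationale 4 can be checked with a small sketch. It assumes one scatter/gather entry per page (no merging of physically adjacent pages) and takes the per-request entry limit as a parameter; the helper names and any particular limit, such as 128 entries, are illustrative assumptions rather than values from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page-sized chunks needed to cover `bytes` when each page is
 * 2^page_shift bytes (rounded up). */
static uint64_t chunks_for(uint64_t bytes, unsigned int page_shift)
{
	assert(page_shift < 64);
	uint64_t page = UINT64_C(1) << page_shift;
	return (bytes + page - 1) >> page_shift;
}

/* Number of hardware requests if each request carries at most `max_sg`
 * scatter/gather entries. Assumption: one entry per page, no merging. */
static uint64_t requests_for(uint64_t bytes, unsigned int page_shift,
			     uint64_t max_sg)
{
	assert(max_sg > 0);
	uint64_t n = chunks_for(bytes, page_shift);
	return (n + max_sg - 1) / max_sg;
}
```

With 4k pages a 1 TiB file is 2^28 chunks; with 16k or 64k pages it drops to 2^26 or 2^24, and under a hypothetical limit of 128 entries per request the number of requests shrinks by the same factor (2^21 vs 2^17 for 4k vs 64k).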

The support here currently covers only buffered I/O and only two
filesystems: ramfs and ext2.
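For ext2, the order a mapping would need follows from the on-disk block size, which ext2 stores as 1024 << s_log_block_size. The sketch below computes the smallest page cache order whose pages hold one filesystem block on a given host; the helper names are hypothetical, and it only illustrates why the 16k IA64 filesystem from rationale 5 needs order 2 on a 4k-page x86_64 system but order 0 with 16k pages. The same calculation covers the 32k/64k devices from rationales 1 and 2.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_X86_64 12   /* 4k base pages */
#define PAGE_SHIFT_IA64   14   /* the 16k IA64 page size cited above */

/* ext2 encodes its block size as 1024 << s_log_block_size. */
static uint32_t ext2_block_size(uint32_t s_log_block_size)
{
	assert(s_log_block_size <= 6);          /* ext2's 64k maximum */
	return UINT32_C(1024) << s_log_block_size;
}

/* Smallest page cache order whose pages hold one block of `blocksize`
 * bytes on a host with 2^page_shift byte base pages. Order 0 means the
 * block already fits in a base page. Hypothetical helper. */
static unsigned int order_for_blocksize(uint32_t blocksize,
					unsigned int page_shift)
{
	unsigned int order = 0;

	/* Block sizes are powers of two. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	while ((UINT64_C(1) << (page_shift + order)) < blocksize)
		order++;
	return order;
}
```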

Note that the higher order pages are subject to reclaim. This works in general
since we are always operating on a single page struct. Reclaim is fooled into
thinking that it is touching page-sized objects (there are likely issues to be
fixed there if we want to go down this road).
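To illustrate how reclaim can be fooled, here is a hypothetical model in which every LRU entry is one struct page standing for a compound page of some order. Counting entries, as PAGE_SIZE-minded reclaim code effectively does, undercounts the memory involved; weighting each entry by 1 << order gives the base-page total. The struct and function names are invented for this sketch and do not correspond to kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRU entry: one struct page representing a compound page
 * of the given order. */
struct lru_page_sketch {
	unsigned int order;
};

/* What a PAGE_SIZE-minded reclaim pass "sees": one object per entry. */
static unsigned long count_objects(const struct lru_page_sketch *p, size_t n)
{
	(void)p;
	return n;
}

/* Memory actually covered, in base pages: each entry is 1 << order. */
static unsigned long count_base_pages(const struct lru_page_sketch *p,
				      size_t n)
{
	unsigned long total = 0;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += 1UL << p[i].order;
	return total;
}
```

Three entries of order 0, 2 and 4 look like 3 pages to the naive count but cover 21 base pages (1 + 4 + 16).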

What is currently not supported:
- Mmapping higher order pages
- Direct I/O (there are fundamental conflicts between direct I/O putting
  compound pages that have to be treated as single pages onto the
  pagevecs, and the variable order page cache putting higher order
  compound pages that have to be treated as a single large page onto
  the pagevecs).

Breakage:
- Reclaim does not work for some reason. Compound pages on the active
  list get lost somehow.
- Disk data is corrupted when writing ext2fs data. There is likely
  still a lot of work to do in the block layer.
- There is a lot of incomplete work. There are numerous places, not yet
  fixed, where the kernel still assumes that the page cache consists
  of PAGE_SIZE pages.
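A typical example of such a PAGE_SIZE assumption is converting a file offset to a page cache index with a fixed PAGE_SHIFT. The sketch contrasts that with order-aware conversions that shift by PAGE_SHIFT + order; the helper names are hypothetical and 4k base pages are assumed. For an order-4 mapping, offset 0x11000 lies in page cache page 1 at offset 0x1000, while the fixed-shift version reports index 17.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4k base pages */

/* Conversion written when every page cache page was PAGE_SIZE:
 * only correct for order-0 mappings. */
static uint64_t index_assuming_page_size(uint64_t pos)
{
	return pos >> PAGE_SHIFT;
}

/* Order-aware index: shift by the mapping's page cache shift. */
static uint64_t pc_index(uint64_t pos, unsigned int order)
{
	assert(order < 64 - PAGE_SHIFT);
	return pos >> (PAGE_SHIFT + order);
}

/* Order-aware offset of `pos` within its page cache page. */
static uint64_t pc_offset(uint64_t pos, unsigned int order)
{
	assert(order < 64 - PAGE_SHIFT);
	return pos & ((UINT64_C(1) << (PAGE_SHIFT + order)) - 1);
}
```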

Future:
- Expect several more RFCs
- We hope for XFS support soon
- There are filesystem layer and lower layer issues here that I am not
  that familiar with. If you can, please enhance my patches.
- Mmap support could be done in a way that makes the mmap page size
  independent of the page cache order. There is no problem mapping a
  4k section of a larger page cache page. This should leave mmap as is.
- Let's try to keep the scope as small as possible.
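One way to sketch the mmap idea: for a faulting file offset, find which 4k base page inside the larger compound page covers it, and map only that piece, so the mmap page size stays PAGE_SIZE regardless of the page cache order. This is a hypothetical helper under the assumption of 4k base pages; the patchset does not implement mmap support.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4k base pages */

/* For a file offset in a mapping whose page cache pages have the given
 * order, return which base page (0 .. (1 << order) - 1) inside the
 * compound page holds that offset. A fault handler could map just that
 * 4k piece. Hypothetical helper. */
static unsigned long subpage_in_compound(uint64_t pos, unsigned int order)
{
	assert(order < 64 - PAGE_SHIFT);
	return (unsigned long)((pos >> PAGE_SHIFT) &
			       ((UINT64_C(1) << order) - 1));
}
```

For an order-4 (64k) page cache page, offset 0x11000 falls in base page 1 and offset 0x1F000 in base page 15.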


* [RFC 00/16] Variable Order Page Cache Patchset V2
@ 2007-04-23  6:21 clameter
  2007-04-23  6:21 ` [RFC 10/16] Variable Order Page Cache: Readahead fixups clameter
  0 siblings, 1 reply; 40+ messages in thread
From: clameter @ 2007-04-23  6:21 UTC
  To: linux-mm
  Cc: Mel Gorman, William Lee Irwin III, Adam Litke, David Chinner,
	Jens Axboe, Avi Kivity, Dave Hansen, Badari Pulavarty,
	Maxim Levitsky



Thread overview: 40+ messages
2007-04-23  6:48 [RFC 00/16] Variable Order Page Cache Patchset V2 Christoph Lameter
2007-04-23  6:48 ` [RFC 01/16] Free up page->private for compound pages Christoph Lameter
2007-04-24  2:12   ` Dave Hansen
2007-04-24  2:23     ` Christoph Lameter
2007-04-25 10:55   ` Mel Gorman
2007-04-23  6:48 ` [RFC 02/16] vmstat.c: Support accounting " Christoph Lameter
2007-04-25 10:59   ` Mel Gorman
2007-04-25 15:43     ` Christoph Lameter
2007-04-23  6:49 ` [RFC 03/16] Variable Order Page Cache: Add order field in mapping Christoph Lameter
2007-04-25 11:05   ` Mel Gorman
2007-04-23  6:49 ` [RFC 04/16] Variable Order Page Cache: Add basic allocation functions Christoph Lameter
2007-04-23  6:49 ` [RFC 05/16] Variable Order Page Cache: Add functions to establish sizes Christoph Lameter
2007-04-25 11:20   ` Mel Gorman
2007-04-25 15:54     ` Christoph Lameter
2007-04-23  6:49 ` [RFC 06/16] Variable Page Cache: Add VM_BUG_ONs to check for correct page order Christoph Lameter
2007-04-25 11:22   ` Mel Gorman
2007-04-23  6:49 ` [RFC 07/16] Variable Order Page Cache: Add clearing and flushing function Christoph Lameter
2007-04-23  6:49 ` [RFC 08/16] Variable Order Page Cache: Fixup fallback functions Christoph Lameter
2007-04-23  6:49 ` [RFC 09/16] Variable Order Page Cache: Fix up mm/filemap.c Christoph Lameter
2007-04-23  6:49 ` [RFC 10/16] Variable Order Page Cache: Readahead fixups Christoph Lameter
2007-04-25 11:36   ` Mel Gorman
2007-04-25 15:56     ` Christoph Lameter
2007-05-21 10:42         ` Fengguang Wu
2007-05-21 16:53           ` Christoph Lameter
2007-05-22  0:59               ` Fengguang Wu
2007-05-24  4:04               ` Fengguang Wu
2007-05-24  4:06                 ` Christoph Lameter
2007-04-23  6:49 ` [RFC 11/16] Variable Page Cache Size: Fix up reclaim counters Christoph Lameter
2007-04-25 13:08   ` Mel Gorman
2007-04-23  6:49 ` [RFC 12/16] Variable Order Page Cache: Fix up the writeback logic Christoph Lameter
2007-04-23  6:49 ` [RFC 13/16] Variable Order Page Cache: Fixed to block layer Christoph Lameter
2007-04-23  6:49 ` [RFC 14/16] Variable Order Page Cache: Add support to ramfs Christoph Lameter
2007-04-23  6:50 ` [RFC 15/16] ext2: Add variable page size support Christoph Lameter
2007-04-23 16:30   ` Badari Pulavarty
2007-04-24  1:11     ` Christoph Lameter
2007-04-23  6:50 ` [RFC 16/16] Variable Order Page Cache: Alternate implementation of page cache macros Christoph Lameter
2007-04-25 13:16   ` Mel Gorman
2007-04-23  9:23 ` [RFC 00/16] Variable Order Page Cache Patchset V2 David Chinner
2007-04-23  9:31 ` David Chinner
  -- strict thread matches above, loose matches on Subject: below --
2007-04-23  6:21 clameter
2007-04-23  6:21 ` [RFC 10/16] Variable Order Page Cache: Readahead fixups clameter
