[Summary]

To evaluate page cache efficiency, system administrators want to know whether a block of data is cached for subsequent use, or whether the page is read in but seldom used. This patch set is an effort to provide that kind of information. It adds three counters, exported to user space, for the page cache; the accounting is almost transparent to applications. This should benefit heavy page cache users who try to tune performance on hybrid storage setups.

[Detail]

When the kernel manipulates file data and metadata, it queries the page cache first. If the target data is there, we call it a page cache _hit_, and one IO operation to disk is saved. If the target data is absent, the kernel issues the real IO request to the disk; this is a page cache _miss_.

Two of the counters are page cache specific: _hit_ and _miss_. A third counter, _readpages_, is also added because the kernel relies on the readahead module to issue the real read requests that save future read IOs; _readpages_ is meant to give more insight into kernel read operations.

Different combinations of the three counters give hints about the page cache system. For example, nr(hit) + nr(miss) gives the number of requests [nr(requests)] the kernel made over some interval, and nr(miss)/nr(requests) gives the miss ratio, etc.

There is a long-standing request from our operations teams, who run Hadoop at very large scale, for information about the underlying page cache while they tune their applications. The statistics are collected per partition. This helps performance tuning when hybrid storage is used (for example, SSD + SAS + SATA).
Currently only regular file data in the page cache is accounted. [Metadata accounting is also under consideration.] There is still much work to be done, but it seems better to send this out for review and get feedback as early as possible.

[Performance]

Since the patch sits on one of the hottest code paths in the kernel, I did a simple function graph tracing on the sys_read() path, with the hit function _not inlined_, while loop-reading a 2G file. [hit/miss/readpages share virtually the same logic]

1) First read a 2G file from disk into the page cache.
2) Read the 2G file in a loop, without disk IOs.
3) Run function graph tracing on sys_read().

This is the worst case for the hit function, since it is called every time the kernel queries the page cache. In this context, the test shows that sys_read() costs 8.567us and hit() costs 0.173us (comparable to the put_page() function), so 0.173 / 8.567 = 2%.

Any comments are more than welcome :)

-Yuan

--------------------
Liu Yuan (5):
  x86/Kconfig: Add Page Cache Accounting entry
  block: Add functions and data types for Page Cache Accounting
  block: Make page cache counters work with sysfs
  mm: Add hit/miss accounting for Page Cache
  mm: Add readpages accounting

 arch/x86/Kconfig.debug |    9 +++++++
 block/genhd.c          |    6 ++++
 fs/partitions/check.c  |   23 ++++++++++++++++++
 include/linux/genhd.h  |   60 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c           |   27 ++++++++++++++++++---
 mm/readahead.c         |    2 +
 6 files changed, 123 insertions(+), 4 deletions(-)