* [RFC PATCH 0/5] Add accountings for Page Cache
From: noname noname @ 2011-03-02 8:37 UTC
To: linux-kernel, linux-mm, jaxboe, akpm, fengguang.wu
[Summary]
In order to evaluate page cache efficiency, system admins would like to
know whether a block of data is cached for subsequent use, or whether
the page is read in but seldom used. This patch set is an effort to
provide such information. We add three counters for the page cache,
which is otherwise almost transparent to applications, and export them
to user space. This would benefit heavy page cache users who try to
tune performance in hybrid storage setups.
[Detail]
The kernel queries the page cache first whenever it accesses file data
and metadata. If the target data is already there, this is a page cache
_hit_, which saves one disk I/O operation. If the target data is absent,
the kernel issues real I/O requests to the disk; this is a page cache
_miss_.
Two of the counters are page cache specific: page cache _hit_ and
_miss_. A third counter, _readpages_, is also added because the kernel
relies on the readahead module to issue the actual read requests, in
order to save future read I/Os. _readpages_ is meant to give more
information about kernel read operations.
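To make the relationship between the three counters concrete, here is a minimal user-space sketch. It assumes a toy page cache modeled as an array of per-page flags and a fixed-size readahead window, and it assumes _readpages_ counts the pages read in by readahead. The names pc_stats, pc_read_page, NR_PAGES and RA_WINDOW are hypothetical and do not come from the patch set; real readahead in mm/readahead.c is adaptive rather than a fixed window.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES  64  /* size of the toy file, in pages (hypothetical) */
#define RA_WINDOW 4   /* toy fixed readahead window, in pages (hypothetical) */

/* Hypothetical counters mirroring the three proposed ones. */
struct pc_stats {
    unsigned long hit;
    unsigned long miss;
    unsigned long readpages;
};

/* Toy page cache: cached[i] becomes true once page i has been read in. */
static bool cached[NR_PAGES];

/* Simulate the kernel looking up page 'index' in the page cache. */
static void pc_read_page(struct pc_stats *st, size_t index)
{
    if (index >= NR_PAGES)
        return;
    if (cached[index]) {
        st->hit++;              /* data already in memory: no disk I/O */
        return;
    }
    st->miss++;                 /* data absent: real I/O is needed */
    /* Toy readahead: read the missing page plus the following window. */
    for (size_t i = index; i < index + RA_WINDOW && i < NR_PAGES; i++) {
        if (!cached[i]) {
            cached[i] = true;
            st->readpages++;    /* one page submitted to the disk */
        }
    }
}
```

Reading pages 0 to 7 sequentially from a cold cache yields 2 misses, 6 hits and 8 pages read in; reading them again yields only hits, which is the pattern the hit counter is meant to expose.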
Different combinations of the three counters give hints about the
behavior of the kernel page cache. For example, nr(hit) + nr(miss) gives
the number of requests, nr(request), that the kernel issued over some
period, and nr(miss)/nr(request) gives the miss ratio.
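The arithmetic above can be sketched in C as follows. This assumes the counters are cumulative (as the existing disk_stats fields are), so a ratio "over some period" is computed from the difference between two snapshots. The names pc_counters, pc_delta, pc_requests and pc_miss_ratio are hypothetical.

```c
/* Hypothetical snapshot of the three counters for one partition. */
struct pc_counters {
    unsigned long long hit;
    unsigned long long miss;
    unsigned long long readpages;
};

/* Counters are assumed cumulative, so activity within an interval is
 * the difference between the snapshots taken at its end and start. */
static struct pc_counters pc_delta(const struct pc_counters *start,
                                   const struct pc_counters *end)
{
    struct pc_counters d = {
        .hit       = end->hit - start->hit,
        .miss      = end->miss - start->miss,
        .readpages = end->readpages - start->readpages,
    };
    return d;
}

/* nr(request) = nr(hit) + nr(miss) */
static unsigned long long pc_requests(const struct pc_counters *c)
{
    return c->hit + c->miss;
}

/* miss ratio = nr(miss) / nr(request); reported as 0 for an idle interval. */
static double pc_miss_ratio(const struct pc_counters *c)
{
    unsigned long long req = pc_requests(c);

    return req ? (double)c->miss / (double)req : 0.0;
}
```

For example, snapshots of {hit 100, miss 20} and {hit 190, miss 30} give 100 requests in the interval and a miss ratio of 0.1.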
Our operations teams, who run Hadoop at a very large scale, have long
asked for information about the underlying page cache when they tune
their applications.
The statistics are collected per partition. This would benefit
performance tuning when hybrid storage is used (for example,
SSD + SAS + SATA).
Currently only regular file data in the page cache is counted.
[Metadata accounting is also under consideration.]
There is still much work to be done, but I would rather send it out for
review and get feedback as early as possible.
[Performance]
Since the patch touches one of the hottest code paths in the kernel, I
did a simple function graph trace of the sys_read() path, with the hit
function made _non-inline_, while loop-reading a 2G file.
[hit/miss/readpages share virtually the same logic]
1) First, read a 2G file from disk into the page cache.
2) Read the 2G file in a loop, without any disk I/O.
3) Run function graph tracing on sys_read().
This is the worst case for the hit function, since it is called every
time the kernel queries the page cache.
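Step 2 of the test can be approximated from user space with a loop like the one below. This is a hedged sketch, not the author's actual test program. It assumes the file was already pulled into the page cache by step 1, so every pass is served from memory; read_file_loops is a hypothetical name, and the tracing in step 3 would be done separately (for example with ftrace's function_graph tracer).

```c
#include <stdio.h>

/* Read the whole file at 'path' 'loops' times and return the total
 * number of bytes read, or -1 if the file cannot be opened. Once the
 * file is resident in the page cache, each pass should hit the cache. */
static long long read_file_loops(const char *path, int loops)
{
    static char buf[1 << 16];   /* 64 KiB read buffer */
    long long total = 0;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    for (int i = 0; i < loops; i++) {
        size_t n;

        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            total += (long long)n;
        rewind(f);              /* back to offset 0; also clears EOF */
    }
    fclose(f);
    return total;
}
```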
In this setup, the test shows that sys_read() costs 8.567us and hit()
costs 0.173us (roughly the same as put_page()), so the overhead is
0.173 / 8.567, or about 2%.
Any comments are more than welcome :)
-Yuan
--------------------
Liu Yuan (5):
x86/Kconfig: Add Page Cache Accounting entry
block: Add functions and data types for Page Cache Accounting
block: Make page cache counters work with sysfs
mm: Add hit/miss accounting for Page Cache
mm: Add readpages accounting
arch/x86/Kconfig.debug | 9 +++++++
block/genhd.c | 6 ++++
fs/partitions/check.c | 23 ++++++++++++++++++
include/linux/genhd.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 27 ++++++++++++++++++---
mm/readahead.c | 2 +
6 files changed, 123 insertions(+), 4 deletions(-)
* Re: [RFC PATCH 0/5] Add accountings for Page Cache
From: KOSAKI Motohiro @ 2011-03-03 1:50 UTC
To: noname noname
Cc: kosaki.motohiro, linux-kernel, linux-mm, jaxboe, akpm, fengguang.wu
> [Summary]
>
> In order to evaluate page cache efficiency, system admins would like to
> know whether a block of data is cached for subsequent use, or whether
> the page is read in but seldom used. This patch set is an effort to
> provide such information. We add three counters for the page cache,
> which is otherwise almost transparent to applications, and export them
> to user space. This would benefit heavy page cache users who try to
> tune performance in hybrid storage setups.
I think you need to explain an exact and concrete use case. Typically,
the cache hit ratio doesn't help administrators at all, because a mere
backup operation (e.g. cp, dd, et al.) causes plenty of cache misses,
yet that is no sign of memory shortage. Usually, vmscan statistics are
more helpful for observing memory utilization.
Plus, as Ingo said, you should consider using the tracepoint framework
first, because it has zero cost if an admin doesn't enable the tracepoint.
Lastly, I don't think disk_stats should hold page cache statistics. It
seems to be a slight layer violation.
Thanks.
* Re: [RFC PATCH 0/5] Add accountings for Page Cache
From: Liu Yuan @ 2011-03-04 1:55 UTC
To: KOSAKI Motohiro; +Cc: linux-kernel, linux-mm, jaxboe, akpm, fengguang.wu
On Thu, Mar 3, 2011 at 9:50 AM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > [Summary]
> >
> > In order to evaluate page cache efficiency, system admins would like to
> > know whether a block of data is cached for subsequent use, or whether
> > the page is read in but seldom used. This patch set is an effort to
> > provide such information. We add three counters for the page cache,
> > which is otherwise almost transparent to applications, and export them
> > to user space. This would benefit heavy page cache users who try to
> > tune performance in hybrid storage setups.
>
> I think you need to explain an exact and concrete use case. Typically,
> the cache hit ratio doesn't help administrators at all, because a mere
> backup operation (e.g. cp, dd, et al.) causes plenty of cache misses,
> yet that is no sign of memory shortage. Usually, vmscan statistics are
> more helpful for observing memory utilization.
>
> Plus, as Ingo said, you should consider using the tracepoint framework
> first, because it has zero cost if an admin doesn't enable the
> tracepoint.
>
>
Thanks very much for your comments.
Yeah, we're going to try tracepoints and perf as Ingo suggested.
> Lastly, I don't think disk_stats should hold page cache statistics. It
> seems to be a slight layer violation.
>
> Thanks.
>
>
This is the starting point of the patch set, so I simply embedded the
structure into the existing infrastructure. This saved me a lot of
effort, because disk_stats is a good place to collect stats on a
per-_partition_ basis. Anyway, as you pointed out, this is kind of a mess.
Thanks,
Yuan
Thread overview: 3 messages
2011-03-02 8:37 [RFC PATCH 0/5] Add accountings for Page Cache noname noname
2011-03-03 1:50 ` KOSAKI Motohiro
2011-03-04 1:55 ` Liu Yuan