On Tue, Sep 25, 2007 at 01:47:42PM +0530, Balbir Singh wrote:
> Fengguang Wu wrote:
> > On Mon, Sep 24, 2007 at 05:02:02PM -0500, Matt Mackall wrote:
> >> I think Fengguang is just thinking forward to the next logical step
> >> here which is "expose what's in the page cache". Which means being
> >
> > I have been doing it for a long time - that's the filecache patch I
> > sent you. However it's not quite ready for a public review.
> >
> >> able to go from page back to device:inode:offset or (better, but
> >> trickier) path:offset.
> >
> > It's doing the other way around - a top-down way.
> >
> > First, you get a table of all cached inodes with the following fields:
> >         device-number  inode-number  file-path  cached-page-count  status
> >
> > Then, one can query any file he's interested in, and list all its
> > cached pages in the following format:
> >         index  length  page-flags  reference-count
>
> This design sounds good to me, I would expect people using madvise()
> to probably use this interface. Questions on the interface

Thank you, answers below.

> 1. What permissions would a program need to use the interface

- inode list (whole system)
  Only root is allowed. Otherwise information could leak, because we
  don't know the permissions of each path.
- page list (for one file)
  Anyone may view the page list of any file he can open.

> 2. Do we export both mapped and unmapped page cache. How does this
> interface gel with mincore(2)? Is there duplicate information

Both are exported. It's system-wide info, and hence a superset of what
mincore(2) reports.

> 3. If the user already knows the file of interest, is it possible
> to list, it's cached pages without having to list all cached inodes

Sure, it's easy:

        # echo 'cat /bin/bash' > /proc/filecache
        # cat /proc/filecache

To get the inode list:

        # echo 'ls' > /proc/filecache
        # cat /proc/filecache

Yes, /proc/filecache accepts simple commands, which could make it an
unfavorably complex interface...
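The same two-step exchange (write a command, then read back the result) can be scripted. Here is a minimal sketch in Python, assuming the interface behaves exactly as in the shell session above. The helper name query_filecache is made up for illustration, and /proc/filecache exists only on a kernel with this patch applied.

```python
def query_filecache(command, path="/proc/filecache"):
    """Send one command to the filecache interface and return its output.

    Mirrors `echo COMMAND > /proc/filecache; cat /proc/filecache`.
    `path` is a parameter only so the helper can be pointed elsewhere
    when experimenting; the default is the file used in this mail.
    """
    # Step 1: write the command, newline-terminated as `echo` would.
    with open(path, "w") as f:
        f.write(command + "\n")
    # Step 2: reopen the file and read whatever it now reports.
    with open(path) as f:
        return f.read()


# Hypothetical usage on a patched kernel:
#   print(query_filecache("ls"))             # inode list, root only
#   print(query_filecache("cat /bin/bash"))  # page list for one file
```

The command is written and read through two separate opens, just as the echo/cat pair does.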
Anyway, I've been focusing on exporting (more than enough) information.
There will be a lot of room for discussion when it comes down to the
details of the *interface*.

> 4. What's the size of data (expected average) and the format, binary
> or text?

Here are some numbers and output samples from my desktop:

- inode list: 185KB

        #      ino  size  cached  cached%  refcnt  state  accessed  uid  process  dev          file
        [...]
           2888725    82      84      100       7  --          140    0  init     08:02(sda2)  /lib/libselinux.so.1
           2888724   216      84       38       7  --          140    0  init     08:02(sda2)  /lib/libsepol.so.1
           1258136    50      52      100       0  --         2349    0  init     08:02(sda2)  /etc/ld.so.cache
           2889047   115     116      100      52  --         1727    0  swapper  08:02(sda2)  /lib/ld-2.6.so
           1403527    32      32      100       1  --            1    0  swapper  08:02(sda2)  /sbin/init
        [...]

- page list: bash 523B; firefox 6.5KB

        # file /bin/bash
        # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback B:buffer
        # idx   len     state   refcnt
        0       76      RAMU___ 2
        76      4       ___U___ 1
        80      5       RAMU___ 2
        85      1       ___U___ 1
        86      1       RAMU___ 2
        87      1       RA_U___ 1
        88      1       RAMU___ 2
        89      4       RA_U___ 1
        93      1       ___U___ 1
        94      1       R__U___ 1
        95      1       RA_U___ 1
        96      3       RAMU___ 2
        99      1       RA_U___ 1
        100     1       RAMU___ 2
        101     2       RA_U___ 1
        103     6       RAMU___ 2
        109     1       RA_U___ 1
        110     10      ___U___ 1
        123     10      ___U___ 1
        133     4       RA_U___ 1
        137     7       RAMU___ 2
        144     10      ___U___ 1
        154     7       RAMU___ 2
        161     2       RA_U___ 1
        163     1       ___U___ 1
        164     2       RA_U___ 1

Attached is the patch against 2.6.23-rc6 for your convenience. It's
pretty stable and safe to use, despite being a bit fat ;-)

Thank you,
Fengguang
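P.S. For anyone who wants to post-process the page list, here is a minimal sketch in Python. It assumes the text format of the sample above: '#' comment lines followed by rows of idx, len, state and refcnt, where each row describes a run of len pages sharing the same flags. The names Extent, parse_page_list and count_pages are only for illustration and are not part of the patch. SAMPLE is just the first three rows of the bash listing.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    idx: int      # index of the first page in the run
    length: int   # number of consecutive pages in the run
    flags: str    # e.g. "RAMU___", one character per flag position
    refcnt: int   # page reference count


def parse_page_list(text: str) -> List[Extent]:
    """Parse page-list output, skipping blank lines and '#' comments."""
    extents = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        idx, length, flags, refcnt = line.split()
        extents.append(Extent(int(idx), int(length), flags, int(refcnt)))
    return extents


def count_pages(extents: List[Extent], flag: Optional[str] = None) -> int:
    """Total cached pages, optionally only those carrying `flag` (e.g. "M")."""
    return sum(e.length for e in extents if flag is None or flag in e.flags)


SAMPLE = """\
# file /bin/bash
# idx   len     state   refcnt
0       76      RAMU___ 2
76      4       ___U___ 1
80      5       RAMU___ 2
"""

if __name__ == "__main__":
    extents = parse_page_list(SAMPLE)
    print("cached pages: ", count_pages(extents))
    print("mmapped pages:", count_pages(extents, "M"))
```

Because every flag letter is distinct, a plain substring test is enough to check whether a run carries a given flag.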