This is version 3 or 4 of the rss counting series. The RFC tag has been removed.

My purpose is to gather more (rss-related) information per process without
hurting scalability (and to improve the oom-killer etc.).

The whole patch series is organized as

 [1/5] clean up per-mm stat counting.
 [2/5] make the counters (a bit) more scalable with per-thread counting.
 [3/5] add a swap counter per mm.
 [4/5] add lowmem detection logic.
 [5/5] add a lowmem usage counter per mm.

Big changes from the previous version are:
 - removed the per-cpu counter and added a per-thread counter.
 - the synchronization point of the counters is moved to memory.c;
   there are no hooks into ticks or the scheduler.

As a result, this series is less invasive than the previous ones.

Cache misses per page fault, measured with my benchmark on my box:

 [Before patch]     4.55 cache-miss/fault
 [After patch 2]    3.99 cache-miss/fault
 [After all patch]  4.06 cache-miss/fault

From these numbers, I think the swap and lowmem counters can be added: with
the whole series applied, the cost is still below the unpatched kernel
(4.06 vs. 4.55 cache-miss/fault).

My test program is attached (it is unchanged from the previous posting).

[Future Plan]
 - add a CONSTRAINT_LOWMEM oom killer.
 - add an rss+swap based oom killer (with sysctl ?)
 - add some patch for perf ?
 - add the mm_accessor patch.
 - improve page fault scalability, finally.

Thanks,
-Kame
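
For illustration, the per-thread counting idea in [2/5] can be sketched in
userspace C roughly as follows. This is a minimal sketch and not the code in
the patch: every name here (mm_counters, thread_rss_cache, fast_add,
sync_counters, read_counter) and the threshold of 64 events are assumptions
made for the example. Each thread accumulates deltas in a private cache and
folds them into the shared per-mm counters only at a synchronization point,
so the shared cache line is touched far less often.

```c
#include <stdatomic.h>

/* All names below are hypothetical, chosen only for this sketch. */
enum { COUNTER_FILE, COUNTER_ANON, NR_COUNTERS };
enum { SYNC_THRESHOLD = 64 };   /* assumed batch size, not from the patch */

/* Shared by every thread of a process; updates must be atomic. */
struct mm_counters {
    atomic_long count[NR_COUNTERS];
};

/* Private to one thread, so it can be updated without atomics. */
struct thread_rss_cache {
    int  events;                /* updates since the last sync */
    long delta[NR_COUNTERS];    /* pending, not yet visible in mm */
};

/* Synchronization point: fold the pending deltas into the shared counters. */
static inline void sync_counters(struct mm_counters *mm,
                                 struct thread_rss_cache *t)
{
    for (int i = 0; i < NR_COUNTERS; i++) {
        if (t->delta[i] != 0) {
            atomic_fetch_add(&mm->count[i], t->delta[i]);
            t->delta[i] = 0;
        }
    }
    t->events = 0;
}

/* Fast path: touch only thread-local data, sync every SYNC_THRESHOLD events. */
static inline void fast_add(struct mm_counters *mm,
                            struct thread_rss_cache *t,
                            int member, long val)
{
    t->delta[member] += val;
    if (++t->events >= SYNC_THRESHOLD)
        sync_counters(mm, t);
}

/* Readers see the shared value only; it may lag by the unsynced deltas. */
static inline long read_counter(struct mm_counters *mm, int member)
{
    return atomic_load(&mm->count[member]);
}
```

The trade-off in this sketch is that a reader can see a value that is stale
by up to the unsynced deltas of each thread, which is why a thread should
also call sync_counters() before it exits or before an exact value is needed.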