On Wed, Mar 30, 2011 at 7:25 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 30 Mar 2011 17:48:18 -0700
> Ying Han wrote:
>
> > In memory controller, we do both target reclaim and global reclaim. The
> > latter walks through the global lru, which links all the allocated pages
> > on the system. It breaks memory isolation since pages are evicted
> > regardless of their memcg owners. This patch takes pages off the global
> > lru as long as they are added to a per-memcg lru.
> >
> > Memcg and cgroup together provide the solution of memory isolation, where
> > multiple cgroups run in parallel without interfering with each other. In
> > vm, memory isolation requires changes in both page allocation and page
> > reclaim. The current memcg provides good user page accounting, but needs
> > more work on page reclaim.
> >
> > In an over-committed machine w/ 32G ram, here is the configuration:
> >
> > cgroup-A/ -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> > cgroup-B/ -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> >
> > 1) limit_in_bytes is the hard_limit, where a process will be throttled or
> >    OOM killed by going over the limit.
> > 2) memory between soft_limit and limit_in_bytes is best-effort. soft_limit
> >    provides a "guarantee" in some sense.
> >
> > Then, it is easy to generate the following scenario:
> >
> > cgroup-A/ -- usage_in_bytes = 20G
> > cgroup-B/ -- usage_in_bytes = 12G
> >
> > Global memory pressure triggers while cgroup-A keeps allocating memory. At
> > this point, pages belonging to cgroup-B can be evicted from the global LRU.
> >
> > We do have per-memcg target reclaim, including per-memcg background
> > reclaim and soft_limit reclaim. Both of them need some improvement, and
> > regardless we still need this patch since global lru reclaim breaks
> > isolation.
> >
> > Besides, here is the to-do list I have on memcg page reclaim, and the
> > items are sorted.
> > a) per-memcg background reclaim.
> >    to reclaim pages proactively
>
> agree,
>
> > b) skipping global lru reclaim if soft_limit reclaim does enough work.
> >    this is both for global background reclaim and global ttfp reclaim.
>
> agree. but zone-balancing cannot be avoided for now. So, I think we need an
> inter-zone-page-migration to balance memory between zones...if necessary.

Thank you for your comments. Can you clarify a bit on this? Actually I was
thinking about zone balancing within memcg, but haven't thought it through
yet. I would like to learn more about the cases where we cannot avoid global
zone-balancing totally.

> > c) improve the soft_limit reclaim to be efficient.
>
> must be done.

The current design of soft_limit focuses more on correctness than on
efficiency. If we are talking about improving the efficiency of target
reclaim, there is quite a lot to change. The first thing might be improving
the per-zone RB tree. The trees are currently keyed on the per-memcg
(usage_limit-soft_limit), regardless of how many pages landed on the zone.

> > d) isolate pages in memcg from global list since it breaks memory
> >    isolation.
>
> I never agree to this until a), b), c) are fixed and we can go nowhere.
>
> BTW, in other POV, for reducing the size of page_cgroup, we must remove
> ->lru on page_cgroup. If divide-and-conquer memory reclaim works well
> enough, we can do that. But this is a big global VM change, so we need
> enough justification.

I can agree on that. The change looks big, especially without efficient
target reclaim. However, I do believe we need this to have an isolation
guarantee.

> > I have done some basic tests on this patch, and more tests are definitely
> > needed:
> >
> > Functional:
> > two memcgs under root. cgroup-A is reading a 20g file with a 2g limit,
> > cgroup-B is running random stuff with a 500m limit. Check the counters for
> > per-memcg lru and global lru, and they should add up.
> >
> > 1) total file pages
> > $ cat /proc/meminfo | grep Cache
> > Cached: 6032128 kB
> >
> > 2) file lru on global lru
> > $ cat /proc/vmstat | grep file
> > nr_inactive_file 0
> > nr_active_file 963131
> >
> > 3) file lru on root cgroup
> > $ cat /dev/cgroup/memory.stat | grep file
> > inactive_file 0
> > active_file 0
> >
> > 4) file lru on cgroup-A
> > $ cat /dev/cgroup/A/memory.stat | grep file
> > inactive_file 2145759232
> > active_file 0
> >
> > 5) file lru on cgroup-B
> > $ cat /dev/cgroup/B/memory.stat | grep file
> > inactive_file 401408
> > active_file 143360
> >
> > Performance:
> > run page fault test (pft) with 16 threads faulting in 15G of anon pages
> > in a 16G cgroup. There is no regression noticed on "flt/cpu/s"
>
> You need a fix for /proc/meminfo, /proc/vmstat to count memcg's ;)

Yes. :) Since this is an RFC prototype, I took the shortcut of reusing the
existing stats, which only count the pages on the global LRU.

> Anyway, this seems too aggressive to me, for now. Please do a), b), c) at
> first.
>
> IIUC, this patch itself can cause a livelock when softlimit is
> misconfigured. What is the protection against a wrong softlimit ?

Hmm, can you help to clarify on that?

> If we do this kind of LRU isolation, we'll need some limitation of the sum
> of limits of all memcg for avoiding wrong configuration. That may change
> UI, dramatically.
> (As RT-class cpu limiting cgroup does.....)

This sounds related to the question above, so I will just wait for my
question to be answered :)

> Anyway, thank you for data.

sure

--Ying

> Thanks,
> -Kame
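
For reference, the "should add up" check in the functional test can be done by hand from the numbers quoted above. The sketch below is only an illustration, not part of the patch. It assumes 4 KiB pages (the thread does not state the page size), treats the /proc/vmstat values as page counts and the memory.stat values as bytes, and the helper name file_lru_total_kb is made up for this example.

```python
# Sketch: cross-check the file LRU counters quoted in the functional test.
# Assumption (not stated in the thread): 4 KiB pages, so a /proc/vmstat page
# count converts to kB by multiplying by 4. memory.stat values are in bytes.

PAGE_KB = 4  # assumed page size in kB

cached_kb = 6032128                    # /proc/meminfo "Cached"
global_file_pages = 0 + 963131         # nr_inactive_file + nr_active_file
root_file_bytes = 0 + 0                # root cgroup inactive_file + active_file
cgroup_a_file_bytes = 2145759232 + 0   # cgroup-A inactive_file + active_file
cgroup_b_file_bytes = 401408 + 143360  # cgroup-B inactive_file + active_file


def file_lru_total_kb(global_pages, memcg_bytes, page_kb=PAGE_KB):
    """Sum the global file LRU (pages) and per-memcg file LRUs (bytes) in kB."""
    return global_pages * page_kb + sum(memcg_bytes) // 1024


total_kb = file_lru_total_kb(
    global_file_pages,
    [root_file_bytes, cgroup_a_file_bytes, cgroup_b_file_bytes],
)
gap_kb = cached_kb - total_kb

print(f"global + per-memcg file lru: {total_kb} kB")
print(f"Cached:                      {cached_kb} kB")
print(f"difference:                  {gap_kb} kB ({gap_kb / cached_kb:.1%})")
```

Under the 4 KiB assumption the two sides agree to within about 1.4%. An exact match is not expected, since Cached and the file LRU counters are not defined over exactly the same set of pages.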