Thanks for the reply. When the job is finished, "tasks" is empty, but "memory.stat" still contains cache, active_file, etc.:

# cat tasks
# cat memory.stat
cache 81920
rss 0
mapped_file 0
pgpgin 9440
pgpgout 9420
swap 0
inactive_anon 0
active_anon 0
inactive_file 77824
active_file 4096
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 81920
total_rss 0
total_mapped_file 0
total_pgpgin 9440
total_pgpgout 9420
total_swap 0
total_inactive_anon 0
total_active_anon 0
total_inactive_file 77824
total_active_file 4096
total_unevictable 0

After echoing 0 to memory.force_empty, the cache is cleaned:

# echo 0 > memory.force_empty
# cat memory.stat
cache 0
rss 0
mapped_file 0
pgpgin 9440
pgpgout 9440
swap 0
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 0
total_rss 0
total_mapped_file 0
total_pgpgin 9440
total_pgpgout 9440
total_swap 0
total_inactive_anon 0
total_active_anon 0
total_inactive_file 0
total_active_file 0
total_unevictable 0

We cannot leave the cache to be reclaimed lazily, because when a new job reuses the cgroup, the cache is not cleaned up automatically. We need a mechanism that cleans up memory.stat.

Thanks & Regards,
--Zhaohui

From: Johannes Weiner
To: Zhao Hui Ding/China/IBM@IBMCN
Cc: Tejun Heo, cgroups@vger.kernel.org, linux-mm@kvack.org
Date: 2016-11-04 11:21 PM
Subject: Re: memory.force_empty is deprecated

Hi,

On Fri, Nov 04, 2016 at 04:24:25PM +0800, Zhao Hui Ding wrote:
> Hello,
>
> I'm Zhaohui from the IBM Spectrum LSF development team. I got the message
> below when running LSF on SUSE 11.4, so I would like to share our use
> scenario and ask for suggestions on how to avoid memory.force_empty.
>
>   memory.force_empty is deprecated and will be removed. Let us know if it
>   is needed in your usecase at linux-mm@kvack.org
>
> LSF is a batch workload scheduler; it uses cgroups for resource
> enforcement and accounting of batch jobs. For each job, LSF creates a
> cgroup directory and puts the job's PIDs into that cgroup.
>
> When we implemented the LSF cgroup integration, we found that creating a
> new cgroup is much slower than renaming an existing one: about hundreds
> of milliseconds vs. less than 10 milliseconds.

Cgroup creation/deletion is not expected to be an ultra-hot path, but I'm surprised it takes longer than actually reclaiming the leftover pages. By the time the jobs conclude, how much memory is usually left in the group?

That said, is it even necessary to proactively remove the leftover cache from the group before starting the next job? Why not leave it for the next job to reclaim lazily, should memory pressure arise? Page cache is easy to reclaim, and it is the first to go, since it sits behind the next job's memory on the LRU list.
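For reference, the reuse cycle under discussion can be sketched as a root-shell session. This is a minimal illustrative sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory; the "lsf" parent directory, the group names job1/job2, and <job-pid> are placeholders, and the timings are the ones reported above:

# cd /sys/fs/cgroup/memory/lsf
# mkdir job1                          <- slow path: create a fresh cgroup (~100s of ms)
# echo <job-pid> > job1/tasks         <- attach the job's processes
    ... job runs and exits; job1/tasks is empty again ...
# echo 0 > job1/memory.force_empty    <- drop the leftover page cache
# mv job1 job2                        <- fast path: rename and reuse (<10 ms)

Without the force_empty step, the renamed group starts the next job with the previous job's page cache still charged in memory.stat.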