Thanks for the reply. When the job is finished, "tasks" is empty, but "memory.stat" still contains cache, active_file, etc.:

# cat tasks
# cat memory.stat
cache 81920
rss 0
mapped_file 0
pgpgin 9440
pgpgout 9420
swap 0
inactive_anon 0
active_anon 0
inactive_file 77824
active_file 4096
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 81920
total_rss 0
total_mapped_file 0
total_pgpgin 9440
total_pgpgout 9420
total_swap 0
total_inactive_anon 0
total_active_anon 0
total_inactive_file 77824
total_active_file 4096
total_unevictable 0

After echoing 0 to memory.force_empty, the cache is cleaned:

# echo 0 > memory.force_empty
# cat memory.stat
cache 0
rss 0
mapped_file 0
pgpgin 9440
pgpgout 9440
swap 0
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 9223372036854775807
hierarchical_memsw_limit 9223372036854775807
total_cache 0
total_rss 0
total_mapped_file 0
total_pgpgin 9440
total_pgpgout 9440
total_swap 0
total_inactive_anon 0
total_active_anon 0
total_inactive_file 0
total_active_file 0
total_unevictable 0

We cannot leave the cache to be reclaimed lazily, because when a new job reuses the cgroup, the cache is not cleaned up automatically. We need a mechanism that cleans up memory.stat.

Thanks & Regards,
--Zhaohui

From: Johannes Weiner
To: Zhao Hui Ding/China/IBM@IBMCN
Cc: Tejun Heo, cgroups@vger.kernel.org, linux-mm@kvack.org
Date: 2016-11-04 11:21 PM
Subject: Re: memory.force_empty is deprecated

Hi,

On Fri, Nov 04, 2016 at 04:24:25PM +0800, Zhao Hui Ding wrote:
> Hello,
>
> I'm Zhaohui from the IBM Spectrum LSF development team. I got the message
> below when running LSF on SUSE 11.4, so I would like to share our use
> scenario and ask for suggestions on how to avoid memory.force_empty.
>
>   memory.force_empty is deprecated and will be removed. Let us know if it
>   is needed in your usecase at linux-mm@kvack.org
>
> LSF is a batch workload scheduler; it uses cgroups for resource
> enforcement and accounting of batch jobs. For each job, LSF creates a
> cgroup directory and puts the job's PIDs into that cgroup.
>
> When we implemented the LSF cgroup integration, we found that creating a
> new cgroup is much slower than renaming an existing one: about hundreds
> of milliseconds vs. less than 10 milliseconds.

Cgroup creation/deletion is not expected to be an ultra-hot path, but I'm surprised it takes longer than actually reclaiming the leftover pages. By the time the jobs conclude, how much memory is usually left in the group?

That said, is it even necessary to proactively remove the leftover cache from the group before starting the next job? Why not leave it for the next job to reclaim lazily, should memory pressure arise? Page cache is easy to reclaim, and it is the first to go, since it sits behind the next job's memory on the LRU list.
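For reference, the reuse cycle under discussion can be sketched as a root-shell session. This is a minimal illustrative sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory; the "lsf" parent directory, the group names job1/job2, and <job-pid> are placeholders, and the timings are the ones reported above:

# cd /sys/fs/cgroup/memory/lsf
# mkdir job1                          <- slow path: create a fresh cgroup (~100s of ms)
# echo <job-pid> > job1/tasks         <- attach the job's processes
    ... job runs and exits; job1/tasks is empty again ...
# echo 0 > job1/memory.force_empty    <- drop the leftover page cache
# mv job1 job2                        <- fast path: rename and reuse (<10 ms)

Without the force_empty step, the renamed group starts the next job with the previous job's page cache still charged in memory.stat.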