Hi Michal, I've done the performance testing, please check it out.

>> Yes this is all understood but the level of the overhead is not really
>> clear. So the question is whether this will induce a visible overhead.
>> Because from the maintainability point of view it is much less costly to
>> have a clear life time model. Right now we have a mix of reference
>> counting and per-task requirements which is rather subtle and easy to
>> get wrong. In an ideal world we would have get_vma_policy always
>> returning a reference counted policy or NULL. If we really need to
>> optimize for cache line bouncing we can go with per cpu reference
>> counters (something that was not available at the time the mempolicy
>> code has been introduced).
>>
>> So I am not saying that the task_work based solution is not possible I
>> just think that this looks like a good opportunity to get from the
>> existing subtle model.

Test tools:
  numactl -m 0-3 ./run-mmtests.sh -n -c configs/config-workload-aim9-pagealloc test_name

Modification:
get_vma_policy() and get_task_policy() now always return a reference-counted
policy, except for the static policies (default_policy and
preferred_node_policy[nid]). All VMA manipulation is protected by a
down_read(), so mpol_get() can be called directly to take a refcount on the
mpol. There is no such lock in the task->mempolicy context, however, so
task->mempolicy must be protected by task_lock():

struct mempolicy *get_task_policy(struct task_struct *p)
{
	struct mempolicy *pol;
	int node;

	if (p->mempolicy) {
		task_lock(p);
		pol = p->mempolicy;
		mpol_get(pol);
		task_unlock(p);
		if (pol)
			return pol;
	}
	.....
}

Test Case 1:
Describe: Test directly, no other user processes.
Result: This degrades performance by about 1% to 3%.
For more information, please see the attachment: mpol.txt

aim9
Hmean     page_test    484561.68 (   0.00%)   471039.34 *  -2.79%*
Hmean     brk_test    1400702.48 (   0.00%)  1388949.10 *  -0.84%*
Hmean     exec_test      2339.45 (   0.00%)     2278.41 *  -2.61%*
Hmean     fork_test      6500.02 (   0.00%)     6500.17 *   0.00%*

Test Case 2:
Describe: Added a user process, top.
Result: This degrades performance by about 2.1%.
For more information, please see the attachment: mpol_top.txt

Hmean     page_test    477916.47 (   0.00%)   467829.01 *  -2.11%*
Hmean     brk_test    1351439.76 (   0.00%)  1373663.90 *   1.64%*
Hmean     exec_test      2312.24 (   0.00%)     2296.06 *  -0.70%*
Hmean     fork_test      6483.46 (   0.00%)     6472.06 *  -0.18%*

Test Case 3:
Describe: Added a daemon that reads /proc/$test_pid/status, which acquires
task_lock:
	while :; do cat /proc/$(pidof singleuser)/status; done
Result: The baseline itself degrades from 484561 (case 1) to 438591 (about
10%) once the daemon is running, and on top of that the degradation from
the patch in case 3 is about 3.2%.
For more information, please see the attachment: mpol_status.txt

Hmean     page_test    438591.97 (   0.00%)   424251.22 *  -3.27%*
Hmean     brk_test    1268906.57 (   0.00%)  1278100.12 *   0.72%*
Hmean     exec_test      2301.19 (   0.00%)     2192.71 *  -4.71%*
Hmean     fork_test      6453.24 (   0.00%)     6090.48 *  -5.62%*

Thanks,
Zhongkun.