at 4:34 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:

> When running some mmap/munmap scalability tests with large memory (i.e. 300GB), the below hung task issue may happen occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
>       Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ps D 0 14018 1 0x00000004
>
> (snip)
>
> Zapping pages is the most time consuming part, according to the suggestion from Michal Hocko [1], zapping pages can be done with holding read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write mmap_sem to manipulate vmas.

Does munmap() == MADV_DONTNEED + munmap() ?
For example, what happens with userfaultfd in this case? Can you get an extra #PF, which would be visible to userspace, before the munmap is finished?
In addition, would it be ok for the user to potentially get a zeroed page in the time window after the MADV_DONTNEED finished removing a PTE and before the munmap() is done?
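To make the zeroed-page concern concrete, here is a minimal userspace sketch of the existing MADV_DONTNEED behaviour on a private anonymous mapping under Linux. It only shows what a reader observes after MADV_DONTNEED has dropped the page; it does not reproduce the race inside munmap() itself. The function name byte_after_dontneed() is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): dirty one anonymous page,
 * drop it with MADV_DONTNEED, then read it back while the mapping is
 * still in place. Returns the byte seen after the madvise() call,
 * or -1 if any step fails.
 */
int byte_after_dontneed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0xAB;              /* fault the page in and dirty it */
    assert(p[0] == 0xAB);     /* the write is visible before the zap */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The VMA still exists, so this read faults in a fresh zero page
     * instead of raising SIGSEGV. */
    int seen = p[0];

    munmap(p, len);
    return seen;
}
```

On Linux, byte_after_dontneed() is expected to return 0 rather than 0xAB. With a plain munmap() another thread touching the range sees either the old data or a fault; if pages are zapped first while the vma is still there, a reader in that window could see the zero-filled page instead.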
Regards, Nadav