On 6/19/18 3:17 PM, Nadav Amit wrote:
> at 4:34 PM, Yang Shi wrote:
>
>> When running some mmap/munmap scalability tests with large memory (i.e.
>> > 300GB), the below hung task issue may happen occasionally.
>>
>> INFO: task ps:14018 blocked for more than 120 seconds.
>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>> ps D 0 14018 1 0x00000004
>>
> (snip)
>
>> Zapping pages is the most time consuming part, according to the
>> suggestion from Michal Hocko [1], zapping pages can be done with holding
>> read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
>> mmap_sem to manipulate vmas.
> Does munmap() == MADV_DONTNEED + munmap() ?

Not exactly the same. I basically copied the page zapping code used by
munmap instead of calling MADV_DONTNEED.

>
> For example, what happens with userfaultfd in this case? Can you get an
> extra #PF, which would be visible to userspace, before the munmap is
> finished?

userfaultfd is handled by the regular munmap path, so there is no change
to the userfaultfd part.

>
> In addition, would it be ok for the user to potentially get a zeroed page in
> the time window after the MADV_DONTNEED finished removing a PTE and before
> the munmap() is done?

This should be undefined behavior according to Michal. This has been
discussed in https://lwn.net/Articles/753269/.

Thanks,
Yang

>
> Regards,
> Nadav
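
For reference, the "zeroed page" effect asked about above can be seen from userspace with plain madvise(). This is only a minimal sketch, assuming Linux and a private anonymous mapping; the helper name dontneed_reads_zero() is made up for illustration and is not part of the patch. It shows what MADV_DONTNEED alone does to a mapping that is still in place, not the race window inside munmap().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if a private anonymous page reads back as zero after
 * MADV_DONTNEED, 0 if the old contents survived, -1 on error.
 */
int dontneed_reads_zero(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Fault the page in and fill it with a non-zero pattern. */
    memset(p, 0x42, len);

    /* Drop the page; the VMA itself stays mapped. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in a fresh zero-filled page. */
    volatile unsigned char *vp = p;
    int zeroed = (vp[0] == 0);

    munmap(p, len);
    return zeroed;
}
```

Calling dontneed_reads_zero() from a small main() and checking its return value is enough to try it. A thread touching the range between the zap and the final unmap would see the same kind of zero-filled page, which is the window the question is about.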