Here's one more dmesg output with more information captured in
__get_user_pages() as well. It basically confirms that
handle_mm_fault() returns VM_FAULT_RETRY.

I'm not sure where and what to change ("fix with a FOLL_TRIED
somewhere") to make it work. My (uneducated) impression is that only
__get_user_pages() needs to be changed, but I might be wrong.

On Tue, 2019-11-05 at 21:05 +0100, Robert Stupp wrote:
> On Tue, 2019-11-05 at 13:22 -0500, Johannes Weiner wrote:
> > Judging from Robert's stack captures, the task is not hung but
> > busy-looping in __mm_populate(). AFAICS, the only way this can occur
> > is if populate_vma_page_range() returns 0 and we don't advance the
> > iteration position (if it returned an error, we wouldn't reset nend
> > and move on to the next vma as ignore_errors is 1 for mlockall.)
> >
> > populate_vma_page_range() returns 0 when the first page is not found
> > and faultin_page() returns -EBUSY (if it were processing pages, or if
> > the error from faultin_page() would be a different one, we would
> > return the number of pages processed or -error).
> >
> > faultin_page() returns -EBUSY when VM_FAULT_RETRY is set, i.e. we
> > dropped the mmap_sem in order to initiate IO and require a retry. That
> > is consistent with the bisect result (new VM_FAULT_RETRY conditions).
> >
> > At this point, regular page fault would retry with FAULT_FLAG_TRIED to
> > indicate that the mmap_sem cannot be dropped a second time. But this
> > mlock path doesn't set that flag and we can loop repeatedly. That is
> > something we probably need to fix with a FOLL_TRIED somewhere.
> >
> > What I don't quite understand yet is why the fault path doesn't make
> > progress eventually. We must drop the mmap_sem without changing the
> > state in any way. How can we keep looping on the same page?
>
> I've played a bit around by adding some `printk` messages (see attached
> patch) and found exactly what you describe: it's busy-looping in
> __mm_populate(), because populate_vma_page_range returns 0.
>
> However, there's a slightly interesting thing in there. Before it loops
> forever, it processes
>   nstart=5574d92e1000
>   locked=1
>   vma->vm_start=7f5e4bfec000
>   vma->vm_end= 7f5e4c011000
>   vma->vm_flags=8002071
> for which populate_vma_page_range() returns 1, then it processes this
> over and over again:
>   nstart=7f5e4bfed000
>   locked=0
>   vma->vm_start=7f5e4bfec000 (same as before)
>   vma->vm_end= 7f5e4c011000
>   vma->vm_flags=8002071
> These are the additional dmesg messages with timestamp 105.x. At
> timestamp 106.x, I've hit ctrl-c (ret=-512).
>
> dmesg output with the patch applied (on top of the v5.3.8 git tag)
> attached.
>
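
To make the loop described above easier to reason about, here is a small
user-space sketch of it. This is not kernel code and not a proposed patch:
all sim_* names, SIM_EBUSY and SIM_PAGE_SIZE are made up for illustration,
and the model assumes (as a simplification) that a fault keeps asking for a
retry until the caller passes a "tried" hint for the first page of the range.

```c
#include <stdbool.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_EBUSY     16

/* Hypothetical stand-in for faultin_page(): asks for a retry (-EBUSY)
 * unless the caller signals that this is already the second attempt. */
static int sim_faultin_page(bool tried)
{
	return tried ? 0 : -SIM_EBUSY;
}

/* Hypothetical stand-in for populate_vma_page_range(): returns the number
 * of pages processed, which is 0 if the very first page hits -EBUSY.
 * The "tried" hint is only honoured for the first page of the range. */
static long sim_populate_range(unsigned long start, unsigned long end,
			       bool tried)
{
	long done = 0;

	for (unsigned long addr = start; addr < end; addr += SIM_PAGE_SIZE) {
		bool first_page_retry = tried && addr == start;

		if (sim_faultin_page(first_page_retry) == -SIM_EBUSY)
			break;
		done++;
	}
	return done;
}

/* Hypothetical stand-in for the __mm_populate() loop. Returns the number
 * of iterations needed, or -1 if max_iters is reached without finishing
 * (the stand-in for "busy-looping forever"). */
static int sim_mm_populate(unsigned long start, unsigned long end,
			   bool set_tried_on_retry, int max_iters)
{
	unsigned long nstart = start;
	bool tried = false;
	int iters = 0;

	while (nstart < end) {
		if (iters == max_iters)
			return -1;
		iters++;

		long ret = sim_populate_range(nstart, end, tried);
		if (ret == 0) {
			/* No progress: nstart is not advanced. Without the
			 * "tried" hint the next pass is identical to this one. */
			tried = set_tried_on_retry;
			continue;
		}
		nstart += (unsigned long)ret * SIM_PAGE_SIZE;
		tried = false;
	}
	return iters;
}
```

In this model, sim_mm_populate(0, 4 * SIM_PAGE_SIZE, false, 1000) hits the
iteration cap and returns -1, while the same call with set_tried_on_retry
set to true finishes. Whether the real fix belongs in __get_user_pages() or
in its callers is not something this sketch can answer.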