On 7/11/17 11:29 AM, Jerome Glisse wrote:
> Can you test if attached patch helps ? I am having trouble reproducing this
> from inside a vm.
>
> My theory is that 2 concurrent CPU page fault happens. First one manage to
> start the migration back to system memory but second one see the migration
> special entry and call migration_entry_wait() which increase page refcount
> and this happen before first one check page refcount are ok for migration.
>
> For regular migration such scenario is ok as the migration bails out and
> because page is CPU accessible there is no need to kick again the migration
> for other thread that CPU fault to migrate.
>
> I am looking into how i can change migration_entry_wait() not to refcount
> pages. Let me know if the attached patch helps.
>
> Thank you
> Jerome

Hi Jerome,

Thanks for the update.

Unfortunately, the patch does not help. I just applied it and recompiled the
kernel. Please find attached a new kernel log and an app log.

--
Evgeny Baskakov
NVIDIA