On Fri, Nov 05, 2004 at 10:49:58PM +0900, Hirokazu Takahashi wrote:
> Hi, Marcelo,
>
> I happened to meet a bug.
>
> > > Yep thats probably what caused your failures.
> > >
> > > I'll prepare a new patch.
> >
> > Here it is - with the copy_page_range() fix as you pointed out,
> > plus sys_swapon() fix as suggested by Hiroyuki.
> >
> > I've also added a BUG() in case of swap_free() failure, so we
> > get a backtrace.
> >
> > Can you please test this - thanks.
>
> From the attached message, lookup_migration_cache() returned NULL
> in do_swap_page(). There might be a race condition related the
> migration cache.

Hi Hirokazu!

The problem is that another thread can fault in the pte (removing the
radix tree entry) while the current thread has dropped the
page_table_lock, which explains why lookup_migration_cache() returned
NULL. The swap code handles this situation, but I had completely
missed it.

Updated patch attached.

Huge thanks for your testing; it has been crucial!

We're getting there.

do_swap_page now does:

 again:
+	if (pte_is_migration(orig_pte)) {
+		page = lookup_migration_cache(entry.val);
+		if (!page) {
+			spin_lock(&mm->page_table_lock);
+			page_table = pte_offset_map(pmd, address);
+			if (likely(pte_same(*page_table, orig_pte)))
+				ret = VM_FAULT_OOM;
+			else
+				ret = VM_FAULT_MINOR;
+			pte_unmap(page_table);
+			spin_unlock(&mm->page_table_lock);
+			goto out;
+		}
+	} else {
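The "re-check the pte under the lock" pattern in do_swap_page can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the kernel code: a pthread mutex stands in for mm->page_table_lock, a plain unsigned long stands in for the pte, and the names handle_missing_entry, racing_fault, simulate_fault, FAULT_MINOR and FAULT_OOM are all hypothetical. The competing fault is simulated by a direct call made in the window where the lock is not held, so the interleaving is deterministic.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the kernel's VM_FAULT_MINOR / VM_FAULT_OOM. */
enum fault_result { FAULT_MINOR, FAULT_OOM };

/* Hypothetical shared state: one "page table entry" word guarded by a
 * mutex that plays the role of mm->page_table_lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long pte_word;

/* Called after a lockless lookup came back empty. orig_pte is the value
 * sampled before the lock was dropped; the entry is re-read under the
 * lock and compared, which is the role pte_same() plays in the patch. */
static enum fault_result handle_missing_entry(unsigned long orig_pte)
{
	enum fault_result ret;

	pthread_mutex_lock(&table_lock);
	if (pte_word == orig_pte)
		ret = FAULT_OOM;	/* entry unchanged: the lookup really failed */
	else
		ret = FAULT_MINOR;	/* someone else already faulted it in */
	pthread_mutex_unlock(&table_lock);
	return ret;
}

/* Stands in for the other thread that resolves the same fault first. */
static void racing_fault(unsigned long new_pte)
{
	pthread_mutex_lock(&table_lock);
	pte_word = new_pte;
	pthread_mutex_unlock(&table_lock);
}

/* Runs one scenario; if other_thread_wins is nonzero, the competing
 * fault runs between the failed lockless lookup and the locked re-check. */
static enum fault_result simulate_fault(int other_thread_wins)
{
	unsigned long orig_pte = 0x100;	/* arbitrary "migration" entry */
	unsigned long resolved = 0x200;	/* arbitrary "present" entry */

	pthread_mutex_lock(&table_lock);
	pte_word = orig_pte;
	pthread_mutex_unlock(&table_lock);

	/* ...lockless lookup fails here while the lock is not held... */
	if (other_thread_wins)
		racing_fault(resolved);

	return handle_missing_entry(orig_pte);
}
```

Calling simulate_fault(0) leaves the entry untouched and yields FAULT_OOM, while simulate_fault(1) lets the simulated competing fault install a new entry first and yields FAULT_MINOR, mirroring the two branches of the pte_same() check above.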