Matthew Wilcox <willy@infradead.org> writes:
It's not been reported before, but it seems obvious why it triggers.
I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
to not take a pageref".
It's obvious the BUG_ON() is because the page isn't locked, but it's less
obvious to me at least why we have a migration entry pointing to an unlocked
page.
There's a lot of inlining going on, so let's write this out:
__handle_mm_fault
handle_pte_fault
do_swap_page():
entry = pte_to_swp_entry(vmf->orig_pte);
if (unlikely(non_swap_entry(entry))) {
if (is_migration_entry(entry)) {
migration_entry_wait(vma->vm_mm, vmf->pmd,
vmf->address);
We don't (and can't) have the underlying page locked here. We're just
waiting for it to finish migration.
Why can't the page be locked here? Waiting for migration to finish is equivalent
to waiting for the page to be unlocked. __unmap_and_move() follows this rough
sequence:
__unmap_and_move()
if (!trylock_page(page)) {
if (!force || mode `= MIGRATE_ASYNC)
goto out;
/*
* It's not safe for direct compaction to call lock_page.
* For example, during page readahead pages are added locked
* to the LRU. Later, when the IO completes the pages are
* marked uptodate and unlocked. However, the queueing
* could be merging multiple pages for one bio (e.g.
* mpage_readahead). If an allocation happens for the
* second or third page, the process can end up locking
* the same page twice and deadlocking. Rather than
* trying to be clever about what pages can be locked,
* avoid the use of lock_page for direct compaction
* altogether.
*/
if (current->flags & PF_MEMALLOC)
goto out;
lock_page(page);
}
[...]
if (unlikely(!trylock_page(newpage)))
goto out_unlock;
[...]
} else if (page_mapped(page)) {
/* Establish migration ptes */
VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
page);
try_to_migrate(page, 0);
page_was_mapped = true;
}
if (!page_mapped(page))
rc = move_to_new_page(newpage, page, mode);
if (page_was_mapped)
remove_migration_ptes(page,
rc =' MIGRATEPAGE_SUCCESS ? newpage : page, false);
Both the source and destination pages (page and newpage respectively) are locked
prior to installing migration entries in try_to_migrate() and unlocked after
migration entries are removed. In order to remove a migration entry the PTL is
required. Therefore while holding the PTL for a migration entry it should be
impossible to observe it pointing to an unlocked page.