On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
Hi all,
We're noticing some unexpected behaviour when the amdgpu and Mellanox
drivers interact on shared memory through hmm_range_fault. When the amdgpu
driver has migrated pages to DEVICE_PRIVATE memory, we would expect
hmm_range_fault, called by the Mellanox driver, to fault them back to system
memory. That's not happening; instead, hmm_range_fault fails.
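For context, the calling side we have in mind follows the usual
hmm_range_fault pattern, roughly like the sketch below. This is a simplified
illustration, not the actual Mellanox code; the function name mirror_range
and the notifier/pfns setup are placeholders:

    #include <linux/hmm.h>
    #include <linux/mm.h>
    #include <linux/mmu_notifier.h>

    /*
     * Simplified sketch of the usual hmm_range_fault() caller pattern.
     * "notifier" is an mmu_interval_notifier already registered on the
     * address range, and pfns[] has one entry per page in [start, end).
     */
    static int mirror_range(struct mmu_interval_notifier *notifier,
                            unsigned long start, unsigned long end,
                            unsigned long *pfns)
    {
            struct hmm_range range = {
                    .notifier = notifier,
                    .start = start,
                    .end = end,
                    .hmm_pfns = pfns,
                    /* Fault every page in, with write permission. */
                    .default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
                    /*
                     * This caller does not own the amdgpu DEVICE_PRIVATE
                     * pages, so we expect hmm_range_fault() to migrate
                     * them back to system memory rather than report them.
                     */
                    .dev_private_owner = NULL,
            };
            struct mm_struct *mm = notifier->mm;
            int ret;

            do {
                    range.notifier_seq = mmu_interval_read_begin(notifier);
                    mmap_read_lock(mm);
                    ret = hmm_range_fault(&range);
                    mmap_read_unlock(mm);
            } while (ret == -EBUSY);

            /*
             * On success the caller still has to check
             * mmu_interval_read_retry() under its device page table lock
             * before using pfns[].
             */
            return ret;
    }

The key point is dev_private_owner: since the caller is not the owner of the
amdgpu DEVICE_PRIVATE pages, our expectation is that hmm_range_fault should
fault those pages back to system memory instead of reporting their PFNs.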
As an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
pages like device_exclusive pages, which gave us the expected behaviour: it
resulted in a dev_pagemap_ops.migrate_to_ram callback into our driver, and
hmm_range_fault returned system memory pages to the Mellanox driver.
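To make the experiment concrete, the change amounts to something like the
excerpt below from the non-present-PTE path of hmm_vma_handle_pte in
mm/hmm.c. This is paraphrased and simplified, not Philip's actual patch; the
added is_device_private_entry check is our reading of "treat DEVICE_PRIVATE
like device_exclusive":

    /* Simplified excerpt of hmm_vma_handle_pte() (mm/hmm.c), not verbatim. */
    if (!pte_present(pte)) {
            swp_entry_t entry = pte_to_swp_entry(pte);

            /*
             * DEVICE_PRIVATE pages owned by the hmm_range_fault() caller
             * are reported directly, without faulting (write-flag handling
             * omitted here).
             */
            if (is_device_private_entry(entry) &&
                pfn_swap_entry_to_page(entry)->pgmap->owner ==
                        range->dev_private_owner) {
                    *hmm_pfn = swp_offset(entry) | HMM_PFN_VALID;
                    return 0;
            }

            required_fault = hmm_pte_need_fault(hmm_vma_walk,
                                                pfn_req_flags, 0);
            if (!required_fault) {
                    *hmm_pfn = 0;
                    return 0;
            }

            /* device_exclusive entries are faulted via handle_mm_fault() */
            if (is_device_exclusive_entry(entry))
                    goto fault;

            /*
             * The experiment: treat DEVICE_PRIVATE entries we do not own
             * the same way, so the owning driver's migrate_to_ram()
             * callback moves the page back to system memory.
             */
            if (is_device_private_entry(entry))
                    goto fault;

            /* migration entries are waited on; anything else is -EFAULT */
    }

Taking the fault path ends up in handle_mm_fault, which invokes the owning
pgmap's migrate_to_ram callback. If the stock code instead falls through to
an error return for DEVICE_PRIVATE entries the caller doesn't own, that would
explain the failure we're seeing.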
So something is clearly wrong. It could be:
* our expectations are wrong,
* the implementation of hmm_range_fault is wrong, or
* our driver is missing something when migrating to DEVICE_PRIVATE memory.
Do you have any insights?
I think it is a bug.
Jason