On 7/22/22 08:34, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
>> Hi all,
>>
>> We're noticing some unexpected behaviour when the amdgpu and Mellanox
>> drivers are interacting on shared memory with hmm_range_fault. If the amdgpu
>> driver migrated pages to DEVICE_PRIVATE memory, we would expect
>> hmm_range_fault called by the Mellanox driver to fault them back to system
>> memory. But that's not happening. Instead hmm_range_fault fails.
>>
>> For an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
>> pages like device_exclusive pages, which gave us the expected behaviour. It
>> would result in a dev_pagemap_ops.migrate_to_ram callback in our driver, and
>> hmm_range_fault would return system memory pages to the Mellanox driver.
>>
>> So something is clearly wrong. It could be:
>>
>> * our expectations are wrong,
>> * the implementation of hmm_range_fault is wrong, or
>> * our driver is missing something when migrating to DEVICE_PRIVATE memory.
>>
>> Do you have any insights?
> I think it is a bug
>
> Jason

Yes, looks like a bug to me too.

hmm_vma_handle_pte() calls hmm_is_device_private_entry(), which correctly
handles a device private entry owned by the driver calling hmm_range_fault(),
but it then does nothing to fault the page back in when the entry is a device
private entry owned by some other driver.

I'll work with Alistair and one of us will post a fix. Thanks for finding
this!
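
For reference, the problem is in the !pte_present() branch of
hmm_vma_handle_pte() in mm/hmm.c: an entry owned by the caller is reported
directly, but an entry owned by another device never reaches the fault path.
Below is a rough, untested sketch of the direction a fix could take, not the
actual patch we will post. The helpers are the existing swapops.h/hmm ones,
and cpu_flags, required_fault, pfn_req_flags etc. are the function's existing
locals.

	/* Sketch: !pte_present() branch of hmm_vma_handle_pte(). */
	if (!pte_present(pte)) {
		swp_entry_t entry = pte_to_swp_entry(pte);

		/*
		 * A device private entry owned by the caller is reported
		 * as-is; the caller can access it without migration.
		 */
		if (is_device_private_entry(entry) &&
		    pfn_swap_entry_to_page(entry)->pgmap->owner ==
		    range->dev_private_owner) {
			cpu_flags = HMM_PFN_VALID;
			if (is_writable_device_private_entry(entry))
				cpu_flags |= HMM_PFN_WRITE;
			*hmm_pfn = swp_offset(entry) | cpu_flags;
			return 0;
		}

		required_fault = hmm_pte_need_fault(hmm_vma_walk,
						    pfn_req_flags, 0);
		if (!required_fault) {
			*hmm_pfn = 0;
			return 0;
		}

		if (!non_swap_entry(entry))
			goto fault;

		/*
		 * The missing piece: a device private entry owned by another
		 * device should take the fault path too, so that
		 * handle_mm_fault() ends up calling the owner's
		 * dev_pagemap_ops.migrate_to_ram() and the page is migrated
		 * back to system memory before being reported.
		 */
		if (is_device_private_entry(entry))
			goto fault;

		if (is_device_exclusive_entry(entry))
			goto fault;

		/* ... rest of the branch (migration entries etc.) unchanged ... */
	}

The fault: label at the bottom of the function goes through hmm_vma_fault()
-> handle_mm_fault(), which for a device private entry calls the owning
pgmap's migrate_to_ram(), i.e. the behaviour Philip's experiment produced by
treating these entries like device exclusive ones.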