Hello Sagi,

On Wed, Feb 08, 2012 at 03:55:43PM +0200, Sagi Grimberg wrote:
> Now that anon_vma lock and i_mmap_mutex are both sleepable mutex, it is possible to schedule inside invalidation callbacks
> (such as invalidate_page, invalidate_range_start/end and change_pte) .
> This is essential for a scheduling HW sync in RDMA drivers which apply on demand paging methods.

Even after this, you still can't schedule in invalidate_page/change_pte/clear_flush_young, because the PT lock is still held there and that's a spinlock. You can only schedule in invalidate_range_start/end.

It's definitely a change in the right direction, but a few more patches are needed to make all methods schedule capable.

Originally it started with srcu, but then not even invalidate_range_start/end could schedule because of the i_mmap_lock, so we changed to non-sleepable rcu in the final version merged upstream to keep it optimal.

Originally I developed a version of mmu notifier that was capable of scheduling everywhere to prove the point that it was possible, but it was deferred because of the cost of using mutexes instead of spinlocks for anon-vma and i_mmap_lock (measured by Andi as a double digit percent slowdown in some workloads in latest upstream, but hey, it's upstream already and an unconditional change, so we should at least take advantage of it in mmu notifier now! Total agreement. At least we get the full advantage of it in terms of easier coding of mmu notifier users).

I archived the missing patches that were deferred back then, just waiting for this moment, so now I'm attaching the ones that are still missing and that are needed, in addition to the srcu change you did, to get full scheduling capability from all mmu notifier methods. They're unlikely to apply, but you can revisit these now that only the PT lock is left as a problem, and we should check the free_pgtable shenanigans too.
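
To make the current state concrete, here is a minimal userspace sketch (not kernel code) of which methods can schedule after the srcu change, as described above. The names method_status, blocker and method_may_sleep are made up for illustration only and don't exist in the kernel; the table just restates the paragraphs above and would need updating as the PT lock patches go in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what still prevents a method from scheduling. */
enum blocker {
	BLOCKER_NONE,    /* no spinlock held around the callback */
	BLOCKER_PT_LOCK  /* called with the PT lock (a spinlock) held */
};

/* Hypothetical table entry: one mmu notifier method and its blocker. */
struct method_status {
	const char *name;
	enum blocker blocker;
};

/* Status as stated in this mail, after the srcu change only. */
const struct method_status mmu_notifier_status[] = {
	{ "invalidate_range_start", BLOCKER_NONE },
	{ "invalidate_range_end",   BLOCKER_NONE },
	{ "invalidate_page",        BLOCKER_PT_LOCK },
	{ "change_pte",             BLOCKER_PT_LOCK },
	{ "clear_flush_young",      BLOCKER_PT_LOCK },
};

/*
 * Return true if the named method may schedule today.
 * Unknown names are treated as non-sleepable, to stay conservative.
 */
bool method_may_sleep(const char *name)
{
	size_t n = sizeof(mmu_notifier_status) / sizeof(mmu_notifier_status[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(mmu_notifier_status[i].name, name) == 0)
			return mmu_notifier_status[i].blocker == BLOCKER_NONE;
	}
	return false;
}
```

A driver author could read this as: only put the sleeping HW sync in the methods for which method_may_sleep() returns true, until the remaining patches land.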

The new test_young method, used by khugepaged to know if it's worth collapsing a hugepage (in case it was only accessed by the secondary MMU), is probably ok to stay non-sleep capable, as it's a read-only thing and doesn't need to invalidate, but in that case proper docs should be added on which methods are sleepable and which are not.

Thanks,
Andrea