On Thu, 2023-10-12 at 15:05 +0100, Matthew Wilcox wrote:
> On Thu, Oct 12, 2023 at 02:53:05PM +0100, David Woodhouse wrote:
> > > +       arch_enter_lazy_mmu_mode();
> > > +       for (;;) {
> > > +               set_pte(ptep, pte);
> > > +               if (--nr == 0)
> > > +                       break;
> > > +               ptep++;
> > > +               pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> > > +       }
> > > +       arch_leave_lazy_mmu_mode();
> >
> > This breaks the Xen PV guest.
> >
> > In move_ptes() in mm/mremap.c we arch_enter_lazy_mmu_mode() and then
> > loop calling set_pte_at(). Which now (or at least in a few commits'
> > time, when you wire it up for x86 in commit a3e1c9372c9b959) ends up
> > in your implementation of set_ptes(), calls
> > arch_enter_lazy_mmu_mode() again, and:
> >
> > [    0.628700] ------------[ cut here ]------------
> > [    0.628718] kernel BUG at arch/x86/kernel/paravirt.c:144!
>
> Easy fix ... don't do that ;-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index af7639c3b0a3..f3da8836f689 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -231,9 +231,11 @@ static inline pte_t pte_next_pfn(pte_t pte)
>  static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>                 pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +       bool multiple = nr > 1;
>         page_table_check_ptes_set(mm, ptep, pte, nr);
>
> -       arch_enter_lazy_mmu_mode();
> +       if (multiple)
> +               arch_enter_lazy_mmu_mode();
>         for (;;) {
>                 set_pte(ptep, pte);
>                 if (--nr == 0)
> @@ -241,7 +243,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>                 ptep++;
>                 pte = pte_next_pfn(pte);
>         }
> -       arch_leave_lazy_mmu_mode();
> +       if (multiple)
> +               arch_leave_lazy_mmu_mode();
>  }
>  #endif
>  #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
> I think long-term, we should make lazy_mmu_mode nestable.  But this is
> a reasonable quick fix.

I don't much like doing it implicitly based on (nr==1), but sure, as a
quick fix that works. The 64-bit PV guest now boots again.

Tested-by: David Woodhouse

Thanks.
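[Editor's note: Matthew's "make lazy_mmu_mode nestable" suggestion can be sketched as a nesting-depth counter, so that an inner enter/leave pair (like the one set_ptes() performs inside move_ptes()'s lazy region) becomes a no-op instead of a BUG. This is a hypothetical userspace illustration only: the function names, the depth variable, and the flush hook are invented for the sketch and are not the kernel's actual API; in the kernel the state would be per-CPU and the flush would issue the batched paravirt/Xen operations.]

```c
#include <assert.h>

/* Would be per-CPU state in the kernel; plain globals for the sketch. */
static int lazy_mmu_depth;
static int flush_count;   /* counts deferred-batch flushes, for demonstration */

/* Stand-in for issuing the batched hypercalls / deferred PTE updates. */
static void flush_lazy_mmu_batch(void)
{
	flush_count++;
}

/* Only the outermost enter starts batching; nested enters just count. */
static void lazy_mmu_enter(void)
{
	lazy_mmu_depth++;
}

/* Flush only when the outermost caller leaves; unbalanced leave is a bug. */
static void lazy_mmu_leave(void)
{
	assert(lazy_mmu_depth > 0);
	if (--lazy_mmu_depth == 0)
		flush_lazy_mmu_batch();
}
```

With this shape, the mremap path (outer enter in move_ptes(), inner enter in set_ptes()) nests cleanly: the inner leave decrements the counter without flushing, and the single flush happens at the outer leave.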