> On Mar 10, 2026, at 4:19 PM, David Hildenbrand (Arm) wrote: > > !-------------------------------------------------------------------| > This Message Is From an External Sender > This message came from outside your organization. > |-------------------------------------------------------------------! >> >> I was previously testing on 6.12 and didn’t see any changes to vfio-pci or >> pagewalk.c which prompted me to check whether I could reproduce the >> bug in a more recent kernel. >> >> However, when I tried to reproduce the bug on 7.0-rc2 (after adding some >> tracing to get a clearer picture of the sequence of events) it doesn’t happen. >> The VFIO DMA set operation is much faster on 7.0, so possibly the race >> window is too small for it to occur in reasonable time. > > Interesting. You could try adding a delay to a test kernel to see if you > can still provoke it. > > There is the slight possibility that something else fixed the race for > your reproducer by "accident". > > [...] Could be that the race window was made a lot smaller by [1], I see that the occurrence of the race also drops significantly already when I’m just adding some extra trace logs in the VFIO DMA set functions. […] > > Just to double-check: is that expected? > > I wonder why "-EINVAL" would be returned here. Do you know? > I don’t think it’s expected, but at least it’s an error that can be caught in userspace. I’ll do a bit more digging into why and where that originates, but I think that’ll get a patch in vfio. The -EINVAL originates from: vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote -> vaddr_get_pfns -> pin_user_pages_remote (mm/gup.c) Possibly that’s also the origin of the concurrent PUD modification that requires the retry in the walker in this patch. >> >> For my own understanding, why is this patch preferred over: >> - if (pud_none(*pud)) >> + if (pud_none(*pud) || pud_leaf(*pud)) >> in the walk_pud_range function? > > It might currently work for PUDs, but as soon as we have non-present PUD > entries (like migration entries) the code could become shaky: pud_leaf() > is only guaranteed to yield the right result if pud_present() is true. > > So I decided to instead make walk_pud_range() look more similar to > walk_pmd_range(), which is quite helpful for spotting actual differences > in the logic. > >> >> I do think moving the check to walk_pmd_range is a more clear on the code’s intent and >> personally prefer the code there, but I don’t see why this check is removing the possibility >> of a race after the (!pud_present(pudval) || pud_leaf(pudval)) check, as to me it looks >> like the PMD entry was possible to disappear between the splitting and this check? > > I distilled that in the comment: PMD page tables cannot/are not > reclaimed. So once you see a PMD page table, it's not going anywhere > while you hold relevant locks (mmap_lock or VMA lock). > > Only PMD leaf entries can get zapped any time and PMD none entries can > get populated any time. But not PMD page tables. Gotcha, thanks! > >> >> Anyways, regardless, this patch resolves the bug and looks good to me - what’s the >> course of action as we probably want to backport this to earlier kernels as well. Shall >> I send in a new PATCH without cover letter and take it from there? > > Right, I think you should: > > (1) rework the patch description to incorporate the essential stuff from > the cover letter > (2) Identify and add Fixes: tag and Cc: stable > (3) Document that we are reworking the code to mimic what we do in > walk_pmd_range(), to have less inconsistency on the core logic > (4) Document why you think the reproducer fails on newer kernels. (or > best try to get it reproduced by adding some delays in the code) > (5) Clarify that only PUD handling are prone to the race and that PMDs > are fine (and point out why) > (6) Use a patch subject like "mm/pagewalk: fix race between unmapping > and refaulting in walk_pud_range()" > > Once you resend, best to add > > Co-developed-by: David Hildenbrand (Arm) > Signed-off-by: David Hildenbrand (Arm) > > Above your SOB. > > To get something like: > > Co-developed-by: David Hildenbrand (Arm) > Signed-off-by: David Hildenbrand (Arm) > Signed-off-by: Max Boone > > Note that the existing > > Signed-off-by: Max Tottenham > > Is weird, as Max Tottenham did not send out this patch. If he was > involved in the development, you should either make him > > Suggested-by: > > Or > Debugged-by: > > Or > Co-developed-by: + Signed-off-by: > > See Documentation/process/submitting-patches.rst > > > Let me know if you have any questions :) > Will do, thanks a lot! > -- > Cheers, > > David [1] https://lore.kernel.org/all/20250814064714.56485-1-lizhe.67@bytedance.com/