On Mon, Sep 15, 2025 at 8:35 AM Zi Yan <ziy@nvidia.com> wrote:
On 15 Sep 2025, at 9:52, Kiryl Shutsemau wrote:

> From: Kiryl Shutsemau <kas@kernel.org>
>
> MADV_COLLAPSE on a file mapping behaves inconsistently depending on if
> PMD page table is installed or not.
>
> Consider following example:
>
>       p = mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
>                MAP_SHARED, fd, 0);
>       err = madvise(p, 2UL << 20, MADV_COLLAPSE);
>
> fd is a populated tmpfs file.
>
> The result depends on the address that the kernel returns on mmap().
> If it is located in an existing PMD table, the madvise() will succeed.
> However, if the table does not exist, it will fail with -EINVAL.
>
> This occurs because find_pmd_or_thp_or_none() returns SCAN_PMD_NULL when
> a page table is missing, which causes collapse_pte_mapped_thp() to fail.
>
> SCAN_PMD_NULL and SCAN_PMD_NONE should be treated the same in
> collapse_pte_mapped_thp(): install the PMD leaf entry and allocate page
> tables as needed.
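
For anyone who wants to try the example above, here is a minimal userspace sketch of it. It is not part of the patch. The helper names (make_populated_tmpfs_file(), try_collapse()) are made up for illustration, /dev/shm is assumed to be a tmpfs mount, 2 MiB is assumed to be the PMD size (as on x86-64), and the MADV_COLLAPSE fallback value is only there for older libc headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* Linux uapi value, for libc headers that lack it */
#endif

#define HPAGE_SIZE (2UL << 20)	/* assumption: 2 MiB PMD-sized huge pages */

/*
 * Hypothetical helper: create an unlinked file under /dev/shm (assumed to
 * be tmpfs) and fill it with `len` bytes so that its page cache is populated.
 * Returns an open fd, or a negative errno value on failure.
 */
int make_populated_tmpfs_file(size_t len)
{
	char path[] = "/dev/shm/collapse-XXXXXX";
	char buf[4096];
	int fd = mkstemp(path);

	if (fd < 0)
		return -errno;
	unlink(path);	/* the open fd keeps the file alive */

	memset(buf, 0xaa, sizeof(buf));
	for (size_t off = 0; off < len; off += sizeof(buf)) {
		size_t chunk = len - off < sizeof(buf) ? len - off : sizeof(buf);
		ssize_t n = pwrite(fd, buf, chunk, (off_t)off);

		if (n != (ssize_t)chunk) {
			int err = n < 0 ? errno : EIO;

			close(fd);
			return -err;
		}
	}
	return fd;
}

/*
 * Hypothetical helper: map `len` bytes of `fd` MAP_SHARED and request a
 * collapse, as in the commit message. Returns 0 on success or a negative
 * errno value from mmap() or madvise().
 */
int try_collapse(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	int ret;

	if (p == MAP_FAILED)
		return -errno;

	/* Record the result before munmap() can touch errno. */
	ret = madvise(p, len, MADV_COLLAPSE) ? -errno : 0;
	munmap(p, len);
	return ret;
}
```

Calling try_collapse(make_populated_tmpfs_file(HPAGE_SIZE), HPAGE_SIZE) from a small main() should be enough to observe the behaviour. Going by the description above, an unpatched kernel would be expected to return -EINVAL only when the mapping lands where no PMD table exists yet, so the outcome can vary from run to run and also depends on the kernel version and THP configuration.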

Why does the collapse code want to know the difference between SCAN_PMD_NULL and
SCAN_PMD_NONE? Both seem to be treated as “nothing here, install a PMD
leaf”. One difference is that madvise_collapse() will continue
on SCAN_PMD_NULL but bail out on SCAN_PMD_NONE.

I wonder if we could have SCAN_PMD_NULL_OR_NONE instead.

Zach, since you added both, can you share some insight? Thanks.

>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
>
> v2:
>  - Modify set_huge_pmd() instead of introducing install_huge_pmd();
>
> ---
>  mm/khugepaged.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>

The changes look good to me. Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi

Thanks Zi. Hugh had also looped me into this. I’m travelling today but will respond tomorrow. Generally, though, this is a behavioural cleanup I’d been meaning to do for a while; I just didn’t realize it’d be so straightforward. Thank you, Kiryl.