linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 6.1.y 2/3] mm: hugetlb: independent PMD page table shared count
       [not found]   ` <20250620213334.158850-2-jannh@google.com>
@ 2025-06-29 13:00     ` Vitaly Chikunov
  2025-06-30 17:12       ` Jann Horn
  0 siblings, 1 reply; 3+ messages in thread
From: Vitaly Chikunov @ 2025-06-29 13:00 UTC (permalink / raw)
  To: Jann Horn, Sasha Levin, Andrew Morton, gregkh, stable
  Cc: Jane Chu, Nanyong Sun, Muchun Song, Ken Chen, Kefeng Wang,
	Liu Shixin, linux-mm

Hi,

LTP tests failure with the following commit described below:

On Fri, Jun 20, 2025 at 11:33:32PM +0200, Jann Horn wrote:
> From: Liu Shixin <liushixin2@huawei.com>
> 
> [ Upstream commit 59d9094df3d79443937add8700b2ef1a866b1081 ]
> 
> The folio refcount may be increased unexpectly through try_get_folio() by
> caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
> to check whether a pmd page table is shared.  The check is incorrect if
> the refcount is increased by the above caller, and this can cause the page
> table leaked:
> 
>  BUG: Bad page state in process sh  pfn:109324
>  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
>  flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
>  page_type: f2(table)
>  raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
>  raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
>  page dumped because: nonzero mapcount
>  ...
>  CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G    B              6.13.0-rc2master+ #7
>  Tainted: [B]=BAD_PAGE
>  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>  Call trace:
>   show_stack+0x20/0x38 (C)
>   dump_stack_lvl+0x80/0xf8
>   dump_stack+0x18/0x28
>   bad_page+0x8c/0x130
>   free_page_is_bad_report+0xa4/0xb0
>   free_unref_page+0x3cc/0x620
>   __folio_put+0xf4/0x158
>   split_huge_pages_all+0x1e0/0x3e8
>   split_huge_pages_write+0x25c/0x2d8
>   full_proxy_write+0x64/0xd8
>   vfs_write+0xcc/0x280
>   ksys_write+0x70/0x110
>   __arm64_sys_write+0x24/0x38
>   invoke_syscall+0x50/0x120
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x34/0x128
>   el0t_64_sync_handler+0xc8/0xd0
>   el0t_64_sync+0x190/0x198
> 
> The issue may be triggered by damon, offline_page, page_idle, etc, which
> will increase the refcount of page table.
> 
> 1. The page table itself will be discarded after reporting the
>    "nonzero mapcount".
> 
> 2. The HugeTLB page mapped by the page table miss freeing since we
>    treat the page table as shared and a shared page table will not be
>    unmapped.
> 
> Fix it by introducing independent PMD page table shared count.  As
> described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390
> gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv
> pmds, so we can reuse the field as pt_share_count.

The commit causes LTP test memfd_create03 to fail on i586 architecture
on v6.1.142 stable release, the test was passing on v6.1.141. Found the
commit with git bisect.

The failure:

  root@i586:~# /usr/lib/ltp/testcases/bin/memfd_create03
  tst_hugepage.c:78: TINFO: 2 hugepage(s) reserved
  tst_test.c:1526: TINFO: Timeout per run is 0h 00m 30s
  memfd_create03.c:171: TINFO: --TESTING WRITE CALL IN HUGEPAGES--
  memfd_create03.c:176: TINFO: memfd_create() succeeded
  memfd_create03.c:70: TPASS: write(3, "LTP", 3) failed as expected

  memfd_create03.c:171: TINFO: --TESTING PAGE SIZE OF CREATED FILE--
  memfd_create03.c:176: TINFO: memfd_create() succeeded
  memfd_create03.c:43: TINFO: mmap((nil), 4194304, 2, 2, 3, 0) succeeded
  memfd_create03.c:92: TINFO: munmap(0xb7800000, 1024kB) failed as expected
  memfd_create03.c:92: TINFO: munmap(0xb7800000, 2048kB) failed as expected
  memfd_create03.c:92: TINFO: munmap(0xb7800000, 3072kB) failed as expected
  memfd_create03.c:111: TPASS: munmap() fails for page sizes less than 4096kB

  memfd_create03.c:171: TINFO: --TESTING HUGEPAGE ALLOCATION LIMIT--
  memfd_create03.c:176: TINFO: memfd_create() succeeded
  memfd_create03.c:39: TBROK: mmap((nil),0,2,2,3,0) failed: EINVAL (22)

  Summary:
  passed   2
  failed   0
  broken   1
  skipped  0
  warnings 0

dmesg while the test run:

  [   16.072078] memfd_create03 (203): drop_caches: 3
  [   16.075298] mm/pgtable-generic.c:51: bad pgd 7d4000e7

The same error occurs for v5.10.239. There is no test failure on v6.12.35 nor
v6.15.4 even though they contain the same commit.

Thanks,

> 
> Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
> Fixes: 39dde65c9940 ("[PATCH] shared page table for hugetlb page")
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Ken Chen <kenneth.w.chen@intel.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Nanyong Sun <sunnanyong@huawei.com>
> Cc: Jane Chu <jane.chu@oracle.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> [backport note: struct ptdesc did not exist yet, stuff it equivalently
> into struct page instead]
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  include/linux/mm.h       |  3 +++
>  include/linux/mm_types.h |  3 +++
>  mm/hugetlb.c             | 16 +++++++---------
>  3 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 03357c196e0b..b36dffbfbe69 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2537,6 +2537,9 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
>  	if (!pmd_ptlock_init(page))
>  		return false;
>  	__SetPageTable(page);
> +#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> +	atomic_set(&page->pt_share_count, 0);
> +#endif
>  	inc_lruvec_page_state(page, NR_PAGETABLE);
>  	return true;
>  }
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index a9c1d611029d..9b64610eddcc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -160,6 +160,9 @@ struct page {
>  			union {
>  				struct mm_struct *pt_mm; /* x86 pgds only */
>  				atomic_t pt_frag_refcount; /* powerpc */
> +#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
> +				atomic_t pt_share_count;
> +#endif
>  			};
>  #if ALLOC_SPLIT_PTLOCKS
>  			spinlock_t *ptl;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fc5d3d665266..a3907edf2909 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7114,7 +7114,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  			spte = huge_pte_offset(svma->vm_mm, saddr,
>  					       vma_mmu_pagesize(svma));
>  			if (spte) {
> -				get_page(virt_to_page(spte));
> +				atomic_inc(&virt_to_page(spte)->pt_share_count);
>  				break;
>  			}
>  		}
> @@ -7129,7 +7129,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>  		mm_inc_nr_pmds(mm);
>  	} else {
> -		put_page(virt_to_page(spte));
> +		atomic_dec(&virt_to_page(spte)->pt_share_count);
>  	}
>  	spin_unlock(ptl);
>  out:
> @@ -7141,10 +7141,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  /*
>   * unmap huge page backed by shared pte.
>   *
> - * Hugetlb pte page is ref counted at the time of mapping.  If pte is shared
> - * indicated by page_count > 1, unmap is achieved by clearing pud and
> - * decrementing the ref count. If count == 1, the pte page is not shared.
> - *
>   * Called with page table lock held.
>   *
>   * returns: 1 successfully unmapped a shared pte page
> @@ -7153,18 +7149,20 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
>  					unsigned long addr, pte_t *ptep)
>  {
> +	unsigned long sz = huge_page_size(hstate_vma(vma));
>  	pgd_t *pgd = pgd_offset(mm, addr);
>  	p4d_t *p4d = p4d_offset(pgd, addr);
>  	pud_t *pud = pud_offset(p4d, addr);
>  
>  	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
>  	hugetlb_vma_assert_locked(vma);
> -	BUG_ON(page_count(virt_to_page(ptep)) == 0);
> -	if (page_count(virt_to_page(ptep)) == 1)
> +	if (sz != PMD_SIZE)
> +		return 0;
> +	if (!atomic_read(&virt_to_page(ptep)->pt_share_count))
>  		return 0;
>  
>  	pud_clear(pud);
> -	put_page(virt_to_page(ptep));
> +	atomic_dec(&virt_to_page(ptep)->pt_share_count);
>  	mm_dec_nr_pmds(mm);
>  	return 1;
>  }
> -- 
> 2.50.0.rc2.701.gf1e915cc24-goog
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.1.y 2/3] mm: hugetlb: independent PMD page table shared count
  2025-06-29 13:00     ` [PATCH 6.1.y 2/3] mm: hugetlb: independent PMD page table shared count Vitaly Chikunov
@ 2025-06-30 17:12       ` Jann Horn
  2025-06-30 19:17         ` Jann Horn
  0 siblings, 1 reply; 3+ messages in thread
From: Jann Horn @ 2025-06-30 17:12 UTC (permalink / raw)
  To: Vitaly Chikunov, Muchun Song, Oscar Salvador, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra
  Cc: Sasha Levin, Andrew Morton, gregkh, stable, Jane Chu,
	Nanyong Sun, Ken Chen, Kefeng Wang, Liu Shixin, linux-mm

tl;dr: 32-bit x86 without PAE opts into hugetlb page table sharing
despite only having 2-level paging, which means the "sharable" page
tables are PGDs, and then stuff breaks

On Sun, Jun 29, 2025 at 3:00 PM Vitaly Chikunov <vt@altlinux.org> wrote:
> LTP tests failure with the following commit described below:

Uuugh... thanks for letting me know.

> On Fri, Jun 20, 2025 at 11:33:32PM +0200, Jann Horn wrote:
> > From: Liu Shixin <liushixin2@huawei.com>
> >
> > [ Upstream commit 59d9094df3d79443937add8700b2ef1a866b1081 ]
> >
> > The folio refcount may be increased unexpectly through try_get_folio() by
> > caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
> > to check whether a pmd page table is shared.  The check is incorrect if
> > the refcount is increased by the above caller, and this can cause the page
> > table leaked:
[...]
> The commit causes LTP test memfd_create03 to fail on i586 architecture
> on v6.1.142 stable release, the test was passing on v6.1.141. Found the
> commit with git bisect.

Ah, yes, I can reproduce this; specifically it reproduces on a 32-bit
X86 builds without X86_PAE. If I enable X86_PAE, the tests pass.

Okay, I don't know precisely why this is breaking, but at a high
level: x86 unconditionally selects ARCH_WANT_HUGE_PMD_SHARE (and still
does in mainline). That flag means "when we have PMD entries pointing
to hugetlb pages, we want to share the PMD table across processes".

32-bit X86 with PAE has 3 page table levels (pgd, pmd, pte); so with
this sharing mechanism, we'd have multiple PGD entries pointing to the
same PMD. I guess that seems fine.

But 32-bit X86 with PAE only has 2 page table levels (pgd, pte). So a
hugepage is referenced by a PGD entry, and it makes no sense to try to
share PGDs. PGDs not being shared page tables is also baked into
(looking at the mainline version) "struct ptdesc", which puts "struct
mm_struct *pt_mm;" (for x86 PGDs) and "atomic_t pt_share_count;" (for
hugetlb page table sharing) into the same union.

I guess I'll send a patch later to disable page table sharing in
non-PAE 32-bit x86... or maybe we should disable it entirely for
32-bit x86...


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.1.y 2/3] mm: hugetlb: independent PMD page table shared count
  2025-06-30 17:12       ` Jann Horn
@ 2025-06-30 19:17         ` Jann Horn
  0 siblings, 0 replies; 3+ messages in thread
From: Jann Horn @ 2025-06-30 19:17 UTC (permalink / raw)
  To: Vitaly Chikunov, Muchun Song, Oscar Salvador, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, gregkh
  Cc: Sasha Levin, Andrew Morton, stable, Jane Chu, Nanyong Sun,
	Kefeng Wang, Liu Shixin, linux-mm

On Mon, Jun 30, 2025 at 7:12 PM Jann Horn <jannh@google.com> wrote:
> tl;dr: 32-bit x86 without PAE opts into hugetlb page table sharing
> despite only having 2-level paging, which means the "sharable" page
> tables are PGDs, and then stuff breaks
>
> On Sun, Jun 29, 2025 at 3:00 PM Vitaly Chikunov <vt@altlinux.org> wrote:
> > LTP tests failure with the following commit described below:
>
> Uuugh... thanks for letting me know.
>
> > On Fri, Jun 20, 2025 at 11:33:32PM +0200, Jann Horn wrote:
> > > From: Liu Shixin <liushixin2@huawei.com>
> > >
> > > [ Upstream commit 59d9094df3d79443937add8700b2ef1a866b1081 ]
> > >
> > > The folio refcount may be increased unexpectly through try_get_folio() by
> > > caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
> > > to check whether a pmd page table is shared.  The check is incorrect if
> > > the refcount is increased by the above caller, and this can cause the page
> > > table leaked:
> [...]
> > The commit causes LTP test memfd_create03 to fail on i586 architecture
> > on v6.1.142 stable release, the test was passing on v6.1.141. Found the
> > commit with git bisect.
>
> Ah, yes, I can reproduce this; specifically it reproduces on a 32-bit
> X86 builds without X86_PAE. If I enable X86_PAE, the tests pass.
[...]
> I guess I'll send a patch later to disable page table sharing in
> non-PAE 32-bit x86... or maybe we should disable it entirely for
> 32-bit x86...

This follow-up patch should address that:
<https://lore.kernel.org/r/20250630-x86-2level-hugetlb-v1-1-077cd53d8255@google.com>


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-06-30 19:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2025062041-uplifted-cahoots-6c42@gregkh>
     [not found] ` <20250620213334.158850-1-jannh@google.com>
     [not found]   ` <20250620213334.158850-2-jannh@google.com>
2025-06-29 13:00     ` [PATCH 6.1.y 2/3] mm: hugetlb: independent PMD page table shared count Vitaly Chikunov
2025-06-30 17:12       ` Jann Horn
2025-06-30 19:17         ` Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox