linux-mm.kvack.org archive mirror
From: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
To: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Cc: linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
	Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Subject: Re: [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy
Date: Wed, 4 Mar 2026 09:53:05 +0100
Message-ID: <e10b3f59-603b-4d7f-a4bf-91f4e9f51ae7@kernel.org>
In-Reply-To: <62dfff55a7f4f465ac1f8077cee93e6e87ebddd0.1772013273.git.ritesh.list@gmail.com>



On 25/02/2026 at 12:04, Ritesh Harjani (IBM) wrote:
> powerpc uses pt_frag_refcount as a reference counter for tracking its
> pte and pmd page table fragments. For PTE tables, in the case of Hash with
> 64K page size, we have 16 fragments of 4K size in one 64K page.
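
For readers following the arithmetic, here is a minimal userspace sketch of the fragment layout described above. The MODEL_* constants and both helper names are hypothetical and hard-code the Hash/64K case; the kernel derives the real values from its configuration. It also assumes that the cached pte_frag pointer marks the first fragment not yet handed out, which is what the subtraction in pte_frag_destroy() suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for Hash MMU with 64K pages (illustrative only). */
#define MODEL_PAGE_SHIFT		16	/* 64K page */
#define MODEL_PTE_FRAG_SIZE_SHIFT	12	/* 4K fragment */
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK		(~(MODEL_PAGE_SIZE - 1))
#define MODEL_PTE_FRAG_NR \
	(1UL << (MODEL_PAGE_SHIFT - MODEL_PTE_FRAG_SIZE_SHIFT))

/* Index of the fragment that contains addr within its 64K page. */
static unsigned long model_frag_index(uintptr_t addr)
{
	unsigned long idx;

	idx = (addr & ~MODEL_PAGE_MASK) >> MODEL_PTE_FRAG_SIZE_SHIFT;
	assert(idx < MODEL_PTE_FRAG_NR);
	return idx;
}

/*
 * Fragments never handed out, assuming addr is the cached pointer to the
 * next unused fragment. This mirrors "PTE_FRAG_NR - count" in the patch.
 */
static unsigned long model_unused_frags(uintptr_t addr)
{
	return MODEL_PTE_FRAG_NR - model_frag_index(addr);
}
```

With a cached pointer at offset 0x3000 into the page, three fragments have been handed out and thirteen references are still held on behalf of fragments nobody uses.
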
> 
> Patch series [1] "mm: free retracted page table by RCU"
> added pte_free_defer() to defer the freeing of PTE tables when
> retract_page_tables() is called for madvise MADV_COLLAPSE on shmem
> range.
> [1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/
> 
> pte_free_defer() sets the active flag on the corresponding fragment's
> folio & calls pte_fragment_free(), which reduces the pt_frag_refcount.
> When pt_frag_refcount reaches 0 (no active fragment using the folio), it
> checks whether the folio active flag is set: if set, it calls call_rcu() to
> free the folio; if the active flag is unset, it calls pte_free_now().
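
The decision described above can be modelled in a few lines of userspace C. This is only a sketch: struct model_ptdesc, the enum and both functions are hypothetical stand-ins, a plain int replaces the atomic counter, and clearing the flag when the RCU path is chosen is an assumption of the model rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

enum model_free_path {
	MODEL_STILL_IN_USE,	/* other fragments still hold references */
	MODEL_FREE_NOW,		/* stands in for pte_free_now() */
	MODEL_FREE_VIA_RCU,	/* stands in for call_rcu(..., pte_free_now) */
};

struct model_ptdesc {
	int frag_refcount;	/* stands in for pt_frag_refcount */
	bool active;		/* stands in for the folio active flag */
};

/* Drop one fragment reference and pick the free path on the last put. */
static enum model_free_path model_fragment_free(struct model_ptdesc *pt)
{
	assert(pt->frag_refcount > 0);
	if (--pt->frag_refcount != 0)
		return MODEL_STILL_IN_USE;
	if (pt->active) {
		pt->active = false;	/* assumption: flag consumed here */
		return MODEL_FREE_VIA_RCU;
	}
	return MODEL_FREE_NOW;
}

/* Deferred free: mark the folio, then drop the fragment reference. */
static enum model_free_path model_free_defer(struct model_ptdesc *pt)
{
	pt->active = true;
	return model_fragment_free(pt);
}
```

Note the second case the model exposes: when the deferred free is not the last put, the flag stays set on a folio that is still referenced.
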
> 
> Now, this can lead to the following problem in a corner case...
> 
> [  265.351553][  T183] BUG: Bad page state in process a.out  pfn:20d62
> [  265.353555][  T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62
> [  265.355457][  T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff)
> [  265.358719][  T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000
> [  265.360177][  T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000
> [  265.361438][  T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [  265.362572][  T183] Modules linked in:
> [  265.364622][  T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY
> [  265.364785][  T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
> [  265.364908][  T183] Call Trace:
> [  265.364955][  T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable)
> [  265.365202][  T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8
> [  265.365384][  T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08
> [  265.365554][  T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310
> [  265.365729][  T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218
> [  265.365912][  T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820
> [  265.366080][  T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300
> [  265.366244][  T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508
> [  265.366421][  T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148
> [  265.366602][  T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178
> [  265.366780][  T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0
> [  265.366958][  T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> 
> The bad page state error occurs when such a folio gets freed (with the
> active flag set) from the do_exit() path in parallel.
> 
> ... this can happen when PTE fragments were allocated from this folio
> and all the allocated fragments got freed, but pt_frag_refcount still
> accounted for some unused fragments. Now, if this process exits with such
> a folio as its cached pte_frag in mm->context, then during
> pte_frag_destroy() we simply call pagetable_dtor() and pagetable_free(),
> meaning it doesn't clear the active flag. This can lead to the above bug.
> Since we are anyway in the do_exit() path, if the refcount is 0 then I
> guess it should be ok to simply clear the folio active flag before calling
> pagetable_dtor() & pagetable_free().
> 
> Fixes: 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page")
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

> ---
>   arch/powerpc/mm/pgtable-frag.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index 77e55eac16e4..ae742564a3d5 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -25,6 +25,7 @@ void pte_frag_destroy(void *pte_frag)
>   	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
>   	/* We allow PTE_FRAG_NR fragments from a PTE page */
>   	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
> +		folio_clear_active(ptdesc_folio(ptdesc));
>   		pagetable_dtor(ptdesc);
>   		pagetable_free(ptdesc);
>   	}
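
To convince myself of the corner case, I find a small userspace model helpful. The sketch below is hypothetical throughout (struct model_folio, the enum, model_frag_destroy() and the with_fix switch do not exist in the kernel); it only mimics the refcount subtraction from the hunk above and treats a set active flag at free time as the "Bad page state" condition, as the PAGE_FLAGS_CHECK_AT_FREE line in the log indicates.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FRAGS_PER_PAGE	16	/* assumed: Hash MMU, 64K pages */

struct model_folio {
	int frag_refcount;	/* stands in for pt_frag_refcount */
	bool active;		/* stands in for the folio active flag */
};

enum model_destroy_result {
	MODEL_NOT_FREED,	/* fragments still in use elsewhere */
	MODEL_FREED_OK,
	MODEL_FREED_BAD,	/* freed with the active flag still set */
};

/*
 * Model of the exit-time teardown: drop the references held for the
 * fragments that were never handed out, and free on reaching zero.
 * next_unused is the index of the cached (first unused) fragment.
 */
static enum model_destroy_result
model_frag_destroy(struct model_folio *f, int next_unused, bool with_fix)
{
	assert(next_unused > 0 && next_unused < MODEL_FRAGS_PER_PAGE);

	f->frag_refcount -= MODEL_FRAGS_PER_PAGE - next_unused;
	if (f->frag_refcount != 0)
		return MODEL_NOT_FREED;

	if (with_fix)
		f->active = false;	/* models folio_clear_active() */

	return f->active ? MODEL_FREED_BAD : MODEL_FREED_OK;
}
```

Take a folio with three fragments handed out, one of which went through pte_free_defer() and the other two through the normal free path: the refcount is down to 13 with the active flag set. Teardown with next_unused == 3 then reaches zero, and only the variant with the fix frees the folio cleanly.
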



