linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Erhard Furtner <erhard_f@mailbox.org>,
	Nicholas Piggin <npiggin@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Rohan McLure <rmclure@linux.ibm.com>
Subject: Re: BUG: Bad page map in process init pte:c0ab684c pmd:01182000 (on a PowerMac G4 DP)
Date: Thu, 29 Feb 2024 17:11:28 +0000	[thread overview]
Message-ID: <707f617f-45c8-4fa1-83aa-779f2b542871@csgroup.eu> (raw)
In-Reply-To: <20240229020941.2b30abe0@yea>



Le 29/02/2024 à 02:09, Erhard Furtner a écrit :
> On Mon, 12 Dec 2022 14:31:35 +1000
> "Nicholas Piggin" <npiggin@gmail.com> wrote:
> 
>> On Thu Dec 1, 2022 at 7:44 AM AEST, Erhard F. wrote:
>>> Getting this at boot sometimes, but not always (PowerMac G4 DP, kernel 6.0.9):
>>>
>>> [...]
>>> Freeing unused kernel image (initmem) memory: 1328K
>>> Checked W+X mappings: passed, no W+X pages found
>>> rodata_test: all tests were successful
>>> Run /sbin/init as init process
>>> _swap_info_get: Bad swap file entry 24c0ab68
>>> BUG: Bad page map in process init  pte:c0ab684c pmd:01182000
>>
>> Have you run memtest on the system? Are the messages related to a
>> kernel upgrade? This and your KASAN bugs look possibly like random
>> corruption.
>>
>> Although with that KASAN one it's strange that kernfs_node_cache
>> was involved both times, it's strange that page tables are pointing
>> to that same slab memory. It could be a page table page use-after
>> -free maybe? Maybe with the page table fragment code. I'm sure other
>> people would have hit that before though, so I don't know what to
>> suggest.
>>
>> Thanks,
>> Nick
> 
> Revisited the issue on kernel v6.8-rc6 and I can still reproduce it.
> 
> Short summary as my last post was over a year ago:
>   (x) I get this memory corruption only when CONFIG_VMAP_STACK=y and CONFIG_SMP=y is enabled.
>   (x) I don't get this memory corruption when only one of the above is enabled. ^^
>   (x) memtester says the 2 GiB RAM in my G4 DP are fine.
>   (x) I don't get this issue on my G5 11,2 or Talos II.
>   (x) "stress -m 2 --vm-bytes 965M" provokes the issue in < 10 secs. (https://salsa.debian.org/debian/stress)
> 
> For the test I used CONFIG_KASAN_INLINE=y for v6.8-rc6 and debug_pagealloc=on, page_owner=on and got this dmesg:
> 
> [...]
> pagealloc: memory corruption
> f5fcfff0: 00 00 00 00                                      ....
> CPU: 1 PID: 1788 Comm: stress Tainted: G    B              6.8.0-rc6-PMacG4 #15
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f3bfbac0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> [f3bfbae0] [c04edf9c] __kernel_unpoison_pages+0x1e0/0x1f0
> [f3bfbb30] [c04a8aa0] post_alloc_hook+0xe0/0x174
> [f3bfbb60] [c04a8b58] prep_new_page+0x24/0xbc
> [f3bfbb80] [c04abcc4] get_page_from_freelist+0xcd0/0xf10
> [f3bfbc50] [c04aecd8] __alloc_pages+0x204/0xe2c
> [f3bfbda0] [c04b07a8] __folio_alloc+0x18/0x88
> [f3bfbdc0] [c0461a10] vma_alloc_zeroed_movable_folio.isra.0+0x2c/0x6c
> [f3bfbde0] [c046bb90] handle_mm_fault+0x91c/0x19ac
> [f3bfbec0] [c0047b8c] ___do_page_fault+0x93c/0xc14
> [f3bfbf10] [c0048278] do_page_fault+0x28/0x60
> [f3bfbf30] [c000433c] DataAccess_virt+0x124/0x17c
> --- interrupt: 300 at 0xbe30d8
> NIP:  00be30d8 LR: 00be30b4 CTR: 00000000
> REGS: f3bfbf40 TRAP: 0300   Tainted: G    B               (6.8.0-rc6-PMacG4)
> MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 20882464  XER: 00000000
> DAR: 88c7a010 DSISR: 42000000
> GPR00: 00be30b4 af8397d0 a78436c0 6b2ee010 3c500000 20224462 fe77f7e1 00b00264
> GPR08: 1d98d000 1d98c000 00000000 40ae256a 20882262 00bffff4 00000000 00000000
> GPR16: 00000000 00000002 00000000 0000005a 40802262 80002262 40002262 00c000a4
> GPR24: ffffffff ffffffff 3c500000 00000000 00000000 6b2ee010 00c07d64 00001000
> NIP [00be30d8] 0xbe30d8
> LR [00be30b4] 0xbe30b4
> --- interrupt: 300
> page:ef4bd92c refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310b3
> flags: 0x80000000(zone=2)
> page_type: 0xffffffff()
> raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001
> raw: 00000000
> page dumped because: pagealloc: corrupted page details
> page_owner info is not present (never set?)
> swapper/1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G    B              6.8.0-rc6-PMacG4 #15
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f101b9d0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> [f101b9f0] [c04ae948] warn_alloc+0x154/0x2e0
> [f101bab0] [c04af030] __alloc_pages+0x55c/0xe2c
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> [...]
> 
> New findings:
>   (x) The page corruption only shows up the 1st time I run "stress -m 2 --vm-bytes 965M". When I quit and restart stress no additional page corruption shows up.
>   (x) The page corruption shows up shortly after I run "stress -m 2 --vm-bytes 965M" but no additional page corruption shows up afterwards, even if left running for 30min.
> 
> 
> For additional testing I thought it would be a good idea to try "modprobe test_vmalloc" but this remained inconclusive. Sometimes a 'BUG: Unable to handle kernel data access on read at 0xe0000000' like this shows up but not always:
> 

Interesting.

I guess 0xe0000000 is where linear RAM starts to be mapped with pages ? 
Can you confirm with a dump of 
/sys/kernel/debug/powerpc/block_address_translation ?

Do we have a problem of race with hash table ?

Would KCSAN help with that ?

Christophe

  reply	other threads:[~2024-02-29 17:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30 21:44 Erhard F.
2022-12-12  4:31 ` Nicholas Piggin
2022-12-12 22:17   ` Erhard F.
2022-12-17 21:39   ` Erhard F.
2022-12-18 11:38     ` Christophe Leroy
2022-12-18 22:47       ` Erhard F.
2025-04-30 16:24     ` Erhard Furtner
2022-12-31 17:22   ` Erhard F.
2024-02-29  1:09   ` Erhard Furtner
2024-02-29 17:11     ` Christophe Leroy [this message]
2024-03-05  1:29       ` Erhard Furtner
2024-03-05  1:57       ` Erhard Furtner
2024-04-17  0:56       ` Erhard Furtner
2024-06-19 22:42       ` Erhard Furtner
2024-08-11 16:52         ` Jonas Vidra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=707f617f-45c8-4fa1-83aa-779f2b542871@csgroup.eu \
    --to=christophe.leroy@csgroup.eu \
    --cc=erhard_f@mailbox.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    --cc=rmclure@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox