From: Mikulas Patocka <mpatocka@redhat.com>
To: James Morse <james.morse@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
Pavel Tatashin <Pavel.Tatashin@microsoft.com>
Subject: Re: A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
Date: Thu, 23 Aug 2018 07:02:37 -0400 (EDT) [thread overview]
Message-ID: <alpine.LRH.2.02.1808220808050.17906@file01.intranet.prod.int.rdu2.redhat.com> (raw)
In-Reply-To: <e35b7c14-c7ea-412d-2763-c961b74576f3@arm.com>
On Tue, 21 Aug 2018, James Morse wrote:
> Hi guys,
>
> On 08/21/2018 11:44 AM, Michal Hocko wrote:
> > On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
> > > I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
> > > function move_freepages_block accesses contiguous runs of
> > > pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
> > > and when move_freepages_block stumbles over this hole, it accesses
> > > uninitialized page structures and crashes.
>
> Any idea if this is nomap (so a hole in the linear map), or a missing struct
> page?
The page for this hole seems to be filled with 0xff.
> > > 00000000-03ffffff : System RAM
> > > 00080000-007bffff : Kernel code
> > > 00820000-00aa3fff : Kernel data
> > > 04200000-bf80ffff : System RAM
> > > bf810000-bfbeffff : reserved
> > > bfbf0000-bfc8ffff : System RAM
> > > bfc90000-bffdffff : reserved
> > > bffe0000-bfffffff : System RAM
> > > c0000000-dfffffff : MEM
> > > c0000000-c00fffff : PCI Bus 0000:01
> > > c0000000-c0003fff : 0000:01:00.0
> > > c0000000-c0003fff : nvme
> To test Laura's bounds-of-zone theory [0], could you put some empty space
> between the nvme and the System RAM? (It sounds like this is a KVM guest).
> Reducing the amount of memory is probably easiest.
This is not KVM - it is real hardware with real PCIe nvme device. I don't
have smaller memory stick.
The board can use u-boot firmware or EFI firmware. The u-boot firmware
doesn't put a hole in the memory map and the board has been running with
it for several months without a problem.
The EFI firmware puts a hole below 0xc0000000 and I got a crash after two
weeks of uptime.
> > > The bug was already reported here for x86:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1598462
> > >
> > > For x86, it was fixed in the kernel 4.17.7 - but I observed it in the
> > > kernel 4.17.11 on ARM64. I also observed it on 4.18-rc kernels running in
> > > KVM virtual machine on ARM when I compiled the guest kernel with 64kB page
> > > size.
>
> I'm not sure this is the same bug.
>
> [1] reports hitting a VM_BUG, this is a dereference of -ENOENT:
This crash is not from -ENOENT. It crashes because page->compound_head is
0xffffffffffffffff (see below).
If I enable CONFIG_DEBUG_VM, I also get VM_BUG.
> > > Unable to handle kernel paging request at virtual address fffffffffffffffe
>
> Does your kernel have HOLES_IN_ZONE enabled? (It looks like it depends on
> NUMA)
No.
> Could you reproduce this with CONIG_DEBUG_VM enabled?
I reproduced it in KVM with 64k pages and I enabled CONIG_DEBUG_VM, see
below. (the bug could be triggerd more quickly in KVM).
> move_freepages() uses pfn_valid_within(), so it should handle missing struct
> pages in this range.
>
>
> > > CPU: 3 PID: 14823 Comm: updatedb.mlocat Not tainted 4.17.11 #16
> > > Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin,
> > > BIOS EDK II Jul 30 2018
> > > pstate: 00000085 (nzcv daIf -PAN -UAO)
> > > pc : move_freepages_block+0xb4/0x160
> > > lr : steal_suitable_fallback+0xe4/0x188
>
> Any chance you could addr2line these?
I analyzed the assembler:
PageBuddy in move_freepages returns false
Then we call PageLRU, the macro calls PF_HEAD which is compound_page()
compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
resturns 0xfffffffffffffffe - and accessing this address causes crash
> > > Call trace:
> > > move_freepages_block+0xb4/0x160
> > > get_page_from_freelist+0xad8/0xea8
> > > __alloc_pages_nodemask+0xac/0x970
> > > new_slab+0xc0/0x348
> > > ___slab_alloc.constprop.32+0x2cc/0x350
> > > __slab_alloc.isra.26.constprop.31+0x24/0x38
> > > kmem_cache_alloc+0x168/0x198
> > > spadfs_alloc_inode+0x2c/0x88
> > > alloc_inode+0x20/0xa0
> > > iget5_locked+0xf8/0x1c0
>
> > > spadfs_iget+0x44/0x4c8
> > > spadfs_lookup+0x70/0x108
>
> Hmmm. What's this?
http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/download/
> Thanks,
>
> James
>
>
> [0] https://www.spinics.net/lists/linux-mm/msg157223.html
> [1] https://www.spinics.net/lists/linux-mm/msg156764.html
The same crash in KVM. The guest kernel has 64k pages. I enabled
CONFIG_DEBUG_VM:
[ 1493.526129] page:fffffdff802e1780 is uninitialized and poisoned
[ 1493.526136] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1493.528030] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1493.529320] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[ 1493.530441] ------------[ cut here ]------------
[ 1493.531301] kernel BUG at include/linux/mm.h:978!
[ 1493.532176] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1493.533196] Modules linked in: raid0 raid10 dm_delay xfs reiserfs loop dm_crypt dm_zero dm_integrity raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx md_mod dm_thin_pool dm_cache_smq dm_cache dm_persistent_data dm_bio_prison libcrc32c dm_mirror dm_region_hash dm_log dm_snapshot dm_bufio dm_mod ipv6 autofs4 binfmt_misc nls_utf8 nls_cp852 vfat fat af_packet aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce crct10dif_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce sha1_generic efivars virtio_net virtio_rng net_failover rng_core failover virtio_console ext4 crc32c_generic crc16 mbcache jbd2 virtio_scsi sd_mod scsi_mod virtio_blk virtio_mmio virtio_pci virtio_ring virtio [last unloaded: brd]
[ 1493.545466] CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
[ 1493.546540] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 1493.547833] pstate: 40000085 (nZcv daIf -PAN -UAO)
[ 1493.548749] pc : move_freepages_block+0x144/0x248
[ 1493.549647] lr : move_freepages_block+0x144/0x248
[ 1493.550539] sp : fffffe0071177680
[ 1493.551176] x29: fffffe0071177680 x28: fffffc000861f3f8
[ 1493.552184] x27: 0000000000000048 x26: fffffc0008492000
[ 1493.553197] x25: fffffe007117771c x24: 000000000007ffc0
[ 1493.554203] x23: fffffc000861ef80 x22: fffffdff802fffc0
[ 1493.555209] x21: 0000000000000020 x20: fffffdff80280000
[ 1493.556220] x19: fffffdff802e1780 x18: 0000000000000000
[ 1493.557227] x17: 000003ff88424b08 x16: fffffc0008182c9c
[ 1493.558232] x15: 000000000000000a x14: 0720072007200720
[ 1493.559239] x13: 0720072007200720 x12: 0720072007200720
[ 1493.560249] x11: 0720072907290770 x10: 072807640765076e
[ 1493.561256] x9 : 076f07730769076f x8 : 0000000000000000
[ 1493.562261] x7 : 0750072807450747 x6 : 0000000000000007
[ 1493.563270] x5 : fffffe00bff30750 x4 : 0000000000000001
[ 1493.564276] x3 : 0000000000000007 x2 : 0000000000000007
[ 1493.565283] x1 : fffffe006260cd00 x0 : 0000000000000034
[ 1493.566297] Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
[ 1493.567506] Call trace:
[ 1493.567985] move_freepages_block+0x144/0x248
[ 1493.568812] steal_suitable_fallback+0x100/0x16c
[ 1493.569694] get_page_from_freelist+0x440/0xb20
[ 1493.570554] __alloc_pages_nodemask+0xe8/0x838
[ 1493.571401] new_slab+0xd4/0x418
[ 1493.572022] ___slab_alloc.constprop.27+0x380/0x4a8
[ 1493.572952] __slab_alloc.isra.21.constprop.26+0x24/0x34
[ 1493.573955] kmem_cache_alloc+0xa8/0x180
[ 1493.574704] alloc_buffer_head+0x1c/0x90
[ 1493.575452] alloc_page_buffers+0x68/0xb0
[ 1493.576222] create_empty_buffers+0x20/0x1ec
[ 1493.577033] create_page_buffers+0xb0/0xf0
[ 1493.577815] __block_write_begin_int+0xc4/0x564
[ 1493.578676] __block_write_begin+0x10/0x18
[ 1493.579457] block_write_begin+0x48/0xd0
[ 1493.580212] blkdev_write_begin+0x28/0x30
[ 1493.580977] generic_perform_write+0x98/0x16c
[ 1493.581807] __generic_file_write_iter+0x138/0x168
[ 1493.582715] blkdev_write_iter+0x80/0xf0
[ 1493.583470] __vfs_write+0xe4/0x10c
[ 1493.584138] vfs_write+0xb4/0x168
[ 1493.584775] ksys_write+0x44/0x88
[ 1493.585412] sys_write+0xc/0x14
[ 1493.586018] el0_svc_naked+0x30/0x34
[ 1493.586708] Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
[ 1493.587857] ---[ end trace 1601ba47f6e883fe ]---
[ 1493.588780] note: dd[25236] exited with preempt_count 1
memory map for the KVM guest:
09000000-09000fff : pl011@9000000
09000000-09000fff : pl011@9000000
09030000-09030fff : pl061@9030000
10000000-3efeffff : pcie@10000000
10000000-101fffff : PCI Bus 0000:01
10000000-1003ffff : 0000:01:00.0
10040000-10040fff : 0000:01:00.0
10200000-103fffff : PCI Bus 0000:02
10400000-105fffff : PCI Bus 0000:03
10400000-10400fff : 0000:03:00.0
10600000-107fffff : PCI Bus 0000:04
10800000-109fffff : PCI Bus 0000:05
10800000-10800fff : 0000:05:00.0
3f000000-3fffffff : PCI ECAM
40000000-f85dffff : System RAM
40080000-4057ffff : Kernel code
405d0000-408effff : Kernel data
f85e0000-f86bffff : reserved
f86c0000-f86dffff : System RAM
f86e0000-f874ffff : reserved
f8750000-fbc1ffff : System RAM
fbc20000-fbffffff : reserved
fc000000-ffffffff : System RAM
8000000000-ffffffffff : pcie@10000000
8000000000-80001fffff : PCI Bus 0000:01
8000000000-8000003fff : 0000:01:00.0
8000000000-8000003fff : virtio-pci-modern
8000200000-80003fffff : PCI Bus 0000:02
8000400000-80005fffff : PCI Bus 0000:03
8000400000-8000403fff : 0000:03:00.0
8000400000-8000403fff : virtio-pci-modern
8000600000-80007fffff : PCI Bus 0000:04
8000600000-8000603fff : 0000:04:00.0
8000600000-8000603fff : virtio-pci-modern
8000800000-80009fffff : PCI Bus 0000:05
8000800000-8000803fff : 0000:05:00.0
8000800000-8000803fff : virtio-pci-modern
Mikulas
next prev parent reply other threads:[~2018-08-23 11:02 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-17 19:44 Mikulas Patocka
2018-08-21 10:44 ` Michal Hocko
2018-08-21 12:58 ` James Morse
2018-08-23 11:02 ` Mikulas Patocka [this message]
2018-08-23 11:10 ` Michal Hocko
2018-08-23 11:16 ` Mikulas Patocka
2018-08-23 11:23 ` Michal Hocko
2018-08-23 13:13 ` Pasha Tatashin
2018-08-23 13:14 ` Pasha Tatashin
2018-08-23 14:34 ` Mikulas Patocka
2018-08-23 14:06 ` James Morse
2018-08-24 11:41 ` Michal Hocko
2018-08-29 17:37 ` James Morse
2018-08-30 15:58 ` Mikulas Patocka
2018-08-30 16:11 ` Will Deacon
2018-08-30 16:25 ` James Morse
2018-09-03 19:33 ` Michal Hocko
2018-09-07 17:47 ` James Morse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LRH.2.02.1808220808050.17906@file01.intranet.prod.int.rdu2.redhat.com \
--to=mpatocka@redhat.com \
--cc=Pavel.Tatashin@microsoft.com \
--cc=catalin.marinas@arm.com \
--cc=james.morse@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox