From: Igor Raits <igor@gooddata.com>
To: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Hillf Danton <hdanton@sina.com>
Subject: Re: kernel BUG at include/linux/swapops.h:204!
Date: Sun, 11 Jul 2021 08:06:08 +0200
Message-ID: <CA+9S74i1kqAEXt6GjPpiWsCeBOxp0MFvdGsKmf=MFVogMGbzKg@mail.gmail.com>
In-Reply-To: <4c9e24db-29d5-5bbb-17ae-8dc32ceb66ed@google.com>
Hi Hugh,
On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
> On Sat, 10 Jul 2021, Igor Raits wrote:
>
> > Hello,
> >
> > I've seen one weird bug on 5.12.14 that happened a couple of times when I
> > started a bunch of VMs on a server.
>
> Would it be possible for you to try the same on a 5.12.13 kernel?
> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
> Enough to form an impression of whether the issue is new in 5.12.14.
>
We had been running 5.12.12 for quite some time (about a month) and I never
saw it there.
But I have to admit that I don't really have a reproducer. On the servers
where it happened, I just rebooted them and the panic did not happen again
(so I saw it only once, and only on 2 servers out of the 32 that we have on
5.12.14).
> I ask because 5.12.14 did include several fixes and cleanups from me
> to page_vma_mapped_walk(), and that is involved in inserting and
> removing pmd migration entries. I am not aware of introducing any
> bug there, but your report has got me worried. If it's happening in
> 5.12.14 but not in 5.12.13, then I must look again at my changes.
>
> I don't expect Hillf's patch to help at all: the pmd_lock()
> is supposed to be taken by page_vma_mapped_walk(), before
> set_pmd_migration_entry() and remove_migration_pmd() are called.
>
> Thanks,
> Hugh
>
> >
> > I've briefly googled this problem but could not find any relevant commit
> > that would fix this issue.
> >
> > Do you have any hint how to debug this further or know the fix by any
> > chance?
> >
> > Thanks in advance. Stack trace following:
> >
> > [ 376.876610] ------------[ cut here ]------------
> > [ 376.881274] kernel BUG at include/linux/swapops.h:204!
> > [ 376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> > [ 376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G E 5.12.14-1.gdc.el8.x86_64 #1
> > [ 376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > Gen10, BIOS U30 05/24/2021
> > [ 376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > [ 376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > [ 376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > [ 376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > ffffffffffffffff
> > [ 376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > fffff497473b2ae8
> > [ 376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > 0000000000000000
> > [ 376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
> > 0000000000000af8
> > [ 376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
> > ffff908bbef7b6a8
> > [ 376.974582] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > knlGS:0000000000000000
> > [ 376.982718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > 00000000007726e0
> > [ 376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 377.010026] PKRU: 55555554
> > [ 377.012745] Call Trace:
> > [ 377.015207] __handle_mm_fault+0x5ad/0x6e0
> > [ 377.019335] handle_mm_fault+0xc5/0x290
> > [ 377.023194] do_user_addr_fault+0x1cd/0x740
> > [ 377.027406] exc_page_fault+0x54/0x110
> > [ 377.031182] ? asm_exc_page_fault+0x8/0x30
> > [ 377.035307] asm_exc_page_fault+0x1e/0x30
> > [ 377.039340] RIP: 0033:0x7f5bb91d6734
> > [ 377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > [ 377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> > [ 377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > 00007f5ba0000020
> > [ 377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI:
> > 0000000000000001
> > [ 377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 00007f5bb93ea2f0
> > [ 377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12:
> > 0000000000000001
> > [ 377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15:
> > 00007f5bb1f801f0
> > [ 377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
> > vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E)
> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) scsi_transport_iscsi(E)
> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > crct10dif_pclmul(E)
> > [ 377.102999] crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) ioatdma(E)
> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) ext4(E)
> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) scsi_transport_sas(E)
> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > [ 377.243468] ---[ end trace 04bce3bb051f7620 ]---
> > [ 377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > [ 377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > [ 377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > [ 377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > ffffffffffffffff
> > [ 377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > fffff497473b2ae8
> > [ 377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > 0000000000000000
> > [ 377.436902] R10: 0000000000000000 R11: 0000000000000000 R12:
> > 0000000000000af8
> > [ 377.444086] R13: 0400000000000000 R14: 0400000000000080 R15:
> > ffff908bbef7b6a8
> > [ 377.451272] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > knlGS:0000000000000000
> > [ 377.459415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > 00000000007726e0
> > [ 377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 377.486738] PKRU: 55555554
> > [ 377.489465] Kernel panic - not syncing: Fatal exception
> > [ 377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
>
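For anyone reading the trace above, here is a small sketch (not part of the original report) that pulls the symbol, offset and function size out of the kernel-mode RIP line and checks which opcode bytes sit at the position the oops marks with <..>. The helper names parse_rip and faulting_bytes are made up for illustration, and the two input strings are copied from the trace with the mail line-wrapping undone. The marked bytes turn out to be 0f 0b, which is the x86 UD2 instruction; that is consistent with the "invalid opcode" line and with a BUG() firing, but it says nothing about why the condition was hit.

```python
import re

# Two lines copied from the oops in this thread, rejoined after mail wrapping.
RIP_LINE = "RIP: 0010:pmd_migration_entry_wait+0x132/0x140"
CODE_LINE = (
    "Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 "
    "f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff "
    "<0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48"
)


def parse_rip(line):
    """Split 'RIP: <cs>:<symbol>+0x<offset>/0x<size>' into its parts.

    Hypothetical helper: only handles kernel-mode RIP lines that carry a symbol.
    """
    m = re.search(r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", line)
    if m is None:
        raise ValueError("not a kernel-mode RIP line with a symbol")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def faulting_bytes(line, count=2):
    """Return `count` opcode bytes starting at the byte marked <..>."""
    tokens = line.split()[1:]  # drop the leading 'Code:' token
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:start + count])


symbol, offset, size = parse_rip(RIP_LINE)
trap = faulting_bytes(CODE_LINE)
is_ud2 = trap == b"\x0f\x0b"  # 0f 0b encodes the x86 UD2 instruction

print(f"{symbol}+{offset:#x} of {size:#x}, trap bytes {trap.hex(' ')}, ud2={is_ud2}")
```

For a full disassembly of the Code: line, the kernel tree's scripts/decodecode is the usual tool; the snippet above only inspects the marked bytes.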
--
Igor Raits
Sr. SW Engineer
igor@gooddata.com
+420 775 117 817
Moravske namesti 1007/14
602 00 Brno-Veveri, Czech Republic
Twitter <https://twitter.com/gooddata> | Facebook
<https://www.facebook.com/gooddata> | LinkedIn
<http://www.linkedin.com/company/gooddata> | Blog
<http://www.gooddata.com/blog>
<https://www.gooddata.com/>
Thread overview: 24+ messages
2021-07-10 7:33 Igor Raits
2021-07-10 12:46 ` Hillf Danton
2021-07-11 4:17 ` Hugh Dickins
2021-07-11 6:06 ` Igor Raits [this message]
2021-07-15 17:47 ` Igor Raits
2021-07-16 19:45 ` Hugh Dickins
2021-07-19 19:11 ` Hugh Dickins
2021-07-19 22:12 ` Peter Xu
2021-07-19 22:42 ` Hugh Dickins
2021-07-20 0:34 ` Peter Xu
2021-07-20 3:31 ` Hugh Dickins
2021-07-20 7:47 ` Igor Raits
2021-07-20 16:01 ` Peter Xu
2021-07-20 16:05 ` Igor Raits
2021-07-20 15:51 ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
2021-07-20 15:51 ` [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
2021-07-20 15:51 ` [PATCH stable 5.13.y/5.12.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
2021-07-20 20:32 ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
2021-07-22 14:02 ` Greg KH
2021-07-20 15:56 ` [PATCH stable 5.10.y " Peter Xu
2021-07-20 15:56 ` [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
2021-07-20 15:56 ` [PATCH stable 5.10.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
2021-07-20 20:38 ` [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
2021-07-22 14:05 ` Greg KH