On Oct 28, 2016 7:12 PM, "Linus Torvalds" <torvalds@linux-foundation.org>
wrote:
>
> [ Chris, Kent, ignore the subject line, that was a mis-attribution of
> the cause ]
>
> On Fri, Oct 28, 2016 at 3:25 PM, Joseph Yasi <joe.yasi@gmail.com> wrote:
> >
> > I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm:
remove
> > gup_flags FOLL_WRITE games from __get_user_pages()") reverted.
>
> Yeah, this doesn't look to have anything to do with that commit.
>
> >   This smells like a race condition
> > somewhere. It's possible I just happened to never encounter that race
> > before.
>
> It looks like some seriously odd corruption. It's doing spin_lock()
> inside lockref_get_not_dead(), which is just a spinlock in the dentry.
> There's no way it should cause problems.
>
> The code disassembles to
>
>    0: 45 31 c9             xor    %r9d,%r9d
>    3: 85 c0                 test   %eax,%eax
>    5: 74 44                 je     0x4b
>    7: 48 89 c2             mov    %rax,%rdx
>    a: c1 e8 12             shr    $0x12,%eax
>    d: 48 c1 ea 0c           shr    $0xc,%rdx
>   11: 83 e8 01             sub    $0x1,%eax
>   14: 83 e2 30             and    $0x30,%edx
>   17: 48 98                 cltq
>   19: 48 81 c2 c0 6e 01 00 add    $0x16ec0,%rdx
>   20: 48 03 14 c5 a0 21 a7 add    -0x5e58de60(,%rax,8),%rdx
>   27: a1
>   28:* 48 89 0a             mov    %rcx,(%rdx) <-- trapping instruction
>   2b: 8b 41 08             mov    0x8(%rcx),%eax
>   2e: 85 c0                 test   %eax,%eax
>   30: 75 09                 jne    0x3b
>   32: f3 90                 pause
>   34: 8b 41 08             mov    0x8(%rcx),%eax
>   37: 85 c0                 test   %eax,%eax
>   39: 74 f7                 je     0x32
>
> where the beginning of that sequence is the "decode_tail() code, and I
> think the trapping instruction is the
>
>                 WRITE_ONCE(prev->next, node);
>
> so it's from kernel/locking/qspinlock.c:536:
>
>                 prev = decode_tail(old);
>                 /*
>                  * The above xchg_tail() is also a load of @lock which
> generates,
>                  * through decode_tail(), a pointer.
>                  *
>                  * The address dependency matches the RELEASE of
xchg_tail()
>                  * such that the access to @prev must happen after.
>                  */
>                 smp_read_barrier_depends();
>
>                 WRITE_ONCE(prev->next, node);
>
>                 pv_wait_node(node, prev);
>
> and yes, %rdx (which should contain that pointer to 'prev') has that
> bogus pointer value 00007facb85592b6.
>
> So that's a core spinlock in the dentry being corrupted.
>
> Quite frankly, I'm somewhat suspicious of this:
>
>   Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
>      vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
>
> ie those out-of-tree vbox modules..
>
> But for others, here's a cleaned-up copy of the oops in case somebody
> else sees something.
>
>   BUG: unable to handle kernel paging request at 00007facb85592b6
>   IP: queued_spin_lock_slowpath+0xe1/0x170
>   PGD 7cee19067 PUD 0
>   Oops: 0002 [#1] PREEMPT SMP
>   Modules linked in: pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O)
>      vboxdrv(O) rfcomm bnep binfmt_misc vfat fat snd_hda_codec_hdmi
>      snd_hda_codec_realtek snd_hda_codec_generic uvcvideo
>      videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
>      snd_usb_audio videodev snd_usbmidi_lib media snd_hda_intel
>      snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss snd_mixer_oss
>      snd_pcm input_leds intel_rapl x86_pkg_temp_thermal btusb
>      intel_powerclamp crct10dif_pclmul btrtl btbcm efi_pstore
>      crc32_pclmul btintel crc32c_intel bluetooth ghash_clmulni_intel
>      aesni_intel aes_x86_64 snd_seq_oss lrw glue_helper ablk_helper
>      cryptd intel_cstate snd_seq_midi snd_rawmidi intel_rapl_perf
>      snd_seq_midi_event snd_seq efivars snd_seq_device snd_timer snd
>      soundcore wl(PO) cfg80211 rfkill sg battery intel_lpss_acpi
>      intel_lpss mfd_core acpi_pad tpm_tis acpi_als tpm_tis_core
>      kfifo_buf tpm industrialio nfsd auth_rpcgss coretemp oid_registry
>      nfs_acl lockd loop grace sunrpc efivarfs ipv6 crc_ccitt hid_generic
>      usbhid uas usb_storage igb e1000e dca ptp mxm_wmi bcache psmouse
>      i915 intel_gtt pps_core drm_kms_helper xhci_pci hwmon syscopyarea
>      xhci_hcd sysfillrect sysimgblt i2c_algo_bit fb_sys_fops usbcore
>      sr_mod drm cdrom i2c_core usb_common fan thermal
>      pinctrl_sunrisepoint wmi video pinctrl_intel button
>   CPU: 3 PID: 1139 Comm: lsof Tainted: P           O    4.8.3-customskl #1
>   Hardware name: System manufacturer System Product Name/Z170-DELUXE,
> BIOS 2202 09/19/2016
>   task: ffff9e4a40062640 task.stack: ffff9e468ef80000
>   RIP: 0010:[<ffffffffa1082731>]  [<ffffffffa1082731>]
> queued_spin_lock_slowpath+0xe1/0x170
>   RSP: 0018:ffff9e468ef83d00  EFLAGS: 00010202
>   RAX: 0000000000001fff RBX: ffff9e494f7f2718 RCX: ffff9e4b2ecd6ec0
>   RDX: 00007facb85592b6 RSI: 0000000080000000 RDI: ffff9e494f7f2718
>   RBP: 0000000000000000 R08: 0000000000100000 R09: 0000000000000000
>   R10: 0000000020ab886e R11: ffff9e494f7f26f8 R12: 0000000000000000
>   R13: ffff9e494f7f26c0 R14: ffff9e468ef83d90 R15: 0000000000000000
>   FS:  00007f3344595800(0000) GS:ffff9e4b2ecc0000(0000)
knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007facb85592b6 CR3: 00000007daf09000 CR4: 00000000003406e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>     lockref_get_not_dead+0x3a/0x80
>     unlazy_walk+0xee/0x180
>     complete_walk+0x2e/0x70
>     path_lookupat+0x93/0x100
>     filename_lookup+0x99/0x150
>     pipe_read+0x27e/0x340
>     getname_flags+0x6a/0x1d0
>     vfs_fstatat+0x44/0x90
>     SYSC_newlstat+0x1d/0x40
>     vfs_read+0x112/0x130
>     SyS_read+0x3d/0x90
>     entry_SYSCALL_64_fastpath+0x17/0x93
>   Code: c1 e0 10 45 31 c9 85 c0 74 44 48 89 c2 c1 e8 12 48 c1 ea 0c 83
> e8 01 83 e2 30 48 98 48 81 c2 c0 6e 01 00 48 03 14 c5 a0 21 a7 a1 <48>
> 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b
>   RIP  [<ffffffffa1082731>] queued_spin_lock_slowpath+0xe1/0x170
>    RSP <ffff9e468ef83d00>
>   CR2: 00007facb85592b6
>
> > The /home partition in question is btrfs on bcache in writethrough
mode. The
> > cache drive is an 180 GB Intel SATA SSD, and the backing device is two
WD 3
> > TB SATA HDDs configured in MD RAID 10 f2 layout. / is btrfs on an NVMe
SSD.
> >
> > I've also seen btrfs checksum errors in the kernel log when reproducing
> > this. Rebooting and running btrfs scrub finds nothing though so it seems
> > like in memory corruption.
>
> I'm adding Chris Mason and Kent Overstreet to the participants,
> because we did have a recent btrfs memory corruption thing. This
> corruption seems to be pretty widespread through, you migth also want
> to just run "memtest" on your machine.
>
> *Most* memory corruption tends to be due to software issues, but
> sometimes it really ends up being the memory itself going bad.
>
> But also, please test if this happens without the out-of-tree modules?

I was testing it without VirtualBox and broadcom-wl out-of-tree modules,
and the machine locked up and won't POST anymore. The motherboard is
claiming it's the CPU, so it looks like this was hardware. Sorry for the
noise.

-Joe

>
>                 Kubys