linux-mm.kvack.org archive mirror
From: Dmitry Vyukov <dvyukov@google.com>
To: Andrea Righi <andrea.righi@canonical.com>
Cc: Marco Elver <elver@google.com>,
	Alexander Potapenko <glider@google.com>,
	kasan-dev@googlegroups.com,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
Date: Mon, 11 Oct 2021 09:19:48 +0200
Message-ID: <CACT4Y+b4Xmev7uLhASpHnELcteadhaXCBkkD5hO2YNP5M2451g@mail.gmail.com>
In-Reply-To: <YWPjZv7ClDOE66iI@arighi-desktop>

On Mon, 11 Oct 2021 at 09:10, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> > On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > I can systematically reproduce the following soft lockup w/ the latest
> > > > > 5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
> > > > > tested so far).
> > > > >
> > > > > I've found this issue by running systemd autopkgtest (I'm using the
> > > > > latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
> > > > > any recent version of systemd).
> > > > >
> > > > > I'm running this test inside a local KVM instance and apparently systemd
> > > > > is starting up its own KVM instances to run its tests, so the context is
> > > > > a nested KVM scenario (even if I don't think the nested KVM part really
> > > > > matters).
> > > > >
> > > > > Here's the oops:
> > > > >
> > > > > [   36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
> > > > > [   36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
> > > > > [   36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
> > > > > [   36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> > > > [...]
> > > > >
> > > > > If I disable CONFIG_KFENCE the soft lockup doesn't happen and systemd
> > > > > autotest completes just fine.
> > > > >
> > > > > We've decided to disable KFENCE in the latest Ubuntu Impish kernel
> > > > > (5.13) for now, because of this issue, but I'm still investigating
> > > > > trying to better understand the problem.
> > > > >
> > > > > Any hint / suggestion?
> > > >
> > > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> > >
> > > It looks like systemd is running qemu-system-x86 without any "accel"
> > > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > > assumption or is there a better way to check?
> >
> > AFAIK, the default is TCG if nothing else is requested. What was the
> > command line?
>
> This is the full command line of what systemd is running:
>
>   /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append  root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0  SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service
>
> And this is running inside a KVM instance (so a nested KVM scenario).

Hi Andrea,

I think you need to pass -enable-kvm to make it a "nested KVM" scenario;
otherwise it's TCG emulation.
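On the question of how to check: here is a minimal Python sketch. It
assumes, as Marco says above, that qemu-system-x86_64 falls back to TCG
when no accelerator is requested. The helper names
requested_accelerators() and falls_back_to_tcg() are hypothetical. They
only look at -enable-kvm, -accel NAME[,props] and -machine/-M ...,accel=...
and ignore any other way of selecting an accelerator (config files,
wrapper scripts), so treat this as a quick heuristic, not a definitive
check.

```python
import shlex


def requested_accelerators(argv):
    """Return the accelerators a QEMU argv explicitly asks for, in order.

    Hypothetical helper: it only inspects -enable-kvm, -accel and
    -machine/-M accel=...; every other option is skipped.
    """
    accels = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        # QEMU accepts both "-opt" and "--opt" spellings; normalize to one dash.
        opt = arg[1:] if arg.startswith("--") else arg
        if opt == "-enable-kvm":
            accels.append("kvm")
        elif opt == "-accel" and i + 1 < len(argv):
            i += 1
            # "-accel kvm" or "-accel tcg,thread=single": the name comes first.
            accels.append(argv[i].split(",")[0])
        elif opt in ("-machine", "-M") and i + 1 < len(argv):
            i += 1
            for prop in argv[i].split(","):
                if prop.startswith("accel="):
                    # "accel=kvm:tcg" lists fallbacks separated by ':'.
                    accels.extend(prop[len("accel="):].split(":"))
        i += 1
    return accels


def falls_back_to_tcg(cmdline):
    """True if the command line requests no accelerator at all.

    Assumption: with nothing requested, qemu-system-x86_64 uses TCG.
    """
    return not requested_accelerators(shlex.split(cmdline))
```

Fed the start of the systemd command line quoted above, it finds no
accelerator, which is consistent with the inner guest running under TCG.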

You seem to be using the default 20s stall timeout. FWIW, syzbot uses a
160s timeout under TCG emulation to avoid false-positive warnings:
https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
A number of other timeouts are raised there as well, some as high as
420 seconds.


