From: Andrea Righi <andrea.righi@canonical.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>,
Alexander Potapenko <glider@google.com>,
kasan-dev@googlegroups.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
Date: Mon, 11 Oct 2021 11:23:30 +0200 [thread overview]
Message-ID: <YWQCknwPcGlOBfUi@arighi-desktop> (raw)
In-Reply-To: <CACT4Y+b4Xmev7uLhASpHnELcteadhaXCBkkD5hO2YNP5M2451g@mail.gmail.com>
On Mon, Oct 11, 2021 at 09:19:48AM +0200, Dmitry Vyukov wrote:
> On Mon, 11 Oct 2021 at 09:10, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> > > On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > I can systematically reproduce the following soft lockup w/ the latest
> > > > > > 5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
> > > > > > tested so far).
> > > > > >
> > > > > > I've found this issue by running systemd autopkgtest (I'm using the
> > > > > > latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
> > > > > > any recent version of systemd).
> > > > > >
> > > > > > I'm running this test inside a local KVM instance and apparently systemd
> > > > > > is starting up its own KVM instances to run its tests, so the context is
> > > > > > a nested KVM scenario (even if I don't think the nested KVM part really
> > > > > > matters).
> > > > > >
> > > > > > Here's the oops:
> > > > > >
> > > > > > [ 36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
> > > > > > [ 36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
> > > > > > [ 36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
> > > > > > [ 36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> > > > > [...]
> > > > > >
> > > > > > If I disable CONFIG_KFENCE the soft lockup doesn't happen and systemd
> > > > > > autotest completes just fine.
> > > > > >
> > > > > > We've decided to disable KFENCE in the latest Ubuntu Impish kernel
> > > > > > (5.13) for now, because of this issue, but I'm still investigating
> > > > > > trying to better understand the problem.
> > > > > >
> > > > > > Any hint / suggestion?
> > > > >
> > > > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> > > >
> > > > It looks like systemd is running qemu-system-x86 without any "accel"
> > > > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > > > assumption or is there a better way to check?
> > >
> > > AFAIK, the default is TCG if nothing else is requested. What was the
> > > command line?
> >
> > This is the full command line of what systemd is running:
> >
> > /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0 SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service
> >
> > And this is running inside a KVM instance (so a nested KVM scenario).
>
> Hi Andrea,
>
> I think you need to pass -enable-kvm to make it "nested KVM scenario",
> otherwise it's TCG emulation.
So, IIUC I shouldn't hit the QEMU TCG issue mentioned by Marco, right?
>
> You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> secs timeout for TCG emulation to avoid false positive warnings:
> https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> There are a number of other timeouts raised as well, some as high as
> 420 seconds.
I see, I'll try with these settings and see if I can still hit the soft
lockup messages.
Thanks,
-Andrea
next prev parent reply other threads:[~2021-10-11 9:23 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-10 13:53 Andrea Righi
2021-10-11 6:00 ` Marco Elver
2021-10-11 6:32 ` Andrea Righi
2021-10-11 6:48 ` Marco Elver
2021-10-11 7:10 ` Andrea Righi
2021-10-11 7:19 ` Dmitry Vyukov
2021-10-11 9:23 ` Andrea Righi [this message]
2021-10-11 9:52 ` Andrea Righi
2021-10-11 10:03 ` Marco Elver
2021-10-11 14:42 ` Andrea Righi
2021-10-11 15:00 ` Dmitry Vyukov
2021-10-11 15:11 ` Andrea Righi
2021-10-11 15:16 ` Dmitry Vyukov
2021-10-20 6:00 ` Marco Elver
2021-10-20 6:11 ` Andrea Righi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWQCknwPcGlOBfUi@arighi-desktop \
--to=andrea.righi@canonical.com \
--cc=dvyukov@google.com \
--cc=elver@google.com \
--cc=glider@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox