From: Dmitry Vyukov
Date: Mon, 11 Oct 2021 09:19:48 +0200
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
To: Andrea Righi
Cc: Marco Elver, Alexander Potapenko, kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, 11 Oct 2021 at 09:10, Andrea Righi wrote:
>
> On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> > On Mon, 11 Oct 2021 at 08:32, Andrea Righi wrote:
> > > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi wrote:
> > > > > I can systematically reproduce the following soft lockup w/ the latest
> > > > > 5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
> > > > > tested so far).
> > > > >
> > > > > I've found this issue by running systemd autopkgtest (I'm using the
> > > > > latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
> > > > > any recent version of systemd).
> > > > >
> > > > > I'm running this test inside a local KVM instance and apparently systemd
> > > > > is starting up its own KVM instances to run its tests, so the context is
> > > > > a nested KVM scenario (even if I don't think the nested KVM part really
> > > > > matters).
> > > > >
> > > > > Here's the oops:
> > > > >
> > > > > [ 36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
> > > > > [ 36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
> > > > > [ 36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
> > > > > [ 36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> > > > [...]
> > > > >
> > > > > If I disable CONFIG_KFENCE the soft lockup doesn't happen and systemd
> > > > > autotest completes just fine.
> > > > >
> > > > > We've decided to disable KFENCE in the latest Ubuntu Impish kernel
> > > > > (5.13) for now, because of this issue, but I'm still investigating
> > > > > trying to better understand the problem.
> > > > >
> > > > > Any hint / suggestion?
> > > >
> > > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> > >
> > > It looks like systemd is running qemu-system-x86 without any "accel"
> > > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > > assumption or is there a better way to check?
> >
> > AFAIK, the default is TCG if nothing else is requested. What was the
> > command line?
>
> This is the full command line of what systemd is running:
>
> /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0 SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service
>
> And this is running inside a KVM instance (so a nested KVM scenario).

Hi Andrea,

I think you need to pass -enable-kvm to make it "nested KVM scenario",
otherwise it's TCG emulation.

You seem to use the default 20s stall timeout.
FWIW syzbot uses 160 secs timeout for TCG emulation to avoid false
positive warnings:
https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
There are a number of other timeouts raised as well, some as high as
420 seconds.
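
To make the TCG-vs-KVM question above easy to check for any given invocation, here is a minimal sketch in Python. It only inspects the explicit flags on a QEMU command line (-enable-kvm, -accel kvm, or accel=kvm under -machine/-M). The function name requests_kvm and the shortened command string are made up for illustration; the string is just the leading part of the command quoted above. It assumes no wrapper script or build-time default changes the accelerator, which this check cannot see.

```python
import shlex


def requests_kvm(cmdline: str) -> bool:
    """Return True if the QEMU command line explicitly asks for KVM.

    Hypothetical helper: it looks only at the flags that are present,
    so it cannot detect wrappers or distro defaults that enable KVM.
    """
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        if arg == "-enable-kvm":
            return True
        has_value = i + 1 < len(args)
        if arg == "-accel" and has_value:
            # "-accel kvm" possibly followed by ",prop=value" settings
            if args[i + 1].split(",")[0] == "kvm":
                return True
        if arg in ("-machine", "-M") and has_value:
            # e.g. "-machine pc,accel=kvm" or "-machine accel=kvm:tcg"
            for opt in args[i + 1].split(","):
                key, _, value = opt.partition("=")
                if key == "accel" and "kvm" in value.split(":"):
                    return True
    return False


# Leading part of the command line quoted in this thread (abbreviated).
CMD = ("/bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic "
       "-vga none -kernel /boot/vmlinuz-5.15-rc4")

if __name__ == "__main__":
    print(requests_kvm(CMD))                   # False
    print(requests_kvm(CMD + " -enable-kvm"))  # True
```

With no accelerator flag present the helper returns False, which matches the reading above that this invocation falls back to TCG.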