From: Marco Elver
Date: Mon, 11 Oct 2021 08:00:00 +0200
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
To: Andrea Righi
Cc: Alexander Potapenko, Dmitry Vyukov, kasan-dev@googlegroups.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Sun, 10 Oct 2021 at 15:53, Andrea Righi wrote:
> I can systematically reproduce the following soft lockup w/ the latest
> 5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
> tested so far).
>
> I've found this issue by running systemd autopkgtest (I'm using the
> latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
> any recent version of systemd).
>
> I'm running this test inside a local KVM instance and apparently systemd
> is starting up its own KVM instances to run its tests, so the context is
> a nested KVM scenario (even if I don't think the nested KVM part really
> matters).
>
> Here's the oops:
>
> [ 36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
> [ 36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
> [ 36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
> [ 36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[...]
>
> If I disable CONFIG_KFENCE the soft lockup doesn't happen and systemd
> autotest completes just fine.
>
> We've decided to disable KFENCE in the latest Ubuntu Impish kernel
> (5.13) for now, because of this issue, but I'm still investigating
> trying to better understand the problem.
>
> Any hint / suggestion?

Can you confirm this is not a QEMU TCG instance?
There's been a known issue with it:
https://bugs.launchpad.net/qemu/+bug/1920934

One thing I've been wondering is whether we can make
CONFIG_KFENCE_STATIC_KEYS=n the default, because the static keys
approach is becoming more trouble than it's worth. Changing the default
would require us to re-benchmark the defaults.

If you're thinking of turning KFENCE on by default (i.e.
CONFIG_KFENCE_SAMPLE_INTERVAL non-zero), you could make this decision
for Ubuntu with whatever sample interval you choose. We've found that
for large deployments 500ms or above is more than adequate.

Thanks,
-- Marco
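
For reference, the settings discussed above could look roughly like the
fragment below. This is only a sketch, not a tested Ubuntu configuration:
the value 500 simply mirrors the "500ms or above" figure mentioned above,
and turning static keys off is the option being considered in this mail,
not a benchmarked recommendation.

```
# Keep KFENCE built in and enabled by default (non-zero interval).
CONFIG_KFENCE=y

# Sample interval in milliseconds. Assumption: 500 is used here only
# because it matches the "500ms or above" figure quoted above.
CONFIG_KFENCE_SAMPLE_INTERVAL=500

# Avoid the static keys approach, as suggested above.
# CONFIG_KFENCE_STATIC_KEYS is not set
```

The interval can also be changed without rebuilding via the
kfence.sample_interval= boot parameter; setting it to 0 disables KFENCE.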