Subject: Re: writeback completion soft lockup BUG in folio_wake_bit()
From: Tim Chen
To: Linus Torvalds, Dan Williams
Cc: Matthew Wilcox, Brian Foster, Linux-MM, linux-fsdevel, linux-xfs, Hugh Dickins
Date: Mon, 24 Oct 2022 12:39:33 -0700
References: <6350a5f07bae2_6be12944c@dwillia2-xfh.jf.intel.com.notmuch>

On Sun, 2022-10-23 at 15:38 -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 6:35 PM Dan Williams wrote:
> >
> > A report from a tester with this call trace:
> >
> >  watchdog: BUG: soft lockup - CPU#127 stuck for 134s! [ksoftirqd/127:782]
> >  RIP: 0010:_raw_spin_unlock_irqrestore+0x19/0x40 [..]
>
> Whee.
>
> > ...lead me to this thread. This was after I had them force all softirqs
> > to run in ksoftirqd context, and run with rq_affinity == 2 to force
> > I/O completion work to throttle new submissions.
> >
> > Willy, are these headed upstream:
> >
> > https://lore.kernel.org/all/YjSbHp6B9a1G3tuQ@casper.infradead.org
> >
> > ...or I am missing an alternate solution posted elsewhere?
>
> Can your reporter test that patch? I think it should still apply
> pretty much as-is.. And if we actually had somebody who had a
> test-case that was literally fixed by getting rid of the old bookmark
> code, that would make applying that patch a no-brainer.
>
> The problem is that the original load that caused us to do that thing
> in the first place isn't repeatable because it was special production
> code - so removing that bookmark code because we _think_ it now hurts
> more than it helps is kind of a big hurdle.
>
> But if we had some hard confirmation from somebody that "yes, the
> bookmark code is now hurting", that would make it a lot more palatable
> to just remove the code that we just _think_ that probably isn't
> needed any more..
>

I do think that the original locked-page-on-migration problem was fixed
by commit 9a1ea439b16b. Unfortunately, the customer did not respond
when we asked them to test their workload after that patch went into
mainline.

I have no objection to Matthew's fix to remove the bookmark code, now
that it is causing problems in this scenario, which I did not anticipate
in my original code.

Tim