Date: Thu, 18 Apr 2024 02:39:19 +0100
From: Matthew Wilcox
To: Luis Chamberlain
Cc: fstests@vger.kernel.org, kdevops@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, david@redhat.com,
	linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de
Subject: Re: [PATCH] fstests: add fsstress + compaction test
References: <20240418001356.95857-1-mcgrof@kernel.org>
In-Reply-To: <20240418001356.95857-1-mcgrof@kernel.org>

On Wed, Apr 17, 2024 at 05:13:56PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0], the fix for that [0] has been posted [1] but that patch
> is not yet on v6.9-rc4 and the patch requires changes for v6.9.

It doesn't require changes, it just has prerequisites:

https://lore.kernel.org/all/ZgHhcojXc9QjynUI@casper.infradead.org/

> Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> between compaction and fsstress while running generic/476 on the
> following kdevops test sections [2]:
>
>   * xfs_nocrc
>   * xfs_nocrc_2k
>   * xfs_nocrc_4k
>
> Analyzing the trace I see the guest uses loopback block devices for the
> fstests TEST_DEV, the loopback file uses sparsefiles on a btrfs
> partition. The contention based on traces [3] [4] seems to be that we
> have somehow have fsstress + compaction race on folio_wait_bit_common().

What do you mean by "race"? Here's what I see:

Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kcompactd0:72 blocked for more than 120 seconds.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: Not tainted 6.9.0-rc4 #4
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: task:kcompactd0 state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00004000
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: Call Trace:
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __schedule+0x3d9/0xaf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: schedule+0x26/0xf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: io_schedule+0x42/0x70
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common+0x123/0x370
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: ? __pfx_wake_page_function+0x10/0x10
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: migrate_pages_batch+0x69a/0xd70

But you didn't run the backtrace through scripts/decode_stacktrace.sh
so I can't figure out what we're waiting on.
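
For reference, decoding usually looks something like the sketch below.
It assumes you are at the top of the kernel source tree with a vmlinux
that carries debug info and matches the kernel that produced the trace.
The helper names and the console.log / decoded.log file names are
hypothetical. Stripping the journal prefix first is only for
readability; the script may well cope with the prefix as-is.

```shell
#!/bin/bash
# Sketch only: strip_syslog_prefix and decode_trace are hypothetical helpers.

# Drop the "Apr 16 23:06:11 <host> kernel: " prefix added by syslog/journald.
strip_syslog_prefix() {
	sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ kernel: ?//'
}

# Usage: decode_trace <vmlinux> <log>
# Run from the top of the kernel tree. The vmlinux needs debug info and
# has to match the kernel that produced the trace.
decode_trace() {
	strip_syslog_prefix < "$2" | ./scripts/decode_stacktrace.sh "$1"
}

# Example invocation (paths are assumptions):
#   decode_trace vmlinux console.log > decoded.log
```

With that, entries such as folio_wait_bit_common+0x123/0x370 should come
back annotated with a source file and line.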

> We have this happening:
>
> a) kthread compaction --> migrate_pages_batch()
>                 --> folio_wait_bit_common()
> b) workqueue on btrfs writeback wb_workfn --> extent_write_cache_pages()
>                 --> folio_wait_bit_common()
> c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
>                 --> folio_wait_bit_common()
> d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()
>
> I tried to reproduce but couldn't easily do so, so I wrote this test
> to help, and with this I have 100% failure rate so far out of 2 runs.
>
> Given we also have korg#218227 and that patch likely needing
> backporting, folks will want a reproducer for this issue. This should
> hopefully help with that case and this new separate issue.
>
> To reproduce with kdevops just:
>
> make defconfig-xfs_nocrc_2k -j $(nproc)
> make -j $(nproc)
> make fstests
> make linux
> make fstests-baseline TESTS=generic/733
> tail -f guestfs/*-xfs-nocrc-2k/console.log
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@suse.cz/T/#u
> [2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
> [3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
> [4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb
>
> Signed-off-by: Luis Chamberlain
> ---
>
> Note: kdevops uses its own fork of fstests which has this merged
> already, so the above should just work. If it's your first time using
> kdevops be sure to just read the README for the first time users:
>
> https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
>
>  common/rc             |  7 ++++++
>  tests/generic/744     | 56 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/744.out |  2 ++
>  3 files changed, 65 insertions(+)
>  create mode 100755 tests/generic/744
>  create mode 100644 tests/generic/744.out
>
> diff --git a/common/rc b/common/rc
> index b7b77ac1b46d..d4432f5ce259 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -120,6 +120,13 @@ _require_hugepages()
>  		_notrun "Kernel does not report huge page size"
>  }
>  
> +# Requires CONFIG_COMPACTION
> +_require_compaction()
> +{
> +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> +	fi
> +}
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/744 b/tests/generic/744
> new file mode 100755
> index 000000000000..2b3c0c7e92fb
> --- /dev/null
> +++ b/tests/generic/744
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test 744
> +#
> +# fsstress + compaction test
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch
> +_require_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background getxattr loop for the existing xattr
> +runfile="$tmp.getfattr"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 15
> +done &
> +getfattr_pid=$!
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1
> +
> +status=0
> +exit
> diff --git a/tests/generic/744.out b/tests/generic/744.out
> new file mode 100644
> index 000000000000..205c684fa995
> --- /dev/null
> +++ b/tests/generic/744.out
> @@ -0,0 +1,2 @@
> +QA output created by 744
> +Silence is golden
> -- 
> 2.43.0
>
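
The background loop in the quoted test triggers compaction, although its
comment and the getfattr_pid variable talk about getxattr. A minimal
sketch of the same loop with names that match what it does follows.
compaction_loop, its arguments and the compaction_pid name in the usage
note are hypothetical, not part of fstests. The trigger path is a
parameter only so the loop can be exercised without root.

```shell
#!/bin/bash
# Sketch only: compaction_loop is a hypothetical helper, not an fstests API.

# Usage: compaction_loop <runfile> [trigger] [interval_seconds]
# Writes 1 to the trigger file, then sleeps, for as long as <runfile> exists.
compaction_loop() {
	local runfile=$1
	local trigger=${2:-/proc/sys/vm/compact_memory}
	local interval=${3:-15}

	while [ -e "$runfile" ]; do
		echo 1 > "$trigger"
		sleep "$interval"
	done
}

# In the test this might be used as:
#   runfile="$tmp.compaction"; touch "$runfile"
#   compaction_loop "$runfile" &
#   compaction_pid=$!
#   ... run fsstress ...
#   rm -f "$runfile"; wait "$compaction_pid"
```

Removing the run file stops the loop after at most one more sleep
interval, the same way the original loop is stopped.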