Date: Tue, 31 Oct 2023 15:01:36 +0100
From: Jan Kara
To: Marek Marczykowski-Górecki
Cc: Mikulas Patocka, Jan Kara, Vlastimil Babka, Andrew Morton,
    Matthew Wilcox, Michal Hocko, stable@vger.kernel.org,
    regressions@lists.linux.dev, Alasdair Kergon, Mike Snitzer,
    dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Message-ID: <20231031140136.25bio5wajc5pmdtl@quack3>
References: <18a38935-3031-1f35-bc36-40406e2e6fd2@suse.cz>
    <3514c87f-c87f-f91f-ca90-1616428f6317@redhat.com>
    <1a47fa28-3968-51df-5b0b-a19c675cc289@suse.cz>
    <20231030122513.6gds75hxd65gu747@quack3>
    <20231030155603.k3kejytq2e4vnp7z@quack3>
    <98aefaa9-1ac-a0e4-fb9a-89ded456750@redhat.com>

On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> On Mon, Oct 30, 2023 at 06:50:35PM +0100, Mikulas Patocka wrote:
> > On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
> > > Then retried with order=PAGE_ALLOC_COSTLY_ORDER and
> > > PAGE_ALLOC_COSTLY_ORDER back at 3, and also got similar crash.
> >
> > So, does it mean that even allocating with order=PAGE_ALLOC_COSTLY_ORDER
> > isn't safe?
>
> That seems to be another bug, see below.
>
> > Try enabling CONFIG_DEBUG_VM (it also needs CONFIG_DEBUG_KERNEL) and try
> > to provoke a similar crash. Let's see if it crashes on one of the
> > VM_BUG_ON statements.
>
> This was very interesting idea. With this, immediately after login I get
> the crash like below. Which makes sense, as this is when pulseaudio
> starts and opens /dev/snd/*. I then tried with the dm-crypt commit
> reverted and still got the crash! But, after blacklisting snd_pcm,
> there is no BUG splat, but the storage freeze still happens on vanilla
> 6.5.6.

OK, great. Thanks for testing.

> Plain 6.5.6 (so order = MAX_ORDER - 1, and PAGE_ALLOC_COSTLY_ORDER=3), in frozen state:
> [ 143.196106] task:blkdiscard state:D stack:13672 pid:4884 ppid:2025 flags:0x00000002
> [ 143.196130] Call Trace:
> [ 143.196139]
> [ 143.196147] __schedule+0x30e/0x8b0
> [ 143.196162] schedule+0x59/0xb0
> [ 143.196175] schedule_timeout+0x14c/0x160
> [ 143.196193] io_schedule_timeout+0x4b/0x70
> [ 143.196207] wait_for_completion_io+0x81/0x130
> [ 143.196226] submit_bio_wait+0x5c/0x90
> [ 143.196241] blkdev_issue_discard+0x94/0xe0
> [ 143.196260] blkdev_common_ioctl+0x79e/0x9c0
> [ 143.196279] blkdev_ioctl+0xc7/0x270
> [ 143.196293] __x64_sys_ioctl+0x8f/0xd0
> [ 143.196310] do_syscall_64+0x3c/0x90

So this shows there was a bio submitted and it never ran to completion.
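The trace above follows the usual submit-and-wait pattern: submit_bio_wait() submits the bio and then sleeps in wait_for_completion_io() until the bio's end_io callback signals completion, so if the layer below never completes the bio, blkdiscard stays in D state. The sketch below models that in userspace with pthreads, as an illustration only. fake_completion, fake_complete(), fake_wait_timeout() and fake_device() are hypothetical stand-ins, not the kernel API, and the timeout exists only so that a missing completion is reported instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct fake_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void fake_completion_init(struct fake_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Plays the role of the bio's end_io callback: mark done, wake the waiter. */
static void fake_complete(struct fake_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Waits up to timeout_ms; returns 0 if completed, ETIMEDOUT otherwise.
 * Unlike this sketch, wait_for_completion_io() in the trace has no timeout,
 * so a bio whose end_io never runs leaves the submitter sleeping forever.
 */
static int fake_wait_timeout(struct fake_completion *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	int result;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups until done or the deadline passes. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	result = c->done ? 0 : ETIMEDOUT;
	pthread_mutex_unlock(&c->lock);
	return result;
}

/* Hypothetical "device" thread that completes the I/O after about 10 ms. */
static void *fake_device(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000000L };

	nanosleep(&delay, NULL);
	fake_complete(arg);
	return NULL;
}
```

With nobody calling fake_complete(), fake_wait_timeout() gives up with ETIMEDOUT; with fake_device() running in another thread, it returns 0. The stuck blkdiscard task corresponds to the first case, minus the timeout.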
> for f in $(grep -l crypt /proc/*/comm); do head $f ${f/comm/stack}; done

So this shows the dm-crypt layer isn't stuck anywhere. So the allocation
path itself doesn't seem to be locking up, looping or anything.

> Then tried:
> - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
>
> I've retried the PAGE_ALLOC_COSTLY_ORDER=4,order=5 case several times
> and I can't reproduce the issue there. I'm confused...

And this kind of confirms that the link between the hangs and
allocations > PAGE_ALLOC_COSTLY_ORDER is most likely just a coincidence.
Rather, something either in the block layer or in the storage driver has
problems handling bios with sufficiently high-order pages attached.

This is going to be a bit painful to debug, I'm afraid. How long does it
take for you to trigger the hang? I'm asking to get a rough estimate of how
heavy tracing we can afford so that we don't overwhelm the system...

								Honza
-- 
Jan Kara
SUSE Labs, CR
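To put the orders from the experiments above in concrete terms, here is a small userspace sketch, assuming 4 KiB base pages as on x86-64. SKETCH_PAGE_SIZE, order_to_bytes(), order_is_costly() and print_orders() are hypothetical helpers, not kernel functions. The sketch converts an allocation order to bytes (an order-n allocation is 2^n contiguous pages) and applies the page allocator's "order > PAGE_ALLOC_COSTLY_ORDER" notion of a costly request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption for this sketch: 4 KiB base pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/* An order-n allocation is 2^n physically contiguous base pages. */
static unsigned long order_to_bytes(unsigned int order)
{
	assert(order < 16); /* sanity bound for the sketch */
	return SKETCH_PAGE_SIZE << order;
}

/* A request counts as costly when order > PAGE_ALLOC_COSTLY_ORDER. */
static bool order_is_costly(unsigned int order, unsigned int costly_order)
{
	return order > costly_order;
}

/* Prints the orders tried in the thread against a given costly threshold. */
void print_orders(unsigned int costly_order)
{
	for (unsigned int order = 3; order <= 6; order++)
		printf("order %u: %lu KiB%s\n", order,
		       order_to_bytes(order) / 1024,
		       order_is_costly(order, costly_order) ? " (costly)" : "");
}
```

With a threshold of 4, print_orders(4) marks both order 5 (128 KiB) and order 6 (256 KiB) as costly, yet only order 6 reproduced the freeze in the tests quoted above. That fits the view that the costly threshold itself is not what matters, and that the size of the pages attached to the bio is the more likely trigger.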