Date: Wed, 1 Nov 2023 11:26:59 +0100
From: Jan Kara
To: Hannes Reinecke
Cc: Ming Lei, Marek Marczykowski-Górecki, Jan Kara, Mikulas Patocka, Vlastimil Babka, Andrew Morton, Matthew Wilcox, Michal Hocko, stable@vger.kernel.org, regressions@lists.linux.dev, Alasdair Kergon, Mike Snitzer, dm-devel@lists.linux.dev, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, ming.lei@redhat.com
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Message-ID: <20231101102659.mg5sb6kei5plapvo@quack3>
References: <20231030155603.k3kejytq2e4vnp7z@quack3> <98aefaa9-1ac-a0e4-fb9a-89ded456750@redhat.com> <20231031140136.25bio5wajc5pmdtl@quack3>

On Wed 01-11-23 11:15:02, Hannes Reinecke wrote:
> On 11/1/23 04:24, Ming Lei wrote:
> > On Wed, Nov 01, 2023 at 03:14:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > On Wed, Nov 01, 2023 at 09:27:24AM +0800, Ming Lei wrote:
> > > > On Tue, Oct 31, 2023 at 11:42 PM Marek Marczykowski-Górecki wrote:
> > > > >
> > > > > On Tue, Oct 31, 2023 at 03:01:36PM +0100, Jan Kara wrote:
> > > > > > On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> > > > > > > Then tried:
> > > > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> > > > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> > > > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
> > > > > > >
> > > > > > > I've retried the PAGE_ALLOC_COSTLY_ORDER=4,order=5 case several times
> > > > > > > and I can't reproduce the issue there. I'm confused...
> > > > > >
> > > > > > And this kind of confirms that allocations > PAGE_ALLOC_COSTLY_ORDER
> > > > > > causing hangs is most likely just a coincidence. Rather something either in
> > > > > > the block layer or in the storage driver has problems with handling bios
> > > > > > with sufficiently high order pages attached. This is going to be a bit
> > > > > > painful to debug I'm afraid.
> > > > > > How long does it take for you trigger the
> > > > > > hang? I'm asking to get rough estimate how heavy tracing we can afford so
> > > > > > that we don't overwhelm the system...
> > > > >
> > > > > Sometimes it freezes just after logging in, but in worst case it takes
> > > > > me about 10min of more or less `tar xz` + `dd`.
> > > >
> > > > blk-mq debugfs is usually helpful for hang issue in block layer or
> > > > underlying drivers:
> > > >
> > > > (cd /sys/kernel/debug/block && find . -type f -exec grep -aH . {} \;)
> > > >
> > > > BTW, you can just collect logs of the exact disks if you know what
> > > > are behind dm-crypt,
> > > > which can be figured out by `lsblk`, and it has to be collected after
> > > > the hang is triggered.
> > >
> > > dm-crypt lives on the nvme disk, this is what I collected when it
> > > hanged:
> > >
> > ...
> > > nvme0n1/hctx4/cpu4/default_rq_list:000000000d41998f {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=65, .internal_tag=-1}
> > > nvme0n1/hctx4/cpu4/default_rq_list:00000000d0d04ed2 {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=70, .internal_tag=-1}
> >
> > Two requests stays in sw queue, but not related with this issue.
> >
> > > nvme0n1/hctx4/type:default
> > > nvme0n1/hctx4/dispatch_busy:9
> >
> > non-zero dispatch_busy means BLK_STS_RESOURCE is returned from
> > nvme_queue_rq() recently and mostly.
> >
> > > nvme0n1/hctx4/active:0
> > > nvme0n1/hctx4/run:20290468
> >
> > ...
> >
> > > nvme0n1/hctx4/tags:nr_tags=1023
> > > nvme0n1/hctx4/tags:nr_reserved_tags=0
> > > nvme0n1/hctx4/tags:active_queues=0
> > > nvme0n1/hctx4/tags:bitmap_tags:
> > > nvme0n1/hctx4/tags:depth=1023
> > > nvme0n1/hctx4/tags:busy=3
> >
> > Just three requests in-flight, two are in sw queue, another is in hctx->dispatch.
> >
> > ...
> >
> > > nvme0n1/hctx4/dispatch:00000000b335fa89 {.op=WRITE, .cmd_flags=NOMERGE, .rq_flags=DONTPREP|IO_STAT, .state=idle, .tag=78, .internal_tag=-1}
> > > nvme0n1/hctx4/flags:alloc_policy=FIFO SHOULD_MERGE
> > > nvme0n1/hctx4/state:SCHED_RESTART
> >
> > The request staying in hctx->dispatch can't move on, and nvme_queue_rq()
> > returns -BLK_STS_RESOURCE constantly, and you can verify with
> > the following bpftrace when the hang is triggered:
> >
> > 	bpftrace -e 'kretfunc:nvme_queue_rq { @[retval, kstack]=count() }'
> >
> > It is very likely that memory allocation inside nvme_queue_rq()
> > can't be done successfully, then blk-mq just have to retry by calling
> > nvme_queue_rq() on the above request.
> >
> And that is something I've been wondering (for quite some time now):
> What _is_ the appropriate error handling for -ENOMEM?
> At this time, we assume it to be a retryable error and re-run the queue
> in the hope that things will sort itself out.
> But if they don't we're stuck.
> Can we somehow figure out if we make progress during submission, and (at
> least) issue a warning once we detect a stall?

Well, but Marek has shown [1] that the machine is pretty far from being OOM
when it is stuck. So it doesn't seem like a simple OOM situation...

								Honza

[1] https://lore.kernel.org/all/ZTiJ3CO8w0jauOzW@mail-itl/

-- 
Jan Kara
SUSE Labs, CR
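
For anyone wanting to repeat the reading of the blk-mq debugfs dump quoted
above on their own capture, here is a rough sketch in Python. It is only an
illustration, not a tool from this thread: the names summarize() and
suspicious() are made up, and the "path:value" line format is assumed from
the `grep -aH` output shown above. It flags hardware queues that have a
non-zero dispatch_busy and at least one request sitting in hctx->dispatch,
which is the pattern Ming points out for hctx4.

```python
import re
from collections import defaultdict

# Matches e.g. "nvme0n1/hctx4/dispatch_busy:9" or "./nvme0n1/hctx4/tags:busy=3".
# The format is assumed from the grep -aH output quoted in the thread.
LINE_RE = re.compile(
    r"^(?:\./)?(?P<disk>[^/]+)/(?P<hctx>hctx\d+)/(?P<attr>[^:]+):(?P<value>.*)$"
)


def summarize(lines):
    """Collect a few per-hctx fields from a blk-mq debugfs dump."""
    queues = defaultdict(
        lambda: {"dispatch_busy": 0, "busy_tags": 0, "dispatch": [], "state": ""}
    )
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a per-hctx line, skip it
        q = queues[(m["disk"], m["hctx"])]
        attr, value = m["attr"], m["value"].strip()
        if attr == "dispatch_busy":
            q["dispatch_busy"] = int(value)
        elif attr == "tags" and value.startswith("busy="):
            q["busy_tags"] = int(value[len("busy="):])
        elif attr == "dispatch":
            q["dispatch"].append(value)  # one entry per request in hctx->dispatch
        elif attr == "state":
            q["state"] = value
    return dict(queues)


def suspicious(queues):
    """Queues with non-zero dispatch_busy and a request in hctx->dispatch."""
    return sorted(
        key for key, q in queues.items() if q["dispatch_busy"] > 0 and q["dispatch"]
    )


if __name__ == "__main__":
    import sys

    for disk, hctx in suspicious(summarize(sys.stdin)):
        print(f"{disk}/{hctx}: request in hctx->dispatch, dispatch_busy > 0")
```

It reads the dump on stdin, so the output of the find/grep command above can
be saved to a file after the hang and fed to the script. A hit only says
where to look; it does not say why nvme_queue_rq() keeps asking for a retry.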