Date: Wed, 1 Nov 2023 11:24:23 +0800
From: Ming Lei
To: Marek Marczykowski-Górecki
Cc: Jan Kara, Mikulas Patocka, Vlastimil Babka, Andrew Morton, Matthew Wilcox, Michal Hocko, stable@vger.kernel.org, regressions@lists.linux.dev, Alasdair Kergon, Mike Snitzer, dm-devel@lists.linux.dev, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, ming.lei@redhat.com
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
References: <20231030155603.k3kejytq2e4vnp7z@quack3> <98aefaa9-1ac-a0e4-fb9a-89ded456750@redhat.com> <20231031140136.25bio5wajc5pmdtl@quack3>

On Wed, Nov 01, 2023 at 03:14:22AM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Nov 01, 2023 at 09:27:24AM +0800, Ming Lei wrote:
> > On Tue, Oct 31, 2023 at 11:42 PM Marek Marczykowski-Górecki wrote:
> > >
> > > On Tue, Oct 31, 2023 at 03:01:36PM +0100, Jan Kara wrote:
> > > > On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> > > > > Then tried:
> > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> > > > > - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
> > > > >
> > > > > I've retried the PAGE_ALLOC_COSTLY_ORDER=4,order=5 case several times
> > > > > and I can't reproduce the issue there. I'm confused...
> > > >
> > > > And this kind of confirms that allocations > PAGE_ALLOC_COSTLY_ORDER
> > > > causing hangs is most likely just a coincidence. Rather something either in
> > > > the block layer or in the storage driver has problems with handling bios
> > > > with sufficiently high order pages attached. This is going to be a bit
> > > > painful to debug I'm afraid. How long does it take for you trigger the
> > > > hang?
> > > > I'm asking to get rough estimate how heavy tracing we can afford so
> > > > that we don't overwhelm the system...
> > >
> > > Sometimes it freezes just after logging in, but in worst case it takes
> > > me about 10min of more or less `tar xz` + `dd`.
> >
> > blk-mq debugfs is usually helpful for hang issue in block layer or
> > underlying drivers:
> >
> > (cd /sys/kernel/debug/block && find . -type f -exec grep -aH . {} \;)
> >
> > BTW, you can just collect logs of the exact disks if you know what
> > are behind dm-crypt, which can be figured out by `lsblk`, and it has
> > to be collected after the hang is triggered.
>
> dm-crypt lives on the nvme disk, this is what I collected when it
> hanged:
>
> ...
> nvme0n1/hctx4/cpu4/default_rq_list:000000000d41998f {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=65, .internal_tag=-1}
> nvme0n1/hctx4/cpu4/default_rq_list:00000000d0d04ed2 {.op=READ, .cmd_flags=, .rq_flags=IO_STAT, .state=idle, .tag=70, .internal_tag=-1}

Two requests stay in the sw queue, but they are not related to this issue.

> nvme0n1/hctx4/type:default
> nvme0n1/hctx4/dispatch_busy:9

A non-zero dispatch_busy means that nvme_queue_rq() has returned
BLK_STS_RESOURCE recently, and for most of the recent dispatch attempts.

> nvme0n1/hctx4/active:0
> nvme0n1/hctx4/run:20290468

...

> nvme0n1/hctx4/tags:nr_tags=1023
> nvme0n1/hctx4/tags:nr_reserved_tags=0
> nvme0n1/hctx4/tags:active_queues=0
> nvme0n1/hctx4/tags:bitmap_tags:
> nvme0n1/hctx4/tags:depth=1023
> nvme0n1/hctx4/tags:busy=3

Just three requests are in flight: two are in the sw queue, and the other
is in hctx->dispatch.

...
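
On the dispatch_busy value above: here is a rough sketch of why it reads 9
when the driver keeps reporting busy. It assumes the moving-average update in
blk_mq_update_dispatch_busy() uses a weight of 8 and a factor of 4, as I
recall it from block/blk-mq.c, so please check the constants against your
tree. The Python function names are made up for illustration.

```python
# Sketch of the hctx->dispatch_busy moving average.
# ASSUMPTION: weight 8 and factor 4, as recalled from block/blk-mq.c.
EWMA_WEIGHT = 8
EWMA_FACTOR = 4


def update_dispatch_busy(ewma: int, busy: bool) -> int:
    """Return the next dispatch_busy value after one dispatch attempt."""
    if ewma == 0 and not busy:
        return 0
    ewma *= EWMA_WEIGHT - 1
    if busy:
        ewma += 1 << EWMA_FACTOR
    # Integer division, like the C code.
    return ewma // EWMA_WEIGHT


def simulate(samples):
    """Feed a sequence of busy/not-busy results and record each value."""
    ewma = 0
    history = []
    for busy in samples:
        ewma = update_dispatch_busy(ewma, busy)
        history.append(ewma)
    return history


if __name__ == "__main__":
    # Twelve consecutive busy returns from ->queue_rq().
    print(simulate([True] * 12))
```

Under these assumptions the value climbs by one per busy return and then
saturates at 9, so a reading of 9 would mean that roughly the last eight or
more dispatch attempts all came back busy.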
> nvme0n1/hctx4/dispatch:00000000b335fa89 {.op=WRITE, .cmd_flags=NOMERGE, .rq_flags=DONTPREP|IO_STAT, .state=idle, .tag=78, .internal_tag=-1}
> nvme0n1/hctx4/flags:alloc_policy=FIFO SHOULD_MERGE
> nvme0n1/hctx4/state:SCHED_RESTART

The request in hctx->dispatch can't make progress, and nvme_queue_rq()
keeps returning BLK_STS_RESOURCE. You can verify this with the following
bpftrace command once the hang is triggered:

	bpftrace -e 'kretfunc:nvme_queue_rq { @[retval, kstack]=count() }'

It is very likely that a memory allocation inside nvme_queue_rq() keeps
failing, so blk-mq has to keep retrying by calling nvme_queue_rq() on the
above request.

Thanks,
Ming
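
A note on reading the bpftrace output: the retval key is the raw numeric
blk_status_t, so it helps to map it back to a name. The sketch below assumes
the values from include/linux/blk_types.h as I recall them (BLK_STS_RESOURCE
is 9, BLK_STS_DEV_RESOURCE is 13), so please double-check them against your
tree. The helper names are hypothetical, and the counts in the usage lines
are made-up sample numbers, not measurements.

```python
# Subset of blk_status_t values.
# ASSUMPTION: numbers recalled from include/linux/blk_types.h; verify locally.
BLK_STS_NAMES = {
    0: "BLK_STS_OK",
    9: "BLK_STS_RESOURCE",
    10: "BLK_STS_IOERR",
    12: "BLK_STS_AGAIN",
    13: "BLK_STS_DEV_RESOURCE",
}

# Return values that make blk-mq keep the request and retry it later.
BUSY_VALUES = {9, 13}


def status_name(retval: int) -> str:
    """Map a numeric retval from the bpftrace map key to a status name."""
    return BLK_STS_NAMES.get(retval, f"unknown({retval})")


def busy_fraction(counts: dict) -> float:
    """Fraction of calls that returned a busy status.

    `counts` maps retval -> number of calls, summed over all stacks.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    busy = sum(n for retval, n in counts.items() if retval in BUSY_VALUES)
    return busy / total


if __name__ == "__main__":
    sample = {0: 1, 9: 3}  # hypothetical counts, for illustration only
    for retval, n in sorted(sample.items()):
        print(status_name(retval), n)
    print(busy_fraction(sample))
```

If nearly all calls land on the BLK_STS_RESOURCE key while the machine is
hung, that would match the retry loop described above, and the kstack part of
the key should show which path inside nvme_queue_rq() is returning it.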