Date: Thu, 2 Nov 2023 12:45:54 +0100
From: Marek Marczykowski-Górecki
To: Mikulas Patocka
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Jan Kara,
 Vlastimil Babka, Andrew Morton, Matthew Wilcox, Michal Hocko,
 stable@vger.kernel.org, regressions@lists.linux.dev, Alasdair Kergon,
 Mike Snitzer, dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5

On Thu, Nov 02, 2023 at 10:28:57AM +0100, Mikulas Patocka wrote:
>
>
> On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:
>
> > On Tue, Oct 31, 2023 at 06:24:19PM +0100, Mikulas Patocka wrote:
> >
> > > > Hi
> > > >
> > > > I would like to ask you to try this patch. Revert the changes to "order"
> > > > and "PAGE_ALLOC_COSTLY_ORDER" back to normal and apply this patch on a
> > > > clean upstream kernel.
> > > >
> > > > Does it deadlock?
> > > >
> > > > There is a bug in dm-crypt that it doesn't account large pages in
> > > > cc->n_allocated_pages, this patch fixes the bug.
> >
> > This patch did not help.
> >
> > > If the previous patch didn't fix it, try this patch (on a clean upstream
> > > kernel).
> > >
> > > This patch allocates large pages, but it breaks them up into single-page
> > > entries when adding them to the bio.
> >
> > But this does help.
>
> Thanks. So we can stop blaming the memory allocator and start blaming the
> NVMe subsystem.
>
>
> I added NVMe maintainers to this thread - the summary of the problem is:
> In dm-crypt, we allocate a large compound page and add this compound page
> to the bio as a single big vector entry. Marek reports that on his system
> it causes deadlocks, the deadlocks look like a lost bio that was never
> completed. When I chop the large compound page to individual pages in
> dm-crypt and add bio vector for each of them, Marek reports that there are
> no longer any deadlocks. So, we have a problem (either hardware or
> software) that the NVMe subsystem doesn't like bio vectors with large
> bv_len. This is the original bug report:
> https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/
>
>
> Marek, what NVMe devices do you use? Do you use the same device on all 3
> machines where you hit this bug?

This one is "Star Drive PCIe SSD", another one is "Samsung SSD 970 EVO
Plus 1TB", and I can't check the third one right now.

> In the directory /sys/block/nvme0n1/queue: what is the value of
> dma_alignment, max_hw_sectors_kb, max_sectors_kb, max_segment_size,
> max_segments, virt_boundary_mask?

/sys/block/nvme0n1/queue/dma_alignment:3
/sys/block/nvme0n1/queue/max_hw_sectors_kb:2048
/sys/block/nvme0n1/queue/max_sectors_kb:1280
/sys/block/nvme0n1/queue/max_segment_size:4294967295
/sys/block/nvme0n1/queue/max_segments:128
/sys/block/nvme0n1/queue/virt_boundary_mask:4095

> Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to some small value
> (for example 64) and test if it helps.

Yes, this helps too.
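
A minimal sketch, assuming Python 3 is available, for collecting the same
queue limits on other affected machines. The helper names
read_queue_limits and format_limits are hypothetical and only for
illustration; the attribute names are the ones asked about above, and the
queue directory has to be adjusted per device.

```python
from pathlib import Path

# Queue attributes asked about in this thread.
QUEUE_ATTRS = (
    "dma_alignment",
    "max_hw_sectors_kb",
    "max_sectors_kb",
    "max_segment_size",
    "max_segments",
    "virt_boundary_mask",
)


def read_queue_limits(queue_dir):
    """Return {attribute: int} for the attributes readable in queue_dir."""
    limits = {}
    for name in QUEUE_ATTRS:
        path = Path(queue_dir) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not a plain integer: skip it.
            continue
    return limits


def format_limits(queue_dir, limits):
    """Format each value as "path:value", one string per attribute."""
    return [f"{Path(queue_dir) / name}:{value}" for name, value in limits.items()]
```

Calling read_queue_limits("/sys/block/nvme0n1/queue") and printing the
result of format_limits gives one "path:value" line per attribute, in the
same layout as the listing above. The sketch only reads the limits; it
does not change max_sectors_kb.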

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab