From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Sep 2024 11:53:23 -0600
From: Keith Busch <kbusch@kernel.org>
To: Robert Beckett
Cc: linux-nvme, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton, linux-mm
Subject: Re: possible regression fs corruption on 64GB nvme
Message-ID: 
References: <191d810a4e3.fcc6066c765804.973611676137075390@collabora.com>
 <191db450152.e0b28690987786.6989198174827147639@collabora.com>
 <191dcfa4846.bb18f3291189856.1624418308692137124@collabora.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <191dcfa4846.bb18f3291189856.1624418308692137124@collabora.com>

On Tue, Sep 10, 2024 at 06:27:55PM +0100, Robert Beckett wrote:
> nvme.io_queue_depth=2 appears to fix it. Could you explain the
> implications of this?
> I assume it is limiting to 2 outstanding requests concurrently.

You'd think so, but not quite. NVMe queues need to leave one entry
empty, so a submission queue with depth "2" means you can have at most
1 command outstanding.

> Does it suggest an issue with the specific device's FW?

I think that sounds probable, especially considering the dmapool code
has had considerable run time in real life, and no other such issue
has been reported.

> I assume this would suggest that it is not actually anything wrong
> with the dmapool, it was just exposing the issue of the device/fw?

That's what I'm thinking. Though if you have a single queue with depth
2, we're not stressing the dmapool implementation either: it's always
going to return the same dma block for each command.

> Any advice for handling this and/or investigating further?

If you have the resources for it, get a protocol analyzer trace and
show it to your nvme vendor.

> My initial speculation was that maybe the disk fw is signalling
> completion of an access before it has actually finished making its
> way to ram. I checked the code and saw that the dmapool appears to be
> used for storing the buffer page addresses, so I imagine that is not
> updated by the disk at all, which would rule out my assumption.

Right, it's used to build the prp/sgl list. Once we get a completion,
that dma block becomes immediately available for the very next command.
If you have a higher queue depth, it's possible that dma block is
reused immediately, while the driver is still notifying the block layer
of the completion. If the device is completing a command before it's
really done reading the list (which could explain your observation),
that would be a problem. Going to single queue depth might introduce
enough of a delay, or work around some firmware issue with handling
concurrent commands.
Prior to the "new" dmapool allocation scheme, it was much less likely
(though I think still possible) for your next command to reuse the
same dma block as the command currently being completed.