Date: Wed, 11 Sep 2024 17:56:37 +0100
From: Robert Beckett
To: "Keith Busch"
Cc: "linux-nvme", "Jens Axboe", "Christoph Hellwig", "Sagi Grimberg", "Andrew Morton", "linux-mm"
Subject: Re: possible regression fs corruption on 64GB nvme
---- On Tue, 10 Sep 2024 18:53:23 +0100 Keith Busch wrote ---

> On Tue, Sep 10, 2024 at 06:27:55PM +0100, Robert Beckett wrote:
> > nvme.io_queue_depth=2 appears to fix it. Could you explain the implications of this?
> > I assume it is limiting to 2 outstanding requests concurrently.
>
> You'd think so, but not quite. NVMe queues need to leave one entry
> empty, so a submission queue with depth "2" means you can have at most 1
> command outstanding.
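To illustrate the "leave one entry empty" rule, here is a minimal C model of a circular submission queue. It is a sketch, not driver code: the struct and function names (sq_model, sq_submit, sq_consume, sq_max_outstanding) are hypothetical, and only the head/tail index arithmetic is modelled. Because head == tail has to mean "empty", the queue counts as full when advancing the tail would land on the head, so a queue of depth N holds at most N - 1 commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an NVMe submission queue as a circular buffer.
 * Only the head/tail index arithmetic is modelled; no commands are stored.
 * Empty: head == tail. Full: advancing tail would make it equal to head,
 * so one slot always stays unused.
 */
struct sq_model {
	uint16_t depth; /* number of entries in the ring */
	uint16_t head;  /* next entry the controller will consume */
	uint16_t tail;  /* next free entry the host will fill */
};

static bool sq_full(const struct sq_model *q)
{
	return (uint16_t)((q->tail + 1) % q->depth) == q->head;
}

/* Host side: queue one command; returns false if the ring is full. */
static bool sq_submit(struct sq_model *q)
{
	if (sq_full(q))
		return false;
	q->tail = (q->tail + 1) % q->depth;
	return true;
}

/* Controller side: consume one command; returns false if the ring is empty. */
static bool sq_consume(struct sq_model *q)
{
	if (q->head == q->tail)
		return false;
	q->head = (q->head + 1) % q->depth;
	return true;
}

/* Count how many commands fit before the ring reports full. */
static unsigned sq_max_outstanding(uint16_t depth)
{
	struct sq_model q = { .depth = depth, .head = 0, .tail = 0 };
	unsigned n = 0;

	assert(depth >= 2); /* the NVMe minimum queue size is two entries */
	while (sq_submit(&q))
		n++;
	return n;
}
```

With depth 2 the loop stops after a single submission, which matches the "at most 1 command outstanding" behaviour of nvme.io_queue_depth=2.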
>
> > Does it suggest an issue with the specific device's FW?
>
> I think that sounds probable. Especially considering the dmapool code
> has had considerable run time in real life, and no other such issue has
> been reported.
>
> > I assume this would suggest that it is not actually anything wrong with the dmapool, it was just exposing the issue of the device/fw?
>
> That's what I'm thinking, though, if you have a single queue with depth
> 2, we're not stressing the dmapool implementation either. It's always
> going to return the same dma block for each command.
>
> > Any advice for handling this and/or investigating further?
>
> If you have the resources for it, get protocol analyzer trace and show
> it to your nvme vendor.

Unfortunately, this is infeasible for us.

>
> > My initial speculation was that maybe the disk fw is signalling completion of an access before it has actually finished making its way to RAM. I checked the code and saw that the dmapool appears to be used for storing the buffer page addresses, so I imagine that is not updated by the disk at all, which would rule out my assumption.
>
> Right, it's used to make the prp/sgl list. Once we get a completion,
> that dma block becomes immediately available for the very next command.
> If you have a higher queue depth, it's possible that dma block is reused
> immediately while the driver is still notifying the block layer of the
> completion.
>
> If we're thinking that the device is completing the command before it's
> really done with the list (which could explain your observation), that
> would be a problem. Going to single queue-depth might introduce a delay
> or work around some firmware issue when dealing with concurrent
> commands.
>
> Prior to the "new" dmapool allocation, it was much less likely (though I
> think still possible) for your next command to reuse the same dma block
> of the command currently being completed.
>
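The immediate-reuse pattern described above can be sketched with a toy block pool whose free list is a LIFO stack. This is a simplified model under that assumption, not mm/dmapool.c; the names (pool, block, pool_init, pool_alloc, pool_free) are hypothetical. Freeing a block pushes it on top of the stack, so the very next allocation hands back the same block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size pool with a LIFO (stack) free list. */
#define POOL_BLOCKS 4

struct block {
	struct block *next_free; /* link while the block sits on the free list */
	int id;                  /* index, so callers can see which block they got */
};

struct pool {
	struct block blocks[POOL_BLOCKS];
	struct block *free_head; /* top of the free-list stack */
};

static void pool_init(struct pool *p)
{
	p->free_head = NULL;
	/* Push in reverse order so block 0 ends up on top. */
	for (int i = POOL_BLOCKS - 1; i >= 0; i--) {
		p->blocks[i].id = i;
		p->blocks[i].next_free = p->free_head;
		p->free_head = &p->blocks[i];
	}
}

/* Pop the most recently freed block, or NULL if the pool is exhausted. */
static struct block *pool_alloc(struct pool *p)
{
	struct block *b = p->free_head;

	if (b)
		p->free_head = b->next_free;
	return b;
}

/* Push a block back; it becomes the next one pool_alloc() returns. */
static void pool_free(struct pool *p, struct block *b)
{
	assert(b != NULL);
	b->next_free = p->free_head;
	p->free_head = b;
}
```

In this model, if the device were still reading command A's PRP list after signalling A's completion, the next command's list would be written into that same block, which is the failure mode hypothesised above. With queue depth 2 (one command in flight), the same block is also handed out every time, but there is never a second command in flight to overwrite it while the first is being completed.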

Given that this ~9-year-old temporary fix is still in the kernel for the Apple device, could we just add another device-specific override? Alternatively, I could convert it into a quirk that is set for both devices (and for any future ones).
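One way to express "a quirk set for both devices" is a vendor/device lookup table that maps to a queue-depth-limiting flag, instead of an open-coded check per device. The sketch below is standalone C with hypothetical names (QUIRK_QDEPTH_ONE, pci_id_quirk, quirk_table, lookup_quirks, effective_q_depth); in nvme-pci itself this would more naturally be a new NVME_QUIRK_* bit carried in the driver_data of the PCI ID table. Only the Apple controller (106b:2001) matched by the existing temporary fix is filled in; the second device's IDs are not given in this thread, so they are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit: limit the controller to one outstanding command. */
#define QUIRK_QDEPTH_ONE (1u << 0)

struct pci_id_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned int quirks;
};

/*
 * Hypothetical table. 0x106b:0x2001 is the Apple controller the existing
 * temporary fix matches; further affected devices would be added here
 * rather than as new open-coded vendor/device checks.
 */
static const struct pci_id_quirk quirk_table[] = {
	{ 0x106b, 0x2001, QUIRK_QDEPTH_ONE },
};

static unsigned int lookup_quirks(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].device == device)
			return quirk_table[i].quirks;
	}
	return 0;
}

/* A depth of 2 leaves one usable slot, i.e. one outstanding command. */
static unsigned int effective_q_depth(unsigned int requested, unsigned int quirks)
{
	assert(requested >= 2); /* NVMe queues need at least two entries */
	if (quirks & QUIRK_QDEPTH_ONE)
		return 2;
	return requested;
}
```

Keeping the device list in one table means the workaround is applied by a single check of the quirk bit, however many devices end up needing it.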