Date: Wed, 11 Sep 2024 17:57:39 +0100
From: Robert Beckett
To: "Keith Busch"
Cc: "linux-nvme", "Jens Axboe", "Christoph Hellwig", "Sagi Grimberg",
 "Andrew Morton", "linux-mm"
Message-ID: <191e204ed94.126febf821607337.6118223401232629522@collabora.com>
In-Reply-To: <191e203fd17.118394e2d1606957.7379866259983341167@collabora.com>
References: <191d810a4e3.fcc6066c765804.973611676137075390@collabora.com>
 <191db450152.e0b28690987786.6989198174827147639@collabora.com>
 <191dcfa4846.bb18f3291189856.1624418308692137124@collabora.com>
 <191e203fd17.118394e2d1606957.7379866259983341167@collabora.com>
Subject: Re: possible regression fs corruption on 64GB nvme

---- On Wed, 11 Sep 2024 17:56:37 +0100 Robert Beckett wrote ---
> ---- On Tue, 10 Sep 2024 18:53:23 +0100 Keith Busch wrote ---
> > On Tue, Sep 10, 2024 at 06:27:55PM +0100, Robert Beckett wrote:
> > > nvme.io_queue_depth=2 appears to fix it. Could you explain the implications of this?
> > > I assume it is limiting to 2 outstanding requests concurrently.
> >
> > You'd think so, but not quite.
> > NVMe queues need to leave one entry
> > empty, so a submission queue with depth "2" means you can have at most 1
> > command outstanding.
> >
> > > Does it suggest an issue with the specific device's FW?
> >
> > I think that sounds probable. Especially considering the dmapool code
> > has had considerable run time in real life, and no other such issue has
> > been reported.
> >
> > > I assume this would suggest that it is not actually anything wrong with the dmapool, it was just exposing the issue of the device/fw?
> >
> > That's what I'm thinking, though, if you have a single queue with depth
> > 2, we're not stressing the dmapool implementation either. It's always
> > going to return the same dma block for each command.
> >
> > > Any advice for handling this and/or investigating further?
> >
> > If you have the resources for it, get a protocol analyzer trace and show
> > it to your nvme vendor.
>
> Unfortunately this is infeasible for us.
>
> >
> > > My initial speculation was that maybe the disk fw is signalling completion of an access before it has actually finished making its way to ram. I checked the code and saw that the dmapool appears to be used for storing the buffer page addresses, so I imagine that is not updated by the disk at all, which would rule out my assumption.
> >
> > Right, it's used to make the prp/sgl list. Once we get a completion,
> > that dma block becomes immediately available for the very next command.
> > If you have a higher queue depth, it's possible that dma block is reused
> > immediately while the driver is still notifying the block layer of the
> > completion.
> >
> > If we're thinking that the device is completing the command before it's
> > really done with the list (which could explain your observation), that
> > would be a problem. Going to single queue-depth might introduce a delay
> > or work around some firmware issue when dealing with concurrent
> > commands.
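
Keith's point about the empty entry can be illustrated with a toy ring buffer. This is only a sketch of the general circular-queue rule (head == tail means empty, so a full queue holds depth - 1 entries); the `toy_sq` names are invented for illustration and are not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a circular submission queue; names are hypothetical. */
struct toy_sq {
	uint16_t depth; /* number of slots in the ring */
	uint16_t head;  /* next slot the consumer (device) fetches */
	uint16_t tail;  /* next slot the producer (host) fills */
};

static void toy_sq_init(struct toy_sq *q, uint16_t depth)
{
	assert(depth >= 2); /* a 1-slot ring could never hold a command */
	q->depth = depth;
	q->head = 0;
	q->tail = 0;
}

/* head == tail means "empty", so "full" is reached one slot short of depth. */
static bool toy_sq_full(const struct toy_sq *q)
{
	return (uint16_t)((q->tail + 1) % q->depth) == q->head;
}

/* Host side: queue one command if there is room. */
static bool toy_sq_submit(struct toy_sq *q)
{
	if (toy_sq_full(q))
		return false;
	q->tail = (uint16_t)((q->tail + 1) % q->depth);
	return true;
}

/* Device side: consume one queued command, if any. */
static bool toy_sq_consume(struct toy_sq *q)
{
	if (q->head == q->tail)
		return false;
	q->head = (uint16_t)((q->head + 1) % q->depth);
	return true;
}

/* Number of entries currently sitting in the ring. */
static uint16_t toy_sq_outstanding(const struct toy_sq *q)
{
	return (uint16_t)((q->tail + q->depth - q->head) % q->depth);
}
```

With `toy_sq_init(&q, 2)`, the first `toy_sq_submit()` succeeds and the second is refused until `toy_sq_consume()` runs, which matches the "depth 2 means at most 1 command outstanding" arithmetic.
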
> >
> > Prior to the "new" dmapool allocation, it was much less likely (though I
> > think still possible) for your next command to reuse the same dma block
> > of the command currently being completed.
> >
>
> Given that this ~9 year old temporary fix is still in the kernel for the Apple device, could we just add another device-specific override?
> I could maybe convert it to a quirk that is set for them both (and any future devices).
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v6.11-rc7#n2570
>
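
The quirk idea in the quoted text could look roughly like the sketch below. It is a standalone toy model, not a kernel patch: every `toy_`/`TOY_` name, including the quirk bit, is hypothetical. The Apple ID pair is what the existing check is believed to match and should be verified against pci.c. The second table entry uses placeholder IDs, because the PCI IDs of the affected 64GB device are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit; the name is made up for this sketch. */
#define TOY_QUIRK_QDEPTH_ONE_CMD (1u << 0)

#define TOY_VENDOR_APPLE   0x106b /* Apple's PCI vendor ID */
#define TOY_VENDOR_EXAMPLE 0xffff /* placeholder, NOT the real 64GB device */

struct toy_quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned int quirks;
};

/* One table instead of one hard-coded vendor/device check per device. */
static const struct toy_quirk_entry toy_quirk_table[] = {
	/* Believed to match the existing Apple check; verify against pci.c. */
	{ TOY_VENDOR_APPLE, 0x2001, TOY_QUIRK_QDEPTH_ONE_CMD },
	/* Placeholder IDs standing in for the second affected device. */
	{ TOY_VENDOR_EXAMPLE, 0x0001, TOY_QUIRK_QDEPTH_ONE_CMD },
};

/* Return the quirk bits for a vendor/device pair, or 0 if none apply. */
static unsigned int toy_lookup_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(toy_quirk_table) / sizeof(toy_quirk_table[0]); i++) {
		if (toy_quirk_table[i].vendor == vendor &&
		    toy_quirk_table[i].device == device)
			return toy_quirk_table[i].quirks;
	}
	return 0;
}

/* Clamp the queue depth to 2 (one usable entry) when the quirk is set. */
static unsigned int toy_effective_q_depth(unsigned int requested,
					  unsigned int quirks)
{
	assert(requested >= 2);
	if (quirks & TOY_QUIRK_QDEPTH_ONE_CMD)
		return 2;
	return requested;
}
```

Keying the clamp on a quirk bit rather than on a vendor/device comparison means a future device would only need a new table entry.
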