Date: Tue, 27 Aug 2024 04:36:33 +0100
From: Matthew Wilcox
To: Kent Overstreet
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-bcachefs@vger.kernel.org
Subject: Re: bcachefs dropped writes with lockless buffered io path, COMPACTION/MIGRATION=y

On Mon, Aug 26, 2024 at 11:29:52PM -0400, Kent Overstreet wrote:
> We had a report of corruption on nixos, on tests that build a system
> image, it bisected to the patch that enabled buffered writes without
> taking the inode lock:
>
> https://evilpiepirate.org/git/bcachefs.git/commit/?id=7e64c86cdc6c
>
> It appears that dirty folios are being dropped somehow; corrupt files,
> when checked against good copies, have ranges of 0s that are 4k aligned
> (modulo 2k, likely a misaligned partition).
>
> Interestingly, it only triggers for QEMU - the test fails pretty
> consistently and we have a lot of nixos users, we'd notice (via nix
> store verifies) if the corruption was more widespread. We believe it
> only triggers with QEMU's snapshots mode (but don't quote me on that).

Just to be crystal clear here, the corruption happens while running
bcachefs in the qemu guest, and it doesn't matter what the host
filesystem is? Or did I misunderstand, and it occurs while running
anything inside qemu on top of a bcachefs host?

> Further digging implicates CONFIG_COMPACTION or CONFIG_MIGRATION.
>
> Testing with COMPACTION, MIGRATION=n and TRANSPARENT_HUGEPAGE=y passes
> reliably.
>
> On the bcachefs side, I've been testing with that patch reduced to just
> "don't take inode lock if not extending"; i.e. killing the fancy stuff
> to preserve write atomicity. It really does appear to be "don't take
> inode lock -> dirty folios get dropped".
>
> It's not a race with truncate, or anything silly like that; bcachefs has
> the pagecache add lock, which serves here for locking vs. truncate.
>
> So - this is a real head scratcher. The inode lock really doesn't do
> much in IO paths, it's there for synchronization with truncate and write
> vs. write atomicity - the mm paths know nothing about it. Page
> fault/mkwrite paths don't take it at all; a buffered non-extending write
> should be able to work similarly: the folio lock should be entirely
> sufficient here.
>
> Anyone got any bright ideas?

No, but I'm going to sleep on it.
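
Editorial illustration, not part of the original message: the check Kent describes (comparing a corrupt file against a good copy and finding zeroed ranges that are 4k in size but offset by 2k) could be sketched roughly as below. This is not the tooling used in the report. The function names, the 512-byte comparison granularity, and the 4096-byte page size are all assumptions made for the example.

```python
SECTOR = 512   # assumed comparison granularity; not taken from the thread
PAGE = 4096    # assumed page size ("4k" in the report)


def zeroed_ranges(good: bytes, bad: bytes, sector: int = SECTOR):
    """Return [(offset, length)] runs where `bad` is all zeros but `good` differs."""
    runs = []
    run_start = None
    limit = min(len(good), len(bad))
    zero = bytes(sector)
    for off in range(0, limit, sector):
        g = good[off:off + sector]
        b = bad[off:off + sector]
        # A chunk counts as dropped only if it differs from the good copy
        # and consists entirely of zero bytes in the corrupt copy.
        dropped = b != g and b == zero[:len(b)]
        if dropped and run_start is None:
            run_start = off
        elif not dropped and run_start is not None:
            runs.append((run_start, off - run_start))
            run_start = None
    if run_start is not None:
        runs.append((run_start, limit - run_start))
    return runs


def describe(runs, page: int = PAGE):
    """Annotate each run with (offset, length, offset % page, length % page)."""
    return [(off, length, off % page, length % page) for off, length in runs]
```

To use it on real files, one would read both copies with `open(path, "rb").read()` and pass the results to `zeroed_ranges`. A run whose length is a multiple of 4096 and whose offset is 2048 modulo 4096 would match the "4k aligned (modulo 2k)" pattern from the report.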