Date: Mon, 26 Aug 2024 23:29:52 -0400
From: Kent Overstreet <kent.overstreet@linux.dev>
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-bcachefs@vger.kernel.org
Subject: bcachefs dropped writes with lockless buffered io path, COMPACTION/MIGRATION=y

We had a report of corruption on NixOS in tests that build a system image. It bisected to the patch that enabled buffered writes without taking the inode lock:

https://evilpiepirate.org/git/bcachefs.git/commit/?id=7e64c86cdc6c

It appears that dirty folios are somehow being dropped: corrupt files, when checked against good copies, have ranges of 0s that are 4k aligned (modulo a 2k offset, likely from a misaligned partition).

Interestingly, it only triggers under QEMU. The test fails pretty consistently there, and we have a lot of NixOS users, so we'd notice (via nix store verifies) if the corruption were more widespread. We believe it only triggers with QEMU's snapshot mode (but don't quote me on that).

Further digging implicates CONFIG_COMPACTION or CONFIG_MIGRATION: testing with COMPACTION=n, MIGRATION=n and TRANSPARENT_HUGEPAGE=y passes reliably.

On the bcachefs side, I've been testing with that patch reduced to just "don't take the inode lock if not extending", i.e. killing the fancy stuff that preserves write atomicity. It really does appear to be "don't take inode lock -> dirty folios get dropped".

It's not a race with truncate or anything silly like that; bcachefs has the pagecache add lock, which serves here for locking vs. truncate.

So, this is a real head scratcher. The inode lock really doesn't do much in IO paths: it's there for synchronization with truncate and for write vs. write atomicity, and the mm paths know nothing about it. Page fault/mkwrite paths don't take it at all, and a buffered non-extending write should be able to work similarly: the folio lock should be entirely sufficient here.

Anyone got any bright ideas?