Date: Mon, 11 Mar 2024 15:38:15 -0700
From: "Darrick J. Wong" <djwong@kernel.org>
To: Matthew Wilcox
Cc: Andrey Albershteyn, fsverity@lists.linux.dev, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, chandan.babu@oracle.com,
	akpm@linux-foundation.org, linux-mm@kvack.org, Eric Biggers
Subject: Re: [PATCH v5 06/24] fsverity: pass tree_blocksize to end_enable_verity()
Message-ID: <20240311223815.GW1927156@frogsfrogsfrogs>
References: <20240304191046.157464-2-aalbersh@redhat.com>
 <20240304191046.157464-8-aalbersh@redhat.com>
 <20240305005242.GE17145@sol.localdomain>
 <20240306163000.GP1927156@frogsfrogsfrogs>
 <20240307220224.GA1799@sol.localdomain>
 <20240308034650.GK1927156@frogsfrogsfrogs>
 <20240308044017.GC8111@sol.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240308044017.GC8111@sol.localdomain>

[add willy and linux-mm]

On Thu, Mar 07, 2024 at 08:40:17PM -0800, Eric Biggers wrote:
> On Thu, Mar 07, 2024 at 07:46:50PM -0800, Darrick J. Wong wrote:
> > > BTW, is xfs_repair planned to do anything about any such extra blocks?
> > 
> > Sorry to answer your question with a question, but how much checking is
> > $filesystem expected to do for merkle trees?
> > 
> > In theory xfs_repair could learn how to interpret the verity descriptor,
> > walk the merkle tree blocks, and even read the file data to confirm
> > intactness. If the descriptor specifies the highest block address then
> > we could certainly trim off excess blocks. But I don't know how much of
> > libfsverity actually lets you do that; I haven't looked into that
> > deeply. :/
> > 
> > For xfs_scrub I guess the job is theoretically simpler, since we only
> > need to stream reads of the verity files through the page cache and let
> > verity tell us if the file data are consistent.
> > 
> > For both tools, if something finds errors in the merkle tree structure
> > itself, do we turn off verity? Or do we do something nasty like
> > truncate the file?
> 
> As far as I know (I haven't been following btrfs-progs, but I'm familiar with
> e2fsprogs and f2fs-tools), there isn't yet any precedent for fsck actually
> validating the data of verity inodes against their Merkle trees.
> 
> e2fsck does delete the verity metadata of inodes that don't have the verity flag
> enabled. That handles cleaning up after a crash during FS_IOC_ENABLE_VERITY.
> 
> I suppose that ideally, if an inode's verity metadata is invalid, then fsck
> should delete that inode's verity metadata and remove the verity flag from the
> inode. Checking for a missing or obviously corrupt fsverity_descriptor would be
> fairly straightforward, but it probably wouldn't catch much compared to actually
> validating the data against the Merkle tree. And actually validating the data
> against the Merkle tree would be complex and expensive. Note, none of this
> would work on files that are encrypted.
> 
> Re: libfsverity, I think it would be possible to validate a Merkle tree using
> libfsverity_compute_digest() and the callbacks that it supports. But that's not
> quite what it was designed for.
> 
> > Is there an ioctl or something that allows userspace to validate an
> > entire file's contents? Sort of like what BLKVERIFY would have done for
> > block devices, except that we might believe its answers?
> 
> Just reading the whole file and seeing whether you get an error would do it.
> 
> Though if you want to make sure it's really re-reading the on-disk data, it's
> necessary to drop the file's pagecache first.

I tried a straight pagecache read and it worked like a charm! But then I
thought to myself, do I really want to waste memory bandwidth copying a bunch
of data? No. I don't even want to incur system call overhead from reading a
single byte every $pagesize bytes.

So I created 2M mmap areas and read a byte every $pagesize bytes. That worked
too, except that the SIGBUSes are annoying to handle, and I'd rather not take
signals like that.

Then I started looking at madvise. MADV_POPULATE_READ looked exactly like
what I wanted -- it prefaults in the pages, and "If populating fails, a
SIGBUS signal is not generated; instead, an error is returned."

But then I tried rigging up a test to see if I could catch an EIO, and
instead I had to SIGKILL the process! It looks like filemap_fault returns
VM_FAULT_RETRY to __xfs_filemap_fault, which propagates up through
__do_fault -> do_read_fault -> do_fault -> handle_pte_fault ->
handle_mm_fault -> faultin_page -> __get_user_pages.

In faultin_page, the VM_FAULT_RETRY is translated to -EBUSY, and
__get_user_pages squashes -EBUSY to 0, so faultin_vma_page_range returns
that to madvise_populate. Unfortunately, madvise_populate increments its
loop counter by the return value (still 0), so it runs in an infinite loop.
The only way out is SIGKILL.

So I don't know what the correct behavior is here, other than that the
infinite loop seems pretty suspect. Should madvise_populate return EIO if
__get_user_pages ever returns zero? That doesn't sound quite right either,
since a zero return could also happen when memory is tight.

I suppose filemap_fault could return VM_FAULT_SIGBUS in this one scenario so
that userspace would get -EFAULT. That would solve this one case of weird
behavior. But I think that doesn't happen in the page_not_uptodate case
because fpin is non-null?
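For concreteness, the 2M-window + MADV_POPULATE_READ probe described above
boils down to something like the sketch below. This is illustrative only,
not the actual test program: the function name, the window-size constant,
the MADV_POPULATE_READ fallback #define, and the error reporting are all
made up here.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
# define MADV_POPULATE_READ	22	/* Linux 5.14+; older libcs lack the define */
#endif

#define MAP_WINDOW		(2UL << 20)	/* 2M mmap windows, per the experiment */

/* Walk a verity file in 2M windows, prefaulting each one so that a
 * Merkle tree mismatch comes back as an madvise() error instead of a
 * SIGBUS on a later access. */
static int probe_verity_file(const char *path)
{
	struct stat sb;
	int ret = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &sb)) {
		close(fd);
		return -1;
	}

	for (off_t pos = 0; pos < sb.st_size; pos += MAP_WINDOW) {
		off_t remaining = sb.st_size - pos;
		size_t len = remaining > (off_t)MAP_WINDOW ?
				MAP_WINDOW : (size_t)remaining;
		void *p;

		p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, pos);
		if (p == MAP_FAILED) {
			ret = -1;
			break;
		}

		/* Fault in every page of the window up front. */
		if (madvise(p, len, MADV_POPULATE_READ)) {
			fprintf(stderr, "%s: bad data near offset %lld: %s\n",
				path, (long long)pos, strerror(errno));
			ret = -1;
		}

		munmap(p, len);
	}

	close(fd);
	return ret;
}

In principle each madvise() failure points at a window whose contents fail
verification; in practice, per the call-chain analysis above, the call
currently spins forever instead of returning an error when the fault path
reports the I/O problem.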
As for xfs_scrub validating data files, I suppose it's not /so/ terrible to
read one byte every $fsblocksize so that we can report exactly where
fsverity and the file data became inconsistent (a rough sketch of that probe
follows at the end of this mail). The POPULATE_READ interface doesn't tell
you how many pages it /did/ manage to load, so perhaps MADV_POPULATE_READ
isn't workable anyway.

(and now I'm just handwaving wildly about pagecache behaviors ;))

--D

> > Also -- inconsistencies between the file data and the merkle tree aren't
> > something that xfs can self-heal, right?
> 
> Similar to file data itself, the only way to self-heal would be via
> mechanisms that provide redundancy. There's been some interest in adding
> support for forward error correction (FEC) to fsverity similar to what
> dm-verity has, but this would be complex, and it's not something that
> anyone has gotten around to yet.
> 
> - Eric
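A minimal sketch of the byte-per-block probe floated above. Again, purely
illustrative: the function name, the fstatfs()-derived block size, the
posix_fadvise() cache drop, and the error reporting are assumptions, not
anything that exists in xfs_scrub today.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Read one byte per fs block so that the first failing pread() points at
 * the block where the data and the Merkle tree disagree. */
static int probe_by_block(const char *path)
{
	struct statfs sfs;
	struct stat sb;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &sb) || fstatfs(fd, &sfs)) {
		close(fd);
		return -1;
	}

	/* Best-effort attempt to drop cached (clean) pages so the reads
	 * re-verify on-disk data rather than whatever is already cached. */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	for (off_t pos = 0; pos < sb.st_size; pos += sfs.f_bsize) {
		char c;

		if (pread(fd, &c, 1, pos) < 0) {
			fprintf(stderr, "%s: inconsistent near offset %lld: %s\n",
				path, (long long)pos, strerror(errno));
			close(fd);
			return -1;
		}
	}

	close(fd);
	return 0;
}

As noted earlier in the thread, the pagecache has to be dropped (or
bypassed) for the reads to actually re-check on-disk data; POSIX_FADV_DONTNEED
only evicts clean pages, so it's a best-effort stand-in here.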