From: Christian Brauner
To: lsf-pc@lists.linux-foundation.org
Cc: Christian Brauner, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org,
    Matthew Wilcox, Jan Kara, Christoph Hellwig
Subject: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs
Date: Tue, 16 Jan 2024 11:50:32 +0100
Message-ID: <20240116-tagelang-zugnummer-349edd1b5792@brauner>

Hey,

I'm not sure this even needs a full LSFMM discussion but since I
currently don't have time to work on the patch I may as well submit it.

Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
STF was created by the German government to fund public infrastructure:

"The Sovereign Tech Fund supports the development, improvement and
 maintenance of open digital infrastructure. Our goal is to sustainably
 strengthen the open source ecosystem. We focus on security, resilience,
 technological diversity, and the people behind the code." (cf. [1])

Gnome has proposed various specific projects including integrating
systemd-homed with Gnome. Systemd-homed provides various features and if
you're interested in details then you might find it useful to read [2].
It makes use of various new VFS and fs-specific developments from the
last few years.

One feature is encrypting the home directory via LUKS. An appropriate
image or device must contain a GPT partition table. Currently there's
only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
a Linux filesystem. Currently supported are btrfs (see [4] though),
ext4, and xfs.

The following issue isn't specific to systemd-homed. Gnome wants to be
able to support locking encrypted home directories, for example when
the laptop is suspended. To do this the luksSuspend command can be used.

The luksSuspend call is nothing else than a device mapper ioctl to
suspend the block device and its owning superblock/filesystem, which in
turn is nothing but a freeze initiated from the block layer:

dm_suspend()
-> __dm_suspend()
   -> lock_fs()
      -> bdev_freeze()

So when we say luksSuspend we really mean block layer initiated freeze.
The overall goal or expectation of userspace is that after a luksSuspend
call all sensitive material has been evicted from relevant caches to
harden against various attacks. And luksSuspend does wipe the encryption
key and suspend the block device. However, the encryption key can still
be available in clear text in the page cache. To illustrate this problem
more simply:

        truncate -s 500M /tmp/img
        echo password | cryptsetup luksFormat /tmp/img --force-password
        echo password | cryptsetup open /tmp/img test
        mkfs.xfs /dev/mapper/test
        mount /dev/mapper/test /mnt
        echo "secrets" > /mnt/data
        cryptsetup luksSuspend test
        cat /mnt/data

This will still happily print the contents of /mnt/data even though the
block device and the owning filesystem are frozen because the data is
still in the page cache.

To my knowledge, the only current way to get the contents of /mnt/data
or the encryption key out of the page cache is via
/proc/sys/vm/drop_caches which is a big hammer because it drops caches
system-wide rather than for one filesystem.

My initial reaction is to give userspace an API to drop the page cache
of a specific filesystem which may have additional uses. I had initially
started drafting an ioctl() and then got swayed towards a
posix_fadvise() flag. I found out that this was already proposed a few
years ago but got rejected as it was suspected this might just be
someone toying around without a real-world use-case. I think this here
might qualify as a real-world use-case.

This may at least help secure users with a regular dm-crypt setup
where dm-crypt is the top layer. Users that stack additional layers on
top of dm-crypt may still leak plaintext of course if they introduce
additional caching. But that's on them.

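For comparison, here is roughly what userspace can approximate today
without drop_caches. This is only a sketch under stated assumptions and
not the API proposed above: it walks a mount point and issues the
existing per-file posix_fadvise(POSIX_FADV_DONTNEED) hint for every
regular file on the same filesystem. The helper name drop_file_cache()
is made up for illustration. POSIX_FADV_DONTNEED leaves dirty pages
alone, hence the fsync() before it.

```python
import os
import stat


def drop_file_cache(root):
    """Hint the kernel to drop cached pages of regular files under root.

    Hypothetical helper, for illustration only. Stays on the filesystem
    that contains root (compares st_dev) and returns the number of files
    for which the advice call succeeded.
    """
    root_dev = os.stat(root).st_dev
    advised = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on another mounted filesystem.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                # Skip symlinks, FIFOs, device nodes and foreign mounts.
                if not stat.S_ISREG(st.st_mode) or st.st_dev != root_dev:
                    continue
                fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
            except OSError:
                continue  # unreadable or vanished in the meantime
            try:
                os.fsync(fd)  # dirty pages are not dropped by the hint
                # offset 0 with length 0 covers the whole file
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                advised += 1
            except OSError:
                pass
            finally:
                os.close(fd)
    return advised
```

Calling drop_file_cache("/mnt") in the example above would cover
/mnt/data, but the call is only a hint, pages that are currently mapped
may stay cached, and it only sees files the caller can reach and open.
It does nothing for filesystem metadata or for anything cached below
the filesystem, which is why a per-filesystem kernel interface still
looks like the better tool.
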
Of course other ideas are welcome.

[1]: https://www.sovereigntechfund.de/en
[2]: https://systemd.io/HOME_DIRECTORY
[3]: https://lore.kernel.org/linux-btrfs/20230908-merklich-bebauen-11914a630db4@brauner/
[4]: A bdev_freeze() call ideally does the following:

     (1) Freeze the block device @bdev
     (2) Find the owning superblock of the block device @bdev and freeze
         the filesystem as well.

     Especially (2) wasn't true for a long time. A filesystem would only
     be frozen if the request came in via its main block device. For
     example, an xfs filesystem using an external log device would not
     be frozen if the block layer request came via the external log
     device. This has been fixed since v6.8 for all filesystems using
     appropriate holder operations.

     Except for btrfs where block device initiated freezes don't work at
     all; not even for the main block device. I pointed this out months
     ago in [3].

     Which is why we currently can't use btrfs with LUKS2 encryption as
     a luksSuspend call will leave the filesystem unfrozen.
[5]: https://gitlab.com/cryptsetup/cryptsetup/-/issues/855
     https://gitlab.gnome.org/Teams/STF/homed/-/issues/23