Date: Wed, 17 Jan 2024 14:19:43 +0100
From: Christian Brauner
To: Dave Chinner
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org, Matthew Wilcox, Jan Kara, Christoph Hellwig, adrianvovk@gmail.com
Subject: Re: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs
Message-ID: <20240117-yuppie-unflexibel-dbbb281cb948@brauner>
References: <20240116-tagelang-zugnummer-349edd1b5792@brauner>

On Wed, Jan 17, 2024 at 07:56:01AM +1100, Dave Chinner wrote:
> On Tue, Jan 16, 2024 at 11:50:32AM +0100, Christian Brauner wrote:
> > Hey,
> >
> > I'm not sure this even needs a full LSFMM discussion but since I
> > currently don't have time
> > to work on the patch I may as well submit it.
> >
> > Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
> > STF was created by the German government to fund public infrastructure:
> >
> > "The Sovereign Tech Fund supports the development, improvement and
> > maintenance of open digital infrastructure. Our goal is to sustainably
> > strengthen the open source ecosystem. We focus on security, resilience,
> > technological diversity, and the people behind the code." (cf. [1])
> >
> > Gnome has proposed various specific projects including integrating
> > systemd-homed with Gnome. Systemd-homed provides various features and if
> > you're interested in details then you might find it useful to read [2].
> > It makes use of various new VFS and fs specific developments over the
> > last years.
> >
> > One feature is encrypting the home directory via LUKS. An appropriate
> > image or device must contain a GPT partition table. Currently there's
> > only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
> > a Linux filesystem. Currently supported are btrfs (see [4] though),
> > ext4, and xfs.
> >
> > The following issue isn't specific to systemd-homed. Gnome wants to be
> > able to support locking encrypted home directories. For example, when
> > the laptop is suspended. To do this the luksSuspend command can be used.
> >
> > The luksSuspend call is nothing else than a device mapper ioctl to
> > suspend the block device and its owning superblock/filesystem. Which in
> > turn is nothing but a freeze initiated from the block layer:
> >
> > dm_suspend()
> > -> __dm_suspend()
> >    -> lock_fs()
> >       -> bdev_freeze()
> >
> > So when we say luksSuspend we really mean block layer initiated freeze.
> > The overall goal or expectation of userspace is that after a luksSuspend
> > call all sensitive material has been evicted from relevant caches to
> > harden against various attacks.
> > And luksSuspend does wipe the encryption
> > key and suspend the block device. However, the encryption key can still
> > be available clear-text in the page cache.
>
> The wiping of secrets is completely orthogonal to the freezing of
> the device and filesystem - the freeze does not need to occur to
> allow the encryption keys and decrypted data to be purged. They
> should not be conflated; purging needs to be a completely separate
> operation that can be run regardless of device/fs freeze status.

Yes, I'm aware. I didn't mean to imply that these things are in any way
necessarily connected. Just that there are use-cases where they are. And
the encrypted home directory case is one. Once one has frozen the block
device and filesystem one would now also like to drop the page cache,
which has most of the interesting data.

The fact that after a block layer initiated freeze - again mostly a
device mapper problem - one may or may not be able to successfully read
from the filesystem is annoying. Of course one can't write, that will
hang one immediately. But if one still has some data in the page cache
one can still dump the contents of that file. That's at least odd
behavior from a user's POV even if for us it's clear why that's the case.

And a freeze does do a sync_filesystem() and a sync_blockdev() to flush
out any dirty data for that specific filesystem. So it would be fitting
to give users an api that allows them to also drop the page cache
contents.

For some use-cases like the Gnome use-case one wants to do a freeze and
drop everything that one can from the page cache for that specific
filesystem.

And drop_caches is a big hammer simply because there are workloads where
that isn't feasible. Even on a modern boring laptop system one may have
lots of services. On a large scale system one may have thousands of
services and they may all use separate images (and the border between
isolated services and containers is fuzzy at best).
And here invoking drop_caches penalizes every service.

One may want to drop the contents of _some_ services but not all of
them. Especially during suspend where one cares about dropping the page
cache of the home directory that gets suspended - encrypted or
unencrypted.

Ignoring the security aspect itself: having frozen the block device and
the owning filesystem, one may want to go and drop the page cache as
well without impacting every other filesystem on the system. Which may
be thousands. One doesn't want to penalize them all.

Ignoring the specific use-case I know that David has been interested in
a way to drop the page cache for afs. So this is not just for the home
directory case. I mostly wanted to make it clear that there are users of
an interface like this, even if it were just best effort.

>
> FWIW, focussing on purging the page cache omits the fact that
> having access to the directory structure is a problem - one can
> still retrieve other user information that is stored in metadata
> (e.g. xattrs) that isn't part of the page cache. Even the directory
> structure that is cached in dentries could reveal secrets someone
> wants to keep hidden (e.g. code names for operations/products).

Yes, of course but that's fine. The most sensitive data and the biggest
chunks of data will be the contents of files. We don't necessarily need
to cater to the paranoid with this.

>
> So if we want luksSuspend to actually protect user information when
> it runs, then it effectively needs to bring the filesystem right
> back to its "just mounted" state where the only thing in memory is
> the root directory dentry and inode and nothing else.

Yes, which we know isn't feasible.

>
> And, of course, this is largely impossible to do because anything
> with an open file on the filesystem will prevent this robust cache
> purge from occurring....
>
> Which brings us back to "best effort" only, and at this point we
> already have drop-caches....
>
> Mind you, I do wonder if drop caches is fast enough for this sort of
> use case. It is single threaded, and if the filesystem/system has
> millions of cached inodes it can take minutes to run. Unmount has
> the same problem - purging large dentry/inode caches takes a *lot*
> of CPU time and these operations are single threaded.
>
> So it may not be practical in the luks context to purge caches e.g.
> suspending a laptop shouldn't take minutes. However laptops are
> getting to the hundreds of GB of RAM these days and so they can
> cache millions of inodes, so cache purge runtime is definitely a
> consideration here.

I'm really trying to look for a practical api that doesn't require users
to drop the caches for every mounted image on the system.

FYI, I've tried to get some users to reply here so they could speak to
the fact that they don't expect this to be an optimal solution but none
of them know how to reply to lore mboxes so I can only relay
information.
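
For comparison, here is a rough sketch of what userspace can approximate
today without a per-filesystem interface: walk one directory tree and
issue posix_fadvise(POSIX_FADV_DONTNEED) on every regular file. This is
an illustration only and not something proposed in this thread. The
helper name drop_tree_page_cache is made up. It only asks the kernel to
drop file data; dentries, inodes and xattrs stay cached (the walk itself
populates them), the kernel may keep pages that are mapped or otherwise
in use, and the fsync() may block on a frozen filesystem, so it would
have to run before the freeze.

```python
import os


def drop_tree_page_cache(root):
    """Best effort: ask the kernel to drop cached file data below root.

    Returns the number of files the advice was successfully issued for.
    Hypothetical helper for illustration; not an existing kernel or
    systemd interface.
    """
    advised = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # O_NOFOLLOW: stay inside the tree, skip symlinks.
                # O_NONBLOCK: don't hang when opening a FIFO.
                fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
            except OSError:
                continue  # unreadable, vanished, or a symlink
            try:
                # DONTNEED does not discard dirty pages, so write them
                # back first.
                os.fsync(fd)
                # offset=0, len=0 means "the whole file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                advised += 1
            except OSError:
                pass  # e.g. FIFOs and sockets don't support this
            finally:
                os.close(fd)
    return advised
```

Something like drop_tree_page_cache("/home/alice") would touch only that
tree, which is the property drop_caches lacks. It is still a per-file
walk from userspace, though, so it shares the runtime concern raised
above for large trees.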