From: David Howells
To: Matthew Wilcox
Cc: dhowells@redhat.com, lsf-pc@lists.linux-foundation.org,
    netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
Date: Fri, 02 Feb 2024 15:57:44 +0000
Message-ID: <2761655.1706889464@warthog.procyon.org.uk>
References: <2701740.1706864989@warthog.procyon.org.uk>

Matthew Wilcox wrote:

> So my modest proposal is that we completely rearchitect how we handle
> swap.  Instead of putting swp entries in the page tables (and in shmem's
> case in the page cache), we turn swap into an (object, offset) lookup
> (just like a filesystem).  That means that each anon_vma becomes its
> own swap object and each shmem inode becomes its own swap object.
> The swap system can then borrow techniques from whichever filesystem
> it likes to do (object, offset, length) -> n x (device, block) mappings.

That's basically what I'm suggesting, I think, but offloading the mechanics
down to a filesystem.  That would be fine with me.  bcachefs is a {key,val}
store, right?

> > Further to this, we have at least two ways to cache data on
> > disk/flash/etc. - swap and fscache - and both want to set aside disk space
> > for their operation.  Might it be possible to combine the two?
> >
> > One thing I want to look at for fscache is the possibility of switching
> > from a file-per-object-based approach to a tagged cache more akin to the
> > way OpenAFS does things.  In OpenAFS, you have a whole bunch of small
> > files, each containing a single block (e.g. 256K) of data, and an index
> > that maps a particular {volume,file,version,block} to one of these files
> > in the cache.
>
> I think my proposal above works for you?  For each file you want to cache,
> create a swap object, and then tell swap when you want to read/write to
> the local swap object.  What you do need is to persist the objects over
> a power cycle.  That shouldn't be too hard ... after all, filesystems
> manage to do it.

Sure - but there is an integrity constraint that doesn't exist with swap.

There is also an additional feature of fscache: unless the cache entry is
locked in the cache (e.g. we're doing disconnected operation), we can throw
away an object from fscache and recycle it if we need space.  In fact, this is
the way OpenAFS works: every write transaction done on a file/dir on the
server is done atomically and is given a monotonically increasing data version
number that is then used as part of the index key in the cache.  So old
versions of the data get recycled as the cache needs to make space.

Which also means that if swap needs more space, it can just kick stuff out of
fscache if it is not locked in.

> All we need to do is figure out how to name the lookup (I don't think we
> need to use strings to name the swap object, but obviously we could).  Maybe
> it's just a stream of bytes.

A binary blob would probably be better.

I would use a separate index to map higher-level organisations, such as
cell+volume in afs or the server address + share name in cifs, to an index
number that can be used in the cache.

Further, I could do with a way to invalidate all objects matching a particular
subkey.

David
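
As a rough illustration of the tagged-cache idea discussed above, the following
Python sketch models a fixed pool of block-sized slots, an index keyed on
{volume,file,version,block}, recycling of entries that are not locked in, and
invalidation by subkey.  It is a toy in-memory model under stated assumptions,
not fscache or OpenAFS code: the names Key, TaggedCache, store(), lookup() and
invalidate() are all hypothetical, the least-recently-used recycling order is
an assumption, and so is the choice to let invalidation override a lock.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class Key(NamedTuple):
    """Hypothetical index key modelled on {volume,file,version,block}."""
    volume: int
    file: int
    version: int   # data version; a new version means a new key
    block: int


class TaggedCache:
    """Toy model: a fixed pool of slots plus an index mapping keys to slots."""

    def __init__(self, nr_slots: int) -> None:
        self.free = list(range(nr_slots))  # slot numbers not currently in use
        self.index = OrderedDict()         # Key -> slot, least recently used first
        self.locked = set()                # keys that must not be recycled
        self.data = {}                     # slot -> cached block contents

    def _drop(self, key: Key) -> None:
        """Remove one entry and return its slot to the free pool."""
        slot = self.index.pop(key)
        self.locked.discard(key)
        del self.data[slot]
        self.free.append(slot)

    def _evict_one(self) -> bool:
        """Recycle the least recently used entry that is not locked in."""
        for key in self.index:
            if key not in self.locked:
                self._drop(key)
                return True
        return False                       # everything is locked in

    def lookup(self, key: Key) -> Optional[bytes]:
        slot = self.index.get(key)
        if slot is None:
            return None
        self.index.move_to_end(key)        # mark as recently used
        return self.data[slot]

    def store(self, key: Key, blob: bytes, lock: bool = False) -> bool:
        if key in self.index:
            slot = self.index[key]
            self.index.move_to_end(key)
        else:
            if not self.free and not self._evict_one():
                return False               # no space and nothing can be recycled
            slot = self.free.pop()
            self.index[key] = slot
        self.data[slot] = blob
        if lock:
            self.locked.add(key)
        return True

    def invalidate(self, volume: int, file: Optional[int] = None) -> int:
        """Drop every entry matching a subkey: a volume, or a volume+file."""
        victims = [k for k in self.index
                   if k.volume == volume and (file is None or k.file == file)]
        for k in victims:
            self._drop(k)                  # assumption: invalidation beats a lock
        return len(victims)


cache = TaggedCache(nr_slots=2)
cache.store(Key(1, 10, 1, 0), b"old")             # version 1 of a block
cache.store(Key(1, 10, 2, 0), b"new", lock=True)  # version 2, locked in
cache.store(Key(2, 20, 1, 0), b"other")           # no free slot: recycles version 1
```

Because the data version is part of the key, a stale version is never
overwritten in place; it simply stops being looked up and is recycled when
space runs short.  A consumer such as swap wanting more space would, in this
model, amount to calling the eviction path until enough slots are free.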