Date: Fri, 2 Feb 2024 19:22:11 +0000
From: Matthew Wilcox <willy@infradead.org>
To: David Howells
Cc: lsf-pc@lists.linux-foundation.org, netfs@lists.linux.dev,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
References: <2701740.1706864989@warthog.procyon.org.uk>
	<2761655.1706889464@warthog.procyon.org.uk>
In-Reply-To: <2761655.1706889464@warthog.procyon.org.uk>

On Fri, Feb 02, 2024 at 03:57:44PM +0000, David Howells wrote:
> Matthew Wilcox wrote:
>
> > So my modest proposal is that we completely rearchitect how we handle
> > swap.  Instead of putting swp entries in the page tables (and in shmem's
> > case in the page cache), we turn swap into an (object, offset) lookup
> > (just like a filesystem).  That means that each anon_vma becomes its
> > own swap object and each shmem inode becomes its own swap object.
> > The swap system can then borrow techniques from whichever filesystem
> > it likes to do (object, offset, length) -> n x (device, block) mappings.
>
> That's basically what I'm suggesting, I think, but offloading the mechanics
> down to a filesystem.
> That would be fine with me.  bcachefs is a {key,val} store right?

Hmm.  That's not a bad idea.  So instead of having a swapfile, we could
create a swap directory on an existing filesystem.  Or if we want to
partition the drive and have a swap partition, we just mkfs.favourite
that and tell it that root is the swap directory.

I think this means we do away with the swap cache?  If the page has been
brought back in, we'd be able to find it in the anon_vma's page cache
rather than having to search the global swap cache.

> > I think my proposal above works for you?  For each file you want to
> > cache, create a swap object, and then tell swap when you want to
> > read/write to the local swap object.  What you do need is to persist
> > the objects over a power cycle.  That shouldn't be too hard ... after
> > all, filesystems manage to do it.
>
> Sure - but there is an integrity constraint that doesn't exist with swap.
>
> There is also an additional feature of fscache: unless the cache entry is
> locked in the cache (e.g. we're doing disconnected operation), we can
> throw away an object from fscache and recycle it if we need space.  In
> fact, this is the way OpenAFS works: every write transaction done on a
> file/dir on the server is done atomically and is given a monotonically
> increasing data version number that is then used as part of the index
> key in the cache.  So old versions of the data get recycled as the
> cache needs to make space.
>
> Which also means that if swap needs more space, it can just kick stuff
> out of fscache if it is not locked in.

Ah, more requirements ;-)

> > All we need to do is figure out how to name the lookup (I don't think
> > we need to use strings to name the swap object, but obviously we
> > could).  Maybe it's just a stream of bytes.
>
> A binary blob would probably be better.
> I would use a separate index to map higher level organisations, such as
> cell+volume in afs or the server address + share name in cifs, to an
> index number that can be used in the cache.
>
> Further, I could do with a way to invalidate all objects matching a
> particular subkey.

That seems to map to a directory hierarchy?  So, named swap objects for
fscache; anonymous ones for anon memory?