From: Chris Li
Date: Wed, 6 Mar 2024 10:16:12 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Jared Hulbert
Cc: linux-mm

On Wed, Mar 6, 2024 at 2:39 AM Jared Hulbert wrote:
>
> On Tue, Mar 5, 2024 at 9:51 PM Chris Li wrote:
> > If your file size is 4K each and you need to store millions of 4K
> > small files, reference it by an integer like filename. Typical file
> > systems like ext4, btrfs will definitely not be able to get 1% meta
> > data storage for that kind of usage. 1% of 4K is 40 bytes. Your
> > typical inode struct is much bigger than that. Last I checked, the
> > sizeof(struct inode) is 632.
>
> Okay that is an interesting difference in assumptions. I see no need
> to have file == page, I think that would be insane to have an inode
> per swap page. You'd have one big "file" and do offsets. Or a file
> per cgroup, etc.

Then you are back to designing your own data structure to manage how
to map the swap entry into large file offsets. The swap file is one
large file; it can group clusters as smaller large files internally.
Why not use the swap file directly? The VFS does not really help; it
is more of a burden to maintain all those super blocks, directories,
inodes, etc.

> Remember I'm advocating a subset of the VFS interface, learning from
> it not using it as is.

You can't really use a subset without having the other parts dragged
along. Most of the VFS operations, those op callback functions, do
not apply to swap directly anyway.
If you say VFS is just an inspiration, then that is more or less what
I had in mind earlier :-)

> >
> > > From a fundamental architecture standpoint it's not a stretch to think
> > > that a modified filesystem would be meet or beat existing swap engines
> > > on metadata overhead.
> >
> > Please show me one file system that can beat the existing swap system
> > in the swap specific usage case (load/store of individual 4K pages), I
> > am interested in learning.
>
> Well mind you I'm suggesting a modified filesystem and this is hard to
> compare apples to apples, but sure... here we go :)
>
> Consider an unmodified EXT4 vs ZRAM with a backing device of the same
> sizes, same hardware.
>
> Using the page cache as a bad proxy for RAM caching in the case of
> EXT4 and comparing to the ZRAM without sending anything to the backing
> store. The ZRAM is faster at reads while the EXT4 is a little faster
> at writes
>
>       | ZRAM     | EXT4     |
> -----------------------------
> read  | 4.4 GB/s | 2.5 GB/s |
> write | 643 MB/s | 658 MB/s |
>
> If you look at what happens when you talk about getting thing to and
> from the disk then while the ZRAM is a tiny bit faster at the reads
> but ZRAM is way slow at writes.
>
>       | ZRAM      | EXT4      |
> -------------------------------
> read  | 1.14 GB/s | 1.10 GB/s |
> write | 82.3 MB/s | 548 MB/s  |

I am more interested in the per swap entry memory overhead. Without
knowing how you map the swap entry into file reads/writes, I have no
idea how to interpret those numbers in the swap back end usage
context.

ZRAM is just a block device; it does not participate in how the swap
entry is allocated or freed. ZRAM does compression, which is CPU
intensive, while EXT4 doesn't, so it is understandable that ZRAM might
have lower write bandwidth. I am not sure how those numbers translate
into a prediction of how a file system based swap back end would
perform.

Regards,

Chris