From: Chris Li
Date: Tue, 5 Mar 2024 13:58:00 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
To: Jared Hulbert
Cc: Chengming Zhou, Matthew Wilcox, Nhat Pham,
    lsf-pc@lists.linux-foundation.org, linux-mm, ryan.roberts@arm.com,
    David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chuanhua Han

On Tue, Mar 5, 2024 at 1:38 PM Jared Hulbert wrote:
>
> On Mon, Mar 4, 2024 at 11:49 PM Chris Li wrote:
> >
> > I have considered that as well, that is further than writing from one
> > swap device to another. The current swap device currently can't accept
> > write on non page aligned offset. If we allow byte aligned write out
> > size, the whole swap entry offset stuff needs some heavy changes.
> >
> > If we write out 4K pages, and the compression ratio is lower than 50%,
> > it means a combination of two compressed pages can't fit into one
> > page. Which means some of the page read back will need to overflow
> > into another page. We kind of need a small file system to keep track
> > of how the compressed data is stored, because it is not page aligned
> > size any more.
> >
> > We can write out zsmalloc blocks of data as it is, however there is no
> > guarantee the data in zsmalloc blocks have the same LRU order.
> >
> > It makes more sense when writing higher order > 0 swap pages. e.g
> > writing 64K pages in one buffer, then we can write out compressed data
> > as page boundary aligned and page sizes, accepting the waste on the
> > last compressed page, might not fill up the whole page.
>
> A swap device not a device, until recently, it was a really bad
> filesystem with no abstractions between the block device and the
> filesystem. Zswap and zram are, in some respects, attempts to make
> specialized filesystems without any of the advantages of using the vfs
> tooling.
>
> What stops us from using an existing compressing filesystem?

The issue is that swap has a lot of usage patterns that a typical file
system does not. Please take a look at the current use cases of swap and
their related data structures, described at the beginning of this email
thread.

If you want to use an existing file system, you still need to bridge
the gap between the swap system and the file system. For example, cgroup
information is associated with each swap entry.

You can think of swap as a special file system that reads and writes 4K
objects by key. You could always use file system extended attributes to
track the additional information associated with each swap entry. At the
end of the day, though, with an existing file system the per swap entry
metadata overhead would likely be much higher than with the current swap
back end.

I understand that the current swap back end organizes its data around
the swap offset, which leaves the swap data spread across many different
places. That is one reason people might not like it. However, it does
have pretty minimal per swap entry memory overhead.

A file system can store its metadata on disk, reducing the in-memory
overhead. That has a price: when you swap in a page, you might need to
go through a few file system metadata reads before you can read in the
actual swap data.

>
> Crazy talk here.  What if we handled swap pages like they were mmap'd
> to a special swap "file(s)"?

That is already the case in the kernel: swap cache handling works the
same way as file cache handling, with a file offset. Some of the code
even shares the same underlying function, for example
filemap_get_folio().

Chris
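
P.S. To make the "special file system that reads and writes 4K objects
by key" picture a bit more concrete, here is a rough userspace sketch in
Python. It is a toy model only, not how the kernel swap back end is
implemented: ToySwapStore, its method names, and the 2-byte cgroup id
are hypothetical choices made for illustration. What it tries to show is
that when the key is an offset into flat arrays, the per-entry metadata
is a few bytes and a lookup needs no extra metadata reads.

```python
from array import array

PAGE_SIZE = 4096  # the 4K object size discussed above


class ToySwapStore:
    """Hypothetical toy model: fixed-size 4K slots addressed by an integer offset."""

    def __init__(self, nr_slots):
        self.nr_slots = nr_slots
        # Stands in for the backing device: one 4K slot per offset.
        self.data = bytearray(nr_slots * PAGE_SIZE)
        # Per-entry metadata lives in flat arrays indexed by the same offset.
        self.in_use = bytearray(nr_slots)           # 1 byte per entry
        self.cgroup_id = array("H", [0]) * nr_slots  # assumed 2-byte owner id

    def _check(self, offset):
        if not 0 <= offset < self.nr_slots:
            raise IndexError(offset)

    def write(self, offset, page, cgroup_id):
        """Store one whole 4K object under the given offset (the key)."""
        self._check(offset)
        if len(page) != PAGE_SIZE:
            raise ValueError("only whole 4K objects are accepted")
        start = offset * PAGE_SIZE
        self.data[start:start + PAGE_SIZE] = page
        self.in_use[offset] = 1
        self.cgroup_id[offset] = cgroup_id

    def read(self, offset):
        """Return (page, cgroup_id) for the offset, or raise KeyError if unused."""
        self._check(offset)
        if not self.in_use[offset]:
            raise KeyError(offset)
        start = offset * PAGE_SIZE
        return bytes(self.data[start:start + PAGE_SIZE]), self.cgroup_id[offset]

    def free(self, offset):
        """Release the slot and clear its metadata."""
        self._check(offset)
        self.in_use[offset] = 0
        self.cgroup_id[offset] = 0

    def metadata_bytes_per_entry(self):
        """In-memory metadata cost per entry in this toy model."""
        return 1 + self.cgroup_id.itemsize
```

In this model, writing a page with store.write(3, page, cgroup_id=7) and
reading it back with store.read(3) touches only slot 3 and its two
array entries. An existing file system would have to carry the same
per-entry information somewhere, for example in extended attributes,
which is where the extra overhead mentioned above would come from.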