From: Andreas Dilger
Message-Id: <1B53E6AF-0EFA-4290-A4CF-CFA7F3BF0E51@dilger.ca>
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
Date: Thu, 22 Feb 2024 20:00:46 -0700
Cc: David Howells, lsf-pc@lists.linux-foundation.org, Matthew Wilcox, netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
To: Chris Li
References: <2701740.1706864989@warthog.procyon.org.uk>

On Feb 22, 2024, at 3:45 PM, Chris Li wrote:
>
> Hi David,
>
> On Fri, Feb 2, 2024 at 1:10 AM David Howells wrote:
>>
>> Hi,
>>
>> The topic came up in a recent discussion about how to deal with large folios
>> when it comes to swap as a swap device is normally considered a simple array
>> of PAGE_SIZE-sized elements that can be indexed by a single integer.
>
> Sorry for being late for the party. I think I was the one that brought
> this topic up in the online discussion with Will and You. Let me know
> if you are referring to a different discussion.
>
>>
>> With the advent of large folios, however, we might need to change this in
>> order to be better able to swap out a compound page efficiently. Swap
>> fragmentation raises its head, as does the need to potentially save multiple
>> indices per folio. Does swap need to grow more filesystem features?
>
> Yes, with a large folio, it is harder to allocate continuous swap
> entries where 4K swap entries are allocated and free all the time. The
> fragmentation will likely make the swap file have very little
> continuous swap entries.

One option would be to reuse the multi-block allocator (mballoc) from
ext4, which has quite efficient power-of-two buddy allocation. That
would naturally aggregate contiguous pages as they are freed. Since the
swap partition does not contain anything useful across a remount, there
is no need to save allocation bitmaps persistently.

Cheers, Andreas

> We can change that assumption, allow large folio reading and writing
> of discontinued blocks on the block device level. We will likely need
> a file system like kind of the indirection layer to store the location
> of those blocks. In other words, the folio needs to read/write a list
> of io vectors, not just one block.
>
>>
>> Further to this, we have at least two ways to cache data on disk/flash/etc. -
>> swap and fscache - and both want to set aside disk space for their operation.
>> Might it be possible to combine the two?
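The suggestion above, reusing power-of-two buddy allocation for swap slots, can be illustrated with a minimal, generic buddy allocator. This is a textbook sketch, not ext4's mballoc code; the class and method names (`SwapBuddyAllocator`, `alloc`, `free`) are hypothetical, and it assumes a swap area of exactly 2**max_order slots. All state lives in memory, matching the point that swap needs no persistent allocation bitmaps, and freed single-page slots merge back into contiguous runs that a large folio can reuse.

```python
class SwapBuddyAllocator:
    """Hypothetical power-of-two buddy allocator over swap slots.

    Assumes the swap area holds exactly 2**max_order slots. State is kept
    only in memory, since swap contents need not survive a restart.
    """

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start offsets of free runs of 2**k slots.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)

    def alloc(self, order):
        """Return the first slot of a contiguous 2**order run, or None."""
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                start = min(self.free_lists[k])  # lowest offset first
                self.free_lists[k].remove(start)
                # Split the larger run, returning each upper half to a free list.
                while k > order:
                    k -= 1
                    self.free_lists[k].add(start + (1 << k))
                return start
        return None

    def free(self, start, order):
        """Free a 2**order run, merging with its buddy while the buddy is free."""
        while order < self.max_order:
            buddy = start ^ (1 << order)  # runs are aligned to their size
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)
```

For example, with a 16-slot area, four order-0 allocations take slots 0 to 3; once all four are freed they coalesce into one free 4-slot run at offset 0, so a later order-2 request (a 4-page folio) gets contiguous slots again. A real allocator would also need locking, double-free checks and a fallback policy when no run of the requested order is free.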
>>
>> One thing I want to look at for fscache is the possibility of switching from a
>> file-per-object-based approach to a tagged cache more akin to the way OpenAFS
>> does things. In OpenAFS, you have a whole bunch of small files, each
>> containing a single block (e.g. 256K) of data, and an index that maps a
>> particular {volume,file,version,block} to one of these files in the cache.
>>
>> Now, I could also consider holding all the data blocks in a single file (or
>> blockdev) - and this might work for swap. For fscache, I do, however, need to
>> have some sort of integrity across reboots that swap does not require.
>
> The main trade off is the memory usage for the meta data and latency
> of reading and writing.
> The file system has typically a different IO pattern than swap, e.g.
> file reads can be batched and have good locality.
> Where swap is a lot of random location read/write.
>
> Current swap using array like swap entry, one of the pros of that is
> just one IO required for one folio.
> The performance gets worse when swap needs to read the metadata first
> to locate the block, then read the block of data in.
> Page fault latency will get longer. That is one of the trade-offs we
> need to consider.
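The latency trade-off in the paragraph above can be made concrete with a toy model of a single folio swap-in, comparing today's array-style layout with a hypothetical indirection layer that maps a folio to a list of discontiguous extents. Everything here (`Extent`, `direct_swap_in`, `indirect_swap_in`, the `folio_map` dictionary) is an illustrative assumption, not kernel API. Each returned phase is a group of requests that can be in flight together; a later phase cannot start until the earlier one completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A run of contiguous blocks on the swap device (hypothetical)."""
    start_block: int
    nr_blocks: int


def direct_swap_in(swap_offset, nr_pages):
    """Array-style swap: the folio's slots are contiguous, so one request covers it."""
    return [[("data", swap_offset, nr_pages)]]


def indirect_swap_in(folio_map, folio_id, metadata_cached):
    """Hypothetical indirection layer: find the folio's extents, then read them.

    Returns a list of phases; each phase is a list of (kind, start, length)
    requests that can be issued together once the previous phase is done.
    """
    phases = []
    if not metadata_cached:
        # Block locations are unknown until this metadata read completes.
        phases.append([("metadata", folio_id, 1)])
    # Discontiguous extents need separate requests, but can be issued together.
    phases.append([("data", e.start_block, e.nr_blocks)
                   for e in folio_map[folio_id]])
    return phases
```

With the extent map resident in memory, the indirect case needs a single phase (though more than one request); on a metadata miss it adds a dependent read before any data can be fetched, which is where the extra page-fault latency comes from. Keeping the map resident avoids that read at the cost of the metadata memory usage also mentioned above.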
>
> Chris
>