From: Chris Li
Date: Thu, 22 Feb 2024 14:45:30 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
To: David Howells
Cc: lsf-pc@lists.linux-foundation.org, Matthew Wilcox, netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <2701740.1706864989@warthog.procyon.org.uk>

Hi David,

On Fri, Feb 2, 2024 at 1:10 AM David Howells wrote:
>
> Hi,
>
> The topic came up in a recent discussion about how to deal with large folios
> when it comes to swap as a swap device is normally considered a simple array
> of PAGE_SIZE-sized elements that can be indexed by a single integer.

Sorry for being late to the party. I think I was the one who brought
this topic up in the online discussion with Will and you. Let me know
if you are referring to a different discussion.

>
> With the advent of large folios, however, we might need to change this in
> order to be better able to swap out a compound page efficiently.  Swap
> fragmentation raises its head, as does the need to potentially save multiple
> indices per folio.  Does swap need to grow more filesystem features?

Yes. With large folios, it is harder to allocate contiguous swap entries
when 4K swap entries are being allocated and freed all the time.
Fragmentation will likely leave the swap file with very few runs of
contiguous swap entries.

We can change that assumption and allow a large folio to be read from and
written to discontiguous blocks at the block device level. We will likely
need a filesystem-like indirection layer to store the locations of
those blocks. In other words, the folio needs to read/write a list of
io vectors, not just one block.

>
> Further to this, we have at least two ways to cache data on disk/flash/etc. -
> swap and fscache - and both want to set aside disk space for their operation.
> Might it be possible to combine the two?
>
> One thing I want to look at for fscache is the possibility of switching from a
> file-per-object-based approach to a tagged cache more akin to the way OpenAFS
> does things.  In OpenAFS, you have a whole bunch of small files, each
> containing a single block (e.g. 256K) of data, and an index that maps a
> particular {volume,file,version,block} to one of these files in the cache.
>
> Now, I could also consider holding all the data blocks in a single file (or
> blockdev) - and this might work for swap.  For fscache, I do, however, need to
> have some sort of integrity across reboots that swap does not require.

The main trade-off is the memory usage for the metadata and the latency of
reading and writing. A file system typically has a different IO pattern
than swap: file reads can be batched and have good locality, whereas
swap does a lot of reads and writes at random locations.

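To make the fragmentation point above a bit more concrete, here is a toy
sketch in Python. It is not kernel code and does not reflect how the swap
allocator is actually implemented. SwapMap, Extent, alloc_contiguous and
alloc_extents are hypothetical names made up for this example, and the
16-slot area with every other slot in use is an assumed worst case.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int   # first swap slot of the run
    length: int  # number of consecutive slots in the run


class SwapMap:
    """Toy slot allocator (hypothetical): one flag per PAGE_SIZE-sized slot."""

    def __init__(self, nr_slots):
        self.used = [False] * nr_slots

    def alloc_contiguous(self, nr):
        """Return the start of a free run of nr slots, or None if none exists."""
        run = 0
        for i, busy in enumerate(self.used):
            run = 0 if busy else run + 1
            if run == nr:
                start = i - nr + 1
                for s in range(start, i + 1):
                    self.used[s] = True
                return start
        return None

    def alloc_extents(self, nr):
        """Take any nr free slots, merged into runs; None if too few are free."""
        free = [i for i, busy in enumerate(self.used) if not busy]
        if len(free) < nr:
            return None
        extents = []
        for slot in free[:nr]:
            self.used[slot] = True
            if extents and extents[-1].start + extents[-1].length == slot:
                extents[-1].length += 1   # slot extends the previous run
            else:
                extents.append(Extent(slot, 1))
        return extents


# Assumed scenario: a 16-slot swap area where every other slot holds a 4K page.
swap = SwapMap(16)
for slot in range(0, 16, 2):
    swap.used[slot] = True

# A 4-page folio cannot get 4 contiguous slots even though 8 slots are free.
contiguous = swap.alloc_contiguous(4)

# It can still be placed as a list of extents, one IO segment per extent.
extents = swap.alloc_extents(4)
```

In this scenario the contiguous allocation fails and the folio ends up
spread over four single-slot extents. That extent list is exactly the kind
of per-folio metadata that has to be stored somewhere and looked up again
at swap-in time.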

The current swap uses array-like swap entries. One advantage of that is
that only one IO is required per folio. Performance gets worse
when swap needs to read the metadata first to locate the block and then
read the block of data in: page fault latency will get longer. That is one
of the trade-offs we need to consider.

Chris