From: Amir Goldstein
Date: Wed, 15 Jan 2025 15:21:52 +0100
Subject: Re: [LSF/MM/BPF TOPIC] Predictive readahead of dentries
To: Shyam Prasad N
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel, linux-mm@kvack.org, brauner@kernel.org, Matthew Wilcox, David Howells, Jeff Layton, Steve French, trondmy@kernel.org, Shyam Prasad N

On Wed, Jan 15, 2025 at 12:27 PM Shyam Prasad N wrote:
>
> On Tue, Jan 14, 2025 at 6:55 PM Amir Goldstein wrote:
> >
> > On Tue, Jan 14, 2025 at 4:38 AM Shyam Prasad N wrote:
> > >
> > > The Linux kernel does buffered reads and writes using the page cache
> > > layer, where the filesystem reads and writes are offloaded to the
> > > VM/MM layer.
> > > The VM layer does a predictive readahead of data by
> > > optionally asking the filesystem to read more data asynchronously than
> > > what was requested.
> > >
> > > The VFS layer maintains a dentry cache which gets populated during
> > > access of dentries (either during readdir/getdents or during lookup).
> > > These dentries within a directory actually form the address space for
> > > the directory, which is read sequentially during getdents. For network
> > > filesystems, the dentries are also looked up during revalidate.
> > >
> > > During sequential getdents, it makes sense to perform a readahead
> > > similar to file reads. Even for revalidations and dentry lookups,
> > > heuristics can be maintained to detect whether the lookups within
> > > the directory are sequential in nature. With this, the dentry cache
> > > can be pre-populated for a directory even before the dentries are
> > > accessed, thereby boosting performance. This could give even more
> > > benefit for network filesystems by avoiding costly round trips to
> > > the server.
> >
> > I believe you are referring to READDIRPLUS, which is quite common
> > for network protocols and also supported by FUSE.
> This discussion is not completely about readdirplus, but it is definitely
> a part of it.
> I'm suggesting doing the next set of readdir() calls in advance, so
> that the data needed to serve them is already in the cache.
> I'm also suggesting artificially doing a readdir to avoid sequential
> revalidation of each dentry, or a readdirplus to avoid a stat of each
> inode corresponding to these dentries.

Well, if readdirplus is implemented, then "readaheadplus" could be
implemented by async io_uring readdirplus commands. Right?

The io_uring command would have to know to chain each following
readdirplus command with the offset returned in the previous readdirplus
response, but that should be doable, I think?
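To make the offset chaining concrete, here is a minimal user-space sketch. It assumes Linux and substitutes plain synchronous getdents64(2) calls for the async io_uring readdirplus command discussed above, which is hypothetical here. Each batch is issued as an independent request on a fresh fd that starts from the d_off cookie of the last entry returned by the previous batch. read_batch() and count_entries_chained() are illustrative names, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout returned by the getdents64(2) system call. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;    /* opaque cookie: where the next entry starts */
    unsigned short d_reclen; /* size of this record, 8-byte aligned */
    unsigned char  d_type;
    char           d_name[];
};

/*
 * One "request": read a batch of entries starting at directory position
 * 'cookie'. A fresh fd is used on purpose, so the only state carried from
 * one batch to the next is the cookie itself (stand-in for the offset an
 * async readdirplus command would be chained with; an assumption).
 * Returns the number of entries read (0 at end of directory, -1 on error)
 * and stores the resume cookie for the following batch in *next.
 */
int read_batch(const char *path, int64_t cookie, int64_t *next)
{
    _Alignas(8) char buf[512]; /* small on purpose, to force several batches */
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    if (lseek(fd, (off_t)cookie, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
    close(fd);
    if (n <= 0)
        return (int)n;

    int count = 0;
    for (long pos = 0; pos < n; count++) {
        struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
        assert(d->d_reclen > 0);   /* guard against looping forever */
        *next = d->d_off;          /* last record's d_off = resume point */
        pos += d->d_reclen;
    }
    return count;
}

/*
 * Walk the whole directory as a chain of independent batch requests, each
 * starting from the cookie returned by the previous one. Returns the total
 * number of entries (including "." and "..") or -1 on error.
 */
int count_entries_chained(const char *path, int *batches)
{
    int64_t cookie = 0, next = 0;
    int total = 0, n;

    *batches = 0;
    while ((n = read_batch(path, cookie, &next)) > 0) {
        total += n;
        (*batches)++;
        cookie = next;  /* chain: next request resumes where this one ended */
    }
    return n < 0 ? -1 : total;
}
```

For example, count_entries_chained("/tmp", &batches) walks /tmp in 512-byte batches. An async "readaheadplus" would presumably submit batch N+1 as soon as the cookie from batch N is known, instead of waiting for the application's next getdents() call.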
> >
> > Unlike network protocols, FUSE decides by server configuration and
> > heuristics whether to "fuse_use_readdirplus". Specifically, in readdirplus_auto
> > mode, FUSE starts with readdirplus, but if nothing calls lookup on the
> > directory inode by the time of the next getdents call, it stops using readdirplus.
> >
> > I personally ran into the problem that I would like the application, which
> > knows whether it is doing "ls" or "ls -l", to control whether a specific
> > getdents() will use FUSE readdirplus or not, because in situations
> > where "ls -l" is not needed, that can avoid a lot of unneeded IO.
> >
> > I do not know if implementing readdirplus (i.e. populating the inode and dentry)
> > makes sense for disk filesystems, but if we do it at the VFS level, there has to
> > be an API to control, or at least opt out of, readdirplus, like with readahead.
> That would be a great knob to have for network filesystems. We have to
> rely on heuristics today to predict which of these patterns the
> workload is using.
>

It seems like the demand has existed for a long time.
The man page for posix_fadvise(2) says:
"Programs can use posix_fadvise() to announce an intention to access file data
in a specific pattern in the future, thus allowing the kernel to perform
appropriate optimizations."

I do not read this as limited to non-directory files, and indeed fadvise()
can be called on directories, but others could argue that this is an API abuse.

Mind sending a patch for POSIX_FADV_{NO,}READDIRPLUS?
Make sure it fails with -ENOTDIR on non-directories, and be ready to face the
inevitable bikeshedding ;)

Thanks,
Amir.
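A sketch of the argument check requested above, modelled in user space. POSIX_FADV_READDIRPLUS and POSIX_FADV_NOREADDIRPLUS do not exist today, so the HYPO_FADV_* values and check_readdirplus_advice() are hypothetical placeholders; a real patch would perform this check inside the kernel's fadvise path. advise_dir_sequential() uses only the existing posix_fadvise() API to illustrate that a directory fd is already accepted; that this succeeds on every filesystem is an assumption worth verifying.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * HYPOTHETICAL: POSIX_FADV_READDIRPLUS / POSIX_FADV_NOREADDIRPLUS do not
 * exist. These placeholder values were picked only so that they do not
 * collide with the real POSIX_FADV_* constants.
 */
enum {
    HYPO_FADV_READDIRPLUS   = 100,
    HYPO_FADV_NOREADDIRPLUS = 101,
};

/*
 * Hypothetical model of the check a kernel patch would need: accept only
 * the two readdirplus hints, and only on directories. Errors are returned
 * kernel-style, as negative errno values.
 */
int check_readdirplus_advice(int fd, int advice)
{
    struct stat st;

    if (advice != HYPO_FADV_READDIRPLUS && advice != HYPO_FADV_NOREADDIRPLUS)
        return -EINVAL;
    if (fstat(fd, &st) < 0)
        return -errno;
    if (!S_ISDIR(st.st_mode))
        return -ENOTDIR;   /* the behaviour the proposal asks for */
    return 0;
}

/*
 * Existing API: posix_fadvise() on a directory fd. Note that it returns
 * an error number directly instead of setting errno.
 */
int advise_dir_sequential(int dirfd)
{
    return posix_fadvise(dirfd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

With these definitions, check_readdirplus_advice() returns -ENOTDIR for a regular-file descriptor (for example one from mkstemp()) and 0 for a directory descriptor.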