From: Joanne Koong
Date: Thu, 16 Jan 2025 12:14:49 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Improving large folio writeback performance
To: Dave Chinner
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, "Matthew Wilcox (Oracle)"

On Tue, Jan 14, 2025 at 5:21 PM Dave Chinner wrote:
>
> On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> > Hi all,
> >
> > I would like to propose a discussion topic about improving large folio
> > writeback performance.
> > As more filesystems adopt large folios, it
> > becomes increasingly important that writeback is made to be as
> > performant as possible. There are two areas I'd like to discuss:
> >
> >
> > == Granularity of dirty pages writeback ==
> > Currently, the granularity of writeback is at the folio level. If one
> > byte in a folio is dirty, the entire folio will be written back. This
> > becomes unscalable for larger folios and significantly degrades
> > performance, especially for workloads that employ random writes.
>
> This sounds familiar, probably because we fixed this exact issue in
> the iomap infrastructure some while ago.
>
> commit 4ce02c67972211be488408c275c8fbf19faf29b3
> Author: Ritesh Harjani (IBM)
> Date:   Mon Jul 10 14:12:43 2023 -0700
>
>     iomap: Add per-block dirty state tracking to improve performance
>
>     When filesystem blocksize is less than folio size (either with
>     mapping_large_folio_support() or with blocksize < pagesize) and when the
>     folio is uptodate in pagecache, then even a byte write can cause
>     an entire folio to be written to disk during writeback. This happens
>     because we currently don't have a mechanism to track per-block dirty
>     state within struct iomap_folio_state. We currently only track uptodate
>     state.
>
>     This patch implements support for tracking per-block dirty state in
>     iomap_folio_state->state bitmap. This should help improve the filesystem
>     write performance and help reduce write amplification.
>
>     Performance testing of below fio workload reveals ~16x performance
>     improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
>     FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>
>     1.
>     [global]
>         ioengine=psync
>         rw=randwrite
>         overwrite=1
>         pre_read=1
>         direct=0
>         bs=4k
>         size=1G
>         dir=./
>         numjobs=8
>         fdatasync=1
>         runtime=60
>         iodepth=64
>         group_reporting=1
>
>     [fio-run]
>
>     2. Also our internal performance team reported that this patch improves
>        their database workload performance by around ~83% (with XFS on Power)
>
>     Reported-by: Aravinda Herle
>     Reported-by: Brian Foster
>     Signed-off-by: Ritesh Harjani (IBM)
>     Reviewed-by: Darrick J. Wong
>
> > One idea is to track dirty pages at a smaller granularity using a
> > 64-bit bitmap stored inside the folio struct where each bit tracks a
> > smaller chunk of pages (eg for 2 MB folios, each bit would track 32k
> > pages), and only write back dirty chunks rather than the entire folio.
>
> Have a look at how sub-folio state is tracked via the
> folio->iomap_folio_state->state{} bitmaps.
>
> Essentially it is up to the subsystem to track sub-folio state if
> they require it; there is some generic filesystem infrastructure
> support already in place (like iomap), but if that doesn't fit a
> filesystem then it will need to provide its own dirty/uptodate
> tracking....

Great, thanks for the info. I'll take a look at how the iomap layer does this.

>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
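
As an illustration of the bitmap idea quoted above (not the iomap implementation, and not code from this thread), here is a minimal userspace sketch in Python. It assumes the 2 MB folio and 64-bit bitmap from the example, so each bit covers a 32 KB chunk. The names mark_dirty and dirty_ranges are hypothetical.

```python
# Assumed sizes, taken from the example in the thread.
FOLIO_SIZE = 2 * 1024 * 1024          # 2 MB folio
NR_CHUNKS = 64                        # one bit per chunk in a 64-bit bitmap
CHUNK_SIZE = FOLIO_SIZE // NR_CHUNKS  # 32 KB covered by each bit


def mark_dirty(bitmap: int, offset: int, length: int) -> int:
    """Return bitmap with every chunk overlapping [offset, offset + length) set."""
    if length <= 0 or offset < 0 or offset >= FOLIO_SIZE:
        return bitmap
    end = min(offset + length, FOLIO_SIZE)
    first = offset // CHUNK_SIZE
    last = (end - 1) // CHUNK_SIZE
    for chunk in range(first, last + 1):
        bitmap |= 1 << chunk
    return bitmap


def dirty_ranges(bitmap: int):
    """Yield (offset, length) byte ranges, one per run of adjacent dirty chunks."""
    chunk = 0
    while chunk < NR_CHUNKS:
        if bitmap & (1 << chunk):
            start = chunk
            while chunk < NR_CHUNKS and bitmap & (1 << chunk):
                chunk += 1
            yield start * CHUNK_SIZE, (chunk - start) * CHUNK_SIZE
        else:
            chunk += 1


if __name__ == "__main__":
    bitmap = 0
    bitmap = mark_dirty(bitmap, 5, 1)                  # one byte in chunk 0
    bitmap = mark_dirty(bitmap, CHUNK_SIZE - 1, 2)     # straddles chunks 0 and 1
    bitmap = mark_dirty(bitmap, 10 * CHUNK_SIZE, 100)  # inside chunk 10
    for offset, length in dirty_ranges(bitmap):
        print(f"write back {length} bytes at offset {offset}")
```

With this layout a single dirty byte costs one 32 KB chunk of writeback rather than the whole 2 MB folio. The per-block tracking in iomap described above is finer still, since it tracks state per filesystem block.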