Date: Wed, 15 Jan 2025 12:21:37 +1100
From: Dave Chinner
To: Joanne Koong
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, "Matthew Wilcox (Oracle)"
Subject: Re: [LSF/MM/BPF TOPIC] Improving large folio writeback performance

On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> Hi all,
>
> I would like to propose a discussion topic about improving large folio
> writeback performance. As more filesystems adopt large folios, it
> becomes increasingly important that writeback is made to be as
> performant as possible. There are two areas I'd like to discuss:
>
>
> == Granularity of dirty pages writeback ==
> Currently, the granularity of writeback is at the folio level. If one
> byte in a folio is dirty, the entire folio will be written back. This
> becomes unscalable for larger folios and significantly degrades
> performance, especially for workloads that employ random writes.

This sounds familiar, probably because we fixed this exact issue in
the iomap infrastructure some while ago.

commit 4ce02c67972211be488408c275c8fbf19faf29b3
Author: Ritesh Harjani (IBM)
Date:   Mon Jul 10 14:12:43 2023 -0700

    iomap: Add per-block dirty state tracking to improve performance

    When filesystem blocksize is less than folio size (either with
    mapping_large_folio_support() or with blocksize < pagesize) and when the
    folio is uptodate in pagecache, then even a byte write can cause
    an entire folio to be written to disk during writeback.
    This happens because we currently don't have a mechanism to track
    per-block dirty state within struct iomap_folio_state. We currently
    only track uptodate state.

    This patch implements support for tracking per-block dirty state in
    iomap_folio_state->state bitmap. This should help improve the filesystem
    write performance and help reduce write amplification.

    Performance testing of below fio workload reveals ~16x performance
    improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
    FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.

    1.
    [global]
            ioengine=psync
            rw=randwrite
            overwrite=1
            pre_read=1
            direct=0
            bs=4k
            size=1G
            dir=./
            numjobs=8
            fdatasync=1
            runtime=60
            iodepth=64
            group_reporting=1

    [fio-run]

    2. Also our internal performance team reported that this patch improves
       their database workload performance by around ~83% (with XFS on Power)

    Reported-by: Aravinda Herle
    Reported-by: Brian Foster
    Signed-off-by: Ritesh Harjani (IBM)
    Reviewed-by: Darrick J. Wong

> One idea is to track dirty pages at a smaller granularity using a
> 64-bit bitmap stored inside the folio struct where each bit tracks a
> smaller chunk of pages (eg for 2 MB folios, each bit would track 32k
> pages), and only write back dirty chunks rather than the entire folio.

Have a look at how sub-folio state is tracked via the
folio->iomap_folio_state->state{} bitmaps.

Essentially it is up to the subsystem to track sub-folio state if they
require it; there is some generic filesystem infrastructure support
already in place (like iomap), but if that doesn't fit a filesystem
then it will need to provide its own dirty/uptodate tracking....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
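
For illustration only, here is a minimal user-space sketch of the per-block dirty tracking idea discussed in this thread. It is not the kernel's iomap code: the struct folio_dirty_state and the fds_*() helpers are hypothetical names made up for this example, and the sketch assumes the folio size is a multiple of the block size and ignores locking, uptodate state and everything else a real filesystem would need.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-folio state: one dirty bit per filesystem block. */
struct folio_dirty_state {
	size_t block_size;	/* bytes per block */
	size_t nr_blocks;	/* folio_size / block_size */
	unsigned long dirty[];	/* bitmap, one bit per block */
};

/* Allocate zeroed (all-clean) state for one folio; NULL on failure. */
static struct folio_dirty_state *fds_alloc(size_t folio_size, size_t block_size)
{
	size_t nr_blocks = folio_size / block_size;
	size_t nr_words = (nr_blocks + BITS_PER_WORD - 1) / BITS_PER_WORD;
	struct folio_dirty_state *fds;

	fds = calloc(1, sizeof(*fds) + nr_words * sizeof(unsigned long));
	if (!fds)
		return NULL;
	fds->block_size = block_size;
	fds->nr_blocks = nr_blocks;
	return fds;
}

/* Mark every block touched by the byte range [off, off + len) dirty. */
static void fds_set_range_dirty(struct folio_dirty_state *fds,
				size_t off, size_t len)
{
	size_t first, last;

	assert(len > 0);
	first = off / fds->block_size;
	last = (off + len - 1) / fds->block_size;

	/* Blocks past the end of the folio are silently ignored. */
	for (size_t b = first; b <= last && b < fds->nr_blocks; b++)
		fds->dirty[b / BITS_PER_WORD] |= 1UL << (b % BITS_PER_WORD);
}

/* Test the dirty bit of block index b. */
static bool fds_block_is_dirty(const struct folio_dirty_state *fds, size_t b)
{
	return (fds->dirty[b / BITS_PER_WORD] &
		(1UL << (b % BITS_PER_WORD))) != 0;
}

/* Bytes writeback would submit if it only wrote the dirty blocks. */
static size_t fds_dirty_bytes(const struct folio_dirty_state *fds)
{
	size_t bytes = 0;

	for (size_t b = 0; b < fds->nr_blocks; b++)
		if (fds_block_is_dirty(fds, b))
			bytes += fds->block_size;
	return bytes;
}
```

With a 64K folio and 4k blocks (the configuration in the commit message above) this gives 16 bits per folio, and a one-byte write dirties a single block, so writeback only needs to submit 4k rather than the whole 64K folio.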