Date: Fri, 16 Aug 2024 12:12:22 +0100
From: Simon Horman
To: David Howells
Cc: Christian Brauner, Steve French, Matthew Wilcox, Jeff Layton,
	Gao Xiang, Dominique Martinet, Marc Dionne, Paulo Alcantara,
	Shyam Prasad N, Tom Talpey, Eric Van Hensbergen, Ilya Dryomov,
	netfs@lists.linux.dev, linux-afs@lists.infradead.org,
	linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org,
	ceph-devel@vger.kernel.org, v9fs@lists.linux.dev,
	linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 19/25] netfs: Speed up buffered reading
Message-ID: <20240816111222.GT632411@kernel.org>
References: <20240814203850.2240469-1-dhowells@redhat.com>
	<20240814203850.2240469-20-dhowells@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240814203850.2240469-20-dhowells@redhat.com>

On Wed, Aug 14, 2024 at 09:38:39PM +0100, David Howells wrote:
> Improve the efficiency of buffered reads in a number of ways:
>
> (1) Overhaul the algorithm in general so that it's a lot more compact and
>     split the read submission code between buffered and unbuffered
>     versions.  The unbuffered version can be vastly simplified.
>
> (2) Read-result collection is handed off to a work queue rather than being
>     done in the I/O thread.  Multiple subrequests can be processed
>     simultaneously.
>
> (3) When a subrequest is collected, any folios it fully spans are
>     collected and "spare" data on either side is donated to either the
>     previous or the next subrequest in the sequence.
>
> Notes:
>
> (*) Readahead expansion massively slows down fio, presumably because it
>     causes a load of extra allocations, both folio and xarray, up front
>     before RPC requests can be transmitted.
>
> (*) RDMA with cifs does appear to work, both with SIW and RXE.
>
> (*) PG_private_2-based reading and copy-to-cache is split out into its own
>     file and altered to use folio_queue.  Note that the copy to the cache
>     now creates a new write transaction against the cache and adds the
>     folios to be copied into it.  This allows it to use part of the
>     writeback I/O code.
>
> Signed-off-by: David Howells

...

> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c

...

> @@ -334,9 +344,8 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
>  	struct ceph_client *cl = fsc->client;
>  	struct ceph_osd_request *req = NULL;
>  	struct ceph_vino vino = ceph_vino(inode);
> -	struct iov_iter iter;
> -	int err = 0;
> -	u64 len = subreq->len;
> +	int err;

Hi David,

err is set conditionally in various places in this function, and then read
unconditionally near the end of it. With this change it isn't entirely clear
that err is always initialised by the end of the function.

Flagged by Smatch.
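Purely as an untested illustration (not a proposed patch): if there is a
path that can reach the final "if (err)" without assigning err, keeping the
initialiser would be the minimal way to silence it,

	int err = 0;	/* ensure the read after the out: label sees a defined value */

though explicitly setting err on whichever path currently leaves it
untouched may well be the better fix.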
> +	u64 len;
>  	bool sparse = IS_ENCRYPTED(inode) || ceph_test_mount_opt(fsc, SPARSEREAD);
>  	u64 off = subreq->start;
>  	int extent_cnt;

...

> @@ -410,17 +423,19 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
>  	req->r_inode = inode;
>  	ihold(inode);
> 
> +	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
>  	ceph_osdc_start_request(req->r_osdc, req);
>  out:
>  	ceph_osdc_put_request(req);
>  	if (err)
> -		netfs_subreq_terminated(subreq, err, false);
> +		netfs_read_subreq_terminated(subreq, err, false);
>  	doutc(cl, "%llx.%llx result %d\n", ceph_vinop(inode), err);
>  }

...

> diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c

...

> +/*
> + * Go through the list of failed/short reads, retrying all retryable ones.  We
> + * need to switch failed cache reads to network downloads.
> + */
> +static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
> +{
> +	struct netfs_io_subrequest *subreq;
> +	struct netfs_io_stream *stream0 = &rreq->io_streams[0];
> +	LIST_HEAD(sublist);
> +	LIST_HEAD(queue);
> +
> +	_enter("R=%x", rreq->debug_id);
> +
> +	if (list_empty(&rreq->subrequests))
> +		return;
> +
> +	if (rreq->netfs_ops->retry_request)
> +		rreq->netfs_ops->retry_request(rreq, NULL);
> +
> +	/* If there's no renegotiation to do, just resend each retryable subreq
> +	 * up to the first permanently failed one.
> +	 */
> +	if (!rreq->netfs_ops->prepare_read &&
> +	    !test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags)) {
> +		struct netfs_io_subrequest *subreq;
> +
> +		list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
> +			if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
> +				break;
> +			if (__test_and_clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
> +				netfs_reset_iter(subreq);
> +				netfs_reissue_read(rreq, subreq);
> +			}
> +		}
> +		return;
> +	}
> +
> +	/* Okay, we need to renegotiate all the download requests and flip any
> +	 * failed cache reads over to being download requests and negotiate
> +	 * those also.  All fully successful subreqs have been removed from the
> +	 * list and any spare data from those has been donated.
> +	 *
> +	 * What we do is decant the list and rebuild it one subreq at a time so
> +	 * that we don't end up with donations jumping over a gap we're busy
> +	 * populating with smaller subrequests.  In the event that the subreq
> +	 * we just launched finishes before we insert the next subreq, it'll
> +	 * fill in rreq->prev_donated instead.
> +	 *
> +	 * Note: Alternatively, we could split the tail subrequest right before
> +	 * we reissue it and fix up the donations under lock.
> +	 */
> +	list_splice_init(&rreq->subrequests, &queue);
> +
> +	do {
> +		struct netfs_io_subrequest *from;
> +		struct iov_iter source;
> +		unsigned long long start, len;
> +		size_t part, deferred_next_donated = 0;
> +		bool boundary = false;
> +
> +		/* Go through the subreqs and find the next span of contiguous
> +		 * buffer that we then rejig (cifs, for example, needs the
> +		 * rsize renegotiating) and reissue.
> +		 */
> +		from = list_first_entry(&queue, struct netfs_io_subrequest, rreq_link);
> +		list_move_tail(&from->rreq_link, &sublist);
> +		start = from->start + from->transferred;
> +		len = from->len - from->transferred;
> +
> +		_debug("from R=%08x[%x] s=%llx ctl=%zx/%zx/%zx",
> +		       rreq->debug_id, from->debug_index,
> +		       from->start, from->consumed, from->transferred, from->len);
> +
> +		if (test_bit(NETFS_SREQ_FAILED, &from->flags) ||
> +		    !test_bit(NETFS_SREQ_NEED_RETRY, &from->flags))
> +			goto abandon;
> +
> +		deferred_next_donated = from->next_donated;
> +		while ((subreq = list_first_entry_or_null(
> +				&queue, struct netfs_io_subrequest, rreq_link))) {
> +			if (subreq->start != start + len ||
> +			    subreq->transferred > 0 ||
> +			    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
> +				break;
> +			list_move_tail(&subreq->rreq_link, &sublist);
> +			len += subreq->len;
> +			deferred_next_donated = subreq->next_donated;
> +			if (test_bit(NETFS_SREQ_BOUNDARY, &subreq->flags))
> +				break;
> +		}
> +
> +		_debug(" - range: %llx-%llx %llx", start, start + len - 1, len);
> +
> +		/* Determine the set of buffers we're going to use.  Each
> +		 * subreq gets a subset of a single overall contiguous buffer.
> +		 */
> +		netfs_reset_iter(from);
> +		source = from->io_iter;
> +		source.count = len;
> +
> +		/* Work through the sublist. */
> +		while ((subreq = list_first_entry_or_null(
> +				&sublist, struct netfs_io_subrequest, rreq_link))) {
> +			list_del(&subreq->rreq_link);
> +
> +			subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
> +			subreq->start = start - subreq->transferred;
> +			subreq->len = len + subreq->transferred;
> +			stream0->sreq_max_len = subreq->len;
> +
> +			__clear_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
> +			__set_bit(NETFS_SREQ_RETRYING, &subreq->flags);
> +
> +			spin_lock_bh(&rreq->lock);
> +			list_add_tail(&subreq->rreq_link, &rreq->subrequests);
> +			subreq->prev_donated += rreq->prev_donated;
> +			rreq->prev_donated = 0;
> +			trace_netfs_sreq(subreq, netfs_sreq_trace_retry);
> +			spin_unlock_bh(&rreq->lock);
> +
> +			BUG_ON(!len);
> +
> +			/* Renegotiate max_len (rsize) */
> +			if (rreq->netfs_ops->prepare_read(subreq) < 0) {

Earlier in this function it is assumed that prepare_read may be NULL.
Can that also be the case here?

Also flagged by Smatch.

> +				trace_netfs_sreq(subreq, netfs_sreq_trace_reprep_failed);
> +				__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
> +			}

...
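If prepare_read can indeed be NULL on this path, e.g. when only
NETFS_RREQ_COPY_TO_CACHE brought us past the earlier check, then guarding the
call, similar to the test at the top of the function, would seem to be needed.
Untested, just to sketch what I mean:

			/* Renegotiate max_len (rsize), but only if the
			 * filesystem supplies ->prepare_read(); otherwise
			 * leave the subrequest parameters set above as-is.
			 */
			if (rreq->netfs_ops->prepare_read &&
			    rreq->netfs_ops->prepare_read(subreq) < 0) {
				trace_netfs_sreq(subreq, netfs_sreq_trace_reprep_failed);
				__set_bit(NETFS_SREQ_FAILED, &subreq->flags);
			}

But you will know better whether that combination can actually occur here.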