From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4108BC41513 for ; Fri, 28 Jul 2023 23:55:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8512B8D0001; Fri, 28 Jul 2023 19:55:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 801836B0074; Fri, 28 Jul 2023 19:55:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6C8AA8D0001; Fri, 28 Jul 2023 19:55:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 5D22D6B0071 for ; Fri, 28 Jul 2023 19:55:10 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1C35B1201BF for ; Fri, 28 Jul 2023 23:55:10 +0000 (UTC) X-FDA: 81062679180.28.BD6D79F Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf24.hostedemail.com (Postfix) with ESMTP id 0921F180005 for ; Fri, 28 Jul 2023 23:55:06 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=Zg53RflB; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=8Q55lJUX; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf24.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690588507; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Vpkws96VxtaEEsabDbCElfZ/CAGrK02wRH+80AynJeI=; b=tx4hga6ps9LGIYv16WbqkEoDM6bxo8dRFE8CZ2zO3jSB/N9avlwX0RUU5ICVAEv7xESzbG Vjtt+lTSRl8rzBvnmT+AJ//EVfGPSzyUc+ry1PVaneArzM1fEXGjZRDtgZiJF2vwodql9R kSjM7Wy0Qgm61bdaZGFiDAeqUU4DGWU= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=Zg53RflB; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=8Q55lJUX; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf24.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690588507; a=rsa-sha256; cv=none; b=H/nTjl7sxgzNQnuEZgkZIDCshKLAjM7q5gb62+P5gavaSZQM2MEZhrEQwLvmxqRQXWqy// 78WG45qEXqIBSuya/yDzTd8gr4vBS8E6G0p8QpLr4nW1YbJVpPbwyyr0vxH2haRoZ2/zcK JId9fk79oy050YBuEzt/QvcaclE+598= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 582231F896; Fri, 28 Jul 2023 23:55:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1690588505; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Vpkws96VxtaEEsabDbCElfZ/CAGrK02wRH+80AynJeI=; b=Zg53RflBzXPb/WjOhq9z7GxPVaVruduuA94kKZsJzMtKSgnxgDI5subY4BsPKxKUj9bawz R8IYSw0j2TterfOw53YuY58c+Hd4LizvlARaUXX+5LbD5+ptMqSBHyBgjlVKJI92alnzLb RtXNi0nCF9IwrnONB9jSWfS5SJgSWbg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1690588505; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Vpkws96VxtaEEsabDbCElfZ/CAGrK02wRH+80AynJeI=; b=8Q55lJUXFBHf57Cv+CiVQiUiSOH7V5hOKA6feYU+GV3JRVICNdSfLqIBR1+peepyi30Au+ Ahu9gx8p4TVr9XAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EC26D13276; Fri, 28 Jul 2023 23:55:01 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id EVBRJ1VVxGTpfgAAMHmgww (envelope-from ); Fri, 28 Jul 2023 23:55:01 +0000 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 From: "NeilBrown" To: "Chuck Lever" Cc: , "Chuck Lever" , "David Howells" , "Jeff Layton" , "Hugh Dickins" , "Jens Axboe" , "Matthew Wilcox" , linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] nfsd: Fix reading via splice In-reply-to: <169054754615.3783.11682801287165281930.stgit@klimt.1015granger.net> References: <169054754615.3783.11682801287165281930.stgit@klimt.1015granger.net> Date: Sat, 29 Jul 2023 09:54:58 +1000 Message-id: <169058849828.32308.14965537137761913794@noble.neil.brown.name> X-Rspamd-Queue-Id: 0921F180005 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: 5qcaftfp93p5h3h8hieiqsnp3rwmsm41 X-HE-Tag: 1690588506-921904 X-HE-Meta: U2FsdGVkX190/nJZRBhMEqTz72ZsDuOMBqGYKGA1EmkrgLlqKPOh1uRD0Nxbuc5NZoUOZiblSIYX2PdZ5bgXvIZp1JtS34QMd8sjqkyMu533TnIU9pLYlMEQFVNV7qqT0CmTOFkGb21iBaUYA3+v+paiFacwfftBIKQk57nvHuuAZc9NYykw5VRsMhNeil8SlOSy8b+dj1weoy8JguoVFwm64K+fNLt1A3EKVz6TzYY6hAFj+4m62p4AjmBgy/TZ36VVdOuylOYg/akNNHYMdXZfky1HCY5m2C1HgY+0mdk4X5K8OGLpyOx89BWMIERxP+iWyWEEjn/RKDY6XFtpm/9l7iCubTDTfYiRnRgvVw3C2TBA259UcDubAwdfU8NMFaaC3LGk8oGAxJ4TKvZgfz88QOyeuwtiQprYJB5YoT1QBbgiNaMbGAV+HCDl5HKtmHoM2D/jLcOAkZbRCtTgWRUJP4GUiBapc8d8nxSewGchVVV+PIJiTTNvOKXLGnYwrlHa+UNjFjIomHazV7X/IEuXAm7DvC19GYnP3zI9AhiRvVoV/mj6pf46dfY/4LWVbGPFsBWvs9RNaKPdCJ821zeo5dqbSIC2hKs3ZL1Ly7KVNVsyvm66LswOsMoGlmV5WshXqJKnUoAllxrJQ2a4OU2doWC8UV7BjX6C+5gEl1TA6qgk5i0DTmceZNCPYDdCy9QiLTmyy56G9wnSaPk6ybTIhluf2MN8NfedBpeUaM4nfutTVGEwF89su23hv4zO6Y0qnow6P92M3kKJwB346iSZIequfKD5LGLUqV1L7tYlE6gC9nu2EZqS6ld8RlhPUzmjaUKtvJqK2OYyqcJH/SAZhQ0hdqESRMDQC6IxBih5m2T/KW2SnGICamDio7/0K4J2A5ng35JjYAOK/kf7sArMkx4eCgnwi2jfSNBlLbu1YMjst3a/Fs/8dU91z5n8MijIzNQKqBc7i9ikHEN vEb91egT qVQ/aA5w9n5IOUaLDZGyLTnFBv+nCAXElNXg7iEkg4gxYqqrMMyPQsOIkk4Cu8lYFY6EQK+wKhRJm4QAEt5bdGPl8hx4F2Ar8CipGI5YIVbLTap3GlkJoOQ4wud1Wso5wCF1pAVNgxn+0zZrcrfe4WRpNbZHZMfaqgFRrQ95ZMhGraXs5MlaeKI8qgMermt9/vMQbNiTa5ATyjJntKXGQQdbzoxWNyJGXQFvYjBFyiy7GQLkM/tYh+gssRXhN+tO2hbRBBOiaxNGeNWy5uxmPi7NpwdZ8ATMi7Tllv16jOX9Vjc9mGQ7xpKkANfNrxyOCLIWlyzhOAenJJPzqPSYTii+36lxaRhFxRtoYqqUY9A8BEKndP1EGy9Rpa0SXXBhtYQ+529d30l5R44hg/E37OUb7IfUFpNrpGeOWXjbogSUcm7rAS03jeGFPYddCo3TFJ1Xno2njrmVJDfU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, 28 Jul 2023, Chuck Lever wrote: > From: David Howells >=20 > nfsd_splice_actor() has a clause in its loop that chops up a compound page > into individual pages such that if the same page is seen twice in a row, it > is discarded the second time. This is a problem with the advent of > shmem_splice_read() as that inserts zero_pages into the pipe in lieu of > pages that aren't present in the pagecache. >=20 > Fix this by assuming that the last page is being extended only if the > currently stored length + starting offset is not currently on a page > boundary. >=20 > This can be tested by NFS-exporting a tmpfs filesystem on the test machine > and truncating it to more than a page in size (eg. truncate -s 8192) and > then reading it by NFS. The first page will be all zeros, but thereafter > garbage will be read. >=20 > Note: I wonder if we can ever get a situation now where we get a splice > that gives us contiguous parts of a page in separate actor calls. As NFSD > can only be splicing from a file (I think), there are only three sources of > the page: copy_splice_read(), shmem_splice_read() and file_splice_read(). > The first allocates pages for the data it reads, so the problem cannot > occur; the second should never see a partial page; and the third waits for > each page to become available before we're allowed to read from it. >=20 > Fixes: bd194b187115 ("shmem: Implement splice-read") > Reported-by: Chuck Lever > Signed-off-by: David Howells > Reviewed-by: Jeff Layton > cc: Hugh Dickins > cc: Jens Axboe > cc: Matthew Wilcox > cc: linux-nfs@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > cc: linux-mm@kvack.org > Signed-off-by: Chuck Lever > --- > fs/nfsd/vfs.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) >=20 > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 59b7d60ae33e..ee3bbaa79478 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -956,10 +956,13 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struc= t pipe_buffer *buf, > last_page =3D page + (offset + sd->len - 1) / PAGE_SIZE; > for (page +=3D offset / PAGE_SIZE; page <=3D last_page; page++) { > /* > - * Skip page replacement when extending the contents > - * of the current page. > + * Skip page replacement when extending the contents of the > + * current page. But note that we may get two zero_pages in a > + * row from shmem. > */ > - if (page =3D=3D *(rqstp->rq_next_page - 1)) > + if (page =3D=3D *(rqstp->rq_next_page - 1) && > + offset_in_page(rqstp->rq_res.page_base + > + rqstp->rq_res.page_len)) This seems fragile in that it makes assumptions about the pages being sent and their alignment. Given that it was broken by the splice-read change, that confirms it is fragile. Maybe we could make the code a bit more explicit about what is expected. Also, I don't think this test can ever be relevant after the first time through the loop. So I think it would be clearest to have the interesting case outside the loop. page +=3D offset / PAGE_SIZE; if (rqstp->rq_res.pages_len > 0) { /* appending to page list - check alignment */ if (offset % PAGE_SIZE !=3D (rqstp->rq_res.page_base + rqstp-.rq_res.page_len) % PAGE_SIZE) return -EIO; if (offset % PAGE_SIZE !=3D 0) { /* continuing previous page */ if (page !=3D rqstp->rq_next_page[-1]) return -EIO; page +=3D 1; } } else /* Starting new page list */ rqstp->rq_res.page_base =3D offset % PAGE_SIZE; for ( ; page <=3D last_page ; page++) if (unlikely(!svc_rqst_replace_page(rqstp, page))) return -EIO; rqstp->rq_res.page_len +=3D sd->len; return sd->len; Also, the name "svc_rqst_replace_page" doesn't give any hint that the next_page pointer is advanced. Maybe svc_rqst_add_page() ??? Not great I admit. NeilBrown =20 > continue; > if (unlikely(!svc_rqst_replace_page(rqstp, page))) > return -EIO; >=20 >=20 >=20