From: "NeilBrown" <neilb@suse.de>
To: "Chuck Lever" <cel@kernel.org>
Cc: , "Chuck Lever" <chuck.lever@oracle.com>,
"David Howells" <dhowells@redhat.com>,
"Jeff Layton" <jlayton@kernel.org>,
"Hugh Dickins" <hughd@google.com>, "Jens Axboe" <axboe@kernel.dk>,
"Matthew Wilcox" <willy@infradead.org>,
linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH] nfsd: Fix reading via splice
Date: Sat, 29 Jul 2023 09:54:58 +1000 [thread overview]
Message-ID: <169058849828.32308.14965537137761913794@noble.neil.brown.name> (raw)
In-Reply-To: <169054754615.3783.11682801287165281930.stgit@klimt.1015granger.net>
On Fri, 28 Jul 2023, Chuck Lever wrote:
> From: David Howells <dhowells@redhat.com>
>
> nfsd_splice_actor() has a clause in its loop that chops up a compound page
> into individual pages such that if the same page is seen twice in a row, it
> is discarded the second time. This is a problem with the advent of
> shmem_splice_read() as that inserts zero_pages into the pipe in lieu of
> pages that aren't present in the pagecache.
>
> Fix this by assuming that the last page is being extended only if the
> currently stored length + starting offset is not currently on a page
> boundary.
>
> This can be tested by NFS-exporting a tmpfs filesystem on the test machine
> and truncating it to more than a page in size (eg. truncate -s 8192) and
> then reading it by NFS. The first page will be all zeros, but thereafter
> garbage will be read.
>
> Note: I wonder if we can ever get a situation now where we get a splice
> that gives us contiguous parts of a page in separate actor calls. As NFSD
> can only be splicing from a file (I think), there are only three sources of
> the page: copy_splice_read(), shmem_splice_read() and file_splice_read().
> The first allocates pages for the data it reads, so the problem cannot
> occur; the second should never see a partial page; and the third waits for
> each page to become available before we're allowed to read from it.
>
> Fixes: bd194b187115 ("shmem: Implement splice-read")
> Reported-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> cc: Hugh Dickins <hughd@google.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-nfs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/vfs.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 59b7d60ae33e..ee3bbaa79478 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -956,10 +956,13 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> last_page = page + (offset + sd->len - 1) / PAGE_SIZE;
> for (page += offset / PAGE_SIZE; page <= last_page; page++) {
> /*
> - * Skip page replacement when extending the contents
> - * of the current page.
> + * Skip page replacement when extending the contents of the
> + * current page. But note that we may get two zero_pages in a
> + * row from shmem.
> */
> - if (page == *(rqstp->rq_next_page - 1))
> + if (page == *(rqstp->rq_next_page - 1) &&
> + offset_in_page(rqstp->rq_res.page_base +
> + rqstp->rq_res.page_len))
This seems fragile in that it makes assumptions about the pages being
sent and their alignment.
Given that it was broken by the splice-read change, that confirms it is
fragile. Maybe we could make the code a bit more explicit about what is
expected.
Also, I don't think this test can ever be relevant after the first time
through the loop. So I think it would be clearest to have the
interesting case outside the loop.
page += offset / PAGE_SIZE;
if (rqstp->rq_res.pages_len > 0) {
/* appending to page list - check alignment */
if (offset % PAGE_SIZE != (rqstp->rq_res.page_base +
rqstp-.rq_res.page_len) % PAGE_SIZE)
return -EIO;
if (offset % PAGE_SIZE != 0) {
/* continuing previous page */
if (page != rqstp->rq_next_page[-1])
return -EIO;
page += 1;
}
} else
/* Starting new page list */
rqstp->rq_res.page_base = offset % PAGE_SIZE;
for ( ; page <= last_page ; page++)
if (unlikely(!svc_rqst_replace_page(rqstp, page)))
return -EIO;
rqstp->rq_res.page_len += sd->len;
return sd->len;
Also, the name "svc_rqst_replace_page" doesn't give any hint that the
next_page pointer is advanced. Maybe svc_rqst_add_page() ??? Not great
I admit.
NeilBrown
> continue;
> if (unlikely(!svc_rqst_replace_page(rqstp, page)))
> return -EIO;
>
>
>
next prev parent reply other threads:[~2023-07-28 23:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-28 12:32 Chuck Lever
2023-07-28 23:54 ` NeilBrown [this message]
2023-07-30 15:29 ` Chuck Lever
2023-07-30 16:50 ` Hugh Dickins
2023-07-30 16:55 ` Chuck Lever
2023-07-30 22:02 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=169058849828.32308.14965537137761913794@noble.neil.brown.name \
--to=neilb@suse.de \
--cc=axboe@kernel.dk \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=dhowells@redhat.com \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox