From: Chuck Lever III <chuck.lever@oracle.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Patrice CHOTARD <patrice.chotard@foss.st.com>,
Mikulas Patocka <mpatocka@redhat.com>,
Lukas Czerner <lczerner@redhat.com>,
Christoph Hellwig <hch@lst.de>,
"Darrick J. Wong" <djwong@kernel.org>,
Linux-MM <linux-mm@kvack.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Regression in xfstests on tmpfs-backed NFS exports
Date: Thu, 7 Apr 2022 19:24:11 +0000 [thread overview]
Message-ID: <2B7AF707-67B1-4ED8-A29F-957C26B7F87A@oracle.com> (raw)
In-Reply-To: <11f319-c9a-4648-bfbb-dc5a83c774@google.com>
> On Apr 6, 2022, at 8:18 PM, Hugh Dickins <hughd@google.com> wrote:
>
> On Wed, 6 Apr 2022, Chuck Lever III wrote:
>
>> Good day, Hugh-
>
> Huh! If you were really wishing me a good day, would you tell me this ;-?
>
>>
>> I noticed that several fsx-related tests in the xfstests suite are
>> failing after updating my NFS server to v5.18-rc1. I normally test
>> against xfs, ext4, btrfs, and tmpfs exports. tmpfs is the only export
>> that sees these new failures:
>>
>> generic/075 2s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/075.out.bad)
>> --- tests/generic/075.out 2014-02-13 15:40:45.000000000 -0500
>> +++ /home/cel/src/xfstests/results//generic/075.out.bad 2022-04-05 16:39:59.145991520 -0400
>> @@ -4,15 +4,5 @@
>> -----------------------------------------------
>> fsx.0 : -d -N numops -S 0
>> -----------------------------------------------
>> -
>> ------------------------------------------------
>> -fsx.1 : -d -N numops -S 0 -x
>> ------------------------------------------------
>> ...
>> (Run 'diff -u /home/cel/src/xfstests/tests/generic/075.out /home/cel/src/xfstests/results//generic/075.out.bad' to see the entire diff)
>>
>> generic/091 9s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/091.out.bad)
>> --- tests/generic/091.out 2014-02-13 15:40:45.000000000 -0500
>> +++ /home/cel/src/xfstests/results//generic/091.out.bad 2022-04-05 16:41:24.329063277 -0400
>> @@ -1,7 +1,75 @@
>> QA output created by 091
>> fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
>> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
>> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
>> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
>> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
>> -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
>> ...
>> (Run 'diff -u /home/cel/src/xfstests/tests/generic/091.out /home/cel/src/xfstests/results//generic/091.out.bad' to see the entire diff)
>>
>> generic/112 2s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/112.out.bad)
>> --- tests/generic/112.out 2014-02-13 15:40:45.000000000 -0500
>> +++ /home/cel/src/xfstests/results//generic/112.out.bad 2022-04-05 16:41:38.511075170 -0400
>> @@ -4,15 +4,4 @@
>> -----------------------------------------------
>> fsx.0 : -A -d -N numops -S 0
>> -----------------------------------------------
>> -
>> ------------------------------------------------
>> -fsx.1 : -A -d -N numops -S 0 -x
>> ------------------------------------------------
>> ...
>> (Run 'diff -u /home/cel/src/xfstests/tests/generic/112.out /home/cel/src/xfstests/results//generic/112.out.bad' to see the entire diff)
>>
>> generic/127 49s ... - output mismatch (see /home/cel/src/xfstests/results//generic/127.out.bad)
>> --- tests/generic/127.out 2016-08-28 12:16:20.000000000 -0400
>> +++ /home/cel/src/xfstests/results//generic/127.out.bad 2022-04-05 16:42:07.655099652 -0400
>> @@ -4,10 +4,198 @@
>> === FSX Light Mode, Memory Mapping ===
>> All 100000 operations completed A-OK!
>> === FSX Standard Mode, No Memory Mapping ===
>> -All 100000 operations completed A-OK!
>> +ltp/fsx -q -l 262144 -o 65536 -S 191110531 -N 100000 -R -W fsx_std_nommap
>> +READ BAD DATA: offset = 0x9cb7, size = 0xfae3, fname = /tmp/mnt/manet.ib-2323703/fsx_std_nommap
>> +OFFSET GOOD BAD RANGE
>> ...
>> (Run 'diff -u /home/cel/src/xfstests/tests/generic/127.out /home/cel/src/xfstests/results//generic/127.out.bad' to see the entire diff)
>>
>> I bisected the problem to:
>>
>> 56a8c8eb1eaf ("tmpfs: do not allocate pages on read")
>>
>> generic/075 fails almost immediately without any NFS-level errors.
>> Likely this is data corruption rather than an overt I/O error.
>
> That's sad. Thanks for bisecting and reporting. Sorry for the nuisance.
>
> I suspect this patch is heading for a revert, because I shall not have
> time to debug and investigate. Cc'ing fsdevel and a few people who have
> an interest in it, to warn of that likely upcoming revert.
>
> But if it's okay with everyone, please may we leave it in for -rc2?
> Given that having it in -rc1 already smoked out another issue (problem
> of SetPageUptodate(ZERO_PAGE(0)) without CONFIG_MMU), I think keeping
> it in a little longer might smoke out even more.
>
> The xfstests info above doesn't actually tell very much, beyond that
> generic/075 generic/091 generic/112 generic/127, each a test with fsx,
> all fall at their first hurdle. If you have time, please rerun and
> tar up the results/generic directory (maybe filter just those failing)
> and send as attachment. But don't go to any trouble, it's unlikely
> that I shall even untar it - it would be mainly to go on record if
> anyone has time to look into it later. And, frankly, it's unlikely
> to tell us anything more enlightening, than that the data seen was
> not as expected: which we do already know.
>
> I've had no problem with xfstests generic 075,091,112,127 testing
> tmpfs here, not before and not in the month or two I've had that
> patch in: so it's something in the way that NFS exercises tmpfs
> that reveals it. If I had time to duplicate your procedure, I'd be
> asking for detailed instructions: but no, I won't have a chance.
>
> But I can sit here and try to guess. I notice fs/nfsd checks
> file->f_op->splice_read, and employs fallback if not available:
> if you have time, please try rerunning those xfstests on an -rc1
> kernel, but with mm/shmem.c's .splice_read line commented out.
> My guess is that will then pass the tests, and we shall know more.
This seemed like the most probative next step, so I commented
out the .splice_read call-out in mm/shmem.c and ran the tests
again. Yes, that change enables the fsx-related tests to pass
as expected.
> What could be going wrong there? I've thought of two possibilities.
> A minor, hopefully easily fixed, issue would be if fs/nfsd has
> trouble with seeing the same page twice in a row: since tmpfs is
> now using the ZERO_PAGE(0) for all pages of a hole, and I think I
> caught sight of code which looks to see if the latest page is the
> same as the one before. It's easy to imagine that might go wrong.
Are you referring to this function in fs/nfsd/vfs.c ?
847 static int
848 nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
849 struct splice_desc *sd)
850 {
851 struct svc_rqst *rqstp = sd->u.data;
852 struct page **pp = rqstp->rq_next_page;
853 struct page *page = buf->page;
854
855 if (rqstp->rq_res.page_len == 0) {
856 svc_rqst_replace_page(rqstp, page);
857 rqstp->rq_res.page_base = buf->offset;
858 } else if (page != pp[-1]) {
859 svc_rqst_replace_page(rqstp, page);
860 }
861 rqstp->rq_res.page_len += sd->len;
862
863 return sd->len;
864 }
rq_next_page should point to the first unused element of
rqstp->rq_pages, so IIUC that check is looking for the
final page that is part of the READ payload.
But that does suggest that if page -> ZERO_PAGE and so does
pp[-1], then svc_rqst_replace_page() would not be invoked.
> A more difficult issue would be, if fsx is racing writes and reads,
> in a way that it can guarantee the correct result, but that correct
> result is no longer delivered: because the writes go into freshly
> allocated tmpfs cache pages, while reads are still delivering
> stale ZERO_PAGEs from the pipe. I'm hazy on the guarantees there.
>
> But unless someone has time to help out, we're heading for a revert.
>
> Thanks,
> Hugh
--
Chuck Lever
next prev parent reply other threads:[~2022-04-07 19:24 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-06 17:18 Chuck Lever III
2022-04-07 0:18 ` Hugh Dickins
2022-04-07 4:25 ` Mark Hemment
2022-04-07 22:04 ` Hugh Dickins
2022-04-07 19:24 ` Chuck Lever III [this message]
2022-04-07 22:26 ` Hugh Dickins
2022-04-07 23:45 ` Chuck Lever III
2022-04-08 14:38 ` Mark Hemment
2022-04-08 16:10 ` Chuck Lever III
2022-04-08 19:09 ` Hugh Dickins
2022-04-08 19:52 ` Chuck Lever III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2B7AF707-67B1-4ED8-A29F-957C26B7F87A@oracle.com \
--to=chuck.lever@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=hughd@google.com \
--cc=lczerner@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=patrice.chotard@foss.st.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox