From: Hannes Reinecke <hare@suse.de>
To: David Howells <dhowells@redhat.com>,
"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
Cc: brauner@kernel.org, akpm@linux-foundation.org,
chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org,
djwong@kernel.org, gost.dev@samsung.com,
linux-xfs@vger.kernel.org, hch@lst.de, david@fromorbit.com,
Zi Yan <ziy@nvidia.com>,
yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, willy@infradead.org, john.g.garry@oracle.com,
cl@os.amperecomputing.com, p.raghav@samsung.com,
mcgrof@kernel.org, ryan.roberts@arm.com
Subject: Re: [PATCH v12 00/10] enable bs > ps in XFS
Date: Mon, 19 Aug 2024 14:48:00 +0200
Message-ID: <03ae65df-a369-436d-b31c-b3cec6ca3bc1@suse.de>
In-Reply-To: <3402933.1724068015@warthog.procyon.org.uk>
On 8/19/24 13:46, David Howells wrote:
> Hi Pankaj,
>
> I can reproduce the problem with:
>
> xfs_io -t -f -c "pwrite -S 0x58 0 40" -c "fsync" -c "truncate 4" -c "truncate 4096" /xfstest.test/wubble; od -x /xfstest.test/wubble
>
> borrowed from generic/393. I've distilled it down to the attached C program.
>
> Turning on tracing and adding a bit more, I can see the problem happening.
> Here's an excerpt of the tracing (I've added some non-upstream tracepoints).
> Firstly, you can see the second pwrite at fpos 0, 40 bytes (i.e. 0x28):
>
> pankaj-5833: netfs_write_iter: WRITE-ITER i=9e s=0 l=28 f=0
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 mod-streamw
>
> Then the first ftruncate() is called to reduce the file size to 4:
>
> pankaj-5833: netfs_truncate: ni=9e isz=2028 rsz=2028 zp=4000 to=4
> pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=4 l=1ffc d=78787878
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
> pankaj-5833: netfs_set_size: ni=9e resize-file isz=4 rsz=4 zp=4
>
> You can see the invalidate_folio call, with the offset at 0x4 and the length as
> 0x1ffc. The data at the beginning of the page is 0x78787878. This looks
> correct.
>
> Then the second ftruncate() is called to increase the file size to 4096
> (i.e. 0x1000):
>
> pankaj-5833: netfs_truncate: ni=9e isz=4 rsz=4 zp=4 to=1000
> pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=1000 l=1000 d=78787878
> pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
> pankaj-5833: netfs_set_size: ni=9e resize-file isz=1000 rsz=1000 zp=4
>
> And here's the problem: in the invalidate_folio() call, the offset is 0x1000
> and the length is 0x1000 (o= and l=). But that's the wrong half of the folio!
> I'm guessing that the caller thereafter clears the other half of the folio -
> the bit that should be kept.
>
> David
> ---
> /* Distillation of the generic/393 xfstest */
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> #define ERR(x, y) do { if ((long)(x) == -1) { perror(y); exit(1); } } while(0)
>
> static const char xxx[40] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
> static const char yyy[40] = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy";
> static const char dropfile[] = "/proc/sys/vm/drop_caches";
> static const char droptype[] = "3";
> static const char file[] = "/xfstest.test/wubble";
>
> int main(int argc, char *argv[])
> {
> int fd, drop;
>
> /* Fill in the second 8K block of the file... */
> fd = open(file, O_CREAT|O_TRUNC|O_WRONLY, 0666);
> ERR(fd, "open");
> ERR(ftruncate(fd, 0), "pre-trunc $file");
> ERR(pwrite(fd, yyy, sizeof(yyy), 0x2000), "write-2000");
> ERR(close(fd), "close");
>
> /* ... and drop the pagecache so that we get a streaming
> * write, attaching some private data to the folio.
> */
> drop = open(dropfile, O_WRONLY);
> ERR(drop, dropfile);
> ERR(write(drop, droptype, sizeof(droptype) - 1), "write-drop");
> ERR(close(drop), "close-drop");
>
> fd = open(file, O_WRONLY, 0666);
> ERR(fd, "reopen");
> /* Make a streaming write on the first 8K block (needs O_WRONLY). */
> ERR(pwrite(fd, xxx, sizeof(xxx), 0), "write-0");
> /* Now use truncate to shrink and reexpand. */
> ERR(ftruncate(fd, 4), "trunc-4");
> ERR(ftruncate(fd, 4096), "trunc-4096");
> ERR(close(fd), "close-2");
> exit(0);
> }
>
Wouldn't the second truncate end up with a 4k file, and not an 8k one?
I.e. the resulting file will be:
After step 1: 8k (0x2028)
After step 2: 4
After step 3: 4k
Hmm?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich