From: Dave Chinner <david@fromorbit.com>
To: Hugh Dickins <hughd@google.com>
Cc: Adam Borowski <kilobyte@angband.pl>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Marcin Slusarz <marcin.slusarz@intel.com>
Subject: Re: tmpfs fails fallocate(more than DRAM)
Date: Tue, 19 Feb 2019 15:16:50 +1100
Message-ID: <20190219041650.GB15503@dastard>
In-Reply-To: <alpine.LSU.2.11.1902181745010.1241@eggly.anvils>
On Mon, Feb 18, 2019 at 07:35:01PM -0800, Hugh Dickins wrote:
> On Mon, 18 Feb 2019, Adam Borowski wrote:
> > I searched a bit for references that would suggest failed fallocates need to
> > be undone, and I can't seem to find any. Neither POSIX nor our man pages
> > say a word about semantics of interrupted fallocate, and both glibc's and
> > FreeBSD's fallback emulation don't rollback.
>
> To me it was self-evident: with a few awkward exceptions (awkward because
> they would have a difficult job to undo, and awkward because they argue
> against me!), a system call either succeeds or fails, or reports partial
> success. If fallocate() says it failed (and is not allowed to report
> partial success), then it should not have allocated. Especially in the
> case of RAM, when filling it up makes it rather hard to unfill (another
> persistent problem with tmpfs is the way it can occupy all of memory,
> and the OOM killer go about killing a thousand processes, but none of
> them help because the memory is occupied by a tmpfs, not by a process).
>
> Now that you question it (did I not do so at the time? I thought I did),
> I try fallocate() on btrfs and ext4 and xfs. btrfs and xfs behave as I
> expect above, failing outright with ENOSPC if it will not fit;
If only it were that simple. :/
XFS can do partial allocation and fail - it all depends on how many
extent allocations are required before ENOSPC is actually hit. e.g.
if you ask for 10GB and there is only 5GB free, it should fail
straight away. However, if there's 20GB free in 1GB chunks, it will
loop allocating 1GB extents. If something else is allocating at the
same time, the fallocate could get to, say, 8GB allocated and then
hit ENOSPC.
In which case, we'll return the ENOSPC error, but we'll also leave
the 8GB of space already allocated to the file there. i.e. it
doesn't clean up after itself.
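The scenario above can be sketched as a toy model. This is only an illustration, not XFS code: ToyFilesystem, toy_fallocate and the competitor schedule are made-up names, and free space is assumed to exist only as equally sized 1GB chunks.

```python
import errno
import itertools

GB = 1 << 30


class ToyFilesystem:
    """Toy model: free space exists only as equally sized chunks."""

    def __init__(self, free_chunks, chunk_size=GB):
        self.free_chunks = free_chunks
        self.chunk_size = chunk_size

    def alloc_extent(self):
        """Hand out one chunk, or raise ENOSPC if none are left."""
        if self.free_chunks == 0:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_chunks -= 1
        return self.chunk_size


def toy_fallocate(fs, length, competitor=None):
    """Allocate `length` bytes one extent at a time.

    `competitor` yields how many chunks another writer takes after each of
    our allocations. Returns (bytes_allocated, errno or 0). Extents already
    allocated are deliberately not released on failure.
    """
    if competitor is None:
        competitor = itertools.repeat(0)
    # Up-front check: a request larger than total free space fails at once.
    if length > fs.free_chunks * fs.chunk_size:
        return 0, errno.ENOSPC
    allocated = 0
    while allocated < length:
        try:
            allocated += fs.alloc_extent()
        except OSError as exc:
            # ENOSPC midway: report the error, keep what was allocated.
            return allocated, exc.errno
        # Another writer grabs some chunks before our next iteration.
        fs.free_chunks = max(0, fs.free_chunks - next(competitor))
    return allocated, 0


# 20GB free in 1GB chunks, 10GB requested, a competitor taking 2, 1, 2, 1, ...
fs = ToyFilesystem(free_chunks=20)
allocated, err = toy_fallocate(fs, 10 * GB, itertools.cycle([2, 1]))
print(allocated // GB, errno.errorcode[err])
```

With that schedule the run prints "8 ENOSPC": the call fails, yet 8GB stays allocated. With only 5 free chunks the same request fails at the up-front check and allocates nothing.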
The reason for this is that we don't know after we've performed
allocations what regions of the preallocated range were actually
allocated by the preallocation. i.e. fallocate can be run over a
range that already contains some extents - it simply skips over
regions that are already allocated. Hence we don't know what we are
supposed to clean up, and so we leave the corpse lying around for
someone else to deal with (e.g. by sparsifying the file again).
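To make the ambiguity concrete, here is a small sketch with a file modelled as a set of block numbers. It is a toy, not how any filesystem tracks extents; toy_prealloc and its arguments are hypothetical names. It shows that a rollback needs a record of which blocks this particular call added, because the final state alone does not say.

```python
def toy_prealloc(allocated, start, end, free_blocks):
    """Preallocate blocks [start, end) of a toy file.

    `allocated` is the set of block numbers that already have space and is
    updated in place. Returns (newly_allocated, free_blocks_left, ok).
    """
    newly = set()
    for blk in range(start, end):
        if blk in allocated:
            continue  # already backed by space: skipped, as described above
        if free_blocks == 0:
            return newly, free_blocks, False  # ran out of space midway
        allocated.add(blk)
        newly.add(blk)
        free_blocks -= 1
    return newly, free_blocks, True


# Blocks 2, 3 and 7 were allocated before the call; only 4 blocks are free.
allocated = {2, 3, 7}
newly, left, ok = toy_prealloc(allocated, 0, 10, free_blocks=4)
state_after_failure = set(allocated)

# Looking at state_after_failure alone, blocks 2 and 3 are indistinguishable
# from 0, 1, 4 and 5. Only the extra `newly` record allows a safe undo.
rolled_back = state_after_failure - newly
```

Here the call fails after adding blocks 0, 1, 4 and 5. Freeing everything in the requested range would also throw away blocks 2, 3 and 7, which were there before the call.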
> whereas
> ext4 proceeds to fill up the filesystem, leaving it full when it says
> that it failed.
This is much the same behaviour as XFS - you see it more easily with
ext4 because it has much smaller maximum extent size (128MB) than
XFS (8GB) and so needs to iterate multiple allocations sooner than
XFS or btrfs need to.
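As a rough worked example of that difference, taking the maximum extent sizes quoted above at face value and assuming ideally contiguous free space (extents_needed is a name made up for this sketch):

```python
MB = 1 << 20
GB = 1 << 30


def extents_needed(request_bytes, max_extent_bytes):
    """Minimum number of extent allocations for a request (ceiling division)."""
    return -(-request_bytes // max_extent_bytes)


ext4_allocs = extents_needed(10 * GB, 128 * MB)  # 80 allocations
xfs_allocs = extents_needed(10 * GB, 8 * GB)     # 2 allocations
```

So a 10GB request needs at least 80 separate allocations at 128MB per extent, but only 2 at 8GB per extent. That leaves far more points at which the ext4 loop can run into ENOSPC partway through.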
I'm not sure what btrfs does.
> Looks like I had a choice of models to follow: the
> ext4 model would have been easier to follow, but risked OOM.
fallocate() gives you the rope to choose what is best for the
filesystem - it doesn't specify behaviour on failure precisely
because it can be very difficult (not to mention complex!) for
filesystems to unwind partial failures....
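Since the failure behaviour is left unspecified, a caller that cares can do its own best-effort cleanup. The following is a minimal userspace sketch, assuming the file is only being extended from offset 0; preallocate_or_trim is a hypothetical helper, and how much space the truncate actually gives back depends on the filesystem.

```python
import os
import tempfile


def preallocate_or_trim(fd, length):
    """Try to preallocate [0, length) for fd; on failure trim to the old size.

    Returns True on success. On failure, ftruncate() cuts the file back to
    its original size; space that filled holes below the original end of
    file is left alone, since the caller cannot tell it from older data.
    """
    old_size = os.fstat(fd).st_size
    try:
        os.posix_fallocate(fd, 0, length)
        return True
    except OSError:
        # The error says nothing about how much was allocated before failing.
        os.ftruncate(fd, old_size)
        return False


# Usage on a scratch file: a 1MB request that is expected to succeed.
with tempfile.TemporaryFile() as tmp:
    ok = preallocate_or_trim(tmp.fileno(), 1 << 20)
    size_after = os.fstat(tmp.fileno()).st_size
```

This only covers the simple append-style case. For a range that overlaps existing data, the caller hits the same problem as the filesystem: it cannot tell which space the failed call added.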
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com