From: Adam Borowski <kilobyte@angband.pl>
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: "Marcin Ślusarz" <marcin.slusarz@intel.com>
Subject: tmpfs fails fallocate(more than DRAM)
Date: Mon, 18 Feb 2019 14:34:23 +0100
Message-ID: <20190218133423.tdzawczn4yjdzjqf@angband.pl>
Hi!
There's something that looks like a bug in tmpfs' implementation of
fallocate. If you try to fallocate more than the available DRAM (even
with plenty of swap space), it will swap out everything swappable,
then fail, undoing all the work done so far first.
The returned error is ENOMEM rather than the POSIX-mandated ENOSPC
(mandated for posix_fallocate(), but our documentation doesn't mention
ENOMEM for the Linux-specific fallocate() either).
Doing the same allocation in multiple calls -- be it via non-overlapping
calls or even with same offset but increasing len -- works as expected.
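
A minimal sketch of the two multi-call approaches described above, assuming
fallocate(1) from util-linux and sizes given in plain bytes. The function
names fallocate_chunked and fallocate_growing are made up for illustration;
they are not part of any existing tool, and this is a workaround sketch, not
a fix for the underlying behaviour.

```shell
# Hypothetical helper: allocate TOTAL bytes of FILE in non-overlapping
# pieces of at most CHUNK bytes each.
fallocate_chunked() {
    file=$1 total=$2 chunk=$3
    off=0
    while [ "$off" -lt "$total" ]; do
        len=$((total - off))
        if [ "$len" -gt "$chunk" ]; then
            len=$chunk
        fi
        # -o is the start offset, -l the length of this piece
        fallocate -o "$off" -l "$len" "$file" || return 1
        off=$((off + len))
    done
}

# Hypothetical variant: keep the offset at 0 and grow the length by STEP
# bytes per call until TOTAL is reached.
fallocate_growing() {
    file=$1 total=$2 step=$3
    len=0
    while [ "$len" -lt "$total" ]; do
        len=$((len + step))
        if [ "$len" -gt "$total" ]; then
            len=$total
        fi
        fallocate -l "$len" "$file" || return 1
    done
}

# Example call (52G in 1G pieces; the path is the one from this report):
#   fallocate_chunked /mnt/vol1/foo $((52 << 30)) $((1 << 30))
```

Both helpers stop at the first failing call and leave whatever was already
allocated in place.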
An example:
Machine has 32GB RAM, minus 4GB memmapped as fake pmem. No big tasks
(X, some shells, browser, ...). Run 「while :;do free -m;done」 on another
terminal, then:
# mount -osize=64G -t tmpfs none /mnt/vol1
# chown you /mnt/vol1
$ cd /mnt/vol1
$ fallocate -l 32G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 28G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 27G foo
fallocate: fallocate failed: Cannot allocate memory
$ fallocate -l 26G foo
$ fallocate -l 52G foo
It takes a few seconds for the allocation to succeed, or a couple more for
it to be torn down if it fails. Longer still if it has to write out the
zeroes it allocated in the previous call.
This raises multiple questions:
* why would fallocate bother to prefault the memory instead of just
reserving it? We want to kill overcommit, but reserving swap is as good
-- if there's memory pressure, our big allocation will be evicted anyway.
* why does it insist on doing everything in one piece? The biggest chunk
I can see being beneficial is 1G (for hugepages).
* when it fails, why does it undo the work done so far? This can matter
for other reasons, such as EINTR -- and fallocate isn't expected to be
atomic anyway.
* if I'm wrong and atomicity+prefaulting are desired, why does fallocate
force just the delta (pages not yet allocated) to reside in core, rather
than the entire requested range?
Thus, I believe fallocate on tmpfs should behave consistently with other
filesystems and succeed unless we run into ENOSPC.
Am I missing something?
Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Have you accepted Khorne as your lord and saviour?
⠈⠳⣄⠀⠀⠀⠀