From: Al Viro <viro@zeniv.linux.org.uk>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: Orphan filesystems after mount namespace destruction and tmpfs "leak"
Date: Mon, 2 Feb 2026 18:43:56 +0000
Message-ID: <20260202184356.GD3183987@ZenIV>
In-Reply-To: <aYDjHJstnz2V-ZZg@thinkstation>

On Mon, Feb 02, 2026 at 05:50:30PM +0000, Kiryl Shutsemau wrote:
> In the Meta fleet, we saw a problem where destroying a container didn't
> lead to freeing the shmem memory attributed to a tmpfs mounted inside
> that container. It triggered an OOM when a new container attempted to
> start.
>
> Investigation has shown that this happened because a process outside of
> the container kept a file from the tmpfs mapped. The mapped file is
> small (4k), but it holds all the contents of the tmpfs (~47GiB) from
> being freed.
>
> When a tmpfs filesystem is mounted inside a mount namespace (e.g., a
> container), and a process outside that namespace holds an open file
> descriptor to a file on that tmpfs, the tmpfs superblock remains in
> kernel memory indefinitely after:
>
> 1. All processes inside the mount namespace have exited.
> 2. The mount namespace has been destroyed.
> 3. The tmpfs is no longer visible in any mount namespace.

Yes?  That's precisely what should happen as long as something is open
on the filesystem.

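
A rough userland illustration of the same pinning rule, one level down:
a mapping keeps a file's contents alive after its name is gone, much as
an open or mapped file keeps the whole filesystem alive after the mount
is gone.  This is only a sketch of the analogous per-file case; it does
not detach any mount, and the function name is made up for the example.

```python
import mmap
import os
import tempfile


def mapping_outlives_name(payload=b"still here"):
    """Map a temp file, drop our descriptor and its name, then read the mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Shared read-only mapping of the file's contents.
        view = mmap.mmap(fd, len(payload), prot=mmap.PROT_READ)
    finally:
        os.close(fd)      # our descriptor is gone
        os.unlink(path)   # the last name is gone as well
    gone = not os.path.exists(path)
    data = view[:]        # the mapping still pins the inode and its pages
    view.close()          # only now can the kernel free them
    return gone, data
```

The data stays readable until the mapping is torn down, which is why a
single 4k mapping held from outside is enough to keep things busy.
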
> The superblock persists with mnt_ns = NULL in its mount structures,
> keeping all tmpfs contents pinned in memory until the external file
> descriptor is closed.

Yes.

> The problem is not specific to tmpfs, but for filesystems with backing
> storage, the memory impact is not as severe since the page cache is
> reclaimable.
>
> The obvious solution to the problem is "Don't do that": the file should
> be unmapped/closed upon container destruction.

Or remove the junk there from time to time, if you don't want it to stay
until the filesystem is shut down...

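
One way such periodic cleanup might look, as a minimal sketch: walk a
directory tree and unlink regular files that have not been modified for
a while.  The function name, the age threshold and the choice of mtime
as the criterion are all assumptions made for the example, not anything
taken from the setup described above.

```python
import os
import stat
import time


def prune_older_than(root, max_age_seconds, now=None):
    """Unlink regular files under root older than max_age_seconds.

    Returns the summed st_size of the files that were unlinked.
    """
    now = time.time() if now is None else now
    unlinked_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
                if stat.S_ISREG(st.st_mode) and now - st.st_mtime > max_age_seconds:
                    os.unlink(path)
                    unlinked_bytes += st.st_size
            except OSError:
                continue  # raced with someone else, or no permission
    return unlinked_bytes
```

Note that a file which is still open or mapped somewhere keeps its pages
after unlink; the point is that everything else on the filesystem stops
being held hostage by it.
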
> But I wonder if the kernel can/should do better here? Currently, this
> scenario is hard to diagnose. It looks like a leak of shmem pages.
>
> Also, I wonder if the current behavior can lead to data loss on a
> filesystem with backing storage:
> - The mount namespace where my USB stick was mounted is gone.
> - The USB stick is no longer mounted anywhere.
> - I can pull the USB stick out.
> - Oops, someone was writing there: corruption/data loss.
>
> I am not sure what a possible solution would be here. I can only think
> of blocking exit(2) for the last process in the namespace until all
> filesystems are cleanly unmounted, but that is not very informative
> either.

That's insane - if nothing else, the process that holds the sucker
open may very well be waiting for the one you've blocked.

You are getting exactly what you asked for - same as you would on
lazy umount, for that matter.

A filesystem may be active without being attached to any namespace;
that's intentional behaviour.  What's more, it _is_ visible to
ustat(2), as well as to lsof(1) and similar userland tools when an
open file is keeping it busy.
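
For diagnosis, roughly what lsof-style tools do can be sketched by
walking /proc and reporting every process that has a descriptor or a
mapping on a given device number.  The helper names below are invented
for the example, and it assumes that the device shown in
/proc/<pid>/maps matches st_dev, which holds for tmpfs but not for every
filesystem (btrfs is a known exception).

```python
import os


def parse_maps_line(line):
    """Return (major, minor, inode, path) for one /proc/<pid>/maps line."""
    fields = line.split(None, 5)
    major, minor = (int(x, 16) for x in fields[3].split(":"))
    inode = int(fields[4])
    path = fields[5].rstrip("\n") if len(fields) > 5 else ""
    return major, minor, inode, path


def holders_of_device(dev, proc="/proc"):
    """Yield (pid, kind, detail) for each process pinning a file on dev."""
    want = (os.major(dev), os.minor(dev))
    for pid in filter(str.isdigit, os.listdir(proc)):
        base = os.path.join(proc, pid)
        try:
            for fd in os.listdir(os.path.join(base, "fd")):
                try:
                    # stat() follows the magic link to the open file itself,
                    # so this works even if the mount is no longer attached.
                    st = os.stat(os.path.join(base, "fd", fd))
                except OSError:
                    continue
                if st.st_dev == dev:
                    yield int(pid), "fd", fd
            with open(os.path.join(base, "maps")) as maps:
                for line in maps:
                    major, minor, inode, path = parse_maps_line(line)
                    if inode and (major, minor) == want:
                        yield int(pid), "map", path
        except OSError:
            continue  # process exited, or we may not inspect it
```

Run with enough privilege to read other processes' /proc entries, and
with the st_dev of the filesystem in question, this lists the holders
that need to close or unmap before the filesystem can go away.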