From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Khazhy Kumykov <khazhy@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Al Viro <viro@zeniv.linux.org.uk>,
kernel@collabora.com, Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Theodore Tso <tytso@mit.edu>
Subject: Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
Date: Thu, 12 May 2022 16:00:55 -0400 [thread overview]
Message-ID: <87zgjmy0zs.fsf@collabora.com> (raw)
In-Reply-To: <87r157n0j2.fsf@collabora.com> (Gabriel Krisman Bertazi's message of "Thu, 05 May 2022 17:16:01 -0400")
Gabriel Krisman Bertazi <krisman@collabora.com> writes:
> Amir Goldstein <amir73il@gmail.com> writes:
>
>>> task a user could easily go from 0% to full, or OOM, rather quickly,
>>> so statfs polling would likely miss the event. The orchestrator can,
>>> when the task fails, easily (and reliably) look at this statistic to
>>> determine if a user exceeded the tmpfs limit.
>>>
>>> (I do see the parallel here to thin provisioned storage - "exceeded
>>> your individual budget" vs. "underlying overcommitted system ran out
>>> of bytes")
>>
>> Right, and in this case, the application gets a different error in case
>> of "underlying space overcommitted", usually EIO, that's why I think that
>> opting-in for this same behavior could make sense for tmpfs.
>
> Amir,
>
> If I understand correctly, that would allow the application to catch the
> lack of memory vs. lack of fs space, but it wouldn't facilitate life for
> an orchestrator trying to detect the condition. Still it seems like a
> step in the right direction. For the orchestrator, it seems necessary
> that we expose this is some out-of-band mechanism, a WB_ERROR
> notification or sysfs.
Amir,
Regarding allowing an orchestrator to catch this situation, I'd like to
go back to the original proposal and create a new tmpfs
"thin-provisioned" option that will return a different error code (as
the patch below, that I sent last week) and also issue a special
FAN_FS_ERROR/WB_ERROR to notify the orchestrator of this situation.
This would completely solve the use case, I believe. Since this is
quite specific to tmpfs, it is reasonable to implement the notification
at FS level, similar to how other FS_ERRORs are implemented.
> As a first step:
>
>>8
> Subject: [PATCH] shmem: Differentiate overcommit failure from lack of fs space
>
> When provisioning user applications in cloud environments, it is common
> to allocate containers with very small tmpfs and little available
> memory. In such scenarios, it is hard for an application to
> differentiate whether its tmpfs IO failed due do insufficient
> provisioned filesystem space, or due to running out of memory in the
> container, because both situations will return ENOSPC in shmem.
>
> This patch modifies the behavior of shmem failure due to overcommit to
> return EIO instead of ENOSPC in this scenario. In order to preserve the
> existing interface, this feature must be enabled through a new
> shmem-specific mount option.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
> Documentation/filesystems/tmpfs.rst | 16 +++++++++++++++
> include/linux/shmem_fs.h | 3 +++
> mm/shmem.c | 30 ++++++++++++++++++++---------
> 3 files changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..83278d2b15a3 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -171,6 +171,22 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB
> RAM/SWAP in 10240 inodes and it is only accessible by root.
>
>
> +When provisioning containerized applications, it is common to allocate
> +the system with a very small tmpfs and little total memory. In such
> +scenarios, it is sometimes useful for an application to differentiate
> +whether an IO operation failed due to insufficient provisioned
> +filesystem space or due to running out of container memory. tmpfs
> +includes a mount parameter to treat a memory overcommit limit error
> +differently from a lack of filesystem space error, allowing the
> +application to differentiate these two scenarios. If the following
> +mount option is specified, surpassing memory overcommit limits on a
> +tmpfs will return EIO. ENOSPC is then only used to report lack of
> +filesystem space.
> +
> +================= ===================================================
> +report_overcommit Report overcommit issues with EIO instead of ENOSPC
> +================= ===================================================
> +
> :Author:
> Christoph Rohland <cr@sap.com>, 1.12.01
> :Updated:
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index e65b80ed09e7..1be57531b257 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -44,6 +44,9 @@ struct shmem_sb_info {
> spinlock_t shrinklist_lock; /* Protects shrinklist */
> struct list_head shrinklist; /* List of shinkable inodes */
> unsigned long shrinklist_len; /* Length of shrinklist */
> +
> + /* Assist userspace with detecting overcommit errors */
> + bool report_overcommit;
> };
>
> static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a09b29ec2b45..23f2780678df 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -112,6 +112,7 @@ struct shmem_options {
> kgid_t gid;
> umode_t mode;
> bool full_inums;
> + bool report_overcommit;
> int huge;
> int seen;
> #define SHMEM_SEEN_BLOCKS 1
> @@ -207,13 +208,16 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
> vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
> }
>
> -static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
> +static inline int shmem_inode_acct_block(struct inode *inode, long pages)
> {
> struct shmem_inode_info *info = SHMEM_I(inode);
> struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>
> - if (shmem_acct_block(info->flags, pages))
> - return false;
> + if (shmem_acct_block(info->flags, pages)) {
> + if (sbinfo->report_overcommit)
> + return -EIO;
> + return -ENOSPC;
> + }
>
> if (sbinfo->max_blocks) {
> if (percpu_counter_compare(&sbinfo->used_blocks,
> @@ -222,11 +226,11 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
> percpu_counter_add(&sbinfo->used_blocks, pages);
> }
>
> - return true;
> + return 0;
>
> unacct:
> shmem_unacct_blocks(info->flags, pages);
> - return false;
> + return -ENOSPC;
> }
>
> static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages)
> @@ -372,7 +376,7 @@ bool shmem_charge(struct inode *inode, long pages)
> struct shmem_inode_info *info = SHMEM_I(inode);
> unsigned long flags;
>
> - if (!shmem_inode_acct_block(inode, pages))
> + if (shmem_inode_acct_block(inode, pages))
> return false;
>
> /* nrpages adjustment first, then shmem_recalc_inode() when balanced */
> @@ -1555,13 +1559,14 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
> struct shmem_inode_info *info = SHMEM_I(inode);
> struct page *page;
> int nr;
> - int err = -ENOSPC;
> + int err;
>
> if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> huge = false;
> nr = huge ? HPAGE_PMD_NR : 1;
>
> - if (!shmem_inode_acct_block(inode, nr))
> + err = shmem_inode_acct_block(inode, nr);
> + if (err)
> goto failed;
>
> if (huge)
> @@ -2324,7 +2329,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> int ret;
> pgoff_t max_off;
>
> - if (!shmem_inode_acct_block(inode, 1)) {
> + if (shmem_inode_acct_block(inode, 1)) {
> /*
> * We may have got a page, returned -ENOENT triggering a retry,
> * and now we find ourselves with -ENOMEM. Release the page, to
> @@ -3301,6 +3306,7 @@ enum shmem_param {
> Opt_uid,
> Opt_inode32,
> Opt_inode64,
> + Opt_report_overcommit,
> };
>
> static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3322,6 +3328,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
> fsparam_u32 ("uid", Opt_uid),
> fsparam_flag ("inode32", Opt_inode32),
> fsparam_flag ("inode64", Opt_inode64),
> + fsparam_flag ("report_overcommit", Opt_report_overcommit),
> {}
> };
>
> @@ -3405,6 +3412,9 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
> ctx->full_inums = true;
> ctx->seen |= SHMEM_SEEN_INUMS;
> break;
> + case Opt_report_overcommit:
> + ctx->report_overcommit = true;
> + break;
> }
> return 0;
>
> @@ -3513,6 +3523,7 @@ static int shmem_reconfigure(struct fs_context *fc)
> sbinfo->max_inodes = ctx->inodes;
> sbinfo->free_inodes = ctx->inodes - inodes;
> }
> + sbinfo->report_overcommit = ctx->report_overcommit;
>
> /*
> * Preserve previous mempolicy unless mpol remount option was specified.
> @@ -3640,6 +3651,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
> sbinfo->mode = ctx->mode;
> sbinfo->huge = ctx->huge;
> sbinfo->mpol = ctx->mpol;
> + sbinfo->report_overcommit = ctx->report_overcommit;
> ctx->mpol = NULL;
>
> raw_spin_lock_init(&sbinfo->stat_lock);
--
Gabriel Krisman Bertazi
prev parent reply other threads:[~2022-05-12 20:01 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-18 21:37 Gabriel Krisman Bertazi
2022-04-18 21:37 ` [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors Gabriel Krisman Bertazi
2022-04-18 21:37 ` [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support Gabriel Krisman Bertazi
2022-04-22 9:54 ` Dan Carpenter
2022-04-18 21:37 ` [PATCH v3 3/3] shmem: Expose space and accounting error count Gabriel Krisman Bertazi
2022-04-19 3:42 ` [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Andrew Morton
2022-04-19 15:28 ` Gabriel Krisman Bertazi
2022-04-21 5:33 ` Amir Goldstein
2022-04-21 22:37 ` Gabriel Krisman Bertazi
2022-04-21 23:19 ` Khazhy Kumykov
2022-04-22 9:02 ` Amir Goldstein
2022-05-05 21:16 ` Gabriel Krisman Bertazi
2022-05-12 20:00 ` Gabriel Krisman Bertazi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zgjmy0zs.fsf@collabora.com \
--to=krisman@collabora.com \
--cc=akpm@linux-foundation.org \
--cc=amir73il@gmail.com \
--cc=hughd@google.com \
--cc=kernel@collabora.com \
--cc=khazhy@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox