From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 29 Nov 2022 12:21:33 +0100
From: Christian Brauner
To: Lukas Czerner
Cc: Hugh Dickins, Jan Kara, Eric Sandeen, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, djwong@kernel.org
Subject: Re: [PATCH v2 1/3] quota: add quota in-memory format support
Message-ID: <20221129112133.rrpoywlwdw45k3qa@wittgenstein>
References: <20221121142854.91109-1-lczerner@redhat.com>
	<20221121142854.91109-2-lczerner@redhat.com>
In-Reply-To: <20221121142854.91109-2-lczerner@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Mon, Nov 21, 2022 at 03:28:52PM +0100, Lukas Czerner wrote:
> In memory quota format relies on quota infrastructure to store dquot
> information for us. While conventional quota formats for file systems
> with persistent storage can load quota information into dquot from the
> storage on-demand and hence quota dquot shrinker can free any dquot that
> is not currently being used, it must be avoided here. Otherwise we can
> lose valuable information, user provided limits, because there is no
> persistent storage to load the information from afterwards.
>
> One information that in-memory quota format needs to keep track of is a
> sorted list of ids for each quota type. This is done by utilizing an rb
> tree which root is stored in mem_dqinfo->dqi_priv for each quota type.
>
> This format can be used to support quota on file system without persistent
> storage such as tmpfs.
>
> Signed-off-by: Lukas Czerner
> ---
>  fs/quota/Kconfig           |   8 ++
>  fs/quota/Makefile          |   1 +
>  fs/quota/dquot.c           |   3 +
>  fs/quota/quota_mem.c       | 260 +++++++++++++++++++++++++++++++++++++
>  include/linux/quota.h      |   7 +-
>  include/uapi/linux/quota.h |   1 +
>  6 files changed, 279 insertions(+), 1 deletion(-)
>  create mode 100644 fs/quota/quota_mem.c
>
> diff --git a/fs/quota/Kconfig b/fs/quota/Kconfig
> index b59cd172b5f9..8ea9656ca37b 100644
> --- a/fs/quota/Kconfig
> +++ b/fs/quota/Kconfig
> @@ -67,6 +67,14 @@ config QFMT_V2
>  	  also supports 64-bit inode and block quota limits. If you need this
>  	  functionality say Y here.
>
> +config QFMT_MEM
> +	tristate "Quota in-memory format support "
> +	depends on QUOTA
> +	help
> +	  This config option enables kernel support for in-memory quota
> +	  format support. Useful to support quota on file system without
> +	  permanent storage. If you need this functionality say Y here.
> +
>  config QUOTACTL
>  	bool
>  	default n
> diff --git a/fs/quota/Makefile b/fs/quota/Makefile
> index 9160639daffa..935be3f7b731 100644
> --- a/fs/quota/Makefile
> +++ b/fs/quota/Makefile
> @@ -5,3 +5,4 @@ obj-$(CONFIG_QFMT_V2) += quota_v2.o
>  obj-$(CONFIG_QUOTA_TREE) += quota_tree.o
>  obj-$(CONFIG_QUOTACTL) += quota.o kqid.o
>  obj-$(CONFIG_QUOTA_NETLINK_INTERFACE) += netlink.o
> +obj-$(CONFIG_QFMT_MEM) += quota_mem.o
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 0427b44bfee5..f1a7a03632a2 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -736,6 +736,9 @@ dqcache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	spin_lock(&dq_list_lock);
>  	while (!list_empty(&free_dquots) && sc->nr_to_scan) {
>  		dquot = list_first_entry(&free_dquots, struct dquot, dq_free);
> +		if (test_bit(DQ_NO_SHRINK_B, &dquot->dq_flags) &&
> +		    !test_bit(DQ_FAKE_B, &dquot->dq_flags))
> +			continue;
>  		remove_dquot_hash(dquot);
>  		remove_free_dquot(dquot);
>  		remove_inuse(dquot);
> diff --git a/fs/quota/quota_mem.c b/fs/quota/quota_mem.c
> new file mode 100644
> index 000000000000..7d5e82122143
> --- /dev/null
> +++ b/fs/quota/quota_mem.c
> @@ -0,0 +1,260 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * In memory quota format relies on quota infrastructure to store dquot
> + * information for us. While conventional quota formats for file systems
> + * with persistent storage can load quota information into dquot from the
> + * storage on-demand and hence quota dquot shrinker can free any dquot
> + * that is not currently being used, it must be avoided here. Otherwise we
> + * can lose valuable information, user provided limits, because there is
> + * no persistent storage to load the information from afterwards.
> + *
> + * One information that in-memory quota format needs to keep track of is
> + * a sorted list of ids for each quota type. This is done by utilizing
> + * an rb tree which root is stored in mem_dqinfo->dqi_priv for each quota
> + * type.
> + *
> + * This format can be used to support quota on file system without persistent
> + * storage such as tmpfs.
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +
> +MODULE_AUTHOR("Lukas Czerner");
> +MODULE_DESCRIPTION("Quota in-memory format support");
> +MODULE_LICENSE("GPL");
> +
> +/*
> + * The following constants define the amount of time given a user
> + * before the soft limits are treated as hard limits (usually resulting
> + * in an allocation failure). The timer is started when the user crosses
> + * their soft limit, it is reset when they go below their soft limit.
> + */
> +#define MAX_IQ_TIME 604800 /* (7*24*60*60) 1 week */
> +#define MAX_DQ_TIME 604800 /* (7*24*60*60) 1 week */
> +
> +struct quota_id {
> +	struct rb_node node;
> +	qid_t id;
> +};
> +
> +static int mem_check_quota_file(struct super_block *sb, int type)
> +{
> +	/* There is no real quota file, nothing to do */
> +	return 1;
> +}
> +
> +/*
> + * There is no real quota file. Just allocate rb_root for quota ids and
> + * set limits
> + */
> +static int mem_read_file_info(struct super_block *sb, int type)
> +{
> +	struct quota_info *dqopt = sb_dqopt(sb);
> +	struct mem_dqinfo *info = &dqopt->info[type];
> +	int ret = 0;
> +
> +	down_read(&dqopt->dqio_sem);
> +	if (info->dqi_fmt_id != QFMT_MEM_ONLY) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	info->dqi_priv = kzalloc(sizeof(struct rb_root), GFP_NOFS);
> +	if (!info->dqi_priv) {
> +		ret = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Used space is stored as unsigned 64-bit value in bytes but
> +	 * quota core supports only signed 64-bit values so use that
> +	 * as a limit
> +	 */
> +	info->dqi_max_spc_limit = 0x7fffffffffffffffLL; /* 2^63-1 */
> +	info->dqi_max_ino_limit = 0x7fffffffffffffffLL;
> +
> +	info->dqi_bgrace = MAX_DQ_TIME;
> +	info->dqi_igrace = MAX_IQ_TIME;
> +	info->dqi_flags = 0;
> +
> +out_unlock:
> +	up_read(&dqopt->dqio_sem);
> +	return ret;
> +}
> +
> +static int mem_write_file_info(struct super_block *sb, int type)
> +{
> +	/* There is no real quota file, nothing to do */
> +	return 0;
> +}
> +
> +/*
> + * Free all the quota_id entries in the rb tree and rb_root.
> + */
> +static int mem_free_file_info(struct super_block *sb, int type)
> +{
> +	struct mem_dqinfo *info = &sb_dqopt(sb)->info[type];
> +	struct rb_root *root = info->dqi_priv;
> +	struct quota_id *entry;
> +	struct rb_node *node;
> +
> +	info->dqi_priv = NULL;
> +	node = rb_first(root);
> +	while (node) {
> +		entry = rb_entry(node, struct quota_id, node);
> +		node = rb_next(&entry->node);
> +
> +		rb_erase(&entry->node, root);
> +		kfree(entry);
> +	}
> +
> +	kfree(root);
> +	return 0;
> +}
> +
> +/*
> + * There is no real quota file, nothing to read. Just insert the id in
> + * the rb tree.
> + */
> +static int mem_read_dquot(struct dquot *dquot)
> +{
> +	struct mem_dqinfo *info = sb_dqinfo(dquot->dq_sb, dquot->dq_id.type);
> +	struct rb_node **n = &((struct rb_root *)info->dqi_priv)->rb_node;
> +	struct rb_node *parent = NULL, *new_node = NULL;
> +	struct quota_id *new_entry, *entry;
> +	qid_t id = from_kqid(&init_user_ns, dquot->dq_id);

Hey Lukas,

tmpfs instances can be mounted inside of mount namespaces owned by user
namespaces, as is the case in unprivileged containers. An easy example
is:

	unshare --mount --user --map-root mount -t tmpfs tmpfs /mnt

This tmpfs instance will be mounted with sb->s_user_ns set to the userns
just created during the unshare call and not to init_user_ns. This means
that the filesystem idmapping isn't a 1:1 mapping, which needs to be
taken into account:

	qid_t id = from_kqid(sb->s_user_ns, dquot->dq_id);

and similarly below.

But dquot_load_quota_sb(), which you use in a later patch, is restricted
to the init_user_ns, which means that your patch as it stands is only
usable for tmpfs instances mounted in the init_user_ns. If that's
intentional then the code above is probably fine, but if it's not then
you need preliminary patches to support quotas on filesystems mountable
in non-initial user namespaces.

Enabling this shouldn't be a big deal as it mostly involves updating
callsites to account for sb->s_user_ns when reading and writing quotas.
I looked at that a while ago, but there was no filesystem with quota
support that was also mountable in a user namespace. Idmapped mounts
are already taken care of.