From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 17 Apr 2023 22:50:59 -0700 (PDT)
From: Hugh Dickins
To: Luis Chamberlain
cc: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org,
    brauner@kernel.org, linux-mm@kvack.org, p.raghav@samsung.com,
    da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net,
    yosryahmed@google.com, keescook@chromium.org, patches@lists.linux.dev,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap
In-Reply-To: <20230309230545.2930737-7-mcgrof@kernel.org>
Message-ID: <79eae9fe-7818-a65c-89c6-138b55d609a@google.com>
References: <20230309230545.2930737-1-mcgrof@kernel.org>
    <20230309230545.2930737-7-mcgrof@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> In doing experimentations with shmem having the option to avoid swap
> becomes a useful mechanism. One of the *raves* about brd over shmem is
> you can avoid swap, but that's not really a good reason to use brd if
> we can instead use shmem.
Using brd has its own good reasons to exist,
> but just because "tmpfs" doesn't let you do that is not a great reason
> to avoid it if we can easily add support for it.
>
> I don't add support for reconfiguring incompatible options, but if
> we really wanted to we can add support for that.
>
> To avoid swap we use mapping_set_unevictable() upon inode creation,
> and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.

I have one big question here, which betrays my ignorance: I hope that
you or Christian can reassure me on this.

tmpfs has fs_flags FS_USERNS_MOUNT.  I know nothing about namespaces,
nothing; but from overhearings, wonder if an ordinary user in a
namespace might be able to mount their own tmpfs with "noswap", and
thereby evade all accounting of the locked memory.

That would be an absolute no-no for this patch; but I assume that even
if so, it can be easily remedied by inserting an appropriate (unknown
to me!) privilege check where the "noswap" option is validated.

I did idly wonder what happens with "noswap" when CONFIG_SWAP is not
enabled, or no swap is enabled; but I think it would be a waste of
time and code to worry over doing anything different from whatever
behaviour falls out trivially.

You'll be sending a manpage update to Alejandro in due course, I think.

Thanks,
Hugh

>
> Acked-by: Christian Brauner
> Signed-off-by: Luis Chamberlain
> ---
>  Documentation/filesystems/tmpfs.rst  |  9 ++++++---
>  Documentation/mm/unevictable-lru.rst |  2 ++
>  include/linux/shmem_fs.h             |  1 +
>  mm/shmem.c                           | 28 +++++++++++++++++++++++++++-
>  4 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 1ec9a9f8196b..f18f46be5c0c 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,7 +13,8 @@ everything stored therein is lost.
>
>  tmpfs puts everything into the kernel internal caches and grows and
>  shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space, and supports THP.
> +unneeded pages out to swap space, if swap was enabled for the tmpfs
> +mount. tmpfs also supports THP.
>
>  tmpfs extends ramfs with a few userspace configurable options listed and
>  explained further below, some of which can be reconfigured dynamically on the
> @@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynamically resize them.
>  Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
>  block layer at all.
>
> -Since tmpfs lives completely in the page cache and on swap, all tmpfs
> -pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> +Since tmpfs lives completely in the page cache and optionally on swap,
> +all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
>  free(1). Notice that these counters also include shared memory
>  (shmem, see ipcs(1)). The most reliable way to get the count is
>  using df(1) and du(1).
> @@ -83,6 +84,8 @@ nr_inodes The maximum number of inodes for this instance. The default
>            is half of the number of your physical RAM pages, or (on a
>            machine with highmem) the number of lowmem RAM pages,
>            whichever is the lower.
> +noswap    Disables swap. Remounts must respect the original settings.
> +          By default swap is enabled.
>  ========= ============================================================
>
>  These parameters accept a suffix k, m or g for kilo, mega and giga and
> diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
> index 92ac5dca420c..d5ac8511eb67 100644
> --- a/Documentation/mm/unevictable-lru.rst
> +++ b/Documentation/mm/unevictable-lru.rst
> @@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
>
>   * Those owned by ramfs.
>
> + * Those owned by tmpfs with the noswap mount option.
> +
>   * Those mapped into SHM_LOCK'd shared memory regions.
>
>   * Those mapped into VM_LOCKED [mlock()ed] VMAs.
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 103d1000a5a2..50bf82b36995 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -45,6 +45,7 @@ struct shmem_sb_info {
>  	kuid_t uid;		    /* Mount uid for root directory */
>  	kgid_t gid;		    /* Mount gid for root directory */
>  	bool full_inums;	    /* If i_ino should be uint or ino_t */
> +	bool noswap;		    /* ignores VM reclaim / swap requests */
>  	ino_t next_ino;		    /* The next per-sb inode number to use */
>  	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
>  	struct mempolicy *mpol;     /* default memory policy for mappings */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index dfd995da77b4..2e122c72b375 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -119,10 +119,12 @@ struct shmem_options {
>  	bool full_inums;
>  	int huge;
>  	int seen;
> +	bool noswap;
>  #define SHMEM_SEEN_BLOCKS 1
>  #define SHMEM_SEEN_INODES 2
>  #define SHMEM_SEEN_HUGE 4
>  #define SHMEM_SEEN_INUMS 8
> +#define SHMEM_SEEN_NOSWAP 16
>  };
>
>  #ifdef CONFIG_TMPFS
> @@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	struct address_space *mapping = folio->mapping;
>  	struct inode *inode = mapping->host;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  	swp_entry_t swap;
>  	pgoff_t index;
>
> @@ -1350,7 +1353,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	if (WARN_ON_ONCE(!wbc->for_reclaim))
>  		goto redirty;
>
> -	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
> +	if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
>  		goto redirty;
>
>  	if (!total_swap_pages)
> @@ -2487,6 +2490,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
>  		shmem_set_inode_flags(inode, info->fsflags);
>  		INIT_LIST_HEAD(&info->shrinklist);
>  		INIT_LIST_HEAD(&info->swaplist);
> +		if (sbinfo->noswap)
> +			mapping_set_unevictable(inode->i_mapping);
>  		simple_xattrs_init(&info->xattrs);
>  		cache_no_acl(inode);
>  		mapping_set_large_folios(inode->i_mapping);
> @@ -3574,6 +3579,7 @@ enum shmem_param {
>  	Opt_uid,
>  	Opt_inode32,
>  	Opt_inode64,
> +	Opt_noswap,
>  };
>
>  static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3595,6 +3601,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>  	fsparam_u32   ("uid",		Opt_uid),
>  	fsparam_flag  ("inode32",	Opt_inode32),
>  	fsparam_flag  ("inode64",	Opt_inode64),
> +	fsparam_flag  ("noswap",	Opt_noswap),
>  	{}
>  };
>
> @@ -3678,6 +3685,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		ctx->full_inums = true;
>  		ctx->seen |= SHMEM_SEEN_INUMS;
>  		break;
> +	case Opt_noswap:
> +		ctx->noswap = true;
> +		ctx->seen |= SHMEM_SEEN_NOSWAP;
> +		break;
>  	}
>  	return 0;
>
> @@ -3776,6 +3787,14 @@ static int shmem_reconfigure(struct fs_context *fc)
>  			err = "Current inum too high to switch to 32-bit inums";
>  			goto out;
>  		}
> +	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
> +		err = "Cannot disable swap on remount";
> +		goto out;
> +	}
> +	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
> +		err = "Cannot enable swap on remount if it was disabled on first mount";
> +		goto out;
> +	}
>
>  	if (ctx->seen & SHMEM_SEEN_HUGE)
>  		sbinfo->huge = ctx->huge;
> @@ -3796,6 +3815,10 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
>  		ctx->mpol = NULL;
>  	}
> +
> +	if (ctx->noswap)
> +		sbinfo->noswap = true;
> +
>  	raw_spin_unlock(&sbinfo->stat_lock);
>  	mpol_put(mpol);
>  	return 0;
> @@ -3850,6 +3873,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
>  		seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
>  #endif
>  	shmem_show_mpol(seq, sbinfo->mpol);
> +	if (sbinfo->noswap)
> +		seq_printf(seq, ",noswap");
>  	return 0;
>  }
>
> @@ -3893,6 +3918,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>  			ctx->inodes = shmem_default_max_inodes();
>  		if (!(ctx->seen & SHMEM_SEEN_INUMS))
>  			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
> +		sbinfo->noswap = ctx->noswap;
>  	} else {
>  		sb->s_flags |= SB_NOUSER;
>  	}
> --
> 2.39.1