From: Boris Burkov <boris@bur.io>
To: akpm@linux-foundation.org
Cc: linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	shakeel.butt@linux.dev, wqu@suse.com, willy@infradead.org,
	mhocko@kernel.org, muchun.song@linux.dev,
	roman.gushchin@linux.dev, hannes@cmpxchg.org
Subject: [PATCH v4 1/3] mm/filemap: add AS_KERNEL_FILE
Date: Thu, 21 Aug 2025 14:55:35 -0700
Message-ID: <f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io>
In-Reply-To: <cover.1755812945.git.boris@bur.io>

Btrfs currently tracks its metadata pages in the page cache, using a
fake inode (fs_info->btree_inode) with offsets corresponding to where
the metadata is stored in the filesystem's full logical address space.

A consequence of this is that when btrfs uses filemap_add_folio(), this
usage is charged to the cgroup of whichever task happens to be running
at the time. These folios don't belong to any particular user cgroup, so
I don't think it makes much sense for them to be charged in that way.
Some negative consequences as a result:
- A task can be holding some important btrfs locks, then need to look up
  some metadata and go into reclaim, extending the time it holds those
  locks and unfairly pushing its own reclaim pain onto other cgroups.
- If that cgroup goes into reclaim, it might reclaim folios that a
  different, non-reclaiming cgroup will need soon. This is naturally
  offset by LRU reclaim, but still.

We have two options for how to manage such file pages:
1. charge them to the root cgroup.
2. don't charge them to any cgroup at all.

Option 2 breaks the invariant that every mapped page has a cgroup. This
is workable, but unnecessarily risky. Therefore, go with option 1.

A very similar proposal to use the root cgroup was previously made by
Qu, where he eventually proposed the idea of setting it per
address_space. This makes good sense for the btrfs use case, as the
behavior should apply to all uses of the address_space, not just select
allocations. I.e., if someone adds another filemap_add_folio() call
using btrfs's btree_inode, we would almost certainly want to account
that to the root cgroup as well.
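
To illustrate the intended behavior, here is a minimal userspace sketch
of the save/override/restore pattern. It is not kernel code: the model_*
helpers, the single global active_memcg, and the toy struct layouts are
assumptions made for illustration only. The real set_active_memcg()
operates on per-task (or per-CPU, in interrupt context) state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; this is a model, not kernel code. */
struct mem_cgroup {
	const char *name;
	long charged;		/* pages charged so far */
};

struct address_space {
	unsigned long flags;
};

enum { AS_KERNEL_FILE = 10 };	/* same bit number as in the patch */

static struct mem_cgroup root_memcg = { "root", 0 };

/* Models the active memcg override; NULL means "no override". */
static struct mem_cgroup *active_memcg;

/* Install a new override and return the previous one. */
static struct mem_cgroup *model_set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Charge the override if one is set, otherwise the running task's cgroup. */
static struct mem_cgroup *model_charge(struct mem_cgroup *task_memcg,
				       long nr_pages)
{
	struct mem_cgroup *target = active_memcg ? active_memcg : task_memcg;

	target->charged += nr_pages;
	return target;
}

/* Mirrors the shape of the change to filemap_add_folio(). */
static struct mem_cgroup *model_add_folio(struct address_space *mapping,
					  struct mem_cgroup *task_memcg,
					  long nr_pages)
{
	bool kernel_file = (mapping->flags >> AS_KERNEL_FILE) & 1;
	struct mem_cgroup *saved = NULL;
	struct mem_cgroup *charged_to;

	if (kernel_file)
		saved = model_set_active_memcg(&root_memcg);
	charged_to = model_charge(task_memcg, nr_pages);
	if (kernel_file)
		model_set_active_memcg(saved);	/* always restore */

	return charged_to;
}

/* Usage: a flagged mapping charges root, an unflagged one charges the task. */
void model_demo(void)
{
	struct mem_cgroup user = { "user", 0 };
	struct address_space btree = { 1UL << AS_KERNEL_FILE };
	struct address_space regular = { 0 };

	assert(model_add_folio(&btree, &user, 4) == &root_memcg);
	assert(user.charged == 0);
	assert(model_add_folio(&regular, &user, 2) == &user);
	assert(user.charged == 2);
	assert(active_memcg == NULL);	/* override does not leak */
}
```

Because the decision is keyed on the mapping's flag rather than on the
call site, any new caller adding folios to the same mapping gets the
same accounting without further changes.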

Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Boris Burkov <boris@bur.io>
---
 include/linux/pagemap.h | 2 ++
 mm/filemap.c            | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c9ba69e02e3e..a3e16d74792f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,8 @@ enum mapping_flags {
 				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
+				   account usage to user cgroups */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index e4a5a46db89b..05c1384bd611 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -960,8 +960,14 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 {
 	void *shadow = NULL;
 	int ret;
+	struct mem_cgroup *tmp;
+	bool kernel_file = test_bit(AS_KERNEL_FILE, &mapping->flags);
 
+	if (kernel_file)
+		tmp = set_active_memcg(root_mem_cgroup);
 	ret = mem_cgroup_charge(folio, NULL, gfp);
+	if (kernel_file)
+		set_active_memcg(tmp);
 	if (ret)
 		return ret;
 
-- 
2.50.1



Thread overview: 9+ messages
2025-08-21 21:55 [PATCH v4 0/3] introduce kernel file mapped folios Boris Burkov
2025-08-21 21:55 ` Boris Burkov [this message]
2025-08-21 22:25   ` [PATCH v4 1/3] mm/filemap: add AS_KERNEL_FILE Shakeel Butt
2025-08-21 22:51   ` [PATCH] mm: fix CONFIG_MEMCG build for AS_KERNEL_FILE Boris Burkov
2025-08-22 13:46   ` [PATCH v4 1/3] mm/filemap: add AS_KERNEL_FILE kernel test robot
2025-08-21 21:55 ` [PATCH v4 2/3] mm: add vmstat for kernel_file pages Boris Burkov
2025-08-21 21:55 ` [PATCH v4 3/3] btrfs: set AS_KERNEL_FILE on the btree_inode Boris Burkov
2025-08-27 17:47 ` [PATCH v4 0/3] introduce kernel file mapped folios Shakeel Butt
2025-08-29  1:52 ` David Sterba