From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Dave Chinner <david@fromorbit.com>
Cc: paulmck@kernel.org, Vlastimil Babka <vbabka@suse.cz>,
akpm@linux-foundation.org, tkhai@ya.ru, roman.gushchin@linux.dev,
djwong@kernel.org, brauner@kernel.org, tytso@mit.edu,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-arm-msm@vger.kernel.org, dm-devel@redhat.com,
linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 24/29] mm: vmscan: make global slab shrink lockless
Date: Tue, 4 Jul 2023 12:20:41 +0800
Message-ID: <38b14080-4ce5-d300-8a0a-c630bca6806b@bytedance.com>
In-Reply-To: <a7baf44a-1eb8-d4e1-d112-93cf9cdb7beb@bytedance.com>

Hi Dave,

On 2023/6/24 19:08, Qi Zheng wrote:
> Hi Dave,
>
> On 2023/6/24 06:19, Dave Chinner wrote:
>> On Fri, Jun 23, 2023 at 09:10:57PM +0800, Qi Zheng wrote:
>>> On 2023/6/23 14:29, Dave Chinner wrote:
>>>> On Thu, Jun 22, 2023 at 05:12:02PM +0200, Vlastimil Babka wrote:
>>>>> On 6/22/23 10:53, Qi Zheng wrote:
>>>> Yes, I suggested the IDR route because radix tree lookups under RCU
>>>> with reference counted objects are a known safe pattern that we can
>>>> easily confirm is correct or not. Hence I suggested the unification
>>>> + IDR route because it makes the life of reviewers so, so much
>>>> easier...
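
The "known safe pattern" Dave mentions can be illustrated with a small userspace model: look the object up under RCU, then take a reference only if its count has not already dropped to zero. This is a single-threaded sketch using C11 atomics; `struct obj`, `obj_try_get()`, `table_lookup_get()` and the fixed-size table standing in for an IDR are all hypothetical, and the places where the kernel would call rcu_read_lock(), idr_find() and refcount_inc_not_zero() are only marked in comments.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct obj {
	atomic_int refcount;	/* 0 means the object is being torn down */
	int id;
};

/* Hypothetical stand-in for an IDR: slot 'id' holds the published pointer. */
static _Atomic(struct obj *) table[TABLE_SIZE];

/* Models refcount_inc_not_zero(): never revive an object whose count hit 0. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current count; retry. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Kernel shape: rcu_read_lock(); o = idr_find(&idr, id);
 * if (o && !refcount_inc_not_zero(&o->ref)) o = NULL; rcu_read_unlock();
 * Provided the final put frees the object via call_rcu()/kfree_rcu(), RCU
 * keeps 'o' valid between the load and the increment; after the unlock,
 * the reference we took is what keeps it alive.
 */
static struct obj *table_lookup_get(int id)
{
	struct obj *o;

	if (id < 0 || id >= TABLE_SIZE)
		return NULL;
	/* rcu_read_lock() would go here. */
	o = atomic_load_explicit(&table[id], memory_order_acquire);
	if (o && !obj_try_get(o))
		o = NULL;
	/* rcu_read_unlock() would go here. */
	return o;
}
```

If the count is already zero, the lookup returns NULL instead of resurrecting the object; that refusal is what makes the lookup safe against a concurrent final put.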
>>>
>>> In fact, I originally planned to try the unification + IDR method you
>>> suggested at the beginning. But in the case of CONFIG_MEMCG disabled,
>>> the struct mem_cgroup is not even defined, and root_mem_cgroup and
>>> shrinker_info will not be allocated. This required more code
>>> changes, so I ended up keeping the shrinker_list and implementing
>>> the above pattern.
>>
>> Yes. Go back and read what I originally said needed to be done
>> first. In the case of CONFIG_MEMCG=n, a dummy root memcg still needs
>> to exist that holds all of the global shrinkers. Then shrink_slab()
>> is only ever passed a memcg that should be iterated.
>>
>> Yes, it needs changes external to the shrinker code itself to be
>> made to work. And even if memcg's are not enabled, we can still use
>> the memcg structures to ensure a common abstraction is used for the
>> shrinker tracking infrastructure....
>
> Yeah, what I imagined before was to define a more concise struct
> mem_cgroup in the case of CONFIG_MEMCG=n, then allocate a dummy root
> memcg on system boot:
>
> #ifndef CONFIG_MEMCG
>
> struct shrinker_info {
> 	struct rcu_head rcu;
> 	atomic_long_t *nr_deferred;
> 	unsigned long *map;
> 	int map_nr_max;
> };
>
> struct mem_cgroup_per_node {
> 	struct shrinker_info __rcu *shrinker_info;
> };
>
> struct mem_cgroup {
> 	struct mem_cgroup_per_node *nodeinfo[];
> };
>
> #endif
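
A userspace approximation of allocating such a dummy root memcg at boot might look like the following. It is a sketch under stated assumptions: `alloc_dummy_root()`, `free_dummy_root()` and the `nr_nodes` member are hypothetical (C does not allow a struct whose only member is a flexible array, so a named member is added before `nodeinfo[]`), calloc() stands in for the kernel allocators, and the RCU head and `__rcu` annotation are dropped.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace mirror of the dummy !CONFIG_MEMCG structures (no RCU head). */
struct shrinker_info {
	atomic_long *nr_deferred;	/* deferred scan count per shrinker id */
	unsigned long *map;		/* bit n set: shrinker id n may have work */
	int map_nr_max;			/* capacity of map/nr_deferred, in ids */
};

struct mem_cgroup_per_node {
	struct shrinker_info *shrinker_info;	/* __rcu in the kernel */
};

struct mem_cgroup {
	int nr_nodes;	/* hypothetical: needed before a flexible array member */
	struct mem_cgroup_per_node *nodeinfo[];
};

/* Frees whatever alloc_dummy_root() managed to allocate. */
static void free_dummy_root(struct mem_cgroup *memcg)
{
	for (int nid = 0; nid < memcg->nr_nodes; nid++) {
		struct mem_cgroup_per_node *pn = memcg->nodeinfo[nid];

		if (!pn)
			continue;
		if (pn->shrinker_info) {
			free(pn->shrinker_info->nr_deferred);
			free(pn->shrinker_info->map);
			free(pn->shrinker_info);
		}
		free(pn);
	}
	free(memcg);
}

/* Hypothetical boot-time helper: one root memcg, one shrinker_info per node. */
static struct mem_cgroup *alloc_dummy_root(int nr_nodes, int map_nr_max)
{
	size_t map_longs = (map_nr_max + BITS_PER_LONG - 1) / BITS_PER_LONG;
	struct mem_cgroup *memcg;

	memcg = calloc(1, sizeof(*memcg) + nr_nodes * sizeof(memcg->nodeinfo[0]));
	if (!memcg)
		return NULL;
	memcg->nr_nodes = nr_nodes;

	for (int nid = 0; nid < nr_nodes; nid++) {
		struct mem_cgroup_per_node *pn = calloc(1, sizeof(*pn));
		struct shrinker_info *info = calloc(1, sizeof(*info));

		memcg->nodeinfo[nid] = pn;
		if (!pn || !info) {
			free(info);
			free_dummy_root(memcg);
			return NULL;
		}
		pn->shrinker_info = info;
		info->map_nr_max = map_nr_max;
		info->map = calloc(map_longs, sizeof(*info->map));
		info->nr_deferred = calloc(map_nr_max, sizeof(*info->nr_deferred));
		if (!info->map || !info->nr_deferred) {
			free_dummy_root(memcg);
			return NULL;
		}
		for (int i = 0; i < map_nr_max; i++)
			atomic_init(&info->nr_deferred[i], 0);
	}
	return memcg;
}
```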

Over the past few days I tried the following:

1. CONFIG_MEMCG && !mem_cgroup_disabled()
   Track all global shrinkers with root_mem_cgroup.

2. CONFIG_MEMCG && mem_cgroup_disabled()
   root_mem_cgroup is still allocated in this case, so root_mem_cgroup
   is also used to track all global shrinkers.

3. !CONFIG_MEMCG
   Allocate a dummy memcg during system startup (after cgroup_init())
   and use it to track all global shrinkers.
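
Since root_mem_cgroup exists in both CONFIG_MEMCG cases, cases 1 and 2 collapse into one and the choice of tracking memcg becomes a compile-time decision. The sketch below models that, and also why registration order matters: a global shrinker registered before the tracking memcg exists has nowhere to record its id. `global_tracking_memcg()`, `memcg_tracking_init()`, `register_global_shrinker()` and `dummy_root_memcg` are hypothetical names, and returning -1 is only a placeholder for whatever the real code would do.

```c
struct mem_cgroup {
	int placeholder;	/* opaque stand-in for the real structure */
};

#ifdef CONFIG_MEMCG
/* Cases 1 and 2: root_mem_cgroup exists even with cgroup_disable=memory. */
static struct mem_cgroup *root_mem_cgroup;
#define global_tracking_memcg() (root_mem_cgroup)
#else
/* Case 3: a dummy root, allocated during boot after cgroup_init(). */
static struct mem_cgroup *dummy_root_memcg;
#define global_tracking_memcg() (dummy_root_memcg)
#endif

/* Hypothetical stand-in for the point in boot where the memcg is set up. */
static void memcg_tracking_init(void)
{
	static struct mem_cgroup root;

#ifdef CONFIG_MEMCG
	root_mem_cgroup = &root;
#else
	dummy_root_memcg = &root;
#endif
}

static int next_shrinker_id;

/*
 * Hypothetical: a global shrinker needs a tracking memcg to record its id
 * in. Called too early (as from rcu_init() or vfs_caches_init() in the
 * current boot order), it has nowhere to go.
 */
static int register_global_shrinker(void)
{
	if (!global_tracking_memcg())
		return -1;
	return next_shrinker_id++;
}
```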

This works, but it requires changing the boot order of some subsystems,
because some shrinkers are registered before root_mem_cgroup is
allocated, for example:

1. the rcu-kfree shrinker in rcu_init()
2. the super block shrinkers in vfs_caches_init()

cgroup_init() in turn depends on some filesystem infrastructure, so I
made the following changes (rough and unorganized):

diff --git a/fs/namespace.c b/fs/namespace.c
index e157efc54023..6a12d3d0064e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4706,7 +4706,7 @@ static void __init init_mount_tree(void)
 void __init mnt_init(void)
 {
-	int err;
+	//int err;
 	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
@@ -4725,15 +4725,7 @@ void __init mnt_init(void)
 	if (!mount_hashtable || !mountpoint_hashtable)
 		panic("Failed to allocate mount hash table\n");
-	kernfs_init();
-
-	err = sysfs_init();
-	if (err)
-		printk(KERN_WARNING "%s: sysfs_init error: %d\n",
-			__func__, err);
-	fs_kobj = kobject_create_and_add("fs", NULL);
-	if (!fs_kobj)
-		printk(KERN_WARNING "%s: kobj create error\n", __func__);
 	shmem_init();
 	init_rootfs();
 	init_mount_tree();
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7d9c2a63b7cd..d87c67f6f66e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -119,6 +119,7 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 /* Internal to kernel */
 void rcu_init(void);
+void rcu_shrinker_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
 void rcu_report_dead(unsigned int cpu);
diff --git a/init/main.c b/init/main.c
index ad920fac325c..4190fc6d10ad 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1049,14 +1049,22 @@ void start_kernel(void)
 	security_init();
 	dbg_late_init();
 	net_ns_init();
+	kernfs_init();
+	if (sysfs_init())
+		printk(KERN_WARNING "%s: sysfs_init error\n",
+			__func__);
+	fs_kobj = kobject_create_and_add("fs", NULL);
+	if (!fs_kobj)
+		printk(KERN_WARNING "%s: kobj create error\n", __func__);
+	proc_root_init();
+	cgroup_init();
 	vfs_caches_init();
 	pagecache_init();
 	signals_init();
 	seq_file_init();
-	proc_root_init();
 	nsfs_init();
 	cpuset_init();
-	cgroup_init();
+	rcu_shrinker_init();
 	taskstats_init_early();
 	delayacct_init();
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d068ce3567fc..71a04ae8defb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4953,7 +4953,10 @@ static void __init kfree_rcu_batch_init(void)
 		INIT_DELAYED_WORK(&krcp->page_cache_work,
 			fill_page_cache_func);
 		krcp->initialized = true;
 	}
+}
+void __init rcu_shrinker_init(void)
+{
 	kfree_rcu_shrinker = shrinker_alloc(0, "rcu-kfree");
 	if (!kfree_rcu_shrinker) {
 		pr_err("Failed to allocate kfree_rcu() shrinker!\n");

I adjusted the boot order step by step based on the errors I hit, so
there may be hidden problems (it needs more review and testing).

That said, unifying the handling of global and memcg slab shrinking
does have several benefits:

1. shrinker::nr_deferred can be removed
2. shrinker_list can be removed
3. it simplifies the existing code logic and the subsequent lockless
   conversion

But I'm still a bit apprehensive about modifying the boot order. :(

What do you think about this?

Thanks,
Qi
>
> But I have a concern: if all global shrinkers are tracked with the
> info->map of the root memcg, a shrinker->id needs to be assigned to
> each of them, which will make info->map_nr_max larger than before and
> thus make the traversal of info->map slower.
>
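
To put a rough number on this concern: a for_each_set_bit()-style walk loads every word of the bitmap up to map_nr_max, even when only one shrinker has work. The model below counts the words examined; `walk_shrinker_map()` is a hypothetical helper, and `__builtin_ctzl()` is a GCC/Clang builtin used in place of the kernel's bit-search helpers.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Model of walking info->map the way for_each_set_bit() does: every word
 * up to map_nr_max is examined, however few bits are set. Returns the
 * number of shrinker ids found and reports how many words were loaded.
 */
static int walk_shrinker_map(const unsigned long *map, int map_nr_max,
			     size_t *words_scanned)
{
	size_t nr_words = (map_nr_max + BITS_PER_LONG - 1) / BITS_PER_LONG;
	int found = 0;

	for (size_t w = 0; w < nr_words; w++) {
		unsigned long word = map[w];

		while (word) {
			/* __builtin_ctzl(): index of the lowest set bit (GCC/Clang). */
			size_t id = w * BITS_PER_LONG + __builtin_ctzl(word);

			if (id < (size_t)map_nr_max)
				found++;	/* do_shrink_slab() would run here */
			word &= word - 1;	/* clear the lowest set bit */
		}
	}
	*words_scanned = nr_words;
	return found;
}
```

On a 64-bit build, a single set bit costs one word load with map_nr_max = 64 but sixteen with map_nr_max = 1024; whether that matters in practice depends on how many ids the global shrinkers would add, which this sketch does not measure.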
>>
>>> If the above pattern is not safe, I will go back to the unification +
>>> IDR method.
>>
>> And that is exactly how we got into this mess in the first place....
>
> I only found one similar pattern in the kernel:
>
> fs/smb/server/oplock.c:find_same_lease_key/smb_break_all_levII_oplock/lookup_lease_in_table
>
> But IIUC, the refcount here needs to be decremented after taking the
> RCU read lock again, as I did above.
>
> So regardless of whether we choose unification + IDR in the end, I still
> want to confirm whether the pattern I implemented above is safe. :)
>
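
The ordering in question can be modeled in userspace as follows: take a reference under the RCU read lock, drop the lock for the (possibly sleeping) shrink, then re-enter the read-side section before dropping the reference, so that the node and its `next` pointer stay valid if that put turns out to be the last one. This is a single-threaded sketch: the rcu_read_lock()/rcu_read_unlock() macros are no-ops, a plain `next` pointer stands in for list_for_each_entry_rcu(), the `done` flag stands in for the completion, and `shrink_all()`, `do_shrink()` and the `scanned` counter are hypothetical. It illustrates the ordering only and does not demonstrate that the pattern is safe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* No-op stand-ins: this single-threaded model only shows the ordering. */
#define rcu_read_lock()   do { } while (0)
#define rcu_read_unlock() do { } while (0)

struct shrinker {
	atomic_int refcount;	/* 1 while registered, +1 per active user */
	bool done;		/* models the completion the unregister path waits on */
	struct shrinker *next;	/* models the shrinker_list linkage */
	int scanned;		/* test hook: how many times it was shrunk */
};

static bool shrinker_try_get(struct shrinker *s)
{
	int old = atomic_load(&s->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	return false;
}

static void shrinker_put(struct shrinker *s)
{
	/* Last reference gone: wake the unregister path (complete()). */
	if (atomic_fetch_sub(&s->refcount, 1) == 1)
		s->done = true;
}

static void do_shrink(struct shrinker *s)
{
	s->scanned++;	/* may sleep in real code, hence no RCU lock held */
}

static void shrink_all(struct shrinker *head)
{
	rcu_read_lock();
	for (struct shrinker *s = head; s; s = s->next) {
		if (!shrinker_try_get(s))
			continue;	/* being unregistered; skip it */
		rcu_read_unlock();

		do_shrink(s);

		/*
		 * Re-enter the read-side section before dropping the reference:
		 * if this put is the last one, only RCU keeps 's' (and s->next)
		 * valid until the loop has moved on.
		 */
		rcu_read_lock();
		shrinker_put(s);
	}
	rcu_read_unlock();
}
```

In the usage below, a shrinker whose count is already zero is skipped, and a later put of the registration reference is what signals the unregister path.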
> Thanks,
> Qi
>
>>
>> -Dave