linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] make unregistration of super_block shrinker more faster
@ 2023-05-31  9:57 Qi Zheng
  2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Qi Zheng <zhengqi.arch@bytedance.com>

Hi all,

This patch series aims to make unregistration of super_block shrinker more
faster.

1. Background
=============

The kernel test robot noticed a -88.8% regression of stress-ng.ramfs.ops_per_sec
on commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). More
details can be seen from the link[1] below.

[1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/

We can just use the following command to reproduce the result:

stress-ng --timeout 60 --times --verify --metrics-brief --ramfs 9 &

1) before commit f95bdb700bc6b:

stress-ng: info:  [11023] dispatching hogs: 9 ramfs
stress-ng: info:  [11023] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [11023]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [11023] ramfs            774966     60.00     10.18    169.45     12915.89        4314.26
stress-ng: info:  [11023] for a 60.00s run time:
stress-ng: info:  [11023]    1920.11s available CPU time
stress-ng: info:  [11023]      10.18s user time   (  0.53%)
stress-ng: info:  [11023]     169.44s system time (  8.82%)
stress-ng: info:  [11023]     179.62s total time  (  9.35%)
stress-ng: info:  [11023] load average: 8.99 2.69 0.93
stress-ng: info:  [11023] successful run completed in 60.00s (1 min, 0.00 secs)

2) after commit f95bdb700bc6b:

stress-ng: info:  [37676] dispatching hogs: 9 ramfs
stress-ng: info:  [37676] stressor       bogo ops real time  usrtime  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [37676]                           (secs)    (secs)   (secs)   (real time) (usr+sys time)
stress-ng: info:  [37676] ramfs            168673     60.00     1.61    39.66      2811.08        4087.47
stress-ng: info:  [37676] for a 60.10s run time:
stress-ng: info:  [37676]    1923.36s available CPU time
stress-ng: info:  [37676]       1.60s user time   (  0.08%)
stress-ng: info:  [37676]      39.66s system time (  2.06%)
stress-ng: info:  [37676]      41.26s total time  (  2.15%)
stress-ng: info:  [37676] load average: 7.69 3.63 2.36
stress-ng: info:  [37676] successful run completed in 60.10s (1 min, 0.10 secs)

The root cause is that SRCU has to be careful to not frequently check for srcu
read-side critical section exits. Paul E. McKenney gave a detailed explanation:

```
In practice, the act of checking to see if there is anyone in an SRCU
read-side critical section is a heavy-weight operation, involving at
least one cache miss per CPU along with a number of full memory barriers.
```

Therefore, even if no one is currently in the SRCU read-side critical section,
synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has
become slower.

2. Idea
=======

2.1 use synchronize_srcu_expedited() ?
--------------------------------------

The synchronize_srcu_expedited() will let SRCU to be much more aggressive.
If we use it to replace synchronize_srcu() in the unregister_shrinker(), the
ops/s will return to previous levels:

stress-ng: info:  [13159] dispatching hogs: 9 ramfs
stress-ng: info:  [13159] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [13159]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [13159] ramfs            710062     60.00      9.63    157.26     11834.18        4254.75
stress-ng: info:  [13159] for a 60.00s run time:
stress-ng: info:  [13159]    1920.14s available CPU time
stress-ng: info:  [13159]       9.62s user time   (  0.50%)
stress-ng: info:  [13159]     157.26s system time (  8.19%)
stress-ng: info:  [13159]     166.88s total time  (  8.69%)
stress-ng: info:  [13159] load average: 9.49 4.02 1.65
stress-ng: info:  [13159] successful run completed in 60.00s (1 min, 0.00 secs)

But because SRCU (Sleepable RCU) is used here, the reader is allowed to sleep in
the read-side critical section, so synchronize_srcu_expedited() may cause a lot
of CPU consumption, so this is not a good choice.

2.2 move synchronize_srcu() to the asynchronous delayed work
------------------------------------------------------------

Kirill Tkhai proposed a better idea[2] in 2018: move synchronize_srcu() to the
asynchronous delayed work, then it doesn't affect on user-visible unregistration
speed.

[2]. https://lore.kernel.org/lkml/153365636747.19074.12610817307548583381.stgit@localhost.localdomain/

After applying his patches ([PATCH RFC 04/10]~[PATCH RFC 10/10], with few
conflicts), the ops/s is of course back to the previous levels:

stress-ng: info:  [11506] setting to a 60 second run per stressor
stress-ng: info:  [11506] dispatching hogs: 9 ramfs
stress-ng: info:  [11506] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info:  [11506]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [11506] ramfs            829462     60.00     10.81    174.25     13824.14        4482.08
stress-ng: info:  [11506] for a 60.00s run time:
stress-ng: info:  [11506]    1920.12s available CPU time
stress-ng: info:  [11506]      10.81s user time   (  0.56%)
stress-ng: info:  [11506]     174.25s system time (  9.07%)
stress-ng: info:  [11506]     185.06s total time  (  9.64%)
stress-ng: info:  [11506] load average: 8.96 2.60 0.89
stress-ng: info:  [11506] successful run completed in 60.00s (1 min, 0.00 secs)

In order to continue to advance this patch set, I rebase these patches onto the
next-20230525. Any comments and suggestions are welcome.

Note: This patch serise is only for super_block shrinker, all further
time-critical for unregistration places may be written in the same conception.

Thanks,
Qi

Kirill Tkhai (7):
  mm: vmscan: split unregister_shrinker()
  fs: move list_lru_destroy() to destroy_super_work()
  fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan()
  fs: introduce struct super_operations::destroy_super() callback
  xfs: introduce xfs_fs_destroy_super()
  shmem: implement shmem_destroy_super()
  fs: use unregister_shrinker_delayed_{initiate, finalize} for
    super_block shrinker

Qi Zheng (1):
  mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()

 fs/super.c               | 32 ++++++++++++++++++--------------
 fs/xfs/xfs_super.c       | 25 ++++++++++++++++++++++---
 include/linux/fs.h       |  6 ++++++
 include/linux/shrinker.h |  2 ++
 mm/shmem.c               |  8 ++++++++
 mm/vmscan.c              | 26 ++++++++++++++++++++------
 6 files changed, 76 insertions(+), 23 deletions(-)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31 10:49   ` Christian Brauner
  2023-05-31  9:57 ` [PATCH 2/8] mm: vmscan: split unregister_shrinker() Qi Zheng
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Qi Zheng <zhengqi.arch@bytedance.com>

The debugfs_remove_recursive() will wait for debugfs_file_put()
to return, so there is no need to put it after synchronize_srcu()
to wait for the rcu read-side critical section to exit.

Just move it before synchronize_srcu(), which is also convenient
to put the heavy synchronize_srcu() in the delayed work later.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeca83e28c9b..a773e97e152e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -818,11 +818,11 @@ void unregister_shrinker(struct shrinker *shrinker)
 	debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id);
 	mutex_unlock(&shrinker_mutex);
 
+	shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+
 	atomic_inc(&shrinker_srcu_generation);
 	synchronize_srcu(&shrinker_srcu);
 
-	shrinker_debugfs_remove(debugfs_entry, debugfs_id);
-
 	kfree(shrinker->nr_deferred);
 	shrinker->nr_deferred = NULL;
 }
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/8] mm: vmscan: split unregister_shrinker()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
  2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31  9:57 ` [PATCH 3/8] fs: move list_lru_destroy() to destroy_super_work() Qi Zheng
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

This and the next patches in this series aim to make
time effect of synchronize_srcu() invisible for user.
The patch splits unregister_shrinker() in two functions:

	unregister_shrinker_delayed_initiate()
	unregister_shrinker_delayed_finalize()

and shrinker users may make the second of them to be called
asynchronous (e.g., from workqueue). Next patches make
superblock shrinker to follow this way, so user-visible
umount() time won't contain delays from synchronize_srcu().

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/shrinker.h |  2 ++
 mm/vmscan.c              | 22 ++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 224293b2dd06..e9d5a19d83fe 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -102,6 +102,8 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
 extern int __printf(2, 3) register_shrinker(struct shrinker *shrinker,
 					    const char *fmt, ...);
 extern void unregister_shrinker(struct shrinker *shrinker);
+extern void unregister_shrinker_delayed_initiate(struct shrinker *shrinker);
+extern void unregister_shrinker_delayed_finalize(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
 extern void synchronize_shrinkers(void);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a773e97e152e..baf8d2327d70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -799,10 +799,7 @@ int register_shrinker(struct shrinker *shrinker, const char *fmt, ...)
 #endif
 EXPORT_SYMBOL(register_shrinker);
 
-/*
- * Remove one
- */
-void unregister_shrinker(struct shrinker *shrinker)
+void unregister_shrinker_delayed_initiate(struct shrinker *shrinker)
 {
 	struct dentry *debugfs_entry;
 	int debugfs_id;
@@ -819,6 +816,13 @@ void unregister_shrinker(struct shrinker *shrinker)
 	mutex_unlock(&shrinker_mutex);
 
 	shrinker_debugfs_remove(debugfs_entry, debugfs_id);
+}
+EXPORT_SYMBOL(unregister_shrinker_delayed_initiate);
+
+void unregister_shrinker_delayed_finalize(struct shrinker *shrinker)
+{
+	if (!shrinker->nr_deferred)
+		return;
 
 	atomic_inc(&shrinker_srcu_generation);
 	synchronize_srcu(&shrinker_srcu);
@@ -826,6 +830,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 	kfree(shrinker->nr_deferred);
 	shrinker->nr_deferred = NULL;
 }
+EXPORT_SYMBOL(unregister_shrinker_delayed_finalize);
+
+/*
+ * Remove one
+ */
+void unregister_shrinker(struct shrinker *shrinker)
+{
+	unregister_shrinker_delayed_initiate(shrinker);
+	unregister_shrinker_delayed_finalize(shrinker);
+}
 EXPORT_SYMBOL(unregister_shrinker);
 
 /**
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/8] fs: move list_lru_destroy() to destroy_super_work()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
  2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
  2023-05-31  9:57 ` [PATCH 2/8] mm: vmscan: split unregister_shrinker() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31  9:57 ` [PATCH 4/8] fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan() Qi Zheng
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

The patch makes s_dentry_lru and s_inode_lru be destroyed
later from the workqueue. This is preparation to split
unregister_shrinker(super_block::s_shrink) in two stages,
and to call finalize stage from destroy_super_work().

Note, that generic filesystem shrinker unregistration
is safe to be split in two stages right after this
patch, since super_cache_count() and super_cache_scan()
have a deal with s_dentry_lru and s_inode_lru only.

But there are two exceptions: XFS and SHMEM, which
define .nr_cached_objects() and .free_cached_objects()
callbacks. These two do not allow us to do the splitting
right after this patch. They touch fs-specific data,
which is destroyed earlier, than destroy_super_work().
So, we can't call unregister_shrinker_delayed_finalize()
from destroy_super_work() because of them, and next
patches make preparations to make this possible.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 fs/super.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 8d8d68799b34..2ce4c72720f3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -159,6 +159,11 @@ static void destroy_super_work(struct work_struct *work)
 							destroy_work);
 	int i;
 
+	WARN_ON(list_lru_count(&s->s_dentry_lru));
+	WARN_ON(list_lru_count(&s->s_inode_lru));
+	list_lru_destroy(&s->s_dentry_lru);
+	list_lru_destroy(&s->s_inode_lru);
+
 	for (i = 0; i < SB_FREEZE_LEVELS; i++)
 		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
 	kfree(s);
@@ -177,8 +182,6 @@ static void destroy_unused_super(struct super_block *s)
 	if (!s)
 		return;
 	up_write(&s->s_umount);
-	list_lru_destroy(&s->s_dentry_lru);
-	list_lru_destroy(&s->s_inode_lru);
 	security_sb_free(s);
 	put_user_ns(s->s_user_ns);
 	kfree(s->s_subtype);
@@ -287,8 +290,6 @@ static void __put_super(struct super_block *s)
 {
 	if (!--s->s_count) {
 		list_del_init(&s->s_list);
-		WARN_ON(s->s_dentry_lru.node);
-		WARN_ON(s->s_inode_lru.node);
 		WARN_ON(!list_empty(&s->s_mounts));
 		security_sb_free(s);
 		put_user_ns(s->s_user_ns);
@@ -330,14 +331,6 @@ void deactivate_locked_super(struct super_block *s)
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
 
-		/*
-		 * Since list_lru_destroy() may sleep, we cannot call it from
-		 * put_super(), where we hold the sb_lock. Therefore we destroy
-		 * the lru lists right now.
-		 */
-		list_lru_destroy(&s->s_dentry_lru);
-		list_lru_destroy(&s->s_inode_lru);
-
 		put_filesystem(fs);
 		put_super(s);
 	} else {
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/8] fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
                   ` (2 preceding siblings ...)
  2023-05-31  9:57 ` [PATCH 3/8] fs: move list_lru_destroy() to destroy_super_work() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31  9:57 ` [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback Qi Zheng
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

This patch prepares superblock shrinker for delayed unregistering.
It makes super_cache_scan() avoid shrinking of not active superblocks.
SB_ACTIVE is used as such the indicator. In case of superblock is not
active, super_cache_scan() just exits with SHRINK_STOP as result.

Note, that SB_ACTIVE is cleared in generic_shutdown_super() and this
is made under the write lock of s_umount. Function super_cache_scan()
also takes the read lock of s_umount, so it can't skip this flag cleared.

SB_BORN check is added to super_cache_scan() just for uniformity
with super_cache_count(), while super_cache_count() received SB_ACTIVE
check just for uniformity with super_cache_scan().

After this patch super_cache_scan() becomes to ignore unregistering
superblocks, so this function is OK with splitting unregister_shrinker().
Next patches prepare super_cache_count() to follow this way.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 fs/super.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 2ce4c72720f3..2ce54561e82e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -79,6 +79,11 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (!trylock_super(sb))
 		return SHRINK_STOP;
 
+	if ((sb->s_flags & (SB_BORN|SB_ACTIVE)) != (SB_BORN|SB_ACTIVE)) {
+		freed = SHRINK_STOP;
+		goto unlock;
+	}
+
 	if (sb->s_op->nr_cached_objects)
 		fs_objects = sb->s_op->nr_cached_objects(sb, sc);
 
@@ -110,6 +115,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 		freed += sb->s_op->free_cached_objects(sb, sc);
 	}
 
+unlock:
 	up_read(&sb->s_umount);
 	return freed;
 }
@@ -136,7 +142,7 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 	 * avoid this situation, so do the same here. The memory barrier is
 	 * matched with the one in mount_fs() as we don't hold locks here.
 	 */
-	if (!(sb->s_flags & SB_BORN))
+	if ((sb->s_flags & (SB_BORN|SB_ACTIVE)) != (SB_BORN|SB_ACTIVE))
 		return 0;
 	smp_rmb();
 
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
                   ` (3 preceding siblings ...)
  2023-05-31  9:57 ` [PATCH 4/8] fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31 11:19   ` Christian Brauner
  2023-05-31  9:57 ` [PATCH 6/8] xfs: introduce xfs_fs_destroy_super() Qi Zheng
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

The patch introduces a new callback, which will be called
asynchronous from delayed work.

This will allows to make ::nr_cached_objects() safe
to be called on destroying superblock in next patches,
and to split unregister_shrinker() into two primitives.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 fs/super.c         | 3 +++
 include/linux/fs.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 2ce54561e82e..4e9d08224f86 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -170,6 +170,9 @@ static void destroy_super_work(struct work_struct *work)
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
 
+	if (s->s_op->destroy_super)
+		s->s_op->destroy_super(s);
+
 	for (i = 0; i < SB_FREEZE_LEVELS; i++)
 		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
 	kfree(s);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0b54ac1d331b..30b46d0facfc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1910,6 +1910,7 @@ struct super_operations {
 	int (*drop_inode) (struct inode *);
 	void (*evict_inode) (struct inode *);
 	void (*put_super) (struct super_block *);
+	void (*destroy_super) (struct super_block *);
 	int (*sync_fs)(struct super_block *sb, int wait);
 	int (*freeze_super) (struct super_block *);
 	int (*freeze_fs) (struct super_block *);
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 6/8] xfs: introduce xfs_fs_destroy_super()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
                   ` (4 preceding siblings ...)
  2023-05-31  9:57 ` [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31  9:57 ` [PATCH 7/8] shmem: implement shmem_destroy_super() Qi Zheng
  2023-05-31  9:57 ` [PATCH 8/8] fs: use unregister_shrinker_delayed_{initiate, finalize} for super_block shrinker Qi Zheng
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

xfs_fs_nr_cached_objects() touches sb->s_fs_info,
and this patch makes it to be destructed later.

After this patch xfs_fs_nr_cached_objects() is safe
for splitting unregister_shrinker(): mp->m_perag_tree
is stable till destroy_super_work(), while iteration
over it is already RCU-protected by internal XFS
business.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 fs/xfs/xfs_super.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7e706255f165..694616524c76 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -743,11 +743,18 @@ xfs_fs_drop_inode(
 }
 
 static void
-xfs_mount_free(
+xfs_free_names(
 	struct xfs_mount	*mp)
 {
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
+}
+
+static void
+xfs_mount_free(
+	struct xfs_mount	*mp)
+{
+	xfs_free_names(mp);
 	kmem_free(mp);
 }
 
@@ -1136,8 +1143,19 @@ xfs_fs_put_super(
 	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
 
-	sb->s_fs_info = NULL;
-	xfs_mount_free(mp);
+	xfs_free_names(mp);
+}
+
+static void
+xfs_fs_destroy_super(
+	struct super_block	*sb)
+{
+	if (sb->s_fs_info) {
+		struct xfs_mount	*mp = XFS_M(sb);
+
+		kmem_free(mp);
+		sb->s_fs_info = NULL;
+	}
 }
 
 static long
@@ -1165,6 +1183,7 @@ static const struct super_operations xfs_super_operations = {
 	.dirty_inode		= xfs_fs_dirty_inode,
 	.drop_inode		= xfs_fs_drop_inode,
 	.put_super		= xfs_fs_put_super,
+	.destroy_super		= xfs_fs_destroy_super,
 	.sync_fs		= xfs_fs_sync_fs,
 	.freeze_fs		= xfs_fs_freeze,
 	.unfreeze_fs		= xfs_fs_unfreeze,
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 7/8] shmem: implement shmem_destroy_super()
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
                   ` (5 preceding siblings ...)
  2023-05-31  9:57 ` [PATCH 6/8] xfs: introduce xfs_fs_destroy_super() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  2023-05-31  9:57 ` [PATCH 8/8] fs: use unregister_shrinker_delayed_{initiate, finalize} for super_block shrinker Qi Zheng
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

Similar to xfs_fs_destroy_super() implement the method
for shmem.

shmem_unused_huge_count() just touches sb->s_fs_info.
After such the later freeing it will be safe for
unregister_shrinker() splitting (which is made in next
patch).

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/shmem.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 71e6c9855770..a4c3fdf23838 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3930,6 +3930,12 @@ static void shmem_put_super(struct super_block *sb)
 	free_percpu(sbinfo->ino_batch);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
+}
+
+static void shmem_destroy_super(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+
 	kfree(sbinfo);
 	sb->s_fs_info = NULL;
 }
@@ -4018,6 +4024,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 
 failed:
 	shmem_put_super(sb);
+	shmem_destroy_super(sb);
 	return -ENOMEM;
 }
 
@@ -4182,6 +4189,7 @@ static const struct super_operations shmem_ops = {
 	.evict_inode	= shmem_evict_inode,
 	.drop_inode	= generic_delete_inode,
 	.put_super	= shmem_put_super,
+	.destroy_super	= shmem_destroy_super,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	.nr_cached_objects	= shmem_unused_huge_count,
 	.free_cached_objects	= shmem_unused_huge_scan,
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 8/8] fs: use unregister_shrinker_delayed_{initiate, finalize} for super_block shrinker
  2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
                   ` (6 preceding siblings ...)
  2023-05-31  9:57 ` [PATCH 7/8] shmem: implement shmem_destroy_super() Qi Zheng
@ 2023-05-31  9:57 ` Qi Zheng
  7 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31  9:57 UTC (permalink / raw)
  To: akpm, tkhai, roman.gushchin, vbabka, viro, brauner, djwong,
	hughd, paulmck, muchun.song
  Cc: linux-mm, linux-fsdevel, linux-xfs, linux-kernel, Qi Zheng

From: Kirill Tkhai <tkhai@ya.ru>

Previous patches made all the data, which is touched from
super_cache_count(), destroyed from destroy_super_work():
s_dentry_lru, s_inode_lru and super_block::s_fs_info.

super_cache_scan() can't be called after SB_ACTIVE is cleared
in generic_shutdown_super().

So, it safe to move heavy unregister_shrinker_delayed_finalize()
part to delayed work, i.e. it's safe for parallel do_shrink_slab()
to be executed between unregister_shrinker_delayed_initiate() and
destroy_super_work()->unregister_shrinker_delayed_finalize().

This makes the heavy synchronize_srcu() to do not affect on user-visible
unregistration speed (since now it's executed from workqueue).

All further time-critical for unregistration places may be written
in the same conception.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 fs/super.c         | 4 +++-
 include/linux/fs.h | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 4e9d08224f86..c61efb74fa7f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -165,6 +165,8 @@ static void destroy_super_work(struct work_struct *work)
 							destroy_work);
 	int i;
 
+	unregister_shrinker_delayed_finalize(&s->s_shrink);
+
 	WARN_ON(list_lru_count(&s->s_dentry_lru));
 	WARN_ON(list_lru_count(&s->s_inode_lru));
 	list_lru_destroy(&s->s_dentry_lru);
@@ -337,7 +339,7 @@ void deactivate_locked_super(struct super_block *s)
 {
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
-		unregister_shrinker(&s->s_shrink);
+		unregister_shrinker_delayed_initiate(&s->s_shrink);
 		fs->kill_sb(s);
 
 		put_filesystem(fs);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 30b46d0facfc..869dd8de91a5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1929,6 +1929,11 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 	struct dquot **(*get_dquots)(struct inode *);
 #endif
+	/*
+	 * Shrinker may call these two function on destructing super_block
+	 * till unregister_shrinker_delayed_finalize() has completed
+	 * in destroy_super_work(), and they must care about that.
+	 */
 	long (*nr_cached_objects)(struct super_block *,
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
-- 
2.30.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()
  2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
@ 2023-05-31 10:49   ` Christian Brauner
  2023-05-31 12:52     ` Qi Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2023-05-31 10:49 UTC (permalink / raw)
  To: Qi Zheng
  Cc: akpm, tkhai, roman.gushchin, vbabka, viro, djwong, hughd,
	paulmck, muchun.song, linux-mm, linux-fsdevel, linux-xfs,
	linux-kernel, Qi Zheng

On Wed, May 31, 2023 at 09:57:35AM +0000, Qi Zheng wrote:
> From: Qi Zheng <zhengqi.arch@bytedance.com>
> 
> The debugfs_remove_recursive() will wait for debugfs_file_put()
> to return, so there is no need to put it after synchronize_srcu()
> to wait for the rcu read-side critical section to exit.
> 
> Just move it before synchronize_srcu(), which is also convenient
> to put the heavy synchronize_srcu() in the delayed work later.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---

Afaict, should be a patch independent of this series.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback
  2023-05-31  9:57 ` [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback Qi Zheng
@ 2023-05-31 11:19   ` Christian Brauner
  2023-05-31 12:54     ` Qi Zheng
  0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2023-05-31 11:19 UTC (permalink / raw)
  To: Qi Zheng
  Cc: akpm, tkhai, roman.gushchin, vbabka, viro, djwong, hughd,
	paulmck, muchun.song, linux-mm, linux-fsdevel, linux-xfs,
	linux-kernel, Qi Zheng, Christoph Hellwig

On Wed, May 31, 2023 at 09:57:39AM +0000, Qi Zheng wrote:
> From: Kirill Tkhai <tkhai@ya.ru>
> 
> The patch introduces a new callback, which will be called
> asynchronous from delayed work.
> 
> This will allows to make ::nr_cached_objects() safe
> to be called on destroying superblock in next patches,
> and to split unregister_shrinker() into two primitives.
> 
> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  fs/super.c         | 3 +++
>  include/linux/fs.h | 1 +
>  2 files changed, 4 insertions(+)

Misses updates to
Documentation/filesystems/locking.rst
Documentation/filesystems/vfs.rst


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu()
  2023-05-31 10:49   ` Christian Brauner
@ 2023-05-31 12:52     ` Qi Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31 12:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: akpm, tkhai, roman.gushchin, vbabka, viro, djwong, hughd,
	paulmck, muchun.song, linux-mm, linux-fsdevel, linux-xfs,
	linux-kernel, Qi Zheng



On 2023/5/31 18:49, Christian Brauner wrote:
> On Wed, May 31, 2023 at 09:57:35AM +0000, Qi Zheng wrote:
>> From: Qi Zheng <zhengqi.arch@bytedance.com>
>>
>> The debugfs_remove_recursive() will wait for debugfs_file_put()
>> to return, so there is no need to put it after synchronize_srcu()
>> to wait for the rcu read-side critical section to exit.
>>
>> Just move it before synchronize_srcu(), which is also convenient
>> to put the heavy synchronize_srcu() in the delayed work later.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
> 
> Afaict, should be a patch independent of this series.

OK, will resend as an independent patch.

Thanks,
Qi


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback
  2023-05-31 11:19   ` Christian Brauner
@ 2023-05-31 12:54     ` Qi Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Qi Zheng @ 2023-05-31 12:54 UTC (permalink / raw)
  To: Christian Brauner
  Cc: akpm, tkhai, roman.gushchin, vbabka, viro, djwong, hughd,
	paulmck, muchun.song, linux-mm, linux-fsdevel, linux-xfs,
	linux-kernel, Qi Zheng, Christoph Hellwig



On 2023/5/31 19:19, Christian Brauner wrote:
> On Wed, May 31, 2023 at 09:57:39AM +0000, Qi Zheng wrote:
>> From: Kirill Tkhai <tkhai@ya.ru>
>>
>> The patch introduces a new callback, which will be called
>> asynchronous from delayed work.
>>
>> This will allows to make ::nr_cached_objects() safe
>> to be called on destroying superblock in next patches,
>> and to split unregister_shrinker() into two primitives.
>>
>> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   fs/super.c         | 3 +++
>>   include/linux/fs.h | 1 +
>>   2 files changed, 4 insertions(+)
> 
> Misses updates to
> Documentation/filesystems/locking.rst
> Documentation/filesystems/vfs.rst

Will do.

Thanks,
Qi


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-05-31 12:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-31  9:57 [PATCH 0/8] make unregistration of super_block shrinker more faster Qi Zheng
2023-05-31  9:57 ` [PATCH 1/8] mm: vmscan: move shrinker_debugfs_remove() before synchronize_srcu() Qi Zheng
2023-05-31 10:49   ` Christian Brauner
2023-05-31 12:52     ` Qi Zheng
2023-05-31  9:57 ` [PATCH 2/8] mm: vmscan: split unregister_shrinker() Qi Zheng
2023-05-31  9:57 ` [PATCH 3/8] fs: move list_lru_destroy() to destroy_super_work() Qi Zheng
2023-05-31  9:57 ` [PATCH 4/8] fs: shrink only (SB_ACTIVE|SB_BORN) superblocks in super_cache_scan() Qi Zheng
2023-05-31  9:57 ` [PATCH 5/8] fs: introduce struct super_operations::destroy_super() callback Qi Zheng
2023-05-31 11:19   ` Christian Brauner
2023-05-31 12:54     ` Qi Zheng
2023-05-31  9:57 ` [PATCH 6/8] xfs: introduce xfs_fs_destroy_super() Qi Zheng
2023-05-31  9:57 ` [PATCH 7/8] shmem: implement shmem_destroy_super() Qi Zheng
2023-05-31  9:57 ` [PATCH 8/8] fs: use unregister_shrinker_delayed_{initiate, finalize} for super_block shrinker Qi Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox