linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Sultan Alsawaf <sultan@kerneltoast.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: akpm@linux-foundation.org, tkhai@ya.ru, hannes@cmpxchg.org,
	shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev,
	muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com,
	dave@stgolabs.net, penguin-kernel@i-love.sakura.ne.jp,
	paulmck@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/7] mm: vmscan: make global slab shrink lockless
Date: Fri, 24 Feb 2023 00:20:12 -0800	[thread overview]
Message-ID: <Y/hzPP0G5AFhOmsY@sultan-box.localdomain> (raw)
In-Reply-To: <8049b6ed-435f-b518-f947-5516a514aec2@bytedance.com>

On Fri, Feb 24, 2023 at 12:00:21PM +0800, Qi Zheng wrote:
> 
> 
> On 2023/2/24 02:24, Sultan Alsawaf wrote:
> > On Thu, Feb 23, 2023 at 09:27:20PM +0800, Qi Zheng wrote:
> > > The shrinker_rwsem is a global lock in shrinkers subsystem,
> > > it is easy to cause blocking in the following cases:
> > > 
> > > a. the write lock of shrinker_rwsem was held for too long.
> > >     For example, there are many memcgs in the system, which
> > >     causes some paths to hold locks and traverse it for too
> > >     long. (e.g. expand_shrinker_info())
> > > b. the read lock of shrinker_rwsem was held for too long,
> > >     and a writer came at this time. Then this writer will be
> > >     forced to wait and block all subsequent readers.
> > >     For example:
> > >     - be scheduled when the read lock of shrinker_rwsem is
> > >       held in do_shrink_slab()
> > >     - some shrinker are blocked for too long. Like the case
> > >       mentioned in the patchset[1].
> > > 
> > > Therefore, many times in history ([2],[3],[4],[5]), some
> > > people wanted to replace shrinker_rwsem reader with SRCU,
> > > but they all gave up because SRCU was not unconditionally
> > > enabled.
> > > 
> > > But now, since commit 1cd0bd06093c ("rcu: Remove CONFIG_SRCU"),
> > > the SRCU is unconditionally enabled. So it's time to use
> > > SRCU to protect readers who previously held shrinker_rwsem.
> > > 
> > > [1]. https://lore.kernel.org/lkml/20191129214541.3110-1-ptikhomirov@virtuozzo.com/
> > > [2]. https://lore.kernel.org/all/1437080113.3596.2.camel@stgolabs.net/
> > > [3]. https://lore.kernel.org/lkml/1510609063-3327-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/
> > > [4]. https://lore.kernel.org/lkml/153365347929.19074.12509495712735843805.stgit@localhost.localdomain/
> > > [5]. https://lore.kernel.org/lkml/20210927074823.5825-1-sultan@kerneltoast.com/
> > > 
> > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > > ---
> > >   mm/vmscan.c | 27 +++++++++++----------------
> > >   1 file changed, 11 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 9f895ca6216c..02987a6f95d1 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -202,6 +202,7 @@ static void set_task_reclaim_state(struct task_struct *task,
> > >   LIST_HEAD(shrinker_list);
> > >   DECLARE_RWSEM(shrinker_rwsem);
> > > +DEFINE_SRCU(shrinker_srcu);
> > >   #ifdef CONFIG_MEMCG
> > >   static int shrinker_nr_max;
> > > @@ -706,7 +707,7 @@ void free_prealloced_shrinker(struct shrinker *shrinker)
> > >   void register_shrinker_prepared(struct shrinker *shrinker)
> > >   {
> > >   	down_write(&shrinker_rwsem);
> > > -	list_add_tail(&shrinker->list, &shrinker_list);
> > > +	list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > >   	shrinker->flags |= SHRINKER_REGISTERED;
> > >   	shrinker_debugfs_add(shrinker);
> > >   	up_write(&shrinker_rwsem);
> > > @@ -760,13 +761,15 @@ void unregister_shrinker(struct shrinker *shrinker)
> > >   		return;
> > >   	down_write(&shrinker_rwsem);
> > > -	list_del(&shrinker->list);
> > > +	list_del_rcu(&shrinker->list);
> > >   	shrinker->flags &= ~SHRINKER_REGISTERED;
> > >   	if (shrinker->flags & SHRINKER_MEMCG_AWARE)
> > >   		unregister_memcg_shrinker(shrinker);
> > >   	debugfs_entry = shrinker_debugfs_remove(shrinker);
> > >   	up_write(&shrinker_rwsem);
> > > +	synchronize_srcu(&shrinker_srcu);
> > > +
> > >   	debugfs_remove_recursive(debugfs_entry);
> > >   	kfree(shrinker->nr_deferred);
> > > @@ -786,6 +789,7 @@ void synchronize_shrinkers(void)
> > >   {
> > >   	down_write(&shrinker_rwsem);
> > >   	up_write(&shrinker_rwsem);
> > > +	synchronize_srcu(&shrinker_srcu);
> > >   }
> > >   EXPORT_SYMBOL(synchronize_shrinkers);
> > > @@ -996,6 +1000,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> > >   {
> > >   	unsigned long ret, freed = 0;
> > >   	struct shrinker *shrinker;
> > > +	int srcu_idx;
> > >   	/*
> > >   	 * The root memcg might be allocated even though memcg is disabled
> > > @@ -1007,10 +1012,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> > >   	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> > >   		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> > > -	if (!down_read_trylock(&shrinker_rwsem))
> > > -		goto out;
> > > +	srcu_idx = srcu_read_lock(&shrinker_srcu);
> > > -	list_for_each_entry(shrinker, &shrinker_list, list) {
> > > +	list_for_each_entry_srcu(shrinker, &shrinker_list, list,
> > > +				 srcu_read_lock_held(&shrinker_srcu)) {
> > >   		struct shrink_control sc = {
> > >   			.gfp_mask = gfp_mask,
> > >   			.nid = nid,
> > > @@ -1021,19 +1026,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> > >   		if (ret == SHRINK_EMPTY)
> > >   			ret = 0;
> > >   		freed += ret;
> > > -		/*
> > > -		 * Bail out if someone want to register a new shrinker to
> > > -		 * prevent the registration from being stalled for long periods
> > > -		 * by parallel ongoing shrinking.
> > > -		 */
> > > -		if (rwsem_is_contended(&shrinker_rwsem)) {
> > > -			freed = freed ? : 1;
> > > -			break;
> > > -		}
> > >   	}
> > > -	up_read(&shrinker_rwsem);
> > > -out:
> > > +	srcu_read_unlock(&shrinker_srcu, srcu_idx);
> > >   	cond_resched();
> > >   	return freed;
> > >   }
> > > -- 
> > > 2.20.1
> > > 
> > > 
> > 
> > Hi Qi,
> > 
> > A different problem I realized after my old attempt to use SRCU was that the
> > unregister_shrinker() path became quite slow due to the heavy synchronize_srcu()
> > call. Both register_shrinker() *and* unregister_shrinker() are called frequently
> > these days, and SRCU is too unfair to the unregister path IMO.
> 
> Hi Sultan,
> 
> IIUC, for unregister_shrinker(), the wait time is hardly longer with
> SRCU than with shrinker_rwsem before.

The wait time can be quite different because with shrinker_rwsem, the
rwsem_is_contended() bailout would cause unregister_shrinker() to wait for only
one random shrinker to finish at worst rather than waiting for *all* shrinkers
to finish.

> And I just did a simple test. After using the script in cover letter to
> increase the shrink_slab hotspot, I did umount 1k times at the same
> time, and then I used bpftrace to measure the time consumption of
> unregister_shrinker() as follows:
> 
> bpftrace -e 'kprobe:unregister_shrinker { @start[tid] = nsecs; }
> kretprobe:unregister_shrinker /@start[tid]/ { @ns[comm] = hist(nsecs -
> @start[tid]); delete(@start[tid]); }'
> 
> @ns[umount]:
> [16K, 32K)             3 |      |
> [32K, 64K)            66 |@@@@@@@@@@      |
> [64K, 128K)           32 |@@@@@      |
> [128K, 256K)          22 |@@@      |
> [256K, 512K)          48 |@@@@@@@      |
> [512K, 1M)            19 |@@@      |
> [1M, 2M)             131 |@@@@@@@@@@@@@@@@@@@@@      |
> [2M, 4M)             313
> |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4M, 8M)             302 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> |
> [8M, 16M)             55 |@@@@@@@@@
> 
> I see that the highest time-consuming of unregister_shrinker() is between
> 8ms and 16ms, which feels tolerable?

If you've got a fast x86 machine then I'd say that's a bit slow. :)

This depends a lot on which shrinkers are active on your system and how much
work each one does upon running. If a driver's shrinker doesn't have much to do
because there's nothing it can shrink further, then it'll run fast. Conversely,
if a driver is stressed in a way that constantly creates a lot of potential work
for its shrinker, then the shrinker will run longer.

Since shrinkers are allowed to sleep, the delays can really add up when waiting
for all of them to finish running. In the past, I recall observing delays of
100ms+ in unregister_shrinker() on slower arm64 hardware when I stress tested
the SRCU approach.

If your GPU driver has a shrinker (such as i915), I suggest testing again under
heavy GPU load. The GPU shrinkers can be pretty heavy IIRC.

Thanks,
Sultan

> Thanks,
> Qi
> 
> > 
> > Although I never got around to submitting it, I made a non-SRCU solution [1]
> > that uses fine-grained locking instead, which is fair to both the register path
> > and unregister path. (The patch I've linked is a version of this adapted to an
> > older 4.14 kernel FYI, but it can be reworked for the current kernel.)
> > 
> > What do you think about the fine-grained locking approach?
> > 
> > Thanks,
> > Sultan
> > 
> > [1] https://github.com/kerneltoast/android_kernel_google_floral/commit/012378f3173a82d2333d3ae7326691544301e76a
> > 
> 
> -- 
> Thanks,
> Qi


  parent reply	other threads:[~2023-02-24  8:20 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-23 13:27 [PATCH v2 0/7] make " Qi Zheng
2023-02-23 13:27 ` [PATCH v2 1/7] mm: vmscan: add a map_nr_max field to shrinker_info Qi Zheng
2023-02-25  8:18   ` Qi Zheng
2023-02-25 15:14     ` Kirill Tkhai
2023-02-25 15:52       ` Qi Zheng
2023-02-26 13:54       ` Qi Zheng
2023-02-23 13:27 ` [PATCH v2 2/7] mm: vmscan: make global slab shrink lockless Qi Zheng
2023-02-23 15:26   ` Rafael Aquini
2023-02-23 15:37     ` Rafael Aquini
2023-02-24  4:09       ` Qi Zheng
2023-02-23 18:24   ` Sultan Alsawaf
2023-02-23 18:39     ` Paul E. McKenney
2023-02-23 19:18       ` Sultan Alsawaf
2023-02-24  4:00     ` Qi Zheng
2023-02-24  4:16       ` Qi Zheng
2023-02-24  8:20       ` Sultan Alsawaf [this message]
2023-02-24 10:12         ` Qi Zheng
2023-02-24 21:02       ` Kirill Tkhai
2023-02-24 21:14         ` Kirill Tkhai
2023-02-25  8:08           ` Qi Zheng
2023-02-25 15:30             ` Kirill Tkhai
2023-02-25 15:57               ` Qi Zheng
2023-02-25 16:17                 ` Kirill Tkhai
2023-02-25 16:37                   ` Qi Zheng
2023-02-25 21:28                     ` Kirill Tkhai
2023-02-26 13:56                       ` Qi Zheng
2023-02-23 13:27 ` [PATCH v2 3/7] mm: vmscan: make memcg " Qi Zheng
2023-02-23 13:27 ` [PATCH v2 4/7] mm: shrinkers: make count and scan in shrinker debugfs lockless Qi Zheng
2023-02-23 13:27 ` [PATCH v2 5/7] mm: vmscan: hold write lock to reparent shrinker nr_deferred Qi Zheng
2023-02-23 13:27 ` [PATCH v2 6/7] mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers() Qi Zheng
2023-02-23 13:27 ` [PATCH v2 7/7] mm: shrinkers: convert shrinker_rwsem to mutex Qi Zheng
2023-02-23 18:19 ` [PATCH v2 0/7] make slab shrink lockless Paul E. McKenney
2023-02-24  4:08   ` Qi Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/hzPP0G5AFhOmsY@sultan-box.localdomain \
    --to=sultan@kerneltoast.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=paulmck@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=tkhai@ya.ru \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox