linux-mm.kvack.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: lipeifeng@oppo.com
Cc: akpm@linux-foundation.org, zhengqi.arch@bytedance.com,
	roman.gushchin@linux.dev, muchun.song@linux.dev,
	21cnbao@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/shrinker: add SHRINKER_NO_DIRECT_RECLAIM
Date: Mon, 15 Apr 2024 09:59:18 +1000
Message-ID: <Zhxt1uL+QPisq4rE@dread.disaster.area>
In-Reply-To: <20240413015410.30951-1-lipeifeng@oppo.com>

On Sat, Apr 13, 2024 at 09:54:10AM +0800, lipeifeng@oppo.com wrote:
> From: Peifeng Li <lipeifeng@oppo.com>
> 
> In the case of insufficient memory, threads will be in direct_reclaim to
> reclaim memory, direct_reclaim will call shrink_slab to run sequentially
> each shrinker callback. If there is a lock-contention in the shrinker
> callback,such as spinlock,mutex_lock and so on, threads may be likely to
> be stuck in direct_reclaim for a long time, even if the memfree has reached
> the high watermarks of the zone, resulting in poor performance of threads.

That's always been a problem. That's a shrinker implementation
problem, not a shrinker infrastructure problem.

> Example 1: shrinker callback may wait for spinlock
> static unsigned long mb_cache_shrink(struct mb_cache *cache,
>                                      unsigned long nr_to_scan)
> {
>         struct mb_cache_entry *entry;
>         unsigned long shrunk = 0;
> 
>         spin_lock(&cache->c_list_lock);
>         while (nr_to_scan-- && !list_empty(&cache->c_list)) {
>                 entry = list_first_entry(&cache->c_list,
>                                          struct mb_cache_entry, e_list);
>                 if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
>                     atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
>                         clear_bit(MBE_REFERENCED_B, &entry->e_flags);
>                         list_move_tail(&entry->e_list, &cache->c_list);
>                         continue;
>                 }
>                 list_del_init(&entry->e_list);
>                 cache->c_entry_count--;
>                 spin_unlock(&cache->c_list_lock);
>                 __mb_cache_entry_free(cache, entry);
>                 shrunk++;
>                 cond_resched();
>                 spin_lock(&cache->c_list_lock);
>         }
>         spin_unlock(&cache->c_list_lock);
> 
>         return shrunk;
> }

Yeah, we learnt a -long- time ago that using global locks in
shrinkers that have -unbounded concurrency- is a really bad idea.
This is just a poorly implemented shrinker because it doesn't take
into account memory reclaim concurrency.

This is, for example, why list_lru exists and is tightly tied into
the SHRINKER_NUMA_AWARE infrastructure - it gets rid of the need for
global locks in reclaim lists that shrinkers traverse.
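
To illustrate the idea, here is a rough userspace sketch of per-node
reclaim lists, each with its own lock. It is not the list_lru code:
every demo_* name is hypothetical, an atomic_flag stands in for the
per-node spinlock, and a counter stands in for the list itself.

```c
#include <stdatomic.h>

#define DEMO_NR_NODES 2	/* hypothetical NUMA node count */

/* One reclaim list per node, each with its own lock. */
struct demo_lru_node {
	atomic_flag lock;	/* stand-in for a per-node spinlock */
	unsigned long nr_items;	/* stand-in for the list itself */
};

struct demo_lru {
	struct demo_lru_node node[DEMO_NR_NODES];
};

static void demo_lru_init(struct demo_lru *lru)
{
	for (int i = 0; i < DEMO_NR_NODES; i++) {
		atomic_flag_clear(&lru->node[i].lock);
		lru->node[i].nr_items = 0;
	}
}

/*
 * Reclaim up to nr_to_scan items from node nid. Only that node's
 * lock is taken, so reclaim running against other nodes does not
 * contend with this call.
 */
static unsigned long demo_lru_scan_node(struct demo_lru *lru, int nid,
					unsigned long nr_to_scan)
{
	struct demo_lru_node *n = &lru->node[nid];
	unsigned long freed;

	while (atomic_flag_test_and_set_explicit(&n->lock,
						 memory_order_acquire))
		;	/* spin, as a spinlock would */

	freed = nr_to_scan < n->nr_items ? nr_to_scan : n->nr_items;
	n->nr_items -= freed;

	atomic_flag_clear_explicit(&n->lock, memory_order_release);
	return freed;
}
```

With node 0's lock held by another reclaimer, a call such as
demo_lru_scan_node(&lru, 1, 4) still makes progress, which is the
property a single global list lock cannot give you.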

> Example 2: shrinker callback may wait for mutex lock
> static
> unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s,
> 		struct shrink_control *sc)
> {
> 	struct kbase_context *kctx;
> 	struct kbase_mem_phy_alloc *alloc;
> 	struct kbase_mem_phy_alloc *tmp;
> 	unsigned long freed = 0;
> 
> 	kctx = container_of(s, struct kbase_context, reclaim);
> 
> 	// MTK add to prevent false alarm
> 	lockdep_off();

That's just -broken-.

If shrinkers are called from a context in which they can't take
locks because they might deadlock, then they must either use trylocks
and abort (i.e. SHRINK_STOP) or use context flags provided by the
allocation context (e.g. GFP_NOFS, memalloc_nofs_save()) to tell
reclaim that context specific subsystem locks are held and the
shrinker should not attempt to take them and/or run in this context.
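
As a rough sketch of that pattern, here is a userspace model of a
scan callback that never blocks. This is not kernel code: the demo_*
names are hypothetical, an atomic_flag stands in for the subsystem
lock (where a real shrinker would use something like mutex_trylock()),
and the fs_allowed field is an assumed stand-in for checking
sc->gfp_mask against __GFP_FS.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SHRINK_STOP (~0UL)	/* same value the kernel defines */

struct demo_cache {			/* hypothetical subsystem cache */
	atomic_flag busy;		/* stand-in for a subsystem lock */
	unsigned long nr_objects;
};

struct demo_shrink_control {		/* modelled on struct shrink_control */
	unsigned long nr_to_scan;
	bool fs_allowed;		/* models (sc->gfp_mask & __GFP_FS) */
};

static void demo_cache_init(struct demo_cache *c, unsigned long nr)
{
	atomic_flag_clear(&c->busy);
	c->nr_objects = nr;
}

static unsigned long demo_cache_scan(struct demo_cache *c,
				     struct demo_shrink_control *sc)
{
	unsigned long freed;

	/* Caller may hold locks we need: do not run in this context. */
	if (!sc->fs_allowed)
		return SHRINK_STOP;

	/* Trylock and abort rather than wait for the lock holder. */
	if (atomic_flag_test_and_set_explicit(&c->busy,
					      memory_order_acquire))
		return SHRINK_STOP;

	freed = sc->nr_to_scan < c->nr_objects ?
			sc->nr_to_scan : c->nr_objects;
	c->nr_objects -= freed;

	atomic_flag_clear_explicit(&c->busy, memory_order_release);
	return freed;
}
```

In both abort paths the callback returns immediately without touching
the cache, so the caller is never stuck behind the lock holder.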

> 	mutex_lock(&kctx->jit_evict_lock);

That's also wrong.

Shrinkers must be non-blocking, otherwise they cause memory reclaim
latencies that will result in unpredictable memory allocation
latencies and that makes anyone running applications with latency
specific SLAs very unhappy.

IOWs, this is a subsystem shrinker that is very poorly implemented
and needs to be fixed before we do anything else.

> In mobile-phone,threads are likely to be stuck in shrinker callback during
> direct_reclaim, with example like the following:
> <...>-2806    [004] ..... 866458.339840: mm_shrink_slab_start:
> 			dynamic_mem_shrink_scan+0x0/0xb8 ... priority 2
> <...>-2806    [004] ..... 866459.339933: mm_shrink_slab_end:
> 			dynamic_mem_shrink_scan+0x0/0xb8 ...

Yup, that's exactly the problem with blocking shrinkers - they can
screw the whole system over because they stop memory allocation in
its tracks. Shrinkers must be non-blocking.

> For the above reason, the patch introduces SHRINKER_NO_DIRECT_RECLAIM that
> allows driver to set shrinker callback not to be called in direct_reclaim
> unless sc->priority is 0.

No, that's fundamentally flawed, too.

Firstly, it doesn't avoid deadlocks, nor does it avoid lock
contention under heavy memory pressure - it just hides these
problems until we are critically low on memory. Which will happen
much faster, because we aren't reclaiming memory from caches that
hold memory that needs to be reclaimed. This isn't good.

Further, it bypasses the mechanism we use to defer the shrinker
work to a context where it can be executed safely (i.e. kswapd).
Shrinkers that cannot run in the current context are supposed to
return SHRINK_STOP to tell the shrink_slab infrastructure to
accumulate the work for the next context that can run the reclaim
rather than execute it.

This allows kswapd to do the reclaim work instead of direct reclaim.
It also ensures that all the memory pressure being applied to the
shrinkers is actually actioned so we keep all the caches and memory
usage in relative balance.
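
The deferral accounting can be modelled in userspace roughly as
follows. This is a heavily reduced sketch, not the shrink_slab code:
the demo_* names are hypothetical, there is no batching or capping,
and a successful scan is assumed to complete all the requested work.

```c
#include <stdbool.h>

#define SHRINK_STOP (~0UL)	/* same value the kernel defines */

struct demo_pool {			/* hypothetical cache */
	bool can_run;			/* false: shrinker must not run here */
	unsigned long nr_objects;
};

struct demo_shrinker {
	struct demo_pool *pool;
	unsigned long nr_deferred;	/* work carried over from earlier calls */
};

static unsigned long demo_pool_scan(struct demo_pool *p,
				    unsigned long nr_to_scan)
{
	unsigned long freed;

	if (!p->can_run)
		return SHRINK_STOP;

	freed = nr_to_scan < p->nr_objects ? nr_to_scan : p->nr_objects;
	p->nr_objects -= freed;
	return freed;
}

/*
 * delta is the new work this reclaim pass wants done. Work the
 * shrinker refuses is accumulated rather than dropped, so the next
 * context that can run the shrinker picks it up.
 */
static unsigned long demo_do_shrink(struct demo_shrinker *s,
				    unsigned long delta)
{
	unsigned long total_scan = s->nr_deferred + delta;
	unsigned long ret = demo_pool_scan(s->pool, total_scan);

	if (ret == SHRINK_STOP) {
		s->nr_deferred = total_scan;	/* defer to the next caller */
		return 0;
	}

	s->nr_deferred = 0;
	return ret;
}
```

A pass with can_run false (think direct reclaim in a restricted
context) leaves its delta in nr_deferred, and the next pass with
can_run true (think kswapd) scans the deferred work plus its own.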

IOWs, the choice of running the shrinker or not is controlled by two
things:

1. the shrinker implementation itself, and
2. the reclaim context flags provided by the allocation that needs
reclaim to be performed.

Long story short: if a shrinker is causing direct reclaim problems
because of poor locking design, latency and/or context specific
deadlocks, then the subsystem and its shrinker need to be fixed.
We should not be skipping direct reclaim just because a shrinker is
really poorly implemented.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


