From: Barry Song <21cnbao@gmail.com>
Date: Sat, 13 Apr 2024 17:58:16 +1200
Subject: Re: [PATCH v2] mm/shrinker: add SHRINKER_NO_DIRECT_RECLAIM
To: lipeifeng@oppo.com
Cc: akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, roman.gushchin@linux.dev, muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <00861870-5c40-4f00-b91f-fc4cfb4a780b@oppo.com>
References: <20240413015410.30951-1-lipeifeng@oppo.com> <00861870-5c40-4f00-b91f-fc4cfb4a780b@oppo.com>
Content-Type: text/plain; charset="UTF-8"
On Sat, Apr 13, 2024 at 5:42 PM 李培锋 (Peifeng Li) wrote:
>
>
> On 2024/4/13 13:19, Barry Song wrote:
> > On Sat, Apr 13, 2024 at 1:54 PM wrote:
> >> From: Peifeng Li
> >>
> >> In the case of insufficient memory, threads will enter direct_reclaim to
> >> reclaim memory, and direct_reclaim will call shrink_slab to run each
> >> shrinker callback sequentially. If there is lock contention in a shrinker
> >> callback, such as a spinlock or a mutex, threads may be stuck in
> >> direct_reclaim for a long time, even after free memory has reached the
> >> high watermark of the zone, resulting in poor thread performance.
> >>
> >> Example 1: shrinker callback may wait for a spinlock
> >> static unsigned long mb_cache_shrink(struct mb_cache *cache,
> >>                                      unsigned long nr_to_scan)
> >> {
> >>         struct mb_cache_entry *entry;
> >>         unsigned long shrunk = 0;
> >>
> >>         spin_lock(&cache->c_list_lock);
> >>         while (nr_to_scan-- && !list_empty(&cache->c_list)) {
> >>                 entry = list_first_entry(&cache->c_list,
> >>                                          struct mb_cache_entry, e_list);
> >>                 if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
> >>                     atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
> >>                         clear_bit(MBE_REFERENCED_B, &entry->e_flags);
> >>                         list_move_tail(&entry->e_list, &cache->c_list);
> >>                         continue;
> >>                 }
> >>                 list_del_init(&entry->e_list);
> >>                 cache->c_entry_count--;
> >>                 spin_unlock(&cache->c_list_lock);
> >>                 __mb_cache_entry_free(cache, entry);
> >>                 shrunk++;
> >>                 cond_resched();
> >>                 spin_lock(&cache->c_list_lock);
> >>         }
> >>         spin_unlock(&cache->c_list_lock);
> >>
> >>         return shrunk;
> >> }
> >> Example 2: shrinker callback may wait for a mutex
> >> static
> >> unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s,
> >>                 struct shrink_control *sc)
> >> {
> >>         struct kbase_context *kctx;
> >>         struct kbase_mem_phy_alloc *alloc;
> >>         struct kbase_mem_phy_alloc *tmp;
> >>         unsigned long freed = 0;
> >>
> >>         kctx = container_of(s, struct kbase_context, reclaim);
> >>
> >>         // MTK add to prevent false alarm
> >>         lockdep_off();
> >>
> >>         mutex_lock(&kctx->jit_evict_lock);
> >>
> >>         list_for_each_entry_safe(alloc, tmp, &kctx->evict_list, evict_node) {
> >>                 int err;
> >>
> >>                 err = kbase_mem_shrink_gpu_mapping(kctx, alloc->reg,
> >>                                 0, alloc->nents);
> >>                 if (err != 0) {
> >>                         freed = -1;
> >>                         goto out_unlock;
> >>                 }
> >>
> >>                 alloc->evicted = alloc->nents;
> >>
> >>                 kbase_free_phy_pages_helper(alloc, alloc->evicted);
> >>                 freed += alloc->evicted;
> >>                 list_del_init(&alloc->evict_node);
> >>
> >>                 kbase_jit_backing_lost(alloc->reg);
> >>
> >>                 if (freed > sc->nr_to_scan)
> >>                         break;
> >>         }
> >> out_unlock:
> >>         mutex_unlock(&kctx->jit_evict_lock);
> >>
> >>         // MTK add to prevent false alarm
> >>         lockdep_on();
> >>
> >>         return freed;
> >> }
> >>
> >> On mobile phones, threads are likely to be stuck in a shrinker callback
> >> during direct_reclaim, as in the following trace, where the callback ran
> >> for about one second:
> >> <...>-2806 [004] ..... 866458.339840: mm_shrink_slab_start:
> >>         dynamic_mem_shrink_scan+0x0/0xb8 ... priority=2
> >> <...>-2806 [004] ..... 866459.339933: mm_shrink_slab_end:
> >>         dynamic_mem_shrink_scan+0x0/0xb8 ...
> >>
> >> For the above reason, this patch introduces SHRINKER_NO_DIRECT_RECLAIM,
> >> which allows a driver to request that its shrinker callback not be called
> >> in direct_reclaim unless sc->priority is 0.
> >>
> >> sc->priority == 0 still allows the shrinker callback to be called in
> >> direct_reclaim for two reasons:
> >> 1. drop_slab always calls all shrinker callbacks with priority 0.
> >> 2. When sc->priority is 0 during direct_reclaim, allowing direct_reclaim
> >> to call the shrinker callback helps reclaim memory in a timely manner.

We already provide current_is_kswapd() and the shrink_control to drivers.
If you believe that sc->priority can assist shrinker callbacks in behaving
differently, why not simply pass it along in the shrink_control and allow
drivers to decide their preferred course of action?

I don't find it reasonable to reverse the approach. Allowing drivers to
pass a flag to the memory management core doesn't seem sensible.
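The alternative suggested above, passing reclaim context such as sc->priority
down with the shrink_control and letting each driver throttle itself, can be
sketched as a standalone model. The sketch below is hypothetical userspace
code, not kernel code: struct my_shrink_control, my_scan(), and its priority
and from_kswapd fields are invented for illustration (the mainline struct
shrink_control does not carry priority today, and kswapd detection is done
with current_is_kswapd() inside the kernel).

```c
#include <stdbool.h>

/* Hypothetical model of a shrink_control extended with sc->priority,
 * as suggested in the reply above; this is not the mainline struct. */
struct my_shrink_control {
	unsigned long nr_to_scan; /* objects the core asks the driver to scan */
	int priority;             /* 0 = most aggressive (e.g. drop_slab) */
	bool from_kswapd;         /* stand-in for current_is_kswapd() */
};

#define MY_SHRINK_STOP (~0UL)

/* Driver-side policy: rather than a core-visible flag, the callback
 * itself skips its lock-contended walk during non-zero-priority direct
 * reclaim, and only does real work for kswapd or priority-0 reclaim. */
unsigned long my_scan(const struct my_shrink_control *sc,
		      unsigned long reclaimable)
{
	if (!sc->from_kswapd && sc->priority != 0)
		return MY_SHRINK_STOP; /* would likely block: bail out early */

	/* Reclaim up to nr_to_scan objects. */
	return reclaimable < sc->nr_to_scan ? reclaimable : sc->nr_to_scan;
}
```

With this shape the policy stays inside the driver: the MM core needs no new
flag, and a driver that knows its callback is slow performs the same priority
check that the proposed SHRINKER_NO_DIRECT_RECLAIM would have moved into
shrink_slab().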
> >>
> >> Note:
> >> 1. shrinker_register() does not set SHRINKER_NO_DIRECT_RECLAIM by
> >> default, to maintain the current behavior of the code.
> >> 2. The logic by which kswapd and drop_slab call shrinker callbacks is
> >> not affected.
> >>
> >> Signed-off-by: Peifeng Li
> >> ---
> >> -v2: fix the commit message
> >>  include/linux/shrinker.h |  5 +++++
> >>  mm/shrinker.c            | 38 +++++++++++++++++++++++++++++++++++---
> >>  2 files changed, 40 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> >> index 1a00be90d93a..2d5a8b3a720b 100644
> >> --- a/include/linux/shrinker.h
> >> +++ b/include/linux/shrinker.h
> >> @@ -130,6 +130,11 @@ struct shrinker {
> >>   * non-MEMCG_AWARE shrinker should not have this flag set.
> >>   */
> >>  #define SHRINKER_NONSLAB       BIT(4)
> >> +/*
> >> + * Can the shrinker callback be called in direct_reclaim unless
> >> + * sc->priority is 0?
> >> + */
> >> +#define SHRINKER_NO_DIRECT_RECLAIM     BIT(5)
> >>
> > My point is, drivers won't voluntarily stay unreclaimed during direct
> > reclamation. Hence, this approach is unlikely to succeed. Those
> > drivers can't be trusted. Had they been aware of their slowness,
> > they wouldn't have written code in this manner.
>
> Actually, we hope there is a way for us to solve the blocking in the
> shrinker callback. Because many drivers can't remove the locks in their
> shrinker callbacks in a timely manner, with this flag we can tell
> drivers to add it.
>
> > Detecting problematic driver shrinkers and marking them as skipped
> > might prove challenging. I concur with Zhengqi; the priority should
> > be fixing the driver whose shrinker is slow. Do you have a list of
> > slow drivers?
>
> Most of the slow drivers haven't been upstreamed, so we cannot gather a
> list of slow drivers.
>
> I am curious whether executing do_shrink_slab() asynchronously, or
> executing part of the shrinker callbacks asynchronously, could be
> accepted in Linux?

That entirely hinges on the type of data we possess.
As Zhengqi pointed out, asynchronous slab shrinkers could also impede
memory reclamation. If the data eventually shows that this isn't an
issue, asynchronous slab shrinkers might discover a solution.

> In my mind, if the memory-reclaim path of the kernel can be affected by
> a driver, the robustness of the kernel will be greatly reduced.

> >>  __printf(2, 3)
> >>  struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
> >> diff --git a/mm/shrinker.c b/mm/shrinker.c
> >> index dc5d2a6fcfc4..7a5dffd166cd 100644
> >> --- a/mm/shrinker.c
> >> +++ b/mm/shrinker.c
> >> @@ -4,7 +4,7 @@
> >>  #include
> >>  #include
> >>  #include
> >> -
> >> +#include
> >>  #include "internal.h"
> >>
> >>  LIST_HEAD(shrinker_list);
> >> @@ -544,7 +544,23 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
> >>                         if (!memcg_kmem_online() &&
> >>                             !(shrinker->flags & SHRINKER_NONSLAB))
> >>                                 continue;
> >> -
> >> +                       /*
> >> +                        * SHRINKER_NO_DIRECT_RECLAIM means that the shrinker
> >> +                        * callback should not be called in direct_reclaim
> >> +                        * unless priority is 0.
> >> +                        */
> >> +                       if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) &&
> >> +                           !current_is_kswapd()) {
> >> +                               /*
> >> +                                * 1. drop_slab always calls the shrinker
> >> +                                * callback with priority 0.
> >> +                                * 2. When sc->priority is 0 during
> >> +                                * direct_reclaim, allow direct_reclaim to
> >> +                                * call the shrinker callback, to reclaim
> >> +                                * memory in a timely manner.
> >> +                                */
> >> +                               if (priority)
> >> +                                       continue;
> >> +                       }
> >>                         ret = do_shrink_slab(&sc, shrinker, priority);
> >>                         if (ret == SHRINK_EMPTY) {
> >>                                 clear_bit(offset, unit->map);
> >> @@ -658,7 +674,23 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
> >>                 continue;
> >>
> >>                 rcu_read_unlock();
> >> -
> >> +               /*
> >> +                * SHRINKER_NO_DIRECT_RECLAIM means that the shrinker callback
> >> +                * should not be called in direct_reclaim unless priority
> >> + */ > >> + if ((shrinker->flags & SHRINKER_NO_DIRECT_RECLAIM) && > >> + !current_is_kswapd()) { > >> + /* > >> + * 1.Always call shrinker callback in drop_sla= b that > >> + * priority is 0. > >> + * 2.sc->priority is 0 during direct_reclaim, = allow > >> + * direct_reclaim to call shrinker callback, t= o reclaim > >> + * memory timely. > >> + */ > >> + if (priority) > >> + continue; > >> + } > >> ret =3D do_shrink_slab(&sc, shrinker, priority); > >> if (ret =3D=3D SHRINK_EMPTY) > >> ret =3D 0; > >> -- > >> 2.34.1 > > Thanks > > Barry