Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Sat, 2 Apr 2022 10:54:36 -0700
To: Hillf Danton
Cc: MM <linux-mm@kvack.org>, Matthew Wilcox, Dave Chinner, Mel Gorman, Stephen Brennan, Yu Zhao, David Hildenbrand, LKML
In-Reply-To: <20220402072103.5140-1-hdanton@sina.com>
References: <20220402072103.5140-1-hdanton@sina.com>

Hello Hillf!

Thank you for sharing it, really interesting! I'm actually working on the same problem.
No code to share yet, but here are some of my thoughts:

1) If there is "natural" memory pressure, no additional slab scanning is needed.

2) From a power perspective it's better to scan more at once, but less often.

3) Maybe we need a feedback loop with the slab allocator: e.g. if the slabs are almost full, it makes more sense to do a proactive scan and free up some memory; otherwise we'll just end up allocating more slabs. But it's tricky.
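Roughly the kind of check I have in mind for 3); a sketch only, where the helper name and the 90% threshold are made up for illustration:

static bool worth_proactive_scan(unsigned long nr_objects,
				 unsigned long capacity)
{
	/*
	 * Scan a cache proactively only when its slabs are nearly
	 * full: that is when freeing objects can actually avert new
	 * slab allocations. Scanning a sparse cache mostly just
	 * punches more holes into existing slabs.
	 */
	return capacity && nr_objects * 10 >= capacity * 9;
}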
4) If the scanning does not result in any memory reclaim, maybe we should (temporarily) exclude the corresponding shrinker from the scanning.
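For 4), something along these lines perhaps; the struct and both helpers are hypothetical, only jiffies, time_before() and HZ are real kernel API:

#include <linux/jiffies.h>

/* hypothetical per-shrinker state for the periodic scan */
struct periodic_state {
	unsigned long skip_until;	/* jiffies; skip periodic scans before this */
};

static bool periodic_scan_allowed(struct periodic_state *st)
{
	return !time_before(jiffies, st->skip_until);
}

static void periodic_scan_done(struct periodic_state *st, unsigned long freed)
{
	/* nothing reclaimed: back off for five minutes */
	if (!freed)
		st->skip_until = jiffies + 5 * 60 * HZ;
}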
Thanks!

> On Apr 2, 2022, at 12:21 AM, Hillf Danton wrote:
>
> To mitigate the pain of having "several millions" of negative dentries in
> a single directory [1], for example, add a periodic slab shrinker that
> runs independently of the direct and background reclaimers, in a bid to
> recycle slab objects that have been cold for more than 30 seconds.
>
> Q: Why is it needed?
> A: Kswapd may take a nap as long as 30 minutes.
>
> Add a periodic flag to shrink_control to let cache owners know that this
> is the periodic shrinker, which is equivalent to the regular one running
> at the lowest reclaim priority, and that they are free to take no action
> as long as one-off objects are not piling up.
>
> Only for thoughts now.
>
> Hillf
>
> [1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
>
> --- x/include/linux/shrinker.h
> +++ y/include/linux/shrinker.h
> @@ -14,6 +14,7 @@ struct shrink_control {
>
> 	/* current node being shrunk (for NUMA aware shrinkers) */
> 	int nid;
> +	int periodic;
>
> 	/*
> 	 * How many objects scan_objects should scan and try to reclaim.
> --- x/mm/vmscan.c
> +++ y/mm/vmscan.c
> @@ -781,6 +781,8 @@ static unsigned long do_shrink_slab(stru
> 		scanned += shrinkctl->nr_scanned;
>
> 		cond_resched();
> +		if (shrinkctl->periodic)
> +			break;
> 	}
>
> 	/*
> @@ -906,7 +908,8 @@ static unsigned long shrink_slab_memcg(g
>  */
> static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> 				 struct mem_cgroup *memcg,
> -				 int priority)
> +				 int priority,
> +				 int periodic)
> {
> 	unsigned long ret, freed = 0;
> 	struct shrinker *shrinker;
> @@ -929,6 +932,7 @@ static unsigned long shrink_slab(gfp_t g
> 		.gfp_mask = gfp_mask,
> 		.nid = nid,
> 		.memcg = memcg,
> +		.periodic = periodic,
> 	};
>
> 	ret = do_shrink_slab(&sc, shrinker, priority);
> @@ -952,7 +956,7 @@ out:
> 	return freed;
> }
>
> -static void drop_slab_node(int nid)
> +static void drop_slab_node(int nid, int periodic)
> {
> 	unsigned long freed;
> 	int shift = 0;
> @@ -966,19 +970,31 @@ static void drop_slab_node(int nid)
> 		freed = 0;
> 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> 		do {
> -			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> +			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0, periodic);
> 		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> 	} while ((freed >> shift++) > 1);
> }
>
> -void drop_slab(void)
> +static void __drop_slab(int periodic)
> {
> 	int nid;
>
> 	for_each_online_node(nid)
> -		drop_slab_node(nid);
> +		drop_slab_node(nid, periodic);
> +}
> +
> +void drop_slab(void)
> +{
> +	__drop_slab(0);
> }
>
> +static void periodic_slab_shrinker_workfn(struct work_struct *work)
> +{
> +	__drop_slab(1);
> +	queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30*HZ);
> +}
> +static DECLARE_DELAYED_WORK(periodic_slab_shrinker, periodic_slab_shrinker_workfn);
> +
> static inline int is_page_cache_freeable(struct folio *folio)
> {
> 	/*
> @@ -3098,7 +3114,7 @@ static void shrink_node_memcgs(pg_data_t
> 		shrink_lruvec(lruvec, sc);
>
> 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> -			    sc->priority);
> +			    sc->priority, 0);
>
> 		/* Record the group's reclaim efficiency */
> 		vmpressure(sc->gfp_mask, memcg, false,
> @@ -4354,8 +4370,11 @@ static void kswapd_try_to_sleep(pg_data_
> 	 */
> 	set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
>
> -	if (!kthread_should_stop())
> +	if (!kthread_should_stop()) {
> +		queue_delayed_work(system_unbound_wq,
> +				   &periodic_slab_shrinker, 60*HZ);
> 		schedule();
> +	}
>
> 	set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
> } else {
> --
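P.S. The patch's periodic_slab_shrinker_workfn() is an instance of the common self-arming delayed-work pattern: the work item re-queues itself at the end of each run. Stripped of the shrinker specifics, a minimal sketch of that pattern (assuming a 30-second period) looks like:

#include <linux/workqueue.h>

static void periodic_workfn(struct work_struct *work)
{
	/* ... do the periodic work here ... */

	/* re-arm: run again 30 seconds from now */
	queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30 * HZ);
}
static DECLARE_DELAYED_WORK(periodic_work, periodic_workfn);

/* armed once from some init path, e.g.: */
/* queue_delayed_work(system_unbound_wq, &periodic_work, 30 * HZ); */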