Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker
From: Roman Gushchin <roman.gushchin@linux.dev>
Date: Sat, 2 Apr 2022 10:54:36 -0700
To: Hillf Danton
Cc: MM <linux-mm@kvack.org>, Matthew Wilcox, Dave Chinner, Mel Gorman, Stephen Brennan, Yu Zhao, David Hildenbrand, LKML
In-Reply-To: <20220402072103.5140-1-hdanton@sina.com>
References: <20220402072103.5140-1-hdanton@sina.com>

Hello Hillf!

Thank you for sharing it, really interesting! I'm actually working on the same problem.
No code to share yet, but here are some of my thoughts:

1) If there is "natural" memory pressure, no additional slab scanning is needed.

2) From a power perspective it's better to scan more at once, but less often.

3) Maybe we need a feedback loop with the slab allocator: e.g. if the slabs are almost full, it makes more sense to do a proactive scan and free up some memory; otherwise we'll just end up allocating more slabs. But it's tricky.
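Roughly the kind of check I have in mind for 3); a sketch only, where the helper name and the 90% threshold are made up for illustration:

static bool worth_proactive_scan(unsigned long nr_objects,
				 unsigned long capacity)
{
	/*
	 * Scan a cache proactively only when its slabs are nearly
	 * full: that is when freeing objects can actually avert new
	 * slab allocations. Scanning a sparse cache mostly just
	 * punches more holes into existing slabs.
	 */
	return capacity && nr_objects * 10 >= capacity * 9;
}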
4) If the scanning does not result in any memory reclaim, maybe we should (temporarily) exclude the corresponding shrinker from the scanning.
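For 4), something along these lines perhaps; the struct and both helpers are hypothetical, only jiffies, time_before() and HZ are real kernel API:

#include <linux/jiffies.h>

/* hypothetical per-shrinker state for the periodic scan */
struct periodic_state {
	unsigned long skip_until;	/* jiffies; skip periodic scans before this */
};

static bool periodic_scan_allowed(struct periodic_state *st)
{
	return !time_before(jiffies, st->skip_until);
}

static void periodic_scan_done(struct periodic_state *st, unsigned long freed)
{
	/* nothing reclaimed: back off for five minutes */
	if (!freed)
		st->skip_until = jiffies + 5 * 60 * HZ;
}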
Thanks!

> On Apr 2, 2022, at 12:21 AM, Hillf Danton wrote:
>
> To mitigate the pain of having "several millions" of negative dentries in
> a single directory [1], for example, add a periodic slab shrinker that
> runs independently of the direct and background reclaimers, in a bid to
> recycle slab objects that have been cold for more than 30 seconds.
>
> Q: Why is it needed?
> A: Kswapd may take a nap as long as 30 minutes.
>
> Add a periodic flag to shrink_control to let cache owners know that this
> is the periodic shrinker, which is equivalent to the regular one running
> at the lowest reclaim priority, and that they are free to take no action
> as long as one-off objects are not piling up.
>
> Only for thoughts now.
>
> Hillf
>
> [1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
>
> --- x/include/linux/shrinker.h
> +++ y/include/linux/shrinker.h
> @@ -14,6 +14,7 @@ struct shrink_control {
>
> 	/* current node being shrunk (for NUMA aware shrinkers) */
> 	int nid;
> +	int periodic;
>
> 	/*
> 	 * How many objects scan_objects should scan and try to reclaim.
> --- x/mm/vmscan.c
> +++ y/mm/vmscan.c
> @@ -781,6 +781,8 @@ static unsigned long do_shrink_slab(stru
> 		scanned += shrinkctl->nr_scanned;
>
> 		cond_resched();
> +		if (shrinkctl->periodic)
> +			break;
> 	}
>
> 	/*
> @@ -906,7 +908,8 @@ static unsigned long shrink_slab_memcg(g
>  */
> static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> 				 struct mem_cgroup *memcg,
> -				 int priority)
> +				 int priority,
> +				 int periodic)
> {
> 	unsigned long ret, freed = 0;
> 	struct shrinker *shrinker;
> @@ -929,6 +932,7 @@ static unsigned long shrink_slab(gfp_t g
> 		.gfp_mask = gfp_mask,
> 		.nid = nid,
> 		.memcg = memcg,
> +		.periodic = periodic,
> 	};
>
> 	ret = do_shrink_slab(&sc, shrinker, priority);
> @@ -952,7 +956,7 @@ out:
> 	return freed;
> }
>
> -static void drop_slab_node(int nid)
> +static void drop_slab_node(int nid, int periodic)
> {
> 	unsigned long freed;
> 	int shift = 0;
> @@ -966,19 +970,31 @@ static void drop_slab_node(int nid)
> 		freed = 0;
> 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> 		do {
> -			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> +			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0, periodic);
> 		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> 	} while ((freed >> shift++) > 1);
> }
>
> -void drop_slab(void)
> +static void __drop_slab(int periodic)
> {
> 	int nid;
>
> 	for_each_online_node(nid)
> -		drop_slab_node(nid);
> +		drop_slab_node(nid, periodic);
> +}
> +
> +void drop_slab(void)
> +{
> +	__drop_slab(0);
> }
>
> +static void periodic_slab_shrinker_workfn(struct work_struct *work)
> +{
> +	__drop_slab(1);
> +	queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30*HZ);
> +}
> +static DECLARE_DELAYED_WORK(periodic_slab_shrinker, periodic_slab_shrinker_workfn);
> +
> static inline int is_page_cache_freeable(struct folio *folio)
> {
> 	/*
> @@ -3098,7 +3114,7 @@ static void shrink_node_memcgs(pg_data_t
> 		shrink_lruvec(lruvec, sc);
>
> 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> -			    sc->priority);
> +			    sc->priority, 0);
>
> 		/* Record the group's reclaim efficiency */
> 		vmpressure(sc->gfp_mask, memcg, false,
> @@ -4354,8 +4370,11 @@ static void kswapd_try_to_sleep(pg_data_
> 	 */
> 	set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
>
> -	if (!kthread_should_stop())
> +	if (!kthread_should_stop()) {
> +		queue_delayed_work(system_unbound_wq,
> +				   &periodic_slab_shrinker, 60*HZ);
> 		schedule();
> +	}
>
> 	set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
> } else {
> --
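P.S. The patch's periodic_slab_shrinker_workfn() is an instance of the common self-arming delayed-work pattern: the work item re-queues itself at the end of each run. Stripped of the shrinker specifics, a minimal sketch of that pattern (assuming a 30-second period) looks like:

#include <linux/workqueue.h>

static void periodic_workfn(struct work_struct *work)
{
	/* ... do the periodic work here ... */

	/* re-arm: run again 30 seconds from now */
	queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30 * HZ);
}
static DECLARE_DELAYED_WORK(periodic_work, periodic_workfn);

/* armed once from some init path, e.g.: */
/* queue_delayed_work(system_unbound_wq, &periodic_work, 30 * HZ); */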