Date: Mon, 14 Sep 2020 11:30:32 +0200
From: Michal Hocko
To: zangchunxin@bytedance.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: Re: [PATCH v2] mm/vmscan: fix infinite loop in drop_slab_node
Message-ID: <20200914093032.GG16999@dhcp22.suse.cz>
In-Reply-To: <20200909152047.27905-1-zangchunxin@bytedance.com>

The subject is misleading because this patch doesn't fix an infinite
loop, right? It just allows userspace to interrupt the operation.

On Wed 09-09-20 23:20:47, zangchunxin@bytedance.com wrote:
> From: Chunxin Zang
>
> On our server, there are about 10k memcgs on one machine, and they use
> memory very frequently. When I trigger drop_caches, the process loops
> seemingly forever in drop_slab_node.

Is this really an infinite loop, or does it just take a lot of time to
process all the metadata in that setup? If this is really an infinite
loop then we should look at it. My current understanding is that the
operation would finish at some point; it just takes painfully long to
get there.
> There are two reasons:
> 1. We have too many memcgs: even if each memcg frees only one object,
>    the total freed count is still bigger than 10.
>
> 2. A single pass over all memcgs takes a long time, so the memcgs
>    traversed first have already accumulated many freeable objects by
>    the time the pass finishes. On the next pass, the freed count is
>    bigger than 10 again.
>
> We can get the following info through 'ps':
>
> root:~# ps -aux | grep drop
> root  357956 ... R Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches
> root 1771385 ... R Aug16 21146421:17 echo 3 > /proc/sys/vm/drop_caches
> root 1986319 ... R 18:56      117:27 echo 3 > /proc/sys/vm/drop_caches
> root 2002148 ... R Aug24     5720:39 echo 3 > /proc/sys/vm/drop_caches
> root 2564666 ... R 18:59      113:58 echo 3 > /proc/sys/vm/drop_caches
> root 2639347 ... R Sep03     2383:39 echo 3 > /proc/sys/vm/drop_caches
> root 3904747 ... R 03:35      993:31 echo 3 > /proc/sys/vm/drop_caches
> root 4016780 ... R Aug21     7882:18 echo 3 > /proc/sys/vm/drop_caches
>
> Use bpftrace to follow the 'freed' value in drop_slab_node:
>
> root:~# bpftrace -e 'kprobe:drop_slab_node+70 {@ret=hist(reg("bp")); }'
> Attaching 1 probe...
> ^C
>
> @ret:
> [64, 128)          1 |                                                    |
> [128, 256)        28 |                                                    |
> [256, 512)       107 |@                                                   |
> [512, 1K)        298 |@@@                                                 |
> [1K, 2K)         613 |@@@@@@@                                             |
> [2K, 4K)        4435 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4K, 8K)         442 |@@@@@                                               |
> [8K, 16K)        299 |@@@                                                 |
> [16K, 32K)       100 |@                                                   |
> [32K, 64K)       139 |@                                                   |
> [64K, 128K)       56 |                                                    |
> [128K, 256K)      26 |                                                    |
> [256K, 512K)       2 |                                                    |
>
> In the loop, we can check whether a fatal signal is pending, and if
> so, break out of the loop.

I would make it explicit that this is not fixing the above scenario. It
just helps to cancel the operation, which is a good thing in general.

> Signed-off-by: Chunxin Zang
> Signed-off-by: Muchun Song

With an updated changelog:
Acked-by: Michal Hocko

> ---
> Changelog in v2:
> 1) Break the loop when a fatal signal is pending.
>
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b6d84326bdf2..c3ed8b45d264 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -704,6 +704,9 @@ void drop_slab_node(int nid)
>  	do {
>  		struct mem_cgroup *memcg = NULL;
>  
> +		if (fatal_signal_pending(current))
> +			return;
> +
>  		freed = 0;
>  		memcg = mem_cgroup_iter(NULL, NULL, NULL);
>  		do {
> --
> 2.11.0
>

--
Michal Hocko
SUSE Labs