From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Sep 2020 17:17:36 +0200
From: Michal Hocko
To: Chunxin Zang
Cc: Andrew Morton, Linux Memory Management List, LKML, Muchun Song
Subject: Re: [External] Re: [PATCH v2] mm/vmscan: fix infinite loop in drop_slab_node
Message-ID: <20200914151736.GA16999@dhcp22.suse.cz>
References: <20200909152047.27905-1-zangchunxin@bytedance.com> <20200914093032.GG16999@dhcp22.suse.cz> <20200914134713.GS16999@dhcp22.suse.cz>

On Mon 14-09-20 23:02:15, Chunxin Zang wrote:
> On Mon, Sep 14, 2020 at 9:47 PM Michal Hocko wrote:
>
> > On Mon 14-09-20 21:25:59, Chunxin Zang wrote:
> > > On Mon, Sep 14, 2020 at 5:30 PM Michal Hocko wrote:
> > > >
> > > > The subject is misleading because this patch doesn't fix an infinite
> > > > loop, right? It just allows the userspace to interrupt the operation.
> > > >
> > >
> > > Yes, so we are making a separate patch following Vlastimil's
> > > recommendation: use a doubling of the threshold to end the loop.
> >
> > That still means the changelog needs an update.
> >
> The patch is already merged in the linux-next branch. Can I still update
> the changelog now?

Yes. Andrew will refresh it. He doesn't maintain a git tree, so nothing
prevents rewriting the patch.

> This is my first patch, please forgive me :)

No worries. 
The mm patch workflow is rather different from others.

> > > On Thu, Sep 10, 2020 at 1:59 AM Vlastimil Babka wrote:
> > > > > From: Chunxin Zang
> > > >
> > > > ...
> > > > - IMHO it's still worth to bail out in your scenario even without a
> > > >   signal, e.g. by the doubling of threshold. But it can be a separate
> > > >   patch. Thanks!
> > > > ...
> > > >
> > > > On Wed 09-09-20 23:20:47, zangchunxin@bytedance.com wrote:
> > > > > From: Chunxin Zang
> > > > >
> > > > > On our server, there are about 10k memcgs on one machine. They use
> > > > > memory very frequently. When I trigger drop caches, the process
> > > > > loops forever in drop_slab_node.
> > > >
> > > > Is this really an infinite loop, or does it just take a lot of time
> > > > to process all the metadata in that setup? If this is really an
> > > > infinite loop then we should look at it. My current understanding is
> > > > that the operation would finish at some point; it just takes
> > > > painfully long to get there.
> > > >
> > >
> > > Yes, it's really an infinite loop. Every pass takes a lot of time, and
> > > during that time the memcgs allocate and free memory, so on the next
> > > pass the total of 'freed' is always bigger than 10.
> >
> > I am still not sure I follow. Do you mean that there is somebody
> > constantly generating more objects to reclaim?
> >
> Yes, this is my meaning. :)
>
> > Maybe we are just not agreeing on the definition of an infinite loop,
> > but in my book that means that the final condition can never be met.
> > While someone busily adding new objects might indeed cause drop caches
> > to loop for a long time, this is to be expected from that interface, as
> > it is supposed to drop all of the cache, and the cache can grow during
> > the operation.
> >
> Because I have 10k memcgs, all of them heavy users of memory. 
> During each loop, there are always more than 10 reclaimable objects
> being generated, so the condition is never met.

10k or any other number of memcgs shouldn't really make much of a
difference, except for the time the scan adds. Fundamentally we are
talking about freed objects, and whether they sit on the global or the
per-memcg lists should result in similar behavior.

> The drop cache process has no chance to exit the loop.
> Although the purpose of the 'drop cache' interface is to release all
> caches, we still need a way to terminate it, e.g. in this case, where
> the process took too long to run.

Yes, this is perfectly understandable. Having a bail-out on a fatal
signal is a completely reasonable thing to do. I am mostly confused by
your infinite loop claims and what the relation of this patch to them is.
I would propose this wording instead:

"
We have observed that drop_caches can take a considerable amount of
time (). Especially when there are many memcgs involved, because they
add additional overhead. It is quite unfortunate that the operation
cannot currently be interrupted by a signal. Add a check for fatal
signals into the main loop so that userspace can trigger an early
bailout.
"

or something along those lines.

> root 357956 ... R Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches

-- 
Michal Hocko
SUSE Labs