From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 14 Nov 2023 09:26:53 -0300
From: Marcelo Tosatti
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka,
 Andrew Morton, David Hildenbrand, Peter Xu
Subject: Re: [patch 0/2] mm: too_many_isolated can stall due to out of sync VM counters
Message-ID: 
References: <20231113233420.446465795@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
Hi Michal,

On Tue, Nov 14, 2023 at 09:20:09AM +0100, Michal Hocko wrote:
> On Mon 13-11-23 20:34:20, Marcelo Tosatti wrote:
> > A customer reported seeing processes hung at too_many_isolated,
> > while analysis indicated that the problem occurred due to out
> > of sync per-CPU stats (see below).
> > 
> > Fix is to use node_page_state_snapshot to avoid the stale,
> > out of sync values.
> > 
> > 2136 static unsigned long
> > 2137 shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> > 2138                      struct scan_control *sc, enum lru_list lru)
> > 2139 {
> >    :
> > 2145         bool file = is_file_lru(lru);
> >    :
> > 2147         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >    :
> > 2150         while (unlikely(too_many_isolated(pgdat, file, sc))) {
> > 2151                 if (stalled)
> > 2152                         return 0;
> > 2153 
> > 2154                 /* wait a bit for the reclaimer. */
> > 2155                 msleep(100);  <--- some processes were sleeping here, with pending SIGKILL.
> > 2156                 stalled = true;
> > 2157 
> > 2158                 /* We are about to die and free our memory. Return now. */
> > 2159                 if (fatal_signal_pending(current))
> > 2160                         return SWAP_CLUSTER_MAX;
> > 2161         }
> > 
> > msleep() must be called only when there are too many isolated pages:
> 
> What do you mean here? That msleep() must not be called when
> isolated > inactive is false.
> > 2019 static int too_many_isolated(struct pglist_data *pgdat, int file,
> > 2020                              struct scan_control *sc)
> > 2021 {
> >    :
> > 2030         if (file) {
> > 2031                 inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> > 2032                 isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
> > 2033         } else {
> >    :
> > 2046         return isolated > inactive;
> > 
> > The return value was true since:
> > 
> > crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_INACTIVE_FILE]
> > $8 = {
> >   counter = 1
> > }
> > crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_ISOLATED_FILE]
> > $9 = {
> >   counter = 2
> > }
> > 
> > while per_cpu stats had:
> > 
> > crash> p ((struct pglist_data *) 0xffff00817fffe580)->per_cpu_nodestats
> > $85 = (struct per_cpu_nodestat *) 0xffff8000118832e0
> > crash> p/x 0xffff8000118832e0 + __per_cpu_offset[42]
> > $86 = 0xffff00917fcc32e0
> > crash> p ((struct per_cpu_nodestat *) 0xffff00917fcc32e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> > $87 = -1 '\377'
> > 
> > crash> p/x 0xffff8000118832e0 + __per_cpu_offset[44]
> > $89 = 0xffff00917fe032e0
> > crash> p ((struct per_cpu_nodestat *) 0xffff00917fe032e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> > $91 = -1 '\377'
> 
> This doesn't really tell much. How much out of sync they really are
> cumulatively over all cpus?

This is the cumulative value over all CPUs (entries for the other
CPUs have been omitted since their diffs are zero).

> > It seems that processes were trapped in a direct reclaim/compaction loop
> > because these nodes had few free pages, lower than the watermark min.
> > 
> > crash> kmem -z | grep -A 3 Normal
> >   :
> > NODE: 4  ZONE: 1  ADDR: ffff00817fffec40  NAME: "Normal"
> >   SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
> >   VM_STAT:
> >         NR_FREE_PAGES: 68
> > --
> > NODE: 5  ZONE: 1  ADDR: ffff00897fffec40  NAME: "Normal"
> >   SIZE: 118784  MIN/LOW/HIGH: 82/200/318
> >   VM_STAT:
> >         NR_FREE_PAGES: 45
> > --
> > NODE: 6  ZONE: 1  ADDR: ffff00917fffec40  NAME: "Normal"
> >   SIZE: 118784  MIN/LOW/HIGH: 82/200/318
> >   VM_STAT:
> >         NR_FREE_PAGES: 53
> > --
> > NODE: 7  ZONE: 1  ADDR: ffff00997fbbec40  NAME: "Normal"
> >   SIZE: 118784  MIN/LOW/HIGH: 82/200/318
> >   VM_STAT:
> >         NR_FREE_PAGES: 52
> 
> How have you concluded that too_many_isolated is at root of this issue.

Because the customer observed the problem and obtained traces:

"If so, I have to mention another problem caused by the vmstat issue
here. The customer experienced a process hang like the issue reported
here, but in this case the process was trapped in the compaction path.

In shrink_inactive_list(), reclaim_throttle() is called when
too_many_isolated() is true. In fact, as confirmed from the memory
dump, there were no isolated pages, but the zone's vmstat had a count
of 2 isolated pages while the per-CPU vmstats held -2.

too_many = isolated > (inactive + active) / 2;

There were no more inactive or active pages. As a result, the process
was throttled at this point again and again, waiting for parallel
reclaimers to finish that did not actually exist."

> With a very low NR_FREE_PAGES and many contending allocation the system
> could be easily stuck in reclaim. What are other reclaim
> characteristics?

I can ask. What information in particular do you want to know?

> Is the direct reclaim successful?

Processes are stuck in too_many_isolated (unnecessarily). What do you
mean, precisely, when you ask "Is the direct reclaim successful"?