From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 14 Nov 2023 13:46:41 +0100
From: Michal Hocko <mhocko@suse.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch 0/2] mm: too_many_isolated can stall due to out of sync
 VM counters
Message-ID: <ZVNsMVPJ5y8C_hBC@tiehlicka>
References: <20231113233420.446465795@redhat.com>
 <ZVMtuYLviLYqAI7x@tiehlicka>
 <ZVNnjVdeNblG1l8t@tpad>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ZVNnjVdeNblG1l8t@tpad>
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Tue 14-11-23 09:26:53, Marcelo Tosatti wrote:
> Hi Michal,
> 
> On Tue, Nov 14, 2023 at 09:20:09AM +0100, Michal Hocko wrote:
> > On Mon 13-11-23 20:34:20, Marcelo Tosatti wrote:
> > > A customer reported seeing processes hung at too_many_isolated,
> > > while analysis indicated that the problem occurred due to out
> > > of sync per-CPU stats (see below).
> > > 
> > > Fix is to use node_page_state_snapshot to avoid the stale values.
> > > 
> > >     2136 static unsigned long
> > >     2137 shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> > >     2138                      struct scan_control *sc, enum lru_list lru)
> > >     2139 {
> > >     :
> > >     2145         bool file = is_file_lru(lru);
> > >     :
> > >     2147         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> > >     :
> > >     2150         while (unlikely(too_many_isolated(pgdat, file, sc))) {
> > >     2151                 if (stalled)
> > >     2152                         return 0;
> > >     2153
> > >     2154                 /* wait a bit for the reclaimer. */
> > >     2155                 msleep(100);   <--- some processes were sleeping here, with pending SIGKILL.
> > >     2156                 stalled = true;
> > >     2157
> > >     2158                 /* We are about to die and free our memory. Return now. */
> > >     2159                 if (fatal_signal_pending(current))
> > >     2160                         return SWAP_CLUSTER_MAX;
> > >     2161         }
> > > 
> > > msleep() must be called only when there are too many isolated pages:
> > 
> > What do you mean here?
> 
> That msleep() must not be called when
> 
> isolated > inactive
> 
> is false.

Well, but the code is structured in a way that this is simply true.
too_many_isolated might return a false positive because it is a very
loose interface and the number of isolated pages can fluctuate depending
on the number of direct reclaimers.
 
> > >     2019 static int too_many_isolated(struct pglist_data *pgdat, int file,
> > >     2020                 struct scan_control *sc)
> > >     2021 {
> > >     :
> > >     2030         if (file) {
> > >     2031                 inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
> > >     2032                 isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
> > >     2033         } else {
> > >     :
> > >     2046         return isolated > inactive;
> > > 
> > > The return value was true since:
> > > 
> > >     crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_INACTIVE_FILE]
> > >     $8 = {
> > >       counter = 1
> > >     }
> > >     crash> p ((struct pglist_data *) 0xffff00817fffe580)->vm_stat[NR_ISOLATED_FILE]
> > >     $9 = {
> > >       counter = 2
> > > 
> > > while per_cpu stats had:
> > > 
> > >     crash> p ((struct pglist_data *) 0xffff00817fffe580)->per_cpu_nodestats
> > >     $85 = (struct per_cpu_nodestat *) 0xffff8000118832e0
> > >     crash> p/x 0xffff8000118832e0 + __per_cpu_offset[42]
> > >     $86 = 0xffff00917fcc32e0
> > >     crash> p ((struct per_cpu_nodestat *) 0xffff00917fcc32e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> > >     $87 = -1 '\377'
> > > 
> > >     crash> p/x 0xffff8000118832e0 + __per_cpu_offset[44]
> > >     $89 = 0xffff00917fe032e0
> > >     crash> p ((struct per_cpu_nodestat *) 0xffff00917fe032e0)->vm_node_stat_diff[NR_ISOLATED_FILE]
> > >     $91 = -1 '\377'
> > 
> > This doesn't really tell much. How much out of sync they really are
> > cumulatively over all cpus?
> 
> This is the cumulative value over all CPUs (offsets for other CPUs 
> have been omitted since they are zero).

OK, so that means the NR_ISOLATED_FILE is 0 while NR_INACTIVE_FILE is 1,
correct? If that is the case then the value is indeed outdated but it
also means that the NR_INACTIVE_FILE is so small that all but 1 (resp. 2
as kswapd is never throttled) reclaimers will be stalled anyway. So does
the exact snapshot really help? Do you have any means to reproduce this
behavior and see that the patch actually changed the behavior?
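
To make the arithmetic concrete, here is a minimal userspace sketch (not
kernel code) of the difference between reading only the global counter and
reading the global counter plus the pending per-CPU deltas, using the
numbers from the crash output above. All model_* names and MODEL_NR_CPUS are
made up for illustration, and only the plain "isolated > inactive"
comparison quoted above is modeled; the parts of too_many_isolated elided in
the quote are left out.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 64	/* hypothetical CPU count for this model */

/*
 * Userspace model of one node counter: a global value plus per-CPU
 * deltas that have not been folded into the global value yet.
 */
struct model_node_counter {
	long global;
	signed char cpu_diff[MODEL_NR_CPUS];
};

/* Cheap read: global value only, pending per-CPU deltas are ignored. */
unsigned long model_read_stale(const struct model_node_counter *c)
{
	return c->global < 0 ? 0 : (unsigned long)c->global;
}

/* Snapshot read: global value plus every pending per-CPU delta. */
unsigned long model_read_snapshot(const struct model_node_counter *c)
{
	long x = c->global;
	size_t cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		x += c->cpu_diff[cpu];

	return x < 0 ? 0 : (unsigned long)x;
}

/* The quoted comparison, with the read function passed in by the caller. */
int model_too_many_isolated(const struct model_node_counter *inactive,
			    const struct model_node_counter *isolated,
			    unsigned long (*read)(const struct model_node_counter *))
{
	return read(isolated) > read(inactive);
}
```

With inactive.global = 1, isolated.global = 2 and a delta of -1 pending on
CPUs 42 and 44, the stale read compares 2 > 1 and throttles, while the
snapshot read compares 0 > 1 and does not.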

[...]

> > With a very low NR_FREE_PAGES and many contending allocation the system
> > could be easily stuck in reclaim. What are other reclaim
> > characteristics? 
> 
> I can ask. What information in particular do you want to know?

When I am dealing with issues like this I heavily rely on /proc/vmstat
counters, the pgscan and pgsteal ones in particular, to see whether there
is any progress over time.
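
As a rough illustration of what I mean by watching progress over time, the
sketch below compares two samples of /proc/vmstat and reports how much a
given counter moved in between. The helper names (vmstat_read,
vmstat_lookup, vmstat_delta) are made up for this example, and it assumes
the usual one "name value" pair per line layout of that file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read all of /proc/vmstat into buf and NUL terminate it. */
int vmstat_read(char *buf, size_t size)
{
	FILE *f = fopen("/proc/vmstat", "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/*
 * Find an exact "name value" line in a vmstat style buffer.
 * Returns 0 and stores the value, or -1 if the name is not present.
 */
int vmstat_lookup(const char *buf, const char *name, unsigned long long *val)
{
	size_t len = strlen(name);
	const char *p = buf;

	while (p && *p) {
		if (!strncmp(p, name, len) && p[len] == ' ') {
			*val = strtoull(p + len + 1, NULL, 10);
			return 0;
		}
		/* Skip to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/* How far one counter moved between two samples of the file. */
int vmstat_delta(const char *before, const char *after, const char *name,
		 unsigned long long *delta)
{
	unsigned long long a, b;

	if (vmstat_lookup(before, name, &a) || vmstat_lookup(after, name, &b))
		return -1;
	*delta = b - a;
	return 0;
}
```

Take one sample with vmstat_read(), sleep for a few seconds, take another
and call vmstat_delta() for e.g. pgscan_direct and pgsteal_direct. Roughly
speaking, pgscan growing while pgsteal stays flat suggests reclaimers are
scanning without freeing anything, while both staying flat suggests nobody
is making any progress at all.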

> > Is the direct reclaim successful? 
> 
> Processes are stuck in too_many_isolated (unnecessarily). What do you mean when you ask
> "Is the direct reclaim successful", precisely?

With such a small LRU list it is quite likely that many processes will
be competing over the last pages on the list while the rest will be
throttled because there is nothing to reclaim. It is quite possible that
all reclaimers will be waiting for a single reclaimer (either kswapd or
another direct reclaimer). I would like to understand whether the system
is stuck in an unproductive state where everybody just waits until the
counter is synced, or whether everything just progresses very slowly
because of the small LRU.
-- 
Michal Hocko
SUSE Labs