Subject: Re: [PATCH] mm/vmscan: Fix hard LOCKUP in function isolate_lru_folios
From: liuye <liuye@kylinos.cn>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Tue, 3 Sep 2024 10:34:00 +0800
Message-ID: <7211d74d-a002-8758-94f2-25eb58737fe8@kylinos.cn>
In-Reply-To: <20240823020443.7379-1-liuye@kylinos.cn>
References: <20240815025226.8973-1-liuye@kylinos.cn> <20240823020443.7379-1-liuye@kylinos.cn>


On 2024/8/23 10:04 AM, liuye wrote:
> I'm sorry to bother you about this, but it looks like the following email, sent 7 days ago,
> did not receive a response from you. Would you mind having a look at it
> when you have a bit of free time?

>>>> Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
>>> Merged in 2016.

>>> Under what circumstances does it occur?
>> User processes request a large amount of memory and keep the pages active.
>> Then a module continuously requests memory from the ZONE_DMA32 area.
>> Memory reclaim is triggered once the ZONE_DMA32 watermark is reached.
>> However, the pages in the LRU (active_anon) list are mostly from
>> the ZONE_NORMAL area.
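
To illustrate the scenario, here is a toy Python model of one isolation pass. It is not the kernel code. It assumes, as the perf trace in this mail suggests, that pages from a zone above the requested one are skipped without counting toward the 32-page request. The names isolate_toy, ZONE_DMA32 and ZONE_NORMAL and the zone numbering are made up for this sketch.

```python
from collections import deque

# Zone indices are assumptions for this sketch; classzone=1 in the
# trace is taken here to mean ZONE_DMA32.
ZONE_DMA32 = 1
ZONE_NORMAL = 2


def isolate_toy(lru, reclaim_idx, nr_requested):
    """Toy model of one isolation pass over an LRU list (not kernel code).

    lru is a deque of zone indices, one entry per page; the tail is
    scanned first. Returns (nr_scanned, nr_skipped, nr_taken).
    """
    budget_used = 0   # only eligible pages count toward nr_requested
    nr_scanned = 0    # every page looked at, eligible or not
    skipped = []
    taken = []
    while budget_used < nr_requested and lru:
        zone = lru.pop()
        nr_scanned += 1
        if zone > reclaim_idx:
            # The page lives in a zone the allocation cannot use: skip it.
            skipped.append(zone)
            continue
        budget_used += 1
        taken.append(zone)
    # Skipped pages go back on the list, so a later pass meets them again.
    lru.extend(reversed(skipped))
    return nr_scanned, len(skipped), len(taken)


if __name__ == "__main__":
    all_normal = deque([ZONE_NORMAL] * 1_000_000)
    print(isolate_toy(all_normal, ZONE_DMA32, 32))
```

With a list made up only of ZONE_NORMAL pages, the pass walks the whole list, skips every page and takes none, however small the request is.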

>>> Can you please describe how to reproduce this?
>> Terminal 1: continuously increase the number of active (anon) pages.
>> mkdir /tmp/memory
>> mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
>> dd if=/dev/zero of=/tmp/memory/block bs=4M
>> tail /tmp/memory/block

>> Terminal 2:
>> vmstat -a 1
>> active will increase.
>> procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
>>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st gu
>>  1  0      0 1445623076 45898836 83646008    0    0     0     0 1807 1682  0  0 100  0  0  0
>>  1  0      0 1445623076 43450228 86094616    0    0     0     0 1677 1468  0  0 100  0  0  0
>>  1  0      0 1445623076 41003480 88541364    0    0     0     0 1985 2022  0  0 100  0  0  0
>>  1  0      0 1445623076 38557088 90987756    0    0     0     4 1731 1544  0  0 100  0  0  0
>>  1  0      0 1445623076 36109688 93435156    0    0     0     0 1755 1501  0  0 100  0  0  0
>>  1  0      0 1445619552 33663256 95881632    0    0     0     0 2015 1678  0  0 100  0  0  0
>>  1  0      0 1445619804 31217140 98327792    0    0     0     0 2058 2212  0  0 100  0  0  0
>>  1  0      0 1445619804 28769988 100774944    0    0     0     0 1729 1585  0  0 100  0  0  0
>>  1  0      0 1445619804 26322348 103222584    0    0     0     0 1774 1575  0  0 100  0  0  0
>>  1  0      0 1445619804 23875592 105669340    0    0     0     4 1738 1604  0  0 100  0  0  0
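
As a rough cross-check of the vmstat sample, the following Python sketch computes how fast the active column grows. The values are copied from the ten rows above. It assumes vmstat's default unit of KiB and a one-second interval; the variable names are only for illustration.

```python
# "active" column copied from the vmstat -a 1 output above
# (assumed to be in KiB, one sample per second).
active = [83646008, 86094616, 88541364, 90987756, 93435156,
          95881632, 98327792, 100774944, 103222584, 105669340]

# Growth between consecutive samples.
active_deltas = [later - earlier for earlier, later in zip(active, active[1:])]

mean_growth_kib = sum(active_deltas) / len(active_deltas)
print(f"active grows by about {mean_growth_kib / 1024 / 1024:.2f} GiB per second")
```

At that rate the dd in terminal 1 needs several minutes to push Active(anon) toward the 1000G mentioned later in this mail.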

>> cat /proc/meminfo | head
>> Active(anon) increases.
>> MemTotal:       1579941036 kB
>> MemFree:        1445618500 kB
>> MemAvailable:   1453013224 kB
>> Buffers:            6516 kB
>> Cached:         128653956 kB
>> SwapCached:            0 kB
>> Active:         118110812 kB
>> Inactive:       11436620 kB
>> Active(anon):   115345744 kB
>> Inactive(anon):   945292 kB

>> When Active(anon) is 115345744 kB, loading a module with insmod triggers the ZONE_DMA32 watermark.
>>
>> perf shows nr_scanned=28835844.
>> 28835844 * 4 KB = 115343376 KB, approximately equal to 115345744 kB.
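
The comparison above can be redone mechanically. This Python sketch assumes a 4 KB page size, as the calculation in this mail does, and pulls Active(anon) out of an excerpt of the /proc/meminfo output. The helper name meminfo_kb is invented for the example.

```python
import re

PAGE_KB = 4  # assumed page size, as in the calculation above

# Excerpt of the /proc/meminfo output quoted in this mail.
MEMINFO = """\
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB
"""


def meminfo_kb(text, field):
    """Return the value of one /proc/meminfo field in kB."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+) kB", text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


nr_scanned = 28835844                 # reported by perf
scanned_kb = nr_scanned * PAGE_KB
active_anon_kb = meminfo_kb(MEMINFO, "Active(anon)")
gap_kb = active_anon_kb - scanned_kb

print(scanned_kb, active_anon_kb, gap_kb)
```

The gap between the two figures is a few thousand kB out of more than 115 million, so the scan covered essentially the whole active_anon list.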

>> perf record -e vmscan:mm_vmscan_lru_isolate -aR
>> perf script
>> isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon
>> isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon
>> isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
>> isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
>> isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon
>> isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon
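
To spot such passes in a longer trace, something like the following Python sketch could be used. It parses the key=value fields of the mm_vmscan_lru_isolate lines shown above and flags events where everything scanned was skipped and nothing was taken. The threshold of 1,000,000 and the function names are arbitrary choices for illustration, not part of perf.

```python
import re

# The six trace lines quoted above.
TRACE = """\
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon
"""


def parse_event(line):
    """Turn one trace line into a dict, converting numeric fields to int."""
    fields = dict(re.findall(r"(\w+)=(\w+)", line))
    return {key: int(val) if val.isdigit() else val for key, val in fields.items()}


def wasted_scans(events, threshold=1_000_000):
    """Events that scanned at least `threshold` pages and skipped all of them."""
    return [
        ev for ev in events
        if ev["nr_taken"] == 0
        and ev["nr_skipped"] == ev["nr_scanned"]
        and ev["nr_scanned"] >= threshold
    ]


events = [parse_event(line) for line in TRACE.splitlines()]
for ev in wasted_scans(events):
    print(ev["order"], ev["nr_requested"], ev["nr_scanned"])
```

On the sample above this reports the two events with nr_scanned=28835844, each of which asked for only 32 pages.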

>> If Active(anon) is increased to 1000G and insmod then triggers the ZONE_DMA32 watermark, a hard lockup occurs.
>>
>> On my device, nr_scanned = 0000000003e3e937 at the time of the hard lockup. Converted to memory size: 0x0000000003e3e937 * 4 KB = 261072092 KB.

>> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
>>     ffffc90006fb7c30: 0000000000000020 0000000000000000
>>     ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
>>     ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
>>     ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
>>     ffffc90006fb7c70: 0000000000000000 0000000000000000
>>     ffffc90006fb7c80: 0000000000000000 0000000000000000
>>     ffffc90006fb7c90: 0000000000000000 0000000000000000
>>     ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
>>     ffffc90006fb7cb0: 0000000000000000 0000000000000000
>>     ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000
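
For anyone checking the numbers, this Python sketch locates the nr_scanned value in the frame dump above and converts it to a size. It assumes each dump line holds two 8-byte words starting at the printed address, and a 4 KB page size. find_word is a made-up helper, not a crash command.

```python
PAGE_KB = 4  # assumed page size

# Stack words of the isolate_lru_folios frame, copied from the dump above.
FRAME = """\
ffffc90006fb7c30: 0000000000000020 0000000000000000
ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000
ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8
ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48
ffffc90006fb7c70: 0000000000000000 0000000000000000
ffffc90006fb7c80: 0000000000000000 0000000000000000
ffffc90006fb7c90: 0000000000000000 0000000000000000
ffffc90006fb7ca0: 0000000000000000 0000000003e3e937
ffffc90006fb7cb0: 0000000000000000 0000000000000000
ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000
"""


def find_word(dump, value):
    """Return the address of the first 8-byte word equal to value, or None."""
    for line in dump.splitlines():
        addr_text, words = line.split(":")
        base = int(addr_text, 16)
        for index, word in enumerate(words.split()):
            if int(word, 16) == value:
                return base + 8 * index
    return None


nr_scanned = 0x3e3e937
addr = find_word(FRAME, nr_scanned)
size_kb = nr_scanned * PAGE_KB

print(hex(addr), nr_scanned, size_kb, f"{size_kb / 1024 / 1024:.2f} GiB")
```

The result agrees with the 261072092 KB figure given above.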

>>> Why do you think it took eight years to be discovered?
>> The problem requires the following conditions to occur:
>> 1. The device memory must be large enough.
>> 2. Pages in the LRU (active_anon) list are mostly from the ZONE_NORMAL area.
>> 3. The memory in ZONE_DMA32 needs to reach the watermark.
>>
>> If the memory is not large enough, or if the way ZONE_DMA32 memory is used is reasonable, this problem is difficult to detect.

>> Notes:
>> The problem is most likely to occur with ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger it.

>>> It looks like that will fix, but perhaps something more fundamental
>>> needs to be done - we're doing a tremendous amount of pretty pointless
>>> work here.  Answers to my above questions will help us resolve this.
>>>
>>> Thanks.
>> Please refer to the above explanation for details.
>>
>> Thanks.
> Thanks.

Friendly ping.

