Message-ID: <7cfe60a2-cd8e-4af2-a5d9-6c790b6dd665@linux.alibaba.com>
Date: Wed, 12 Jun 2024 09:40:51 +0800
From: Baolin Wang
Subject: Re: LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
To: David Hildenbrand, Naresh Kamboju, open list, linux-mm, lkft-triage@lists.linaro.org, Linux Regressions
Cc: Andrew Morton, willy@infradead.org, Kefeng Wang, Barry Song, Ryan Roberts, "Russell King (Oracle)", Dan Carpenter, Arnd Bergmann, Anders Roxell
In-Reply-To: <3ce90ee2-f51b-49fc-b010-f97e61e617f4@redhat.com>
References: <80a05784-21dd-4f20-b441-1e2a2be0e0ff@linux.alibaba.com> <3ce90ee2-f51b-49fc-b010-f97e61e617f4@redhat.com>

On 2024/6/11 20:09, David Hildenbrand wrote:
> On 11.06.24 13:39, Baolin Wang wrote:
>>
>>
>> On 2024/6/11 18:32, David Hildenbrand wrote:
>>> On 11.06.24 12:14, Naresh Kamboju wrote:
>>>> The kernel panic was noticed while running LTP syscalls fork13 (long running) on
>>>> the mainline master 6.10.rc3 kernel on arm64 rk3399-rock-pi-4 device.
>>>>
>>>> Please find detailed logs in the links,
>>>>
>>>> As you know fork13 is a stress test case trying to generate a maximum number
>>>> of PID's in a 100,000 loop.
>>>>
>>>> This device is running via NFS mounted filesystem.
>>>>
>>>> I have tried to reproduce this problem in a loop but failed to reproduce the
>>>> crash.
>>>>
>>>>
>>>> Crash flow:
>>>> ------
>>>> fork13 run started
>>>> BUG: Bad page map in process fork13
>>>> BUG: Bad rss-counter state mm:
>>>> Unable to handle kernel paging request at virtual address
>>>> Internal error: Oops: 0000000096000046
>>>> run for 800 secs ( 13 minutes) and more.
>>>> fork14 run started and completed
>>>>
>>>> fpathconf01 run started and completed
>>>> sugov:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>
>>>> Insufficient stack space to handle exception!
>>>> end Kernel panic - not syncing: kernel stack overflow
>>>>
>>>> I have tried to decode stack dump by not being useful [1].
>>>> [1] https://people.linaro.org/~naresh.kamboju/output-rk3399.txt
>>>>
>>>> Test log :
>>>> --------
>>>> tst_test.c:1733: TINFO: LTP version: 20240524
>>>> tst_test.c:1617: TINFO: Timeout per run is 0h 15m 00s
>>>> [  904.280569] BUG: Bad page map in process fork13  pte:2000000019ffc3 pmd:80000000df55003
>>>> [  904.281397] page: refcount:1 mapcount:-1 mapping:0000000000000000 index:0x0 pfn:0x19f
>>>
>>> Mapcount underflow on a small folio (head: not printed).
>>>
>>> [...]
>>>
>>>> [  904.294564] BUG: Bad page map in process fork13  pte:200000002e4fc3 pmd:80000000df55003
>>>> [  904.295275] page: refcount:2 mapcount:-1 mapping:000000007885152f index:0x6 pfn:0x2e4
>>>
>>> Another mapcount underflow on a small folio (head: not printed).
>>>
>>>
>>>> [  904.309309] BUG: Bad page map in process fork13  pte:20000000cc6fc3 pmd:80000000df55003
>>>> [  904.310031] page: refcount:1 mapcount:-1 mapping:0000000000000000 index:0x6 pfn:0xcc6
>>>> [  904.310728] head: order:3 mapcount:-1 entire_mapcount:0 nr_pages_mapped:8388607 pincount:0
>>>
>>> Mapcount underflow on a large folio.
>>>
>>> ...
>>>
>>>> [  904.326666] BUG: Bad page map in process fork13  pte:20000000268fc3 pmd:80000000df55003
>>>> [  904.327390] page: refcount:1 mapcount:-1 mapping:00000000f0624181 index:0x1b pfn:0x268
>>>
>>> Another mapcount underflow on a small folio (head: not printed).
>>>
>>>> [  904.328094] memcg:ffff0000016b4000
>>>> [  904.328401] aops:nfs_file_aops ino:8526e6 dentry name:"libgpg-error.so.0.36.0"
>>>> [  904.329051] flags: 0x3fffe000000002c(referenced|uptodate|lru|node=0|zone=0|lastcpupid=0x1ffff)
>>>> [  904.329878] raw: 03fffe000000002c fffffdffc0009a48 fffffdffc022f3c8 ffff00000688bd60
>>>> [  904.330561] raw: 000000000000001b 0000000000000000 00000001fffffffe ffff0000016b4000
>>>> [  904.331240] page dumped because: bad pte
>>>> [  904.331590] addr:0000aaaad9afe000 vm_flags:00000075 anon_vma:0000000000000000 mapping:ffff0000300d4188 index:2e
>>>> [  904.332476] file:fork13 fault:filemap_fault mmap:nfs_file_mmap read_folio:nfs_read_folio
>>>> [  904.333245] CPU: 5 PID: 22685 Comm: fork13 Tainted: G    B
>>>
>>>
>>> Are these maybe side-effects due to
>>>
>>> https://lkml.kernel.org/r/20240607103241.1298388-1-wangkefeng.wang@huawei.com
>>
>> IIUC, the rk3399-rock-pi-4b device has no NUMA nodes (6 arm64 cores), so
>> I don't think the numa balancing will cause this issue.
>>
>> Anyway, I will run fork13 test case on my arm64 server to try.
>
> I have the faint recollection that we can (or at least could in the
> past) end up in do_numa_page() also without NUMA hinting.
>
> I documented in 51d3d5eb74ff53b92dcff48b30ae2ed8edd85a32:
>
> "
> Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
> without NUMA hinting (which currently doesn't seem to be applicable to
> shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.
> "

I see. Thanks for the explanation.
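
For triaging logs like the one quoted in this thread, here is a minimal sketch that scans dmesg output for "page:" dump lines whose mapcount is negative, which is the underflow pattern pointed out above. It is not part of the kernel or LTP. The function name find_mapcount_underflows and the regular expression are illustrative assumptions, and the sketch assumes raw dmesg text in which each "page:" line is unwrapped (not re-wrapped by a mail client as in the quotes).

```python
import re

# Assumed line shape, taken from the report in this thread, e.g.:
#   [  904.281397] page: refcount:1 mapcount:-1 mapping:... index:0x0 pfn:0x19f
# The pattern is an illustration, not a stable kernel log format guarantee.
PAGE_RE = re.compile(
    r"page: refcount:(?P<refcount>-?\d+) mapcount:(?P<mapcount>-?\d+)"
    r".*?pfn:(?P<pfn>0x[0-9a-fA-F]+)"
)


def find_mapcount_underflows(log_text):
    """Return a list of (pfn, refcount, mapcount) for page lines with mapcount < 0."""
    hits = []
    for match in PAGE_RE.finditer(log_text):
        mapcount = int(match.group("mapcount"))
        if mapcount < 0:
            pfn = int(match.group("pfn"), 16)
            refcount = int(match.group("refcount"))
            hits.append((pfn, refcount, mapcount))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python3 this_script.py
    for pfn, refcount, mapcount in find_mapcount_underflows(sys.stdin.read()):
        print(f"pfn {pfn:#x}: refcount={refcount} mapcount={mapcount}")
```

Because "." does not match a newline here, a "page:" line that lacks a pfn field is skipped rather than paired with the pfn of a later line.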