From: Guenter Roeck
To: David Hildenbrand, Andrew Morton, syzbot, Dmitry Vyukov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 syzkaller-bugs@googlegroups.com, Linus Torvalds, "Eric W. Biederman"
Subject: Re: [syzbot] BUG: soft lockup in handle_mm_fault (2)
Date: Mon, 6 Sep 2021 14:11:59 -0700
Message-ID: <8574c83d-9623-7c7e-9213-322d6b0064ca@roeck-us.net>
In-Reply-To: <6d7ea31d-66af-84e5-1db0-9cbbb634f649@redhat.com>
References: <00000000000063692b05cb493f6d@google.com>
 <20210906103309.f4152941a9a00a27f62dbc2b@linux-foundation.org>
 <59af1f1c-ae77-cfec-8d8c-32368f8ffdb6@redhat.com>
 <6d7ea31d-66af-84e5-1db0-9cbbb634f649@redhat.com>

On 9/6/21 12:20 PM, David Hildenbrand wrote:
> On 06.09.21 20:53, Guenter Roeck wrote:
>> On 9/6/21 10:46 AM, David Hildenbrand wrote:
>>> On 06.09.21 19:33, Andrew Morton wrote:
>>>> (cc's added)
>>>>
>>>> On Sun, 05 Sep 2021 18:05:40 -0700 syzbot wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    49624efa65ac Merge tag 'denywrite-for-5.15' of git://githu..
>>>>> git tree:       upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12eff4b3300000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c598149362d97396
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=aa7a876b8108f1622bc3
>>>>> compiler:       aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
>>>>> userspace arch: arm64
>>>>>
>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>> Reported-by: syzbot+aa7a876b8108f1622bc3@syzkaller.appspotmail.com
>>>>>
>>>>> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [syz-executor.1:26449]
>>>>> Modules linked in:
>>>>> irq event stamp: 248
>>>>> hardirqs last  enabled at (247): [] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:81 [inline]
>>>>> hardirqs last  enabled at (247): [] exit_to_kernel_mode+0x38/0x230 arch/arm64/kernel/entry-common.c:91
>>>>> hardirqs last disabled at (248): [] enter_el1_irq_or_nmi+0x10/0x20 arch/arm64/kernel/entry-common.c:227
>>>>> softirqs last  enabled at (182): [] _stext+0x964/0xff8
>>>>> softirqs last disabled at (41): [] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
>>>>> softirqs last disabled at (41): [] invoke_softirq kernel/softirq.c:439 [inline]
>>>>> softirqs last disabled at (41): [] __irq_exit_rcu+0x208/0x4f0 kernel/softirq.c:636
>>>>> CPU: 0 PID: 26449 Comm: syz-executor.1 Not tainted 5.14.0-syzkaller-09416-g49624efa65ac #0
>>>>> Hardware name: linux,dummy-virt (DT)
>>>>> pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> pc : clear_page+0x14/0x28 arch/arm64/lib/clear_page.S:23
>>>>> lr : clear_highpage include/linux/highmem.h:181 [inline]
>>>>> lr : kernel_init_free_pages.part.0+0x6c/0x17c mm/page_alloc.c:1286
>>>>> sp : ffff800019be75e0
>>>>> x29: ffff800019be75e0 x28: 0000000000000000 x27: 0000000000000000
>>>>> x26: ffff000009d64940 x25: ffff6000013ac928 x24: 00000000000014c0
>>>>> x23: ffff000009d63480 x22: fffffc0000173340 x21: ffff800015794a78
>>>>> x20: dfff800000000000 x19: fffffc0000173300 x18: 0000000000000000
>>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>> x14: 1ffff0000337ce86 x13: 0000000000000013 x12: ffff7f800002e667
>>>>> x11: 1fffff800002e666 x10: ffff7f800002e666 x9 : 0000000000000000
>>>>> x8 : ffff600000b99a00 x7 : 0000000000000000 x6 : 000000000000003f
>>>>> x5 : 0000000000000040 x4 : 1ffff00003060d98 x3 : 1fffe000013ac691
>>>>> x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff000005ccc880
>>>>> Call trace:
>>>>>   clear_page+0x14/0x28 arch/arm64/lib/clear_page.S:21
>>>>>   kernel_init_free_pages mm/page_alloc.c:1283 [inline]
>>>>>   post_alloc_hook+0x1ac/0x25c mm/page_alloc.c:2426
>>>>>   prep_new_page mm/page_alloc.c:2436 [inline]
>>>>>   get_page_from_freelist+0x184c/0x2320 mm/page_alloc.c:4168
>>>>>   __alloc_pages+0x1a8/0x21d0 mm/page_alloc.c:5390
>>>>>   alloc_pages_vma+0xbc/0x530 mm/mempolicy.c:2252
>>>>>   alloc_zeroed_user_highpage_movable+0x9c/0xd0 arch/arm64/mm/fault.c:926
>>>>>   do_anonymous_page mm/memory.c:3767 [inline]
>>>>>   handle_pte_fault mm/memory.c:4556 [inline]
>>>>>   __handle_mm_fault+0xbc4/0x2210 mm/memory.c:4693
>>>>>   handle_mm_fault+0x1dc/0x4f0 mm/memory.c:4791
>>>>>   __do_page_fault arch/arm64/mm/fault.c:499 [inline]
>>>>>   do_page_fault+0x230/0x8c0 arch/arm64/mm/fault.c:599
>>>>>   do_translation_fault+0x1a4/0x210 arch/arm64/mm/fault.c:680
>>>>>   do_mem_abort+0x64/0x1c0 arch/arm64/mm/fault.c:813
>>>>>   el0_da+0x7c/0x2b0 arch/arm64/kernel/entry-common.c:481
>>>>>   el0t_64_sync_handler+0x168/0x1b0 arch/arm64/kernel/entry-common.c:616
>>>>>   el0t_64_sync+0x1a0/0x1a4 arch/arm64/kernel/entry.S:572
>>>
>>> At first sight, looks unrelated. Being stuck in clear_page() is weird; we're running inside a VM ("dummy-virt"), whereby such stuck tasks in the guests are sometimes the result of the hypervisor being stuck (e.g., heavily overcommitted).
>>>
>> Unrelated to your series, yes, because it was first reported after commit ebf435d3b51b
>> ("Merge tag 'staging-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging")
>> which predates your series.
>>
>>> If we don't get a reproducer, that's most probably the root cause. Let's see.
>>>
>>
>> That seems unlikely. The problem was seen 8 times by now, starting September 2.
>
> .. always in a similar setup? (even the same hypervisor involved?)
>
> I've seen these exact symptoms
>
> a) when the hypervisor was heavily overcommitting
> b) the hypervisor was using uffd (e.g., for postcopy live migration) and not properly resolving faults in user space for the VM process
>
> It would happen when the VM would first access some yet unpopulated page in the hypervisor.
>
> But obviously, could be something else, especially once we spot it on real HW. But it smells like the VM is slow.
>
>
> I can spot: https://groups.google.com/g/syzkaller-bugs/c/l6RsKu3FhT0/m/7we3AMNxAAAJ
>
> "This is also due to arm64 removal of CMDLINE support.
> syzbot sets watchdog_thresh=165, but this fired after 22s. "
>
> and there, it was also "dummy-virt" ... so maybe really a slow/overloaded hypervisor.
>

Ah yes, obviously that error is not new. I stand corrected.

Thanks,
Guenter
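The quoted note that the detector fired after about 22 seconds, although syzbot sets watchdog_thresh=165, can be checked with simple arithmetic. A minimal sketch, assuming the mainline rule in kernel/watchdog.c that a soft lockup is reported once a CPU has gone more than 2 * watchdog_thresh seconds without scheduling, and that the default watchdog_thresh is 10; the helper names below are hypothetical:

```python
# Hypothetical helpers mirroring the assumed kernel rule: the soft-lockup
# detector fires once a CPU has gone more than 2 * watchdog_thresh seconds
# without scheduling.
DEFAULT_WATCHDOG_THRESH = 10  # assumed kernel default for kernel.watchdog_thresh, in seconds


def softlockup_thresh(watchdog_thresh: int = DEFAULT_WATCHDOG_THRESH) -> int:
    """Seconds a CPU may appear stuck before a soft lockup is reported."""
    return 2 * watchdog_thresh


def would_fire(stuck_seconds: float, watchdog_thresh: int = DEFAULT_WATCHDOG_THRESH) -> bool:
    """True if a stall of stuck_seconds exceeds the soft-lockup threshold."""
    return stuck_seconds > softlockup_thresh(watchdog_thresh)


if __name__ == "__main__":
    # Compare the value syzbot asks for with the default that applies
    # when the watchdog_thresh= boot parameter is not honored.
    for thresh in (165, DEFAULT_WATCHDOG_THRESH):
        print(f"watchdog_thresh={thresh}: threshold {softlockup_thresh(thresh)}s, "
              f"23s stall fires: {would_fire(23, thresh)}")
```

Under these assumptions, a 22 to 23 second stall trips the detector only at the 20 second default threshold, not at the 330 seconds that watchdog_thresh=165 would give. That is consistent with the quoted explanation that the syzbot setting was not applied on arm64.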