To: Guenter Roeck, Andrew Morton, syzbot, Dmitry Vyukov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 syzkaller-bugs@googlegroups.com, Linus Torvalds, "Eric W. Biederman"
References: <00000000000063692b05cb493f6d@google.com>
 <20210906103309.f4152941a9a00a27f62dbc2b@linux-foundation.org>
 <59af1f1c-ae77-cfec-8d8c-32368f8ffdb6@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [syzbot] BUG: soft lockup in handle_mm_fault (2)
Message-ID: <6d7ea31d-66af-84e5-1db0-9cbbb634f649@redhat.com>
Date: Mon, 6 Sep 2021 21:20:09 +0200

On 06.09.21 20:53, Guenter Roeck wrote:
> On 9/6/21 10:46 AM, David Hildenbrand wrote:
>> On 06.09.21 19:33, Andrew Morton wrote:
>>> (cc's added)
>>>
>>> On Sun, 05 Sep 2021 18:05:40 -0700 syzbot wrote:
>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    49624efa65ac Merge tag 'denywrite-for-5.15' of git://githu..
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12eff4b3300000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c598149362d97396
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=aa7a876b8108f1622bc3
>>>> compiler:       aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
>>>> userspace arch: arm64
>>>>
>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+aa7a876b8108f1622bc3@syzkaller.appspotmail.com
>>>>
>>>> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [syz-executor.1:26449]
>>>> Modules linked in:
>>>> irq event stamp: 248
>>>> hardirqs last  enabled at (247): [] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:81 [inline]
>>>> hardirqs last  enabled at (247): [] exit_to_kernel_mode+0x38/0x230 arch/arm64/kernel/entry-common.c:91
>>>> hardirqs last disabled at (248): [] enter_el1_irq_or_nmi+0x10/0x20 arch/arm64/kernel/entry-common.c:227
>>>> softirqs last  enabled at (182): [] _stext+0x964/0xff8
>>>> softirqs last disabled at (41): [] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
>>>> softirqs last disabled at (41): [] invoke_softirq kernel/softirq.c:439 [inline]
>>>> softirqs last disabled at (41): [] __irq_exit_rcu+0x208/0x4f0 kernel/softirq.c:636
>>>> CPU: 0 PID: 26449 Comm: syz-executor.1 Not tainted 5.14.0-syzkaller-09416-g49624efa65ac #0
>>>> Hardware name: linux,dummy-virt (DT)
>>>> pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : clear_page+0x14/0x28 arch/arm64/lib/clear_page.S:23
>>>> lr : clear_highpage include/linux/highmem.h:181 [inline]
>>>> lr : kernel_init_free_pages.part.0+0x6c/0x17c mm/page_alloc.c:1286
>>>> sp : ffff800019be75e0
>>>> x29: ffff800019be75e0 x28: 0000000000000000 x27: 0000000000000000
>>>> x26: ffff000009d64940 x25: ffff6000013ac928 x24: 00000000000014c0
>>>> x23: ffff000009d63480 x22: fffffc0000173340 x21: ffff800015794a78
>>>> x20: dfff800000000000 x19: fffffc0000173300 x18: 0000000000000000
>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>> x14: 1ffff0000337ce86 x13: 0000000000000013 x12: ffff7f800002e667
>>>> x11: 1fffff800002e666 x10: ffff7f800002e666 x9 : 0000000000000000
>>>> x8 : ffff600000b99a00 x7 : 0000000000000000 x6 : 000000000000003f
>>>> x5 : 0000000000000040 x4 : 1ffff00003060d98 x3 : 1fffe000013ac691
>>>> x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff000005ccc880
>>>> Call trace:
>>>>   clear_page+0x14/0x28 arch/arm64/lib/clear_page.S:21
>>>>   kernel_init_free_pages mm/page_alloc.c:1283 [inline]
>>>>   post_alloc_hook+0x1ac/0x25c mm/page_alloc.c:2426
>>>>   prep_new_page mm/page_alloc.c:2436 [inline]
>>>>   get_page_from_freelist+0x184c/0x2320 mm/page_alloc.c:4168
>>>>   __alloc_pages+0x1a8/0x21d0 mm/page_alloc.c:5390
>>>>   alloc_pages_vma+0xbc/0x530 mm/mempolicy.c:2252
>>>>   alloc_zeroed_user_highpage_movable+0x9c/0xd0 arch/arm64/mm/fault.c:926
>>>>   do_anonymous_page mm/memory.c:3767 [inline]
>>>>   handle_pte_fault mm/memory.c:4556 [inline]
>>>>   __handle_mm_fault+0xbc4/0x2210 mm/memory.c:4693
>>>>   handle_mm_fault+0x1dc/0x4f0 mm/memory.c:4791
>>>>   __do_page_fault arch/arm64/mm/fault.c:499 [inline]
>>>>   do_page_fault+0x230/0x8c0 arch/arm64/mm/fault.c:599
>>>>   do_translation_fault+0x1a4/0x210 arch/arm64/mm/fault.c:680
>>>>   do_mem_abort+0x64/0x1c0 arch/arm64/mm/fault.c:813
>>>>   el0_da+0x7c/0x2b0 arch/arm64/kernel/entry-common.c:481
>>>>   el0t_64_sync_handler+0x168/0x1b0 arch/arm64/kernel/entry-common.c:616
>>>>   el0t_64_sync+0x1a0/0x1a4 arch/arm64/kernel/entry.S:572
>>
>> At first sight, looks unrelated. Being stuck in clear_page() is weird; we're running inside a VM ("dummy-virt"), whereby such stuck tasks in the guests are sometimes the result of the hypervisor being stuck (e.g., heavily overcommitted).
>>
> Unrelated to your series, yes, because it was first reported after commit ebf435d3b51b
> ("Merge tag 'staging-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging")
> which predates your series.
>
>> If we don't get a reproducer, that's most probably the root cause. Let's see.
>>
>
> That seems unlikely. The problem was seen 8 times by now, starting September 2.

... always in a similar setup? (Even the same hypervisor involved?)

I've seen these exact symptoms when

a) the hypervisor was heavily overcommitting
b) the hypervisor was using uffd (e.g., for postcopy live migration) and
   not properly resolving faults in user space for the VM process

It would happen when the VM first accessed some not-yet-populated page
in the hypervisor.

But obviously, it could be something else, especially once we spot it on
real HW. But it smells like the VM is slow.

I can spot:
https://groups.google.com/g/syzkaller-bugs/c/l6RsKu3FhT0/m/7we3AMNxAAAJ

"This is also due to arm64 removal of CMDLINE support. syzbot sets
watchdog_thresh=165, but this fired after 22s."

and there, it was also "dummy-virt" ... so maybe really a
slow/overloaded hypervisor.

-- 
Thanks,

David / dhildenb