Date: Fri, 1 Sep 2023 00:29:07 -0700 (PDT)
From: Hugh Dickins
To: Mikhail Gavrilov
cc: Hugh Dickins, Andrew Morton, Bagas Sanjaya,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    regressions@lists.linux.dev
Subject: Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting
Message-ID: <3548ca67-ce58-3bc6-fef5-348b98d7678b@google.com>

On Fri, 1 Sep 2023, Mikhail Gavrilov wrote:

> Hi,
> next release cycle, and another regression.
> Yesterday after another kernel update in Fedora Rawhide system stopped booting.

Many thanks for reporting, Mike: I'm sorry that it never showed up while
in linux-next, leaving you to be the one to hit it again.

> Today thanks to git bisect, I found out that this is a commit:
> 
> ❯ git bisect bad
> a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
> commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
> Author: Hugh Dickins
> Date:   Tue Jul 11 21:30:40 2023 -0700
> 
>     mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
...
>     Before putting them to use (several commits later), add rcu_read_lock() to
>     pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a
>     separate commit, since it risks exposing imbalances: prior commits have
>     fixed all the known imbalances, but we may find some have been missed.

I assume that it is such an imbalance - somewhere omitting to pte_unmap()
after a pte_offset_map(); but I cannot see where.

> It looks like the hang happens so early that when booting into a
> working kernel and running "journalctl -b -1" I see in the console the
> log of the previous kernel which was booted before the problematic
> kernel.
> Therefore, I apologize that I can't provide the kernel logs.
> I can provides only photos when backtrace appears on my monitor:
> Here we waiting: https://ibb.co/5xmm0BH
> And then I see backtrace: https://ibb.co/TLLGFNP
> 
> Unfortunately I can't revert commit
> a349d72fd9efc87c8fd1d16d3164752d84a7275b for testing more fresh builds
> because of conflicts.
> 
> My hardware: https://linux-hardware.org/?probe=dd5735f315
> I also attached kernel build config and full bisect log.

Thanks for all the info, which has helped in several ways. The only
thing I can do is to offer you a debug (and then keep running) patch -
suitable for the config you showed there, not for anyone else's config.

I've never used stackdepot before, but I've tried this out in good and
bad cases, and expect it to work for you, shedding light on where it is
going wrong - machine should boot up fine, and in dmesg you'll find one
stacktrace between "WARNING: pte_map..." and "End of pte_map..." lines.

To apply on top of a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
rcu_read_unlock()s"), the bad end point of your bisection; but if you
prefer, I can provide a version to go on top of whatever later Linus
commit suits you.

Patch not for general consumption, just for Mike's debugging:
please report back the stacktrace shown - thanks!

Hugh
---
 include/linux/pgtable.h |  5 +----
 mm/memory.c             |  1 +
 mm/mremap.c             |  1 +
 mm/pgtable-generic.c    | 40 ++++++++++++++++++++++++++++++++++++++--
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5134edcec668..131392f1c33e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -106,10 +106,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 {
 	return pte_offset_kernel(pmd, address);
 }
-static inline void pte_unmap(pte_t *pte)
-{
-	rcu_read_unlock();
-}
+void pte_unmap(pte_t *pte);
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/memory.c b/mm/memory.c
index 44d11812a88f..b1ee8ab51978 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1033,6 +1033,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		ret = -ENOMEM;
 		goto out;
 	}
+	pte_unmap(NULL);	/* avoid warning when knowingly nested */
 	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
 	if (!src_pte) {
 		pte_unmap_unlock(dst_pte, dst_ptl);
diff --git a/mm/mremap.c b/mm/mremap.c
index 11e06e4ab33b..56d981add487 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -175,6 +175,7 @@ static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		err = -EAGAIN;
 		goto out;
 	}
+	pte_unmap(NULL);	/* avoid warning when knowingly nested */
 	new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl);
 	if (!new_pte) {
 		pte_unmap_unlock(old_pte, old_ptl);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 400e5a045848..87cbdc73beda 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,11 +232,47 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#include
+#include
+#include
+
+static depot_stack_handle_t depot_stack;
+
+static void pte_map(void)
+{
+	static bool done = false;
+	unsigned long entries[16];
+	unsigned int nr_entries;
+
+	/* rcu_read_lock(); */
+	if (raw_smp_processor_id() != 0 || done)
+		return;
+	if (depot_stack) {
+		pr_warn("WARNING: pte_map was not pte_unmapped:\n");
+		stack_depot_print(depot_stack);
+		pr_warn("End of pte_map warning.\n");
+		done = true;
+		return;
+	}
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+	depot_stack = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
+	if (ktime_get_seconds() > 1800)	/* give up after half an hour */
+		done = true;
+}
+
+void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock(); */
+	if (raw_smp_processor_id() != 0)
+		return;
+	depot_stack = 0;
+}
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;
 
-	rcu_read_lock();
+	pte_map();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +286,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	rcu_read_unlock();
+	pte_unmap(NULL);
 	return NULL;
 }
 
-- 
2.35.3
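
The way the debug patch catches an imbalance can be modelled in plain
userspace C, which may make the expected dmesg output easier to read.
This is only a sketch under assumptions, not kernel code: model_map(),
model_unmap(), model_demo() and the call-site strings are hypothetical
stand-ins for pte_map(), pte_unmap() and the stackdepot handle, and the
single global record corresponds to the patch only tracking CPU 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for depot_stack: the call site of the last map
 * that has not yet been unmapped (NULL means nothing is outstanding). */
static const char *pending_site;
/* First call site found to be unbalanced; reported only once. */
static const char *reported_site;
static bool done;

/* Model of pte_map(): remember the caller, or, if an earlier map is
 * still outstanding, report that EARLIER caller and stop checking. */
void model_map(const char *site)
{
	if (done)
		return;
	if (pending_site) {
		fprintf(stderr, "WARNING: map at %s was not unmapped\n",
			pending_site);
		reported_site = pending_site;
		done = true;
		return;
	}
	pending_site = site;
}

/* Model of pte_unmap(): forget the outstanding map. The patch also
 * calls this before a knowingly nested map to avoid a false warning. */
void model_unmap(void)
{
	pending_site = NULL;
}

/* One balanced pair, then a map that is never unmapped.
 * Returns 1 if the detector blamed the leaking call site. */
int model_demo(void)
{
	model_map("balanced_path");
	model_unmap();
	assert(pending_site == NULL);

	model_map("leaky_path");	/* never unmapped */
	model_map("innocent_path");	/* detector fires here */

	return reported_site != NULL &&
	       strcmp(reported_site, "leaky_path") == 0;
}
```

Note that the report names the earlier, unbalanced caller rather than
the caller that happened to trip the check, which is why the stacktrace
in dmesg should point at the path that omitted its pte_unmap().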