From: Jani Nikula
To: Vlastimil Babka, Naresh Kamboju, open list, Linux-Next Mailing List,
    linux-mm, dri-devel@lists.freedesktop.org, Andrew Morton
Cc: Marco Elver, Vijayanand Jitta, Maarten Lankhorst, Maxime Ripard,
    Thomas Zimmermann, David Airlie, Daniel Vetter, Andrey Ryabinin,
    Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
    Geert Uytterhoeven, Oliver Glitta, Imran Khan,
    lkft-triage@lists.linaro.org, Stephen Rothwell
Subject: Re: [next] [dragonboard 410c] Unable to handle kernel paging request at virtual address 00000000007c4240
In-Reply-To: <80ab567d-74f3-e14b-3c30-e64bbd64b354@suse.cz>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
References: <80ab567d-74f3-e14b-3c30-e64bbd64b354@suse.cz>
Date: Thu, 21 Oct 2021 11:40:03 +0300
Message-ID: <87fssuojoc.fsf@intel.com>

On Thu, 21 Oct 2021, Vlastimil Babka wrote:
> On 10/20/21 20:24, Naresh Kamboju wrote:
>> Following kernel crash noticed on linux next 20211020 tag.
>> while booting on arm64 architecture dragonboard 410c device.
>>
>> I see the following config is enabled in 20211020 tag builds.
>> CONFIG_STACKDEPOT=y
>>
>> Crash log,
>> [ 18.583097] Unable to handle kernel paging request at virtual
>> address 00000000007c4240
>> [ 18.583521] Mem abort info:
>> [ 18.590286] ESR = 0x96000004
>> [ 18.592920] EC = 0x25: DABT (current EL), IL = 32 bits
>> [ 18.596103] SET = 0, FnV = 0
>> [ 18.601512] EA = 0, S1PTW = 0
>> [ 18.604384] FSC = 0x04: level 0 translation fault
>> [ 18.607447] Data abort info:
>> [ 18.612296] ISV = 0, ISS = 0x00000004
>> [ 18.615451] CM = 0, WnR = 0
>> [ 18.618990] user pgtable: 4k pages, 48-bit VAs, pgdp=000000008b4c7000
>> [ 18.622054] [00000000007c4240] pgd=0000000000000000, p4d=0000000000000000
>> [ 18.628974] Internal error: Oops: 96000004 [#1] SMP
>> [ 18.635073] Modules linked in: adv7511 cec snd_soc_lpass_apq8016
>> snd_soc_lpass_cpu snd_soc_lpass_platform snd_soc_msm8916_digital
>> qcom_camss qrtr snd_soc_apq8016_sbc videobuf2_dma_sg qcom_pon
>> qcom_spmi_vadc snd_soc_qcom_common qcom_q6v5_mss qcom_vadc_common
>> rtc_pm8xxx qcom_spmi_temp_alarm msm qcom_pil_info v4l2_fwnode
>> qcom_q6v5 snd_soc_msm8916_analog qcom_sysmon qcom_common v4l2_async
>> qnoc_msm8916 qcom_rng gpu_sched qcom_glink_smem venus_core
>> videobuf2_memops icc_smd_rpm qmi_helpers drm_kms_helper v4l2_mem2mem
>> mdt_loader display_connector i2c_qcom_cci videobuf2_v4l2 crct10dif_ce
>> videobuf2_common socinfo drm rmtfs_mem fuse
>> [ 18.672948] CPU: 0 PID: 178 Comm: kworker/u8:3 Not tainted
>> 5.15.0-rc6-next-20211020 #1
>> [ 18.695000] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> [ 18.695012] Workqueue: events_unbound deferred_probe_work_func
>> [ 18.695033] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 18.715282] pc : __stack_depot_save+0x13c/0x4e0
>> [ 18.722130] lr : stack_depot_save+0x14/0x20
>> [ 18.726641] sp : ffff800014a23500
>> [ 18.730801] x29: ffff800014a23500 x28: 00000000000f8848 x27: ffff800013acdf68
>> [ 18.734294] x26: 0000000000000000 x25: 00000000007c4240 x24: ffff800014a23780
>> [ 18.741413] x23: 0000000000000008 x22: ffff800014a235b8 x21: 0000000000000008
>> [ 18.748530] x20: 00000000c32f8848 x19: ffff00001038cc18 x18: ffffffffffffffff
>> [ 18.755649] x17: ffff80002d9f8000 x16: ffff800010004000 x15: 000000000000c426
>> [ 18.762767] x14: 0000000000000000 x13: ffff800014a23780 x12: 0000000000000000
>> [ 18.769885] x11: ffff00001038cc80 x10: ffff8000136e9ba0 x9 : ffff800014a235f4
>> [ 18.777003] x8 : 0000000000000001 x7 : 00000000b664620b x6 : 0000000011a58b4a
>> [ 18.784121] x5 : 000000001aa43464 x4 : 000000009e7d8b67 x3 : 0000000000000001
>> [ 18.791239] x2 : 0000000000002800 x1 : ffff800013acd000 x0 : 00000000f2d429d8
>> [ 18.798358] Call trace:
>> [ 18.805451] __stack_depot_save+0x13c/0x4e0
>> [ 18.807716] stack_depot_save+0x14/0x20
>> [ 18.811881] __drm_stack_depot_save+0x44/0x70 [drm]
>> [ 18.815710] modeset_lock.part.0+0xe0/0x1a4 [drm]
>> [ 18.820571] drm_modeset_lock_all_ctx+0x2d4/0x334 [drm]
>
> This stack_depot_save path appears to be new from Jani's commit
> cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks
> without backoff")
> And there's a semantic conflict with my patch in mmotm:
> - sha1 (valid only in next-20211020) 5e6d063de5cd ("lib/stackdepot: allow
> optional init and stack_table allocation by kvmalloc()")
> - lore: https://lore.kernel.org/all/20211013073005.11351-1-vbabka@suse.cz/
> - patchwork: https://patchwork.freedesktop.org/series/95549/#rev3
>
> With my patch, to-be callers of stack_depot_save() need to call
> stack_depot_init() at least once, to avoid unnecessary runtime overhead
> otherwise I have added that calls into three DRM contexts in my patch, but
> didn't see cd06ab2fd48f yet at the time.

Ouch, I did see your changes fly by, but overlooked this type of
conflict with my patch. Sorry for the trouble.

> This one seems a bit more tricky and I could really use some advice.
> cd06ab2fd48f adds stackdepot usage to drm_modeset_lock which itself has a
> number of different users and requiring those to call stack_depot_init()
> would be likely error prone. Would it be ok to add the call of
> stack_depot_init() (guarded by #ifdef CONFIG_DRM_DEBUG_MODESET_LOCK) to
> drm_modeset_lock_init()? It will do a mutex_lock()/unlock(), and kvmalloc()
> on first call.
> I don't know how much of hotpath this is, but hopefully should be acceptable
> in debug config. Or do you have better suggestion? Thanks.

I think that should be fine.

Maybe add __drm_stack_depot_init() in the existing #if
IS_ENABLED(CONFIG_DRM_DEBUG_MODESET_LOCK), similar to the other
__drm_stack_depot_*() functions, with an empty stub for
CONFIG_DRM_DEBUG_MODESET_LOCK=n, and call it unconditionally in
drm_modeset_lock_init().

> Then we have to figure out how to order a fix between DRM and mmotm...

That is the question! The problem exists only in the merge of the
two. On the current DRM side, stack_depot_init() exists but it's __init
and does not look safe to call multiple times. And obviously my changes
don't exist at all in mmotm.

I guess one (admittedly hackish) option is to first add a patch in
drm-next (or drm-misc-next) that makes it safe to call
stack_depot_init() multiple times in non-init context. It would be
dropped in favour of your changes once the trees get merged together.

Or is there some way for __drm_stack_depot_init() to detect whether it
should call stack_depot_init() or not, i.e. whether your changes are
there or not?

BR,
Jani.
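For illustration, the two ideas discussed above (a debug-only
__drm_stack_depot_init() with an empty stub that drm_modeset_lock_init()
calls unconditionally, and a stack_depot_init() that tolerates repeated
calls) can be sketched as a userspace mock. This is not the actual kernel
patch. The function names and the config symbol come from the thread; the
function bodies, the pthread mutex standing in for the kernel mutex,
calloc() standing in for kvmalloc(), the plain #if standing in for
IS_ENABLED(), and the names STACK_TABLE_ENTRIES, stack_table_allocations
and the reduced struct drm_modeset_lock are assumptions made only for this
sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for the Kconfig option; define as 0 to build the empty stub. */
#ifndef CONFIG_DRM_DEBUG_MODESET_LOCK
#define CONFIG_DRM_DEBUG_MODESET_LOCK 1
#endif

/* Hypothetical table size, chosen only for this userspace mock. */
#define STACK_TABLE_ENTRIES 1024

static pthread_mutex_t stack_depot_init_mutex = PTHREAD_MUTEX_INITIALIZER;
static void **stack_table;          /* NULL until the first successful init */
static int stack_table_allocations; /* counts real allocations, for checking */

/*
 * Mock of an init that is safe to call repeatedly: take the lock and
 * allocate the table only if it does not exist yet.
 * Returns 0 on success, -1 if the allocation failed.
 */
int stack_depot_init(void)
{
	int ret = 0;

	pthread_mutex_lock(&stack_depot_init_mutex);
	if (!stack_table) {
		stack_table = calloc(STACK_TABLE_ENTRIES, sizeof(*stack_table));
		if (stack_table)
			stack_table_allocations++;
		else
			ret = -1;
	}
	pthread_mutex_unlock(&stack_depot_init_mutex);
	return ret;
}

#if CONFIG_DRM_DEBUG_MODESET_LOCK
/* Debug build: make sure the depot is ready before any lock saves a stack. */
void __drm_stack_depot_init(void)
{
	stack_depot_init();
}
#else
/* Non-debug build: empty stub, so callers need no #ifdef of their own. */
void __drm_stack_depot_init(void)
{
}
#endif

/* Reduced stand-in for the real lock structure. */
struct drm_modeset_lock {
	int initialized;
};

/* Every lock init calls the helper unconditionally. */
void drm_modeset_lock_init(struct drm_modeset_lock *lock)
{
	assert(lock != NULL);
	lock->initialized = 1;
	__drm_stack_depot_init();
}
```

In this mock, initializing any number of locks allocates the table once;
later calls only take and release the mutex. With
CONFIG_DRM_DEBUG_MODESET_LOCK defined as 0 the helper compiles to nothing
and the table is never allocated.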
>
>> [ 18.825435] drm_client_firmware_config.constprop.0.isra.0+0xc0/0x5d0 [drm]
>> [ 18.830478] drm_client_modeset_probe+0x328/0xbb0 [drm]
>> [ 18.837413] __drm_fb_helper_initial_config_and_unlock+0x54/0x5b4
>> [drm_kms_helper]
>> [ 18.842633] drm_fb_helper_initial_config+0x5c/0x70 [drm_kms_helper]
>> [ 18.850266] msm_fbdev_init+0x98/0x100 [msm]
>> [ 18.856767] msm_drm_bind+0x650/0x720 [msm]
>> [ 18.861021] try_to_bring_up_master+0x230/0x320
>> [ 18.864926] __component_add+0xc8/0x1c4
>> [ 18.869435] component_add+0x20/0x30
>> [ 18.873253] mdp5_dev_probe+0xe0/0x11c [msm]
>> [ 18.877077] platform_probe+0x74/0xf0
>> [ 18.881328] really_probe+0xc4/0x470
>> [ 18.884883] __driver_probe_device+0x11c/0x190
>> [ 18.888534] driver_probe_device+0x48/0x110
>> [ 18.892786] __device_attach_driver+0xa4/0x140
>> [ 18.896869] bus_for_each_drv+0x84/0xe0
>> [ 18.901380] __device_attach+0xe4/0x1c0
>> [ 18.905112] device_initial_probe+0x20/0x30
>> [ 18.908932] bus_probe_device+0xac/0xb4
>> [ 18.913098] deferred_probe_work_func+0xc8/0x120
>> [ 18.916920] process_one_work+0x280/0x6a0
>> [ 18.921780] worker_thread+0x80/0x454
>> [ 18.925683] kthread+0x178/0x184
>> [ 18.929326] ret_from_fork+0x10/0x20
>> [ 18.932634] Code: d37d4e99 92404e9c f940077a 8b190359 (c8dfff33)
>> [ 18.936203] ---[ end trace 3e289b724840642d ]---
>>
>> Full log,
>> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20211020/testrun/6177937/suite/linux-log-parser/test/check-kernel-oops-3786583/log
>> https://lkft.validation.linaro.org/scheduler/job/3786583#L2549
>>
>> Build config:
>> https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/config
>>
>> Reported-by: Linux Kernel Functional Testing
>>
>> steps to reproduce:
>> 1) https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/tuxmake_reproducer.sh
>> 2) Boot db410c device
>>
>> --
>> Linaro LKFT
>> https://lkft.linaro.org
>>
>

--
Jani Nikula, Intel Open Source Graphics Center