Date: Wed, 29 Mar 2023 17:54:45 +0100
From: Catalin Marinas
To: Qun-wei Lin (林群崴)
Cc: "linux-arm-kernel@lists.infradead.org", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "surenb@google.com", "david@redhat.com", Chinwen Chang (張錦文), "kasan-dev@googlegroups.com", Kuan-Ying Lee (李冠穎), Casper Li (李中榮), "gregkh@linuxfoundation.org", Steven Price
Subject: Re: [BUG] Usersapce MTE error with allocation tag 0 when low on memory
In-Reply-To: <5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com>

+ Steven Price who added the MTE swap support.

On Wed, Mar 29, 2023 at 02:55:49AM +0000, Qun-wei Lin (林群崴) wrote:
> Hi,
>
> We are seeing a large number of MTE errors on Android T with kernel-6.1.
>
> When the system is under memory pressure, MTE often triggers error
> reports in userspace.
>
> As in the tombstone below, many of the reports show an allocation tag
> of 0:
>
> Build fingerprint: 'alps/vext_k6897v1_64/k6897v1_64:13/TP1A.220624.014/mp2ofp23:userdebug/dev-keys'
> Revision: '0'
> ABI: 'arm64'
> Timestamp: 2023-03-14 06:39:40.344251744+0800
> Process uptime: 0s
> Cmdline: /vendor/bin/hw/camerahalserver
> pid: 988, tid: 1395, name: binder:988_3  >>> /vendor/bin/hw/camerahalserver <<<
> uid: 1047
> tagged_addr_ctrl: 000000000007fff3 (PR_TAGGED_ADDR_ENABLE, PR_MTE_TCF_SYNC, mask 0xfffe)
> signal 11 (SIGSEGV), code 9 (SEGV_MTESERR), fault addr 0x0d000075f1d8d7f0
>     x0  00000075018d3fb0  x1  00000000c0306201  x2  00000075018d3ae8  x3  000000000000720c
>     x4  0000000000000000  x5  0000000000000000  x6  00000642000004fe  x7  0000054600000630
>     x8  00000000fffffff2  x9  b34a1094e7e33c3f  x10 00000075018d3a80  x11 00000075018d3a50
>     x12 ffffff80ffffffd0  x13 0000061e0000072c  x14 0000000000000004  x15 0000000000000000
>     x16 00000077f2dfcd78  x17 00000077da3a8ff0  x18 00000075011bc000  x19 0d000075f1d8d898
>     x20 0d000075f1d8d7f0  x21 0d000075f1d8d910  x22 0000000000000000  x23 00000000fffffff7
>     x24 00000075018d4000  x25 0000000000000000  x26 00000075018d3ff8  x27 00000000000fc000
>     x28 00000000000fe000  x29 00000075018d3b20
>     lr  00000077f2d9f164  sp  00000075018d3ad0  pc  00000077f2d9f134  pst 0000000080001000
>
> backtrace:
>       #00 pc 000000000005d134  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+244) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #01 pc 000000000005d448  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+24) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #02 pc 000000000005dd64  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+68) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #03 pc 000000000008dba8  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24) (BuildId: 8b5612259e4a42521c430456ec5939c7)
>       #04 pc 0000000000013440  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+416) (BuildId: 10aac5d4a671e4110bc00c9b69d83d8a)
>       #05 pc 00000000000c14cc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+204) (BuildId: 718ecc04753b519b0f6289a7a2fcf117)
>       #06 pc 0000000000054930  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: 718ecc04753b519b0f6289a7a2fcf117)
>
> Memory tags around the fault address (0xd000075f1d8d7f0), one tag per
> 16 bytes:
>       0x75f1d8cf00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d000: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d100: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d200: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d300: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d400: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d500: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d600: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>     =>0x75f1d8d700: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 [0]
>       0x75f1d8d800: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8d900: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8da00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8db00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8dc00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8dd00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>       0x75f1d8de00: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>
> It also happens in coredumps.
>
> This problem only occurs when ZRAM is enabled, so we suspect an issue
> in the swap-in/swap-out path.
>
> Comparing Kernel-5.15 with Kernel-6.1, we found that the order of
> swap_free() and set_pte_at() in do_swap_page() has changed.
>
> On fault-in, do_swap_page() calls swap_free() first:
> do_swap_page() -> swap_free() -> __swap_entry_free() ->
> free_swap_slot() -> swapcache_free_entries() -> swap_entry_free() ->
> swap_range_free() -> arch_swap_invalidate_page() ->
> mte_invalidate_tags_area() -> mte_invalidate_tags() -> xa_erase()
>
> and then calls set_pte_at():
> do_swap_page() -> set_pte_at() -> __set_pte_at() -> mte_sync_tags() ->
> mte_sync_page_tags() -> mte_restore_tags() -> xa_load()
>
> This means the swap slot is invalidated before the PTE is mapped, so
> the MTE tags stored in the XArray are released before they can be
> restored.
>
> After I moved swap_free() to the line right after set_pte_at(), the
> problem disappeared.
>
> We suspect that the following patch, which changed the order, did not
> take MTE tag restoration in the page fault path into account:
> https://lore.kernel.org/all/20220131162940.210846-5-david@redhat.com/
>
> Any suggestions are appreciated.
>
> Thank you.