From: Lee Jones
To: lee@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/1] x86: Fix .bss corruption
Date: Wed, 14 Jun 2023 17:38:53 +0100
Message-ID: <20230614163859.924309-1-lee@kernel.org>

- Report

The following highly indeterministic kernel panic was reported to occur every
few hundred boots:

  Kernel panic - not syncing: Fatal exception
  RIP: 0010:alloc_ucounts+0x68/0x280
  RSP: 0018:ffffb53ac13dfe98 EFLAGS: 00010006
  RAX: 0000000000000000 RBX: 000000015b804063 RCX: ffff9bb60d5aa500
  RDX: 0000000000000001 RSI: 00000000000003fb RDI: 0000000000000001
  RBP: ffffb53ac13dfec0 R08: 0000000000000000 R09: ffffd53abf600000
  R10: ffff9bb60b4bdd80 R11: ffffffffbcde7bb0 R12: ffffffffbf04e710
  R13: ffffffffbf405160 R14: 00000000000003fb R15: ffffffffbf1afc60
  FS:  00007f5be44d5000(0000) GS:ffff9bb6b9cc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000015b80407b CR3: 00000001038ea000 CR4: 00000000000406a0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   __sys_setuid+0x1a0/0x290
   __x64_sys_setuid+0xc/0x10
   do_syscall_64+0x43/0x90
   ? asm_exc_page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xae

- Problem

The issue was eventually tracked down to the attempted dereference of a value
located in a corrupted hash table. ucounts_hashtable is an array of 1024
struct hlist_head entries. Each element is the head of its own linked list
where previous ucount allocations are stored. The [20]th element of
ucounts_hashtable was being consistently trashed on each and every boot.
However, the indeterminism comes from it being accessed only every few hundred
boots.

The issue disappeared, or was at least unidentifiable, when !(LTO=full) or when
memory base randomisation (a.k.a. KASLR) was disabled, rendering GDB all but
impossible to use effectively.

The cause of the corruption was uncovered using a variety of different
debugging techniques and was eventually tracked down to page table manipulation
in early architecture setup.
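A small arithmetic check on the register dump above may help make the
"corrupted hash table" reading concrete. This is only a sketch: it assumes,
without confirmation from the oops itself, that RBX held the pointer loaded
from the trashed bucket, and the sketch_ helper names are made up for
illustration. It checks how far the faulting address in CR2 sits past RBX,
which flag bits RBX would carry if it were read as an x86-64 page-table entry,
and whether RBX looks like a kernel pointer at all (assuming 4-level paging).

```c
#include <assert.h>
#include <stdint.h>

/* Values copied verbatim from the register dump in the report. */
#define OOPS_RBX UINT64_C(0x000000015b804063)
#define OOPS_CR2 UINT64_C(0x000000015b80407b)

/* On x86-64 the low 12 bits of a page-table entry carry flag bits. */
#define SKETCH_FLAGS_MASK UINT64_C(0xfff)

/* Byte distance from the suspect pointer to the faulting address. */
static inline uint64_t sketch_fault_offset(uint64_t cr2, uint64_t ptr)
{
	assert(cr2 >= ptr);	/* only meaningful for a forward offset */
	return cr2 - ptr;
}

/* Flag bits a value would carry if it were read as a page-table entry. */
static inline uint64_t sketch_entry_flags(uint64_t value)
{
	return value & SKETCH_FLAGS_MASK;
}

/*
 * Assumption: 4-level paging, where kernel virtual addresses have
 * bits 63..47 all set.
 */
static inline int sketch_looks_like_kernel_pointer(uint64_t value)
{
	return (value >> 47) == UINT64_C(0x1ffff);
}
```

With the values from the report, sketch_fault_offset(OOPS_CR2, OOPS_RBX) is
0x18 and sketch_entry_flags(OOPS_RBX) is 0x063, i.e. bits 0, 1, 5 and 6
(Present, Read/Write, Accessed, Dirty). sketch_looks_like_kernel_pointer()
rejects RBX but accepts, for example, RCX. That is consistent with, though not
proof of, a page-table entry having been stored where a list pointer was
expected.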

The following line in arch/x86/realmode/init.c [0] allocates a variable, just
8 bytes in size, to "hold the pgd entry used on booting additional CPUs":

  pgd_t trampoline_pgd_entry;

The address of that variable is then passed from init_trampoline_kaslr() via a
call to set_pgd() [1] to have a value (not too important here) assigned to it.
Numerous abstractions take place, eventually leading to native_set_p4d(), an
inline function [2] contained in arch/x86/include/asm/pgtable_64.h.

From here, intentionally or otherwise, a call to pti_set_user_pgtbl() is made.
This is where the out-of-bounds write eventually occurs. It is not known (by
me) why this function is called. The returned result is subsequently used as a
value to write using the WRITE_ONCE macro. Perhaps the premature write is not
intended. This is what I hope to find out.

A little way down in pti_set_user_pgtbl() [3] the following line occurs:

  kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;

The kernel_to_user_pgdp() part takes the address of pgdp (a.k.a.
trampoline_pgd_entry) and ends up setting bit 12, essentially adding 4k
(0x1000) to the address. Then the part at the end assigns our value (still not
important here) to it. However, if we remember that only 8 bytes were allocated
(globally) for trampoline_pgd_entry, this means we just stored the value into
the outlands (what we now know to be allocated to another global storage user,
ucounts_hashtable).

[0] https://elixir.bootlin.com/linux/latest/source/arch/x86/realmode/init.c#L18
[1] https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/kaslr.c#L178
[2] https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/pgtable_64.h#L142
[3] https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/pti.c#L142

Lee Jones (1):
  x86/mm/KASLR: Store pud_page_tramp into entry rather than page

 arch/x86/mm/kaslr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.41.0.162.gfafddb0af9-goog
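
Footnote on the kernel_to_user_pgdp() step described in the Problem section: a
minimal user-space sketch of the address arithmetic, not the kernel code. It
assumes the helper's effect is to set bit 12 of the pointer and that the
subsequent store is 8 bytes wide. The sketch_ names and the example addresses
are hypothetical. What it tries to show is that the derived slot stays inside
the object when that object is an 8 KiB, 8 KiB-aligned pair of pages, and
falls outside it when the object is an 8-byte global.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_SWITCH_BIT (UINT64_C(1) << SKETCH_PAGE_SHIFT)	/* 0x1000 */
#define SKETCH_ENTRY_SIZE UINT64_C(8)	/* width of the store in bytes */

/* Assumed model of the address derivation: set bit 12 of the slot address. */
static inline uint64_t sketch_user_slot(uint64_t kernel_slot)
{
	return kernel_slot | SKETCH_SWITCH_BIT;
}

/*
 * Returns nonzero when an 8-byte store to the slot derived from the start
 * of an object ends beyond that object, i.e. beyond obj + obj_size.
 */
static inline int sketch_store_escapes(uint64_t obj, uint64_t obj_size)
{
	uint64_t target = sketch_user_slot(obj);

	assert(obj_size >= SKETCH_ENTRY_SIZE);
	return target + SKETCH_ENTRY_SIZE > obj + obj_size;
}
```

For a hypothetical 8-byte object at 0xffffffff82a04010, sketch_user_slot()
gives 0xffffffff82a05010 and sketch_store_escapes() reports an escape; for a
hypothetical 8192-byte object at 0xffffffff82a04000 it does not. The 0x1000
byte displacement corresponds to 512 eight-byte slots.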