Message-ID: <451187de09e9a80f73a0588da65d55d4a8da6552.camel@redhat.com>
Subject: Re: supervisor write access in kernel mode in __pv_queued_spin_unlock_slowpath
From: Maxim Levitsky
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>, kernel test robot
Cc: Vlastimil Babka, oe-lkp@lists.linux.dev, lkp@intel.com, Mike Rapoport,
 Christoph Lameter, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Juergen Gross,
 "Srivatsa S. Bhat", Alexey Makhalov, VMware PV-Drivers Reviewers,
 kvm@vger.kernel.org, Sean Christopherson
Date: Sun, 01 Jan 2023 13:08:07 +0200
References: <202212312021.bc1efe86-oliver.sang@intel.com>
User-Agent: Evolution 3.36.5 (3.36.5-2.fc32)
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

On Sun, 2023-01-01 at 16:37 +0900, Hyeonggon Yoo wrote:
> On Sun, Jan 01, 2023 at 03:50:28PM +0900, Hyeonggon Yoo wrote:
> > On Sat, Dec 31, 2022 at 11:26:25PM +0800, kernel test robot wrote:
> > > Greeting,
> > >
> > > FYI, we noticed kernel_BUG_at_include/linux/mm.h due to commit (built with gcc-11):
> > >
> > > commit: 0af8489b0216fa1dd83e264bef8063f2632633d7 ("mm, slub: remove percpu slabs with CONFIG_SLUB_TINY")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > [test failed on linux-next/master c76083fac3bae1a87ae3d005b5cb1cbc761e31d5]
> > >
> > > in testcase: rcutorture
> > > version:
> > > with following parameters:
> > >
> > >         runtime: 300s
> > >         test: default
> > >         torture_type: tasks-tracing
> > >
> > > test-description: rcutorture is rcutorture kernel module load/unload test.
> > > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > >
> > >
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > If you fix the issue, kindly add following tag
> > > > Reported-by: kernel test robot
> > > > Link: https://lore.kernel.org/oe-lkp/202212312021.bc1efe86-oliver.sang@intel.com
> > >
> > >
> > >
> > > To reproduce:
> > >
> > >         # build kernel
> > >         cd linux
> > >         cp config-6.1.0-rc2-00014-g0af8489b0216 .config
> > >         make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
> > >         make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH= modules_install
> > >         cd
> > >         find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> > >
> > >
> > >         git clone https://github.com/intel/lkp-tests.git
> > >         cd lkp-tests
> > >         bin/lkp qemu -k -m modules.cgz job-script # job-script is attached in this email
> > >
> > >         # if come across any failure that blocks the test,
> > >         # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> > I was unable to reproduce in the same way as described above
> > because some files referenced in job-script couldn't be downloaded from
> > download.01.org/0day :(
> >
> > So I just built rcutorture module as builtin
> > and I got weird spinlock bug on commit: 0af8489b0216
> > ("mm, slub: remove percpu slabs with CONFIG_SLUB_TINY")
>
> (+Cc KVM/Paravirt experts)
>
> > full dmesg added as attachment
> >
> > [ 1387.564837][ T57] BUG: unable to handle page fault for address: c108f5f4
> > [ 1387.566649][ T57] #PF: supervisor write access in kernel mode
> > [ 1387.567965][ T57] #PF: error_code(0x0003) - permissions violation
> > [ 1387.569439][ T57] *pde = 010001e1
> > [ 1387.570276][ T57] Oops: 0003 [#1] SMP
> > [ 1387.571149][ T57] CPU: 2 PID: 57 Comm: rcu_torture_rea Tainted: G S 6.1.0-rc2-00010-g0af8489b0216 #2130 63d19ac2b985fca570c354d8750f489755de37ed
> > [ 1387.574673][ T57] EIP: kvm_kick_cpu+0x54/0x90
> > [ 1387.575802][ T57] Code: 2f c5 01 8b 04 9d e0 d4 4e c4 83 15 14 7b 2f c5 00 83 05 08 6d 2f c5 01 0f b7 0c 30 b8 05 00 00 00 83 15 0c 6d 2f c5 00 31 db <0f> 01 c1 83 05 10 6d 2f c5 01 8b 5d f8 8b 75 fc 83 15 14 6d 2f c5
        ^^^^^^

Yes, this is the infamous hypercall patching bug.

What is happening is that Intel and AMD each reserve a *slightly*
different instruction for hypercalls (paravirt calls from the guest to
the host hypervisor).

The KVM developers made the mistake of being 'nice' to guests: if the
guest uses the wrong hypercall instruction, KVM attempts to rewrite it
with the right one.

That can fail because, to avoid security issues, KVM performs the write
in the exact same security context as the instruction itself (it is as
if the instruction were defined to overwrite itself).

This means that if the guest memory is marked read-only in the guest
page tables, the write will fail and a #PF will be raised on the wrong
hypercall instruction.
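
For reference, the error_code(0x0003) in the quoted oops can be decoded bit by bit, which shows why it is reported as a supervisor write that hit a permissions violation. This is only a minimal sketch, assuming the architectural x86 #PF error-code layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch); the function name decode_pf_error_code is made up for illustration and is not a kernel API:

```python
# Sketch only: decode the low bits of an x86 page-fault (#PF) error code.
# decode_pf_error_code is a hypothetical helper, not part of any kernel tool.

def decode_pf_error_code(code):
    """Return a dict describing bits 0-4 of a #PF error code."""
    return {
        # bit 0 set: page was present, so the fault is a protection violation
        "protection_violation": bool(code & 0x01),
        # bit 1 set: the faulting access was a write
        "write": bool(code & 0x02),
        # bit 2 clear: the access was a supervisor access, not a user one
        "user_mode": bool(code & 0x04),
        # bit 3 set: a reserved bit was set in a paging-structure entry
        "reserved_bit": bool(code & 0x08),
        # bit 4 set: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# The value from the oops quoted above.
info = decode_pf_error_code(0x0003)
print(info)
```

With 0x0003 only the "protection violation" and "write" bits are set, which is consistent with a supervisor write to a present but read-only mapping.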

Here we have Intel's instruction (VMCALL, 0F 01 C1), and the host
machine is likely AMD, which uses the VMMCALL instruction (0F 01 D9).

Any recent Linux guest is supposed to pick the right instruction via the
alternatives mechanism, but it can't if the hypervisor passes a
'non-native' vendor ID, like GenuineIntel on an AMD machine.

In my testing, using a named CPU model as you do ('-cpu SandyBridge')
still passes through the host vendor ID (that is, the guest sees an
Intel CPU model but with vendor='AuthenticAMD'), but nobody has
confirmed to me whether this is a bug or a feature, and I am not sure
whether older QEMU versions also did this.

Assuming that your host machine is AMD, your best bet to check whether
my theory is right is to boot the guest without triggering the bug and
check in /proc/cpuinfo whether the vendor string is 'GenuineIntel'.

Best regards,
	Maxim Levitsky

> > [ 1387.580456][ T57] EAX: 00000005 EBX: 00000000 ECX: 00000003 EDX: c108f5a0
> > [ 1387.582071][ T57] ESI: c5153580 EDI: 00000046 EBP: c69cddf8 ESP: c69cddf0
> > [ 1387.583775][ T57] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010046
> > [ 1387.585643][ T57] CR0: 80050033 CR2: c108f5f4 CR3: 0776b000 CR4: 00350e90
> > [ 1387.587492][ T57] Call Trace:
> > [ 1387.588365][ T57] __pv_queued_spin_unlock_slowpath+0x66/0x110
> > [ 1387.589898][ T57] __pv_queued_spin_unlock+0x4b/0x60
> > [ 1387.591040][ T57] __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
> > [ 1387.592771][ T57] do_raw_spin_unlock+0x49/0xa0
> > [ 1387.593805][ T57] _raw_spin_unlock_irqrestore+0x53/0xd0
> > [ 1387.594927][ T57] swake_up_one+0x4f/0x70
> > [ 1387.595739][ T57] __rcu_report_exp_rnp+0x26b/0x470
> > [ 1387.596730][ T57] rcu_report_exp_cpu_mult+0x82/0x2f0
> > [ 1387.597770][ T57] rcu_qs+0xac/0x160
> > [ 1387.598503][ T57] rcu_note_context_switch+0x31/0x1e0
> > [ 1387.599460][ T57] __schedule+0xc5/0x770
> > [ 1387.600195][ T57] __cond_resched+0x7a/0x100
> > [ 1387.600996][ T57] stutter_wait+0x9e/0x2c0
> > [ 1387.601956][ T57] rcu_torture_reader+0x162/0x3e0
> > [ 1387.603048][ T57] ? rcu_torture_reader+0x3e0/0x3e0
> > [ 1387.604269][ T57] ? __kthread_parkme+0xab/0xf0
> > [ 1387.605420][ T57] kthread+0x167/0x1d0
> > [ 1387.606383][ T57] ? rcu_torture_read_exit_child+0xa0/0xa0
> > [ 1387.607516][ T57] ? kthread_exit+0x50/0x50
> > [ 1387.608517][ T57] ret_from_fork+0x19/0x24
> > [ 1387.609548][ T57] Modules linked in:
> > [ 1387.610187][ T57] CR2: 00000000c108f5f4
> > [ 1387.610873][ T57] ---[ end trace 0000000000000000 ]---
> > [ 1387.611829][ T57] EIP: kvm_kick_cpu+0x54/0x90
> > [ 1387.612653][ T57] Code: 2f c5 01 8b 04 9d e0 d4 4e c4 83 15 14 7b 2f c5 00 83 05 08 6d 2f c5 01 0f b7 0c 30 b8 05 00 00 00 83 15 0c 6d 2f c5 00 31 db <0f> 01 c1 83 05 10 6d 2f c5 01 8b 5d f8 8b 75 fc 83 15 14 6d 2f c5
> > [ 1387.616715][ T57] EAX: 00000005 EBX: 00000000 ECX: 00000003 EDX: c108f5a0
> > [ 1387.618242][ T57] ESI: c5153580 EDI: 00000046 EBP: c69cddf8 ESP: c69cddf0
> > [ 1387.619912][ T57] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010046
> > [ 1387.621666][ T57] CR0: 80050033 CR2: c108f5f4 CR3: 0776b000 CR4: 00350e90
> > [ 1387.623128][ T57] Kernel panic - not syncing: Fatal exception
> > [ 1389.285045][ T57] Shutting down cpus with NMI
> > [ 1389.297949][ T57] Kernel Offset: disabled
> > [ 1389.299174][ T57] ---[ end Kernel panic - not syncing: Fatal exception ]---
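
The two checks discussed in this mail (which hypercall instruction sits at the faulting address, and which vendor string the guest sees) could be scripted roughly as follows. This is a sketch under stated assumptions, not an existing tool: the helper names faulting_hypercall and vendor_id are hypothetical, the opcode table only contains the two encodings named above (VMCALL = 0F 01 C1, VMMCALL = 0F 01 D9), and the parser assumes the oops "Code:" line marks the faulting byte with angle brackets as in the quoted log.

```python
import re

# Hypercall encodings as given in the mail: Intel VMCALL and AMD VMMCALL.
HYPERCALLS = {
    (0x0F, 0x01, 0xC1): "VMCALL (Intel)",
    (0x0F, 0x01, 0xD9): "VMMCALL (AMD)",
}


def faulting_hypercall(code_line):
    """Name the hypercall starting at the <xx>-marked byte of an oops
    'Code:' line, or return None if it is not one of the two encodings."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # The marked byte plus the two bytes that follow it.
            raw = [tok.strip("<>")] + tokens[i + 1:i + 3]
            if len(raw) < 3:
                return None
            try:
                key = tuple(int(b, 16) for b in raw)
            except ValueError:
                return None
            return HYPERCALLS.get(key)
    return None


def vendor_id(cpuinfo_text):
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    m = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    return m.group(1) if m else None


# Tail of the Code: line from the quoted oops.
print(faulting_hypercall("Code: 6d 2f c5 00 31 db <0f> 01 c1 83 05 10 6d"))
```

Inside the guest, vendor_id(open("/proc/cpuinfo").read()) would give the string to compare against 'GenuineIntel'; if it reports 'GenuineIntel' while the host is AMD, that would be consistent with the theory above.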