From: Alistair Popple <apopple@nvidia.com>
To: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Cc: "Yedireswarapu, SaiX Nandan" <saix.nandan.yedireswarapu@intel.com>,
"Saarinen, Jani" <jani.saarinen@intel.com>,
"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@intel.com>,
"Nikula, Jani" <jani.nikula@intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
dan.carpenter@linaro.org
Subject: Re: Regression in linux-next
Date: Tue, 25 Jul 2023 23:15:25 +1000
Message-ID: <87o7k0xh9u.fsf@nvdebian.thelocal>
In-Reply-To: <SJ1PR11MB61296D265E3407D447188EF6B903A@SJ1PR11MB6129.namprd11.prod.outlook.com>

Thanks, Chaitanya, for the detailed report. Dan Carpenter also reported a
Smatch warning for this:
https://lore.kernel.org/linux-mm/38ed0627-1283-4da2-827a-e90484d7bd7d@moroto.mountain/

The patch below should fix the problem; I will respin the series to
include the fix.
---
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 63c8eb740af7..ec3b068cbbe6 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -621,9 +621,10 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
 	 * Subsystems should only register for invalidate_secondary_tlbs() or
 	 * invalidate_range_start()/end() callbacks, not both.
 	 */
-	if (WARN_ON_ONCE(subscription->ops->arch_invalidate_secondary_tlbs &&
-			 (subscription->ops->invalidate_range_start ||
-			  subscription->ops->invalidate_range_end)))
+	if (WARN_ON_ONCE(subscription &&
+			 (subscription->ops->arch_invalidate_secondary_tlbs &&
+			  (subscription->ops->invalidate_range_start ||
+			   subscription->ops->invalidate_range_end))))
 		return -EINVAL;
 
 	if (!mm->notifier_subscriptions) {

"Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> writes:
> Hello Alistair,
>
> Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
>
> This mail is regarding a regression we are seeing in our CI runs [1] on the
> linux-next repository.
>
> On next-20230720 [2], we are seeing the following error:
>
> <4>[ 76.189375] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P DDR5 SODIMM SBS RVP, BIOS MTLPFWI1.R00.3271.D81.2307101805 07/10/2023
> <4>[ 76.202534] RIP: 0010:__mmu_notifier_register+0x40/0x210
> <4>[ 76.207804] Code: 1a 71 5a 01 85 c0 0f 85 ec 00 00 00 48 8b 85 30 01 00 00 48 85 c0 0f 84 04 01 00 00 8b 85 cc 00 00 00 85 c0 0f 8e bb 01 00 00 <49> 8b 44 24 10 48 83 78 38 00 74 1a 48 83 78 28 00 74 0c 0f 0b b8
> <4>[ 76.226368] RSP: 0018:ffffc900019d7ca8 EFLAGS: 00010202
> <4>[ 76.231549] RAX: 0000000000000001 RBX: 0000000000001000 RCX: 0000000000000001
> <4>[ 76.238613] RDX: 0000000000000000 RSI: ffffffff823ceb7b RDI: ffffffff823ee12d
> <4>[ 76.245680] RBP: ffff888102ec9b40 R08: 00000000ffffffff R09: 0000000000000001
> <4>[ 76.252747] R10: 0000000000000001 R11: ffff8881157cd2c0 R12: 0000000000000000
> <4>[ 76.259811] R13: ffff888102ec9c70 R14: ffffffffa07de500 R15: ffff888102ec9ce0
> <4>[ 76.266875] FS: 00007fbcabe11c00(0000) GS:ffff88846ec00000(0000) knlGS:0000000000000000
> <4>[ 76.274884] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[ 76.280578] CR2: 0000000000000010 CR3: 000000010d4c2005 CR4: 0000000000f70ee0
> <4>[ 76.287643] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>[ 76.294711] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> <4>[ 76.301775] PKRU: 55555554
> <4>[ 76.304463] Call Trace:
> <4>[ 76.306893] <TASK>
> <4>[ 76.308983] ? __die_body+0x1a/0x60
> <4>[ 76.312444] ? page_fault_oops+0x156/0x450
> <4>[ 76.316510] ? do_user_addr_fault+0x65/0x980
> <4>[ 76.320747] ? exc_page_fault+0x68/0x1a0
> <4>[ 76.324643] ? asm_exc_page_fault+0x26/0x30
> <4>[ 76.328796] ? __mmu_notifier_register+0x40/0x210
> <4>[ 76.333460] ? __mmu_notifier_register+0x11c/0x210
> <4>[ 76.338206] ? preempt_count_add+0x4c/0xa0
> <4>[ 76.342273] mmu_notifier_register+0x30/0xe0
> <4>[ 76.346509] mmu_interval_notifier_insert+0x74/0xb0
> <4>[ 76.351344] i915_gem_userptr_ioctl+0x21a/0x320 [i915]
> <4>[ 76.356565] ? __pfx_i915_gem_userptr_ioctl+0x10/0x10 [i915]
> <4>[ 76.362271] drm_ioctl_kernel+0xb4/0x150
> <4>[ 76.366159] drm_ioctl+0x21d/0x420
> <4>[ 76.369537] ? __pfx_i915_gem_userptr_ioctl+0x10/0x10 [i915]
> <4>[ 76.375242] ? find_held_lock+0x2b/0x80
> <4>[ 76.379046] __x64_sys_ioctl+0x79/0xb0
> <4>[ 76.382766] do_syscall_64+0x3c/0x90
> <4>[ 76.386312] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> <4>[ 76.391317] RIP: 0033:0x7fbcae63f3ab
>
> A detailed log can be found in [3].
>
> After bisecting the tree, the following patch seems to be causing the
> regression.
>
> commit 828fe4085cae77acb3abf7dd3d25b3ed6c560edf
> Author: Alistair Popple <apopple@nvidia.com>
> Date: Wed Jul 19 22:18:46 2023 +1000
>
> mmu_notifiers: rename invalidate_range notifier
>
> There are two main use cases for mmu notifiers. One is by KVM which uses
> mmu_notifier_invalidate_range_start()/end() to manage a software TLB.
>
> The other is to manage hardware TLBs which need to use the
> invalidate_range() callback because HW can establish new TLB entries at
> any time. Hence using start/end() can lead to memory corruption as these
> callbacks happen too soon/late during page unmap.
>
> mmu notifier users should therefore either use the start()/end() callbacks
> or the invalidate_range() callbacks. To make this usage clearer rename
> the invalidate_range() callback to arch_invalidate_secondary_tlbs() and
> update documentation.
>
> Link: https://lkml.kernel.org/r/9a02dde2f8ddaad2db31e54706a80c12d1817aaf.1689768831.git-series.apopple@nvidia.com
>
>
> We also verified this by reverting the patch in the tree.
>
> Could you please check why this patch causes the regression and whether we
> can find a solution for it soon?
>
> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20230720
> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20230720/bat-mtlp-6/dmesg0.txt