From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"David Hildenbrand" <david@redhat.com>,
amd-gfx@lists.freedesktop.org, linux-mm@kvack.org,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Jann Horn" <jannh@google.com>,
"Pedro Falcato" <pfalcato@suse.de>
Subject: Re: [PATCH v2] fix AMDGPU failure with periodic signal
Date: Fri, 2 Jan 2026 19:02:40 +0000
Message-ID: <0826eb09-216c-4d00-b4eb-ed1a2ba204bf@lucifer.local>
In-Reply-To: <6f16b618-26fc-3031-abe8-65c2090262e7@redhat.com>
+cc literally everyone you should have cc'd in mm :/

Hi Mikulas,

You really need to check MAINTAINERS: you've sent a patch that changes mm/vma.c
without cc'ing a single maintainer or reviewer of that file. I only noticed
this by chance; even lei seemed to mess up the file query for some reason.

I'm confused in general about this patch. You sent it on 7th Nov, it was
ignored until now, and then it was taken without review into the hotfixes
queue?

Andrew - what's going on here? The patch looks fine, but we do need to be made
aware of this stuff!

And it's seemingly against a specific stable version?... I guess this code is
antiquated, so it's safe, but still.

Thanks, Lorenzo

On Fri, Nov 07, 2025 at 06:48:01PM +0100, Mikulas Patocka wrote:
> If a process sets up a timer that periodically sends a signal in short
> intervals and if it uses OpenCL on AMDGPU at the same time, we get random
> errors. Sometimes, probing the OpenCL device fails (strace shows that
> open("/dev/kfd") failed with -EINTR). Sometimes we get the message
> "amdgpu: init_user_pages: Failed to register MMU notifier: -4" in the
> syslog.
>
> The bug can be reproduced with this program:
> http://www.jikos.cz/~mikulas/testcases/opencl/opencl-bug-small.c
>
> The root cause for these failures is in the function mm_take_all_locks.
> This function fails with -EINTR if there is a pending signal. The -EINTR
> is propagated up the call stack to userspace, and userspace fails if it
> gets this error.
>
> There is the following call chain: kfd_open -> kfd_create_process ->
> create_process -> mmu_notifier_get -> mmu_notifier_get_locked ->
> __mmu_notifier_register -> mm_take_all_locks -> "return -EINTR"
>
> If the failure happens in init_user_pages, there is the following call
> chain: init_user_pages -> amdgpu_hmm_register ->
> mmu_interval_notifier_insert -> mmu_notifier_register ->
> __mmu_notifier_register -> mm_take_all_locks -> "return -EINTR"
>
> In order to fix these failures, this commit changes
> signal_pending(current) to fatal_signal_pending(current) in
> mm_take_all_locks, so that it is interrupted only if the signal is
> actually killing the process.
>
> Also, this commit skips pr_err in init_user_pages if the process is being
> killed - in this situation, there was no error and so we don't want to
> report it in the syslog.
>
> I'm submitting this patch for the stable kernels, because this bug may
> cause random failures in any OpenCL code.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 +++++++--
> mm/vma.c | 8 ++++----
> 2 files changed, 11 insertions(+), 6 deletions(-)
>
> Index: linux-6.17.7/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> ===================================================================
> --- linux-6.17.7.orig/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ linux-6.17.7/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1069,8 +1069,13 @@ static int init_user_pages(struct kgd_me
>
>  	ret = amdgpu_hmm_register(bo, user_addr);
>  	if (ret) {
> -		pr_err("%s: Failed to register MMU notifier: %d\n",
> -		       __func__, ret);
> +		/*
> +		 * If we got EINTR because the process was killed, don't report
> +		 * it, because no error happened.
> +		 */
> +		if (!(fatal_signal_pending(current) && ret == -EINTR))
> +			pr_err("%s: Failed to register MMU notifier: %d\n",
> +			       __func__, ret);
>  		goto out;
>  	}
>
> Index: linux-6.17.7/mm/vma.c
> ===================================================================
> --- linux-6.17.7.orig/mm/vma.c
> +++ linux-6.17.7/mm/vma.c
> @@ -2175,14 +2175,14 @@ int mm_take_all_locks(struct mm_struct *
>  	 * is reached.
>  	 */
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		vma_start_write(vma);
>  	}
>
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->vm_file && vma->vm_file->f_mapping &&
>  				is_vm_hugetlb_page(vma))
> @@ -2191,7 +2191,7 @@ int mm_take_all_locks(struct mm_struct *
>
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->vm_file && vma->vm_file->f_mapping &&
>  				!is_vm_hugetlb_page(vma))
> @@ -2200,7 +2200,7 @@ int mm_take_all_locks(struct mm_struct *
>
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->anon_vma)
>  			list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
>
>