linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Mikulas Patocka <mpatocka@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alex Deucher <alexander.deucher@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	amd-gfx@lists.freedesktop.org, linux-mm@kvack.org,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>
Subject: Re: [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal
Date: Wed, 7 Jan 2026 10:50:27 +0100	[thread overview]
Message-ID: <5c8ccd03-9a3a-47c0-9a89-be0abf6c1fb5@amd.com> (raw)
In-Reply-To: <b672e17b-461d-16ae-e7d3-45d3c1aab142@redhat.com>

On 1/4/26 22:17, Mikulas Patocka wrote:
> If a process sets up a timer that periodically sends a signal in short
> intervals and if it executes some kernel code that calls
> mm_take_all_locks, we get random -EINTR failures.
> 
> The function mm_take_all_locks fails with -EINTR if there is pending
> signal. The -EINTR is propagated up the call stack to userspace and
> userspace fails if it gets this error.

That is perfectly expected behavior.

> In order to fix these failures, this commit changes
> signal_pending(current) to fatal_signal_pending(current) in
> mm_take_all_locks, so that it is interrupted only if the signal is
> actually killing the process.
> 
> For example, this bug happens when using OpenCL on AMDGPU. Sometimes,
> probing the OpenCL device fails (strace shows that open("/dev/kfd")
> failed with -EINTR). Sometimes we get the message "amdgpu:
> init_user_pages: Failed to register MMU notifier: -4" in the syslog.

Yeah, that's because the ROCm userspace components doesn't work well with signals (and that is documented) but that is not the fault of the kernel.

Could be that the kernel driver prints some incorrect error messages, but that's about it.

So as far as I can see you try to work around problems in ROCm userspace libraries. Please don't do that!

If you find an issue like this the correct approach is to fix ROCm libraries instead, most likely the thunk component.

Regards,
Christian.

> 
> The bug can be reproduced with the following program.
> 
> To run this program, you need AMD graphics card and the package
> "rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
> installed, because it redirects the default OpenCL implementation to
> itself.
> 
> include <stdio.h>
> include <stdlib.h>
> include <unistd.h>
> include <string.h>
> include <signal.h>
> include <sys/time.h>
> 
> define CL_TARGET_OPENCL_VERSION	300
> include <CL/opencl.h>
> 
> static void fn(void)
> {
> 	while (1) {
> 		int32_t err;
> 		cl_device_id device;
> 		err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
> 		if (err != CL_SUCCESS) {
> 			fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
> 			exit(1);
> 		}
> 		write(2, "-", 1);
> 	}
> }
> 
> static void alrm(int sig)
> {
> 	write(2, ".", 1);
> }
> 
> int main(void)
> {
> 	struct itimerval it;
> 	struct sigaction sa;
> 	memset(&sa, 0, sizeof sa);
> 	sa.sa_handler = alrm;
> 	sa.sa_flags = SA_RESTART;
> 	sigaction(SIGALRM, &sa, NULL);
> 	it.it_interval.tv_sec = 0;
> 	it.it_interval.tv_usec = 50;
> 	it.it_value.tv_sec = 0;
> 	it.it_value.tv_usec = 50;
> 	setitimer(ITIMER_REAL, &it, NULL);
> 	fn();
> 	return 1;
> }
> 
> I'm submitting this patch for the stable kernels, because this bug may
> cause random failures in any code that calls mm_take_all_locks.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html
> Link: https://yhbt.net/lore/linux-mm/6f16b618-26fc-3031-abe8-65c2090262e7@redhat.com/T/#u
> Cc: stable@vger.kernel.org
> Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation")
> 
> ---
>  mm/vma.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: mm/mm/vma.c
> ===================================================================
> --- mm.orig/mm/vma.c	2026-01-04 21:19:13.000000000 +0100
> +++ mm/mm/vma.c	2026-01-04 21:19:13.000000000 +0100
> @@ -2166,14 +2166,14 @@ int mm_take_all_locks(struct mm_struct *
>  	 * is reached.
>  	 */
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		vma_start_write(vma);
>  	}
>  
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->vm_file && vma->vm_file->f_mapping &&
>  				is_vm_hugetlb_page(vma))
> @@ -2182,7 +2182,7 @@ int mm_take_all_locks(struct mm_struct *
>  
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->vm_file && vma->vm_file->f_mapping &&
>  				!is_vm_hugetlb_page(vma))
> @@ -2191,7 +2191,7 @@ int mm_take_all_locks(struct mm_struct *
>  
>  	vma_iter_init(&vmi, mm, 0);
>  	for_each_vma(vmi, vma) {
> -		if (signal_pending(current))
> +		if (fatal_signal_pending(current))
>  			goto out_unlock;
>  		if (vma->anon_vma)
>  			list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
> 



      parent reply	other threads:[~2026-01-07  9:50 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-04 21:17 Mikulas Patocka
2026-01-05 10:42 ` Vlastimil Babka
2026-01-05 12:15   ` Lorenzo Stoakes
2026-01-05 18:15 ` Liam R. Howlett
2026-01-05 20:08   ` Mikulas Patocka
2026-01-06 17:40     ` Liam R. Howlett
2026-01-06 20:19       ` Mikulas Patocka
2026-01-06 21:56         ` Pedro Falcato
2026-01-07 20:14           ` Mikulas Patocka
2026-01-07  8:43         ` Vlastimil Babka
2026-01-07  9:25         ` Michel Dänzer
2026-01-06 11:36   ` Michel Dänzer
2026-01-06 12:52     ` Mikulas Patocka
2026-01-06 15:03       ` David Hildenbrand (Red Hat)
2026-01-07  9:55         ` Vlastimil Babka
2026-01-07 22:19           ` David Hildenbrand (Red Hat)
2026-01-06 14:57     ` Vlastimil Babka
2026-01-07  9:50 ` Christian König [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c8ccd03-9a3a-47c0-9a89-be0abf6c1fb5@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mpatocka@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox