linux-mm.kvack.org archive mirror
* [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal
@ 2026-01-04 21:17 Mikulas Patocka
  2026-01-05 10:42 ` Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Mikulas Patocka @ 2026-01-04 21:17 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Alex Deucher, Christian König, Andrew Morton,
	David Hildenbrand, amd-gfx, linux-mm, Liam R. Howlett,
	Vlastimil Babka, Jann Horn, Pedro Falcato

If a process sets up a timer that periodically sends a signal at short
intervals and then executes kernel code that calls mm_take_all_locks,
it gets random -EINTR failures.

The function mm_take_all_locks fails with -EINTR if there is a pending
signal. The -EINTR is propagated up the call stack to userspace, and
userspace fails if it gets this error.
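
For illustration only, this is roughly what a userspace caller would
have to do on an unpatched kernel if it controlled the failing call
itself. The helper name open_retry_eintr and its max_tries parameter are
hypothetical. In the OpenCL case the open happens inside the library, so
an application cannot wrap it this way. Note also that the reproducer
below installs its handler with SA_RESTART and still sees the failure,
so automatic restart evidently does not cover this path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: call open() and retry while it fails with EINTR,
 * giving up after max_tries attempts. Returns the file descriptor, or -1
 * with errno set by the last failed open().
 */
int open_retry_eintr(const char *path, int flags, int max_tries)
{
	int fd;

	assert(path != NULL);
	do {
		fd = open(path, flags);
	} while (fd < 0 && errno == EINTR && --max_tries > 0);

	return fd;
}
```

A caller would use open_retry_eintr("/dev/kfd", O_RDWR, 16) in place of
a plain open. With a timer firing every 50 microseconds a bounded retry
can still run out of attempts, which is one reason to fix this in the
kernel rather than in every caller.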

In order to fix these failures, this commit changes
signal_pending(current) to fatal_signal_pending(current) in
mm_take_all_locks, so that it is interrupted only if the signal is
actually killing the process.
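
The difference between the two checks can be sketched with a small
userspace model. This is a simplification and not kernel code: all toy_*
names are hypothetical, the pending set is a plain bitmask, and the
model assumes (as the kernel roughly does) that a signal which is going
to kill the process shows up as a pending SIGKILL.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a task: bit (sig - 1) is set while sig is pending. */
struct toy_task {
	uint64_t pending;
};

/* Mark a signal as pending on the toy task. */
void toy_send(struct toy_task *t, int sig)
{
	assert(sig >= 1 && sig <= 64);
	t->pending |= UINT64_C(1) << (sig - 1);
}

/* Models signal_pending(): true if any signal at all is pending. */
bool toy_signal_pending(const struct toy_task *t)
{
	return t->pending != 0;
}

/* Models fatal_signal_pending(): true only if SIGKILL is pending. */
bool toy_fatal_signal_pending(const struct toy_task *t)
{
	return (t->pending & (UINT64_C(1) << (SIGKILL - 1))) != 0;
}

/*
 * Models the loop shape in mm_take_all_locks: check for a signal before
 * each "VMA" and bail out with -EINTR if the selected check fires.
 */
int toy_take_all_locks(const struct toy_task *t, int nr_vmas, bool fatal_only)
{
	for (int i = 0; i < nr_vmas; i++) {
		bool stop = fatal_only ? toy_fatal_signal_pending(t)
				       : toy_signal_pending(t);
		if (stop)
			return -EINTR;
		/* The real code takes a lock here; the model does nothing. */
	}
	return 0;
}
```

In this model, a task with only SIGALRM pending gets -EINTR from the old
check (fatal_only == false) and 0 from the new one (fatal_only == true).
Once SIGKILL is pending, both variants return -EINTR.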

For example, this bug happens when using OpenCL on AMDGPU. Sometimes,
probing the OpenCL device fails (strace shows that open("/dev/kfd")
failed with -EINTR). Sometimes we get the message "amdgpu:
init_user_pages: Failed to register MMU notifier: -4" in the syslog.

The bug can be reproduced with the following program.

To run this program, you need an AMD graphics card and the package
"rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
installed, because it redirects the default OpenCL implementation to
itself.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <sys/time.h>

#define CL_TARGET_OPENCL_VERSION	300
#include <CL/opencl.h>

static void fn(void)
{
	while (1) {
		int32_t err;
		cl_device_id device;
		err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
		if (err != CL_SUCCESS) {
			fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
			exit(1);
		}
		write(2, "-", 1);
	}
}

static void alrm(int sig)
{
	write(2, ".", 1);
}

int main(void)
{
	struct itimerval it;
	struct sigaction sa;
	memset(&sa, 0, sizeof sa);
	sa.sa_handler = alrm;
	sa.sa_flags = SA_RESTART;
	sigaction(SIGALRM, &sa, NULL);
	it.it_interval.tv_sec = 0;
	it.it_interval.tv_usec = 50;
	it.it_value.tv_sec = 0;
	it.it_value.tv_usec = 50;
	setitimer(ITIMER_REAL, &it, NULL);
	fn();
	return 1;
}

I'm submitting this patch for the stable kernels, because this bug may
cause random failures in any code that calls mm_take_all_locks.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html
Link: https://yhbt.net/lore/linux-mm/6f16b618-26fc-3031-abe8-65c2090262e7@redhat.com/T/#u
Cc: stable@vger.kernel.org
Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation")

---
 mm/vma.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: mm/mm/vma.c
===================================================================
--- mm.orig/mm/vma.c	2026-01-04 21:19:13.000000000 +0100
+++ mm/mm/vma.c	2026-01-04 21:19:13.000000000 +0100
@@ -2166,14 +2166,14 @@ int mm_take_all_locks(struct mm_struct *
 	 * is reached.
 	 */
 	for_each_vma(vmi, vma) {
-		if (signal_pending(current))
+		if (fatal_signal_pending(current))
 			goto out_unlock;
 		vma_start_write(vma);
 	}
 
 	vma_iter_init(&vmi, mm, 0);
 	for_each_vma(vmi, vma) {
-		if (signal_pending(current))
+		if (fatal_signal_pending(current))
 			goto out_unlock;
 		if (vma->vm_file && vma->vm_file->f_mapping &&
 				is_vm_hugetlb_page(vma))
@@ -2182,7 +2182,7 @@ int mm_take_all_locks(struct mm_struct *
 
 	vma_iter_init(&vmi, mm, 0);
 	for_each_vma(vmi, vma) {
-		if (signal_pending(current))
+		if (fatal_signal_pending(current))
 			goto out_unlock;
 		if (vma->vm_file && vma->vm_file->f_mapping &&
 				!is_vm_hugetlb_page(vma))
@@ -2191,7 +2191,7 @@ int mm_take_all_locks(struct mm_struct *
 
 	vma_iter_init(&vmi, mm, 0);
 	for_each_vma(vmi, vma) {
-		if (signal_pending(current))
+		if (fatal_signal_pending(current))
 			goto out_unlock;
 		if (vma->anon_vma)
 			list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)




end of thread, other threads:[~2026-01-07 22:19 UTC | newest]

Thread overview: 18+ messages
2026-01-04 21:17 [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal Mikulas Patocka
2026-01-05 10:42 ` Vlastimil Babka
2026-01-05 12:15   ` Lorenzo Stoakes
2026-01-05 18:15 ` Liam R. Howlett
2026-01-05 20:08   ` Mikulas Patocka
2026-01-06 17:40     ` Liam R. Howlett
2026-01-06 20:19       ` Mikulas Patocka
2026-01-06 21:56         ` Pedro Falcato
2026-01-07 20:14           ` Mikulas Patocka
2026-01-07  8:43         ` Vlastimil Babka
2026-01-07  9:25         ` Michel Dänzer
2026-01-06 11:36   ` Michel Dänzer
2026-01-06 12:52     ` Mikulas Patocka
2026-01-06 15:03       ` David Hildenbrand (Red Hat)
2026-01-07  9:55         ` Vlastimil Babka
2026-01-07 22:19           ` David Hildenbrand (Red Hat)
2026-01-06 14:57     ` Vlastimil Babka
2026-01-07  9:50 ` Christian König
