Re: [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@redhat.com>,
	amd-gfx@lists.freedesktop.org, linux-mm@kvack.org,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Jann Horn" <jannh@google.com>,
	"Pedro Falcato" <pfalcato@suse.de>
Subject: Re: [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal
Date: Mon, 5 Jan 2026 13:15:54 -0500	[thread overview]
Message-ID: <7whbqlfrwjr4z2d4bpny3rjyl5tetdyx7ccf52uvby7hgywoym@6l6m2xcytez7> (raw)
In-Reply-To: <b672e17b-461d-16ae-e7d3-45d3c1aab142@redhat.com>

* Mikulas Patocka <mpatocka@redhat.com> [260104 16:17]:
> If a process sets up a timer that periodically sends a signal in short
> intervals and if it executes some kernel code that calls
> mm_take_all_locks, we get random -EINTR failures.
> 
> The function mm_take_all_locks fails with -EINTR if there is pending
> signal. The -EINTR is propagated up the call stack to userspace and
> userspace fails if it gets this error.
> 
> In order to fix these failures, this commit changes
> signal_pending(current) to fatal_signal_pending(current) in
> mm_take_all_locks, so that it is interrupted only if the signal is
> actually killing the process.

I may be missing context because I didn't get 1/3 of this patch set,
nor can I find it in my ML searching.  Nor did I get the cover letter,
or find it.  Is this series threaded?

What you are doing is changing a really horrible loop across all VMAs,
that happens 4 times, to a less interruptible method.

I'm not sure I'm okay with this.  Everyone else does seem fine with it,
because userspace basically never checks error codes for a retry (or
really anything, and sometimes not even for an error at all).

But this could potentially have larger consequences for those
applications that register signal handlers, couldn't it?  That is, they
expect to get a return based on some existing code but now it won't
return and the application is forced to wait for all locks to be taken
regardless of how long it takes - or to force kill the application?

We regularly get people requesting the default number of vmas be
increased.  This means that processes that approach max_map_count will
wait until 4 loops through the vmas before they can be interrupted, or
they have to kill the process.

> 
> For example, this bug happens when using OpenCL on AMDGPU. Sometimes,
> probing the OpenCL device fails (strace shows that open("/dev/kfd")
> failed with -EINTR). Sometimes we get the message "amdgpu:
> init_user_pages: Failed to register MMU notifier: -4" in the syslog.

If you only get the error message sometimes, does that mean there is
another signal check that isn't covered by this change - or another call
path?

> 
> The bug can be reproduced with the following program.
> 
> To run this program, you need AMD graphics card and the package
> "rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
> installed, because it redirects the default OpenCL implementation to
> itself.

I'm not saying it's wrong to change the signal handling, but this is
very much working around a bug in userspace constantly hammering a task
with signals and then is surprised there is a response that the kernel
was interrupted.

This seems to imply that all signal handling should only happen on fatal
signals?

...
> 
> I'm submitting this patch for the stable kernels, because this bug may
> cause random failures in any code that calls mm_take_all_locks.

They aren't random failures, they are a response to a signal sent to the
process that may be taking a very long time to do something.

I really don't see how continuously sending signals at a short interval
interrupting system calls can be considered random failures, especially
when the return is -EINTR which literally means "Interrupted system
call".  How else would you interrupt a system call, if not a signal?

I feel like we are making the code less versatile to work around the
fact that userspace didn't realise that system calls could be
interrupted without a fatal signal.

And from that view, I consider this a functional change and not a bug
fix.

Thanks,
Liam

next prev parent reply	other threads:[~2026-01-05 18:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-04 21:17 Mikulas Patocka
2026-01-05 10:42 ` Vlastimil Babka
2026-01-05 12:15   ` Lorenzo Stoakes
2026-01-05 18:15 ` Liam R. Howlett [this message]
2026-01-05 20:08   ` Mikulas Patocka
2026-01-06 17:40     ` Liam R. Howlett
2026-01-06 20:19       ` Mikulas Patocka
2026-01-06 21:56         ` Pedro Falcato
2026-01-07 20:14           ` Mikulas Patocka
2026-01-07  8:43         ` Vlastimil Babka
2026-01-07  9:25         ` Michel Dänzer
2026-01-06 11:36   ` Michel Dänzer
2026-01-06 12:52     ` Mikulas Patocka
2026-01-06 15:03       ` David Hildenbrand (Red Hat)
2026-01-07  9:55         ` Vlastimil Babka
2026-01-07 22:19           ` David Hildenbrand (Red Hat)
2026-01-06 14:57     ` Vlastimil Babka
2026-01-07  9:50 ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7whbqlfrwjr4z2d4bpny3rjyl5tetdyx7ccf52uvby7hgywoym@6l6m2xcytez7 \
    --to=liam.howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mpatocka@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox