linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mikulas Patocka <mpatocka@redhat.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@redhat.com>,
	amd-gfx@lists.freedesktop.org, linux-mm@kvack.org,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Jann Horn" <jannh@google.com>,
	"Pedro Falcato" <pfalcato@suse.de>
Subject: Re: [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal
Date: Tue, 6 Jan 2026 21:19:59 +0100 (CET)	[thread overview]
Message-ID: <6633f8ed-f432-f4c4-3fe2-8c14248cadab@redhat.com> (raw)
In-Reply-To: <j3dy3g5mchtdzxldtnqu5nwaalbr6ec4ceim3nuu6nwcddmqjc@7dgzr4m7pli2>



On Tue, 6 Jan 2026, Liam R. Howlett wrote:

> * Mikulas Patocka <mpatocka@redhat.com> [260105 15:08]:
> > 
> > > If you only get the error message sometimes, does that mean there is
> > > another signal check that isn't covered by this change - or another call
> > > path?
> > 
> > This call path is also triggered by -EINTR from mm_take_all_locks: 
> > "init_user_pages -> amdgpu_hmm_register -> mmu_interval_notifier_insert -> 
> > mmu_notifier_register -> __mmu_notifier_register -> mm_take_all_locks -> 
> > return -EINTR". I am not expert in the GPU code, so I don't know how much 
> > serious it is.
> 
> Okay, so the other call paths also end up getting the -EINTR from this
> function?  Can you please add that detail to the commit message?

Yes. I'd like to ask the GPU people to look at it and say how much damage 
this -EINTR could do. I don't know - I just saw the messages "Failed to 
register MMU notifier: -4" in the syslog.

> This means that -EINTR can no longer be returned from open(), right?
> Otherwise you are just reducing a race condition between open() and a
> signal entering from your timer.

EINTR can be returned from open() in cases when it was historically 
behaving this way - such as opening a fifo when there is no matching 
process having it open.

But I think that opening /dev/kfd doesn't fall into this category.

NFS has an "intr" flag that makes the filesystem syscalls interruptible by 
signals. It is off by default, because many programs don't expect EINTR 
when opening, reading or writing plain files on a filesystem.

> Any other -EINTR system call will also cause you problems since you
> continuously send signals to your process, so we'll have to change them
> all for this to work?

I use SA_RESTART for the signals. And I retry all the syscalls on EINTR 
just in case SA_RESTART didn't work. So, I don't experience random 
failures in my code due to the periodic signal.

But there is code that I have no control over - such as the OpenCL shared 
library.

> This is the userspace ignoring what the error code means and just
> aborting on any error.  This is a change in behaviour on the kernel side
> to work around what they are doing.
> 
> It also sounds like it can be avoided by userspace not sending signals
> during the open process, or even to

So far, I worked around this issue by blocking all signals around 
clGetPlatformIDs and clGetDeviceIDs - but this is a hack.

> retry at a higher level if a recoverable error occurs.

If clGetDeviceIDs fails and I call clGetDeviceIDs again, it doesn't even 
attempt to open /dev/kfd again and fails right away. So, I can't work 
around it by retrying it.

> > Even if I disabled the periodic timer, the failure could be triggered by 
> > other signals, for example SIGWINCH when the user resizes the terminal, or 
> > SIGCHLD when a subprocess exits.
> 
> Those are also not random, they are expected signals caused by events.

From the process's point of view, they are random - the process doesn't 
know when the user will drag the corner of the terminal window and resize 
it. If the process spawns a subprocess, it cannot predict when will the 
subprocess exit and SIGCHLD will be delivered.

If we don't change it, we end up with unreliable software stack that can 
fail during rare events, such as dragging the corner of the window.

> I'm trying to say this git commit message is wrong and misleading.

OK, so I'll try to rewrite the commit message and submit version 4 of the 
patch.

Mikulas



  reply	other threads:[~2026-01-06 20:20 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-04 21:17 Mikulas Patocka
2026-01-05 10:42 ` Vlastimil Babka
2026-01-05 12:15   ` Lorenzo Stoakes
2026-01-05 18:15 ` Liam R. Howlett
2026-01-05 20:08   ` Mikulas Patocka
2026-01-06 17:40     ` Liam R. Howlett
2026-01-06 20:19       ` Mikulas Patocka [this message]
2026-01-06 21:56         ` Pedro Falcato
2026-01-07 20:14           ` Mikulas Patocka
2026-01-07  8:43         ` Vlastimil Babka
2026-01-07  9:25         ` Michel Dänzer
2026-01-06 11:36   ` Michel Dänzer
2026-01-06 12:52     ` Mikulas Patocka
2026-01-06 15:03       ` David Hildenbrand (Red Hat)
2026-01-07  9:55         ` Vlastimil Babka
2026-01-07 22:19           ` David Hildenbrand (Red Hat)
2026-01-06 14:57     ` Vlastimil Babka
2026-01-07  9:50 ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6633f8ed-f432-f4c4-3fe2-8c14248cadab@redhat.com \
    --to=mpatocka@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=pfalcato@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox