linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@redhat.com>,
	amd-gfx@lists.freedesktop.org, linux-mm@kvack.org,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Jann Horn" <jannh@google.com>,
	"Pedro Falcato" <pfalcato@suse.de>
Subject: Re: [PATCH v3 2/3] mm: only interrupt taking all mm locks on fatal signal
Date: Tue, 6 Jan 2026 12:40:46 -0500	[thread overview]
Message-ID: <j3dy3g5mchtdzxldtnqu5nwaalbr6ec4ceim3nuu6nwcddmqjc@7dgzr4m7pli2> (raw)
In-Reply-To: <e60858c9-12a6-2b04-35ae-9d676f14db2a@redhat.com>

* Mikulas Patocka <mpatocka@redhat.com> [260105 15:08]:
> 
> 
> On Mon, 5 Jan 2026, Liam R. Howlett wrote:
> 
> > I may be missing context because I didn't get 1/3 of this patch set,
> > nor can I find it in my ML searching.  Nor did I get the cover letter,
> > or find it.  Is this series threaded?
> 
> You can ignore the patch 1/3, it just changes memory management test 
> suite.
> 
> > What you are doing is changing a really horrible loop across all VMAs,
> > that happens 4 times, to a less interruptible method.
> > 
> > I'm not sure I'm okay with this.  Everyone else does seem fine with it,
> > because userspace basically never checks error codes for a retry (or
> > really anything, and sometimes not even for an error at all).
> 
> Everyone else does seem fine because they don't use periodic signals :-)

I meant everyone else on this thread.

> 
> OpenCL is not the first thing that got broken by periodic signals. In the 
> past, I found bugs in Linux/alpha, Linux/hppa, Cygwin, Intel Software 
> Developer's Emulator regarding periodic signals. fork() was also buggy and 
> it was fixed by c3ad2c3b02e953ead2b8d52a0c9e70312930c3d0.
> 
> > But this could potentially have larger consequences for those
> > applications that register signal handlers, couldn't it?  That is, they
> > expect to get a return based on some existing code but now it won't
> > return and the application is forced to wait for all locks to be taken
> > regardless of how long it takes - or to force kill the application?
> 
> Do you have any userspace application that expects open("/dev/kfd") to be 
> interrupted with -EINTR and breaks when it isn't?

Your change doesn't just affect /dev/kfd though, it affects any call
path that leads to looping across all vmas four times in a row.

And since this could take a long time, I'm not sure it's a good idea to
stop responding to all non-fatal signal.

> 
> > We regularly get people requesting the default number of vmas be
> > increased.  This means that processes that approach max_map_count will
> > wait until 4 loops through the vmas before they can be interrupted, or
> > they have to kill the process.
> > 
> > > 
> > > For example, this bug happens when using OpenCL on AMDGPU. Sometimes,
> > > probing the OpenCL device fails (strace shows that open("/dev/kfd")
> > > failed with -EINTR). Sometimes we get the message "amdgpu:
> > > init_user_pages: Failed to register MMU notifier: -4" in the syslog.
> > 
> > If you only get the error message sometimes, does that mean there is
> > another signal check that isn't covered by this change - or another call
> > path?
> 
> This call path is also triggered by -EINTR from mm_take_all_locks: 
> "init_user_pages -> amdgpu_hmm_register -> mmu_interval_notifier_insert -> 
> mmu_notifier_register -> __mmu_notifier_register -> mm_take_all_locks -> 
> return -EINTR". I am not expert in the GPU code, so I don't know how much 
> serious it is.

Okay, so the other call paths also end up getting the -EINTR from this
function?  Can you please add that detail to the commit message?

This means that -EINTR can no longer be returned from open(), right?
Otherwise you are just reducing a race condition between open() and a
signal entering from your timer.

> 
> > > The bug can be reproduced with the following program.
> > > 
> > > To run this program, you need AMD graphics card and the package
> > > "rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
> > > installed, because it redirects the default OpenCL implementation to
> > > itself.
> > 
> > I'm not saying it's wrong to change the signal handling, but this is
> > very much working around a bug in userspace constantly hammering a task
> > with signals and then is surprised there is a response that the kernel
> > was interrupted.
> > 
> > This seems to imply that all signal handling should only happen on fatal
> > signals?
> 
> No - the kernel should do what applications expect. open is (according to 
> the man page) supposed to be interrupted when opening slow devices (for 
> example fifo). I'm wondering whether /dev/kfd should be considered a slow 
> device or not.

I disagree.  The kernel should do what was agreed to in an API,
otherwise our whole deck of cards falls over.  We may not even have the
resources to do what the applications expect.  One could also code an
application that expects to violate the security of the system.

This is documented behaviour from the open() call, so it certainly
sounds like an issue if userspace doesn't expect this return.

Any other -EINTR system call will also cause you problems since you
continuously send signals to your process, so we'll have to change them
all for this to work?


> 
> > ...
> > > 
> > > I'm submitting this patch for the stable kernels, because this bug may
> > > cause random failures in any code that calls mm_take_all_locks.
> > 
> > They aren't random failures, they are a response to a signal sent to the
> > process that may be taking a very long time to do something.
> > 
> > I really don't see how continuously sending signals at a short interval
> > interrupting system calls can be considered random failures, especially
> > when the return is -EINTR which literally means "Interrupted system
> > call".  How else would you interrupt a system call, if not a signal?
> 
> The AMDGPU OpenCL implementation attempts to open /dev/kfd and if it gets 
> -EINTR, it behaves as if OpenCL were unavailable - it won't report itself 
> in clGetPlatformIDs and it will make clGetDeviceIDs fail.

This is the userspace ignoring what the error code means and just
aborting on any error.  This is a change in behaviour on the kernel side
to work around what they are doing.

It also sounds like it can be avoided by userspace not sending signals
during the open process, or even to retry at a higher level if a
recoverable error occurs.

Again, I'm not saying that this change isn't the right thing to do, but
it's not the kernels fault that userspace is aborting on -EINTR.  And
it's not the kernels fault that the userspace application is sending
signals during a system call that will respond to those signals.

This is changing the kernels behaviour to avoid having to deal with two
cascading failures in userspace: 1. The opencl not retrying on
interrupted open calls and 2. sending signals to a process in the middle
of a system call that will respond to signals.

I am saying that calling this a bug fix is wrong.

Even though Vlastimil has pointed out that this locking did not happen
prior to 7906d00cd1f6, the contract to userspace for the open() system
call could still return -EINTR and that change predates amdgpu's kfd in
2015.  So I don't consider the 'kernel broke userspace' argument in this
case.

> 
> So, we are dealing with random failures - any single signal received at 
> wrong time can make OpenCL fail.

I feel like we are working with two different definitions of random.  I
would consider a failure caused by a periodic signal to be an expected
and predictable event.

"random failures in any code that calls mm_take_all_locks" makes it
sound like we're going to hit some uninitialized variable that sometimes
isn't zero and exit, but really we're receiving a signal that may need
to do something within a reasonable amount of time.

> 
> Even if I disabled the periodic timer, the failure could be triggered by 
> other signals, for example SIGWINCH when the user resizes the terminal, or 
> SIGCHLD when a subprocess exits.

Those are also not random, they are expected signals caused by events.

I'm trying to say this git commit message is wrong and misleading.

> 
> > I feel like we are making the code less versatile to work around the
> > fact that userspace didn't realise that system calls could be
> > interrupted without a fatal signal.
> > 
> > And from that view, I consider this a functional change and not a bug
> > fix.
> > 
> > Thanks,
> > Liam
> 
> In practice, I use 10ms timer and I get occasional OpenCL failures. In the 
> test example, I used 50us timer, so that it is reproduced reliably - but 
> decreasing the timer frequency doesn't fix the failure, it just makes it 
> happen less often.

This does not fit my definition of random, bug, or failure.

Thanks,
Liam




  reply	other threads:[~2026-01-06 17:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-04 21:17 Mikulas Patocka
2026-01-05 10:42 ` Vlastimil Babka
2026-01-05 12:15   ` Lorenzo Stoakes
2026-01-05 18:15 ` Liam R. Howlett
2026-01-05 20:08   ` Mikulas Patocka
2026-01-06 17:40     ` Liam R. Howlett [this message]
2026-01-06 20:19       ` Mikulas Patocka
2026-01-06 21:56         ` Pedro Falcato
2026-01-07 20:14           ` Mikulas Patocka
2026-01-07  8:43         ` Vlastimil Babka
2026-01-07  9:25         ` Michel Dänzer
2026-01-06 11:36   ` Michel Dänzer
2026-01-06 12:52     ` Mikulas Patocka
2026-01-06 15:03       ` David Hildenbrand (Red Hat)
2026-01-07  9:55         ` Vlastimil Babka
2026-01-07 22:19           ` David Hildenbrand (Red Hat)
2026-01-06 14:57     ` Vlastimil Babka
2026-01-07  9:50 ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=j3dy3g5mchtdzxldtnqu5nwaalbr6ec4ceim3nuu6nwcddmqjc@7dgzr4m7pli2 \
    --to=liam.howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mpatocka@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox