linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Sebastian Sewior <bigeasy@linutronix.de>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>, bpf <bpf@vger.kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Hou Tao <houtao1@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	shakeel.butt@linux.dev, Thomas Gleixner <tglx@linutronix.de>,
	Tejun Heo <tj@kernel.org>, linux-mm <linux-mm@kvack.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
Date: Fri, 13 Dec 2024 16:00:35 -0500	[thread overview]
Message-ID: <20241213160035.677810fb@gandalf.local.home> (raw)
In-Reply-To: <20241213150950.2879b7db@gandalf.local.home>

On Fri, 13 Dec 2024 15:09:50 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > What's the concern then? That PI may see an odd order of locks for this task ?
> > but it cannot do anything about it anyway, since the task won't schedule.
> > And before irq handler is over the B will be released and everything
> > will look normal again.  
> 
> The problem is the chain walk. It could also cause unwanted side effects in RT.
> 
> If low priority task 1 has lock A and is running on another CPU and low
> priority task 2 blocks on lock A and then is interrupted right before going
> to sleep as being "blocked on", and takes lock B in the interrupt context.
> We then have high priority task 3 on another CPU block on B which will then
> see that the owner of B is blocked (even though it is not blocked for B), it
> will boost its priority as well as the owner of the lock (A). The A owner
> will get boosted where it is not the task that is blocking the high
> priority task.
> 
> My point is that RT is all about deterministic behavior. It would require
> a pretty substantial audit to the PI logic to make sure that this doesn't
> cause any unexpected results.
> 
> My point is, the PI logic was not designed for taking a lock after being
> blocked on another lock. It may work, but we had better prove and show all
> side effects that can happen in these cases.

When B is released, task 2 will be unboosted, but task 1 will not. That's
because a task is only unboosted when it releases a lock (or a lock times
out, and then the waiter will unboost the chain, but that's not this case).

Task 1 will unboost when it finally releases lock A.

Another issue is that interrupts are not stopped by spin_lock_irq*() as in
RT spin_lock_irq*() is the same as spin_lock(). As spin_locks() are assumed
to only be in threaded irq context, there's no reason to actually disable
interrupts when taking one of these spin_lock_irq*().

That means we could have task 1 trying to take lock B too. If we have a
lock order of:

  A -> B

Where B is the trylock in the irq context, we have:


  CPU 1			CPU 2			CPU 3
  -----			-----			-----
 task 1 takes A
			task 2 blocks on A, gets interrupted, trylock B and succeeds:

 task 1 takes B (blocks)

						High priority task 3 blocks on B

Now we have this in the PI chain:

  Task 3 boosts tasks 2 but task 2 is blocked on A
  Task 3 then boosts owner of A which is task 1 which is blocked on lock B
  Task 3 then boosts owner of lock B which is task 2
  [ wash, rinse, repeat!!! ]

There is a safety valve in the code that will prevent an infinite loop, and
it will trigger a printk message or it may stop because the priorities are
equal, but still, this is not desirable and may even have other side
effects. If deadlock detection is enabled, this will trigger it.

Again, allowing spin_locks being taken in hard irq context in RT, even with
trylock is going to open a nasty can of worms that will make this less
deterministic and determinism is the entire point of RT. If we allow one
user to have spin_lock_trylock() in hard irq context, we have to allow
anyone to do it.

-- Steve


  reply	other threads:[~2024-12-13 21:00 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-10  2:39 [PATCH bpf-next v2 0/6] bpf, mm: Introduce __GFP_TRYLOCK Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation Alexei Starovoitov
2024-12-10  5:31   ` Matthew Wilcox
2024-12-10  9:05     ` Michal Hocko
2024-12-10 20:25       ` Shakeel Butt
2024-12-11 10:08         ` Michal Hocko
2024-12-10 22:06       ` Alexei Starovoitov
2024-12-11 10:19         ` Michal Hocko
2024-12-12 15:07         ` Sebastian Sewior
2024-12-12 15:21           ` Michal Hocko
2024-12-12 15:35             ` Sebastian Sewior
2024-12-12 15:48               ` Steven Rostedt
2024-12-12 16:00                 ` Sebastian Sewior
2024-12-13 17:44                   ` Steven Rostedt
2024-12-13 18:44                     ` Alexei Starovoitov
2024-12-13 18:57                       ` Alexei Starovoitov
2024-12-13 20:09                       ` Steven Rostedt
2024-12-13 21:00                         ` Steven Rostedt [this message]
2024-12-13 22:02                           ` Alexei Starovoitov
2024-12-12 21:57               ` Alexei Starovoitov
2024-12-10 21:42     ` Alexei Starovoitov
2024-12-10  9:01   ` Sebastian Andrzej Siewior
2024-12-10 21:53     ` Alexei Starovoitov
2024-12-11  8:38       ` Vlastimil Babka
2024-12-12  2:14         ` Alexei Starovoitov
2024-12-12  8:54           ` Vlastimil Babka
2024-12-10 18:39   ` Vlastimil Babka
2024-12-10 22:42     ` Alexei Starovoitov
2024-12-11  8:48       ` Vlastimil Babka
2024-12-10  2:39 ` [PATCH bpf-next v2 2/6] mm, bpf: Introduce free_pages_nolock() Alexei Starovoitov
2024-12-10  8:35   ` Sebastian Andrzej Siewior
2024-12-10 22:49     ` Alexei Starovoitov
2024-12-12 14:44       ` Sebastian Andrzej Siewior
2024-12-12 19:57         ` Alexei Starovoitov
2024-12-11 10:11   ` Vlastimil Babka
2024-12-12  1:43     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 3/6] locking/local_lock: Introduce local_trylock_irqsave() Alexei Starovoitov
2024-12-11 10:53   ` Vlastimil Babka
2024-12-11 11:55     ` Vlastimil Babka
2024-12-12  2:49       ` Alexei Starovoitov
2024-12-12  9:15         ` Vlastimil Babka
2024-12-13 14:02           ` Vlastimil Babka
2024-12-12 15:15   ` Sebastian Andrzej Siewior
2024-12-12 19:59     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 4/6] memcg: Add __GFP_TRYLOCK support Alexei Starovoitov
2024-12-11 23:47   ` kernel test robot
2024-12-10  2:39 ` [PATCH bpf-next v2 5/6] mm, bpf: Use __GFP_ACCOUNT in try_alloc_pages() Alexei Starovoitov
2024-12-11 12:05   ` Vlastimil Babka
2024-12-12  2:54     ` Alexei Starovoitov
2024-12-10  2:39 ` [PATCH bpf-next v2 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241213160035.677810fb@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=houtao1@huawei.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=memxor@gmail.com \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=shakeel.butt@linux.dev \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox