* Re: [PATCH v4] signal: Let tasks cache one sigqueue struct.
From: Hillf Danton @ 2023-04-07 0:03 UTC
To: Sebastian Andrzej Siewior
Cc: linux-kernel, Eric W. Biederman, Ingo Molnar, Mel Gorman,
Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Vincent Guittot,
linux-mm, Linus Torvalds
On Thu, 6 Apr 2023 22:47:21 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> The sigqueue caching originated in the PREEMPT_RT tree. A few of the
> applications that were ported to Linux came from OS-9, where sending
> notifications from one task to another via a signal was a common
> communication model, so those applications are heavy signal users.
> Removing the allocation shortens the critical path, avoids locks and
> so lowers the maximal latency of the task while sending a signal.
This is needed only if slab does not work well for you.
Why not make slab work better instead of splitting one more private
cache off it just to make signal/RT richer?
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -432,7 +432,18 @@ static struct sigqueue *
>  		return NULL;
> 
>  	if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> -		q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> +
> +		if (!sigqueue_flags) {
> +			struct sighand_struct *sighand = t->sighand;
> +
> +			lockdep_assert_held(&sighand->siglock);
> +			if (sighand->sigqueue_cache) {
> +				q = sighand->sigqueue_cache;
> +				sighand->sigqueue_cache = NULL;
> +			}
> +		}
> +		if (!q)
> +			q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>  	} else {
>  		print_dropped_signal(sig);
>  	}
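For reference, the hunk above covers only the allocation side. The matching
free side of the patch is not quoted in this thread; below is a minimal
sketch of what it would have to do. The helper name, its call site and the
assumption that it runs under the owner's siglock are guesses, not the v4
patch's actual code:

/*
 * Sketch only: mirrors the alloc-side hunk above. Keep at most one
 * sigqueue per sighand; anything beyond that goes back to slab.
 * Assumes the caller holds the owner's siglock.
 */
static void __sigqueue_cache_or_free(struct sigqueue *q)
{
	struct sighand_struct *sighand = current->sighand;

	lockdep_assert_held(&sighand->siglock);

	if (!sighand->sigqueue_cache)
		sighand->sigqueue_cache = q;
	else
		kmem_cache_free(sigqueue_cachep, q);
}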
* Re: [PATCH v4] signal: Let tasks cache one sigqueue struct.
From: Matthew Wilcox @ 2023-04-07 3:53 UTC
To: Hillf Danton
Cc: Sebastian Andrzej Siewior, linux-kernel, Eric W. Biederman,
Ingo Molnar, Mel Gorman, Oleg Nesterov, Peter Zijlstra,
Thomas Gleixner, Vincent Guittot, linux-mm, Linus Torvalds
On Fri, Apr 07, 2023 at 08:03:06AM +0800, Hillf Danton wrote:
> On Thu, 6 Apr 2023 22:47:21 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > The sigqueue caching originated in the PREEMPT_RT tree. A few of the
> > applications that were ported to Linux came from OS-9, where sending
> > notifications from one task to another via a signal was a common
> > communication model, so those applications are heavy signal users.
> > Removing the allocation shortens the critical path, avoids locks and
> > so lowers the maximal latency of the task while sending a signal.
It might lower the _average_ latency, but it certainly doesn't lower
the _maximum_ latency, because you have to assume the worst-case
scenario for maximum latency. Which is that there's no sigqueue cached,
so you have to go into the slab allocator. And again, the worst case is
that you have to go into the page allocator from there, and further that
you have to run reclaim, and ...
What I find odd about the numbers that you quote:
> The numbers for a system boot followed by an allmod kernel build:
> Out of 333216 allocations, 194876 (~58%) were served from the cache.
> Of all free invocations, 4212 were in a path where caching is not done
> and 329002 sigqueues were cached.
... is that there are no absolute numbers. Does it save 1% of the cost
of sending a signal? 10%? What does perf say about the cost saved
by no longer going into slab? Because the fast path in slab is very
fast. It might even be quicker than your fast path for multithreaded
applications which have threads running on different NUMA nodes.
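One way to produce such absolute numbers would be a small self-signaling
microbenchmark; a rough sketch follows (not from the thread, the loop count
is arbitrary, and the figure includes delivery overhead, not just the
allocation). Because the signal is delivered before sigqueue() returns,
each iteration allocates and frees one struct sigqueue, so with the patch
applied the freed entry should be recycled on the next iteration:

/* Average cost of one sigqueue() round trip to ourselves. */
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define N 1000000

static void handler(int sig, siginfo_t *info, void *ctx)
{
	/* Nothing to do; we only want the queue/deliver/free cycle. */
	(void)sig; (void)info; (void)ctx;
}

int main(void)
{
	struct sigaction sa = { .sa_sigaction = handler,
				.sa_flags = SA_SIGINFO };
	union sigval val = { .sival_int = 0 };
	struct timespec start, end;

	sigaction(SIGRTMIN, &sa, NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < N; i++)
		sigqueue(getpid(), SIGRTMIN, val);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Report nanoseconds per send+deliver round trip. */
	double ns = (end.tv_sec - start.tv_sec) * 1e9 +
		    (end.tv_nsec - start.tv_nsec);
	printf("%.1f ns per sigqueue()\n", ns / N);
	return 0;
}

Running this under perf record before and after the patch would show
directly how much of the profile kmem_cache_alloc()/kmem_cache_free()
account for.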
* Re: [PATCH v4] signal: Let tasks cache one sigqueue struct.
From: Sebastian Andrzej Siewior @ 2023-05-24 15:33 UTC
To: Matthew Wilcox
Cc: Hillf Danton, linux-kernel, Eric W. Biederman, Ingo Molnar,
Mel Gorman, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner,
Vincent Guittot, linux-mm, Linus Torvalds
On 2023-04-07 04:53:27 [+0100], Matthew Wilcox wrote:
> On Fri, Apr 07, 2023 at 08:03:06AM +0800, Hillf Danton wrote:
> > On Thu, 6 Apr 2023 22:47:21 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > The sigqueue caching originated in the PREEMPT_RT tree. A few of the
> > > applications that were ported to Linux came from OS-9, where sending
> > > notifications from one task to another via a signal was a common
> > > communication model, so those applications are heavy signal users.
> > > Removing the allocation shortens the critical path, avoids locks and
> > > so lowers the maximal latency of the task while sending a signal.
>
> It might lower the _average_ latency, but it certainly doesn't lower
> the _maximum_ latency, because you have to assume the worst-case
> scenario for maximum latency. Which is that there's no sigqueue cached,
> so you have to go into the slab allocator. And again, the worst case is
> that you have to go into the page allocator from there, and further that
> you have to run reclaim, and ...
Yes. The idea, in general, is not to send more signals in parallel than
there are cached slots available.
> What I find odd about the numbers that you quote:
>
> > The numbers for a system boot followed by an allmod kernel build:
> > Out of 333216 allocations, 194876 (~58%) were served from the cache.
> > Of all free invocations, 4212 were in a path where caching is not done
> > and 329002 sigqueues were cached.
>
> ... is that there are no absolute numbers. Does it save 1% of the cost
> of sending a signal? 10%? What does perf say about the cost saved
> by no longer going into slab? Because the fast path in slab is very
> fast. It might even be quicker than your fast path for multithreaded
> applications which have threads running on different NUMA nodes.
I asked for updated numbers, and the improvement is not as big as
initially reported; there might have been a change in the test
configuration. I'm sorry, but I didn't get any numbers to back anything
up. I'm dropping the effort here, thanks for the review :)
Sebastian