From: Arnd Bergmann <arnd@arndb.de>
To: ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Bus IPC
Date: Sat, 30 Jul 2016 11:24:07 +0200
Message-ID: <1519332.ZM9tMjbubR@wuerfel>
In-Reply-To: <CANq1E4QvX6RMeapc8ZBRvg6UFdwCEbQMhgH=DqM6UZMkF3+T1g@mail.gmail.com>
On Friday, July 29, 2016 12:24:03 AM CEST David Herrmann wrote:
> Tom Gundersen and I would like to propose a technical session on
> in-kernel IPC systems. For roughly half a year now we have been
> developing (with others) a capability-based [1] IPC system for linux,
> called bus1 [2]. We would like to present bus1, start a discussion on
> open problems, and talk about the possible path forward for an upstream
> inclusion.
>
> While bus1 emerged out of the kdbus project, it is a new, independent
> project, designed from scratch. Its main goal is to implement an n-to-n
> communication bus on linux. A lot of inspiration is taken from both
> DBus, as well as the most commonly used IPC systems of other OSs,
> and related research projects (including Android Binder, OS-X/Hurd Mach
> IPC, Solaris Doors, Microsoft Midori IPC, seL4, Sandstorm's Cap'n'Proto,
> ..).
>
> The bus1 IPC system was designed to...
>
> o be a machine-local IPC system. It is a fast communication channel
> between local threads and processes, independent of the marshaling
> format used.
>
> o provide secure, reliable capability-based [1] communication. A
> message is always invoked on a capability; the caller must own said
> capability, or it cannot perform that operation.
>
> o efficiently support n-to-n communication. Every peer can communicate
> with every other peer (given the right capabilities), with minimal
> overhead for state-tracking.
>
> o be well-suited for both unicast and multicast messages.
>
> o guarantee a global message order [3], allowing clients to rely on
> causal ordering between messages they send and receive (for further
> reading, see Leslie Lamport's work on distributed systems [4]; a
> rough sketch of that logical-clock idea follows this list).
>
> o scale with the number of CPUs available. There is no global context
> specific to the bus1 IPC, but all communication happens based on
> local context only. That is, if two independent peers never talk to
> each other, their operations never share any memory (no shared
> locks, no shared state, etc.).
>
> o avoid any in-kernel buffering and rather transfer data directly
> from a sender into the receiver's mappable queue (single-copy).
>
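
As a side note on the causal-ordering goal above: Lamport's logical
clocks [4] boil down to stamping every message with a per-peer counter.
The sketch below is a generic, minimal C version of that idea, written
for this discussion; it is not taken from the bus1 sources, and all
names in it are made up.

/*
 * Minimal Lamport-clock sketch (illustration only, not bus1 code).
 * Each peer keeps a logical clock; messages carry the sender's clock
 * value and receivers advance their own clock past it, which yields
 * causal ordering between sends and receives.
 */
#include <stdint.h>
#include <stdio.h>

struct peer {
        uint64_t clock;                 /* local logical clock */
};

struct message {
        uint64_t timestamp;             /* sender's clock at send time */
};

static void msg_send(struct peer *sender, struct message *m)
{
        sender->clock += 1;             /* sending is a local event */
        m->timestamp = sender->clock;
}

static void msg_recv(struct peer *receiver, const struct message *m)
{
        /* jump past the sender's stamp, then count the receive event */
        if (m->timestamp > receiver->clock)
                receiver->clock = m->timestamp;
        receiver->clock += 1;
}

int main(void)
{
        struct peer a = { 0 }, b = { 0 };
        struct message m;

        msg_send(&a, &m);
        msg_recv(&b, &m);

        /* every later event on b is now ordered after a's send */
        printf("a=%llu b=%llu\n",
               (unsigned long long)a.clock, (unsigned long long)b.clock);
        return 0;
}

With such clocks, a message that causally precedes another always
carries a lower timestamp; the single global order bus1 promises is a
stronger property built on top of this idea.
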
> A user-space implementation of bus1 (or even any bus-based IPC) was
> considered, but was found to have several seemingly unavoidable issues.
>
> o To guarantee reliable, global message ordering including multicasts,
> as well as to provide reliable capabilities, a bus-broker is
> required. In other words, the current linux syscall API is not
> sufficient to implement the design as described above in an efficient
> way without a dedicated, trusted, privileged process that manages the
> bus and routes messages between the peers.
>
> o Whenever a bus-broker is involved, any message transaction between
> two clients requires the broker process to execute code in its own
> time-slice. While this time-slice can be distributed fairly across
> clients, it is ultimately always accounted to the user of the broker,
> rather than to the originating user. Kernel time-slice accounting and
> the accounting in the broker are completely separate and cannot make
> decisions based on each other's data.
> Furthermore, the broker needs to be run with quite excessive resource
> limits and execution rights to be able to serve requests of high
> priority peers, making the same resources available to low priority
> peers as well.
> An in-kernel IPC mechanism removes the requirement for such a highly
> privileged bus-broker, and rather accounts any operation and resource
> exactly on the calling user, cgroup, and process.
>
> o Bus IPC often involves peers requesting services from other trusted
> peers, and waiting for a possible result before continuing. If
> said trust relationship is given, privileged processes actively want
> priority inheritance when calling into less privileged, but trusted
> processes. There is currently no known way to implement this in a
> user-space broker without requiring n^2 PI-futex pairs.
>
> o A userspace broker would entail two UDS transactions and potentially
> an extra context-switch, compared to a single bus1 transaction with
> the in-kernel broker. Our x86-benchmarks (before any serious
> optimization work has started) show that two UDS transactions are
> always slower than one bus1 transaction. On top of that comes the
> extra context switch, which has about the same cost as a full bus1
> transaction, as well as any time spent in the broker itself. With an
> imaginary no-overhead broker, we found an in-kernel broker to be >40%
> faster. The numbers will differ between machines, but the reduced
> latency is undeniable. (A rough UDS round-trip microbenchmark is
> sketched after this list.)
>
> o Accounting of inflight resources (e.g., file-descriptors) in a broker
> is completely broken. Right now, any outgoing message of a broker
> will account FDs on the broker; however, there is no way for the
> broker to track outgoing FDs. As such, it cannot attribute them to
> the original sender of the FD, which opens the door to DoS attacks.
>
> o LSMs and audit cannot hook into the broker, nor get any additional
> routing information. Thus, audit cannot log proper information, and
> LSMs need to hook into a user-space process, relying on it to
> implement the desired security model.
>
> o The kernel itself can never operate on the bus, nor provide services
> seamlessly to user-space (e.g., like netlink does), unless the bus
> itself is implemented in the kernel.
>
> o If a broker is involved, no communication can be ordered against
> side-channels. A kernel implementation, on the other hand, provides
> strong ordering against any other event happening on the system.
>
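
On the benchmark point above: a minimal way to measure the UDS side of
that comparison is a simple ping-pong microbenchmark like the sketch
below. This is a generic illustration written for this thread, not the
benchmark the bus1 authors ran; one round trip exercises the two UDS
transactions a relaying user-space broker would add per message, before
any broker CPU time or extra context switch is accounted for.

/*
 * Rough AF_UNIX ping-pong latency microbenchmark (illustration only).
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define ITERATIONS 100000

int main(void)
{
        int sv[2];
        char buf[64] = "ping";
        struct timespec start, end;
        double ns;

        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }

        if (fork() == 0) {
                close(sv[0]);           /* child: echo everything back */
                while (read(sv[1], buf, sizeof(buf)) > 0)
                        write(sv[1], buf, sizeof(buf));
                _exit(0);
        }
        close(sv[1]);

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERATIONS; i++) {
                write(sv[0], buf, sizeof(buf));
                read(sv[0], buf, sizeof(buf));
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        ns = (end.tv_sec - start.tv_sec) * 1e9 +
             (end.tv_nsec - start.tv_nsec);
        printf("avg round trip: %.0f ns\n", ns / ITERATIONS);

        close(sv[0]);                   /* lets the child's read() hit EOF */
        wait(NULL);
        return 0;
}

Comparing that number against a single bus1 send/receive on the same
machine would reproduce the kind of latency argument made above.
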
> The implementation of bus1.ko, with its <5k LOC, is relatively small, but
> still takes a considerable amount of time to review and understand. We
> would like to use the kernel-summit as an opportunity to present bus1,
> and answer questions on its design, implementation, and use of other
> kernel subsystems. We encourage everyone to look into the sources, but
> we still believe that an in-person discussion up-front would save
> everyone a lot of time and energy. Furthermore, it would allow us to
> collectively solve remaining issues.
>
> Everyone interested in IPC is invited to the discussion. In particular,
> we would welcome everyone who participated in the Binder and kdbus
> discussions, or is involved in shmem+memcg (or other bus1-related
> subsystems), possibly including:
>
> o Andy Lutomirski
> o Greg Kroah-Hartman
> o Steven Rostedt
> o Eric W. Biederman
> o Jiri Kosina
> o Borislav Petkov
> o Michal Hocko (memcg)
> o Johannes Weiner (memcg)
> o Hugh Dickins (shmem)
> o Tom Gundersen (bus1)
> o David Herrmann (bus1)
I'd like to join in discussing the user interface. The current version
seems (compared to kdbus) simple enough that we could consider using
syscalls instead of a miscdev.
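
To make that contrast concrete, the difference from user space would
look roughly like the snippet below. Every identifier and constant in
it is a placeholder invented for illustration; this is not the bus1
ABI, and no bus1 syscalls exist.

/*
 * Placeholder sketch of the two user-space entry styles under
 * discussion; all names below are hypothetical.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

struct bus1_cmd_send {                  /* hypothetical argument struct */
        unsigned long flags;
};

/* hypothetical ioctl command and syscall number, illustration only */
#define BUS1_CMD_SEND   _IOWR('b', 0x01, struct bus1_cmd_send)
#define __NR_bus1_send  1000

int main(void)
{
        struct bus1_cmd_send args = { 0 };

        /* miscdev style: open a character device, multiplex via ioctl() */
        int fd = open("/dev/bus1", O_RDWR | O_CLOEXEC);
        if (fd >= 0)
                ioctl(fd, BUS1_CMD_SEND, &args);

        /* dedicated-syscall style: one typed entry point per operation */
        syscall(__NR_bus1_send, fd, &args, 0);

        return 0;
}

The ioctl route keeps the whole ABI behind a device node and a
multiplexer; dedicated syscalls would give each operation its own entry
point and drop the dependency on /dev, which seems to be the appeal of
the simpler interface.
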
Arnd