[Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture

ksummit.lists.linux.dev archive mirror

* [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
@ 2018-09-12  1:18 Tiejun Chen
  2018-09-12 10:35 ` Linus Walleij
  0 siblings, 1 reply; 7+ messages in thread
From: Tiejun Chen @ 2018-09-12  1:18 UTC (permalink / raw)
  To: ksummit-discuss

Linux does very well in embedded systems, and there is no doubt that IoT, the Internet of Things, keeps growing. IoT is in fact made up of a variety of embedded systems, so Linux can definitely still contribute to it. The issue is that many IoT systems are deployed in critical infrastructure such as energy generation, oil and gas, avionics, automotive, etc., where software contexts need to be certified according to different specifications like ARINC 653, Automotive Safety Integrity Level, and so on. So we need to explore making Linux itself certified.

There are some efforts, such as SIL2LinuxMP, which "aims at the certification of the base components of an embedded GNU/Linux RTOS running on a single-core or multi-core industrial COTS computer board. Base components are boot loader, root filesystem, Linux kernel and C library bindings to access the Linux kernel. With the exception of a minimal set of utilities to inspect the system, manage files and start test procedures, user space applications are not included." But it looks like it is still at an early stage and not yet good enough for production environments.

So could we discuss together whether, and how, we can build a safety-critical Linux system architecture? Here are some potential technologies:

A minimal kernel with some lightweight stacks.
Every system and user service is containerized, possibly with hardware-assisted technologies like SGX.
It is based on the unikernel concept.
A single process, or even a single thread, is confined there.
It might have system audit done by a hypervisor.
How to verify Linux in such a safety-critical environment.

But our safety-critical Linux system architecture need not be limited to these factors.

Thanks
Tiejun

* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-12  1:18 [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture Tiejun Chen
@ 2018-09-12 10:35 ` Linus Walleij
  2018-09-12 16:29   ` Darren Hart
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Walleij @ 2018-09-12 10:35 UTC (permalink / raw)
  To: tiejunc; +Cc: ksummit-discuss

On Wed, Sep 12, 2018 at 3:18 AM Tiejun Chen <tiejunc@vmware.com> wrote:

> software contexts need to be certified according to different specifications
> like ARINC 653, Automotive Safety Integrity Level, and so on. So we
> need to explore making Linux itself certified.

There is a bunch of these certifications and specifications. Many of them
include manual review of "all code on the system", which is why several
approaches to this include stripping down the kernel source to only
the code (after removing all Kconfig buzz and ifdefs) that will compile
and run on the target.

This should of course be possible to integrate into the existing Linux
build system, like "make sources" that would create a reduced
kernel tree that will also compile (Russian matryoshka dolls come to
mind). I think such projects exist in Japan but I haven't heard from
them recently.
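
As a rough illustration of the stripping step described above, the sketch
below drops the branches of #ifdef/#ifndef blocks that a given configuration
disables. It is a toy and not an existing kernel tool: the function name
strip_ifdefs and the CONFIG_FOO/CONFIG_BAR symbols are made up for
illustration, and it only understands #ifdef, #ifndef, #else and #endif
written without a space after the '#' (no #if expressions, no #elif). A real
"make sources" would need a fully preprocessor-aware tool.

```python
def strip_ifdefs(lines, enabled):
    """Return only the lines that are active for the given defined symbols.

    Toy sketch: handles #ifdef, #ifndef, #else and #endif, with nesting.
    `enabled` is the set of symbols considered defined (hypothetical names).
    """
    out = []
    stack = []      # one (parent_active, condition) pair per open block
    active = True   # are we currently inside an enabled region?
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#ifdef", "#ifndef"):
            if len(words) < 2:
                raise ValueError("missing symbol after " + directive)
            cond = words[1] in enabled
            if directive == "#ifndef":
                cond = not cond
            stack.append((active, cond))
            active = active and cond
        elif directive == "#else":
            if not stack:
                raise ValueError("#else without matching #ifdef")
            parent, cond = stack[-1]
            stack[-1] = (parent, not cond)
            active = parent and not cond
        elif directive == "#endif":
            if not stack:
                raise ValueError("#endif without matching #ifdef")
            parent, _ = stack.pop()
            active = parent
        elif active:
            out.append(line)
    if stack:
        raise ValueError("unterminated #ifdef block")
    return out


source = [
    "int probe(void)",
    "{",
    "#ifdef CONFIG_FOO",
    "\tfoo_init();",
    "#else",
    "\tfoo_stub();",
    "#endif",
    "#ifndef CONFIG_BAR",
    "\tno_bar();",
    "#endif",
    "\treturn 0;",
    "}",
]
reduced = strip_ifdefs(source, {"CONFIG_FOO"})
print("\n".join(reduced))
```

With only CONFIG_FOO enabled, the reduced listing keeps the foo_init() and
no_bar() calls and drops foo_stub(), which is the kind of "only the code
that will compile" view a reviewer would read.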

My pet peeve is that the review process appears to be something
along the lines that a "certified person/consultant" who has training
in this standard is supposed to review all the code for safety, so
after reviewing IMO we should work on (A) making sure that these
reviews and comments and the exact lines of the kernel and which
version/commit ID of it it pertains to is made public and (B) that the
review person's statement be merged into the kernel git log as
some kind of annotation along the lines of:

Reviewed-for-ISO-26262-by: Linus Walleij <linus.walleij@linaro.org>

This way we can help the safety community by making the safety
critical review process more open and public, and also making these
reviewers put their personal names behind it.
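
A trailer like the Reviewed-for-ISO-26262-by line above could be collected
by simple tooling. The following is a minimal sketch under assumptions: the
trailer tag is only the proposal from this message and not an existing
kernel convention, and the function name safety_reviews, the sample commit
text and the "A Developer" sign-off are hypothetical. In practice the commit
message text could come from something like "git log --format=%B".

```python
import re

# Matches the proposed trailer, e.g.
#   Reviewed-for-ISO-26262-by: Name <address>
# The tag format is an assumption taken from the proposal, not a kernel rule.
TRAILER_RE = re.compile(
    r"^Reviewed-for-(?P<standard>[A-Za-z0-9-]+)-by:\s*"
    r"(?P<name>.+?)\s*<(?P<email>[^<>]+)>$"
)


def safety_reviews(commit_message):
    """Return (standard, name, email) for each safety-review trailer found."""
    reviews = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            reviews.append(
                (match.group("standard"), match.group("name"), match.group("email"))
            )
    return reviews


# Hypothetical commit message used only to exercise the parser.
message = """gpio: example change

Explain what the change does.

Signed-off-by: A Developer <dev@example.org>
Reviewed-for-ISO-26262-by: Linus Walleij <linus.walleij@linaro.org>
"""

found = safety_reviews(message)
print(found)
```

Ordinary trailers such as Signed-off-by are ignored, so the result lists
only who reviewed the commit against which standard.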

If they also propose patches and follow up on them: even better.

The idea is not (as one could maybe think) that of public shaming
for all the stuff these people invariably are going to miss, but to
create a space where they can learn about the kernel and the
weaknesses pertaining to safety critical deployments and also
spread this knowledge to kernel maintainers instead of keeping
it all in their private safety engineering bubble.

Yours,
Linus Walleij

* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-12 10:35 ` Linus Walleij
@ 2018-09-12 16:29   ` Darren Hart
  2018-09-13  3:13     ` Tiejun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Darren Hart @ 2018-09-12 16:29 UTC (permalink / raw)
  To: Linus Walleij; +Cc: ksummit-discuss

On Wed, Sep 12, 2018 at 12:35:07PM +0200, Linus Walleij wrote:
> On Wed, Sep 12, 2018 at 3:18 AM Tiejun Chen <tiejunc@vmware.com> wrote:
> 
> > software contexts need to be certified according to different specifications
> > like ARINC 653, Automotive Safety Integrity Level, and so on. So we
> > need to explore making Linux itself certified.
> 
> There is a bunch of these certifications and specifications. Many of them
> include manual review of "all code on the system", which is why several
> approaches to this includes stripping down the kernel source to only
> the code (after removing all Kconfig buzz and ifdefs) that will compile
> and run on the target.
> 
> This should of course be possible to integrate into the existing Linux
> build system, like "make sources" that would create a reduced
> kernel tree that will also compile (russian matroska dolls come to
> mind). I think such projects exist in Japan but I haven't heard from
> them recently.
> 
> My pet peeve is that the review process appears to be something
> along the lines that a "certified person/consultant" who has training
> in this standard is supposed to review all the code for safety, so
> after reviewing IMO we should work on (A) making sure that these
> reviews and comments and the exact lines of the kernel and which
> version/commit ID of it it pertains to is made public and (B) that the
> review persons statement be merged into the kernel git log as
> some kind of annotation along the lines of:
> 
> Reviewed-for-ISO-26262-by: Linus Walleij <linus.walleij@linaro.org>
> 

Functional Safety (FuSa) is the freedom from unacceptable risk. It typically
involves safety measures that manage known and acceptable risk.

There is a significant difference between traditional Functional Safety (FuSa)
systems and systems built with Linux. As opposed to purpose-built
microprocessors and less than 100k lines of code, Linux systems on general-purpose
CPUs present a "complex" system - where "complex" refers to a system which
exhibits emergent properties - properties which can only be observed in the assembled
system, and not in the individual components.

This is significant because it requires a new approach to qualifying systems. We
cannot apply traditional hazard analysis for fault trees to systems with CPUs
made up of 7 billion transistors (each) and a pre-existing software stack with
tens of millions of lines of code.

Point being: starting to add safety-specific reviews to a pre-existing complex
software stack doesn't help. Linux will never be developed in a manner strictly
compliant with any FuSa standard (especially 26262, which is not suitable
for any complex software stack; fortunately it allows you to defer to the more
generic IEC 61508).

The problem facing us here is not "how do we make Linux safe", it is "how do we
show that Linux has been developed in such a way that it presents an acceptable
level of risk which can be managed with a defined set of safety measures".

To the point of "Architecture". While it is tempting to try to "make it safe" or
"design an architecture", safety critical systems are designed first from a
safety case.

The problem we need to solve here is not a technical Linux kernel problem. We
need to understand a set of use cases, determine safety requirements, and then
complete the methods and procedures begun by the SIL2LinuxMP project to show
that Linux (pretty much as is) can be used with an acceptable level of risk.

I do not feel Kernel Summit is the right venue for this discussion.

-- 
Darren Hart
VMware Open Source Technology Center

* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-12 16:29   ` Darren Hart
@ 2018-09-13  3:13     ` Tiejun Chen
  2018-09-13  7:57       ` Linus Walleij
  2018-09-13  9:50       ` Greg KH
  0 siblings, 2 replies; 7+ messages in thread
From: Tiejun Chen @ 2018-09-13  3:13 UTC (permalink / raw)
  To: Darren Hart, Linus Walleij; +Cc: ksummit-discuss

> -----Original Message-----
> From: Darren Hart <dvhart@infradead.org>
> Sent: Thursday, September 13, 2018 12:29 AM
> To: Linus Walleij <linus.walleij@linaro.org>
> Cc: Tiejun Chen <tiejunc@vmware.com>; ksummit-
> discuss@lists.linuxfoundation.org
> Subject: Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system
> architecture
> 
> On Wed, Sep 12, 2018 at 12:35:07PM +0200, Linus Walleij wrote:
> > On Wed, Sep 12, 2018 at 3:18 AM Tiejun Chen <tiejunc@vmware.com> wrote:
> >
> > > software contexts need to be certified according to different
> > > specifications like ARINC 653, Automotive Safety Integrity Level,
> > > and so on. So we need to explore making Linux itself certified.
> >
> > There is a bunch of these certifications and specifications. Many of
> > them include manual review of "all code on the system", which is why
> > several approaches to this includes stripping down the kernel source
> > to only the code (after removing all Kconfig buzz and ifdefs) that
> > will compile and run on the target.
> >
> > This should of course be possible to integrate into the existing Linux
> > build system, like "make sources" that would create a reduced kernel
> > tree that will also compile (russian matroska dolls come to mind). I
> > think such projects exist in Japan but I haven't heard from them
> > recently.
> >
> > My pet peeve is that the review process appears to be something along
> > the lines that a "certified person/consultant" who has training in
> > this standard is supposed to review all the code for safety, so after
> > reviewing IMO we should work on (A) making sure that these reviews and
> > comments and the exact lines of the kernel and which version/commit ID
> > of it it pertains to is made public and (B) that the review persons
> > statement be merged into the kernel git log as some kind of annotation
> > along the lines of:
> >
> > Reviewed-for-ISO-26262-by: Linus Walleij <linus.walleij@linaro.org>
> >

Thanks a lot. This is really inspiring to me in this area.

> 
> Functional Safety (FuSa) is the freedom from unacceptable risk. It typically
> involves safety measures that manage known and acceptable risk.
> 
> There is a significant difference in the traditional Functional Safety (FuSa)
> systems and the systems built with Linux. As opposed to purpose built micro
> processors and less than 100k lines of code, Linux systems on general purpose
> CPUs present a "complex" system - where "complex" is referring to a system
> which exhibits emergent properties - properties which can only be observed in
> the assembled system, and not in the individual components.
> 
> This is significant because it requires a new approach to qualifying systems. We
> cannot apply traditional hazard analysis for fault trees to systems with CPUs
> made up of 7 Billion transistors (each) and a pre-existing software stack with 10s
> of millions lines of code.
> 
> Point being: starting to add safety specific reviews to a pre-existing complex
> software stack doesn't help. Linux will never be developed in a strictly
> compliant manner to any FuSa standard (especially 26262 which is not suitable
> for any complex software stack, fortunately it allows you to defer to the more
> generic IEC 61508).
> 
> The problem facing us here is not "how do we make Linux safe", it is "how do
> we show that Linux has been developed in such a way that it presents an
> acceptable level of risk which can be managed with a defined set of safety
> measures".
> 
> To the point of "Architecture". While it is tempting to try to "make it safe" or
> "design an architecture", safety critical systems are designed first from a safety
> case.
> 
> The problem we need to solve here is not a technical Linux kernel problem. We
> need to understand a set of use cases, determine safety requirements, and then
> complete the methods and procedures begun by the SIL2LinuxMP project to
> show that Linux (pretty much as is) can be used with an acceptable level of risk.
> 
> I do not feel Kernel Summit is the right venue for this discussion.
> 

I cannot understand why we cannot make this over there.

On the one hand, we already have several approaches for bringing the Linux kernel into such safety-critical environments, like SIL2LinuxMP, Jailhouse and so on. Since OSS NA, Intel has even announced that Clear Linux could be considered a good candidate. And some new {software, hardware} features have been introduced into the Linux kernel in recent years. So on my side, I'd like to see some potential incorporation of these existing technologies. At this point it's worth discussing what Linux itself can do right now, and what the Linux kernel itself could do in some ways.

On the other hand, take what you said: "understand a set of use cases, determine safety requirements, and then complete the methods and procedures". Yes, I tend to agree that we need to make these things very clear, but this doesn't mean we shouldn't talk about Linux itself now, because we already have fundamental issues such as:
1. Real-time issue: we need to make Linux an RTOS to meet safety-critical requirements.
2. Partitioning {software, hardware} resources: we need strong barriers that provide evidence that one program can't interact with another in any way, including shared memory, interrupts, etc.
3. How to "remove" or disable any unnecessary or unused code in a safety-critical environment.
4. Documentation on safety and security in Linux.
5. ...

Thanks
Tiejun



* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-13  3:13     ` Tiejun Chen
@ 2018-09-13  7:57       ` Linus Walleij
  2018-09-13  9:50       ` Greg KH
  1 sibling, 0 replies; 7+ messages in thread
From: Linus Walleij @ 2018-09-13  7:57 UTC (permalink / raw)
  To: tiejunc; +Cc: ksummit-discuss

On Thu, Sep 13, 2018 at 5:13 AM Tiejun Chen <tiejunc@vmware.com> wrote:
> > On Wed, Sep 12, 2018 at 12:35:07PM +0200, Linus Walleij wrote:
> > > On Wed, Sep 12, 2018 at 3:18 AM Tiejun Chen <tiejunc@vmware.com> wrote:

> > I do not feel Kernel Summit is the right venue for this discussion.

> I cannot understand why we cannot make this over there.

I have to side with Darren on this one, but I will likely not be at KS myself
so take whatever I say with a grain of salt.

I think the LF projects, specifically SIL2LinuxMP etc, need to find the
right place to gather people, and IMO the Linux Plumbers Conference
or Embedded Linux Conference would be the best place; it just intuitively
fits the type of subject.

But it is not for me to decide.

For the core kernel, I think tooling to the build system and git annotation
of safety reviews are the only things that come to mind, and it doesn't
need much discussion: someone needs to send patches. If they prove
extremely controversial and lead to lots of discussion (etc) THEN we
might want to discuss it at the kernel summit if things grind to a halt.

Yours,
Linus Walleij

* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-13  3:13     ` Tiejun Chen
  2018-09-13  7:57       ` Linus Walleij
@ 2018-09-13  9:50       ` Greg KH
  2018-09-16 11:30         ` Tiejun Chen
  1 sibling, 1 reply; 7+ messages in thread
From: Greg KH @ 2018-09-13  9:50 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: ksummit-discuss

On Thu, Sep 13, 2018 at 03:13:11AM +0000, Tiejun Chen wrote:
> On the other hand, even without something as you said, "understand a
> set of use cases, determine safety requirements, and then complete the
> methods and procedures". Yes, I tend to agree that we need to make
> these stuff clear very well, but this doesn't mean we shouldn't talk
> about Linux itself now. Because we already have fundamental issues
> right there like, 
> 1. Real time issue: we need to get Linux being RTOS to meet
> safety-critical requirements.  

So listing what is "lacking" from the existing -rt patchset would be
great, I'm sure those developers would want to know this.

Combining that with some resources to help get the remaining -rt patches
merged upstream would also be great.

> 2. Partitioning {software, hardware}resources: we need to have strong
> barrier to providing such an evidence that one program can't interact
> with another in any ways including shared memory, interrupts, etc.

What is preventing you from adding this to Linux now?

> 3. How to "remove" or disable any unnecessary or unused codes in
> safety-critical environment.

If unused code is unused, why is it an issue?

And how do you describe "unnecessary"?  Who determines this?

> 4. documentations to safety and security in Linux.

What type of documentation is lacking?

These are all very generic questions/topics, why not propose a talk for
the KS track at Plumbers for it?  Or many talks as these really are a
lot of different, individual things.

thanks,

greg k-h

* Re: [Ksummit-discuss] [TECH TOPIC] A Safety-critical Linux system architecture
  2018-09-13  9:50       ` Greg KH
@ 2018-09-16 11:30         ` Tiejun Chen
  0 siblings, 0 replies; 7+ messages in thread
From: Tiejun Chen @ 2018-09-16 11:30 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss

[snip]

> These are all very generic questions/topics, why not propose a talk for the KS
> track at Plumbers for it?  Or many talks as these really are a lot of different,
> individual things.

Greg and Linus,

Thank you for taking the time to review this.

Now I tend to agree that this would be better proposed as a candidate for a tech talk.

Thanks
Tiejun
