Discussion on: Safety-Critical Software: 15 things every developer should know

This is a great article!

I do have a few comments.

Despite being all around us, safety-critical software isn't on the average developer's radar.

Not only is it not on the average developer's radar, but it's almost certainly not on the average consumer's radar either. And as software systems get more information about our personal lives - what we look like, who we talk to, where we go - the systems become more safety-critical for more people. Maybe we aren't talking about the system itself causing bodily harm, but the system containing data that, in the wrong hands, could cause harm to the individual. We don't all need NASA levels of development processes and procedures to build systems that have runtimes of tens of thousands of years without errors, but many can learn from some techniques that go into building these critical systems.

Safety-critical software is about as far from agile as you can get

This is probably the only thing that I don't agree with at all.

Agile in safety-critical systems isn't about improving speed. In fact, nothing about Agile Software Development is about improving speed; the "speed improvement" is mostly a perception created by more frequent delivery and demonstration of software. The advantage of agility lies in responding to uncertainty and change.

The ability to respond to changing requirements is important, even in critical systems. I can't tell you the number of times that the software requirements changed because the hardware was designed and built in a manner that didn't fully support the system requirements. It was more cost effective to fix the hardware problems in software and go through the software release process than it was to redesign, manufacture, and correct the hardware in fielded systems. In some cases, the hardware was fixed for future builds of the system, but the software fix was necessary for systems already deployed in the field.

The basic values and principles still apply to safety-critical systems, and I highly recommend considering many of the techniques commonly associated with Agile Software Development, as I've personally seen them improve the quality of software components going into integration and validation.

Blaine Osepchuk • Mar 2 '20 • Edited

Hi Thomas, thanks for taking the time to leave a comment.

I totally agree with your first comment.

I have some thoughts about your second comment though. It's probably more correct to say that teams adopt agile to better respond to uncertainty and change than to go faster and reduce costs. Thanks for pointing that out.

I want to preface my next comments by stating that I've never worked on a safety-critical project. I'm just writing about what I learned from reading and watching talks about it.

Let's look at the agile principles and how they might present themselves in a large safety-critical system:

Customer satisfaction by early and continuous delivery of valuable software. (Continuous delivery is unlikely. Daily builds are possible and desirable but that's not the same thing)
Welcome changing requirements, even in late development. (I doubt they will be welcome, but they may be grudgingly accepted as the best way forward, given the enormous amount of extra work changes to requirements may entail)
Deliver working software frequently (weeks rather than months) (Doubtful. Product will be delivered after it is certified)
Close, daily cooperation between business people and developers (Possible. Desirable)
Projects are built around motivated individuals, who should be trusted (Products are built from detailed plans and documented processes. Motivated individuals are good but everybody makes mistakes so extensive checks and balances are required to ensure correctness and quality)
Face-to-face conversation is the best form of communication (co-location) (Agreed. But many large systems are built by different teams or even different companies. Colocating 500 developers is rarely practical)
Working software is the primary measure of progress (I'm not sure how this one would be viewed in a safety-critical project)
Sustainable development, able to maintain a constant pace (Definitely desirable. But I have read about death marches in safety-critical software development)
Continuous attention to technical excellence and good design (Absolutely, although the design and the code are almost certainly not created by the same people)
Simplicity—the art of maximizing the amount of work not done—is essential (Excellent goal for all projects but on a safety critical project the individual has little discretion over what not to do)
Best architectures, requirements, and designs emerge from self-organizing teams (It's more likely that people are assigned roles in the project by management. Self-organization will likely be discouraged. You must follow the process)
Regularly, the team reflects on how to become more effective, and adjusts accordingly (Great ideal. I haven't read anything about team learning or retrospectives on safety-critical projects. Training is emphasized in several documents but that's not the same thing)

So, after going through that exercise I think I agree that the agile principles can add value to a safety-critical development effort. But I think several of them are in direct conflict with the processes imposed by the standards and the nature of these projects. We are therefore unlikely to see them as significant drivers of behavior in these kinds of projects.

If you were asked to look at a safety-critical project, examine its documents and plans, and even watch people work, and then rate the project from one to ten where one was for a waterfall project and ten was for an agile project, my impression is that most people would rate safety-critical projects closer to one than ten. Would you agree with that?

Thomas J Owens • Mar 2 '20

Customer satisfaction by early and continuous delivery of valuable software. (Continuous delivery is unlikely. Daily builds are possible and desirable but that's not the same thing)

It depends on how you define "customer". Consider a value stream map in a critical system. The immediate downstream "customer" of the software development process isn't the end user. You won't be able to continuously deliver to the end user or end customer - the process of going through an assessment or validation process is simply too costly. In a hardware/software system, it's likely to be a systems integration team. It could also be an independent software quality assurance team.

Continuous integration is almost certainly achievable in critical systems. Continuous delivery (when defined as delivery to the right people) is also achievable. Depending on the type of system, continuous deployment to a test environment (perhaps even a customer-facing test environment) is possible, but it's not going to be to a production environment, as it can be for some other kinds of systems.

Welcome changing requirements, even in late development. (I doubt they will be welcome, but they may be grudgingly accepted as the best way forward, given the enormous amount of extra work changes to requirements may entail)

If you were to change a requirement after certification or validation, yeah, it's a mess. You would need to go through the certification or validation process again. That's going to be product and industry specific, but it likely takes time and costs a bunch of money.

However, changing requirements before certification or validation is a different beast. It's much easier, although it could still affect how the certification or validation is done. It also matters a lot whether it's a new or modified system requirement or a new or modified software requirement (in cases of hardware/software systems).

This is one of the harder ones, but in my experience, most of the "changing requirements" in critical systems come in one of two forms. The first is changing the software requirements to account for hardware problems so that the system still meets its system requirements. The second is reuse of the system in a new context that may require the software to be changed.

Deliver working software frequently (weeks rather than months) (Doubtful. Product will be delivered after it is certified)

Again, you can think outside the box on what it means to "deliver working software". The development team frequently delivers software not to end users or end customers but to integration and test teams. Those teams can be set up to receive a new iteration of the software every few weeks or months, create and dry run system-level verification and validation tests, and get feedback to the development teams on an appropriate cadence.

Close, daily cooperation between business people and developers (Possible. Desirable)

I don't think there's a difference here. Hardware/software systems also need this collaboration between software developers and the hardware engineering teams. There may also be independent test teams and such to collaborate with. But the ideal of collaboration is still vital. Throwing work over the wall is antithetical to not only agile values and principles, but lean values and principles.

Projects are built around motivated individuals, who should be trusted (Products are built from detailed plans and documented processes. Motivated individuals are good but everybody makes mistakes so extensive checks and balances are required to ensure correctness and quality)

Yes, you need more documentation around the product and the processes used to build it. But highly motivated individuals go a long way to supporting continuous improvement and building a high quality product.

Face-to-face conversation is the best form of communication (co-location) (Agreed. But many large systems are built by different teams or even different companies. Colocating 500 developers is rarely practical)

I'm not familiar with any instance with anywhere close to 500 developers. Maybe on an entire large scale system, but you typically build to agreed upon interfaces. Each piece may have a team or a few teams working on it. This is hard to do on large programs, but when you look at the individual products that make up that large program, it's definitely achievable.

Working software is the primary measure of progress (I'm not sure how this one would be viewed in a safety-critical project)

This goes back to defining who you deliver to. Getting working software to integration and test teams so they can integrate it with hardware and check it out helps them prepare for the real system testing much earlier. They can make sure all the tests are in place and dry run. Any test harnesses or tools can be built iteratively just like the software. Since testing usually takes a hit in project scheduling and budgeting anyway, this helps identify risk early.

Sustainable development, able to maintain a constant pace (Definitely desirable. But I have read about death marches in safety-critical software development)

I've also read about death marches in non-safety-critical software development. Other techniques, such as frequent delivery and involvement of the downstream parties, help to identify and mitigate risk early.

Continuous attention to technical excellence and good design (Absolutely, although the design and the code are almost certainly not created by the same people)

The idea of a bunch of people sitting in a room coming up with a design and then throwing it over the wall exists, but it's not as common as you'd think. When I worked in aerospace, it started with the system engineers working with senior engineers from various disciplines to figure out the building blocks. When it came to software, the development team that wrote the code also did the detailed design. The senior engineer who was involved at the system level was usually on the team that did the detailed design and coding as well.

Simplicity—the art of maximizing the amount of work not done—is essential (Excellent goal for all projects but on a safety critical project, the individual has little discretion over what not to do)

This is very closely related to the lean principle of reducing waste. It's true that there is little discretion over the requirements that need to be implemented before the system can be used, but there can still be ways to ensure that all the requirements do trace back to a need. There's also room to lean out the process and make sure that the documentation being produced is actually required to support downstream activities. Going electronic for things like bidirectional traceability among requirements, code, tests, and test results, and using tools that allow reports and artifacts to be generated and "fall out of doing the work", also go a long way toward agility in a regulated context.
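
To make the traceability point a bit more concrete, here is a minimal sketch of what "bidirectional" could mean in practice. It is not taken from any particular tool or standard: the `TraceMatrix` class, its method names, and the `REQ-…`/`TEST-…` IDs are all invented for illustration. The idea is only that every link is stored in both directions, so gaps (requirements with no test, tests that trace to no requirement) can be reported automatically instead of being compiled by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TraceMatrix:
    """Hypothetical in-memory trace store; all names are illustrative only."""

    req_to_tests: Dict[str, Set[str]] = field(default_factory=dict)
    test_to_reqs: Dict[str, Set[str]] = field(default_factory=dict)
    results: Dict[str, bool] = field(default_factory=dict)  # test id -> passed?

    def add_requirement(self, req_id: str) -> None:
        self.req_to_tests.setdefault(req_id, set())

    def add_test(self, test_id: str) -> None:
        self.test_to_reqs.setdefault(test_id, set())

    def link(self, req_id: str, test_id: str) -> None:
        # Record the link in both directions so either side can be queried.
        self.add_requirement(req_id)
        self.add_test(test_id)
        self.req_to_tests[req_id].add(test_id)
        self.test_to_reqs[test_id].add(req_id)

    def record_result(self, test_id: str, passed: bool) -> None:
        self.add_test(test_id)
        self.results[test_id] = passed

    def untested_requirements(self) -> List[str]:
        # Forward gap: requirements that no test traces to.
        return sorted(r for r, tests in self.req_to_tests.items() if not tests)

    def orphan_tests(self) -> List[str]:
        # Backward gap: tests that trace to no requirement.
        return sorted(t for t, reqs in self.test_to_reqs.items() if not reqs)

    def verified_requirements(self) -> List[str]:
        # Verified here means: at least one linked test, and every linked
        # test has a recorded passing result.
        return sorted(
            r for r, tests in self.req_to_tests.items()
            if tests and all(self.results.get(t) is True for t in tests)
        )


def example_matrix() -> TraceMatrix:
    m = TraceMatrix()
    m.link("REQ-001", "TEST-A")
    m.link("REQ-001", "TEST-B")
    m.link("REQ-002", "TEST-C")
    m.add_requirement("REQ-003")  # requirement with no test yet
    m.add_test("TEST-D")          # test that traces to no requirement
    m.record_result("TEST-A", True)
    m.record_result("TEST-B", True)
    m.record_result("TEST-C", False)
    return m
```

A real project would more likely pull these links from its requirements tool, version control, and test runner than build them by hand, but the queries (`untested_requirements`, `orphan_tests`, `verified_requirements`) are the kind of report that can "fall out of doing the work".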

Best architectures, requirements, and designs emerge from self-organizing teams (It's more likely that people are assigned roles in the project by management. Self-organization will likely be discouraged. You must follow the process)

This depends greatly on the organization and the criticality of the system being developed. It's important to realize that the regulations and guidelines around building critical systems almost always tell you what you must do, not how to do it. With the right support, a team can develop methods that facilitate agility that meet any rules they must follow.

Regularly, the team reflects on how to become more effective, and adjusts accordingly (Great ideal. I haven't read anything about team learning or retrospectives on safety-critical projects. Training is emphasized in several documents but that's not the same thing)

Retros for a safety critical project aren't that different than anything else. The biggest difference is that the team is more constrained in what they are allowed to do with their process by regulatory compliance and perhaps their organization's quality management system.

If you were asked to look at a safety-critical project, examine its documents and plans, and even watch people work, and then rate the project from one to ten where one was for a waterfall project and ten was for an agile project, my impression is that most people would rate safety-critical projects closer to one than ten. Would you agree with that?

I believe that you could get up to a 7 or 8. I think that agility is still gaining traction in the safety-critical community, and it's probably at a 1 or 2 now. It's extremely difficult to coach a development team operating in a safety critical or regulated space without a background in that space. But after having done it, it's possible to see several benefits from agility.

Blaine Osepchuk • Mar 2 '20

Awesome feedback! Thanks for sharing your knowledge and experience.