DEV Community

Professional QA


Software Fault Tolerance

Software fault tolerance is a system's capacity to keep working correctly in situations such as incorrect input, traffic overload at a given point in time, a large number of simultaneous requests and so on. Fault tolerance testing checks that capacity. It is assessed across hardware and software together, by looking at how the system as a whole handles requests and produces responses.
During the design and development of software, a designer or developer might overlook a minor detail that later results in a functional or design error. The basic idea underlying software fault tolerance is to check how the system behaves when it comes across a software or hardware issue.


Objectives of Software Fault Tolerance:

  • A major objective of fault tolerance is to check the system's ability to keep operating when other components fail due to some error condition
  • Fault tolerance is a very important factor for life-critical systems. Life-critical or safety-critical systems are those whose failure can cause loss of life, damage to property or damage to the environment
  • To conduct a fault tolerance test, a fault tolerance design must be in place. The design describes the test cases, which present the scenarios the system has to go through in order to pass the tolerance tests
  • Software fault tolerance is also achieved by assessing the possibility of exceptional conditions and structuring the system so that it can withstand the resulting errors.

Software fault tolerance techniques:
Any discussion of software fault tolerance has to mention 'N-way programming'.

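1) N-way programming, better known as N-version programming, is a technique in which two or more versions of a program are developed independently from the same specification and run on the same input, so that their outputs can be compared. Because the versions are built separately, the chance of the same software fault appearing in all of them is reduced. The core idea underlying N-version programming is to improve the reliability of operations performed by software.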

How to perform N-version software fault tolerance:

  • The functionality of the software, such as functions, data types, algorithms for comparison and so on, is defined as part of the initial specification

  • Once a complete specification is ready, two or more versions of the program are developed separately, with no relation to one another. The versions are then run in an N-version execution environment, and the program passes when its results satisfy the fault tolerance conditions that the environment checks

  • The execution environment can be a tool that captures the outputs generated by the versions and cross-checks them against its own agreement rules. The algorithms execute and generate output, and the environment checks whether that output meets the tolerance conditions.
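
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three versions, the integer square root task and the majority-vote agreement rule are all hypothetical choices made for the example, and `version_c` contains a deliberate fault so that the vote has something to outvote.

```python
import math
from collections import Counter


def version_a(n: int) -> int:
    """Version A (hypothetical): relies on the standard library."""
    return math.isqrt(n)


def version_b(n: int) -> int:
    """Version B (hypothetical): binary search, written independently of A."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def version_c(n: int) -> int:
    """Version C (hypothetical): deliberately faulty, rounds instead of flooring."""
    return round(n ** 0.5)


def n_version_execute(versions, value):
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(value))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    winner, votes = Counter(results).most_common(1)[0]
    # Assumed agreement rule: strictly more than half of all versions must agree.
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority agreement among versions")
    return winner


# Versions A and B return 3 for input 15, the faulty version C returns 4,
# so the vote masks the fault and the result is 3.
print(n_version_execute([version_a, version_b, version_c], 15))
```

Here `n_version_execute` plays the role of the execution environment: it captures each output and applies its agreement rule. A real environment might compare results with a tolerance rather than by exact equality.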

2) Recovery Block is a method that evaluates the results of different implementations of the same algorithm. The application is broken into blocks so that faults, if any, can be caught. Each block has a primary case, a secondary case and an exceptional case: the result of the primary is verified against a set of conditions, the secondary is tried if that verification fails, and the exceptional case covers the situation where no result passes.
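
A minimal sketch of one such block, assuming a sorting task: the names `primary_sort`, `secondary_sort` and `make_acceptance_test` are hypothetical, and the primary contains a deliberate bug so that the fallback is exercised. Each alternate works on a copy of the input, so a failed attempt leaves the original data untouched.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order and return the first result that passes."""
    for alternate in alternates:
        # Checkpoint: every alternate starts from a fresh copy of the input.
        checkpoint = copy.deepcopy(state)
        try:
            result = alternate(checkpoint)
        except Exception:
            continue  # A crash counts as a failed alternate.
        if acceptance_test(result):
            return result
    # Exceptional case: no alternate produced an acceptable result.
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sort(items):
    # Hypothetical primary with a deliberate bug: duplicates are dropped.
    return sorted(set(items))


def secondary_sort(items):
    # Hypothetical secondary: simpler and correct.
    return sorted(items)


def make_acceptance_test(original):
    """Build a check that the result is ordered and has lost no elements."""
    def acceptance_test(result):
        in_order = all(a <= b for a, b in zip(result, result[1:]))
        return in_order and len(result) == len(original)
    return acceptance_test


data = [3, 1, 2, 3]
# The primary returns [1, 2, 3], which fails the length check,
# so the secondary's result [1, 2, 3, 3] is returned instead.
print(recovery_block(data, [primary_sort, secondary_sort], make_acceptance_test(data)))
```

The acceptance test is intentionally cheap. It does not prove the result is correct, it only rejects results that are clearly wrong.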

3) Self-checking software: Self-checking software is a method in which extra checks are carried out on the system in addition to its normal processing, typically built into the software itself. It is a measure that gives assurance about the accuracy of the development and testing that have taken place so far.
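
One common reading of self-checking is code that verifies its own state at run time. The sketch below assumes that reading; the `SelfCheckingAccount` class and its non-negative balance rule are hypothetical and chosen only to keep the example small.

```python
class SelfCheckingAccount:
    """Hypothetical account that re-checks its own invariant after every change."""

    def __init__(self, balance=0):
        self._balance = balance
        self._check()

    @property
    def balance(self):
        return self._balance

    def _check(self):
        # Assumed invariant for this example: the balance is never negative.
        if self._balance < 0:
            raise ValueError("invariant violated: negative balance")

    def _apply(self, delta):
        previous = self._balance
        self._balance += delta
        try:
            self._check()
        except ValueError:
            # Restore the last known good state before reporting the fault.
            self._balance = previous
            raise

    def deposit(self, amount):
        self._apply(amount)

    def withdraw(self, amount):
        self._apply(-amount)


account = SelfCheckingAccount(100)
account.withdraw(30)
try:
    account.withdraw(100)  # Would leave the balance at -30.
except ValueError as fault:
    print("rejected:", fault)
print(account.balance)  # The failed withdrawal was rolled back, so this prints 70.
```

The extra check costs a little on every operation, but a fault is detected at the moment it occurs instead of surfacing later as corrupted data.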
