
Automated Combinatorial Testing for Software (ACTS)

Autonomous Systems

Self-driving cars and autonomous systems of all types present notoriously difficult challenges for software assurance. Both traditional testing and formal methods are even harder to apply to autonomous systems than to conventional software. The key problem is that these systems must function correctly in a vast space of possible input conditions. For example, autonomous vehicles must deal with lighting, rain, fog, pedestrians, animals, other vehicles, road markings, signs, etc. Combinatorial methods are uniquely well suited to analyzing and testing this enormous input space, because by their nature they efficiently exercise rare combinations that would very probably not be included using traditional test methods.
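To make the scale of this input space concrete, the short Python sketch below counts exhaustive tests versus the 2-way value combinations a pairwise test set must hit, for a purely illustrative six-factor scenario model. The factor names and value counts are invented for the example, not drawn from any real test plan.

```python
from itertools import combinations
from math import prod

# Hypothetical input model for an autonomous-vehicle scenario.
# Factor names and value counts are illustrative only.
factors = {
    "lighting":      ["day", "dusk", "night", "glare"],
    "precipitation": ["none", "rain", "fog", "snow"],
    "pedestrian":    ["absent", "crossing", "on_shoulder"],
    "other_vehicle": ["none", "oncoming", "crossing", "stopped"],
    "road_marking":  ["clear", "faded", "obscured"],
    "signage":       ["visible", "occluded", "missing"],
}

# Exhaustive testing: one test per combination of all factor values.
exhaustive = prod(len(v) for v in factors.values())

# Distinct 2-way (pairwise) value combinations a test set must hit
# at least once to achieve full pairwise coverage.
pairwise = sum(len(factors[a]) * len(factors[b])
               for a, b in combinations(factors, 2))

print(f"exhaustive tests: {exhaustive}")          # 4*4*3*4*3*3 = 1728
print(f"2-way value pairs to cover: {pairwise}")  # 183
```

A pairwise covering array typically hits all 183 of those pairs in a few dozen tests rather than 1728, and the economy grows rapidly with more factors and higher interaction strengths.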

Why don’t traditional assurance methods work for autonomous systems?

The key method for testing life-critical software in aviation, and in some other fields, is the modified condition/decision coverage (MCDC) criterion, which requires that every decision in the code take every possible outcome, that each condition within each decision take every possible outcome, and that every condition in every decision be shown to independently affect the decision's outcome.
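As a concrete illustration, the sketch below uses an invented single decision, a and (b or c), and checks the independent-effect requirement for a classic four-test MCDC set; it is a toy example, not a certification tool.

```python
# A minimal MCDC illustration on one invented decision: a and (b or c).
def decision(a: bool, b: bool, c: bool) -> bool:
    return a and (b or c)

# A classic four-test MCDC set for this decision.
tests = [
    (True, True, False),   # outcome True
    (False, True, False),  # differs from test 1 only in a; outcome False
    (True, False, False),  # differs from test 1 only in b; outcome False
    (True, False, True),   # differs from test 3 only in c; outcome True
]

# Independent effect: for each condition there must be a pair of tests
# identical except in that condition, with different decision outcomes.
for i, name in enumerate("abc"):
    shown = any(
        t1[i] != t2[i]
        and t1[:i] + t1[i + 1:] == t2[:i] + t2[i + 1:]
        and decision(*t1) != decision(*t2)
        for t1 in tests for t2 in tests
    )
    print(f"condition {name} independently affects the outcome: {shown}")
```

Even this toy decision needs four carefully chosen tests; achieving MCDC across an entire life-critical codebase is what drives the cost discussed next.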

This criterion is extraordinarily time-consuming and expensive to achieve, typically consuming 85% to 90% or more of the software development budget (NASA study). For autonomous systems, there is an even more significant assurance problem: large parts of these systems use neural networks or other black-box software that is difficult if not impossible to verify and test against the MCDC criterion, or against other structural coverage measures such as branch or statement coverage. The reason is that the behavior of a neural network depends on connection weights learned from large volumes of input data, not on the explicit logic and decisions found in conventional software.

Testing and assurance of autonomous systems, including neural nets, is an unsolved problem and the subject of much current research. Methods of addressing assurance in this space have focused on measuring neuron coverage, but it is far from clear whether neuron coverage has a significant relationship to correctness and safety in autonomous systems. Neuron coverage may be too simple a metric to be sufficiently indicative of system behavior, just as statement coverage in conventional software is a poor measure of test thoroughness or correctness. Most efforts at assuring correct operation of autonomous systems have relied on testing for extended periods, with few measures of thoroughness beyond time, number of miles driven, or other basic quantities. Such brute-force testing may be insufficient to prevent failures under extremely rare circumstances. A well-known example is the fatal crash of a car in autonomous mode that resulted from a very rare four-factor combination: a white truck against a brightly lit sky, together with the truck's height and angle relative to the car [1,2].
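For illustration, the sketch below computes the basic neuron-coverage metric, the fraction of neurons driven above an activation threshold by at least one test input, on a small random network. The network, threshold, and inputs are placeholders invented for the example, not part of any published tool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network: 16 inputs -> 8 hidden -> 4 output neurons.
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(8, 4))

def activations(x):
    h1 = np.maximum(0.0, x @ W1)   # layer 1 activations (8 neurons)
    h2 = np.maximum(0.0, h1 @ W2)  # layer 2 activations (4 neurons)
    return np.concatenate([h1, h2])

def neuron_coverage(test_inputs, threshold=0.5):
    # A neuron counts as covered if any test pushes it above the threshold.
    covered = np.zeros(12, dtype=bool)  # 8 + 4 neurons
    for x in test_inputs:
        covered |= activations(x) > threshold
    return covered.mean()

tests = rng.normal(size=(20, 16))  # 20 random test inputs
print(f"neuron coverage: {neuron_coverage(tests):.0%}")
```

As the text notes, a high score on such a metric says little by itself: random inputs can light up most neurons without exercising behaviorally meaningful situations.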

NIST, the US national metrology laboratory, has developed methods and tools for measuring the degree to which testing has covered extremely rare conditions. Rather than simply conducting huge volumes of tests, with little assurance that rare circumstances can be handled by the autonomous system, the NIST methods can measure the degree to which t-way combinations have been covered in tests, and identify any combinations that have been missed. Research by NIST and others has shown that all, or nearly all, failures involve no more than six factors, so testing all 5-way to 7-way combinations can provide high assurance. This can be achieved by generating covering arrays of all t-way combinations of inputs.
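The sketch below illustrates covering-array construction with a simple greedy, one-test-at-a-time heuristic in the spirit of AETG-style algorithms; it is not the IPOG algorithm used in NIST's ACTS tool, and the six-factor model matches the invented example above.

```python
from itertools import combinations, product
import random

def covering_array(values_per_factor, t, seed=0, candidates=50):
    """Greedy t-way covering array: repeatedly add the candidate test that
    covers the most not-yet-covered t-way value combinations."""
    rng = random.Random(seed)
    k = len(values_per_factor)

    def covers(test):
        return {(facs, tuple(test[f] for f in facs))
                for facs in combinations(range(k), t)}

    # Every t-way (factors, values) combination that must appear somewhere.
    uncovered = {(facs, vals)
                 for facs in combinations(range(k), t)
                 for vals in product(*(range(values_per_factor[f])
                                       for f in facs))}
    tests = []
    while uncovered:
        facs0, vals0 = next(iter(uncovered))

        def candidate():
            # Fix one uncovered combination (guarantees progress),
            # fill the remaining factors randomly.
            test = [rng.randrange(v) for v in values_per_factor]
            for f, v in zip(facs0, vals0):
                test[f] = v
            return tuple(test)

        best = max((candidate() for _ in range(candidates)),
                   key=lambda test: len(covers(test) & uncovered))
        tests.append(best)
        uncovered -= covers(best)
    return tests

# Six factors with 4,4,3,4,3,3 values: 1728 exhaustive tests, yet a few
# dozen suffice to cover every 2-way combination.
print(len(covering_array([4, 4, 3, 4, 3, 3], t=2)))
```

Production tools produce smaller arrays and handle higher-strength (5-way to 7-way) coverage for realistic models, but the principle is the same: every t-way combination appears in at least one test.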

An alternative that may be more practical in full-system testing is to use NIST's CCM input space coverage measurement tool to measure the combination coverage of inputs. This is a measure complementary to traditional structural coverage. But since structural coverage measures do not apply to many AI/ML components, input space coverage may be the only way to evaluate test adequacy for autonomous systems.
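The sketch below shows, in simplified form, the kind of measurement such a tool performs: given an existing test set, report the fraction of t-way value combinations it exercises and list those missed. The three-factor model and four-test set are invented for illustration; this is not CCM's actual implementation.

```python
from itertools import combinations, product

def combination_coverage(tests, values_per_factor, t):
    """Fraction of all t-way value combinations covered by `tests`,
    plus the combinations that no test covers."""
    k = len(values_per_factor)
    total, covered, missed = 0, 0, []
    for facs in combinations(range(k), t):
        for vals in product(*(range(values_per_factor[f]) for f in facs)):
            total += 1
            if any(all(test[f] == v for f, v in zip(facs, vals))
                   for test in tests):
                covered += 1
            else:
                missed.append((facs, vals))
    return covered / total, missed

# Three factors with 2, 2, and 3 values; four existing tests.
tests = [(0, 0, 0), (1, 1, 1), (0, 1, 2), (1, 0, 2)]
cov, missed = combination_coverage(tests, [2, 2, 3], t=2)
print(f"2-way coverage: {cov:.0%}")   # 75%
print(f"missed combinations: {missed}")
```

Missed combinations point directly at the tests (or, for a dataset, the data) that should be added next, which is the gap-finding use several of the papers below describe.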

[1] https://www.tesla.com/blog/tragic-loss 
[2] https://www.ntsb.gov/news/press-releases/Pages/PR20170912.aspx


Examples of combinatorial testing for assurance of autonomous systems

Gladisch, C., Heinzemann, C., Herrmann, M., & Woehrle, M. (2020). Leveraging Combinatorial Testing for Safety-Critical Computer Vision Datasets. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 324-325).

We show how combinatorial testing based on a domain model can be leveraged for generating test sets providing coverage guarantees with respect to important environmental features and their interaction. Additionally, we show how our approach can be used for growing a dataset, i.e. to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.

Li, Y., Tao, J., & Wotawa, F. (2020). Ontology-based test generation for automated and autonomous driving functions. Information and Software Technology, 117, 106200.

The proposed approach for testing autonomous driving takes ontologies describing the environment of autonomous vehicles and automatically converts them into test cases that are used in a simulation environment to verify automated driving functions. The conversion relies on combinatorial testing. The first experimental results, based on an example from the automotive industry, indicate that the approach can be used in practice.

Duan, J., Gao, F., & He, Y. (2020). Test scenario generation and optimization technology for intelligent driving systems. IEEE Intelligent Transportation Systems Magazine.

In this paper, we propose a new scenario generation algorithm called Combinatorial Testing Based on Complexity (CTBC) based on both combinatorial testing (CT) method and Test Matrix (TM) technique for intelligent driving systems. The effectiveness of this method is validated by applying it to the lane departure warning (LDW) system on a hardware-in-the-loop (HIL) test platform.

Herbold, S., & Haar, T. (2020). Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects. arXiv preprint arXiv:2009.01521. 

In this article, we discuss whether standard software testing techniques that have been part of textbooks for decades are also useful for testing machine learning software. Concretely, we try to determine generic smoke tests that can be used to assert that basic functions can be executed without crashing. We found that we can derive such tests using techniques similar to equivalence classes and boundary value analysis. Moreover, we found that these concepts can also be applied to hyperparameters, further improving the quality of the smoke tests.

Gannous, A., & Andrews, A. (2019, October). Integrating Safety Certification Into Model-Based Testing of Safety-Critical Systems. 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE) (pp. 250-260). IEEE.

With the increase of autonomous, sensing-based functionality in safety-critical systems, efficient and cost-effective testing that maximizes safety evidence has become increasingly challenging. A previously proposed framework for testing safety-critical systems, called Model-Combinatorial based testing (MCbt), has the potential to address these challenges. MCbt proposes an integration of model-based testing, fault analysis, and combinatorial testing to produce the maximum amount of evidence for an efficient safety certification process, but it had never been used to derive a specific testing approach. In this paper, we present a concrete application of MCbt to a case study. The validation showed that MCbt is more efficient and produces more safety evidence than state-of-the-art testing approaches.

Tao, J., Li, Y., Wotawa, F., Felbinger, H., & Nica, M. (2019, April). On the Industrial Application of Combinatorial Testing for Autonomous Driving Functions. In 2019 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) (pp. 234-240). IEEE.

We discuss a method for testing automated and autonomous driving functions using ontologies and combinatorial testing that is able to automate test case generation. Moreover, we report on the application of the method at the industrial level. There we depict the comprehensive application process from the construction of the ontology to test suite execution in detail. This case study shows that the proposed approach can be used for testing and validation of autonomous driving functions in practice.

Klueck, F., Li, Y., Nica, M., Tao, J., & Wotawa, F. (2018, October). Using ontologies for test suites generation for automated and autonomous driving functions. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) (pp. 118-123). IEEE.

In this paper, we outline a general automated testing approach to be applied for verification and validation of automated and autonomous driving functions. The approach makes use of ontologies of the environment the system under test interacts with. The ontologies are automatically converted into input models for combinatorial testing, which are used to generate test cases. The resulting abstract test cases are used to generate concrete test scenarios that provide the basis for simulation-based verification of the functionality of the system under test. We discuss the general approach, including its potential for automation, in the automotive domain, where there is a growing need for sophisticated simulation-based verification of automated and autonomous vehicles.

Tuncali, C. E., Fainekos, G., Ito, H., & Kapinski, J. (2018, June). Simulation-based adversarial test generation for autonomous vehicles with machine learning components. In 2018 IEEE Intelligent Vehicles Symposium (IV) (pp. 1555-1562). IEEE.

One of the main challenges is that many autonomous driving systems have machine learning (ML) components, such as deep neural networks, for which formal properties are difficult to characterize. We present a testing framework that is compatible with test case generation and automatic falsification methods, which are used to evaluate cyber-physical systems. We demonstrate how the framework can be used to evaluate closed-loop properties of an autonomous driving system model that includes the ML components, all within a virtual environment. We demonstrate how to use test case generation methods, such as covering arrays, as well as requirement falsification methods to automatically identify problematic test scenarios. The resulting framework can be used to increase the reliability of autonomous driving systems.

Masuda, S., Nakamura, H., & Kajitani, K. (2018). Rule-based searching for collision test cases of autonomous vehicles simulation. IET Intelligent Transport Systems, 12(9), 1088-1095.

Thorough testing of AD software using simulations must be conducted before testing AD cars on the road. Parameters of the many objects around an AD car, such as other cars, traffic lanes, and pedestrians, are required as inputs to the simulation; therefore, the number of parameter combinations becomes extremely large. A combination of parameters is called a test case; hence, the challenge is to find collision test cases among the extremely large number of combinations. A rule-based method is the main focus, because an explicit method of searching for test cases is required in certain real-world industries. In this study, a method of rule-based searching for collision test cases in autonomous vehicle simulations is proposed. Simulation models that define rules between an AD car and other cars are specified, and algorithms were developed that search for collision test cases by generating test cases incrementally. Experiments were conducted on AD simulations involving models of a three-lane highway and a signalised intersection. The results indicate the efficiency of the method.

Abdessalem, R. B., Nejati, S., Briand, L. C., & Stifter, T. (2018, May). Testing vision-based control systems using learnable evolutionary algorithms. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE) (pp. 1016-1026). IEEE.

Our evaluation performed on an industrial automotive system shows that: (1) our algorithm outperforms a baseline evolutionary search algorithm, generating 78% more distinct, critical test scenarios than the baseline; and (2) our algorithm accurately characterizes critical regions of the system under test, thus identifying the conditions that are likely to lead to system failures.

Rocklage, E., Kraft, H., Karatas, A., & Seewig, J. (2017, October). Automated scenario generation for regression testing of autonomous vehicles. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC) (pp. 476-483). IEEE.

We present a novel approach to automatically generate test scenarios for regression testing of autonomous vehicle systems as a black box in a virtual simulation environment. To achieve this we focus on the problem of generating the motion of other traffic participants without loss of generality. We combine the combinatorial interaction testing approach with a simple trajectory planner as a feasibility checker to generate efficient test sets with variable coverage. The underlying constraint satisfaction problem is solved with a simple backtracking algorithm.

Tuncali, C. E., Fainekos, G., Prokhorov, D., Ito, H., & Kapinski, J. (2019). Requirements-driven Test Generation for Autonomous Vehicles with Machine Learning Components. IEEE Transactions on Intelligent Vehicles, 5(2), 265-280.

We use multiple methods to generate test cases, including covering arrays, an efficient method for searching discrete variable spaces. The resulting test cases can be used to debug the controller design by identifying controller behaviors that do not satisfy requirements. The test cases can also enhance the testing phase of development by identifying critical corner cases that correspond to the limits of the system's allowed behaviors. We present STL requirements for an autonomous vehicle system, which capture both component-level and system-level behaviors. Additionally, we present three driving scenarios and demonstrate how our requirements-driven testing framework can be used to identify critical system behaviors, which can be used to support the development process.

Majumdar, R., Mathur, A., Pirron, M., Stegner, L., & Zufferey, D. (2019). Paracosm: A Language and Tool for Testing Autonomous Driving Systems. arXiv preprint arXiv:1902.01084.

Paracosm allows users to programmatically describe complex driving situations with specific visual features, e.g., road layout in an urban environment, as well as reactive temporal behaviors of cars and pedestrians. Paracosm programs are executed on top of a game engine that provides realistic physics simulation and visual rendering. The infrastructure allows systematic exploration of the state space, both for visual features (lighting, shadows, fog) and for reactive interactions with the environment (pedestrians, other traffic). We define a notion of test coverage for Paracosm configurations based on combinatorial testing and low dispersion sequences.

Created May 24, 2016, Updated November 23, 2020