Random Vs. Combinatorial Methods in Modeling & Simulation
We have investigated the use of combinatorial methods for discrete event simulation. The study compared random and tway combinatorial inputs of a network simulator, to determine if these two approaches produce significantly different deadlock detection for varying network configurations.
Combinatorial (or tway) testing requires every combination of any t parameter values to be covered by at least one test. Combinatorial methods can be highly effective because empirical data suggest that nearly all failures involve the interaction of a small number of parameters (1 to 6). Thus, for example, if all deadlocks involve at most 5way interactions between n parameters, then exhaustive testing of all nway interactions adds no additional information that would not be obtained by testing all 5way interactions. While the maximum degree of interaction between parameters involved in the deadlocks clearly cannot be known in advance, covering all tway interactions may be more efficient than using random generation of inputs. In the study we tested this hypothesis for t = 2, 3, and 4 for deadlock detection in a network simulation. Achieving the same degree of coverage provided by 4way tests would have required approximately 3.2 times as many random tests; thus combinatorial methods were more efficient for detecting deadlocks involving a higher degree of interactions.
The comparisons between random and combinatorial methods suggested a number of conclusions:
 For binary variables (v=2), random tests compare reasonably well with covering arrays (96% to 99% coverage) for all three values of t for 15 or more variables. Thus random testing for a SUT with all or mostly binary variables may compare favorably with combinatorial testing.
 Combination coverage provided by random generation of the equivalent number of pairwise tests at (t = 2) decreases as the number of values per variable increases, and the coverage provided by pairwise testing is significantly less than 100%. The effectiveness of random testing relative to pairwise testing should be expected to decline as the average number of values per variable increases.
 For 4way interactions, coverage provided by random test generation increases with the number of variables. Combinatorial testing for a module with approximately 10 variables should be significantly more effective than random testing, while the difference between the two test methods should be less for modules with 20 or more variables.
 The advantage of combinatorial testing relative to random testing decreases at higher interaction levels. For example, with 15 or more variables, random tests provide 98% coverage or better for all levels of v.
 For 100% combination coverage, the efficiency advantage of combinatorial testing varies directly with the number of values per variable and inversely with the interaction strength t. The ratio of random/combinatorial coverage is highest for 10 variables with t = 2, but declines for other pairings of t andv. To obtain 100% combination coverage, random testing is significantly less efficient than combinatorial testing, requiring 2 to nearly 5 times as many tests as a covering array generated by IPOG. Thus if 100% combination coverage is desired, combinatorial testing should be significantly less expensive than random test generation.
An important practical consideration in comparing combinatorial with random testing is the efficiency of the covering array generator. Algorithms have a very wide range in the size of covering arrays they produce. It is not uncommon for the better algorithms to produce arrays that are more than 50% smaller than other algorithms. We have found in comparisons with other tools that there is no uniformly “best” algorithm (ref). Other algorithms may produce smaller or larger combinatorial test suites, so the comparable random test suite will vary in the number of combinations covered. Thus random testing may fare better in comparison with combinatorial tests produced by one of the less efficient algorithms.
However there is a less obvious but important tradeoff regarding covering array size. An algorithm that produces a very compact array, i.e., with few tests, for tway combinations may include fewer (t+1)way combinations because there are fewer tests. Tables 1 and 2 illustrate this phenomenon for an example. Table 1 shows the percentage of t+1 up to t+3 combination coverage provided by the IPOG tests and in Table 2 the equivalent number of random tests. Although IPOG pairwise tests provide better 3way coverage than the random tests, at other interaction strengths and values of t, the random tests are roughly the same or slightly better in combination coverage than IPOG. Pairwise combinatorial tests detected slightly fewer events than the equivalent number of random tests. One possible explanation may be that the superior 4way and 5way coverage of the random tests allowed detection of more events. Almost paradoxically, an algorithm that produces a larger, suboptimal covering array may provide better fault detection because the larger array is statistically more likely to include t+1, t+2, and higher degree interaction tests as a byproduct of the test generation. Again, however, the less optimal covering array is likely to more closely resemble the random test suite in fault detection.
Table 1. Higher (t+k) interaction coverage of combinatorial tway tests

t

3way coverage

4way coverage

5way coverage




2

.758

.429

.217




3


.924

.709




4



.974




Table 2. Higher (t+k) interaction coverage of random tway tests

t

3way coverage

4way coverage

5way coverage




2

.735

.499

.306




3


.917

.767




4



.974 



Note also that the number of faults in the SUT can affect the degree to which random testing approaches combinatorial testing effectiveness. For example, suppose the random test set covers 99% of combinations for 4way interactions, and the SUT contains only one 4way interaction fault. Then there is a 99% probability that the random tests will contain the 4way interaction that triggers this fault. However, if the SUT contains m independent faults, then the probability that combinations for all m faults are included in the random test set is .99^{m}. Hence with multiple faults, random testing may be significantly less effective, as its probability of detecting all faults will be 1 – c^{m}, for c = percent coverage and m = number of faults.
D.R. Kuhn, R. Kacker and Y.Lei, Random vs. Combinatorial Methods for Discrete Event Simulation of a Grid Computer Network, MODSIM World 2009, Virginia Beach, Virginia, October 1416, 2009. In Selected Papers Presented at MODSIM World 2009 Conference and Expo, edited by T.E. Pinelli, NASA/CP2010216205, National Aeronautics and Space Administration, pp. 8388.
Abstract; Paper
ModSim World paper is a condensed version of this tech report:
D.R. Kuhn,R. Kacker, Y. Lei, Combinatorial and Random Testing Effectiveness for a Grid Computer Simulator