Published: April 12, 2021
Author(s)
Erin Lanus (Virginia Tech), Laura Freeman (Virginia Tech), Richard Kuhn (NIST), Raghu Kacker (NIST)
Conference
Name: 2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
Dates: 04/12/2021 - 04/16/2021
Location: (Virtual) Porto de Galinhas, Brazil
Abstract
This short paper defines a combinatorial coverage metric for comparing machine learning (ML) data sets and proposes the differences between data sets as a function of combinatorial coverage. The paper illustrates its utility for evaluating and predicting performance of ML models. Identifying and measuring differences between data sets can be of significant value for ML problems, where the accuracy of the model is heavily dependent on the degree to which training data are sufficiently representative of data that will be encountered in application. The utility of the method is illustrated for transfer learning, the problem of predicting performance of a model trained on one data set when applied to another.
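To make the idea concrete, the sketch below shows one way t-way combinatorial coverage and a set-difference coverage between two data sets of categorical features could be computed. It is a minimal illustration assuming discretized feature values and brute-force enumeration of 2-way interactions; the function names and the example data are hypothetical and are not taken from the paper.

```python
# Illustrative sketch (not the paper's code): 2-way combinatorial coverage and
# set-difference coverage between two data sets of categorical features.
from itertools import combinations

def t_way_combos(rows, t=2):
    """Set of t-way (feature-index, value) interaction tuples appearing in rows."""
    combos = set()
    for row in rows:
        for idx in combinations(range(len(row)), t):
            combos.add(tuple((i, row[i]) for i in idx))
    return combos

def set_difference_coverage(source, target, t=2):
    """Fraction of the target's t-way interactions not covered by the source --
    a rough proxy for how well a model trained on `source` may transfer to `target`."""
    src, tgt = t_way_combos(source, t), t_way_combos(target, t)
    return len(tgt - src) / len(tgt)

# Hypothetical example with three binary features per sample.
train = [(0, 0, 1), (0, 1, 1), (1, 0, 0)]
test  = [(0, 0, 1), (1, 1, 1), (1, 1, 0)]
print(set_difference_coverage(train, test))  # share of test interactions unseen in training
```

A larger set-difference value would suggest the target data contain many feature-value interactions never seen during training, which is the kind of gap the paper relates to degraded transfer performance.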
Keywords
combinatorial testing; machine learning; operating envelopes; transfer learning; test-set selection