TY - GEN
T1 - Validation of software testing experiments
T2 - 7th IEEE International Conference on Software Testing, Verification and Validation, ICST 2014
AU - Hays, Mark
AU - Hayes, Jane Huffman
AU - Bathke, Arne C.
PY - 2014
Y1 - 2014
N2 - Researchers in software testing are often faced with the following problem of empirical validation: does a new testing technique actually help analysts find more faults than some baseline method? Researchers evaluate their contribution using statistics to refute the null hypothesis that their technique is no better at finding faults than the state of the art. The decision as to which statistical methods are appropriate is best left to an expert statistician, but the reality is that software testing researchers often don't have this luxury. We developed an algorithm, Means Test, to help automate some aspects of statistical analysis. We implemented Means Test in the statistical software environment R, encouraging reuse and decreasing the need to write and test statistical analysis code. Our experiment showed that Means Test has significantly higher F-measures than several other common hypothesis tests. We applied Means Test to systematically validate the work presented at the 2013 IEEE Sixth International Conference on Software Testing, Verification, and Validation (ICST'13). We found six papers that potentially misstated the significance of their results. Means Test provides a free and easy-to-use way for researchers to check whether their chosen statistical methods and the results obtained are plausible. It is available for download at coest.org.
AB - Researchers in software testing are often faced with the following problem of empirical validation: does a new testing technique actually help analysts find more faults than some baseline method? Researchers evaluate their contribution using statistics to refute the null hypothesis that their technique is no better at finding faults than the state of the art. The decision as to which statistical methods are appropriate is best left to an expert statistician, but the reality is that software testing researchers often don't have this luxury. We developed an algorithm, Means Test, to help automate some aspects of statistical analysis. We implemented Means Test in the statistical software environment R, encouraging reuse and decreasing the need to write and test statistical analysis code. Our experiment showed that Means Test has significantly higher F-measures than several other common hypothesis tests. We applied Means Test to systematically validate the work presented at the 2013 IEEE Sixth International Conference on Software Testing, Verification, and Validation (ICST'13). We found six papers that potentially misstated the significance of their results. Means Test provides a free and easy-to-use way for researchers to check whether their chosen statistical methods and the results obtained are plausible. It is available for download at coest.org.
KW - empirical validation
KW - software testing
KW - statistical analysis
UR - http://www.scopus.com/inward/record.url?scp=84903180173&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903180173&partnerID=8YFLogxK
U2 - 10.1109/ICST.2014.46
DO - 10.1109/ICST.2014.46
M3 - Conference contribution
AN - SCOPUS:84903180173
SN - 9780769551852
T3 - Proceedings - IEEE 7th International Conference on Software Testing, Verification and Validation, ICST 2014
SP - 333
EP - 342
BT - Proceedings - IEEE 7th International Conference on Software Testing, Verification and Validation, ICST 2014
Y2 - 31 March 2014 through 4 April 2014
ER -