Inherent characteristics of traceability artifacts: Less is more

Jane Huffman Hayes, Giulio Antoniol, Bram Adams, Yann-Gaël Guéhéneuc

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

This paper describes ongoing work to characterize the inherent ease, or "traceability", with which a textual artifact can be traced using an automated technique. Software traceability approaches use varied measures to build models that automatically recover links between pairs of natural language documents. Thus far, most approaches use a single-step model, such as logistic regression, to identify new trace links. However, such approaches require a sufficiently large training set of both true and false trace links. Yet true links are by far in the minority, which reduces the performance of such models. Therefore, this paper formulates the problem of identifying trace links as the problem of finding, for a given logistic regression model, the subsets of links in the training set that give the best accuracy (in terms of the G-metric) on a test set. Using hill climbing with random restart for subset selection, we found that, for the Change Style dataset, we can classify links with a precision of up to 40% and a recall of up to 66% using a training set as small as one true candidate link (out of 33) and 41 false links. To get better performance and learn the best possible logistic regression classifier, we must "discard" links in the trace dataset that increase noise, so as to avoid learning from links that are not representative. This preliminary work is promising because it shows that a few correct examples may perform better than several poor ones. It also shows which inherent characteristics of the artifacts make them good candidates for automatically learning efficient traceability models, i.e., it reveals their traceability.
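The subset-selection formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: links are represented as hypothetical numeric feature vectors, logistic regression is fit by plain batch gradient descent, the G-metric is assumed to be the geometric mean of recall and specificity (the paper's exact definition may differ), a neighbor of a subset adds or drops exactly one link, and the first climb starts from the full training set before random restarts. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

# A candidate trace link: a feature vector (hypothetical, e.g. textual
# similarity scores between two artifacts) and a label (1 = true, 0 = false).
Link = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w: List[float], x: Sequence[float]) -> float:
    """Bias w[0] plus the dot product of w[1:] with x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def train_logreg(links: List[Link], epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit a logistic regression by batch gradient descent on log-loss."""
    w = [0.0] * (len(links[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in links:
            err = sigmoid(linear(w, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * gi / len(links) for wi, gi in zip(w, grad)]
    return w


def g_metric(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """ASSUMED definition: geometric mean of recall and specificity."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def evaluate(mask: List[bool], train: List[Link], test: List[Link]) -> float:
    """Train on the masked subset of links and return the G-metric on the test set."""
    subset = [link for link, keep in zip(train, mask) if keep]
    if {y for _, y in subset} != {0, 1}:
        return 0.0  # need at least one true and one false link to learn
    w = train_logreg(subset)
    predicted = [1 if sigmoid(linear(w, x)) >= 0.5 else 0 for x, _ in test]
    return g_metric(predicted, [y for _, y in test])


def select_subset(train: List[Link], test: List[Link], restarts: int = 5,
                  seed: int = 0) -> Tuple[List[bool], float]:
    """First-improvement hill climbing over training subsets, with random restarts."""
    rng = random.Random(seed)
    best_mask, best_score = [True] * len(train), -1.0
    for r in range(restarts):
        # First climb starts from the full training set, later ones from random subsets.
        mask = [True] * len(train) if r == 0 else [rng.random() < 0.5 for _ in train]
        score = evaluate(mask, train, test)
        improved = True
        while improved:
            improved = False
            for i in range(len(mask)):
                mask[i] = not mask[i]  # neighbor: add or drop one link
                candidate = evaluate(mask, train, test)
                if candidate > score:
                    score, improved = candidate, True  # keep the flip
                else:
                    mask[i] = not mask[i]  # undo the flip
        if score > best_score:
            best_mask, best_score = list(mask), score
    return best_mask, best_score
```

Because a flip is kept only when it strictly raises the test-set G-metric, each climb terminates; the restarts let the search escape local optima, such as a region of single-class subsets that all score 0. Links whose mask entry ends up `False` correspond to the "discarded" links the abstract describes.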

Original language: English
Title of host publication: 2015 IEEE 23rd International Requirements Engineering Conference, RE 2015 - Proceedings
Pages: 196-201
Number of pages: 6
ISBN (Electronic): 9781467369053
State: Published - Nov 4 2015
Event: 23rd IEEE International Requirements Engineering Conference, RE 2015 - Ottawa, Canada
Duration: Aug 24 2015 to Aug 28 2015

Publication series

Name: 2015 IEEE 23rd International Requirements Engineering Conference, RE 2015 - Proceedings

Conference

Conference: 23rd IEEE International Requirements Engineering Conference, RE 2015
Country/Territory: Canada
City: Ottawa
Period: 8/24/15 to 8/28/15

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

  • Traceability
  • artifact characteristics
  • logistic regression
  • machine learning
  • model

ASJC Scopus subject areas

  • Software
