Identifying secondary crashes using text mining techniques

Xu Zhang, Eric Green, Mei Chen, Reginald R. Souleyrette

Research output: Contribution to journal › Article › peer-review

23 Scopus citations


Secondary crashes, roadway clearance time, and incident clearance time are three primary performance measures for traffic incident management. Crash databases, which are the main source of information on secondary crashes, often fail to identify secondary crashes or misidentify them. Manual reviews of crash reports can improve the accuracy of crash classification, but they are expensive and time-consuming. To improve the identification of secondary crashes, this study developed a text mining approach to distinguish secondary crashes based on crash narratives. To deal with the unstructured nature of crash narratives, a four-step process involving tokenization, counting, vectorization, and normalization was implemented to transform their content into numeric vectors suitable for machine learning. Next, an evaluation of several classification models found that the logistic regression model produced the most accurate classifications. A single-word representation of the narratives offered the best performance compared to more complicated schemes and is recommended for future implementation. A review of classification results showed the model is effective at identifying keywords that characterize secondary crashes. Some false classifications may be a consequence of subjective reviewer interpretations and potential mislabeling. The findings demonstrate that the text mining approach provides satisfactory performance and has great potential for identifying secondary crashes.
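The four-step transformation and the logistic regression classifier described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation. The toy narratives, their labels, every function name, and the training settings (learning rate, epoch count, plain stochastic gradient descent) are assumptions made purely for illustration, and the abstract does not say which normalization or tooling the study used; L2 normalization is assumed here.

```python
import math
import re
from collections import Counter

# Hypothetical toy narratives invented for illustration (1 = secondary crash).
NARRATIVES = [
    "Unit 1 struck Unit 2 which was stopped in traffic due to a previous collision",
    "Vehicle slowed for an earlier crash ahead and was rear ended",
    "Unit 1 ran off the road and struck a tree",
    "Driver lost control on wet pavement and hit the guardrail",
]
LABELS = [1, 1, 0, 0]

def tokenize(text):
    """Step 1: lowercase the narrative and split it into single-word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def count_tokens(tokens):
    """Step 2: count how often each token occurs in one narrative."""
    return Counter(tokens)

def build_vocabulary(all_counts):
    """Assign every distinct token a column index (sorted for determinism)."""
    words = sorted({w for counts in all_counts for w in counts})
    return {w: i for i, w in enumerate(words)}

def vectorize(counts, vocab):
    """Step 3: place the counts in a fixed-length vector; unseen words are dropped."""
    vec = [0.0] * len(vocab)
    for word, n in counts.items():
        if word in vocab:
            vec[vocab[word]] = float(n)
    return vec

def normalize(vec):
    """Step 4: scale the vector to unit L2 length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def to_vector(text, vocab):
    """Run all four steps on one narrative."""
    return normalize(vectorize(count_tokens(tokenize(text)), vocab))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Estimated probability that a vectorized narrative is a secondary crash."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

vocab = build_vocabulary([count_tokens(tokenize(t)) for t in NARRATIVES])
X = [to_vector(t, vocab) for t in NARRATIVES]
weights, bias = train_logistic_regression(X, LABELS)
predictions = [int(predict_proba(x, weights, bias) >= 0.5) for x in X]
```

Sorting the vocabulary by learned weight, for example `sorted(vocab, key=lambda w: weights[vocab[w]], reverse=True)`, gives a rough view of which words push a narrative toward the secondary-crash class, in the spirit of the keyword review the abstract mentions. On a four-document toy corpus such a ranking is not meaningful; a real application would need a labeled corpus and held-out evaluation.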

Original language: English
Pages (from-to): 1338-1358
Number of pages: 21
Journal: Journal of Transportation Safety and Security
Issue number: 10
State: Published - Nov 25 2020

Bibliographical note

Publisher Copyright:
© 2019 Taylor & Francis Group, LLC and The University of Tennessee.


Keywords

  • crash narrative
  • machine learning
  • secondary crash
  • text mining

ASJC Scopus subject areas

  • Transportation
  • Safety Research

