Abstract
Secondary crashes, roadway clearance time, and incident clearance time are three primary performance measures for traffic incident management. Crash databases, the main source of information on secondary crashes, often miss secondary crashes or misidentify them. Manual reviews of crash reports can improve the accuracy of crash classification, but they are expensive and time-consuming. To improve the identification of secondary crashes, this study developed a text mining approach that distinguishes secondary crashes based on crash narratives. To handle the unstructured nature of crash narratives, a four-step process of tokenization, counting, vectorization, and normalization was implemented to transform their content into numeric vectors suitable for machine learning. An evaluation of several classification models then found that the logistic regression model produced the most accurate classifications. A single-word representation of the narratives performed better than more complicated schemes and is recommended for future implementation. A review of the classification results showed that the model is effective at identifying keywords that characterize secondary crashes. Some false classifications may be a consequence of subjective reviewer interpretations and potential mislabeling. The findings demonstrate that the text mining approach provides satisfactory performance and has great potential for identifying secondary crashes.
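The four-step transformation named in the abstract (tokenization, counting, vectorization, normalization) can be illustrated with a minimal Python sketch. This is not the study's implementation: the abstract does not state the tokenizer rules, the term weighting, or the normalization used, so the regular-expression tokenizer, raw term counts, and unit-length (L2) scaling below are assumptions. The function names and the two sample narratives are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(narrative):
    """Step 1: lowercase the narrative and split it into single-word tokens."""
    return re.findall(r"[a-z0-9]+", narrative.lower())


def count_terms(tokens):
    """Step 2: count how often each token occurs in one narrative."""
    return Counter(tokens)


def vectorize(counts, vocabulary):
    """Step 3: lay the counts out in a fixed vocabulary order."""
    return [float(counts.get(term, 0)) for term in vocabulary]


def normalize(vector):
    """Step 4 (assumed L2): scale to unit length; all-zero vectors are unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def narratives_to_vectors(narratives):
    """Run all four steps and return the shared vocabulary and one vector per narrative."""
    token_lists = [tokenize(n) for n in narratives]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = [
        normalize(vectorize(count_terms(tokens), vocabulary))
        for tokens in token_lists
    ]
    return vocabulary, vectors


# Hypothetical narratives, invented for illustration only.
narratives = [
    "V1 struck V2 in the queue from a prior crash",
    "V1 ran off the road and struck a tree",
]
vocabulary, vectors = narratives_to_vectors(narratives)
```

Each narrative becomes one fixed-length numeric vector over the shared single-word vocabulary, which is the form a classifier such as logistic regression would take as input.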
Original language | English |
---|---|
Pages (from-to) | 1338-1358 |
Number of pages | 21 |
Journal | Journal of Transportation Safety and Security |
Volume | 12 |
Issue number | 10 |
State | Published - Nov 25 2020 |
Bibliographical note
Publisher Copyright: © 2019 Taylor & Francis Group, LLC and The University of Tennessee.
Keywords
- crash narrative
- machine learning
- secondary crash
- text mining
ASJC Scopus subject areas
- Transportation
- Safety Research