TY - GEN
T1 - A multi-label classification approach for coding cancer information service chat transcripts
AU - Rios, Anthony
AU - Vanderpool, Robin
AU - Shaw, Pam
AU - Kavuluru, Ramakanth
PY - 2013
Y1 - 2013
N2 - The National Cancer Institute's (NCI) Cancer Information Service (CIS) offers an online, instant-messaging-based information service called LiveHelp to patients, family members, friends, and other cancer information consumers. A cancer information specialist (IS) 'chats' with a consumer and provides information on a variety of topics, including clinical trials. After a LiveHelp chat session ends, the IS codes about 20 different metadata elements about the session in electronic contact record forms (ECRF), which are later used for quality control and reporting. Besides straightforward elements such as age and gender, more specific elements to be coded include the purpose of contact, the subjects of interaction, and the different responses provided to the consumer; the latter two often take multiple values. As such, ECRF coding is a time-consuming task, and automating it could help ISs focus more on their primary goal of providing consumers with valuable cancer-related information. As a first attempt at this task, we explored multi-label and multi-class text classification approaches to code the purpose, subjects of interaction, and responses provided, based on the chat transcripts. With a sample dataset of about 673 transcripts, we achieved example-based F-scores of 0.67 (for subjects) and 0.58 (for responses). We also achieved label-based micro F-scores of 0.65 (for subjects), 0.62 (for responses), and 0.61 (for purpose). To our knowledge, this is the first attempt at automatically coding LiveHelp transcripts, and our initial results on this small corpus indicate promising future directions for this task.
AB - The National Cancer Institute's (NCI) Cancer Information Service (CIS) offers an online, instant-messaging-based information service called LiveHelp to patients, family members, friends, and other cancer information consumers. A cancer information specialist (IS) 'chats' with a consumer and provides information on a variety of topics, including clinical trials. After a LiveHelp chat session ends, the IS codes about 20 different metadata elements about the session in electronic contact record forms (ECRF), which are later used for quality control and reporting. Besides straightforward elements such as age and gender, more specific elements to be coded include the purpose of contact, the subjects of interaction, and the different responses provided to the consumer; the latter two often take multiple values. As such, ECRF coding is a time-consuming task, and automating it could help ISs focus more on their primary goal of providing consumers with valuable cancer-related information. As a first attempt at this task, we explored multi-label and multi-class text classification approaches to code the purpose, subjects of interaction, and responses provided, based on the chat transcripts. With a sample dataset of about 673 transcripts, we achieved example-based F-scores of 0.67 (for subjects) and 0.58 (for responses). We also achieved label-based micro F-scores of 0.65 (for subjects), 0.62 (for responses), and 0.61 (for purpose). To our knowledge, this is the first attempt at automatically coding LiveHelp transcripts, and our initial results on this small corpus indicate promising future directions for this task.
UR - http://www.scopus.com/inward/record.url?scp=84889783075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889783075&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84889783075
SN - 9781577356059
T3 - FLAIRS 2013 - Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference
SP - 338
EP - 343
BT - FLAIRS 2013 - Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference
Y2 - 22 May 2013 through 24 May 2013
ER -