Toward automated e-cigarette surveillance: Spotting e-cigarette proponents on Twitter

Ramakanth Kavuluru, A. K. M. Sabbir

Research output: Contribution to journal › Article › peer-review

36 Scopus citations


Background: Electronic cigarettes (e-cigarettes or e-cigs) are a popular emerging tobacco product. Because e-cigs do not generate the toxic combustion products that result from smoking regular cigarettes, they are sometimes perceived and promoted as a less harmful alternative to smoking and as a means to quit smoking. However, the safety of e-cigs and their efficacy in supporting smoking cessation are yet to be determined. Importantly, the Food and Drug Administration (FDA) currently does not regulate e-cigs, and as such their manufacturing, marketing, and sale are not subject to the rules that apply to traditional cigarettes. A number of manufacturers, advocates, and e-cig users are actively promoting e-cigs on Twitter.

Objective: We develop a high-accuracy supervised predictive model to automatically identify e-cig "proponents" on Twitter and analyze how their tweeting behavior varies quantitatively along popular themes when compared with other Twitter users (or tweeters).

Methods: Using a dataset of 1000 Twitter profiles, each independently annotated by two different annotators, we employed a variety of textual features from latest tweet content and tweeter profile biography to build predictive models that automatically identify proponent tweeters. We used a set of manually curated key phrases to analyze e-cig proponent tweets from a corpus of over one million e-cig tweets along well-known e-cig themes and compared the results with those generated by regular tweeters.

Results: Our model identifies e-cig proponents with 97% precision, 86% recall, 91% F-score, and 96% overall accuracy, with tight 95% confidence intervals. We find that, whereas regular tweeters form over 90% of the dataset, e-cig proponents are a much smaller subset but tweet two to five times more than regular tweeters. Proponents also disproportionately (one to two orders of magnitude more) highlight e-cig flavors, their smoke-free and potential harm reduction aspects, and their claimed use in smoking cessation.

Conclusions: Given that the FDA is currently in the process of proposing meaningful regulation, we believe our work demonstrates the strong potential of informatics approaches, specifically machine learning, for automated e-cig surveillance on Twitter.
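As a quick consistency check on the figures reported in the Results, the balanced F-score is the harmonic mean of precision and recall. The short Python sketch below assumes the reported F-score is the balanced F1 measure, which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (97% and 86%).
reported_precision = 0.97
reported_recall = 0.86

f1 = f_score(reported_precision, reported_recall)
print(f"F-score: {f1:.2%}")
```

Under that assumption, 2 × 0.97 × 0.86 / (0.97 + 0.86) ≈ 0.912, which agrees with the reported 91% after rounding.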

Original language: English
Pages (from-to): 19-26
Number of pages: 8
Journal: Journal of Biomedical Informatics
State: Published - Jun 1 2016

Bibliographical note

Publisher Copyright:
© 2016 Elsevier Inc.


Keywords

  • Electronic cigarettes
  • Text classification
  • Text mining

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications

