Using publicly available data to predict recreational cannabis legalization at the county-level: A machine learning approach

Barrett Wallace Montgomery, Xiaoran Tong, Olga Vsevolozhskaya, James C. Anthony

Research output: Contribution to journal › Article › peer-review


Background: Local cannabis policies vary substantially across geographic areas within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalence to predict the legality of recreational cannabis sales within each U.S. county.

Methods: We merged county-level data and selected 14 model inputs from the 2010 Census, the 2012 County Presidential Data from the MIT Elections Lab, and the 2010 to 2012 Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH). A county was labeled recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, resulting in 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling techniques to build an ensemble of 1000 logistic regressions on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy conditions with the actual outcomes in 2014.

Results: When compared to the actual RCL policies in 2014, the ensemble's predictions of counties transitioning to RCL had a macro-averaged F1 score of 0.61. The main factors associated with legalizing county-level recreational cannabis sales were the prevalences of past-month cannabis use and past-year cocaine use.

Conclusion: By leveraging publicly available data from 2010 to 2012, our model achieved appreciable discrimination in predicting counties with legal recreational cannabis sales in 2014; however, there is room for improvement. Now that model performance has been demonstrated in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.
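To make the evaluation design in the Methods and Results more concrete, the following is a minimal sketch, not the authors' implementation. It shows three pieces: a leave-one-state-out split, a plain random oversampler for the minority class, and a macro-averaged F1 score. The function names, the toy state labels "A" to "C", and the toy outcomes are hypothetical. Simple resampling with replacement stands in for the synthetic data augmentation and oversampling techniques that the abstract mentions but does not specify, and model fitting is omitted. The only figures taken from the abstract are the class counts (92 RCL, 3002 non-RCL).

```python
import random
from collections import defaultdict


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def leave_one_state_out(states):
    """Yield (held_out_state, train_idx, test_idx), one fold per state."""
    by_state = defaultdict(list)
    for i, s in enumerate(states):
        by_state[s].append(i)
    for held_out, test_idx in by_state.items():
        train_idx = [i for i, s in enumerate(states) if s != held_out]
        yield held_out, train_idx, test_idx


def oversample_minority(idx, y, rng):
    """Resample minority-class rows with replacement until classes balance."""
    pos = [i for i in idx if y[i] == 1]
    neg = [i for i in idx if y[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:  # nothing to resample from
        return list(idx)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# Hypothetical toy data: one state label and one RCL outcome per county.
states = ["A", "A", "A", "B", "B", "C", "C", "C"]
y = [1, 0, 0, 0, 0, 1, 0, 0]
rng = random.Random(0)
for held_out, train_idx, test_idx in leave_one_state_out(states):
    balanced = oversample_minority(train_idx, y, rng)
    n_pos = sum(y[i] for i in balanced)
    print(held_out, len(test_idx), n_pos, len(balanced) - n_pos)

# Class counts from the abstract: 92 RCL and 3002 non-RCL counties.
y_true = [1] * 92 + [0] * 3002
baseline = macro_f1(y_true, [0] * len(y_true))  # always predict non-RCL
print(round(baseline, 3))
```

Under the assumption that the metric is computed once over all 3094 counties (the abstract does not say how it was aggregated), a rule that always predicts non-RCL gets a macro-averaged F1 of about 0.49. That figure gives a rough reference point for reading the reported 0.61. Macro averaging weights both classes equally, so the 92 RCL counties count as much as the 3002 non-RCL counties.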

Original language: English
Article number: 104340
Journal: International Journal of Drug Policy
State: Published - Mar 2024

Bibliographical note

Publisher Copyright:
© 2024 Elsevier B.V.


Keywords

  • Cannabis
  • Drug policy
  • Ensemble
  • Epidemiology
  • Machine learning
  • Prediction
  • Public health law

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Health Policy

