TY - JOUR
T1 - Using publicly available data to predict recreational cannabis legalization at the county-level
T2 - A machine learning approach
AU - Montgomery, Barrett Wallace
AU - Tong, Xiaoran
AU - Vsevolozhskaya, Olga
AU - Anthony, James C.
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/3
Y1 - 2024/3
AB - Background: There is substantial geographic variability in local cannabis policies within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalences to predict the legality of recreational cannabis sales within each U.S. county. Methods: We merged data and selected 14 model inputs from the 2010 Census, 2012 County Presidential Data from the MIT Elections Lab, and Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH) from 2010 to 2012 at the county level. County policies were labeled as having recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, resulting in 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling techniques to build an ensemble of 1000 logistic regressions on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy conditions with the actual outcomes in 2014. Results: When compared to the actual RCL policies in 2014, the ensemble's predictions of counties transitioning to RCL had a macro-averaged F1 score of 0.61. The main factors associated with legalizing county-level recreational cannabis sales were the prevalences of past-month cannabis use and past-year cocaine use. Conclusion: By leveraging publicly available data from 2010 to 2012, our model was able to achieve appreciable discrimination in predicting counties with legal recreational cannabis sales in 2014; however, there is room for improvement. Having demonstrated model performance in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.
KW - Cannabis
KW - Drug policy
KW - Ensemble
KW - Epidemiology
KW - Machine learning
KW - Prediction
KW - Public health law
UR - http://www.scopus.com/inward/record.url?scp=85185888405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185888405&partnerID=8YFLogxK
U2 - 10.1016/j.drugpo.2024.104340
DO - 10.1016/j.drugpo.2024.104340
M3 - Article
C2 - 38342052
AN - SCOPUS:85185888405
SN - 0955-3959
VL - 125
JO - International Journal of Drug Policy
JF - International Journal of Drug Policy
M1 - 104340
ER -