The Materials Simulation Toolkit for Machine learning (MAST-ML): An automated open source toolkit to accelerate data-driven materials research

Ryan Jacobs, Tam Mayeshiba, Ben Afflerbach, Luke Miles, Max Williams, Matthew Turner, Raphael Finkel, Dane Morgan

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

As data science and machine learning methods are taking on an increasingly important role in the materials research community, there is a need for the development of machine learning software tools that are easy to use (even for nonexperts with no programming ability), provide flexible access to the most important algorithms, and codify best practices of machine learning model development and evaluation. Here, we introduce the Materials Simulation Toolkit for Machine Learning (MAST-ML), an open source Python-based software package designed to broaden and accelerate the use of machine learning in materials science research. MAST-ML provides predefined routines for many input setup, model fitting, and post-analysis tasks, as well as a simple structure for executing a multi-step machine learning model workflow. In this paper, we describe how MAST-ML is used to streamline and accelerate the execution of machine learning problems. We walk through how to acquire and run MAST-ML, demonstrate how to execute different components of a supervised machine learning workflow via a customized input file, and showcase a number of features and analyses conducted automatically during a MAST-ML run. Further, we demonstrate the utility of MAST-ML by showcasing examples of recent materials informatics studies which used MAST-ML to formulate and evaluate various machine learning models for an array of materials applications. Finally, we lay out a vision of how MAST-ML, together with complementary software packages and emerging cyberinfrastructure, can advance the rapidly growing field of materials informatics, with a focus on producing machine learning models easily, reproducibly, and in a manner that facilitates model evolution and improvement in the future.

Original language: English
Article number: 109544
Journal: Computational Materials Science
Volume: 176
State: Published - Apr 15 2020

Bibliographical note

Funding Information:
The authors gratefully acknowledge funding provided by the NSF Software Infrastructure for Sustained Innovation (SI2) award No. 1148011 and the NSF DMREF award No. DMR-1332851. The NSF SI2 award No. 1148011 funded Ryan Jacobs, Tam Mayeshiba, Luke Miles, Max Williams, and Matthew Turner. The NSF DMREF award No. DMR-1332851 funded Ben Afflerbach. The authors wish to thank all of those who contributed to the development of MAST-ML in various ways, such as through (1) commits to the MAST-ML source code on GitHub (https://github.com/uw-cmg/MAST-ML), (2) contributed analysis scripts that were integrated to become MAST-ML features, (3) useful discussions with the authors of this paper on how to improve MAST-ML, and (4) testing of different MAST-ML features to eliminate bugs and streamline performance. These contributors are, in alphabetical order: Alex Do, Shuo Han, Wei Li, Dr. Yu-chen Liu, Haijin Lu, Vanessa Meschke, Dr. Maciej Polak, Alex Politowicz, Sam Wagner, Zuf Wang, Dr. Logan Ward, Kangqi Xi, and Linda Xiao.

Publisher Copyright:
© 2020 Elsevier B.V.

Keywords

  • Data science
  • Machine learning
  • Materials informatics
  • Open source software

ASJC Scopus subject areas

  • Computer Science (all)
  • Chemistry (all)
  • Materials Science (all)
  • Mechanics of Materials
  • Physics and Astronomy (all)
  • Computational Mathematics
