Abstract
One of the most important soil hydraulic properties for modeling water transport in the vadose zone is saturated hydraulic conductivity. However, it is challenging to measure it in the field. Pedotransfer Functions (PTFs) are mathematical models that can predict saturated hydraulic conductivity (Ks) from easily measured soil characteristics. Though the development of PTFs for predicting Ks is not new, the tools and methods used to predict Ks are continuously evolving. Model performance depends on choosing soil features that explain the largest amount of Ks variance with the fewest input variables. In addition, the lack of interpretability in most "black box" machine learning models makes it difficult to extract practical knowledge as the machine learning process obfuscates the relationship between inputs and outputs in the PTF models. The objective of this study was to develop a set of new PTFs for predicting Ks using machine learning algorithms and a large database of over 8000 soil samples (the Florida Soil Characterization Database) while incorporating statistical methods to inform predictor selection for the model inputs. Of the machine learning (ML) models tested, random forest regression (RF) and gradient-boosted regression (GB) gave the best performances, with R² = 0.71 and RMSE = 0.47 cm h⁻¹ on the test data for both. Using the permutation feature importance technique, the GB and RF regression models showed similar results, where clay content described the most variation in the data, followed by bulk density. The implication of this study is that, when predicting Ks using the Florida Soil Characterization Database, priority should be given to obtaining quality data on clay content and bulk density as they are the most influential predictors for estimating Ks.
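The permutation feature importance technique named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code or data: the feature names, the coefficients, and the synthetic samples are all made up for illustration, and a fixed linear function stands in for a fitted RF or GB model. The idea is to shuffle one input column at a time and record how much the prediction error (RMSE) grows; a larger increase suggests the model relies more on that input.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical inputs: clay content (%), bulk density (g cm^-3), and a
# column the stand-in model ignores. All values are synthetic.
data_rng = random.Random(42)
feature_names = ["clay", "bulk_density", "unrelated"]
X = [
    [data_rng.uniform(0, 60), data_rng.uniform(1.0, 1.8), data_rng.uniform(0, 1)]
    for _ in range(500)
]


def predict(row):
    """Stand-in for a trained model; the coefficients are arbitrary."""
    clay, bulk_density, _ = row
    return 2.0 - 0.04 * clay - 1.0 * bulk_density


# Synthetic target: the stand-in model's output plus Gaussian noise.
y = [predict(row) + data_rng.gauss(0, 0.1) for row in X]

importances = permutation_importance(predict, X, y)
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The ranking printed here only reflects the arbitrary coefficients chosen above, not the study's results. With a real fitted model, the same procedure would be applied to held-out test data so that the importances describe generalization rather than memorization.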
Original language | English |
---|---|
Pages (from-to) | 285-296 |
Number of pages | 12 |
Journal | Journal of the ASABE |
Volume | 66 |
Issue number | 2 |
State | Published - 2023 |
Bibliographical note
Funding Information: Department of Agriculture, Hatch/Multistate under 1002344 and 1003563. Co-author O.W. acknowledges support for this work through the KY006120 Hatch/Multistate Project "Soil, Water, and Environmental Physics to Sustain Agriculture and Natural Resources." We thank Dr. S. Grunwald (PI), Dr. W.G. Harris, Wade Hurt, Dr. S.A. Bloom, Dr. R.G. Rivero, V. Ramasundaram, M. Gao, B. Murphy, and K. Bloom, as well as other staff of the Environmental Pedology Laboratory, Soil and Water Sciences Department, University of Florida, in conjunction with the Natural Resources Conservation Service, for the laboratory soil measurements, mining, development, and Q&A of the large FSCD that provided the soil data enabling the ML and neural network analyses. The FSCD can be obtained by contacting Dr. S. Grunwald at https://www.sgrunwald.org/big-data.
Publisher Copyright:
© 2023 American Society of Agricultural and Biological Engineers.
Keywords
- Deep learning
- Gradient boosted regression
- Pedotransfer functions
- Random forest regression
- Soil database
- Soil properties
ASJC Scopus subject areas
- Forestry
- Food Science
- Agronomy and Crop Science
- Soil Science
- Biomedical Engineering