Building valid regression models for biological data using STATA and R

Charles Lindsey, Simon J. Sheather

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

2 Scopus citations


This chapter explains how linear regression can be applied to model relationships in biological data. The example response variables include brain weight, the size of fish populations, HDL (high-density lipoprotein) cholesterol level, and diabetes progression. The analyses are performed in the statistical software packages R and Stata. The main tools the authors use to check regression assumptions are plots involving standardized residuals and/or fitted values. The chapter then considers marginal model plots, which have wider application than residual plots. Examining the residual plots shows whether the assumption of constant error variance is reasonable. The chapter discusses how transforming the variables can lead to a valid model. It also shows how to assess the extent of collinearity among the predictor variables.
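The two diagnostics named in the abstract, standardized residuals and collinearity among predictors, can be illustrated with a minimal sketch. It is not taken from the chapter, which works in R and Stata; Python is used here only for illustration. The function names and the small data set are hypothetical, and the formulas are the standard textbook ones for simple linear regression (standardized residual e_i / (s * sqrt(1 - h_i))) and for the variance inflation factor of two predictors (1 / (1 - r^2)).

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standardized_residuals(x, y):
    """Return (fitted values, standardized residuals) for a simple linear fit."""
    n = len(x)
    b0, b1 = simple_ols(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Leverage of each observation in simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]
    return fitted, std_resid

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r = s12 / math.sqrt(s11 * s22)
    return 1.0 / (1.0 - r * r)

# Made-up illustrative numbers, not data from the chapter.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

fitted, std_resid = standardized_residuals(x1, y)
for f, r in zip(fitted, std_resid):
    print(f"fitted={f:6.2f}  standardized residual={r:6.2f}")
print(f"VIF for x1 and x2: {vif_two_predictors(x1, x2):.2f}")
```

Plotting the standardized residuals against the fitted values gives the kind of plot the abstract describes: a roughly even band around zero is consistent with constant error variance, while a funnel shape suggests that a transformation may be needed. A variance inflation factor well above 1 indicates that the two predictors are strongly correlated.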

Original language: English
Title of host publication: Biological Knowledge Discovery Handbook
Subtitle of host publication: Preprocessing, Mining and Postprocessing of Biological Data
Number of pages: 31
ISBN (Electronic): 9781118617151
State: Published - 2014

Bibliographical note

Publisher Copyright:
© 2014 John Wiley & Sons, Inc.

ASJC Scopus subject areas

  • Computer Science (all)

