Upper Mississippi River Restoration Program

Long Term Resource Monitoring

LTRM Statistics

Statistical Models and LTRM Data

Statisticians typically distinguish between design-based and model-based inferences. The former derive from the sampling design, while the latter rely on assumptions not associated with the design. An example of a model-based inference is one that assumes observations are normally distributed.

The use of models with LTRM data typically represents a scientific effort and hence falls under the LTRM's second mandate (i.e., that of understanding patterns in LTRM data). Analytical concerns associated with such efforts are generally beyond the scope of this web site; an exception is the use of means from LTRM data sets, which is addressed below. Users interested in adjusting for design attributes, such as variable selection probabilities, may consult Rabe-Hesketh and Skrondal (2006) and Carle (2009).
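As a rough illustration of why variable selection probabilities matter, the sketch below compares an unweighted mean with a mean weighted by inverse selection probabilities (often called a Hajek-type estimator). It is not the multilevel weighting approach described in the references above; the function name `hajek_mean`, the data values, and the selection probabilities are all hypothetical.

```python
def hajek_mean(values, selection_probs):
    """Weighted mean with weights equal to inverse selection probabilities.

    Hypothetical helper for illustration only; not an LTRM tool.
    """
    if len(values) != len(selection_probs):
        raise ValueError("values and selection_probs must have equal length")
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Made-up data: two sites selected with probability 0.5 and two with 0.1.
values = [2.0, 4.0, 10.0, 12.0]
probs = [0.5, 0.5, 0.1, 0.1]

unweighted = sum(values) / len(values)   # treats all sites alike
weighted = hajek_mean(values, probs)     # up-weights rarely selected sites

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

In this made-up case the sites with low selection probabilities have larger values, so ignoring the design pulls the estimate downward.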

Modeling Using Means

Modeling using sample means (averages) of LTRM data should be approached with care, primarily because such means possess not only sampling variance but also parameter variance. The latter component arises because the true mean of a sampled variable itself varies among sampling events (i.e., observed means differ for reasons beyond sampling variance alone). Further considerations include:

(1) The sampling variance of a given mean is a function of the sample size, and sample sizes in the LTRM have not been constant over sampling events.

(2) For means from stratified random samples, the sampling variance is a function of sampling probabilities and stratum-specific variances. The former may often be treated as having been essentially constant over the Program's duration, while the latter should not.

(3) Means from LTRM stratified random samples should not a priori be presumed normally distributed (Thompson 2002).

(4) The sampling variances of means of categorical and count data are themselves functions of the means (i.e., the sampling variance varies not only as a function of sample size but also of the mean).

(5) For nonnormal data, parameter variance is typically presumed to vary linearly on a scale other than that on which the data were sampled (e.g., the log scale for counts or the logit scale for proportions).

(6) True means from the biotic components should, in the absence of evidence to the contrary, be presumed temporally correlated.

Further discussion of modeling using means is provided by Snijders and Bosker (1999).
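Points (1) and (2) can be made concrete with the standard textbook estimator for a stratified random sample: the mean is the weighted sum of stratum means, and its estimated sampling variance is the sum over strata of W_h^2 * s_h^2 / n_h. The sketch below assumes known stratum weights W_h that sum to 1 and ignores the finite population correction; the function name and all data values are hypothetical and are not LTRM data.

```python
from statistics import mean, variance


def stratified_mean_and_variance(strata):
    """Return (stratified mean, estimated sampling variance of that mean).

    strata: list of (weight, observations) pairs, where the weights W_h
    sum to 1 and each stratum has at least two observations.
    Assumption: the finite population correction is ignored.
    """
    total_weight = sum(w for w, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    estimate = sum(w * mean(obs) for w, obs in strata)
    # statistics.variance is the sample variance (n - 1 denominator).
    samp_var = sum(w ** 2 * variance(obs) / len(obs) for w, obs in strata)
    return estimate, samp_var


# Made-up sampling event: a large stratum (weight 0.75) and a small one (0.25).
event_a = [(0.75, [1.0, 2.0, 3.0]), (0.25, [10.0, 14.0])]
est_a, var_a = stratified_mean_and_variance(event_a)

# Same made-up values observed with twice the sample size per stratum.
event_b = [(0.75, [1.0, 2.0, 3.0] * 2), (0.25, [10.0, 14.0] * 2)]
est_b, var_b = stratified_mean_and_variance(event_b)

print(f"event A: mean={est_a:.3f}, sampling variance={var_a:.4f}")
print(f"event B: mean={est_b:.3f}, sampling variance={var_b:.4f}")
```

The two events yield the same estimated mean but different sampling variances, which is one reason means from events with different sample sizes or stratum variances should not be treated as equally precise in a model.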

References

Carle, A. C. 2009. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9:49. doi:10.1186/1471-2288-9-49.

Rabe-Hesketh, S., and A. Skrondal. 2006. Multilevel modeling of complex survey data. Journal of the Royal Statistical Society, Series A 169:805-827.

Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis. Sage, London.

Thompson, S. K. 2002. Sampling. Second edition. Wiley & Sons, New York.

Contact: Questions or comments may be directed to Brian Gray, LTRM statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.

Page Last Modified: August 12, 2016