USGS - science for a changing world

Upper Midwest Environmental Sciences Center

LTRM Statistics

Statistical Models and LTRM Data

Statisticians typically distinguish between design-based and model-based inferences. The former derive from the sampling design, whereas the latter rely on assumptions not associated with the design. An example of a model-based inference is one that assumes observations are normally distributed.
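To illustrate the distinction, the sketch below applies two estimators to one stratified random sample. The design-based estimator uses only design quantities: stratum sizes, stratum weights, and the finite-population correction. The model-based estimator is deliberately naive. It treats every observation as an independent draw from a single normal distribution and ignores the strata. The data and the function names (`design_based`, `model_based`, `sample_var`) are hypothetical and not part of any LTRM tool.

```python
# Hypothetical stratified random sample (values invented for illustration):
# stratum -> (N_h = number of population units in the stratum, sampled values).
strata = {
    "A": (800, [2.0, 3.0, 2.5, 3.5]),
    "B": (200, [10.0, 12.0, 11.0, 9.0, 13.0, 11.0, 10.0, 12.0]),
}


def sample_var(y):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)


def design_based(strata):
    """Stratified estimate of the population mean and its design-based variance.

    The weights W_h = N_h / N and the finite-population correction
    (1 - n_h / N_h) come from the sampling design, not from a
    distributional assumption.
    """
    N = sum(N_h for N_h, _ in strata.values())
    mean, var = 0.0, 0.0
    for N_h, y in strata.values():
        n_h = len(y)
        W_h = N_h / N
        mean += W_h * sum(y) / n_h
        var += W_h ** 2 * (1 - n_h / N_h) * sample_var(y) / n_h
    return mean, var


def model_based(strata):
    """Treat all observations as iid normal draws, ignoring strata and
    selection probabilities."""
    y = [v for _, obs in strata.values() for v in obs]
    return sum(y) / len(y), sample_var(y) / len(y)


d_mean, d_var = design_based(strata)
m_mean, m_var = model_based(strata)
print(f"design-based mean {d_mean:.2f}, model-based mean {m_mean:.2f}")
# design-based mean 4.40, model-based mean 8.25
```

Stratum B makes up 20% of the hypothetical population but two-thirds of the sample. The naive model-based mean is therefore pulled toward stratum B. The design-based estimator avoids this through the weights W_h, which the design supplies.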

The use of models with LTRM data typically represents a scientific effort and hence falls under the LTRM's second mandate (i.e., understanding patterns in LTRM data). Analytical concerns associated with such efforts are generally beyond the scope of this web site; the one exception is the use of means from LTRM data sets, addressed below. Users interested in adjusting models for design attributes, such as variable selection probabilities, may consult Rabe-Hesketh and Skrondal (2006) and Carle (2009).
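In multilevel models, one common step in adjusting for variable selection probabilities is to convert them into design weights (inverse selection probabilities). These weights are then rescaled within each cluster. The sketch below shows two scalings discussed in the literature on design weights in multilevel models: rescaling the weights to sum to the cluster sample size, or to the cluster's effective sample size. The helper `scale_weights` and the example probabilities are hypothetical. The sketch does not fit a model. Fitting a weighted multilevel model requires software that accepts weights at each level, and the cited references should guide which scaling to use.

```python
def scale_weights(probs, method="size"):
    """Scale unit-level design weights within one cluster (hypothetical helper).

    probs  -- conditional selection probabilities of the sampled units in the cluster
    method -- "size": scaled weights sum to the cluster sample size n_j
              "effective": scaled weights sum to the effective sample size
                           (sum w)^2 / sum(w^2)
    """
    w = [1.0 / p for p in probs]  # raw design weights: inverse selection probabilities
    total = sum(w)
    if method == "size":
        factor = len(w) / total
    elif method == "effective":
        factor = total / sum(x * x for x in w)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return [x * factor for x in w]


# Hypothetical cluster of four units with unequal selection probabilities.
probs = [0.5, 0.25, 0.25, 0.1]
print(scale_weights(probs, "size"))
print(scale_weights(probs, "effective"))
```

Both scalings preserve the relative sizes of the raw weights (here 2, 4, 4, 10) and differ only in their total. The first call's weights sum to 4, the number of sampled units. The second call's weights sum to 20²/136, roughly 2.94.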

Modeling Using Means

Modeling using sample means (averages) of LTRM data should be approached with care, primarily because such means possess not only sampling variance but also parameter variance. The latter component arises because the true mean of a sampled variable itself varies among sampling events, apart from any variation due to sampling. Further considerations include:

(1) the sampling variance of a given mean is a function of sample size, and LTRM sample sizes have not been constant over sampling events;
(2) for means from stratified random samples, the sampling variance is a function of sampling probabilities and stratum-specific variances; the former may often be treated as essentially constant over the Program’s duration, while the latter should not;
(3) means from LTRM stratified random samples should not be presumed a priori to be normally distributed (Thompson 2002);
(4) the sampling variances of means of categorical and count data are themselves functions of the means (i.e., the sampling variance varies with the mean as well as with sample size);
(5) for nonnormal data, parameter variance is typically presumed to vary linearly on a scale other than that on which the data were sampled (e.g., a log or logit scale); and
(6) in the absence of evidence to the contrary, true means of the biotic components should be presumed to be temporally correlated.

Further discussion of modeling using means is provided by Snijders and Bosker (1999).
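The sketch below shows how parameter and sampling variance combine. It uses a simple method-of-moments estimator on hypothetical proportions from five sampling events with unequal sample sizes. The variance among observed means is treated as parameter variance plus average sampling variance, and the binomial sampling variance of each mean depends on the mean itself (consideration (4)). The data and function names (`binomial_sampling_var`, `parameter_variance`) are hypothetical. The sketch also deliberately ignores considerations (3), (5), and (6): it works on the proportion scale rather than a logit scale and assumes events are independent. In practice, a mixed model fitted on an appropriate scale with temporally correlated event effects would usually be preferable.

```python
import statistics


def binomial_sampling_var(p_hat, n):
    """Estimated sampling variance of a sample proportion.

    It depends on the estimated mean p_hat as well as on the sample size n.
    """
    return p_hat * (1 - p_hat) / n


def parameter_variance(means, sampling_vars):
    """Moment estimate of the among-event (parameter) variance of true means.

    If observed mean_t = true mean_t + sampling error_t, with independent
    errors and true means sharing a common expectation, the expected sample
    variance of the observed means equals the parameter variance plus the
    average sampling variance. Subtract the latter and truncate at zero.
    """
    return max(0.0, statistics.variance(means) - statistics.fmean(sampling_vars))


# Hypothetical events: (estimated proportion, sample size); sample sizes differ.
events = [(0.20, 50), (0.35, 40), (0.10, 60), (0.30, 30), (0.25, 45)]
means = [p for p, _ in events]
svars = [binomial_sampling_var(p, n) for p, n in events]
print(f"total variance of means: {statistics.variance(means):.5f}")       # 0.00925
print(f"estimated parameter variance: {parameter_variance(means, svars):.5f}")  # 0.00494
```

Here roughly half of the observed variation among event means is attributable to sampling variance. Treating the raw means as if they were true means would overstate the among-event variation.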

References

Carle, A. C. 2009. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9:49. doi:10.1186/1471-2288-9-49.

Rabe-Hesketh, S., and A. Skrondal. 2006. Multilevel modelling of complex survey data. Journal of the Royal Statistical Society, Series A 169:805-827.

Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis. Sage, London.

Thompson, S. K. 2002. Sampling. Second edition. Wiley & Sons, New York.

Contact: Questions or comments may be directed to Brian Gray, LTRM statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.


U.S. Department of the Interior | U.S. Geological Survey

URL: http://www.umesc.usgs.gov/lltrmp/stats/modeling.html
Page Last Modified: August 12, 2016