USGS - science for a changing world

Upper Midwest Environmental Sciences Center

Long Term Resource Monitoring Program

Estimating Means and Standard Errors from LTRMP Survey Data

Introduction

The LTRMP collects data at sampling locations selected both probabilistically and nonprobabilistically (in LTRMP parlance, "stratified random" and "fixed-site" data, respectively). Data from probabilistically selected locations may be used to make inferences about the populations from which those samples were drawn. For example, combining data collected at probabilistically selected locations with information about the sampling design allows the prevalence of submersed aquatic vegetation to be estimated for the entire population of sample units (generally defined for an entire reach). Our "fixed-site" data do not permit such design-based generalizations. For this reason, this web site is primarily concerned with the use of data from probabilistically selected sites.

The LTRMP estimates annual means and associated standard errors by relying on the sampling design rather than on distributional assumptions about the observed data. These so-called "design-based" methods accommodate complexities common in survey designs, including stratification and nonproportional sampling ("strata" are populations from which independent samples are drawn). Lohr (1999) provides a useful comparison of design- and model-based methods for the analysis of survey data.

The LTRMP does not presently adjust statistics from its biological components for detection probabilities or capture efficiencies. Consequently, these statistics are more properly termed index statistics. Index statistics are presumed to be correlated with parameters of interest (e.g., abundance, percent frequency of occurrence). However, because they have not been adjusted for variation in detection probabilities, changes in index statistics cannot be distinguished from changes in detection probabilities. Thompson et al. (1998) provide further information on index statistics and detection probabilities.

Sample Inclusion Probabilities

A sample inclusion probability is the probability that an individual population unit (for the LTRMP, a grid point) is selected for sampling. For example, suppose 20 grid points are selected by simple random sampling from each of strata i and j. If these strata contain 1000 and 2000 grid points, respectively, then the sample inclusion probabilities are 20/1000 = 2% and 20/2000 = 1%. For inclusion probabilities to be constant across strata (i.e., for sampling to be "proportional to size"), the number of grid points selected would need to be directly proportional to stratum size (e.g., 10 and 20 units selected from strata i and j, respectively). For a given component, inclusion probabilities have varied across strata within a pool but, with few exceptions, have been generally constant across years.
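
The arithmetic in the example above can be sketched in Python as follows. This is an illustrative sketch only; the function name inclusion_probability is hypothetical and is not part of any LTRMP software.

```python
def inclusion_probability(n_sampled: int, n_population: int) -> float:
    """Probability that any one grid point in a stratum is selected
    under simple random sampling: n_h / N_h."""
    if not 0 < n_sampled <= n_population:
        raise ValueError("need 0 < n_sampled <= n_population")
    return n_sampled / n_population

# Stratum sizes (grid points) from the example in the text.
strata = {"i": 1000, "j": 2000}

# Nonproportional allocation: 20 points from each stratum.
nonprop = {h: inclusion_probability(20, N) for h, N in strata.items()}

# Proportional allocation: 10 and 20 points give equal probabilities.
prop = {"i": inclusion_probability(10, 1000), "j": inclusion_probability(20, 2000)}

print(nonprop)
print(prop)
```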

Sampling Weights

The LTRMP uses sampling weights to adjust for nonproportional sampling. Sampling weights for the LTRMP are generally defined as the inverses of the sample inclusion probabilities, and they may also be viewed as the number of potentially sampleable units represented by a given sampled unit. Continuing with the previous example, each sampled unit in strata i and j may be viewed as representing 50 and 100 (i.e., 1000/20 and 2000/20) potentially sampleable units, respectively.
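
A minimal sketch of the weight calculation, continuing the same example (the function name sampling_weight is hypothetical). It also checks the interpretation given above: the weights of the sampled units in a stratum add up to that stratum's population size.

```python
def sampling_weight(n_sampled: int, n_population: int) -> float:
    """Inverse of the inclusion probability, N_h / n_h: the number of grid
    points represented by each sampled grid point."""
    return n_population / n_sampled

weights = {"i": sampling_weight(20, 1000), "j": sampling_weight(20, 2000)}
print(weights)  # {'i': 50.0, 'j': 100.0}

# Summing the weights of the 20 sampled points recovers each stratum size.
totals = {h: 20 * w for h, w in weights.items()}
print(totals)   # {'i': 1000.0, 'j': 2000.0}
```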

In some instances, locations selected for sampling by the LTRMP were not sampled, for example because the intended location was inaccessible. At present, the vegetation component treats such missing observations as missing completely at random, while the other components substitute predefined alternate locations. If unsampled locations were (1) not missing completely at random or (2) not interchangeable with the alternate locations for the given metric, then the reported statistics may be biased by an unknown amount. Beyond these practices, the LTRMP does not currently address missing data, and sample inclusion probabilities are estimated using the observed rather than the intended number of sample units.
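
The effect of using the observed rather than the intended number of sample units can be illustrated with hypothetical counts: 3 of 20 selected points left unsampled in a stratum of 1000 grid points. These numbers are invented for illustration and do not come from LTRMP data.

```python
N_h = 1000          # grid points in the stratum (size taken from the example above)
n_intended = 20     # points selected for sampling
n_observed = 17     # hypothetical: 3 selected points could not be sampled

# Weights (inverse inclusion probabilities) from intended vs observed counts.
w_intended = N_h / n_intended   # 50.0
w_observed = N_h / n_observed   # about 58.8

# With the observed count, the sampled points' weights still sum to the
# stratum size, so the missing points' share is spread evenly over the
# sampled points. That is reasonable only if the missing points resemble
# the sampled ones (i.e., they are missing completely at random).
covered_observed = n_observed * w_observed   # the full stratum, 1000 points
covered_intended = n_observed * w_intended   # only 850 points would be represented
print(round(covered_observed, 6), covered_intended)
```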

Sample inclusion probabilities and sampling weights by stratum, component, and reach are calculated from the number of sampled units and the corresponding population sizes. Population sizes are provided below in both pdf and Excel format.

Population units (Excel file) (pdf file)

Estimating Design-based Means and Standard Errors

For the LTRMP, means computed across multiple strata are weighted to adjust for nonproportional sampling, and their standard errors are adjusted for both nonproportional sampling and stratification. Design-based means and standard errors are estimated using the SAS SURVEYMEANS procedure (proc surveymeans); SAS (2003) provides further technical details.
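
The stratified mean and its standard error can be written out directly: mean = sum over strata of (N_h/N) * ybar_h, and variance = sum over strata of (N_h/N)^2 * s_h^2 / n_h, where N_h is the stratum population size, n_h the number sampled, ybar_h and s_h^2 the stratum sample mean and variance, and N the total population size. The sketch below assumes simple random sampling within strata, weights equal to N_h/n_h (as described under Sampling Weights), and no finite population correction. For weights of this form, we would expect it to agree with the Taylor-series results of a survey procedure such as proc surveymeans with strata and weights specified, but it is an illustration, not the LTRMP's production code. The data values are hypothetical.

```python
import math
from statistics import mean, variance

def stratified_mean_se(obs, pop_sizes):
    """Design-based mean and SE under stratified simple random sampling.

    obs:       dict stratum -> list of observed values (n_h >= 2 per stratum)
    pop_sizes: dict stratum -> number of grid points N_h
    No finite population correction is applied.
    """
    N = sum(pop_sizes[h] for h in obs)
    est = 0.0
    var = 0.0
    for h, y in obs.items():
        W_h = pop_sizes[h] / N                   # stratum share of the population
        est += W_h * mean(y)
        var += W_h ** 2 * variance(y) / len(y)   # variance() uses the n - 1 divisor
    return est, math.sqrt(var)

# Hypothetical observations; stratum sizes follow the example in the text.
obs = {"i": [1.0, 3.0, 5.0], "j": [0.0, 2.0]}
pop_sizes = {"i": 1000, "j": 2000}

est, se = stratified_mean_se(obs, pop_sizes)
unweighted = mean(obs["i"] + obs["j"])
print(est, se, unweighted)
```

With these invented values the weighted mean is 5/3 (about 1.67), whereas the unweighted mean of all five observations is 2.2; the difference arises because stratum j is sampled less intensively relative to its size than stratum i.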

Comments by sampling component:

Fish

  • Designs include both spatial and temporal strata.
  • Sample sizes within combinations of spatial and temporal strata have typically been small (n < 10). Consequently, means are typically reported by spatial stratum (i.e., temporal strata are ignored).
  • Sampling locations within wing dams and tailwaters are not selected using the methods used for the larger strata (Gutreuter et al. 1995). Consequently, wing dam and tailwater samples are excluded from annual means reported by the fish component.
  • Further information about statistics used by the LTRMP's fish component is provided in Gutreuter (1993).

Macroinvertebrate

  • The macroinvertebrate component defined strata in space only. Sample sizes have typically been moderate in backwater and impounded strata (n ~ 50), intermediate in the side channel stratum (n ~ 20), and small in the main channel border stratum (n ~ 10).
  • Further information about statistics used by the LTRMP's macroinvertebrate component is provided in Sauer (1998).

Vegetation

  • The vegetation component defines strata in space only and, with the exception of the percent cover variable, uses a cluster design. Sample sizes vary considerably by stratum but are often large (n > 50).
  • The vegetation component’s sampling frame for all pools was revised in 1999. Consequently, means and standard errors from 1998 (all pools) represent subpopulations (see below).
  • Sampling date is partially confounded with stratum in Pool 8 (sampling proceeds from north to south, and strata are partially defined along a north-south axis).
  • Further information about statistics used by the LTRMP's vegetation component is provided in Yin et al. (2001).
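
Under a cluster design, observations within a sampled site are not independent, so standard errors should reflect variation among sites rather than among individual subsamples. The sketch below is hypothetical: it assumes equal-sized clusters selected by simple random sampling within a single stratum, uses invented presence/absence values, and is not the vegetation component's documented procedure (see Yin et al. 2001). The function name cluster_mean_se is illustrative.

```python
import math
from statistics import mean, variance

def cluster_mean_se(clusters):
    """Mean and SE within one stratum, treating each site (cluster) as the unit.

    Assumes equal-sized clusters selected by simple random sampling; each
    site is reduced to its mean, and the SE reflects between-site variation.
    """
    site_means = [mean(c) for c in clusters]
    n = len(site_means)
    return mean(site_means), math.sqrt(variance(site_means) / n)

# Hypothetical presence (1) / absence (0) records: one inner list per sampled
# site, one value per subsample within the site.
sites = [
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
est, se = cluster_mean_se(sites)

# For comparison only: the (incorrect) SE that treats all subsamples as independent.
flat = [v for site in sites for v in site]
naive_se = math.sqrt(variance(flat) / len(flat))
print(est, se, naive_se)
```

For these invented data the site-based standard error is about 0.17, compared with about 0.10 if the 24 subsamples were wrongly treated as independent observations.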

Water quality

  • The water quality component uses a spatially stratified design that is replicated within each of the four seasons. Sample sizes have typically been moderate in backwater, impounded, and side channel strata, and small in main channel border strata. Because interest typically lies in within-season estimates, means and standard errors will generally be estimated separately for each reach, season, and stratum.
  • Means for Pool 26 exclude Swan Lake (which is not technically part of Pool 26).
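
Estimating separately by reach, season, and stratum amounts to grouping observations by that combination before computing each mean and standard error. A sketch with invented records follows; the values, and the choice of a simple within-cell standard error, are assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (reach, season, stratum, value).
records = [
    ("Pool 8", "summer", "backwater", 12.0),
    ("Pool 8", "summer", "backwater", 15.0),
    ("Pool 8", "summer", "backwater", 9.0),
    ("Pool 8", "summer", "main channel border", 20.0),
    ("Pool 8", "summer", "main channel border", 24.0),
    ("Pool 8", "winter", "backwater", 4.0),
    ("Pool 8", "winter", "backwater", 6.0),
]

def cell_estimates(rows):
    """Return {(reach, season, stratum): (n, mean, SE)} for each cell."""
    cells = defaultdict(list)
    for reach, season, stratum, value in rows:
        cells[(reach, season, stratum)].append(value)
    out = {}
    for key, values in cells.items():
        n = len(values)
        # An SE needs at least two observations in the cell.
        se = stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        out[key] = (n, mean(values), se)
    return out

estimates = cell_estimates(records)
for key, (n, m, se) in sorted(estimates.items()):
    print(key, n, m, round(se, 3))
```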

Means of Subpopulations

Estimating means and standard errors for a subpopulation not defined by the design typically requires methods that account for the number of samples in the subpopulation being a random variable (Thompson 2002). This issue arises most commonly when estimating means and standard errors for upper and lower Pool 4, but also when estimating means from vegetation data collected in 1998 (see Vegetation above).
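
One standard domain-estimation technique is to keep every sampled unit in the variance calculation, setting the contribution of units outside the subpopulation to zero, and to use a linearization (ratio) variance within strata. The sketch below illustrates this under stated assumptions: stratified sampling, at least two units per stratum, no finite population correction, and hypothetical data labeled "upper" and "lower" (for instance, upper and lower Pool 4). The function name and data are invented; it is not the LTRMP's exact computation.

```python
import math
from statistics import mean

def domain_mean_se(data, domain):
    """Stratified estimate of a subpopulation (domain) mean and its SE.

    data:   list of (stratum, weight, value, domain_label) for ALL sampled
            units; each stratum must contain at least two units
    domain: label of the subpopulation of interest
    Units outside the domain stay in the calculation with zero contribution,
    so the SE reflects the randomness of the domain sample size.
    """
    in_d = [(w, y) for _, w, y, d in data if d == domain]
    total_w = sum(w for w, _ in in_d)
    est = sum(w * y for w, y in in_d) / total_w

    # Linearized residuals, grouped by stratum; zero outside the domain.
    by_stratum = {}
    for h, w, y, d in data:
        e = w * (y - est) / total_w if d == domain else 0.0
        by_stratum.setdefault(h, []).append(e)

    var = 0.0
    for e in by_stratum.values():
        n_h = len(e)
        e_bar = mean(e)
        var += n_h / (n_h - 1) * sum((x - e_bar) ** 2 for x in e)
    return est, math.sqrt(var)

# Hypothetical sample: two strata with weights 50 and 100.
data = [
    ("A", 50.0, 2.0, "upper"),
    ("A", 50.0, 4.0, "lower"),
    ("A", 50.0, 6.0, "upper"),
    ("B", 100.0, 1.0, "upper"),
    ("B", 100.0, 3.0, "lower"),
]
est, se = domain_mean_se(data, "upper")
print(est, round(se, 4))
```

Dropping the units outside the domain before computing the standard error would treat the domain sample size as fixed, which is the simplification the text warns against.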

Species Richness

Species richness estimates reported by the LTRMP represent the number of detected species and, as such, should be treated as possible underestimates. Methods for estimating species richness that adjust for species-specific detection probabilities are reviewed by MacKenzie et al. (2005).
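
Because observed richness is simply the count of distinct species detected, it can only stay the same or grow as samples are added. The sketch below uses hypothetical detections to compute observed richness and a simple species accumulation curve; a curve that is still rising when sampling ends suggests that some species present went undetected. It does not implement the detection-adjusted estimators reviewed by MacKenzie et al. (2005).

```python
# Hypothetical detections: the set of species recorded in each sample.
samples = [
    {"bluegill", "largemouth bass"},
    {"bluegill", "emerald shiner"},
    {"gizzard shad", "bluegill"},
]

def accumulation_curve(detections):
    """Cumulative number of distinct species detected after each sample."""
    seen = set()
    curve = []
    for species in detections:
        seen |= species
        curve.append(len(seen))
    return curve

observed_richness = len(set().union(*samples))
curve = accumulation_curve(samples)
print(observed_richness, curve)  # 4 [2, 3, 4]
```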

Finite Population Correction Factors

Given a finite number of potential sampling locations, the variance of a statistic decreases as increasing proportions of those locations are sampled. For example, when an entire population is sampled, the design-based sampling variance is zero because the population has been censused. Finite population correction factors adjust for the sampled fraction of a population, but such corrections are often ignored when sampling fractions (sample inclusion probabilities) are less than 10%. For the LTRMP, sampling fractions only rarely exceed 10%. A discussion of our approach for these few exceptions is provided under notes on finite population corrections and confidence intervals.
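
The finite population correction multiplies a variance by (1 - n/N), where n is the number of units sampled and N is the population size; standard errors are multiplied by the square root of that factor. The short sketch below (function name hypothetical) shows why the correction is usually ignored at small sampling fractions: at 2% it reduces the standard error by about 1%, at 10% by about 5%, and at a census it drives the variance to zero.

```python
import math

def fpc(n: int, N: int) -> float:
    """Finite population correction factor (1 - n/N) applied to a variance."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return 1 - n / N

# Sampling fractions of 2%, 10%, and 100% (a census) of 1000 grid points.
for n in (20, 100, 1000):
    factor = fpc(n, 1000)
    # SE ratio: corrected standard error divided by uncorrected standard error.
    print(f"fraction={n / 1000:.2f}  variance factor={factor:.3f}  "
          f"SE ratio={math.sqrt(factor):.3f}")
```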


References

Gutreuter, S. 1993. A statistical review of sampling of fishes in the Long Term Resource Monitoring Program. National Biological Survey, Environmental Management Technical Center, Onalaska, Wisconsin, December 1993. EMTC 93-T004. 15 pp. (NTIS PB94-150828) 

Gutreuter, S., R. Burkhardt, and K. Lubinski. 1995. Long Term Resource Monitoring Program Procedures: Fish monitoring. National Biological Service, Environmental Management Technical Center, Onalaska, Wisconsin, July 1995. LTRMP 95-P002-1. 42 pp. + Appendixes A-J.


MacKenzie, D. I., J. D. Nichols, N. Sutton, and L. L. Bailey. 2005. Improving inferences in population studies of rare species that are detected imperfectly. Ecology 86:1101–1113.

Sauer, J. 1998. Temporal analyses of select macroinvertebrates in the Upper Mississippi River System, 1992-1995. U.S. Geological Survey, Environmental Management Technical Center, Onalaska, Wisconsin, April 1998. LTRMP 98-T001. 26 pp. + Appendix. (NTIS PB98-140874) 

Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis. Sage, London. 266 pp.

Thompson, S. K. 2002. Sampling. Second edition. Wiley & Sons, New York.

Thompson, W. L., G. C. White, and C. Gowan. 1998. Monitoring vertebrate populations. Academic Press, San Diego, California.

Yin, Y., H. Langrehr, T. Blackburn, M. Moore, J. Winkelman, R. Cosgriff, and T. Cook. 2001. 1998 annual status report: Submersed and rooted floating leaf vegetation in Pools 4, 8, 13, and 26 and La Grange Pool of the Upper Mississippi River System. U.S. Geological Survey, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, May 2001. LTRMP 2001-P001. 9 pp. + Appendix + Chapters 1-5. (DTIC ADA392067)

Contact: Further information about estimating means and standard errors from LTRMP data may be obtained from Brian Gray, LTRMP statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.


U.S. Department of the Interior | U.S. Geological Survey
URL: http://www.umesc.usgs.gov/ltrmp/means.html
Page Contact Information: Contacting the Upper Midwest Environmental Sciences Center
Page Last Modified: November 17, 2009