Upper Midwest Environmental Sciences Center
Introduction

The LTRM collects data at sampling locations that have been selected both probabilistically and nonprobabilistically (in LTRM parlance, "stratified random" and "fixed-site" data, respectively). Sample information from probabilistically selected locations may be used to make inferences about the populations from which those samples were drawn. For example, the prevalence of submersed aquatic vegetation may be estimated for the entire population of sample units (generally defined for an entire reach) by combining data collected at probabilistically selected locations with information about the sampling design. Our "fixed-site" data do not permit such design-based generalizations. For this reason, this web site is primarily concerned with the use of sample information from probabilistically selected sites.

The LTRM estimates annual means and associated standard errors by relying on the sampling design (rather than on distributional assumptions about the observed data). These so-called “design-based” methods accommodate complexities often associated with survey designs, including stratification and nonproportional sampling ("strata" represent subpopulations from which independent samples are drawn). A useful comparison of design- and model-based methods for the analysis of survey data is provided by Lohr (1999).

The LTRM does not presently adjust statistics from the biological components for detection probabilities or capture efficiencies. Consequently, statistics from the Program's biological components are more properly termed index statistics. Index statistics are presumed to be correlated with parameters of interest (e.g., abundance, percent frequency of occurrence). However, because index statistics have not been adjusted for variation in detection probabilities, changes in index statistics cannot be explicitly differentiated from changes in detection probabilities.
Further information on index statistics and detection probabilities is provided in Thompson et al. (1998).

The LTRM treats missing data as missing completely at random (MCAR). MCAR means that both missing and nonmissing data arose from the same distribution. Data may be missing because, for example, a site could not be visited (owing to flow conditions, say) or because a sample was taken but the sample data were subsequently lost. Because the proportions of missing data within LTRM datasets are typically small, we believe that violations of the MCAR assumption will typically have inconsequential effects. Exceptions are addressed on a case-by-case basis in the application web pages.
A sample inclusion probability is the probability that an individual population unit (for the LTRM, a grid point) is selected for sampling. Example: 20 grid points from each of strata i and j are selected using simple random sampling. If the population sizes of these strata are 1000 and 2000 grid points, then the sample inclusion probabilities are 20/1000 = 2% and 20/2000 = 1%, respectively. For inclusion probabilities to be constant across strata (i.e., “proportional to size”), the number of grid points selected would need to be directly proportional to the strata sizes (e.g., select 10 and 20 units from strata i and j, respectively). For a given LTRM component, inclusion probabilities have varied across strata (within a given pool) but, with few exceptions, have been essentially constant within strata.

The LTRM uses sampling weights to adjust for nonproportional sampling. Sampling weights for the LTRM are generally defined as the inverses of the sample inclusion probabilities, and they may also be viewed as the number of potentially sampleable units represented by a given sampled unit. Continuing with the previous example, each sampled unit in strata i and j may be viewed as representing 50 and 100 (i.e., 1000/20 and 2000/20) potentially sampleable units, respectively.

In some instances, locations selected for sampling by the LTRM were not sampled. This might have occurred, for example, when the intended sampling location was inaccessible. At present, the LTRM either treats these missing observations as missing completely at random (vegetation component) or substitutes predefined alternate locations (all other components). If unsampled locations were either (1) not missing completely at random or (2) not interchangeable with the alternate locations for the given metric, then we may expect our reported statistics to reflect bias of unknown magnitude.
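The inclusion-probability and sampling-weight arithmetic in the grid-point example can be sketched in a few lines of Python. This is an illustration only, not LTRM code: the function names are hypothetical, and the only numbers used are the stratum sizes and sample sizes from the example (strata i and j).

```python
# Illustrative sketch; function names are hypothetical, not part of any LTRM tool.

def inclusion_probability(n_sampled, n_population):
    """Chance that any one grid point in a stratum is selected under
    simple random sampling of n_sampled out of n_population units."""
    return n_sampled / n_population

def sampling_weight(n_sampled, n_population):
    """Inverse of the inclusion probability: the number of population
    units that each sampled unit represents."""
    return n_population / n_sampled

def proportional_allocation(total_n, population_sizes):
    """Sample sizes that would make inclusion probabilities equal
    across strata (sample size proportional to stratum size)."""
    total_pop = sum(population_sizes.values())
    return {h: total_n * size / total_pop for h, size in population_sizes.items()}

# Strata i and j from the text: (units sampled, units in the stratum)
strata = {"i": (20, 1000), "j": (20, 2000)}

probs = {h: inclusion_probability(n, N) for h, (n, N) in strata.items()}
weights = {h: sampling_weight(n, N) for h, (n, N) in strata.items()}

print(probs)    # {'i': 0.02, 'j': 0.01}
print(weights)  # {'i': 50.0, 'j': 100.0}
print(proportional_allocation(30, {"i": 1000, "j": 2000}))  # {'i': 10.0, 'j': 20.0}
```

Using the number of units actually sampled as `n_sampled`, rather than the number intended, corresponds to estimating inclusion probabilities from the observed sample size.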
At present, the LTRM makes no further adjustment for missing data, and sample inclusion probabilities are estimated using the observed rather than the intended number of sample units. Sample inclusion probabilities and sampling weights by stratum, component, and reach are calculated using the number of sampling observations and the corresponding population sizes. Population sizes are provided below in both pdf and Excel format.

Population units (Excel file) (pdf file)

Estimating Design-based Means and Standard Errors

For the LTRM, means are adjusted for nonproportional sampling, and standard errors are adjusted for both nonproportional sampling and stratification. Design-based means and standard errors are estimated using the SAS SURVEYMEANS procedure (proc surveymeans); further technical details are provided in SAS (2003).

Comments by sampling component:
Macroinvertebrate
Vegetation
Water quality
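As a rough illustration of what "adjusting means for nonproportional sampling and standard errors for stratification" involves, the sketch below computes the textbook stratified estimator of a mean and its standard error. It is not the proc surveymeans implementation and is not LTRM code: the function name, the observations, and the stratum sizes are hypothetical, simple random sampling within strata is assumed, and no finite population correction is applied.

```python
import math

def stratified_mean_se(samples, population_sizes):
    """Textbook stratified estimate of a population mean and its SE.

    samples:          dict mapping stratum -> list of observations
    population_sizes: dict mapping stratum -> number of population units (N_h)
    Assumes simple random sampling within strata and at least two
    observations per stratum; no finite population correction.
    """
    total_pop = sum(population_sizes[h] for h in samples)
    mean = 0.0
    variance = 0.0
    for h, obs in samples.items():
        n_h = len(obs)
        share = population_sizes[h] / total_pop            # stratum share N_h / N
        ybar_h = sum(obs) / n_h                            # stratum sample mean
        s2_h = sum((y - ybar_h) ** 2 for y in obs) / (n_h - 1)  # stratum sample variance
        mean += share * ybar_h
        variance += share ** 2 * s2_h / n_h
    return mean, math.sqrt(variance)

# Hypothetical data: equal sample sizes drawn from strata of unequal size
samples = {"i": [2.0, 4.0, 6.0], "j": [10.0, 12.0, 14.0]}
sizes = {"i": 1000, "j": 2000}

mean, se = stratified_mean_se(samples, sizes)
unweighted = sum(sum(obs) for obs in samples.values()) / 6

print(round(mean, 3), round(se, 3))  # 9.333 0.861
print(unweighted)                    # 8.0
```

The unweighted mean (8.0) differs from the stratified mean (about 9.33) because stratum j is twice the size of stratum i but contributed the same number of observations; weighting each observation by its sampling weight gives the same stratified mean.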
Estimating means and standard errors for a subpopulation not defined by the design typically requires methods that acknowledge that the number of samples in the subpopulation is a random variable (Thompson 2002). See Estimating a mean from a portion of one or more strata.

Estimating species richness is generally beyond the scope of this document. Methods for estimating species richness that adjust for species-specific detection probabilities are reviewed by MacKenzie et al. (2005).

Survey statisticians typically adjust standard errors when large proportions (>10%) of potential sampling units are sampled. As these proportions increase, we become increasingly sure about our estimates; hence, standard errors should decrease and confidence intervals should narrow. Corrections for sampling large proportions of sampling units are termed finite population corrections.

Sampling fractions are typically low (<10%) for all LTRM components. Occasional exceptions are seen for the water quality component during 1993 through winter 1994-95 (a design change occurred in spring 1995). Precision estimates reported by the water quality component for 1993 through winter 1994-95 do not currently adjust for finite population sampling. This approach reflects that sampling proportions in the 1990s were a function of a sampling design that changed in 1995, and that failure to adjust for finite population sampling yields conservative precision estimates (i.e., standard error estimates will be too large).

In proc surveymeans, the adjustment is requested by supplying the number of population units through the total= option, for example:

    proc surveymeans data=WQall total=180;

Results (below) indicate that adjusting for finite sampling yielded modest decreases in both the standard error of the mean (SE) and the width of the 95% confidence interval on the mean (95% CI). For this stratum and year, 18% of sites were sampled.
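The size of a finite population correction can be gauged with a short calculation. The sketch below is not LTRM output and does not reproduce the SAS results: it assumes simple random sampling without replacement within a single stratum, uses the usual multiplier sqrt(1 - n/N) on the standard error, and the function names, observations, and population size are all hypothetical. The population size is chosen so that the sampling fraction is 18%, matching the fraction quoted in the text.

```python
import math

def fpc_factor(sampling_fraction):
    """Multiplier applied to a standard error when a fraction n/N of a
    finite population is sampled without replacement."""
    if not 0.0 <= sampling_fraction <= 1.0:
        raise ValueError("sampling fraction must lie between 0 and 1")
    return math.sqrt(1.0 - sampling_fraction)

def se_of_mean(values, population_size=None):
    """Standard error of a simple-random-sample mean; the finite
    population correction is applied only when population_size is given."""
    n = len(values)
    ybar = sum(values) / n
    s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
    se = math.sqrt(s2 / n)
    if population_size is not None:
        se *= fpc_factor(n / population_size)
    return se

# Hypothetical observations: 9 units sampled from a stratum of 50 (18%)
values = [3.1, 2.7, 4.0, 3.6, 2.9, 3.3, 3.8, 3.0, 3.5]

se_unadjusted = se_of_mean(values)
se_adjusted = se_of_mean(values, population_size=50)

print(round(fpc_factor(0.18), 3))  # 0.906
print(se_adjusted < se_unadjusted)  # True
```

At an 18% sampling fraction the standard error shrinks by a little under 10%, which is consistent with the modest decreases described above; at fractions below 10% the multiplier exceeds about 0.95, so omitting the correction changes little.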
Confidence Intervals

Most survey-based confidence intervals on the mean, including those supplied by the LTRM, rely on a normality assumption for the mean. When the underlying data are not normally distributed, the normality assumption for the mean requires large sample sizes. The definition of “large” depends on the skewness or asymmetry of the data and on the number of strata (Lohr 1999; Thompson 2002). Unless otherwise indicated, all data from the LTRM should be presumed nontrivially skewed (exceptions include some water quality metrics). Within-strata sample sizes have often been small (i.e., n < 30 to 50, and possibly <10), as has the number of strata. The number of strata has ranged from approximately 15 in the fish component (10 beginning in 2005) to approximately 4 for metrics associated with the other components. For these reasons, confidence intervals on the mean that rely on a normality assumption (whether supplied by the LTRM or otherwise) should be viewed as approximate. Exceptions arise where the underlying data may be assumed approximately symmetric (e.g., percent frequencies of occurrence near 50%) or where sample sizes are large. Alternative methods of estimating confidence intervals are described by Lohr (1999).

References

Gutreuter, S., R. Burkhardt, and K. Lubinski. 1995. Long Term Resource Monitoring Program Procedures: Fish monitoring. National Biological Service, Environmental Management Technical Center, Onalaska, Wisconsin, July 1995. LTRMP 95-P002-1. 42 pp. + Appendixes A-J.

Sauer, J. 1998. Temporal analyses of select macroinvertebrates in the Upper Mississippi River System, 1992-1995. U.S. Geological Survey, Environmental Management Technical Center, Onalaska, Wisconsin, April 1998. LTRMP 98-T001. 26 pp. + Appendix. (NTIS PB98-140874)

Thompson, W. L., G. C. White, and C. Gowan. 1998. Monitoring vertebrate populations. Academic Press, San Diego, California.

Yin, Y., H. Langrehr, T. Blackburn, M. Moore, J. Winkelman, R. Cosgriff, and T. Cook. 2001. 1998 annual status report: Submersed and rooted floating leaf vegetation in Pools 4, 8, 13, and 26 and La Grange Pool of the Upper Mississippi River System. U.S. Geological Survey, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, May 2001. LTRMP 2001-P001. 9 pp. + Appendix + Chapters 1-5. (DTIC ADA392067)

Contact: Further information about estimating means and standard errors from LTRM data may be obtained from Brian Gray, LTRM statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.