USGS - science for a changing world

Upper Midwest Environmental Sciences Center


Maps, Models, and Tools for Bird Conservation Planning
Avian Species-of-Concern

A Hierarchical Spatial Count Model with Application to American Woodcock

taped seminar presented by Dr. Wayne Thogmartin, March 16, 2006
For more information, see http://www.ars.usda.gov/News/docs.htm?docid=11416
 

Textual Equivalent

I wanted to thank the organizers for inviting me to speak today about some of the work that I've been involved in.

Andy's provided a great foundation for the theoretical background and so basically today what I'll be doing is providing the nitty-gritty of an example relating to the American Woodcock.

The motivation for this research is that the woodcock is a species that is harvested but is declining in abundance, and so, the Fish and Wildlife Service, land management agencies, conservation groups, hunting groups, they're all interested in trying to identify means in which they may be able to efficiently deliver conservation on the ground, and one way that we believe that's possible is if we actually know where the species is highly abundant. We'll be able to direct our conservation efforts into those areas where the species is abundant and avoid delivering conservation to areas where the species is rare. So in that way we'll get more bang for our buck.

This is the primary breeding range of the American Woodcock in the United States. It does extend into Canada, just north of this region, but you can tell that this is quite a large area, extending from Minnesota in the west to Maine in the Northeast and down to Virginia in the South.

These are the data that we'll be using to parameterize this model - it's the American Woodcock Singing Ground Survey data; it's somewhat similar to the Breeding Bird Survey. There are routes on secondary roads; each route comprises 10 stops at which birders count the number of woodcocks seen and heard. There are over 1,000 surveys in this area, and for the purposes of the model that I'll describe we used counts between 1981 and 2001 to parameterize the model and then evaluated the model's performance based upon counts we withheld from that time period as well as counts from 2002 and 2003.

So just to summarize some of the statistics associated with these counts: like I said, there are over a thousand surveys, but because we're using data from over a 20-year period we have counts from over 9,000 surveys. The mean count per survey was 3.39 birds. Now this speaks to the question that ended our previous talk: we found for these data that over a quarter of the counts were zero, the median count was two, and the maximum count was 47.
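As a rough illustration of the kind of summary just quoted, the sketch below computes the mean, median, maximum, and proportion of zeros for a list of counts. The counts and the function name `summarize_counts` are invented for illustration; they are not the Singing Ground Survey data or the speaker's code.

```python
from statistics import mean, median

def summarize_counts(counts):
    """Return basic summary statistics for a list of non-negative counts."""
    n = len(counts)
    return {
        "n": n,
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
        # Share of surveys on which no birds were recorded.
        "prop_zero": sum(1 for c in counts if c == 0) / n,
    }

# Hypothetical counts, made up for this example only.
example_counts = [0, 0, 1, 2, 2, 3, 5, 8, 0, 4]
summary = summarize_counts(example_counts)
print(summary)
```

A high proportion of zeros alongside a long right tail, as in the real data, is one reason a count model such as the Poisson is used rather than a normal model.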

One important point that I should make to you is there were over 1500 observers in the course of this study and that's one source of variability that we want to consider in our modeling effort.

This is the form of the model. Andy already described much of this, and you may recall from yesterday's talk that Mary described the conditional auto-regression, so we won't spend a huge amount of time on this aspect, but we will revisit the topic.

So first of all I just want to point out that we have some nuisance effects associated with observer and year and we want to be able to control for these nuisance effects so that we can identify the underlying environmental parameters and spatial correlation that will help us in our predictions - predicting this species' abundance across space.

First of all we consider observers because observers do in fact count birds differently. Older observers are less able to hear some bird calls, so may in fact not count as many as there are out there singing. New observers are often times overwhelmed with the sounds that are out in the environment and so will tend to undercount. So we want to be able to have some sort of function that can control for these differences in the sampling that is employed.

Also, because we're examining over twenty years of data, we want to have some effect that allows us to level the data to a common playing field, so that 1981's counts are relevant to 2001's counts given that there's this overall trend and that there's variability here anyway. So we basically are bringing all of the data to a common level. This year effect actually is a combined effect of annual year effects and trend.

Then we have environmental variables that we're interested in. We considered variables from various suites of environmental characteristics: land cover compositions, landscape configuration, rain, human disturbances on the landscape, climate, that sort of thing - all of these variables were chosen a priori from an investigation or perusal of the literature. So we limit some of the concerns about this being solely an exploratory approach.

The error term here is basically where the extra-Poisson variation is placed in this model, and then we have the spatial conditional auto regression.
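The slide with the model form is not reproduced in this transcript. One way a model with the components just listed could be written is sketched below. This is an assumption-laden sketch, not the speaker's exact specification: every symbol is introduced here for illustration.

```latex
\begin{align*}
y_{i,j,t} &\sim \mathrm{Poisson}(\lambda_{i,j,t}) \\
\log \lambda_{i,j,t} &= \omega_j + \gamma_t + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \phi_i + \varepsilon_{i,j,t}
\end{align*}
```

Here $y_{i,j,t}$ is the count on route $i$ by observer $j$ in year $t$, $\omega_j$ is the observer effect, $\gamma_t$ is the year effect (described in the talk as combining trend and annual effects), $\mathbf{x}_i$ holds the environmental variables for route $i$ with coefficients $\boldsymbol{\beta}$, $\phi_i$ is the spatial conditional autoregressive route effect, and $\varepsilon_{i,j,t}$ is the error term.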

We fit this model using WinBUGS, which takes a Bayesian approach to model fitting. Basically we're just taking the prior and combining it with the likelihood of the data to obtain the posterior probability. This approach is hierarchical because there is clustering of slopes for observers, for years, and for the route effects, and so really what we're saying here is that every observer has their own slope and intercept, every year has its own slope and intercept; basically each of these random effects has its own parameters.

This correlation may occur because of the way you designed the study, because of the temporal aspects of the data, and because of space. In this case all three of those issues are at hand, so we want to at least account for them so that we can pull out unbiased estimates of our environmental covariates.

So we'll talk a little bit about that in more detail: as I said observers do count birds differently, some count birds similar to one another but in general what you see is that for a single observer you might have an observer improving their performance over time and so we want to be able to control for that fact. Other observers may decline over time.

In general, what we would like to see - here's the observer whose performance increases over time, and you can see that other observers may decline. The general trend over time is that all observers are fairly level. This is just a made-up example, but you can control for the fact of observer performance improving or declining over time.

Temporal variability - some years the mean abundance for the year may be above the grand mean and other years the abundance may be below and we want to control for that as well. We might see that there are declines over space in a year. Some of these things might be things we want to control for.

This is a distribution of the mean of the time series - it's difficult to show in a two-dimensional sense the variability that one can see in a time series over space, so here we are just showing the mean of the time series for each of the routes. The reason why I want to show this is because this gets to what was previously described as Tobler's law, the First Law of Geography. Tobler's law again says that we expect that which is closer to be more similar than that which is farther apart, and you can see that that's the case here: you have a general north-south trend, with smaller counts in the South and larger counts in the North. There may be a trend from the Southeast to the Northwest in increasing numbers of birds, and it seems that all of the high counts are associated here, and there are areas where we have very few counts or very low counts. So spatial considerations are important with these data; you can see that just from a basic summarization of these counts.

So, we applied an irregular lattice to these data. Andy obviously showed you an example where a regular lattice was possible. The problem of associating these routes with counties, say, or townships is that some counties will have multiple routes and other counties will have no routes, and so you have that problem of flow through. To obviate that, we applied an irregular lattice and used a tessellation approach so that every route has a neighborhood structure.

We'll zoom in on that blue circle and I'll show you exactly what we do here. We took a first-order conditional auto-regression approach and basically what that does is that it says that first of all we need to define our neighborhood and our neighborhood for this cell here is basically these neighboring cells. We say for these neighboring cells that they have a value of one, or a weight of one, and all the other cells beyond those have a value or weight of zero. We use those as our weights in our weighting scheme as Mary talked about yesterday.
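The first-order weighting just described can be sketched as a binary matrix: weight one for cells that border the cell of interest, zero for everything else. The adjacency dictionary below is a hypothetical five-cell tessellation invented for illustration, and `first_order_weights` is not a function from the talk.

```python
def first_order_weights(neighbors):
    """Build a binary weight matrix from a dict mapping each cell to its adjacent cells."""
    sites = sorted(neighbors)
    index = {s: k for k, s in enumerate(sites)}
    n = len(sites)
    w = [[0] * n for _ in range(n)]
    for s, adjacent in neighbors.items():
        for a in adjacent:
            # Weight 1 for cells that share a border; keep the matrix symmetric.
            w[index[s]][index[a]] = 1
            w[index[a]][index[s]] = 1
    return sites, w

# Hypothetical adjacency for five tessellation cells (illustration only).
toy_neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
sites, W = first_order_weights(toy_neighbors)
for s, row in zip(sites, W):
    print(s, row)
```

Each row of the matrix is the neighborhood of one route: the diagonal stays zero because a cell is not its own neighbor.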

Basically conditional auto regression is the probability of observing a particular value at a given site; it's a conditional probability that depends upon the values in the surrounding neighborhood. The advantages of this approach are that it is conservative, and there is high specificity, especially when you’re dealing with logistics problems or even in sparse data situations. So it's quite useful.

Let me schematically describe what I mean by smoothing, because this is what conditional auto regression does. It's a favored approach in spatial epidemiology, where people want to smooth disease infection rates, the probability of these occurring in an area. Basically, what you're doing - here we've got three dimensions, but you can sort of imagine what happens if you have multiple neighbors. In the case on the left, if your location i is depressed relative to your neighbors, the expectation for that location i gets raised because your neighbors' expected counts are higher. Now in the case in which location i has much higher counts relative to its neighbors, the expectation is lowered somewhat. So there's basically this smoothing that's applied across the entire region of interest, and this smoothing is dependent upon the local neighbors. So in some areas you're going to have considerable smoothing and in other areas you may have very little smoothing. So it's not necessarily very straightforward; there are certain things that preclude just a straightforward calculation of a spatial correlation coefficient, because it actually varies across space. Most correlation coefficients are sort of summaries, grand correlation coefficients across space.
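The direction of that smoothing can be seen in a small sketch. In a commonly used intrinsic conditional autoregressive formulation with binary weights, the conditional mean for a site is the average of its neighbors' values; the transcript does not state that this exact formulation was used, so treat it as an assumption. The values and the function name are made up for illustration.

```python
def car_conditional_mean(values, weights, i):
    """Weighted average of the neighbors' values for site i.

    Assumes weights[i][i] == 0, so a site is not its own neighbor.
    """
    total_weight = sum(weights[i])
    if total_weight == 0:
        raise ValueError("site has no neighbors")
    return sum(w * v for w, v in zip(weights[i], values)) / total_weight

# Three sites in a row; the middle site (index 1) neighbors both ends.
weights = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

depressed = [4.0, 1.0, 6.0]   # middle site lower than its neighbors
elevated = [2.0, 9.0, 3.0]    # middle site higher than its neighbors

raised = car_conditional_mean(depressed, weights, 1)
lowered = car_conditional_mean(elevated, weights, 1)
print(raised, lowered)
```

In a fitted model the estimate at a site is a compromise between its own data and this neighborhood mean, which is why the amount of smoothing differs from place to place.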

This is again what we had in raw form, and this is the smoothed expectation - you can see that some of that variability has been reduced. These are the smoothed expectations in the face of a number of environmental covariates that I'll describe shortly. So this is over and above that which is explained by the environmental covariates. You can tell that there is some residual spatial structure here that we are not able to describe; especially here on the northern periphery there seem to be some darker counts, and this may be due to some edge effects that we're not able to fully model, because we don't have good explanatory covariates for the Canadian side of this population, so we're not able to fully consider those edge effects described yesterday.

So there's a number of reasons why we may see some residual spatial structure. There may be a mismatch between the scales at which we modeled our covariates and the scale at which this tessellation or lattice structure applies. There may be some biological reasons, such as the species always returning to an area regardless of the suitability of the local land cover.

There are alternatives to employing a first order conditional auto regression like we did. You might adopt some sort of ad-hoc weighting scheme where you say we're going to provide a weight of one to those nearest neighbors and then to those neighbors immediately surrounding those, and then apply a zero weight to all those beyond the second order neighbors. This is kind of ad-hoc, and so other people have employed Euclidean distance-based weights, where they use variogram analysis to evaluate some range over which their correlation extends and say everything within that range is going to get a weight of one and everything beyond is going to get a weight of zero. Others have employed just a linear weighting scheme where the weights fall off with distance. And then there's the possibility of weighting your neighborhood matrix by the interaction or contiguity of your data. So in this case we have a location down here on Crystal Creek, and everybody within that watershed is given a weight of one, but those locations on the South Branch Root River get a weight of zero. We can evaluate the performance of these different methods.
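Two of the distance-based alternatives mentioned above can be sketched as simple weight functions. The coordinates and the `max_range` value are hypothetical; in practice the range would come from something like a variogram analysis, which is not shown here.

```python
import math

def threshold_weight(p, q, max_range):
    """Weight 1 if two points lie within max_range of each other, else 0."""
    return 1.0 if math.dist(p, q) <= max_range else 0.0

def linear_decay_weight(p, q, max_range):
    """Weight falling linearly from 1 at distance 0 to 0 at max_range and beyond."""
    d = math.dist(p, q)
    return max(0.0, 1.0 - d / max_range)

# Hypothetical route locations in arbitrary planar units.
a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)   # b is 5 units from a, c is 10
max_range = 8.0                                 # assumed correlation range

print(threshold_weight(a, b, max_range), threshold_weight(a, c, max_range))
print(linear_decay_weight(a, b, max_range), linear_decay_weight(a, c, max_range))
```

The threshold scheme treats every route inside the range the same, while the linear scheme lets nearer routes count for more.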

I glossed over the trending variables that we used, or the variables we used to describe the large scale trend in our data, but we evaluated these various variables at multiple spatial scales. We did so because it's not very clear in the literature at which scale these birds are responding. Most of the literature for this bird, and for most birds, comes from studies conducted at a field level or stand level. So it's not very clear that the things that we know about them at this fine scale necessarily translate up to a larger, coarser scale. It's fairly standard to evaluate multiple spatial scales when you're examining spatial structure and spatial variables with these species.

In this case there's a number of things going on and I'll just highlight them briefly, but an important variable is the start of the season, the day of the year on which the growing season begins. The woodcock is negatively related to this variable. We think that's the case because it indexes earthworm abundance/availability, so that the farther north you go, earthworms are less available simply because the ground is frozen further into the breeding period. Now this is contrary to that trend we saw, and a lot of folks get confused by this because this does turn out to be one of the most important variables. They get confused because we do see in the general data summary that there is an increase in abundance over space as one goes from the South to the North, but you have to realize this is one variable within the context of a whole slew of other variables, so we won't get too wrapped up in that problem.

We also see that forest is most important at the finest scale and declines in importance as one increases the coarseness of scale. Aspen, surprisingly, is not correlated with forest, because much of the range is not in aspen, but the importance of aspen increases as you coarsen the scale. So that's an interesting finding, that there's a switch there.

The proportion of the land cover in human land uses - urban, transportation, commercial, that sort of thing - has a negative relationship with woodcock at all scales. That has important implications for conservation, given that we're now at 300 million people in the United States and we're going to be approaching half a billion by the middle of this century.

One way to easily summarize the random effects is through caterpillar plots. We have over one thousand groups here in this caterpillar plot and we can see that a number of the routes right here are above the mean in their effect on woodcock abundance and up here we have a number of routes below the mean. So this is a nice way to summarize the distribution of those effects across the routes.

For the observer effect, in general we find that about 2.5% of observers are above the mean and 2.5% are below the mean, which is what we would expect under a normal distribution, and so it's questionable whether we actually need to include an observer effect.

Here we have the annual random effects, and you can see the posterior standard deviation for the annual random effects is quite small. We might be able to reduce the residual model variability if we examine this sinusoidal pattern that appears to occur in the data and impose, say, a first order auto regression on the time series aspects. I just want to point out that we've imputed the values for 2002 and 2003, and that's why we have much greater variance around these estimates, but they fit along the annual effect there.
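Before imposing a first order auto regression, one simple check is the lag-1 autocorrelation of the annual effects. The sketch below is only that diagnostic, not the autoregressive model itself, and the sinusoidal series is synthetic, generated here to mimic the kind of pattern described.

```python
import math

def lag1_autocorrelation(series):
    """Sample autocorrelation between each value and the one before it."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in series)
    return num / den

# Synthetic "annual effects" following a sinusoid with a 10-year period.
years = range(21)
toy_effects = [math.sin(2 * math.pi * t / 10) for t in years]

r1 = lag1_autocorrelation(toy_effects)
print(round(r1, 3))
```

A value well above zero suggests that adjacent years carry similar effects, which is the situation in which an autoregressive term could absorb some residual variability.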

This is our predicted woodcock relative abundance map circa 1991. You can center this at any year you want basically given that you have this time series. This is strictly the environmental covariates plus the residual spatial route effect identified earlier.

We evaluated this model by imputing values for the withheld data and for the years after the model period that we were examining, and we find high correspondence: greater than 73% explained variation and nearly one-to-one correspondence, so it's a fairly good model.
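The transcript does not say how explained variation was calculated. One common choice, assumed here, is the coefficient of determination between withheld counts and model predictions. The numbers below are invented for illustration.

```python
def explained_variation(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    m = sum(observed) / len(observed)
    ss_total = sum((o - m) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical withheld counts and the values a model predicted for them.
withheld = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5]

r_squared = explained_variation(withheld, predicted)
print(r_squared)
```

A value of 1 would mean the predictions fall exactly on the one-to-one line with the withheld counts.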

The way we're using this data is again to focus our conservation applications in areas where it would do the most good. One of the things we're doing is relating this predicted pattern in woodcock abundance against conservation estate - various federal, state and tribal land management agencies and trying to identify how much of the population occurs on private land.

We focus in on hot spots - the top 5% of our distribution. If we zoom in to the orange circle, here's a map of our predicted woodcock abundance, a hotspot or peak of abundance, related against federal lands on the left and state lands on the right. You can see that for this hotspot much of the peak of abundance occurs outside of any sort of governmental land management, so the conclusion is that it occurs on private land, and that poses a whole bunch of new issues involved in conserving the species. We can see on the left that there's a number of federal wildlife refuges. All of these refuges have a private lands program, and if they want to effect conservation for this species around their refuge they would best direct their efforts to these areas where the species is most abundant instead of trying to just diffusely apply it across their area of interest.
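Picking out the top 5% of a predicted distribution can be sketched as follows. The abundance values are synthetic and the helper names are invented for this example; how ties and map cells were handled in the actual analysis is not stated in the talk.

```python
def hotspot_threshold(values, top_fraction=0.05):
    """Smallest value that still falls within the top `top_fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]

def hotspots(values, top_fraction=0.05):
    """Indices of the values at or above the hotspot threshold."""
    cutoff = hotspot_threshold(values, top_fraction)
    return [i for i, v in enumerate(values) if v >= cutoff]

# Synthetic predicted abundances for 100 map cells (illustration only).
toy_abundance = [float(v) for v in range(1, 101)]

cutoff = hotspot_threshold(toy_abundance)
hot_cells = hotspots(toy_abundance)
print(cutoff, hot_cells)
```

The resulting cells are the ones that would then be overlaid on land ownership to see how much of the peak falls on public versus private land.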

So this is the way we're using these models. I've run out of time but if you want more information here's an http site. I've got papers coming out - Journal of Wildlife Management - next year or so, and you can always email me. I think on the CD there's a couple reprints from the BBS data analysis that Andy mentioned.

I want to thank everybody for your attention - thanks.

URL: http://www.umesc.usgs.gov/terrestrial/migratory_birds/bird_conservation/wt_woodcock_20060316.html
Page Last Modified: October 2, 2007