Beyond estimates of occurrence and relative abundance, the eBird Status and Trends products contain estimates of predictor importance (PI), partial dependence (PD), and predictive performance metrics (PPMs). The PPMs can be used to evaluate statistical performance of the models, either over the entire spatiotemporal extent of the model results, or for a specific region and season. Predictor importances (PIs) and partial dependences (PDs can be used to understand relationships between the occurrence or count response and predictors, most notably the land cover variables used in the model. The functions described in this section help load and analyse these data. More details about predictive performance metrics and how they were calculated and interpreted can be found in Fink et al. (2019).
IMPORTANT. AFTER DOWNLOADING THE RESULTS, DO NOT CHANGE THE FILE STRUCTURE. All functionality in this package relies on the structure inherent in the delivered results. Changing the folder and file structure will cause errors with this package.
The non-raster data are stored in two SQLite databases:
pi-pd.db: predictor importance and partial dependence estimates for each stixel used to fit the model.
predictions.db: predictions on a test dataset consisting of checklists not used in model fitting.
ebirdst package provides functions for accessing these, such that you should never have to handle them manually.
We’ll start by loading the PI and PD data from the example data package for Yellow-bellied Sapsucker in Michigan.
library(ebirdst) library(dplyr) # Because the non-raster data is large, there is a parameter on the # ebirdst_download function that defaults to downloading only the raster data. # To access the non-raster data, set tifs_only = FALSE. <- ebirdst_download(species = "example_data", tifs_only = FALSE) sp_path # predictor importance <- load_pis(sp_path) pis glimpse(pis) # partial dependence <- load_pds(sp_path) pds glimpse(pds)
Notice that the data in both these data frames is provided for each stixel, identified by a
When working with Predictive Performance Metrics (PPMs), PIs, or PDs, it is very common to select a subset of space and time for analysis. In
ebirdst this is done by creating a spatiotemporal extent object with
ebirdst_extent(). These objects define the region and season for analysis and are passed to many functions in
# define a spatiotemporal extent <- ebirdst_extent(c(xmin = -86, xmax = -83, ymin = 42, ymax = 45), lp_extent t = c(0.425, 0.475)) print(lp_extent) # subset to this extent <- ebirdst_subset(pis, ext = lp_extent) pis_ss nrow(pis) nrow(pis_ss)
To understand which stixels are contributing to estimates within a given spatiotemporal extent,
stixel_footprint() generate a
RasterLayer depicting where a majority of the information is coming from within a given extent. The map ranges from 0 to 1, with pixels have a value of 1 meaning that 100% of the selected stixels are contributing information at that pixel. Calling
plot() on the output of this function will map the stixel footprint as well as centroids of all the stixels.
par(mfrow = c(1, 1), mar = c(0, 0, 0, 6)) <- stixel_footprint(sp_path, ext = lp_extent) footprint plot(footprint)
plot_pis() function generates a bar plot showing a rank of the most important predictors within a spatiotemporal subset. There is an option to show all predictors or to aggregate FRAGSTATS by the land cover types.
# with all classes plot_pis(pis, ext = lp_extent, by_cover_class = FALSE, n_top_pred = 15) # aggregating fragstats for cover classes plot_pis(pis, ext = lp_extent, by_cover_class = TRUE, n_top_pred = 15)
Smoothed partial dependence curves for a given predictor can be plotted using
plot_pds(). Confidence intervals are estimated through a processing of subsampling and bootstrapping. This function returns the smoothed data and CIs and plots these data. For example, let’s look at PD curves for checklist start time, expressed as the difference from solar noon, and the percentage of broadleaf forest cover. The full list of predictors see the data frame
# in the interest of speed, run with 5 bootstrap iterations # in practice, best to run with the default number of iterations (100) <- plot_pds(pds, "solar_noon_diff", ext = lp_extent, n_bs = 5) pd_smooth ::glimpse(pd_smooth) dplyr # deciduous broadleaf forest plot_pds(pds, "mcd12q1_lccs1_fs_c14_1500_pland", ext = lp_extent, n_bs = 5)
Beyond confidence intervals provided for the abundance estimates, the centroid data can also be used to calculate predictive performance metrics, to get an idea as to whether there is substantial statistical performance to evaluate information provided by PIs and PDs (as well as abundance and occurrence information). Three types of PPMs are calculated:
Both the occurrence and count PPMs are within-range metrics, meaning the comparison between observations and predictions is only made within the range where the species occurs.
ebirdst_ppms() calculates a suite of PPMs for a given spatiotemporal extent. The results can then be plotted with
<- ebirdst_ppms(sp_path, ext = lp_extent) ppms plot(ppms)
ebirdst_ppms_ts() can be used to get a time series of PPMs, calculating the full suite either at a weekly or monthly resolution.
plot() can be used to visualise these PPM time series for a given PPM.
<- ebirdst_ppms_ts(sp_path, ext = lp_extent, summarize_by = "months") ppms_monthly # plot binary kappa plot(ppms_monthly, type = "binary", metric = "kappa") # plot occurrence probability auc plot(ppms_monthly, type = "occurrence", metric = "auc")
Fink, D., T. Damoulas, & J. Dave, J. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In Twenty-Seventh AAAI Conference on Artificial Intelligence.
Fink, D., W.M. Hochachka, B. Zuckerberg, D.W. Winkler, B. Shaby, M.A. Munson, G. Hooker, M. Riedewald, D. Sheldon, & S. Kelling. 2010. Spatiotemporal exploratory models for broad‐scale survey data. Ecological Applications, 20(8), 2131-2147.
Fink, D., T. Auer, A. Johnston, V. Ruiz‐Gutierrez, W.M. Hochachka, & S. Kelling. 2019. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications, 00(00):e02056. https://doi.org/10.1002/eap.2056