This vignette is an extended set of examples to highlight the foieGras
package’s functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!
This vignette provides a (very) brief overview of how to use foieGras
to filter animal track locations obtained via the Argos satellite system or via processed light-level geolocation (GLS). foieGras
provides two state-space models (SSM’s) for filtering (ie. estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):
rw
crw
Both models are continuous-time models, that is, they account for time intervals between successive observations, thereby naturally accounting for the irregularly-timed nature of most Argos data. We won’t dwell on the details of the models here, those will come in a future paper, except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and smooth through noisy data but tends to estimate nonsensical movement through larger data gaps.
Additionally, foieGras
provides fast models (mpm
, jmpm
) for estimating a behavioural index along animals’ tracks (see Jonsen et al. 2019 Ecology 100:e02566 for details). The mpm
is fit to individual tracks, whereas the jmpm
is fit to multiple individual track simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model can often better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements.
foieGras
expects data to be provided in one of several possible formats.
data.frame
or tibble
that looks like this#> # A tibble: 6 x 8
#> id date lc lon lat smaj smin eor
#> <chr> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1 111. -66.4 2442 416 42
#> 2 54591 2012-03-06 04:55:14 0 110. -66.4 49660 391 90
#> 3 54591 2012-03-07 04:23:10 A 110. -66.5 5032 472 93
#> 4 54591 2012-03-07 21:23:06 A 110. -66.4 4007 286 116
#> 5 54591 2012-03-09 04:27:49 B 110. -66.5 13063 956 82
#> 6 54591 2012-03-10 00:10:41 A 111. -66.4 5099 478 79
where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.
data.frame
or tibble
that looks like this#> # A tibble: 6 x 5
#> id date lc lon lat
#> <chr> <dttm> <fct> <dbl> <dbl>
#> 1 ct36-F-09 2009-02-10 19:42:44 A 70.6 -49.7
#> 2 ct36-F-09 2009-02-11 07:56:36 A 70.2 -50.2
#> 3 ct36-F-09 2009-02-12 01:53:07 A 70.1 -51.1
#> 4 ct36-F-09 2009-02-12 19:06:55 B 69.5 -52.0
#> 5 ct36-F-09 2009-02-13 12:13:19 B 71.0 -53.1
#> 6 ct36-F-09 2009-02-14 01:10:58 B 70.1 -53.4
where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.
#> # A tibble: 6 x 8
#> id date lc lon lat smaj smin eor
#> <chr> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:09:33 1 111. -66.4 2442 416 42
#> 2 54591 2012-03-06 04:55:14 0 110. -66.4 49660 391 90
#> 3 54591 2012-03-07 04:23:10 A 110. -66.5 NA NA NA
#> 4 54591 2012-03-07 21:23:06 A 110. -66.4 NA NA NA
#> 5 54591 2012-03-09 04:27:49 B 110. -66.5 NA NA NA
#> 6 54591 2012-03-10 00:10:41 A 111. -66.4 5099 478 79
in this situation, foieGras
treats observations with missing error ellipse information as though they are LS-based observations.
sf
object where observations have any of the previous 3 structures and also include CRS
information#> Simple feature collection with 6 features and 6 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 2426.138 ymin: -912.3109 xmax: 2437.96 ymax: -903.7763
#> epsg (SRID): NA
#> proj4string: +proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=km +no_defs
#> # A tibble: 6 x 7
#> id date lc smaj smin eor geometry
#> <chr> <dttm> <chr> <dbl> <dbl> <dbl> <POINT [km]>
#> 1 54591 2012-03-05 05:09:33 1 2442 416 42 (2430.943 -912.3109)
#> 2 54591 2012-03-06 04:55:14 0 49660 391 90 (2437.96 -903.7763)
#> 3 54591 2012-03-07 04:23:10 A 5032 472 93 (2429.753 -907.3739)
#> 4 54591 2012-03-07 21:23:06 A 4007 286 116 (2437.367 -905.2335)
#> 5 54591 2012-03-09 04:27:49 B 13063 956 82 (2426.138 -905.8045)
#> 6 54591 2012-03-10 00:10:41 A 5099 478 79 (2431.234 -909.0692)
data.frame
, tibble
or sf
object where processed GLS data are provided and include longitude and latitude error SD’s (in degrees)#> # A tibble: 5 x 7
#> id date lc lon lat lonerr laterr
#> <dbl> <dttm> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 54632 2019-10-07 12:37:03 G 100 -55 0.191 0.179
#> 2 54632 2019-10-08 00:37:03 G 100. -54 0.731 1.78
#> 3 54632 2019-10-08 12:37:03 G 101 -53 0.328 4.09
#> 4 54632 2019-10-09 00:37:03 G 102. -52 0.793 0.359
#> 5 54632 2019-10-09 12:37:03 G 102 -51 0.0566 2.94
model fitting for quality control of locations is comprised of 2 steps: a prefilter step where a number of checks are made on the input data (see ?foieGras::prefilter
for details), including applying the argsofilter::sdafilter
to identify extreme outlier observations. Additionally, if the input data are not supplied as an sf
object, prefilter
guesses at an appropriate projection (typically world mercator, EPSG 3395) to apply to the data. The SSM is then fit to this projected version of the data. Users invoke this process via the fit_ssm
function:
## load foieGras example data
data(ellie)
## prefilter and fit Random Walk SSM using a 24 h time step
fit <- fit_ssm(ellie, model = "rw", time.step = 24, verbose = 0)
#> Warning in sqrt(as.numeric(object$diag.cov.random)): NaNs produced
these are the minimum arguments required: the input data, the model (“rw” or “crw”) and the time.step (in h) to which locations are predicted. Additional control can be exerted over the prefiltering step, via the vmax
, ang
, distlim
, spdf
and min.dt
arguments. see ?foieGras::fit_ssm
for details, the defaults for these arguments are quite conservative, usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted but these should rarely need to be accessed by users and will not be dealt with here.
Simple summary information about the foieGras
fit can be obtained by calling the fit object:
fit$ssm[[1]]
#> Process model: rw
#> Time interval: 24 hours
#> number of observations: 64
#> number of regularised state estimates: 113
#>
#> parameter estimates
#> -------------------
#> Estimate Std. Error
#> rho_p -0.848 0.037
#> sigma 103.777 9.283
#> sigma 89.944 8.026
#> rho_o 0.000 0.000
#> tau 1.000 0.000
#> tau 1.000 0.000
#> psi 1.007 NaN
#> -------------------
#> negative log-likelihood: 732.1398
#> convergence: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH
and a summary plot
method allows a quick visual of the SSM fit to the data:
The predicted values (red) are the state estimates predicted at regular time intervals, specified by
time.step
(here every 24 h). These estimates are plotted on top of the observations that passed the prefilter stage (blue points and blue rug at bottom). Fitted values are the state estimates corresponding to the time of each observation; their time series are plotted by default - plot(fit)
. A 2-D time series plot of the track is invoked by the argument type = 2
.
As SSMs are latent variable models, evaluating their goodness of fit is less straightforward than simpler statistical models without latent variables. We can use One-Step-Ahead (prediction) residuals via foieGras::osar
. Here we use osar
to compare SSM fits of the rw
and crw
model to the same example southern elephant seal data.
## fit crw SSM
fitc <- fit_ssm(ellie, model = "crw", time.step = 24, verbose = 0)
## calculate OSA resids for both models
fit_res <- osar(fit)
fitc_res <- osar(fitc)
## plot residuals
plot(fit_res)
The
crw
model appears provide a better fit than the rw
model, with standardized OSA residuals conforming more closely to a theoretical Normal distribution. One note of caution when calculating OSA residuals, the underlying TMB::oneStepPredict
method is currently experimental and can require considerable computation time, especially when calculating across multiple individual fits.
Estimated tracks can be mapped using the fmap
function, which uses the foieGras
-applied projection (Global Mercator). Projections can be changed easily via the crs
argument in the form of a proj4string (as in the example, below).
## change projection to Antarctic Polar Stereographic centred on
## the approximate mid-point of the track
fmap(fitc, what = "predicted",
crs = "+proj=stere +lat_0=-90 +lat_ts=-71 +lon_0=85 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=km +no_defs")
The estimated locations can be accessed for further analysis, custom mapping, etc… by using the grab
function. They can be returned as a projected sf
tibble or as a simple unprojected tibble. Note, that for all foieGras
outputs the x
, y
, x.se
and y.se
units are in km.
## grab fitted locations from fit object as a projected sf object
plocs_sf <- grab(fitc, what = "f")
## grab predicted locations in unprojected form, returning as a tibble
plocs <- grab(fitc, "p", as_sf = FALSE)
## unprojected form looks like this
plocs
#> # A tibble: 113 x 12
#> id date lon lat x y x.se y.se
#> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 54591 2012-03-05 05:00:00 111. -66.4 12309. -9956. 10.00e-6 10.00e-6
#> 2 54591 2012-03-06 05:00:00 111. -66.4 12303. -9947. 1.31e+1 7.57e-3
#> 3 54591 2012-03-07 05:00:00 110. -66.5 12297. -9963. 3.46e+0 2.34e-1
#> 4 54591 2012-03-08 05:00:00 110. -66.4 12286. -9946. 6.06e+0 5.28e+0
#> 5 54591 2012-03-09 05:00:00 110. -66.5 12296. -9973. 7.52e+0 1.07e+0
#> 6 54591 2012-03-10 05:00:00 111. -66.4 12305. -9956. 4.70e+0 2.83e+0
#> 7 54591 2012-03-11 05:00:00 111. -66.5 12328. -9970. 3.62e+0 3.07e+0
#> 8 54591 2012-03-12 05:00:00 111. -66.5 12306. -9967. 1.54e+1 3.65e+0
#> 9 54591 2012-03-13 05:00:00 110. -66.4 12291. -9953. 5.97e+0 4.87e+0
#> 10 54591 2012-03-14 05:00:00 110. -66.4 12290. -9939. 4.76e+0 4.40e+0
#> # … with 103 more rows, and 4 more variables: u <dbl>, v <dbl>,
#> # u.se <dbl>, v.se <dbl>
fit_ssm
can be applied to single tracks as shown, it can also fit to multiple individual tracks in a single input tibble
opr data.frame
. The SSM is fit to each individual separately. The resulting output is a compound tibble
with rows corresponding to each individual foieGras
fit object. The converged
column indicates whether each model fit converged successfully.
# load 2 southern elephant seal example data
data(ellies)
fit2 <- fit_ssm(ellies, vmax = 10, model = "crw", time.step = 48, verbose = 0)
# list fit outcomes for both seals
fit2
#> # A tibble: 2 x 3
#> id ssm converged
#> <chr> <list> <lgl>
#> 1 ct36-F-09 <ssm> TRUE
#> 2 ct96-16-13 <ssm> TRUE
individual id
is displayed in the 1st column, all fit output (ssm
) in the 2nd column, and convergence
status of each model fit is displayed in the 3rd column
fmap
automatically handles ssm fit objects with multiple individuals, plotting all on a single map
A behavioural index can be estimated from locations provided they occur regularly in time and they either have minimal location error (i.e. GPS data) or they have been ssm filtered. We can fit the mpm
to foieGras
ssm-predicted locations. Here we use the ssm fits to the s. elephant seal data
## fit mpm separately to each individual track
fmp <- fit2 %>%
grab(., "p", as_sf = FALSE) %>%
select(id, date, lon, lat) %>%
fit_mpm(., model = "mpm")
#>
#> fitting mpm...
fmp
#> # A tibble: 2 x 3
#> id mpm converged
#> <chr> <list> <lgl>
#> 1 ct36-F-09 <mpm> TRUE
#> 2 ct96-16-13 <mpm> TRUE
We can visualize the estimated behavioural index (move persistence) as a time series for each seal. The move persistence parameter \(g_t\) ranges continuously from 0 (little persistence, indicative of area-restricted movements) to 1 (high persistence, indicative of directed movements).
## plot mpm estimates by individual seal
grab(fmp, "fitted") %>%
ggplot() +
geom_point(aes(date, g, colour = g)) +
scale_colour_viridis_c(limits = c(0,1)) +
ylim(0,1) +
facet_wrap(~ id, scales = "free_x", ncol = 1)
A joint move persistence model jmpm
is also available for fitting to multiple individuals. This model fits to all individuals simultaneously, estimating a joint random walk variance parameter that can often better resolve subtle variations in \(g_t\).
We can explore the spatio-temporal variation in movement behaviour by plotting the \(g_t\) values along each seal’s track, but first we have to merge the ssm-predicted locations with the move persistence estimates using foieGras::join()
## join ssm predicted locations and move persistence values together
fmp_locs <- join(fit2, fmp, as_sf = FALSE)
ggplot(fmp_locs) +
geom_point(aes(lon, lat, colour = g)) +
scale_colour_viridis_c(limits = c(0,1))