This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.
To make use of spatial resampling methods, an {mlr3} task that is aware of its spatial characteristics needs to be created. Two child classes exist in {mlr3spatiotempcv} for this purpose:
TaskClassifST
TaskRegrST
To create one of these, one can either pass an sf object as the “backend” directly:
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"))
# create mlr3 task
task = TaskClassifST$new("ecuador_sf",
  backend = data_sf, target = "slides", positive = "TRUE"
)
or use a plain data.frame. In this case, the constructor of TaskClassifST needs a few more arguments:
data = mlr3::as_data_backend(ecuador)
task = TaskClassifST$new("ecuador",
  backend = data, target = "slides",
  positive = "TRUE", extra_args = list(coordinate_names = c("x", "y"))
)
Now this Task can be used as a normal {mlr3} task in any kind of modeling scenario. Have a look at the mlr3book section on “Spatiotemporal Analysis” on how to apply a spatiotemporal resampling method to such a task.
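As a brief illustration of using such a task "as a normal {mlr3} task", the following sketch applies one of the spatial resampling methods to the bundled tsk("ecuador") example task. The choice of rsmp("spcv_coords"), the five folds, the featureless baseline learner, and the classification error measure are assumptions made for this example only; the mlr3book section remains the reference for a full workflow.

```r
library("mlr3")
library("mlr3spatiotempcv")

set.seed(42)

# built-in spatial classification task shipped with {mlr3spatiotempcv}
task = tsk("ecuador")

# coordinates stored with the task
coords = task$coordinates()
head(coords)

# spatial CV that clusters the coordinates into 5 folds
# (method and number of folds are arbitrary choices for this sketch)
resampling = rsmp("spcv_coords", folds = 5)
resampling$instantiate(task)

# featureless baseline learner, chosen to avoid extra dependencies
learner = lrn("classif.featureless")
rr = resample(task, learner, resampling)

# aggregated classification error across the spatial folds
score = rr$aggregate(msr("classif.ce"))
print(score)
```

Replacing the featureless learner with any other {mlr3} classification learner should leave the remaining code unchanged.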
In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries gain new entries when {mlr3spatiotempcv} is loaded.
Additional task types:
TaskClassifST
TaskRegrST
mlr_reflections$task_types
#>       type          package          task        learner        prediction
#> 1: classif             mlr3   TaskClassif LearnerClassif PredictionClassif
#> 2: classif mlr3spatiotempcv TaskClassifST LearnerClassif PredictionClassif
#> 3:    regr             mlr3      TaskRegr    LearnerRegr    PredictionRegr
#> 4:    regr mlr3spatiotempcv    TaskRegrST    LearnerRegr    PredictionRegr
#>           measure
#> 1: MeasureClassif
#> 2: MeasureClassif
#> 3:    MeasureRegr
#> 4:    MeasureRegr
Additional column roles:
coordinates
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $classif
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $classif_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinates"
#>
#> $regr_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinates"
Additional resampling methods:
spcv_block
spcv_buffer
spcv_coords
spcv_disc
spcv_tiles
spcv_env
sptcv_cluto
sptcv_cstf
and their respective repeated versions (available for all of these except spcv_buffer).
as.data.table(mlr_resamplings)
#>                      key                         label
#>  1:            bootstrap                     Bootstrap
#>  2:               custom                 Custom Splits
#>  3:            custom_cv Custom Split Cross-Validation
#>  4:                   cv              Cross-Validation
#>  5:              holdout                       Holdout
#>  6:             insample           Insample Resampling
#>  7:                  loo                 Leave-One-Out
#>  8:          repeated_cv     Repeated Cross-Validation
#>  9:  repeated_spcv_block                          <NA>
#> 10: repeated_spcv_coords                          <NA>
#> 11:   repeated_spcv_disc                          <NA>
#> 12:    repeated_spcv_env                          <NA>
#> 13:  repeated_spcv_tiles                          <NA>
#> 14: repeated_sptcv_cluto                          <NA>
#> 15:  repeated_sptcv_cstf                          <NA>
#> 16:           spcv_block                          <NA>
#> 17:          spcv_buffer                          <NA>
#> 18:          spcv_coords                          <NA>
#> 19:            spcv_disc                          <NA>
#> 20:             spcv_env                          <NA>
#> 21:           spcv_tiles                          <NA>
#> 22:          sptcv_cluto                          <NA>
#> 23:           sptcv_cstf                          <NA>
#> 24:          subsampling                   Subsampling
#>                      key                         label
#>                                                          params iters
#>  1:                                               ratio,repeats    30
#>  2:                                                                NA
#>  3:                                                                NA
#>  4:                                                       folds    10
#>  5:                                                       ratio     1
#>  6:                                                                 1
#>  7:                                                                NA
#>  8:                                               folds,repeats   100
#>  9:                 folds,repeats,rows,cols,range,selection,...    10
#> 10:                                               folds,repeats    10
#> 11:                    folds,radius,buffer,prob,replace,repeats    10
#> 12:                                      folds,repeats,features    10
#> 13: dsplit,nsplit,rotation,user_rotation,offset,user_offset,...     0
#> 14:                                               folds,repeats    10
#> 15:                      folds,repeats,space_var,time_var,class    10
#> 16:                 folds,rows,cols,range,selection,rasterLayer    10
#> 17:                                   theRange,spDataType,addBG     0
#> 18:                                                       folds    10
#> 19:                            folds,radius,buffer,prob,replace    10
#> 20:                                              folds,features    10
#> 21: dsplit,nsplit,rotation,user_rotation,offset,user_offset,...     0
#> 22:                                                       folds    10
#> 23:                              folds,space_var,time_var,class    10
#> 24:                                               ratio,repeats    30
#>                                                          params iters
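To pick out only the entries contributed by {mlr3spatiotempcv}, the dictionary keys can be filtered by name. The sketch below assumes that the spatial and spatiotemporal methods keep the "spcv_"/"sptcv_" key prefixes (optionally preceded by "repeated_") seen in the listing above; the regular expression and the parameter values are choices made for this example, not something prescribed by the package.

```r
library("mlr3")
library("mlr3spatiotempcv")

# keys of all registered resampling methods whose names follow the
# (assumed) "spcv_" / "sptcv_" naming scheme, including repeated variants
st_keys = mlr_resamplings$keys("^(repeated_)?spt?cv_")
print(st_keys)

# a repeated variant is constructed like any other resampling;
# folds and repeats below are arbitrary example values
rsmp_rep = rsmp("repeated_spcv_coords", folds = 5, repeats = 2)

# total number of iterations = folds * repeats
print(rsmp_rep$iters)
```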
Additional example tasks:
tsk("ecuador") (spatial, classif)
tsk("cookfarm") (spatiotemp, regr)
The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides "spcv_buffer" also have a corresponding “repeated” method.
Category | (Package) Method Name | Reference | mlr3 Notation |
---|---|---|---|
Buffering, spatial | (blockCV) Spatial Buffering | Valavi et al. (2018) | rsmp("spcv_buffer") |
Buffering, spatial | (sperrorest) Spatial Disc | Brenning (2012) | rsmp("spcv_disc") |
Blocking, spatial | (blockCV) Spatial Blocking | Valavi et al. (2018) | rsmp("spcv_block") |
Blocking, spatial | (sperrorest) Spatial Tiles | Brenning (2012) | rsmp("spcv_tiles") |
Clustering, spatial | (sperrorest) Spatial CV | Brenning (2012) | rsmp("spcv_coords") |
Clustering, feature-space | (blockCV) Environmental Blocking | Valavi et al. (2018) | rsmp("spcv_env") |
Grouping, predefined inds | (mlr3) Predefined partitions | | rsmp("custom_cv") |
Grouping, spatiotemporal | (mlr3) via col_roles "group" | | rsmp("cv"), Task$set_col_roles() |
Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | Meyer et al. (2018) | rsmp("sptcv_cstf") |
Clustering, spatiotemporal | (skmeans) Spatiotemporal Clustering | Zhao and Karypis (2002) | rsmp("sptcv_cluto") |
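The table row that combines rsmp("cv") with Task$set_col_roles() relies on plain {mlr3} functionality: a column with the "group" role keeps all observations of one group together in either the training or the test set. The following minimal sketch illustrates this with simulated data; the data frame, the column name site, and the number of folds are hypothetical and chosen only for demonstration.

```r
library("mlr3")

set.seed(1)

# toy data: 4 hypothetical sites with 5 observations each
data = data.frame(
  y = rnorm(20),
  x = rnorm(20),
  site = factor(rep(c("a", "b", "c", "d"), each = 5))
)
task = as_task_regr(data, target = "y", id = "toy_sites")

# use 'site' exclusively as grouping variable (it is no longer a feature)
task$set_col_roles("site", roles = "group")

# ordinary cross-validation; folds are now formed from whole groups
resampling = rsmp("cv", folds = 4)
resampling$instantiate(task)

# collect the sites found in the train and test set of every fold
fold_sites = lapply(seq_len(resampling$iters), function(i) {
  list(
    train = unique(as.character(data$site[resampling$train_set(i)])),
    test = unique(as.character(data$site[resampling$test_set(i)]))
  )
})
str(fold_sites)
```

If the grouping variable encodes locations or time points, the same mechanism yields leave-location-out or leave-time-out style partitions.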