Getting Started

Introduction

This package extends the {mlr3} framework with resampling methods suited to spatial, temporal and spatiotemporal data. These methods help reduce the influence of autocorrelation on performance estimates obtained by cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.

After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.

Creating a spatial Task

To make use of spatial resampling methods, an {mlr3} task that is aware of its spatial characteristics needs to be created. Two child classes exist in {mlr3spatiotempcv} for this purpose: TaskClassifST for classification and TaskRegrST for regression.

To create one of these, one can either pass an sf object as the “backend” directly:

# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"))

# create mlr3 task
task = TaskClassifST$new("ecuador_sf",
  backend = data_sf, target = "slides", positive = "TRUE"
)

or use a plain data.frame. In this case, the constructor of TaskClassifST needs a few more arguments:

data = mlr3::as_data_backend(ecuador)
task = TaskClassifST$new("ecuador",
  backend = data, target = "slides",
  positive = "TRUE", extra_args = list(coordinate_names = c("x", "y"))
)

Now this Task can be used as a normal {mlr3} task in any kind of modeling scenario. Have a look at the mlr3book section on “Spatiotemporal Analysis” on how to apply a spatiotemporal resampling method to such a task.
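As a rough illustration of that last step, the sketch below runs a spatial cross-validation on the Ecuador data. It assumes that the example task is registered under the key "ecuador" and that the {rpart} package is installed for the "classif.rpart" learner; the learner, the measure and the fold count are arbitrary choices for demonstration, not recommendations.

```r
library(mlr3)
library(mlr3spatiotempcv)

set.seed(42) # the coordinate-based partitioning involves random starts

# assumption: the example task is available under the key "ecuador"
task = tsk("ecuador")

# spatial CV: folds are formed by clustering the observation coordinates
resampling = rsmp("spcv_coords", folds = 4)

# any classification learner would do; rpart is only a placeholder
learner = lrn("classif.rpart")

rr = resample(task, learner, resampling)

# classification error aggregated over the four spatial folds
score = rr$aggregate(msr("classif.ce"))
score
```

Swapping rsmp("spcv_coords") for rsmp("cv") in the same script gives a non-spatial estimate for comparison.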

Contributed assets by {mlr3spatiotempcv}

In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries gain new entries when {mlr3spatiotempcv} is loaded.

Task Type

Additional task types:

mlr_reflections$task_types
#>       type          package          task        learner        prediction
#> 1: classif             mlr3   TaskClassif LearnerClassif PredictionClassif
#> 2: classif mlr3spatiotempcv TaskClassifST LearnerClassif PredictionClassif
#> 3:    regr             mlr3      TaskRegr    LearnerRegr    PredictionRegr
#> 4:    regr mlr3spatiotempcv    TaskRegrST    LearnerRegr    PredictionRegr
#>           measure
#> 1: MeasureClassif
#> 2: MeasureClassif
#> 3:    MeasureRegr
#> 4:    MeasureRegr

Task Column Roles

Additional column roles:

mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"
#> 
#> $regr_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"
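To see at a glance which roles the spatiotemporal task types add, the two role vectors can be compared directly. This is a small sketch that only assumes the "classif" and "classif_st" entries shown in the listing above; the exact role names returned may differ between package versions.

```r
library(mlr3)
library(mlr3spatiotempcv)

roles = mlr_reflections$task_col_roles

# roles known to spatiotemporal classification tasks but not to plain ones
extra_roles = setdiff(roles$classif_st, roles$classif)
extra_roles
```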

Resampling Methods

Additional resampling methods: spcv_block, spcv_buffer, spcv_coords, spcv_disc, spcv_env, spcv_tiles, sptcv_cluto and sptcv_cstf, and their respective repeated versions.

as.data.table(mlr_resamplings)
#>                      key                         label
#>  1:            bootstrap                     Bootstrap
#>  2:               custom                 Custom Splits
#>  3:            custom_cv Custom Split Cross-Validation
#>  4:                   cv              Cross-Validation
#>  5:              holdout                       Holdout
#>  6:             insample           Insample Resampling
#>  7:                  loo                 Leave-One-Out
#>  8:          repeated_cv     Repeated Cross-Validation
#>  9:  repeated_spcv_block                          <NA>
#> 10: repeated_spcv_coords                          <NA>
#> 11:   repeated_spcv_disc                          <NA>
#> 12:    repeated_spcv_env                          <NA>
#> 13:  repeated_spcv_tiles                          <NA>
#> 14: repeated_sptcv_cluto                          <NA>
#> 15:  repeated_sptcv_cstf                          <NA>
#> 16:           spcv_block                          <NA>
#> 17:          spcv_buffer                          <NA>
#> 18:          spcv_coords                          <NA>
#> 19:            spcv_disc                          <NA>
#> 20:             spcv_env                          <NA>
#> 21:           spcv_tiles                          <NA>
#> 22:          sptcv_cluto                          <NA>
#> 23:           sptcv_cstf                          <NA>
#> 24:          subsampling                   Subsampling
#>                      key                         label
#>                                                          params iters
#>  1:                                               ratio,repeats    30
#>  2:                                                                NA
#>  3:                                                                NA
#>  4:                                                       folds    10
#>  5:                                                       ratio     1
#>  6:                                                                 1
#>  7:                                                                NA
#>  8:                                               folds,repeats   100
#>  9:                 folds,repeats,rows,cols,range,selection,...    10
#> 10:                                               folds,repeats    10
#> 11:                    folds,radius,buffer,prob,replace,repeats    10
#> 12:                                      folds,repeats,features    10
#> 13: dsplit,nsplit,rotation,user_rotation,offset,user_offset,...     0
#> 14:                                               folds,repeats    10
#> 15:                      folds,repeats,space_var,time_var,class    10
#> 16:                 folds,rows,cols,range,selection,rasterLayer    10
#> 17:                                   theRange,spDataType,addBG     0
#> 18:                                                       folds    10
#> 19:                            folds,radius,buffer,prob,replace    10
#> 20:                                              folds,features    10
#> 21: dsplit,nsplit,rotation,user_rotation,offset,user_offset,...     0
#> 22:                                                       folds    10
#> 23:                              folds,space_var,time_var,class    10
#> 24:                                               ratio,repeats    30
#>                                                          params iters
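The full table mixes the default {mlr3} resamplings with the contributed ones. A minimal way to list only the latter is to filter the dictionary keys by prefix, as sketched below. The regular expression is an assumption based on the naming pattern visible in the table above ("spcv_", "sptcv_", optionally preceded by "repeated_").

```r
library(mlr3)
library(mlr3spatiotempcv)

# keep keys such as "spcv_coords", "sptcv_cstf" or "repeated_spcv_block"
spatial_keys = mlr_resamplings$keys(pattern = "^(repeated_)?spt?cv_")
spatial_keys
```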

Example Tasks

Additional example tasks:

Upstream Packages and Scientific References

The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides "spcv_buffer" also have a corresponding “repeated” method.

Category                   | Package    | Method Name                 | Reference               | mlr3 Notation
---------------------------|------------|-----------------------------|-------------------------|----------------------------------
Buffering, spatial         | blockCV    | Spatial Buffering           | Valavi et al. (2018)    | rsmp("spcv_buffer")
Buffering, spatial         | sperrorest | Spatial Disc                | Brenning (2012)         | rsmp("spcv_disc")
Blocking, spatial          | blockCV    | Spatial Blocking            | Valavi et al. (2018)    | rsmp("spcv_block")
Blocking, spatial          | sperrorest | Spatial Tiles               | Valavi et al. (2018)    | rsmp("spcv_tiles")
Clustering, spatial        | sperrorest | Spatial CV                  | Brenning (2012)         | rsmp("spcv_coords")
Clustering, feature-space  | blockCV    | Environmental Blocking      | Valavi et al. (2018)    | rsmp("spcv_env")
Grouping, predefined inds  | mlr3       | Predefined partitions       | -                       | rsmp("custom_cv")
Grouping, spatiotemporal   | mlr3       | via col_roles "group"       | -                       | rsmp("cv"), Task$set_col_roles()
Grouping, spatiotemporal   | CAST       | Leave-Location-and-Time-Out | Meyer et al. (2018)     | rsmp("sptcv_cstf")
Clustering, spatiotemporal | skmeans    | Spatiotemporal Clustering   | Zhao and Karypis (2002) | rsmp("sptcv_cluto")
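The grouping approach via col_roles "group" needs only {mlr3}. The sketch below uses made-up data with a hypothetical "site" column; the data, column names and fold count are assumptions for illustration. Assigning the "group" role should keep all observations of a site together in the same fold, which the final line checks.

```r
library(mlr3)

set.seed(1)

# hypothetical data: 6 sites with 10 observations each
df = data.frame(
  y = rnorm(60),
  x1 = rnorm(60),
  site = factor(rep(letters[1:6], each = 10))
)

task = TaskRegr$new("demo", backend = df, target = "y")

# use "site" for grouping only; this also removes it from the features
task$set_col_roles("site", roles = "group")

cv = rsmp("cv", folds = 3)
cv$instantiate(task)

# sites that occur in the test set of each fold
test_sites = lapply(seq_len(cv$iters), function(i) {
  unique(as.character(df$site[cv$test_set(i)]))
})

# TRUE if no site is spread over more than one test fold
sites_kept_together = anyDuplicated(unlist(test_sites)) == 0
sites_kept_together
```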

References

Brenning, Alexander. 2012. “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. IEEE. https://doi.org/10.1109/igarss.2012.6352393.
Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001.
Valavi, Roozbeh, Jane Elith, Jose J. Lahoz-Monfort, and Gurutzeta Guillera-Arroita. 2018. “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv, June. https://doi.org/10.1101/357798.
Zhao, Ying, and George Karypis. 2002. “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515–24. http://glaros.dtc.umn.edu/gkhome/node/167.