Abstract

sjSDM is a scalable joint species distribution model based on Monte-Carlo intergration of the joint likelihood.

Load the package via

`library(sjSDM)`

sjSDM depends on the Anaconda distribution of python and pytorch. You will get a warning if you don’t have python installed. sjSDM depends on Anaconda which will be installed automatically if it is not available.

To install the CPU version, run:

`install_sjSDM()`

For the GPU version (NVIDIA GPUs are only supported), run:

`install_sjSDM(method = "gpu")`

More details on and trouble-shooting for installing the necessary dependencies is explained in the vignette dependencies, call

`vignette("Dependencies", package = "sjSDM")`

To cite sjSDM in a publication, use

`citation("sjSDM")`

sjSDM has a function to create test data. Here, we create a dataset with 3 environmental predictors, 5 species and 100 sites.

`= simulate_SDM(env = 3L, species = 5L, sites = 100L) com `

The model is then fit with the function sjSDM(). You have to provide predictors (can be also a data.frame) and response as matrices.

`= sjSDM(Y = com$response, env = com$env_weights, iter = 100L, se=TRUE) model `

Things you can do with the model output

```
coef(model)
summary(model)
getCov(model)
```

sjSDM supports formula description for the predictors.

E.g. interaction with intercept and calculate p-values:

```
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~ X1*X2,), iter = 50L, se = TRUE)
summary(model)
```

E.g. quadratic effect without intercept:

```
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~0+ I(X1^2)), iter = 50L, se = TRUE)
summary(model)
```

jSDMs account for correlation between species within communities (sites), in real datasets, however, communities (sites) are often additionally correlated (== spatial autocorrelation).

Let’s first simulate test data: 1) Simulate jSDM without a link (normal response)

```
= simulate_SDM(env = 3L, species = 5L, sites = 100L,
com link = "identical", response = "identical")
= com$env_weights
X = com$response Y
```

- add spatial residuals (create coordinates and use spatial distance matrix to draw autocorrelated residuals for each species)

```
= matrix(rnorm(200), 100, 2)+2
XYcoords = as.matrix(dist(XYcoords))
WW = mvtnorm::rmvnorm( 5L, sigma = exp(-WW)) spatialResiduals
```

- Finish test data

```
= Y + t(spatialResiduals)
Ysp = ifelse(Ysp < 0, 0, 1) # multivariate probit model Y
```

Using spatial eigenvectors as additional predictors is a common approach:

```
= generateSpatialEV(XYcoords)
SPeigen
= sjSDM(Y, env = linear(X, ~.), spatial = linear(SPeigen, ~0+.), iter = 100L)
model summary(model)
```

Type I analysis of variance for the two/three different components (environment, biotic associations, and space (optional)).

```
= anova(model)
an plot(an)
```

We can also visualize the internal meta-community structure (only for models with space):

`plot(an, internal = TRUE)`

We can also partition the effects of the different components. This function will be deprecated in future. Please use `plot(anova(model), internal = TRUE)`

```
= importance(model)
imp plot(imp)
```

sjSDM supports l1 (lasso) and l2 (ridge) regularization: * alpha is the weighting between lasso and ridge * alpha = 0.0 corresponds to pure lasso * alpha = 1.0 corresponds to pure ridge

```
= sjSDM(Y = com$response, env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5), iter = 50L)
model summary(model)
```

We can do the same for the species associations:

```
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5),
biotic = bioticStruct(lambda =0.1),
iter = 50L)
summary(model)
```

```
= sjSDM(Y, env = linear(X, ~X1+X2), spatial = linear(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
```

```
= simulate_SDM(env = 3L, species = 5L, sites = 100L)
com = com$env_weights
X = com$response
Y
# three fully connected layers with relu as activation function
= sjSDM(Y = Y,
model env = DNN(data = X, formula = ~., hidden = c(10L, 10L, 10L), activation = "relu"),
iter = 50L, se = TRUE)
summary(model)
```

The methods for sjSDM() work also for the non-linear model:

```
getCov(model) # species association matrix
= predict(model) # predict on fitted data
pred = predict(model, newdata = X) # predict on new data pred
```

Extract and set weights of model:

```
= getWeights(model) # get layer weights and sigma
weights setWeights(model, weights)
```

Plot the training history:

`plot(model)`

```
= sjSDM(Y, env = linear(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
```

```
= sjSDM(Y, env = DNN(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
```

sjSDM supports other responses than presence-absence data: Simulate non-presence-absence data:

```
= simulate_SDM(env = 3L, species = 5L, sites = 100L,link = "identical", response = "count") # samples from a Poisson distribution
com = com$env_weights
X = com$response Y
```

```
= sjSDM(Y, env = linear(X, ~.), se = TRUE, iter = 50L, family = poisson("log"))
model summary(model)
```

```
= sjSDM(log(Y+0.01), env = linear(X, ~.), se = TRUE, iter = 50L, family = gaussian("identity"))
model summary(model)
```