# Using External Simulators

#### coala 0.6.0

Coala can call the coalescent simulators ms[1], msms[2] and scrm[3], and can use seq-gen[4] for finite-sites simulations. The R version of scrm should be installed automatically as a dependency of coala. For the other programs, you need to have an executable binary available on your system.

# Installation

Short instructions on obtaining and compiling the programs are given in the help pages of activate_ms, activate_msms and activate_seqgen. More detailed instructions are provided in the wiki.

# Activation

In addition to providing the binary for a simulator, you need to tell coala where the binary is located. We refer to this process as activation of a binary. Afterwards, coala will use the simulator automatically wherever appropriate.

There are three different ways to activate a binary:

1. Use the activate_msms and activate_seqgen functions to activate the simulators from within R. You should use the functions before creating a model.
2. Alternatively, you can place the binaries in your working directory or in a folder listed in your PATH environment variable, using one of the names listed under “Expected Binary Names” below. If a matching file is found, coala will automatically activate the simulator.
3. You can start the R session with an environment variable that holds the path to the binaries. In this case, the simulators should also be activated automatically when the coala package is loaded.

Coala uses the R versions of scrm and ms. scrm should always be available. Install the CRAN package phyclust to use ms.

| Simulator | Priority | Expected Binary Names | Environment Var | Function |
|-----------|----------|-----------------------|-----------------|----------|
| seq-gen | 100 | seqgen, seq-gen, seqgen.exe, seq-gen.exe | SEQGEN | activate_seqgen |
| msms | 200 | msms.jar / java, java.exe | MSMS / JAVA | activate_msms |
| ms | 300 | | | activate_ms |
| scrm | 400 | | | |
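
The first option can be sketched as follows. This is only an illustration under stated assumptions: the file locations are placeholders for wherever the binaries live on your system, and the argument names `binary`, `jar` and `java` are taken to match the help pages of activate_seqgen and activate_msms, so check them against your installed version. The comment at the top shows what the third option could look like in an `.Renviron` file.

```r
# Option 3 would instead use a line such as the following in ~/.Renviron,
# which R reads at startup (the path is a placeholder):
#   SEQGEN=/path/to/seq-gen

library(coala)

# Hypothetical locations; replace with the paths on your system.
seqgen_path <- path.expand("~/bin/seq-gen")
msms_jar    <- path.expand("~/bin/msms.jar")
java_path   <- unname(Sys.which("java"))  # "" if java is not on the PATH

# Option 1: activate from within R, before creating a model.
# The file checks keep the script from failing when a binary is missing.
if (file.exists(seqgen_path)) {
  activate_seqgen(binary = seqgen_path)
}
if (file.exists(msms_jar) && nzchar(java_path)) {
  activate_msms(jar = msms_jar, java = java_path)
}

# Show which simulators are now available.
sims <- list_simulators()
sims
```

Guarding each call with `file.exists()` is a design choice for scripts shared between machines; in an interactive session you may prefer to call the activation functions directly so that a wrong path is reported immediately.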

You can use the list_simulators() command to view which simulators are currently available:

```r
library(coala)
list_simulators()
##   name priority                      info
## 2 scrm      400         version : 1.7.3-1
## 1   ms      300 version : phyclust_0.1.28
```
# Priority

The check_model function checks which simulators support a specific model, and states the problems which coala has detected with the simulators that do not support it. For example, a simple model with infinite-sites mutations (IFS) can be simulated with scrm or – if installed – with ms and msms, but not with seq-gen because the latter generates finite-sites mutations:

```r
model <- coal_model(10, 1) +
  feat_mutation(5, model = "IFS") +
  sumstat_nucleotide_div()
check_model(model)
## ms : OK
##
## scrm : OK
model
## Features:
## * Sampling of 10 (pop 1) individuals with ploidy 1 at time 0
## * Mutations with rate 5 following a IFS mutation model
## * Generating Seg. Sites
##
## Parameter: None
##
## Loci: 1 locus of length 1000
##
## Summary Statistics: stat_pi
##
## Simulator: scrm
## Command: scrm 10 1 -t 5
```

If multiple simulators can simulate a model, the one with the highest priority is used. In our example, that is scrm. If we would like to use ms instead, we need to raise its priority:

```r
activate_ms(priority = 500)
```
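
One way to confirm the effect of a priority change is to build the model afterwards and print it, since the "Simulator:" line of the printed model names the simulator that was selected. The sketch below assumes that phyclust may or may not be installed and raises the priority of ms only when it is. It then runs the model with coala's simulate() method; because the layout of the returned list is not described here, only its names are inspected.

```r
library(coala)

# ms is provided by the phyclust package; prefer it only when available.
if (requireNamespace("phyclust", quietly = TRUE)) {
  activate_ms(priority = 500)
}

# Create the model after changing priorities.
model <- coal_model(10, 1) +
  feat_mutation(5, model = "IFS") +
  sumstat_nucleotide_div()

model                      # the "Simulator:" line shows the selected simulator
result <- simulate(model)  # runs the selected simulator
names(result)              # elements returned by the simulation
```

To restore the default order later, call activate_ms() again with the priority listed in the table above (300).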

# References

• [1]: Richard R. Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics (2002) 18 (2): 337-338. doi:10.1093/bioinformatics/18.2.337
• [2]: Gregory Ewing and Joachim Hermisson. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics (2010) 26 (16): 2064-2065. doi:10.1093/bioinformatics/btq322
• [3]: Paul R. Staab, Sha Zhu, Dirk Metzler and Gerton Lunter. scrm: efficiently simulating long sequences using the approximated coalescent with recombination. Bioinformatics (2015) 31 (10): 1680-1682. doi:10.1093/bioinformatics/btu861
• [4]: Andrew Rambaut and Nicholas C. Grassly. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci (1997) 13 (3): 235-238. doi:10.1093/bioinformatics/13.3.235