```
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
```

The `rnorm_multi()` function makes multiple normally distributed vectors with specified parameters and relationships.

For example, the following creates a sample that has 100 observations of 3 variables, drawn from a population where A has a mean of 0 and SD of 1, while B and C have means of 20 and SDs of 5. A correlates with B and C with r = 0.5, and B and C correlate with r = 0.25.

```
dat <- rnorm_multi(n = 100,
                   mu = c(0, 20, 20),
                   sd = c(1, 5, 5),
                   r = c(0.5, 0.5, 0.25),
                   varnames = c("A", "B", "C"),
                   empirical = FALSE)
```
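
To make these parameters concrete, the base-R sketch below writes out the population correlation matrix they describe and the covariance matrix it implies (each covariance is `r * sd_i * sd_j`). The names `R`, `sds` and `Sigma` are only illustrative. The reading of `r = c(0.5, 0.5, 0.25)` as the pairs A-B, A-C and B-C follows the description above.

```r
# Population parameters from the example above
vars <- c("A", "B", "C")
mu   <- setNames(c(0, 20, 20), vars)
sds  <- setNames(c(1, 5, 5), vars)

# r = c(0.5, 0.5, 0.25) lists the pairs A-B, A-C, B-C
R <- matrix(c(1,   0.5,  0.5,
              0.5, 1,    0.25,
              0.5, 0.25, 1),
            nrow = 3, dimnames = list(vars, vars))

# Covariance = correlation scaled by the product of the two SDs
Sigma <- R * outer(sds, sds)
Sigma
```

For example, the A-B covariance is 0.5 × 1 × 5 = 2.5, and the B-C covariance is 0.25 × 5 × 5 = 6.25.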

n | var | A | B | C | mean | sd |
---|---|---|---|---|---|---|
100 | A | 1.00 | 0.49 | 0.51 | -0.04 | 1.04 |
100 | B | 0.49 | 1.00 | 0.19 | 19.95 | 4.91 |
100 | C | 0.51 | 0.19 | 1.00 | 19.64 | 4.61 |

Table: Sample stats
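
The tables in this section report, for each variable, n, its sample correlations with every variable, its mean and its SD. A summary like that can be sketched in base R. `sample_stats()` is a hypothetical helper, not a faux function, and it is shown on the built-in `iris` data so it runs without faux.

```r
# Hypothetical helper (not a faux function): one row per variable with n,
# the variable name, its correlations with every column, its mean and its SD
sample_stats <- function(df, digits = 2) {
  data.frame(
    n    = nrow(df),
    var  = names(df),
    round(cor(df), digits),
    mean = round(colMeans(df), digits),
    sd   = round(sapply(df, sd), digits),
    row.names = NULL,
    check.names = FALSE
  )
}

sample_stats(iris[, 1:4])
```

Calling `sample_stats(dat)` on the data simulated above should give a table with the same layout, although the numbers change from run to run unless you set a seed.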

You can specify the correlations in one of four ways:

- A single r for all pairs
- A vars by vars matrix
- A vars*vars length vector
- A vars*(vars-1)/2 length vector
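
All four formats describe the same thing: a full vars by vars matrix with 1s on the diagonal. The sketch below expands each format into that matrix. `expand_r()` is a hypothetical helper, not part of faux. The pair order assumed for the short vector (1-2, 1-3, ..., 2-3, ...) is the order used by the `rho1_2`, `rho1_3`, ... example in this section.

```r
# Hypothetical helper (not part of faux): expand any of the four r formats
# into a full vars x vars correlation matrix.
expand_r <- function(r, vars) {
  r <- as.vector(r)                          # a matrix becomes a vars*vars vector
  if (length(r) == 1 || length(r) == vars * vars) {
    m <- matrix(r, vars, vars)               # single r recycled, or full matrix
  } else if (length(r) == vars * (vars - 1) / 2) {
    m <- matrix(0, vars, vars)
    # lower.tri() is filled column by column, which visits the pairs in the
    # order 1-2, 1-3, ..., 1-vars, 2-3, ...: the upper triangle read row by row
    m[lower.tri(m)] <- r
    m <- m + t(m)                            # mirror into the upper triangle
  } else {
    stop("r must have length 1, vars^2 or vars * (vars - 1) / 2")
  }
  diag(m) <- 1
  m
}

expand_r(0.3, 3)
expand_r(c(0.5, 0.5, 0.25), 3)
expand_r(c(0.3, 0.5, 0.5, 0.2, 0, -0.3), 4)
```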

If you want all the pairs to have the same correlation, just specify a single number.

`bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5])`

n | var | a | b | c | d | e | mean | sd |
---|---|---|---|---|---|---|---|---|
100 | a | 1.00 | 0.18 | 0.29 | 0.33 | 0.31 | 0.04 | 1.03 |
100 | b | 0.18 | 1.00 | 0.18 | 0.33 | 0.30 | 0.13 | 1.06 |
100 | c | 0.29 | 0.18 | 1.00 | 0.14 | 0.20 | 0.07 | 0.99 |
100 | d | 0.33 | 0.33 | 0.14 | 1.00 | 0.28 | 0.15 | 1.06 |
100 | e | 0.31 | 0.30 | 0.20 | 0.28 | 1.00 | 0.03 | 1.03 |

Table: Sample stats from a single rho
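
With `empirical = FALSE`, the sample correlations only scatter around 0.3. One rough way to judge how much scatter to expect at n = 100 is a Fisher z interval. This is standard statistics, not part of faux, and `fisher_ci()` is a hypothetical name.

```r
# Approximate range covering 95% of sample correlations, via Fisher's z
fisher_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                   # Fisher z of the population r
  se <- 1 / sqrt(n - 3)            # standard error on the z scale
  q  <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(-1, 1) * q * se)      # back-transform to the r scale
}

fisher_ci(0.3, 100)  # about 0.11 to 0.47
```

All ten off-diagonal values in the table above (0.14 to 0.33) fall inside this range.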

If you already have a correlation matrix, such as the output of `cor()`, you can use it to specify the simulated data.

```
cmat <- cor(iris[, 1:4])
bvn <- rnorm_multi(100, 4, 0, 1, cmat,
                   varnames = colnames(cmat))
```

n | var | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | mean | sd |
---|---|---|---|---|---|---|---|
100 | Sepal.Length | 1.00 | -0.24 | 0.87 | 0.82 | 0.09 | 0.98 |
100 | Sepal.Width | -0.24 | 1.00 | -0.58 | -0.52 | 0.07 | 1.08 |
100 | Petal.Length | 0.87 | -0.58 | 1.00 | 0.96 | 0.04 | 1.03 |
100 | Petal.Width | 0.82 | -0.52 | 0.96 | 1.00 | 0.05 | 1.04 |

Table: Sample stats from a correlation matrix

You can specify your correlation matrix by hand as a vars*vars length vector, which will include the correlations of 1 down the diagonal.

```
cmat <- c(1, .3, .5,
          .3, 1, 0,
          .5, 0, 1)
bvn <- rnorm_multi(100, 3, 0, 1, cmat,
                   varnames = c("first", "second", "third"))
```

n | var | first | second | third | mean | sd |
---|---|---|---|---|---|---|
100 | first | 1.00 | 0.31 | 0.48 | 0.05 | 1.02 |
100 | second | 0.31 | 1.00 | 0.01 | -0.14 | 0.86 |
100 | third | 0.48 | 0.01 | 1.00 | 0.02 | 1.12 |

Table: Sample stats from a vars*vars vector

You can specify your correlation matrix by hand as a vars*(vars-1)/2 length vector, skipping the diagonal and lower left duplicate values.

```
rho1_2 <- .3
rho1_3 <- .5
rho1_4 <- .5
rho2_3 <- .2
rho2_4 <- 0
rho3_4 <- -.3
cmat <- c(rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4)
bvn <- rnorm_multi(100, 4, 0, 1, cmat,
                   varnames = letters[1:4])
```

n | var | a | b | c | d | mean | sd |
---|---|---|---|---|---|---|---|
100 | a | 1.00 | 0.29 | 0.61 | 0.41 | -0.10 | 1.06 |
100 | b | 0.29 | 1.00 | 0.23 | -0.03 | 0.09 | 1.14 |
100 | c | 0.61 | 0.23 | 1.00 | -0.28 | 0.08 | 1.17 |
100 | d | 0.41 | -0.03 | -0.28 | 1.00 | -0.12 | 0.97 |

Table: Sample stats from a (vars*(vars-1)/2) vector

If you want your samples to have the *exact* correlations, means, and SDs you entered, set `empirical` to `TRUE`.

```
bvn <- rnorm_multi(100, 5, 0, 1, .3,
                   varnames = letters[1:5],
                   empirical = TRUE)
```

n | var | a | b | c | d | e | mean | sd |
---|---|---|---|---|---|---|---|---|
100 | a | 1.0 | 0.3 | 0.3 | 0.3 | 0.3 | 0 | 1 |
100 | b | 0.3 | 1.0 | 0.3 | 0.3 | 0.3 | 0 | 1 |
100 | c | 0.3 | 0.3 | 1.0 | 0.3 | 0.3 | 0 | 1 |
100 | d | 0.3 | 0.3 | 0.3 | 1.0 | 0.3 | 0 | 1 |
100 | e | 0.3 | 0.3 | 0.3 | 0.3 | 1.0 | 0 | 1 |

Table: Sample stats with empirical = TRUE
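
One general way to hit the requested sample statistics exactly is to center and whiten random normal data so that its sample covariance is the identity, and then re-color it with the target covariance. The base-R sketch below illustrates that idea only. It is not necessarily how faux does it, although `MASS::mvrnorm()` also has an `empirical` argument for the same purpose. The variable names are illustrative.

```r
set.seed(1)  # any seed works; the result is exact regardless
n   <- 100
p   <- 5
mu  <- rep(0, p)
sds <- rep(1, p)
target_r <- matrix(0.3, p, p)
diag(target_r) <- 1
Sigma <- target_r * outer(sds, sds)

X <- matrix(rnorm(n * p), n, p)
X <- scale(X, center = TRUE, scale = FALSE)  # sample means are now exactly 0
W <- X %*% solve(chol(cov(X)))              # sample covariance is now exactly I
Y <- W %*% chol(Sigma)                       # sample covariance is now exactly Sigma
Y <- sweep(Y, 2, mu, "+")                    # shift to the target means
colnames(Y) <- letters[1:p]

round(cor(Y), 2)
```

`chol()` returns an upper-triangular `U` with `t(U) %*% U` equal to its input. Multiplying by the inverse of `U` for the sample covariance removes every chance correlation, and multiplying by `U` for `Sigma` puts the requested ones back.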

Use `rnorm_pre()` to create a vector with a specified correlation to one or more pre-existing variables. The following code creates a new column called `B` with a mean of 10, an SD of 2, and a correlation of r = 0.5 to the `A` column.

```
dat <- rnorm_multi(varnames = "A") %>%
  mutate(B = rnorm_pre(A, mu = 10, sd = 2, r = 0.5))
```

n | var | A | B | mean | sd |
---|---|---|---|---|---|
100 | A | 1.00 | 0.37 | -0.03 | 1.10 |
100 | B | 0.37 | 1.00 | 10.02 | 2.28 |

Set `empirical = TRUE` to return a vector with the **exact** specified parameters.

`dat$C <- rnorm_pre(dat$A, mu = 10, sd = 2, r = 0.5, empirical = TRUE)`

n | var | A | B | C | mean | sd |
---|---|---|---|---|---|---|
100 | A | 1.00 | 0.37 | 0.50 | -0.03 | 1.10 |
100 | B | 0.37 | 1.00 | 0.15 | 10.02 | 2.28 |
100 | C | 0.50 | 0.15 | 1.00 | 10.00 | 2.00 |
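
A common construction for a vector with a given correlation to an existing one mixes the standardized existing vector with fresh noise: `y = r * z_x + sqrt(1 - r^2) * e`. The sketch below shows that construction using the hypothetical `rnorm_pre_sketch()`, which is not faux's source code. For the exact version, it first removes any chance correlation between the noise and `x`.

```r
# Hypothetical helper (not faux's implementation)
rnorm_pre_sketch <- function(x, mu = 0, sd = 1, r = 0, empirical = FALSE) {
  stopifnot(abs(r) <= 1)
  zx <- as.vector(scale(x))          # existing vector, standardized
  e  <- rnorm(length(x))             # fresh standard normal noise
  if (empirical) {
    e <- resid(lm(e ~ zx))           # drop any chance correlation with x
    e <- as.vector(scale(e))         # and give the noise an SD of exactly 1
  }
  y <- r * zx + sqrt(1 - r^2) * e    # correlation with x is about r (exactly r if empirical)
  mu + sd * y                        # rescale to the requested mean and SD
}

x <- rnorm(100)
y <- rnorm_pre_sketch(x, mu = 10, sd = 2, r = 0.5, empirical = TRUE)
c(cor = cor(x, y), mean = mean(y), sd = sd(y))
```

In the exact case, `zx` and `e` are uncorrelated and both have SD 1, so `y` has variance `r^2 + (1 - r^2) = 1` and a correlation of exactly `r` with `x`, up to floating-point error.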

You can also specify correlations to more than one vector by setting the first argument to a data frame that contains only the continuous columns, and `r` to a vector with one correlation for each column.

`dat$D <- rnorm_pre(dat, r = c(.1, .2, .3), empirical = TRUE)`

n | var | A | B | C | D | mean | sd |
---|---|---|---|---|---|---|---|
100 | A | 1.00 | 0.37 | 0.50 | 0.1 | -0.03 | 1.10 |
100 | B | 0.37 | 1.00 | 0.15 | 0.2 | 10.02 | 2.28 |
100 | C | 0.50 | 0.15 | 1.00 | 0.3 | 10.00 | 2.00 |
100 | D | 0.10 | 0.20 | 0.30 | 1.0 | 0.00 | 1.00 |
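
With several existing columns, the same idea extends by giving each standardized column a weight. If `R` is the correlation matrix of the existing columns and `r` holds the target correlations, the weights `b = solve(R, r)` make each column correlate `r` with the new vector. Those columns then explain `sum(r * b)` of its variance, and the remainder has to come from noise. Below is a sketch of the exact case only, using the hypothetical `pre_multi_sketch()`, which is not faux's implementation.

```r
# Hypothetical helper (not faux's implementation); exact correlations only
pre_multi_sketch <- function(X, r, mu = 0, sd = 1) {
  Z <- scale(as.matrix(X))                 # standardize the existing columns
  b <- solve(cor(Z), r)                    # weights giving cor(Z, y) = r
  explained <- sum(r * b)                  # share of var(y) explained by Z
  if (explained > 1) stop("Correlations are impossible.")
  e <- resid(lm(rnorm(nrow(Z)) ~ Z))       # noise uncorrelated with every column
  e <- as.vector(scale(e))
  y <- as.vector(Z %*% b) + sqrt(1 - explained) * e
  mu + sd * y
}

set.seed(2)
X <- data.frame(A = rnorm(100))
X$B <- 0.4 * X$A + rnorm(100)              # B correlates with A
X$C <- rnorm(100)
y <- pre_multi_sketch(X, r = c(.1, .2, .3))
round(cor(X, y), 2)
```

If `sum(r * b)` exceeds 1, no such vector exists, because the existing columns would have to explain more than all of its variance.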

Not all correlation patterns are possible, so you'll get a warning if the correlations you ask for are impossible.

```
dat$E <- rnorm_pre(dat, r = .9)
#> Warning in rnorm_pre(dat, r = 0.9): Correlations are impossible.
```
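
More generally, a set of correlations is possible only when the full correlation matrix is positive semi-definite, meaning none of its eigenvalues is negative. For `rnorm_pre()`, the full matrix is the existing columns plus the new one. Below is a quick base-R check using the hypothetical helper `is_possible_cor()`, which is not a faux function.

```r
# Hypothetical helper: could m be a correlation matrix?
is_possible_cor <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  isSymmetric(unname(m)) &&
    all(abs(diag(m) - 1) < tol) &&
    all(abs(m) <= 1 + tol) &&
    min(eigen(m, symmetric = TRUE, only.values = TRUE)$values) >= -tol
}

ok  <- matrix(c(1, .5, .5,   .5, 1, .25,   .5, .25, 1), 3)
bad <- matrix(c(1, .9, .9,   .9, 1, 0,     .9, 0, 1), 3)
is_possible_cor(ok)
is_possible_cor(bad)

# A new column against existing ones: append r as an extra row and column
Rx <- diag(2)                                # two uncorrelated existing columns
r  <- c(.9, .9)
is_possible_cor(rbind(cbind(Rx, r), c(r, 1)))
```

For example, two uncorrelated columns cannot both correlate 0.9 with a third column, but 0.5 each is fine.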