Brief note: I created this package to make my own analyses easier, so the statistical methods implemented so far are the ones I have needed myself. If you would like a particular statistical method included, please open an Issue and I will try to implement it!

Most analyses follow a pattern similar to how construction/engineering projects are developed: design -> add specifications -> construction -> (optional) add to the design and specs -> cleaning, scrubbing, and polishing. The `mason` package tries to emulate this process to make it easier to do analyses in a consistent and ‘tidy’ format.

The general command flow for using `mason` is:

- Start the design of a blueprint for the analysis by specifying which statistical technique to use (`design()`).
- Add settings/options to the blueprint for the statistical method (`add_settings()`).
- Add the variables you want to run the statistics on (`add_variables()`). These variables include the \(y\) variables (outcomes), the \(x\) variables (predictors), covariates, and interaction variables.
- Using the blueprint, construct the ‘mason project’ (stats analysis) so that the results are generated (`construct()`).
- Sometimes an analysis is too big for a single pass from blueprint to construction, and you need to add more to the blueprint. Use `add_variables()` or `add_settings()` after `construct()` to add to the existing results.
- When you are ready, clean up the ‘mason project’ by scrubbing it down and polishing it up (`scrub()` and `polish_*()` commands). The results are now ready for further presentation in a figure or table!

Let’s go over an example analysis. We’ll use `glm` for a simple linear regression on the built-in `swiss` dataset. A quick peek at it shows:

```
head(swiss)
#> Fertility Agriculture Examination Education Catholic
#> Courtelary 80.2 17.0 15 12 9.96
#> Delemont 83.1 45.1 6 9 84.84
#> Franches-Mnt 92.5 39.7 5 5 93.40
#> Moutier 85.8 36.5 12 7 33.77
#> Neuveville 76.9 43.5 17 15 5.16
#> Porrentruy 76.1 35.3 9 7 90.57
#> Infant.Mortality
#> Courtelary 22.2
#> Delemont 22.2
#> Franches-Mnt 20.2
#> Moutier 20.3
#> Neuveville 20.6
#> Porrentruy 26.6
```

Ok, let’s say we want to run several models. We are interested in `Fertility` and `Infant.Mortality` as outcomes and `Education` and `Agriculture` as potential predictors. We also want to control for `Catholic`. This setup means we have four potential models to analyze. With mason this is relatively easy. Analyses in mason are essentially separated into a blueprint phase and a construction phase. Since any structure or building always needs a blueprint, let’s get that started.

```
library(mason)
design(swiss, 'glm')
#> # Analysis for glm is still under construction.
#> # Showing data right now:
#> # A tibble: 47 x 6
#> Fertility Agriculture Examination Education Catholic Infant.Mortality
#> <dbl> <dbl> <int> <int> <dbl> <dbl>
#> 1 80.2 17 15 12 9.96 22.2
#> 2 83.1 45.1 6 9 84.8 22.2
#> 3 92.5 39.7 5 5 93.4 20.2
#> 4 85.8 36.5 12 7 33.8 20.3
#> 5 76.9 43.5 17 15 5.16 20.6
#> 6 76.1 35.3 9 7 90.6 26.6
#> # ... with 41 more rows
```

So far, all we’ve done is create a blueprint of the analysis, but it doesn’t contain much. Let’s add some settings to the blueprint. mason was designed to make use of the `%>%` pipes from the package `magrittr` (also found in `dplyr`), so let’s load up `magrittr`!
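The step described here might look like the following minimal sketch. It assumes the blueprint is stored in an object named `dp` (the name the later code relies on) and that `add_settings()` is called with its defaults, as in the full pipe chain near the end of this document. The settings originally used at this step are not shown, so treat the call as an assumption.

```r
library(mason)
library(magrittr)

# Start the blueprint; `dp` is the object name the later steps build on.
dp <- design(swiss, 'glm')

# Assumption: default settings, mirroring the full pipe chain later on.
dp <- dp %>%
    add_settings()

# Printing a blueprint that has not been constructed shows only the data.
dp
```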

You’ll notice that each time, the only thing printed to the console is the dataset. That’s because we haven’t constructed the analysis yet! We are still in the blueprint phase, so no results have been generated. Since we have two outcomes and two predictors, we have a total of four models to analyze. Normally we would need to run each of the models separately. However, if we simply list the outcomes and the predictors in mason, it will ‘loop’ through each combination and run all four models! Let’s add the variables.

```
dp <- dp %>%
    add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>%
    add_variables('xvars', c('Education', 'Agriculture'))
```

Alright, still nothing has happened. However, we are now at the phase where we can construct the analysis using `construct()`.

```
dp <- construct(dp)
dp
#> # Analysis for glm constructed but has not been scrubbed.
#> # Here is a peek at the results:
#> # A tibble: 8 x 10
#> Yterms Xterms term estimate std.error statistic p.value conf.low
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fertility Agricul… (Inte… 6.03e+1 4.25 14.2 3.22e-18 52.0
#> 2 Fertility Agricul… Xterm… 1.94e-1 0.0767 2.53 1.49e- 2 0.0438
#> 3 Fertility Educati… (Inte… 7.96e+1 2.10 37.8 9.30e-36 75.5
#> 4 Fertility Educati… Xterm… -8.62e-1 0.145 -5.95 3.66e- 7 -1.15
#> 5 Infant.M… Agricul… (Inte… 2.03e+1 1.06 19.2 2.46e-23 18.3
#> 6 Infant.M… Agricul… Xterm… -7.81e-3 0.0191 -0.409 6.84e- 1 -0.0452
#> # ... with 2 more rows, and 2 more variables: conf.high <dbl>,
#> # sample.size <int>
```
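For comparison, here is a rough sketch of what fitting the same four unadjusted models by hand could look like in base R. This is only an illustration of the ‘loop through each combination’ idea, not mason’s internal code. The names `combos`, `fit_one`, and `manual_results` are made up for this example, and the `term` column holds the actual predictor name rather than mason’s generic label.

```r
# Base R only; an illustration, not how mason is implemented.
yvars <- c("Fertility", "Infant.Mortality")
xvars <- c("Education", "Agriculture")

# Every outcome/predictor pairing: 2 x 2 = 4 models.
combos <- expand.grid(y = yvars, x = xvars, stringsAsFactors = FALSE)

# Hypothetical helper: fit one model and return its coefficient table.
fit_one <- function(y, x) {
    # glm() defaults to the gaussian family, i.e. ordinary linear regression.
    fit <- glm(reformulate(x, response = y), data = swiss)
    coefs <- summary(fit)$coefficients
    data.frame(
        Yterms = y,
        Xterms = x,
        term = rownames(coefs),
        estimate = coefs[, "Estimate"],
        std.error = coefs[, "Std. Error"],
        statistic = coefs[, "t value"],
        p.value = coefs[, "Pr(>|t|)"],
        row.names = NULL
    )
}

# Fit each combination and stack the coefficient tables.
manual_results <- do.call(rbind, Map(fit_one, combos$y, combos$x))
rownames(manual_results) <- NULL
manual_results
```

Each model contributes an intercept row and a predictor row, which is why the constructed results above have eight rows.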

Cool! These are the unadjusted models, without any covariates. We said we wanted to adjust for `Catholic`. But let’s say we want to keep the unadjusted analysis too. Since we haven’t ‘finished’ the analysis by cleaning it up yet, we can still add to the blueprint.

```
dp2 <- dp %>%
    add_variables('covariates', 'Catholic') %>%
    construct()
dp2
#> # Analysis for glm constructed but has not been scrubbed.
#> # Here is a peek at the results:
#> # A tibble: 20 x 10
#> Yterms Xterms term estimate std.error statistic p.value conf.low
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fertility Agricul… (Inte… 6.03e+1 4.25 14.2 3.22e-18 52.0
#> 2 Fertility Agricul… Xterm… 1.94e-1 0.0767 2.53 1.49e- 2 0.0438
#> 3 Fertility Educati… (Inte… 7.96e+1 2.10 37.8 9.30e-36 75.5
#> 4 Fertility Educati… Xterm… -8.62e-1 0.145 -5.95 3.66e- 7 -1.15
#> 5 Infant.M… Agricul… (Inte… 2.03e+1 1.06 19.2 2.46e-23 18.3
#> 6 Infant.M… Agricul… Xterm… -7.81e-3 0.0191 -0.409 6.84e- 1 -0.0452
#> # ... with 14 more rows, and 2 more variables: conf.high <dbl>,
#> # sample.size <int>
```
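The jump from 8 to 20 rows follows from the number of coefficients per model. As a quick base R check (a sketch using made-up helper names, independent of mason), each of the four unadjusted models has two terms and each of the four adjusted models has three:

```r
# Base R only; `count_terms` is a hypothetical helper for this check.
yvars <- c("Fertility", "Infant.Mortality")
xvars <- c("Education", "Agriculture")
combos <- expand.grid(y = yvars, x = xvars, stringsAsFactors = FALSE)

# Number of coefficients in a model of y on x plus optional covariates.
count_terms <- function(y, x, covars = NULL) {
    fit <- glm(reformulate(c(x, covars), response = y), data = swiss)
    length(coef(fit))
}

# 4 models x (intercept + predictor)
unadjusted_rows <- sum(mapply(count_terms, combos$y, combos$x))

# 4 models x (intercept + predictor + Catholic)
adjusted_rows <- sum(mapply(count_terms, combos$y, combos$x,
                            MoreArgs = list(covars = "Catholic")))

# 8 + 12 = 20
unadjusted_rows + adjusted_rows
```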

We now have two sets of models in the results: the unadjusted ones and the ones adjusted for `Catholic`. We’re happy with them, so let’s clean them up using the `scrub()` function.
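A minimal sketch of that step is below. The object name `dp_clean` is taken from the comparison code that follows. The blueprint is rebuilt here only so the block runs on its own, and `add_settings()` with defaults is an assumption carried over from the full pipe chain.

```r
library(mason)
library(magrittr)

# Rebuild the constructed blueprint from the earlier steps.
dp2 <- design(swiss, 'glm') %>%
    add_settings() %>%
    add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>%
    add_variables('xvars', c('Education', 'Agriculture')) %>%
    construct() %>%
    add_variables('covariates', 'Catholic') %>%
    construct()

# Scrub the project so that the results become the main dataset.
dp_clean <- scrub(dp2)
dp_clean
```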

All `scrub()` does is remove any extra specs in the attributes and set the results as the main dataset. You can see this by looking at its details and comparing them to the unscrubbed version.

```
colnames(dp2)
#> [1] "Fertility" "Agriculture" "Examination"
#> [4] "Education" "Catholic" "Infant.Mortality"
colnames(dp_clean)
#> [1] "Yterms" "Xterms" "term" "estimate" "std.error"
#> [6] "statistic" "p.value" "conf.low" "conf.high" "sample.size"
names(attributes(dp2))
#> [1] "names" "class" "row.names" "specs"
names(attributes(dp_clean))
#> [1] "names" "class" "row.names"
class(dp2)
#> [1] "bp" "glm_bp" "data.frame"
class(dp_clean)
#> [1] "tbl_df" "tbl" "data.frame"
```

And all as a single pipe chain:

```
swiss %>%
    design('glm') %>%
    add_settings() %>%
    add_variables('yvars', c('Fertility', 'Infant.Mortality')) %>%
    add_variables('xvars', c('Education', 'Agriculture')) %>%
    construct() %>%
    add_variables('covariates', 'Catholic') %>%
    construct() %>%
    scrub()
#> # A tibble: 20 x 10
#> Yterms Xterms term estimate std.error statistic p.value conf.low
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fertility Agricu… (Inte… 6.03e+1 4.25 14.2 3.22e-18 5.20e+1
#> 2 Fertility Agricu… <-Xte… 1.94e-1 0.0767 2.53 1.49e- 2 4.38e-2
#> 3 Fertility Educat… (Inte… 7.96e+1 2.10 37.8 9.30e-36 7.55e+1
#> 4 Fertility Educat… <-Xte… -8.62e-1 0.145 -5.95 3.66e- 7 -1.15e+0
#> 5 Infant.M… Agricu… (Inte… 2.03e+1 1.06 19.2 2.46e-23 1.83e+1
#> 6 Infant.M… Agricu… <-Xte… -7.81e-3 0.0191 -0.409 6.84e- 1 -4.52e-2
#> 7 Infant.M… Educat… (Inte… 2.03e+1 0.653 31.1 4.85e-32 1.90e+1
#> 8 Infant.M… Educat… <-Xte… -3.01e-2 0.0449 -0.670 5.07e- 1 -1.18e-1
#> 9 Fertility Agricu… (Inte… 5.99e+1 3.99 15.0 6.35e-19 5.20e+1
#> 10 Fertility Agricu… <-Xte… 1.10e-1 0.0785 1.40 1.70e- 1 -4.43e-2
#> 11 Fertility Agricu… Catho… 1.15e-1 0.0427 2.69 1.01e- 2 3.12e-2
#> 12 Fertility Educat… (Inte… 7.42e+1 2.35 31.6 7.35e-32 6.96e+1
#> 13 Fertility Educat… <-Xte… -7.88e-1 0.129 -6.10 2.43e- 7 -1.04e+0
#> 14 Fertility Educat… Catho… 1.11e-1 0.0298 3.72 5.60e- 4 5.25e-2
#> 15 Infant.M… Agricu… (Inte… 2.03e+1 1.04 19.4 3.37e-23 1.82e+1
#> 16 Infant.M… Agricu… <-Xte… -2.01e-2 0.0206 -0.976 3.35e- 1 -6.04e-2
#> 17 Infant.M… Agricu… Catho… 1.66e-2 0.0112 1.49 1.44e- 1 -5.30e-3
#> 18 Infant.M… Educat… (Inte… 1.97e+1 0.825 23.9 7.93e-27 1.81e+1
#> 19 Infant.M… Educat… <-Xte… -2.24e-2 0.0454 -0.495 6.23e- 1 -1.11e-1
#> 20 Infant.M… Educat… Catho… 1.15e-2 0.0105 1.10 2.79e- 1 -9.04e-3
#> # ... with 2 more variables: conf.high <dbl>, sample.size <int>
```

There are also additional `polish_*` commands, which are more or less simple wrappers around operations you might otherwise run on the results dataset yourself, like filtering or renaming. The list of polish commands can be found in `?mason::polish`.