We will start by loading the
codalm R package, in addition to the
future.apply packages, which we will use for bootstrapping.
library(codalm) require('future') #> Loading required package: future require('future.apply') #> Loading required package: future.apply
We will be analyzing how the percentages of mothers and fathers with low, medium, and high education levels relate to each other, across 31 countries in Europe. The format that we need is for both compositions to be in matrices, with one row per observation. We will also normalize the rows of these matrices to ensure that they sum to 1, although the
codalm function would also take care of this for us.
data("educFM") <- as.matrix(educFM[,2:4]) father <- father / rowSums(father) father <- as.matrix(educFM[,5:7] ) mother <- mother/rowSums(mother)mother
To estimate the coefficient matrix B, we can use the
<- codalm(y = father, x = mother) B_est B_est#> [,1] [,2] [,3] #> [1,] 9.113082e-01 0.05115443 0.03753734 #> [2,] 1.427924e-08 0.90541094 0.09458905 #> [3,] 1.860624e-07 0.14154036 0.85845945
To see the interpretation of this matrix, please see Fiksel et al. (2021). If all the rows of B_est are exactly the same, it is recommended to set
accelerate = FALSE as a sensitivity check.
We can also use the bootstrap to estimate 95% confidence intervals. We will only use 50 bootstrap iterations as an example (nboot = 50), but is recommended to do more.
<- codalm_ci(y = father, x = mother, nboot = 50, B_ci conf = .95) $ci_L B_ci#> [,1] [,2] [,3] #> [1,] 8.398315e-01 1.689452e-02 0.01890371 #> [2,] 5.512479e-23 8.310054e-01 0.04690478 #> [3,] 1.533253e-13 1.876738e-09 0.64493422 $ci_U B_ci#> [,1] [,2] [,3] #> [1,] 0.95443317 0.1174445 0.0696308 #> [2,] 0.04001022 0.9522237 0.1655356 #> [3,] 0.16044680 0.3550658 1.0000000
These matrices given the lower and upper bounds for the confidence interval for each element of the coefficient matrix.
You can also take advantage of parallelization, if you have multiple cores available.
<- 2 ncores <- codalm_ci(y = father, x = mother, nboot = 50, B_ci_parallel conf = .95, parallel = TRUE, ncpus = ncores, strategy = 'multisession') identical(B_ci$ci_L, B_ci_parallel$ci_L) #>  TRUE identical(B_ci$ci_U, B_ci_parallel$ci_U) #>  TRUE
Finally, we will do a permutation test for linear independence. Again, we will only do 50 permutations as an example, but in practice this number should be higher. For demonstration purposes, we will generate the compositional outcome independently of the compositional predictor
set.seed(123) <- gtools::rdirichlet(100, rep(1, 3)) x <- gtools::rdirichlet(100, rep(1, 3)) y <- codalm_indep_test(y = y, x = x, indep_test_pval nperms = 100, init.seed = 123) indep_test_pval#>  0.2871287
This function can also be parallelized. Unlike the bootstrapping, there is no need to differentiate between whether the user is using a Windows or Unix system.
<- codalm_indep_test(y = y, x = x, indep_test_pval_parallel nperms = 100, init.seed = 123, parallel = TRUE, ncpus = ncores, strategy = 'multisession') indep_test_pval_parallel#>  0.2871287