The package name zGPS.AO
stands for Zero Inflated Gamma Poisson Shrinker with AE ontology, and in this vignettes, we will introduce how to use this package’s main function zinb_analysis_tool
, by applying it to a real dataset, the VAERS dataset.
The VAERS dataset is built in our package, and here’s what the data looks like:
data(vaers_data)
dim(vaers_data)
#> [1] 1017599 3
vaers_data[1:10,]
#> # A tibble: 10 x 3
#> ID VAX_TYPE AE_NAME
#> <dbl> <chr> <chr>
#> 1 223972 DTAP Swelling
#> 2 223972 DTAP Feeling hot
#> 3 223972 DTAP Erythema
#> 4 223972 DTAP Tenderness
#> 5 231988 VARCEL Pruritus
#> 6 231988 VARCEL Urticaria
#> 7 231988 MMR Pruritus
#> 8 231988 MMR Urticaria
#> 9 231988 HEPA Pruritus
#> 10 231988 HEPA Urticaria
This data contains information about report ids, and all the vaccine-AE combinations in each reports. For example, in this dataset, report 223972 contains one vaccine DTAP, and four AEs.
We are interested in identifying Relative Risks (RR) between vaccines and AEs. However, we are not only interested in the RR between vaccines and individual AEs, also we investigate the RR of vaccines and AE groups. The number of reports for a single vaccine and a single AE is too small and may have too much randomness. However, if we combine similar AEs into groups, and invest the vaccine-AE group associations, we could get more reliable results.
Here’s an example of group structure data from Medical Dictionary for Regulatory Activities (MedDRA) dataset.
data("dd.meddra")
dim(dd.meddra)
#> [1] 12482 2
dd.meddra[1:10,]
#> AE_NAME GROUP_NAME
#> 1 Anaemia Anaemias nonhaemolytic and marrow depression
#> 2 Anaemia folate deficiency Anaemias nonhaemolytic and marrow depression
#> 3 Anaemia macrocytic Anaemias nonhaemolytic and marrow depression
#> 4 Anaemia megaloblastic Anaemias nonhaemolytic and marrow depression
#> 5 Anaemia neonatal Anaemias nonhaemolytic and marrow depression
#> 6 Anaemia of chronic disease Anaemias nonhaemolytic and marrow depression
#> 7 Anaemia vitamin B12 deficiency Anaemias nonhaemolytic and marrow depression
#> 8 Aplasia pure red cell Anaemias nonhaemolytic and marrow depression
#> 9 Aplastic anaemia Anaemias nonhaemolytic and marrow depression
#> 10 Babesiosis Anaemias nonhaemolytic and marrow depression
The group data used in this package is a data.frame containing two columns, AE_NAME
and GROUP_NAME
, specifying which AEs belong to which AE groups. It doesn’t hurt if there are some AEs belongs to more than one groups, but it is recommended to have non-overlapping groups. Those AE names in the report data should appear in the group data.
Reports of all vaccines will be used to do the analysis, but our model requires users to specify vaccines of their interest. Here is an example of how to specify them by an special list merge_list
(will be used in the main function later):
data("merge_list")
merge_list[1:3]
#> [[1]]
#> [[1]][[1]]
#> [1] "FLUN3" "FLUN4"
#>
#> [[1]][[2]]
#> [1] "FLUN"
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "FLU3" "FLU4"
#>
#> [[2]][[2]]
#> [1] "FLU"
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] "MMR"
#>
#> [[3]][[2]]
#> [1] "MMR"
The merge_list
is a list containing sublists. For example, The first sublist here shows we are interested in FLUN3, FLUN4, but we would like to merge those vaccines to one type FLUN. The third sublist shows we are interested in MMR, and we want to keep it as one type.
Our main function zinb_analysis_tool
is an all-in-one function that calculates both the group and AE level RRs as well as evaluating the p values. The first three parameters of this function are discussed above, and the min_freq
and min_size
determines the criteria for scarce AEs and small AE groups. Those AEs and AE groups will be excluded in the study. However, the data of those AEs and groups will not the deleted and still be used to calculate the baseline frequency table and permutation tests.
In evaluating the p values, we usually generate a large number of permuted datasets, like 2000. However, for the purples of this vignette, we only permute the dataset for 2 times by specifying n_perm = 2
in the function. When permuting the dataset for a large number of times, one may consider the built-in parallelization in our function. For example, here we use n_copies = 2
to imply that we want to break the job into 2 pieces and do them in parallel.
The output of zinb_analysis_tool
contains group level RR s
, group level parameters r
, ‘beta’, and p
, AE level RR lambda_hat
and p values. ## Visualization
We have designed four functions to visualize the result on both AE and group levels.
The first plot one may want to look at is the group level RR heatmap. It can be generated by either plot_heatmap
, or the more interactive version plotly_heatmap
.
The usage of plotly_heatmap
is the same as plot_heatmap
, users may execute the following code on their own computer:
From the heam map, one may notice that the most significant group level RR is between FLUN and Hepatobiliary investigations. One may use the plot_lambdas
to explore the AE level RR within this AE group specificly:
Also, there is a more advanced interactive version of plot_lambdas
called plotly_lambdas