The package name
zGPS.AO stands for Zero Inflated Gamma Poisson Shrinker with AE ontology, and in this vignettes, we will introduce how to use this package’s main function
zinb_analysis_tool, by applying it to a real dataset, the VAERS dataset.
The VAERS dataset is built in our package, and here’s what the data looks like:
data(vaers_data) dim(vaers_data) #>  1017599 3 vaers_data[1:10,] #> # A tibble: 10 x 3 #> ID VAX_TYPE AE_NAME #> <dbl> <chr> <chr> #> 1 223972 DTAP Swelling #> 2 223972 DTAP Feeling hot #> 3 223972 DTAP Erythema #> 4 223972 DTAP Tenderness #> 5 231988 VARCEL Pruritus #> 6 231988 VARCEL Urticaria #> 7 231988 MMR Pruritus #> 8 231988 MMR Urticaria #> 9 231988 HEPA Pruritus #> 10 231988 HEPA Urticaria
This data contains information about report ids, and all the vaccine-AE combinations in each reports. For example, in this dataset, report 223972 contains one vaccine DTAP, and four AEs.
We are interested in identifying Relative Risks (RR) between vaccines and AEs. However, we are not only interested in the RR between vaccines and individual AEs, also we investigate the RR of vaccines and AE groups. The number of reports for a single vaccine and a single AE is too small and may have too much randomness. However, if we combine similar AEs into groups, and invest the vaccine-AE group associations, we could get more reliable results.
Here’s an example of group structure data from Medical Dictionary for Regulatory Activities (MedDRA) dataset.
data("dd.meddra") dim(dd.meddra) #>  12482 2 dd.meddra[1:10,] #> AE_NAME GROUP_NAME #> 1 Anaemia Anaemias nonhaemolytic and marrow depression #> 2 Anaemia folate deficiency Anaemias nonhaemolytic and marrow depression #> 3 Anaemia macrocytic Anaemias nonhaemolytic and marrow depression #> 4 Anaemia megaloblastic Anaemias nonhaemolytic and marrow depression #> 5 Anaemia neonatal Anaemias nonhaemolytic and marrow depression #> 6 Anaemia of chronic disease Anaemias nonhaemolytic and marrow depression #> 7 Anaemia vitamin B12 deficiency Anaemias nonhaemolytic and marrow depression #> 8 Aplasia pure red cell Anaemias nonhaemolytic and marrow depression #> 9 Aplastic anaemia Anaemias nonhaemolytic and marrow depression #> 10 Babesiosis Anaemias nonhaemolytic and marrow depression
The group data used in this package is a data.frame containing two columns,
GROUP_NAME, specifying which AEs belong to which AE groups. It doesn’t hurt if there are some AEs belongs to more than one groups, but it is recommended to have non-overlapping groups. Those AE names in the report data should appear in the group data.
Reports of all vaccines will be used to do the analysis, but our model requires users to specify vaccines of their interest. Here is an example of how to specify them by an special list
merge_list (will be used in the main function later):
data("merge_list") merge_list[1:3] #> [] #> [][] #>  "FLUN3" "FLUN4" #> #> [][] #>  "FLUN" #> #> #> [] #> [][] #>  "FLU3" "FLU4" #> #> [][] #>  "FLU" #> #> #> [] #> [][] #>  "MMR" #> #> [][] #>  "MMR"
merge_list is a list containing sublists. For example, The first sublist here shows we are interested in FLUN3, FLUN4, but we would like to merge those vaccines to one type FLUN. The third sublist shows we are interested in MMR, and we want to keep it as one type.
Our main function
zinb_analysis_tool is an all-in-one function that calculates both the group and AE level RRs as well as evaluating the p values. The first three parameters of this function are discussed above, and the
min_size determines the criteria for scarce AEs and small AE groups. Those AEs and AE groups will be excluded in the study. However, the data of those AEs and groups will not the deleted and still be used to calculate the baseline frequency table and permutation tests.
In evaluating the p values, we usually generate a large number of permuted datasets, like 2000. However, for the purples of this vignette, we only permute the dataset for 2 times by specifying
n_perm = 2 in the function. When permuting the dataset for a large number of times, one may consider the built-in parallelization in our function. For example, here we use
n_copies = 2 to imply that we want to break the job into 2 pieces and do them in parallel.
The output of
zinb_analysis_tool contains group level RR
s, group level parameters
r, ‘beta’, and
p, AE level RR
lambda_hat and p values. ## Visualization
We have designed four functions to visualize the result on both AE and group levels.
The first plot one may want to look at is the group level RR heatmap. It can be generated by either
plot_heatmap, or the more interactive version
The usage of
plotly_heatmap is the same as
plot_heatmap, users may execute the following code on their own computer:
From the heam map, one may notice that the most significant group level RR is between FLUN and Hepatobiliary investigations. One may use the
plot_lambdas to explore the AE level RR within this AE group specificly:
Also, there is a more advanced interactive version of