This short tutorial presents some of the possible genetic architectures that one could simulate, but it certainly does not explore all the possibilities. For more information on specific input parameters, please check the help documentation (?create_phenotypes).

Note that the example dataset is already numericalized. HapMap format may also be used with options geno_obj, genotypes_file or geno_path.

library(simplePHENOTYPES)
data("SNP55K_maize282_maf04")
SNP55K_maize282_maf04[1:8, 1:10]

# 2 Single Trait

Simulating ten replicates of a single trait with a heritability of 0.7. In this setting, the simulated trait is controlled by one large-effect QTN (additive effect of 0.9) and two small effect QTNs with additive effects following a geometric series starting with 0.2; thus the effect size of the first of these two QTNs is 0.2, and the effect size of the second is 0.2^2. Results are being saved at a temporary directory (home_dir = tempdir()). Please see help files (?create_phenotypes) to see which default values are being used.

create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
rep = 10,
h2 = 0.7,
model = "A",
home_dir = tempdir())

# 3 Multiple Traits: Pleiotropic Architecture

Simulating three traits ntraits = 3 controlled by the same three additive QTN and four dominance QTN. The effect size of the largest-effect additive QTN is 0.3 for all traits, while the additive effect sizes are 0.04, 0.2 and 0.1 for each trait, respectively. Heritability for trait_1 is 0.2, while the heritability of the two correlated traits is 0.4. Each replicate is being recorded in a different file (output_format = "multi-file") in a folder named “Results_Pleiotropic”. The correlation between traits is not specified by the user; instead, the observed correlation is an artifact of different allelic effects for each trait. The same QTNs are used to generate phenotypes in all 10 replications (vary_QTN = FALSE); alternatively, one could select different QTNs in each replicate using vary_QTN = TRUE (default). The vector add_effect contains one allelic effect for each trait and a geometric series (default) will be used to generate allelic effects for each one of the three additive QTNs (add_QTN_num = 3) and four dominance QTNs (dom_QTN_num = 4).

 test1 <-  create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
dom_QTN_num = 4,
h2 = c(0.2, 0.4, 0.4),
dom_effect = c(0.04,0.2,0.1),
ntraits = 3,
rep = 10,
vary_QTN = FALSE,
output_format = "multi-file",
architecture = "pleiotropic",
output_dir = "Results_Pleiotropic",
to_r = TRUE,
seed = 10,
sim_method = "geometric",
home_dir = tempdir()
)

Optionally, users may input a list of allelic effects (sim_method = "custom"). In the example below, a geometric series (custom_geometric) is being assigned and should generate the same simulated data as the previous example (all.equal(test1, test2)). Notice that since big_add_QTN_effect is provided, the user only needs to provide effects for two out of the three additive QTNs being simulated. On the other hand, all four dominance QTN must have an effect assigned on the custom_geometric_d list.

 custom_geometric_a <- list(trait_1 = c(0.04, 0.0016),
trait_2 = c(0.2, 0.04),
trait_3 = c(0.1, 0.01))
custom_geometric_d <- list(trait_1 = c(0.04, 0.0016, 6.4e-05, 2.56e-06),
trait_2 = c(0.2, 0.04, 0.008, 0.0016),
trait_3 = c(0.1, 0.01, 0.001, 1e-04))

test2 <-  create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
dom_QTN_num = 4,
h2 = c(0.2,0.4, 0.4),
dom_effect = custom_geometric_d,
ntraits = 3,
rep = 10,
vary_QTN = FALSE,
output_format = "multi-file",
architecture = "pleiotropic",
output_dir = "Results_Pleiotropic",
to_r = T,
sim_method = "custom",
seed = 10,
home_dir = tempdir()
)

all.equal(test1, test2)

# 4 Multiple Traits: Partially Pleiotropic Architecture

Simulating 20 replicates of three traits, which are respectively controlled by seven, 13 and four QTNs. The first of the three pleiotropic QTNs (pleio_a = 3) will have a large additive QTN effect of 0.9. All other QTNs will have additive effects that follow a geometric series, where the effect size of the i^th QTN has an effect size of 0.3^i. Correlation among traits is assigned to be equal to the cor_matrix object. All 20 replicates of these three simulated traits will be saved in one file, specifically in a long format and with an additional column named “Rep”. Results will be saved in a directory called “Results_Partially”. In this example, the genotype file will be saved in numeric format. Additionally, all phenotypes will be assigned to an object called “sim_results” (to_r = TRUE) and loaded into R.

cor_matrix <- matrix(c(   1, 0.3, -0.9,
0.3,   1,  -0.5,
-0.9, -0.5,    1 ), 3)

sim_results <- create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
ntraits = 3,
pleio_a = 3,
pleio_e = 2,
degree_of_dom = 0.5,
trait_spec_a_QTN_num = c(4, 10, 1),
trait_spec_e_QTN_num = c(3, 2, 5),
h2 = c(0.2, 0.4, 0.8),
epi_effect = c(0.3, 0.3, 0.3),
cor = cor_matrix,
rep = 20,
output_dir = "Results_Partially",
output_format = "long",
architecture = "partially",
out_geno = "numeric",
to_r = TRUE,
model = "AED",
home_dir = tempdir()
)

# 5 Multiple Traits: Linkage Disequilibrium Architecture

Simulating five replicates of two linked traits controlled by three additive QTNs each. For each QTN, a marker is first selected, and then two separate markers that have an r^2 of at most 0.8 (ld=0.8) with the selected marker will be the respective QTNs of the two traits. The three QTNs will have additive effects that follow a geometric series, where the effect size of the i^th QTN is 0.02^i for one trait and 0.05^i for the other trait. Starting seed number is 200 and output phenotypes are saved in one file, but in a “wide” format with each replicate of two traits being added as additional columns. Plink fam, bim and bed files are also saved at Results_LD.

create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
h2 = c(0.2, 0.4),
rep = 5,
seed = 200,
output_format = "wide",
architecture = "LD",
output_dir = "Results_LD",
ld=0.8,
model = "A",
home_dir = tempdir()
)

# 6 Multiple Traits: Partially Pleiotropic Architecture with Epistatic effects

Simulating 20 replicates of three traits. In each replicate, different SNPs are selected to be the QTNs for each trait. These traits are controlled by three pleiotropic additive QTNs (pleio = 3); two pleiotropic epistatic QTNs (pleio_e = 2); four, ten and one additive trait-specific QTN (specific_QTN_number = c(4,10, 1)); and two, one and five epistatic trait-specific QTNs (trait_specific_e_QTN_number = c(2,1, 5)). Results will be saved as “.fam” files used as gemma input; thus each replicate trait that is simulated will have its own unique output file.

create_phenotypes(
geno_obj = SNP55K_maize282_maf04,
pleio_a = 3,
pleio_e = 2,
trait_spec_a_QTN_num = c(4,10, 1),
trait_spec_e_QTN_num = c(2,1, 5),
epi_effect = c(0.01, 0.4, 0.2),
h2 = c(0.2,0.4, 0.8),
ntraits = 3,
rep = 20,
vary_QTN = TRUE,
output_dir = "Results_Partially_E",
output_format = "gemma",
architecture = "partially",
model = "AE",
home_dir = tempdir()
)

# 7 Using your own data

If files are saved by chromosome, they can be read directly into create_phenotypes using options geno_path (recommendation: consider having all marker data files in a separate folder). If multiple files are saved in the same folder as the marker data, the parameter prefix might be used to select only marker data. For example, if your data is saved as “WGS_chrm_1.hmp.txt”, …, “WGS_chrm_10.hmp.txt”, one would use prefix = "WGS_chrm_" .

create_phenotypes(
geno_path = "PATH/TO/FILE",
prefix = "WGS_chrm_",
h2 = 0.2,
rep = 5,
seed = 200,
output_format = "gemma",
output_dir = "Results",
)