Version Note: Up-to-date with v0.3.0
library(psycModel)
TLDR:
1) It is a beginner-friendly R package for statistical analysis in social science.
2) Tired of manually writing out all the variables in a model? You can use dplyr::select() syntax for all models.
3) Fitting models, plotting, checking goodness of fit, and model assumption violations all in one place.
4) Beautiful and easy-to-read output. Check out this example now.
Supported models:
1. Linear regression (including ANOVA and ANCOVA) and generalized linear regression.
2. Linear mixed-effects models (i.e., HLM) and generalized linear mixed-effects models.
3. Confirmatory and exploratory factor analysis.
4. Simple mediation analysis.
5. Reliability analysis.
6. Correlation, descriptive statistics (e.g., mean, SD).
At its core, this package lets people analyze their data with one simple function call. For example, when you run a linear regression, you need to fit the model, check the goodness of fit (e.g., R²), check the model assumptions, and plot the interaction (if one is included). Without this package, you need several packages for these steps, and if you are an R beginner, you probably don’t know where to find them all. This package has done that work for you, so you can do everything with one simple function call.
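To make the contrast concrete, here is a base-R sketch of the manual workflow described above (the model formula and plotting calls are illustrative, not psycModel code):

```r
# Manual multi-step workflow that a single integrated call replaces:
fit <- lm(Sepal.Length ~ Sepal.Width * Petal.Width, data = iris)  # 1. fit

summary(fit)$r.squared   # 2. goodness of fit (R^2)

par(mfrow = c(2, 2))
plot(fit)                # 3. residual diagnostics for assumption checks
# 4. plotting the Sepal.Width x Petal.Width interaction needs yet
#    another package (e.g., interactions or sjPlot)
```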
Another good example is CFA. The most common (and probably the only) option for fitting a CFA in R is lavaan, which has its own unique syntax. It is very versatile and powerful, but you do need to spend some time learning it, and that may not be worth it for people who just want to run a quick and simple CFA model. In this package, it’s as intuitive as cfa_summary(data, x1:x3), and you get the model summary, the fit measures, and a nice-looking path diagram. The same logic applies to HLM, since lme4 / nlme also has its own syntax that you need to learn.
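For comparison, here is what an equivalent three-factor model looks like in lavaan's own syntax (a sketch; the fitting calls are commented out and require the lavaan package to be installed):

```r
# lavaan's model syntax: each latent factor is declared with the =~ operator
model <- "
  DV1 =~ x1 + x2 + x3
  DV2 =~ x4 + x5 + x6
  DV3 =~ x7 + x8 + x9
"
# fit <- lavaan::cfa(model, data = lavaan::HolzingerSwineford1939)
# lavaan::fitMeasures(fit, c("cfi", "rmsea", "srmr", "tli"))
```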
Moreover, I made fitting the model even simpler by supporting the dplyr::select syntax. Traditionally, if you want to fit a linear regression model, the syntax looks like this: lm(y ~ x1 + x2 + x3 + x4 + ... + xn, data). Now the syntax is much shorter and more intuitive: lm_model(y, x1:xn, data). You can even replace x1:xn with everything(). I also wrote a very short article that teaches people how to use the dplyr::select() syntax (it is not comprehensive, and it is not intended to be).
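A quick illustration of the select-style helpers (shown here with dplyr::select(); the same helpers can be passed to this package's model functions):

```r
library(dplyr)

names(select(iris, Sepal.Length:Petal.Width))  # a range of adjacent columns
ncol(select(iris, everything()))               # every column (returns 5)
names(select(iris, starts_with("Sepal")))      # columns matching a prefix
```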
Finally, I made the output in R much more beautiful and easy to read. The default output from R, to be frank, looks ugly. I spent a lot of time making sure the output of this package looks good (see below for examples). I am sure you will see how big an improvement it is.
integrated_model_summary is the integrated function for linear regression and generalized linear regression. It first fits the model using lm_model or glm_model, then passes the fitted model object to model_summary, which produces model estimates and assumption checks. If interaction terms are included, they are passed to the relevant interaction_plot function for plotting (the package currently does not support interaction plots for generalized linear regression).
Additionally, you can request assumption_plot and simple_slope (both default to FALSE). Requesting assumption_plot produces a panel of graphs that lets you visually inspect the model assumptions (in addition to testing them statistically). simple_slope is another powerful way to probe an interaction further: it shows the slope estimate at the mean and at +1/-1 SD of the moderator. For example, suppose you hypothesized that socioeconomic status (SES) moderates the effect of teacher experience on education quality; simple_slope then shows the slope estimate of teacher experience on education quality at the mean and at +1/-1 SD of SES. It also produces a Johnson-Neyman plot that shows at what level of the moderator the slope estimate is predicted to be non-significant.
integrated_model_summary(
data = iris,
response_variable = Sepal.Length,
predictor_variable = tidyselect::everything(),
two_way_interaction_factor = c(Sepal.Width, Petal.Width),
model_summary = TRUE,
interaction_plot = TRUE,
assumption_plot = TRUE,
simple_slope = TRUE,
plot_color = TRUE
)
Model Summary
Model Type = Linear regression
Outcome = Sepal.Length
Predictors = Sepal.Width, Petal.Length, Petal.Width, Species
Model Estimates
───────────────────────────────────────────────────────────────────────────────────────
Parameter Coefficient t df SE p 95% CI
───────────────────────────────────────────────────────────────────────────────────────
(Intercept) 1.609 3.858 144 0.417 0.000 *** [ 0.785, 2.433]
Sepal.Width 0.772 6.475 144 0.119 0.000 *** [ 0.536, 1.007]
Petal.Length 0.749 12.754 144 0.059 0.000 *** [ 0.633, 0.865]
Petal.Width 0.112 0.297 144 0.378 0.767 [-0.635, 0.860]
Species -0.264 -2.222 144 0.119 0.028 * [-0.499, -0.029]
Sepal.Width:Petal.Width -0.154 -1.485 144 0.103 0.140 [-0.358, 0.051]
───────────────────────────────────────────────────────────────────────────────────────
Goodness of Fit
───────────────────────────────────────────────────
AIC BIC R² R²_adjusted RMSE σ
───────────────────────────────────────────────────
82.514 103.588 0.864 0.860 0.304 0.310
───────────────────────────────────────────────────
Model Assumption Check
OK: Residuals appear to be independent and not autocorrelated (p = 0.998).
OK: residuals appear as normally distributed (p = 0.892).
Unable to check autocorrelation. Try changing na.action to na.omit.
Warning: Heteroscedasticity (non-constant error variance) detected (p = 0.047).
Warning: Severe multicolinearity detected (VIF > 10). Please inspect the following table to identify high correlation factors.
Multicollinearity Table
─────────────────────────────────────────────
Term VIF SE_factor
─────────────────────────────────────────────
Sepal.Width 4.174 2.043
Petal.Length 16.625 4.077
Petal.Width 128.506 11.336
Species 14.673 3.831
Sepal.Width:Petal.Width 90.119 9.493
─────────────────────────────────────────────
Slope Estimates at Each Level of Moderators
────────────────────────────────────────────────────────────────────
Petal.Width Level Est. S.E. t val. p 95% CI
────────────────────────────────────────────────────────────────────
Low 0.704 0.086 8.226 0.000 *** [0.535, 0.874]
Mean 0.587 0.072 8.182 0.000 *** [0.445, 0.729]
High 0.470 0.124 3.786 0.000 *** [0.225, 0.715]
────────────────────────────────────────────────────────────────────
Note: For continuous variable, low and high represent -1 and +1 SD from the mean, respectively.
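The arithmetic behind those rows is straightforward (a base-R sketch using the rounded coefficients from the Model Estimates table above, not psycModel internals): for y = b0 + b1·X + b2·W + b3·X·W, the slope of X at moderator value W is b1 + b3·W.

```r
b1 <- 0.772   # Sepal.Width coefficient (from the Model Estimates table)
b3 <- -0.154  # Sepal.Width:Petal.Width interaction coefficient

w <- iris$Petal.Width                    # the moderator
slope_at <- function(level) b1 + b3 * level
round(slope_at(c(mean(w) - sd(w), mean(w), mean(w) + sd(w))), 3)
# close to the table's 0.704 / 0.587 / 0.470 (small rounding differences)
```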
This is the multilevel variation of integrated_model_summary. It works exactly the same way, except that you specify non_random_effect_factors (i.e., level-2 factors) and random_effect_factors (i.e., level-1 factors) instead of predictor_variable.
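For readers who know lme4, the arguments map onto a classic mixed-model formula roughly as follows (illustrative only; the long-hand fit is commented out and requires lme4):

```r
# random_effect_factors (level-1) get random slopes, non_random_effect_factors
# (level-2) enter as fixed effects only, and id supplies the grouping factor:
f <- popular ~ extrav * sex * texp + (1 + extrav | class)
# lme4::lmer(f, data = popular)
```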
integrated_multilevel_model_summary(
data = popular,
response_variable = popular,
random_effect_factors = extrav,
non_random_effect_factors = c(sex, texp),
three_way_interaction_factor = c(extrav, sex, texp),
graph_label_name = c("popular", "extraversion", "sex", "teacher experience"), # change interaction plot label
id = class,
model_summary = TRUE,
interaction_plot = TRUE,
assumption_plot = TRUE,
simple_slope = TRUE,
plot_color = TRUE
)
[1] "psycModel is based on the package parameters. The parameters pacakge v0.14.0 has some problem. We are waiting for them to fix. This package will update again once parameters is updated. Should not be a long time"
Model Summary
Model Type = Linear Mixed Effect Model (fitted using lme4 or lmerTest)
Outcome = popular
Predictors = extrav, sex, texp, extrav:sex, extrav:texp, sex:texp, extrav:sex:texp
Model Estimates
───────────────────────────────────────────────────────
Warning
───────────────────────────────────────────────────────
Warning: Waiting for the parameters pacakge to update
───────────────────────────────────────────────────────
Goodness of Fit
──────────────────────────────────────────────────────────────────────
AIC BIC R²_conditional R²_marginal ICC RMSE σ
──────────────────────────────────────────────────────────────────────
4823.684 4890.894 0.709 0.554 0.349 0.721 0.743
──────────────────────────────────────────────────────────────────────
Model Assumption Check
OK: Model is converged
OK: No singularity is detected
Warning: Autocorrelated residuals detected (p < .001).
OK: residuals appear as normally distributed (p = 0.425).
Unable to check autocorrelation. Try changing na.action to na.omit.
OK: Error variance appears to be homoscedastic (p = 0.758).
Warning: Severe multicolinearity detected (VIF > 10). Please inspect the following table to identify high correlation factors.
Multicollinearity Table
─────────────────────────────────────
Term VIF SE_factor
─────────────────────────────────────
extrav 9.005 3.001
sex 110.249 10.500
texp 6.109 2.472
extrav:sex 95.403 9.767
extrav:texp 13.869 3.724
sex:texp 109.012 10.441
extrav:sex:texp 95.547 9.775
─────────────────────────────────────
Slope Estimates at Each Level of Moderators
────────────────────────────────────────────────────────────────────────
texp Level sex Level Est. S.E. t val. p 95% CI
────────────────────────────────────────────────────────────────────────
Low Low 0.578 0.031 18.659 0.000 *** [0.517, 0.638]
High 0.649 0.032 20.573 0.000 *** [0.587, 0.711]
Mean Low 0.429 0.024 17.604 0.000 *** [0.381, 0.476]
High 0.473 0.023 20.425 0.000 *** [0.428, 0.519]
High Low 0.280 0.036 7.769 0.000 *** [0.209, 0.350]
High 0.298 0.031 9.527 0.000 *** [0.236, 0.359]
────────────────────────────────────────────────────────────────────────
Note: For continuous variable, low and high represent -1 and +1 SD from the mean, respectively.
This can be used to compare models. Every type of model comparison supported by performance::compare_performance() is supported, since this is just a wrapper for that function.
fit1 <- lm_model(
  data = popular,
  response_variable = popular,
  predictor_var = c(sex, extrav),
  quite = TRUE
)
fit2 <- lm_model(
  data = popular,
  response_variable = popular,
  predictor_var = c(sex, extrav),
  two_way_interaction_factor = c(sex, extrav),
  quite = TRUE
)
compare_fit(fit1, fit2)
Model Summary
Model Type = Model Comparison
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
value
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
compare_fit is temporialy disable due to unknown error caused by insight upgrade from 0.13.2 to 0.14.0. Follow instruction on the package load message to get back all the features.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The CFA model is fitted using lavaan::cfa(). You can pass multiple factors (in the example below, x1, x2, x3 represent one factor, x4, x5, x6 another, and so on). It shows you the fit measures, factor loadings, and a goodness-of-fit evaluation based on cut-off criteria (you should review the literature for these cut-off criteria, as the recommendations are subject to change). Additionally, it shows you a nice-looking path diagram.
cfa_summary(
  data = lavaan::HolzingerSwineford1939,
  x1:x3,
  x4:x6,
  x7:x9
)
Model Summary
Model Type = Confirmatory Factor Analysis
Model Formula =
. DV1 =~ x1 + x2 + x3
DV2 =~ x4 + x5 + x6
DV3 =~ x7 + x8 + x9
Fit Measure
─────────────────────────────────────────────────────────────────────────────────────
Χ² DF P CFI RMSEA SRMR TLI AIC BIC BIC2
─────────────────────────────────────────────────────────────────────────────────────
85.306 24.000 0.000 *** 0.931 0.092 0.065 0.896 7517.490 7595.339 7528.739
─────────────────────────────────────────────────────────────────────────────────────
Factor Loadings
────────────────────────────────────────────────────────────────────────────────
Latent.Factor Observed.Var Std.Est SE Z P 95% CI
────────────────────────────────────────────────────────────────────────────────
DV1 x1 0.772 0.055 14.041 0.000 *** [0.664, 0.880]
x2 0.424 0.060 7.105 0.000 *** [0.307, 0.540]
x3 0.581 0.055 10.539 0.000 *** [0.473, 0.689]
DV2 x4 0.852 0.023 37.776 0.000 *** [0.807, 0.896]
x5 0.855 0.022 38.273 0.000 *** [0.811, 0.899]
x6 0.838 0.023 35.881 0.000 *** [0.792, 0.884]
DV3 x7 0.570 0.053 10.714 0.000 *** [0.465, 0.674]
x8 0.723 0.051 14.309 0.000 *** [0.624, 0.822]
x9 0.665 0.051 13.015 0.000 *** [0.565, 0.765]
────────────────────────────────────────────────────────────────────────────────
Model Covariances
──────────────────────────────────────────────────────────────
Var.1 Var.2 Est SE Z P 95% CI
──────────────────────────────────────────────────────────────
DV1 DV2 0.459 0.064 7.189 0.000 *** [0.334, 0.584]
DV1 DV3 0.471 0.073 6.461 0.000 *** [0.328, 0.613]
DV2 DV3 0.283 0.069 4.117 0.000 *** [0.148, 0.418]
──────────────────────────────────────────────────────────────
Model Variance
──────────────────────────────────────────────────────
Var Est SE Z P 95% CI
──────────────────────────────────────────────────────
x1 0.404 0.085 4.763 0.000 *** [0.238, 0.571]
x2 0.821 0.051 16.246 0.000 *** [0.722, 0.920]
x3 0.662 0.064 10.334 0.000 *** [0.537, 0.788]
x4 0.275 0.038 7.157 0.000 *** [0.200, 0.350]
x5 0.269 0.038 7.037 0.000 *** [0.194, 0.344]
x6 0.298 0.039 7.606 0.000 *** [0.221, 0.374]
x7 0.676 0.061 11.160 0.000 *** [0.557, 0.794]
x8 0.477 0.073 6.531 0.000 *** [0.334, 0.620]
x9 0.558 0.068 8.208 0.000 *** [0.425, 0.691]
DV1 1.000 0.000 NaN NaN [1.000, 1.000]
DV2 1.000 0.000 NaN NaN [1.000, 1.000]
DV3 1.000 0.000 NaN NaN [1.000, 1.000]
──────────────────────────────────────────────────────
Goodness of Fit:
Warning. Poor χ² fit (p < 0.05). It is common to get p < 0.05. Check other fit measure.
OK. Acceptable CFI fit (CFI > 0.90)
Warning. Poor RMSEA fit (RMSEA > 0.08)
OK. Good SRMR fit (SRMR < 0.08)
Warning. Poor TLI fit (TLI < 0.90)
OK. Barely acceptable factor loadings (0.4 < some loadings < 0.7)