`jrt`

This package provides user-friendly functions designed for the easy implementation of Item Response Theory (IRT) models and scoring with judgment data. Although it can be used in a variety of contexts, it was originally developed to make these models more accessible to creativity researchers.

`jrt` is not an estimation package: it provides wrapper functions that call estimation packages and extract, report, and plot information from them. At this stage, `jrt` uses the (excellent) package `mirt` (Chalmers, 2012) as its only IRT engine. Thus, if you use `jrt` for your research, please make sure to cite `mirt` (https://www.jstatsoft.org/article/view/v048i06) as the estimation package/engine.

- Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment.
*Journal of Statistical Software, 48*(6), 1–29. http://dx.doi.org/10.18637/jss.v048.i06

We also encourage you to cite `jrt`, especially if you use the plots or the automatic model selection. Currently, this would be done with:

- Myszkowski, N., & Storme, M. (2019). Judge response theory? A call to upgrade our psychometrical account of creativity judgments.
*Psychology of Aesthetics, Creativity, and the Arts, 13*(2), 167-175. http://dx.doi.org/10.1037/aca0000225

Ok now let’s get started…

A judgment `data.frame` is then provided to the `jrt()` function. Here we’ll use the simulated one in `jrt::ratings`.

`data <- jrt::ratings`

It looks like this:

```
head(data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6
#> 1 5 4 3 4 4 4
#> 2 3 3 2 3 2 2
#> 3 3 3 3 3 3 2
#> 4 3 2 2 3 4 2
#> 5 2 3 1 2 2 1
#> 6 3 2 2 3 2 1
```

`jrt` is in development and more features will hopefully appear soon (check back!), but in this release:

- Your data should be ordinal/polytomous exclusively (although the plotting functions also work with mirt fitted binary and nominal models)
- Your data should be assumed unidimensional (one latent ability predicts the judgments)
- Your judgments should be assumed conditionally independent (the judgments are only related to one another because they are explained by the same latent ability)
- Your data should not include impossible values (so check that first)
- Your data should have only 2 facets (e.g. products by judges)

I know, that’s a lot that you can’t do… but this covers the typical cases, at least for the Consensual Assessment Technique, which is what the package was originally created for.
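Since impossible values have to be caught before fitting, a quick base R check can help. The following is only a minimal sketch: the small data frame `ratings_check`, the deliberately wrong value `7`, and the assumed 1-5 response scale in `allowed` are all hypothetical and chosen for illustration.

```r
# Hypothetical judgment data: 4 products rated by 3 judges on a 1-5 scale.
# The value 7 is a deliberate "impossible" entry for illustration.
ratings_check <- data.frame(
  Judge_1 = c(5, 3, 3, 2),
  Judge_2 = c(4, 3, 7, 3),
  Judge_3 = c(3, 2, 3, NA)
)
allowed <- 1:5  # assumed response scale

# TRUE for each cell that is non-missing and outside the allowed scale
impossible <- sapply(ratings_check, function(x) !is.na(x) & !(x %in% allowed))

# Row/column positions of the offending cells
bad_cells <- which(impossible, arr.ind = TRUE)
print(bad_cells)

# Frequency table per judge, to eyeball the categories actually used
print(lapply(ratings_check, table, useNA = "ifany"))
```

Missing values are left alone here on purpose, so that only out-of-scale entries are flagged.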

`jrt()`

You will first want to load the library.

```
library(jrt)
#> Loading required package: directlabels
```

The main function of the `jrt` package is `jrt()`. By default, this function will:

- Fit the most common and available IRT models for ordinal data
- Automatically select the best-fitting model (based on an information criterion, by default the corrected AIC, AICc)
- Report a lot of useful indices of reliability (from IRT, CTT and the inter-rater reliability literature) and plot the Judge Category Curves and the Total Information Function (which shows the levels of \(\theta\) at which the set of judgments is the most informative/reliable). We’ll see how to customize them later!
- Make the factor scores and standard errors readily accessible in the `@factor.scores` (or `@output.data`) slot of the `jrt` object.

Let’s do it!

- Select a model, fit it, and return statistics along with the information function. We’re storing the output in an object (`fit`) to do more afterwards. Note: There’s a progress bar by default, but it takes space in the vignette, so I’ll remove it here with `progress.bar = F`.

```
fit <- jrt(data, progress.bar = F)
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== Model Selection (6 judges) ==-
#> AICc for Rating Scale Model: 4414.924 | Model weight: 0.000
#> AICc for Generalized Rating Scale Model: 4370.699 | Model weight: 0.000
#> AICc for Partial Credit Model: 4027.701 | Model weight: 0.000
#> AICc for Generalized Partial Credit Model: 4021.567 | Model weight: 0.000
#> AICc for Constrained Graded Rating Scale Model: 4400.553 | Model weight: 0.000
#> AICc for Graded Rating Scale Model: 4310.307 | Model weight: 0.000
#> AICc for Constrained Graded Response Model: 4003.993 | Model weight: 0.859
#> AICc for Graded Response Model: 4007.604 | Model weight: 0.141
#> -> The best fitting model is the Constrained Graded Response Model.
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Constrained (equal slopes) Graded Response Model (Samejima, 1969) | doi: 10.1007/BF03372160
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 3999.249 | AICc = 4003.993 | BIC = 4091.843 | SABIC = 3999.249
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .893
#> - Expected reliability | Assumes a Normal(0,1) prior density: .894
```

Of course there’s more available here than one would report. If using IRT scoring (which is the main purpose of this package), we recommend reporting which IRT model was selected, along with IRT indices primarily, since the scoring is based on the estimation of the \(\theta\) abilities. In this case, what is typically reported is the empirical reliability (here 0.893), which is the estimate of the reliability of the observations in the sample. It can be interpreted similarly to other, more traditional indices of reliability (like Cronbach’s \(\alpha\)).
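The model weights in the selection output above appear consistent with Akaike weights computed from the AICc values. The sketch below recomputes them in base R from the printed numbers. Treating the weights as Akaike weights is an assumption based on the numbers matching, not something stated in the output. The last lines also check the small-sample correction linking AIC and AICc, assuming `n = 300` (the number of products) and `k = 25` estimated parameters. The value of `k` is inferred, not reported by `jrt`.

```r
# AICc values copied from the model selection output above
aicc <- c(
  "Rating Scale Model"                    = 4414.924,
  "Generalized Rating Scale Model"        = 4370.699,
  "Partial Credit Model"                  = 4027.701,
  "Generalized Partial Credit Model"      = 4021.567,
  "Constrained Graded Rating Scale Model" = 4400.553,
  "Graded Rating Scale Model"             = 4310.307,
  "Constrained Graded Response Model"     = 4003.993,
  "Graded Response Model"                 = 4007.604
)

# Akaike weights: relative likelihood of each model, normalized to sum to 1
delta <- aicc - min(aicc)
model_weights <- exp(-delta / 2) / sum(exp(-delta / 2))
print(round(model_weights, 3))

# Small-sample correction: AICc = AIC + 2k(k + 1) / (n - k - 1)
# Assumed values: n = 300 products, k = 25 parameters (inferred, not reported)
aic <- 3999.249
n <- 300
k <- 25
aicc_check <- aic + (2 * k * (k + 1)) / (n - k - 1)
print(aicc_check)
```

With these inputs, nearly all of the weight goes to the two Graded Response Model variants, and the recomputed AICc agrees with the reported one up to rounding of the printed AIC.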

- Doing the same thing without messages

`fit <- jrt(data, silent = T)`

- Selecting the model a priori

One may of course select a model based on assumptions about the data rather than on model fit comparisons. This is done by passing the name of a model to the `irt.model` argument of the `jrt()` function, which bypasses the automatic model selection stage.

```
fit <- jrt(data, "PCM")
#> The possible responses detected are: 1-2-3-4-5
#>
#> -== General Summary ==-
#> - 6 Judges
#> - 300 Products
#> - 5 response categories (1-2-3-4-5)
#> - Mean judgment = 2.977 | SD = 0.862
#>
#> -== IRT Summary ==-
#> - Model: Partial Credit Model (Masters, 1982) | doi: 10.1007/BF02296272
#> - Estimation package: mirt (Chalmers, 2012) | doi: 10.18637/jss.v048.i06
#> - Estimation algorithm: Expectation-Maximization (EM; Bock & Atkin, 1981) | doi: 10.1007/BF02293801
#> - Method of factor scoring: Expected A Posteriori (EAP)
#> - AIC = 4022.957 | AICc = 4027.701 | BIC = 4115.551 | SABIC = 4022.957
#>
#> -== Model-based reliability ==-
#> - Empirical reliability | Average in the sample: .889
#> - Expected reliability | Assumes a Normal(0,1) prior density: .759
```

See the documentation for a list of available models. Most models are directly those of `mirt`. Others are versions of the Graded Response Model or Generalized Partial Credit Model that are constrained in various ways (equal discriminations and/or equal category structures) through the `mirt.model()` function of `mirt`.

Note that they can also be called by their full names (e.g. `jrt(data, "Graded Response Model")`).

- Extract the factor scores with `@factor.scores`.

```
head(fit@factor.scores)
#> Judgments.Factor.Score Judgments.Standard.Error Judgments.Mean.Score
#> 1 1.7072898 0.5824312 4.000000
#> 2 -0.7214498 0.5581569 2.500000
#> 3 -0.1529127 0.5119362 2.833333
#> 4 -0.4247967 0.5319672 2.666667
#> 5 -2.2557524 0.6720093 1.833333
#> 6 -1.4155774 0.6202465 2.166667
```
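Factor scores and their standard errors are also the ingredients of the empirical reliability reported earlier. The sketch below uses a common definition (variance of the scores over that variance plus the mean squared standard error). It is an assumption that this matches the exact computation behind the reported value. Only the six rows printed above are used, so the result is illustrative and will not reproduce the full-sample estimate.

```r
# First six rows of fit@factor.scores, copied from the output above
theta <- c(1.7072898, -0.7214498, -0.1529127, -0.4247967, -2.2557524, -1.4155774)
se    <- c(0.5824312, 0.5581569, 0.5119362, 0.5319672, 0.6720093, 0.6202465)

# Empirical reliability: score variance over score variance plus error variance
empirical_reliability <- function(theta, se) {
  score_var <- var(theta)   # variance of the estimated abilities
  error_var <- mean(se^2)   # average squared standard error
  score_var / (score_var + error_var)
}

rel <- empirical_reliability(theta, se)
print(rel)
```

With a fitted object, the same function could be applied to the corresponding columns of `fit@factor.scores`.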

Note: If you want a more complete output with the original data, use `@output.data`. If there were missing data, `@output.data` also appends imputed data.

```
head(fit@output.data)
#> Judge_1 Judge_2 Judge_3 Judge_4 Judge_5 Judge_6 Judgments.Factor.Score
#> 1 5 4 3 4 4 4 1.7072898
#> 2 3 3 2 3 2 2 -0.7214498
#> 3 3 3 3 3 3 2 -0.1529127
#> 4 3 2 2 3 4 2 -0.4247967
#> 5 2 3 1 2 2 1 -2.2557524
#> 6 3 2 2 3 2 1 -1.4155774
#> Judgments.Standard.Error Judgments.Mean.Score
#> 1 0.5824312 4.000000
#> 2 0.5581569 2.500000
#> 3 0.5119362 2.833333
#> 4 0.5319672 2.666667
#> 5 0.6720093 1.833333
#> 6 0.6202465 2.166667
```
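The `Judgments.Mean.Score` column is the plain average of a product's ratings across judges, which can be checked in base R. This sketch retypes the six rows printed above into a data frame named `judgments` (a name chosen here for illustration) and recomputes the means.

```r
# First six rows of the ratings, copied from the output above
judgments <- data.frame(
  Judge_1 = c(5, 3, 3, 3, 2, 3),
  Judge_2 = c(4, 3, 3, 2, 3, 2),
  Judge_3 = c(3, 2, 3, 2, 1, 2),
  Judge_4 = c(4, 3, 3, 3, 2, 3),
  Judge_5 = c(4, 2, 3, 4, 2, 2),
  Judge_6 = c(4, 2, 2, 2, 1, 1)
)

# Mean rating per product (row) across the six judges
mean_score <- rowMeans(judgments)
print(round(mean_score, 6))
```

Unlike the factor scores, these means weight every judge equally and ignore how discriminating or severe each judge is.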

Judge characteristics can be inspected with Judge Category Curve (JCC) plots. They are computed with the function `jcc.plot()`.
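To give an idea of what such curves represent, here is a minimal base R sketch of category probabilities under a graded response model for a single judge. It does not use `jrt` or `mirt`. The function name `grm_category_probs`, the discrimination `a = 2`, and the thresholds in `b` are hypothetical values chosen for illustration, not estimates from the data.

```r
# Category probabilities under a graded response model for one judge.
# a: discrimination; b: ordered thresholds (hypothetical values, not estimates)
grm_category_probs <- function(theta, a, b) {
  # P(X >= k) for k = 2..K, one column per threshold
  p_star <- sapply(b, function(bk) plogis(a * (theta - bk)))
  p_star <- matrix(p_star, nrow = length(theta))
  # Pad with P(X >= 1) = 1 and P(X >= K + 1) = 0, then take differences
  cum <- cbind(1, p_star, 0)
  probs <- cum[, -ncol(cum), drop = FALSE] - cum[, -1, drop = FALSE]
  colnames(probs) <- paste0("Category_", seq_len(ncol(probs)))
  probs
}

theta <- seq(-4, 4, length.out = 81)
probs <- grm_category_probs(theta, a = 2, b = c(-2, -0.5, 0.5, 2))

# One curve per response category, plotted against the latent ability
matplot(theta, probs, type = "l", lty = 1,
        xlab = expression(theta), ylab = "Probability")
```

Each curve gives the probability that the judge assigns a given category at a given level of \(\theta\), and at every \(\theta\) the category probabilities sum to 1.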

A basic example for Judge 3…

`jcc.plot(fit, judge = 3)`

Now of course, there are many options, but here are a few things that you could try:

- Plot the category curves of all judges by using `judge = "all"` or simply removing the `judge` argument (note that you can change the number of columns or rows, see the documentation for these advanced options).

`jcc.plot(fit)`

- Plot the category curves of a vector of judges by providing a vector of judge numbers. For example here for judges 1 and 6.

`jcc.plot(fit, judge = c(1,6))`

- Change the layout by providing a number of columns or rows desired (not both, they may conflict):

`jcc.plot(fit, facet.cols = 2)`

- Plot the category curves in black and white with `greyscale = TRUE`…

`jcc.plot(fit, 1, greyscale = T)`