This document gives an overview of the functionality provided by the R package APCtools.
Age-Period-Cohort (APC) analysis is used to disentangle observed trends (e.g. in social, economic, medical or epidemiological data) and to draw conclusions about developments over three temporal dimensions: age (the life cycle of a person), period (calendar time) and cohort (generational membership).
The critical challenge in APC analysis is that these main components are linearly dependent: \[ cohort = period - age \]
Accordingly, flexible methods and visualization techniques are needed to properly disentangle observed temporal association structures. The APCtools package comprises different methods that tackle this problem and aims to cover all steps of an APC analysis. This includes state-of-the-art descriptive visualizations as well as visualization and summary functions based on the estimation of a generalized additive regression model (GAM). The main functionalities of the package are highlighted in the following.
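The linear dependency can be checked numerically. The following is a minimal sketch in base R on simulated data (the variable names and values are made up for illustration and are not taken from the package datasets). Once cohort is computed as period minus age, a model with linear terms for all three dimensions has a rank-deficient design matrix.

```r
# Simulated individual-level data (hypothetical values)
set.seed(1)
n      <- 200
age    <- sample(20:80, n, replace = TRUE)
period <- sample(1971:2018, n, replace = TRUE)
cohort <- period - age   # birth year follows deterministically

# Design matrix with an intercept and all three linear terms
X <- model.matrix(~ age + period + cohort)
n_columns <- ncol(X)      # number of columns in the design matrix
x_rank    <- qr(X)$rank   # numerical rank; smaller than n_columns

# A linear model cannot estimate all three linear effects separately
y   <- rnorm(n)
fit <- lm(y ~ age + period + cohort)
coef(fit)                 # the coefficient of the last term is NA
```

Here `lm()` reports `NA` for the cohort term because its linear effect cannot be separated from those of age and period.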
Before we start, let’s load the relevant packages for the following analyses.
library(APCtools)
library(dplyr)    # general data handling
library(mgcv)     # estimation of generalized additive regression models (GAMs)
library(ggplot2)  # data visualization
library(ggpubr)   # arranging multiple ggplots in a grid with ggarrange()

# set the global theme of all plots
theme_set(theme_minimal())
APC analyses require long-term panel or repeated cross-sectional data. The package includes two exemplary datasets on the travel behavior of German tourists (dataset travel) and the number of unintentional drug overdose deaths in the United States (dataset drug_deaths). See the respective help pages ?travel and ?drug_deaths for details.
In the following, we will use the travel dataset to investigate whether the travel distances of the main trip of German travelers mainly change with the life cycle of a person (age effect), with macro-level developments like decreasing air travel prices in recent decades (period effect), or with the generational membership of a person, which is shaped by similar socialization and historical experiences (cohort effect).
Different functions are available for descriptively visualizing observed structures. This includes plots for the marginal distribution of some variable of interest, 1D plots for the development of some variable over age, period or cohort, as well as density matrices that visualize the development over all temporal dimensions.
The marginal distribution of a variable can be visualized using plot_density. Metric variables can be plotted using a density plot or a boxplot, while categorical variables can be plotted using a bar chart.
gg1 <- plot_density(dat = travel, y_var = "mainTrip_distance", log_scale = TRUE)
gg2 <- plot_density(dat = travel, y_var = "mainTrip_distance", log_scale = TRUE,
                    plot_type = "boxplot")
gg3 <- plot_density(dat = travel, y_var = "household_size")

ggpubr::ggarrange(gg1, gg2, gg3, nrow = 1)
The distribution of a variable can be plotted against age, period or cohort with the function plot_variable. Metric variables are visualized using boxplots or line charts (see argument plot_type), categorical variables using bar charts. By default, the bar charts show relative frequencies; specify the argument geomBar_position = "stack" to show absolute numbers instead.
plot_variable(dat = travel, y_var = "mainTrip_distance", apc_dimension = "period", plot_type = "line", ylim = c(0,1000))
plot_variable(dat = travel, y_var = "household_size", apc_dimension = "period")
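The difference between relative and absolute frequencies can be illustrated on the underlying counts. The following base R sketch uses a small made-up dataset (not the travel data); that the relative frequencies are normalized within each period is an assumption made here for illustration, not a statement about the package internals.

```r
# Hypothetical categorical variable observed in two periods
dat <- data.frame(
  period         = c(1990, 1990, 1990, 2000, 2000, 2000, 2000, 2000),
  household_size = c("1", "2", "2", "1", "1", "2", "3+", "3+")
)

# Absolute numbers per period and category (what a stacked bar chart shows)
abs_freq <- table(dat$period, dat$household_size)

# Relative frequencies within each period (each row sums to 1)
rel_freq <- prop.table(abs_freq, margin = 1)

abs_freq
round(rel_freq, 2)
```

Relative frequencies make periods with different sample sizes comparable, while absolute numbers additionally show how many observations each period contributes.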
To include all temporal dimensions in one plot, APCtools contains the function plot_densityMatrix. In Weigert et al. (2021), this plot type was referred to as a ridgeline matrix when plotting multiple density plots for a metric variable. The basic principle of a density matrix is (i) to visualize two of the temporal dimensions on the x- and y-axis (specified using the argument dimensions), so that the third temporal dimension is represented on the diagonals of the matrix, and (ii) to categorize the respective variables on the x- and y-axis in meaningful groups. The function then creates a grid, where each cell contains the distribution of the selected y_var variable in the respective category.
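By default, age and period are depicted on the x- and y-axis, respectively, and cohort on the diagonals. The categorization is defined by specifying two of the arguments age_groups, period_groups and cohort_groups, each a list of two-element vectors that give the lower and upper border of a group.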
age_groups    <- list(c(80,89), c(70,79), c(60,69), c(50,59),
                      c(40,49), c(30,39), c(20,29))
period_groups <- list(c(1971,1979), c(1980,1989), c(1990,1999),
                      c(2000,2009), c(2010,2018))

plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   age_groups = age_groups, period_groups = period_groups,
                   log_scale = TRUE)
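Why the third dimension ends up on the diagonals can be checked with a small base R sketch. It does not use the package. It builds a hypothetical grid of all age and period combinations, categorizes both in groups of roughly ten years, and computes the mean birth year per cell. The object names are chosen here for illustration only.

```r
# Hypothetical grid of all age-period combinations (not the travel data)
dat        <- expand.grid(age = 20:89, period = 1971:2018)
dat$cohort <- dat$period - dat$age

# Categorize age and period in groups of (roughly) ten years
dat$age_group    <- cut(dat$age, breaks = seq(20, 90, by = 10), right = FALSE)
dat$period_group <- cut(dat$period,
                        breaks = c(1971, 1980, 1990, 2000, 2010, 2019),
                        right = FALSE)

# Mean birth year in each cell of the age-by-period grid
cell_cohort <- tapply(dat$cohort,
                      list(age = dat$age_group, period = dat$period_group),
                      mean)
cell_cohort

# Cells along one diagonal: one age group up and one period group forward
diag_cells <- cell_cohort[cbind(1:5, 1:5)]
diag_cells
```

Moving one age group up and one period group forward leaves the mean birth year nearly unchanged, so each diagonal of the grid corresponds to one group of birth cohorts.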
To highlight the effect of the variable depicted on the diagonal (here: cohort), individual diagonals can be marked using the argument highlight_diagonals.
plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   age_groups = age_groups, period_groups = period_groups,
                   highlight_diagonals = list("born 1950 - 1959" = 8,
                                              "born 1970 - 1979" = 10),
                   log_scale = TRUE)