This vignette provides an introduction to the functions facilitating the analysis of the dependencies of CRAN packages, specifically
To obtain information about the various kinds of dependencies of a package, we can use the function get_dep(), which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts
Reverse_suggests, or any variation of these in letter case, or with the underscore "_" replaced by a space.
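The tolerance for letter case and for a space in place of the underscore can be pictured as a normalisation step applied before matching. The following is a minimal base R sketch of that idea only. The helper normalise_type() is hypothetical and not part of the package, and how get_dep() actually processes its second argument is an assumption.

```r
# Hypothetical helper (not part of the package): map user-supplied
# dependency types to one canonical spelling before matching.
normalise_type <- function(type) {
  type <- tolower(type)               # ignore letter case
  gsub(" ", "_", type, fixed = TRUE)  # treat a space like an underscore
}

spellings <- c("Reverse_suggests", "REVERSE SUGGESTS", "reverse suggests")
normalised <- normalise_type(spellings)
normalised
#> [1] "reverse_suggests" "reverse_suggests" "reverse_suggests"
```

All three spellings collapse to the same string, so a single comparison against the canonical names would be enough.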
We only consider the 4 most common types of dependencies in R packages, namely
LinkingTo, and their reverse counterparts. For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.
As the information on all dependencies of a package is on the same page on CRAN, we can use get_dep_df() instead of get_dep() to avoid scraping the same page multiple times. The output will be a data frame instead of a character vector.
get_dep_df("dplyr", c("imports", "LinkingTo"))
#>     from         to    type reverse
#> 1  dplyr   ellipsis imports   FALSE
#> 2  dplyr   generics imports   FALSE
#> 3  dplyr       glue imports   FALSE
#> 4  dplyr  lifecycle imports   FALSE
#> 5  dplyr   magrittr imports   FALSE
#> 6  dplyr    methods imports   FALSE
#> 7  dplyr         R6 imports   FALSE
#> 8  dplyr      rlang imports   FALSE
#> 9  dplyr     tibble imports   FALSE
#> 10 dplyr tidyselect imports   FALSE
#> 11 dplyr      utils imports   FALSE
#> 12 dplyr      vctrs imports   FALSE
The type column holds the type of the dependency converted to lower case. Also, LinkingTo is now converted to "linking to" for consistency. For the four reverse dependencies, the substring "reverse_" is not shown in type; instead, the reverse column is TRUE. This can be illustrated by the following:
get_dep("abc", "depends")
#> [1] "abc.data" "nnet"     "quantreg" "MASS"     "locfit"
get_dep("abc", "reverse_depends")
#> [1] "abctools" "EasyABC"
get_dep_df("abc", c("depends", "reverse_depends"))
#>   from       to    type reverse
#> 1  abc abc.data depends   FALSE
#> 2  abc     nnet depends   FALSE
#> 3  abc quantreg depends   FALSE
#> 4  abc     MASS depends   FALSE
#> 5  abc   locfit depends   FALSE
#> 6  abc abctools depends    TRUE
#> 7  abc EasyABC depends    TRUE
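To make the relationship between a requested dependency type and the type and reverse columns concrete, here is a small base R sketch. The helper split_type() is hypothetical and only mimics the behaviour described in the text; it is not the package's own code.

```r
# Hypothetical helper: derive the 'type' and 'reverse' columns from a
# requested dependency type, following the rules described in the text.
split_type <- function(type) {
  canonical <- gsub(" ", "_", tolower(type), fixed = TRUE)
  is_reverse <- startsWith(canonical, "reverse_")
  bare <- sub("^reverse_", "", canonical)   # drop the "reverse_" prefix
  bare <- gsub("_", " ", bare, fixed = TRUE)
  bare[bare == "linkingto"] <- "linking to" # spelling used in the output
  data.frame(type = bare, reverse = is_reverse, stringsAsFactors = FALSE)
}

requested <- c("depends", "Reverse_depends", "LinkingTo")
columns <- split_type(requested)
columns
```

Here "depends" and "Reverse_depends" end up with the same type and differ only in the reverse flag, which matches rows 1 to 5 versus rows 6 and 7 in the output for the abc package.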
Theoretically, for each forward dependency
#>   from to type reverse
#> 1    A  B    c   FALSE
there should be an equivalent reverse dependency
#>   from to type reverse
#> 1    B  A    c    TRUE
Using the same type in the forward and reverse dependencies enables this to be checked easily.
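One way such a check could be carried out is sketched below in base R. The edge list is made up for illustration (packages A, B and C are placeholders), and the approach of flipping the reverse rows and comparing keys is our own assumption, not a method provided by the package.

```r
# Toy edge list in the same four-column layout as the output above;
# the package names are placeholders, not real CRAN data.
deps <- data.frame(
  from    = c("A", "B", "A"),
  to      = c("B", "A", "C"),
  type    = c("imports", "imports", "imports"),
  reverse = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

forward  <- deps[!deps$reverse, c("from", "to", "type")]
backward <- deps[deps$reverse, c("from", "to", "type")]

# Flip each reverse row so that it reads like a forward dependency.
flipped <- data.frame(
  from = backward$to,
  to   = backward$from,
  type = backward$type,
  stringsAsFactors = FALSE
)

# Forward rows that have no matching reverse row.
key <- function(d) paste(d$from, d$to, d$type, sep = "|")
unmatched <- forward[!key(forward) %in% key(flipped), ]
unmatched
```

In this toy data the dependency of A on C has no reverse counterpart, so it is the only row left in unmatched.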
To obtain all 8 types of dependencies, we can use "all" in the second argument, instead of typing a character vector of all 8 words:
df0.abc <- get_dep_df("abc", "all")
df0.abc
#>    from         to     type reverse
#> 1   abc   abc.data  depends   FALSE
#> 2   abc       nnet  depends   FALSE
#> 3   abc   quantreg  depends   FALSE
#> 4   abc       MASS  depends   FALSE
#> 5   abc     locfit  depends   FALSE
#> 9   abc   abctools  depends    TRUE
#> 10  abc    EasyABC  depends    TRUE
#> 11  abc ecolottery  imports    TRUE
#> 12  abc       ouxy  imports    TRUE
#> 14  abc      coala suggests    TRUE

df0.rstan <- get_dep_df("rstan", "all")
dplyr::count(df0.rstan, type, reverse) # all 8 types
#>         type reverse  n
#> 1    depends   FALSE  2
#> 2    depends    TRUE 23
#> 3    imports   FALSE 10
#> 4    imports    TRUE 64
#> 5 linking to   FALSE  5
#> 6 linking to    TRUE 52
#> 7   suggests   FALSE 12
#> 8   suggests    TRUE 20
As of 2020-09-11, the packages that have all 8 types of dependencies are gRbase, quanteda, rstan, sf, stochvol, xts.
To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the core packages of the tidyverse, and find out what each package Imports. We put all the dependencies into one data frame, in which the package in the from column imports the package in the to column. This is essentially the edge list of the dependency network.
df0.imports <- rbind(
  get_dep_df("ggplot2", "Imports"),
  get_dep_df("dplyr", "Imports"),
  get_dep_df("tidyr", "Imports"),
  get_dep_df("readr", "Imports"),
  get_dep_df("purrr", "Imports"),
  get_dep_df("tibble", "Imports"),
  get_dep_df("stringr", "Imports"),
  get_dep_df("forcats", "Imports")
)
head(df0.imports)
#>      from        to    type reverse
#> 1 ggplot2    digest imports   FALSE
#> 2 ggplot2      glue imports   FALSE
#> 3 ggplot2 grDevices imports   FALSE
#> 4 ggplot2      grid imports   FALSE
#> 5 ggplot2    gtable imports   FALSE
#> 6 ggplot2   isoband imports   FALSE
tail(df0.imports)
#>       from       to    type reverse
#> 59 stringr magrittr imports   FALSE
#> 60 stringr  stringi imports   FALSE
#> 61 forcats ellipsis imports   FALSE
#> 62 forcats magrittr imports   FALSE
#> 63 forcats    rlang imports   FALSE
#> 64 forcats   tibble imports   FALSE
With the help of the ‘igraph’ package, we can use this data frame to build a graph object that represents the dependency network.
The nature of a dependency network makes it a directed acyclic graph (DAG). We can use the ‘igraph’ function is_dag() to check this.
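As a self-contained illustration that does not need access to CRAN, the sketch below builds a graph from a made-up edge list and checks it with is_dag(). The package names A, B and C are placeholders, and the object names are ours; only graph_from_data_frame() and is_dag() come from ‘igraph’.

```r
library(igraph)

# Toy edge list; the package names are placeholders, not real CRAN data.
edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C"),
  stringsAsFactors = FALSE
)

# The first two columns of the data frame are read as the edge list.
g <- graph_from_data_frame(edges, directed = TRUE)
dag_ok <- is_dag(g)
dag_ok

# Adding an edge from C back to A creates a cycle A -> B -> C -> A.
edges_cyclic <- rbind(edges, data.frame(from = "C", to = "A", stringsAsFactors = FALSE))
g_cyclic <- graph_from_data_frame(edges_cyclic, directed = TRUE)
dag_cyclic <- is_dag(g_cyclic)
dag_cyclic
```

The first graph is acyclic, while the second is not, which is the situation is_dag() is meant to detect.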
Note that this applies to
Depends) only due to their nature. This acyclic nature does not apply to a network of, for example,
It is possible to set a boundary on the nodes to which the edges are directed, using the function
df_to_graph(). The second argument takes in a data frame that contains the list of such nodes in the column
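The idea of bounding the nodes that edges may point to can be sketched in base R as a simple filter on the to column. This is only our reading of the description above: the column name "name" in the node list is an assumption, the data are made up, and df_to_graph() may do more than this filtering.

```r
# Toy edge list; the package names are placeholders.
edges <- data.frame(
  from = c("A", "A", "B", "D"),
  to   = c("B", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical node list; the column name 'name' is an assumption.
boundary <- data.frame(name = c("B", "C"), stringsAsFactors = FALSE)

# Keep only the edges that are directed to a node inside the boundary.
bounded <- edges[edges$to %in% boundary$name, ]
bounded
```

The edge from D to E is dropped because E is not in the node list, while the three edges pointing to B or C are kept.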
In the topological ordering, represented by the column id_num, a low (high) number means the package is at the front (back) of the ordering. If package A Imports package B, i.e. there is a directed edge from A to B, then A will be topologically before B. As the package ‘tibble’ doesn’t import any package but is imported by most other packages, it naturally goes to the back of the ordering. This ordering may not be unique for a DAG, and other admissible orderings can be obtained by setting random=TRUE in the function:
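To show why the ordering need not be unique, here is a minimal base R sketch of Kahn's algorithm for topological sorting. The function topo_order() is hypothetical and is not the package's implementation; its random argument only imitates the behaviour described above, and the edge list is made up.

```r
# Hypothetical helper (not the package's function): Kahn's algorithm on an
# edge list with columns 'from' and 'to'.
topo_order <- function(edges, random = FALSE) {
  from <- as.character(edges$from)
  to <- as.character(edges$to)
  nodes <- unique(c(from, to))
  # Number of incoming edges for every node.
  indeg <- setNames(integer(length(nodes)), nodes)
  tab <- table(to)
  indeg[names(tab)] <- as.integer(tab)
  ready <- nodes[indeg == 0L]  # nodes with no incoming edges
  ordering <- character(0)
  while (length(ready) > 0) {
    # Any ready node is admissible; random = TRUE picks one at random.
    pick <- if (random) sample.int(length(ready), 1) else 1L
    node <- ready[pick]
    ready <- ready[-pick]
    ordering <- c(ordering, node)
    # Remove the outgoing edges of the chosen node.
    for (target in to[from == node]) {
      indeg[target] <- indeg[target] - 1L
      if (indeg[target] == 0L) ready <- c(ready, target)
    }
  }
  if (length(ordering) < length(nodes)) stop("the graph contains a cycle")
  data.frame(id = ordering, id_num = seq_along(ordering), stringsAsFactors = FALSE)
}

# Toy DAG: A points to B and C, which both point to D.
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "D", "D"),
  stringsAsFactors = FALSE
)
ordering_fixed <- topo_order(edges)
ordering_random <- topo_order(edges, random = TRUE)
ordering_fixed
```

In this toy graph A always comes first and D always comes last, but B and C may appear in either order, so both A, B, C, D and A, C, B, D are admissible orderings.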
We can also apply the topological sorting to the bigger dependency network.