Tools for measuring similarity among documents and detecting reused passages. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity and dissimilarity functions; pairwise comparisons; minhash and locality-sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
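
As a rough illustration of what shingled n-gram tokenization and a set-based similarity function involve, here is a minimal base R sketch. The helpers shingle_ngrams() and jaccard() are hypothetical names written for this example; they are not the package's own functions, and the package's tokenizers and similarity measures may differ in detail (for instance in how punctuation and case are handled).

```r
# Hypothetical helpers for illustration only; not part of the textreuse API.

# Split text into lowercase word tokens, then build overlapping
# word n-grams ("shingles") of length n.
shingle_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Jaccard similarity: size of the intersection of the two shingle sets
# divided by the size of their union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) return(0)
  length(intersect(a, b)) / union_size
}

doc_a <- "The quick brown fox jumps over the lazy dog"
doc_b <- "The quick brown fox leaps over the lazy dog"

shingles_a <- shingle_ngrams(doc_a, n = 3)
shingles_b <- shingle_ngrams(doc_b, n = 3)

similarity <- jaccard(shingles_a, shingles_b)
print(similarity)
```

The two sample sentences differ by one word, which changes three of the seven trigrams in each document, so four shingles are shared out of ten distinct ones. Minhash and locality-sensitive hashing exist to approximate this kind of score without comparing every pair of documents directly.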

Maintainer: Lincoln Mullen <lincoln at lincolnmullen.com>

Author(s): Lincoln Mullen

Install the package and any missing dependencies by running this line in your R console:

install.packages("textreuse")

Depends: R (>= 3.1.1)
Imports: assertthat (>= 0.1), digest (>= 0.6.8), dplyr (>= 0.4.3), NLP (>= 0.1.8), Rcpp (>= 0.12.0), RcppProgress (>= 0.1), stringr (>= 1.0.0), tidyr (>= 0.3.1)
Suggests: testthat (>= 0.11.0), knitr (>= 1.11), rmarkdown (>= 0.8), covr
Linking to: BH, Rcpp, RcppProgress

Package: textreuse
URL: https://github.com/ropensci/textreuse
Task Views: NaturalLanguageProcessing
Version: 0.1.4
Published: 2016-11-28
License: MIT + file LICENSE
BugReports: https://github.com/ropensci/textreuse/issues
NeedsCompilation: yes
CRAN checks: textreuse check results
Package source: textreuse_0.1.4.tar.gz