The textTinyR package processes big text data files efficiently in batches. To this end, it provides functions for splitting, parsing and tokenizing text and for creating a vocabulary. It also includes functions for building either a document-term matrix or a term-document matrix and for extracting information from them (term associations, most frequent terms). Finally, it provides functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. The source code is written in 'C++11' and exported to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
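
To make the document-term matrix idea concrete, here is a minimal sketch that uses only base R and the 'Matrix' package (which this package depends on). It does not call any textTinyR function: the toy corpus, the whitespace tokenizer and the variable names (docs, tokens, vocab, dtm, tdm) are assumptions chosen for illustration, and the package's own C++ routines will differ in both interface and performance.

```r
library(Matrix)

# Toy corpus (illustrative only); each element is one document.
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs")

# Naive tokenizer: lower-case, then split on runs of whitespace.
tokens <- strsplit(tolower(docs), "[[:space:]]+")

# Vocabulary: the sorted unique terms across all documents.
vocab <- sort(unique(unlist(tokens)))

# Row index = document number, column index = position of the term in vocab.
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)

# sparseMatrix() adds up the x values of repeated (i, j) pairs,
# so each cell ends up holding the count of a term in a document.
dtm <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(NULL, vocab))

# Most frequent terms over the whole corpus.
term_freq <- sort(colSums(dtm), decreasing = TRUE)
head(term_freq, 3)

# The term-document matrix is the transpose of the document-term matrix.
tdm <- t(dtm)
```

Storing the counts in a sparse matrix matters because most terms do not occur in most documents, so a dense matrix would consist mainly of zeros.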

Documentation

Manual: textTinyR.pdf
Vignette: Functionality of the textTinyR package

Maintainer: Lampros Mouselimis <mouselimislampros at gmail.com>

Author(s): Lampros Mouselimis <mouselimislampros at gmail.com>

Install the package and any missing dependencies by running this line in your R console:

install.packages("textTinyR")

Depends R (>= 3.2.3), Matrix
Imports Rcpp (>= 0.12.5), R6, data.table, utils
Suggests testthat, covr, knitr, rmarkdown
Enhances
Linking to Rcpp, RcppArmadillo (>= 0.7.5), BH
Reverse depends
Reverse imports
Reverse suggests
Reverse enhances
Reverse linking to

Package textTinyR
Materials
URL https://github.com/mlampros/textTinyR
Task Views
Version 1.0.3
Published 2017-01-29
License GPL-3
BugReports https://github.com/mlampros/textTinyR/issues
SystemRequirements The package requires two components: a C++11 compiler and, on a Unix OS, the boost-locale headers and libraries (boost >= 1.55.0, www.boost.org). Debian/Ubuntu: libboost-locale-dev; Fedora: yum install boost-devel; OSX/brew: detailed installation instructions can be found in the README file.
NeedsCompilation yes
Citation
CRAN checks textTinyR check results
Package source textTinyR_1.0.3.tar.gz