corpus: Text Corpus Analysis

Text corpus data analysis, with full support for international text (Unicode). Functions for reading data from newline-delimited 'JSON' files, for normalizing and tokenizing text, for searching for term occurrences, and for computing term occurrence frequencies, including n-grams.
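
As a rough illustration of the workflow described above, the sketch below reads a small newline-delimited JSON file, tokenizes the text, locates a term, and counts unigrams and bigrams. It assumes the corpus package is installed; the file contents, field names, and variable names are made up for the example, and default settings are used throughout.

```r
# Minimal sketch; assumes the corpus package is installed.
library(corpus)

# Write two hypothetical records to a temporary newline-delimited JSON file.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"id": 1, "text": "A rose is a rose is a rose."}',
  '{"id": 2, "text": "The rose is red; the violet is blue."}'
), path)

# Read the file; each JSON record becomes one row.
docs <- read_ndjson(path)

# Normalize and tokenize each text using the default text filter.
tokens <- text_tokens(docs$text)

# Search for occurrences of a term, with surrounding context.
hits <- text_locate(docs$text, "rose")

# Compute occurrence counts for unigrams and bigrams across all texts.
stats <- term_stats(docs$text, ngrams = 1:2)
print(stats)
```

Normalization options such as case folding, stemming, or stop-word removal are controlled through a text filter; the package's reference manual and vignettes describe the available settings.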

Version: 0.10.2
Depends: R (≥ 3.3)
Imports: stats, utf8 (≥ 1.1.0)
Suggests: knitr, rmarkdown, Matrix, testthat
Enhances: quanteda, tm
Published: 2021-05-02
Author: Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
Maintainer: Leslie Huang <lesliehuang at>
License: Apache License (== 2.0) | file LICENSE
NeedsCompilation: yes
CRAN checks: corpus results


Reference manual: corpus.pdf
Vignettes: Chinese text handling
Introduction to corpus
Stemming Words
Text data in Corpus and other packages


Package source: corpus_0.10.2.tar.gz
Windows binaries: r-devel:, r-devel-UCRT:, r-release:, r-oldrel:
macOS binaries: r-release (arm64): corpus_0.10.2.tgz, r-release (x86_64): corpus_0.10.2.tgz, r-oldrel: corpus_0.10.2.tgz
Old sources: corpus archive

Reverse dependencies:

Reverse imports: GenEst, stylest

