HTML tables are a valuable data source, but extracting and recasting these data into a useful format can be tedious. This package collects structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides three major advantages. First, the function automatically expands row and column spans in the header and body cells. Second, users are given more control over the identification of header and body rows that will end up in the R table, including semantic header information that appears throughout the body. Third, the function preprocesses table code, corrects common types of malformations, and removes unneeded parts, which helps to alleviate the need for tedious post-processing.

Documentation

Manual: htmltab.pdf
Vignette: htmltab case studies

Maintainer: Christian Rubba <christian.rubba at gmail.com>

Author(s): Christian Rubba*

Install package and any missing dependencies by running this line in your R console:

install.packages("htmltab")
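
Once the package is installed, a call might look like the following minimal sketch. The HTML snippet is made-up illustration data, not taken from the package documentation, and it assumes that htmltab() accepts a document already parsed with XML::htmlParse(). The first body cell spans two rows, which is the kind of span the description says is expanded automatically.

```r
# Minimal sketch: parse a small, made-up HTML table with htmltab().
library(XML)
library(htmltab)

# Hypothetical example table; "Germany" spans two body rows.
html <- '<table>
  <tr><th>Country</th><th>Year</th><th>Value</th></tr>
  <tr><td rowspan="2">Germany</td><td>2014</td><td>1</td></tr>
  <tr><td>2015</td><td>2</td></tr>
</table>'

# Parse the string into an HTML document (asText = TRUE: input is markup, not a path).
doc <- htmlParse(html, asText = TRUE)

# which = 1 selects the first table in the document.
tab <- htmltab(doc = doc, which = 1)
print(tab)
```

For real pages, doc would typically be a URL or file path instead of a parsed snippet, and which identifies the table either by its rank or by an XPath expression.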

Depends R (>= 3.0.0)
Imports XML (>= 3.98.1.3), httr (>= 1.0.0)
Suggests testthat, knitr, tidyr
Enhances
Linking to
Reverse depends
Reverse imports
Reverse suggests installr
Reverse enhances
Reverse linking to

Package htmltab
Materials
URL https://github.com/crubba/htmltab
Task Views WebTechnologies
Version 0.7.1
Published 2016-12-29
License MIT + file LICENSE
BugReports https://github.com/crubba/htmltab/issues
SystemRequirements
NeedsCompilation no
Citation
CRAN checks htmltab check results
Package source htmltab_0.7.1.tar.gz