Dates are often messy. Whether historical (or ancient), future, or even recent, we often only know approximately when an event occurred, that it happened within a particular period, an unreliable source means a date should be flagged as uncertain, or sources offer multiple, competing dates.
As researchers, we often recognise this messiness but are forced to force nonexistent precision on data so we can proceed with analysis. For example, if we only know something happened in a given month or year, we might just opt for the start of that month (e.g. 2021-07-01) or year (2021-01-01), assuming that to err on the earlier (or later) side is a justifiable bias. However, this can create issues for inference in which sequence or timing is important. The goal of {messydates}
is to help with this problem by retaining and working with various kinds of date imprecision.
{messydates}
implements the extended annotation standard for dates, the Extended Date/Time Format (EDTF), outlined in ISO 8601-2_2019(E) for R. These include standardised notation for:
2012-XX-01
for the first of some unknown month in 2012 or 2012-01
for some unknown day in January 20122012-01-12~
for approximately the 12th of January 20122012-01-12?
where this data point is based on an unreliable source{2012-01-01,2012-01-12}
where the date can be both 1 January 2012 and 12 January 20122012-01-01..2012-01-12
for all dates between the 1 January 2012 and 12 January 2012 inclusive{messydates}
contains a set of tools for constructing and coercing into and from the ‘messydt’ class. This date class allows regular dates to be annotated to express unspecified date components, approximate or uncertain date components, date ranges, and sets of dates.
Importantly, the package also includes a function for unpacking or expanding sets or ranges of dates into all dates consistent with how the date or set of dates is specified or annotated. Methods are also offered that can be used to make explicit how researchers convert date imprecision into precise dates for analysis, such as getting the min()
, max()
, or even a random()
date from among the dates consistent with a set or range of dates. This greatly facilitates research transparency as well as robustness checks.
The easiest way to install {messydates}
is directly from CRAN:
However, you may also install the development version from GitHub.
Please see the messydates website for more information about how to use {messydates}
. The package was developed as part of the PANARCHIC project, which studies the effects of network and power on how quickly states join, reform, or create international institutions by examining the historical dynamics of institutional networks from different domains. The PANARCHIC project is funded by the Swiss National Science Foundation (SNSF). For more information on the Geneva Global Governance Observatory and our current projects, please see our website or our Github page.
{messydates}
offers a new date class, but one that comes with methods for converting from and into common date classes such as Date
, POSIXct
, and POSIXlt
. It is thus fully compatible with packages such as {lubridate}
and {anydate}
. It is also compatible with {unstruwwel}
, which also parses historic dates in R, though the emphasis of our package is on working with these dates.