# Subclassing oce objects

#### 2019-06-16

Abstract. This vignette explains how new classes of objects can be created, using oce objects as a base class. The advantage of this is that the newly-formed objects will automatically have important properties of oce objects, in terms of operators such as [[ and [[<-, functions such as subset() and summary(), schemes for handling units, etc. The treatment centres on the creation of a hypothetical class called wave, which will hold time-series of elevation data.

# 1 Tutorial

## 1.1 Defining a class

The setClass function is the key to defining a new class. Entering ?setClass in an R console reveals the details of this function, including some notes on how it has evolved since R version 3.0.

We need only a simple form here, with

```r
library(oce)
wave <- setClass(Class="wave", contains="oce")
```

being enough to create a new class called wave that inherits the base features of the oce class.

To create a new object of the wave class, use e.g.

```r
w <- new("wave")
```

## 1.2 Examining an object

We can see that w belongs to the wave class with

```r
class(w)
#> [1] "wave"
#> attr(,"package")
#> [1] ".GlobalEnv"
```

and a check that is common to see in code is

```r
inherits(w, "wave")
#> [1] TRUE
```
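Since class(w) reports only the most specific class, the link to oce is easier to confirm with is() and extends() from the methods package. A minimal sketch, assuming the oce package is installed:

```r
library(oce)
setClass(Class="wave", contains="oce")
w <- new("wave")
# With two arguments, is() tests S4 inheritance, not just the top-level class
is(w, "oce")            # TRUE: wave extends oce
is(w, "wave")           # TRUE
# extends() asks the same question at the class level, without an object
extends("wave", "oce")  # TRUE
# With one argument, is() lists the class together with its superclasses
is(w)
```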

The contents of the object are revealed with

```r
str(w)
#> Formal class 'wave' [package ".GlobalEnv"] with 3 slots
#>   ..@ metadata     :List of 2
#>   .. ..$ units: list()
#>   .. ..$ flags: list()
#>   ..@ data         : list()
#>   ..@ processingLog:List of 2
#>   .. ..$ time : POSIXct[1:1], format: "2019-06-16 14:44:31"
#>   .. ..$ value: chr "Create oce object"
```

Notice that w has three oce “slots”, named metadata, data and processingLog. These are inherited from oce. The first is meant to hold information about the data, such as a file name, an instrument number, a sampling location, etc. The second is meant to hold the actual data or measurements. And the third, not normally accessed by the user directly, records the object’s evolution (note that its first entry reads “Create oce object”, because the object was initialized by code inherited from oce). The names of these three slots are usually enough to keep them straight in the analyst’s head, although it can sometimes be difficult to decide whether something belongs in metadata or data.
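The slot names can also be listed programmatically, and each slot can be read directly with the @ operator. A short sketch, assuming oce is installed; direct @ access bypasses the oce machinery, so it is best kept for inspection:

```r
library(oce)
setClass(Class="wave", contains="oce")
w <- new("wave")
# The three slots inherited from oce
slotNames("wave")
# Each slot is an ordinary list, reachable with @
names(w@metadata)       # the empty "units" and "flags" lists seen in str(w)
w@processingLog$value   # the log entry written when the object was created
```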

The oce system also provides its objects with certain operators and functions. For example, [[ can be used to retrieve the slots, items within the slots, and (in some cases) values that may be calculated from the contents of the slots (e.g. ctd and similar objects related to hydrography can return calculated potential temperature or Conservative Temperature, even though neither is typically stored in such datasets).

In particular, all oce objects give special powers to the [[ operator, e.g. we can retrieve the metadata slot with

```r
w[["metadata"]]
#> $units
#> list()
#>
#> $flags
#> list()
```

and the same for the data slot

```r
w[["data"]]
#> list()
```
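Retrieving a whole slot with [[ should give the same list as reading it with @. A quick check, assuming oce is installed:

```r
library(oce)
setClass(Class="wave", contains="oce")
w <- new("wave")
# [[ with a slot name returns the contents of that slot
identical(w[["metadata"]], w@metadata)
identical(w[["data"]], w@data)
```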

## 1.3 Adding data to objects

The [[<- operator can be used to fill the slots with information, e.g. we could insert a station location of "STN01" with

```r
w[["metadata"]]$station <- "STN01"
```

and verify that this worked with

```r
str(w)
#> Formal class 'wave' [package ".GlobalEnv"] with 3 slots
#>   ..@ metadata     :List of 3
#>   .. ..$ units  : list()
#>   .. ..$ flags  : list()
#>   .. ..$ station: chr "STN01"
#>   ..@ data         : list()
#>   ..@ processingLog:List of 2
#>   .. ..$ time : POSIXct[1:1], format: "2019-06-16 14:44:31"
#>   .. ..$ value: chr "Create oce object"
```

However, there is a better way to insert metadata, with the oceSetMetadata function, e.g.

```r
w <- oceSetMetadata(w, "serialNumber", 1234)
```

sets the serial number to 1234, and

```r
str(w[["metadata"]])
#> List of 4
#>  $ units       : list()
#>  $ flags       : list()
#>  $ station     : chr "STN01"
#>  $ serialNumber: num 1234
```

verifies that this worked.
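One practical difference between the two approaches is the processing log: oceSetMetadata() appends an entry, whereas direct slot assignment leaves no trace. A minimal sketch, assuming oce is installed:

```r
library(oce)
setClass(Class="wave", contains="oce")
w <- new("wave")
nBefore <- length(w@processingLog$value)
# oceSetMetadata() stores the value and appends a processing-log entry
w <- oceSetMetadata(w, "serialNumber", 1234)
nAfterSet <- length(w@processingLog$value)
# Direct slot assignment stores the value but leaves the log unchanged
w@metadata$station <- "STN01"
nAfterDirect <- length(w@processingLog$value)
c(nBefore, nAfterSet, nAfterDirect)
```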

Now, let’s insert some data. Imagine a half-minute dataset of 100 samples (a sampling interval of about 0.3 s, i.e. roughly 3.3 Hz), for a signal consisting of a wave with 1 m amplitude and 10 s period, plus noise of order 1 cm.

```r
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
tau <- 10 # wave period [s]
e <- sin(2 * pi * as.numeric(t) / tau) + rnorm(length(t), sd=0.01) # elevation [m]
```

(Notice that we are not using the R ts form to make this time series. This is because oceanographic data are commonly acquired at irregular time intervals, so it makes sense to store observation times explicitly, instead of using the fixed start-time-and-frequency approach of the ts scheme.)
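The sampling interval follows from the arithmetic of seq(): 100 points spanning 30 s are 30/99 ≈ 0.303 s apart, which matches the mean increment that summary() reports below. The base-R sketch that follows checks this, then builds a hypothetical irregular record (times chosen only for illustration) whose intervals no single ts frequency can describe without padding:

```r
t0 <- as.POSIXct("2019-01-01 00:00:00", tz="UTC")
t <- t0 + seq(0, 30, length.out=100)
dt <- mean(diff(as.numeric(t)))  # mean sampling interval [s]
dt                               # about 0.303, i.e. 30/99
1 / dt                           # sampling rate [Hz], about 3.3
# Hypothetical irregular record: a short burst, a gap, then another sample
tIrregular <- t0 + c(0, 0.3, 0.6, 5.0, 5.3)
diff(as.numeric(tIrregular))     # unequal intervals, so no single frequency fits
```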

These data may be inserted into our object with

```r
w <- oceSetData(w, "time", t)
w <- oceSetData(w, "elevation", e)
```
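After these two calls the data slot holds both vectors under the names given, and [[ can retrieve them by name. A quick check, assuming oce is installed (the data are regenerated so the sketch runs on its own):

```r
library(oce)
setClass(Class="wave", contains="oce")
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
e <- sin(2 * pi * as.numeric(t) / 10) + rnorm(length(t), sd=0.01)
w <- new("wave")
w <- oceSetData(w, "time", t)
w <- oceSetData(w, "elevation", e)
names(w[["data"]])              # the item names, in insertion order
identical(w[["elevation"]], e)  # [[ finds items inside the data slot by name
```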

At this point the reader is likely to use str to see if this worked, but since the object is starting to fill up, it might make sense to use the summary function, which is inherited from oce.

```r
summary(w)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#>               Min.    Mean        Max.  Dim. OriginalName
#>     elevation -1.0054 -1.7484e-06 1.015 100  -
#>
#> * Processing Log
#>     - 2019-06-16 11:44:31 UTC: Create oce object
#>     - 2019-06-16 11:45:38 UTC: oceSetMetadata(object = w, name = "serialNumber", value = 1234)
#>     - 2019-06-16 11:45:38 UTC: oceSetData(object = w, name = "time", value = t)
#>     - 2019-06-16 11:45:38 UTC: oceSetData(object = w, name = "elevation", value = e)
```

This produces a useful summary not just of the data, but also of how the object was constructed. But we can do better. In some crazy world, someone might consider measuring elevation in feet, not metres, and so we ought to specify the unit. The way to do this is with the unit argument of oceSetData. This is a somewhat tricky argument, as a study of the result of ?oceSetData will reveal. For now, we just show a common way, without explanation, writing

```r
w <- oceSetData(w, "elevation", e, unit=list(unit=expression(m), scale=""))
```

This overrides the existing definition. Now, let’s look at the summary:
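The unit ends up in the units list of the metadata slot, keyed by the data name. A sketch of inspecting it, assuming oce is installed; the feet variant is a hypothetical illustration of the same argument format, with the values converted to match:

```r
library(oce)
setClass(Class="wave", contains="oce")
e <- sin(2 * pi * seq(0, 30, length.out=100) / 10)
w <- new("wave")
w <- oceSetData(w, "elevation", e, unit=list(unit=expression(m), scale=""))
str(w@metadata$units$elevation)  # a list holding $unit (an expression) and $scale
# Hypothetical: the same signal expressed in feet, with a matching unit
wf <- oceSetData(w, "elevation", e / 0.3048, unit=list(unit=expression(ft), scale=""))
wf@metadata$units$elevation$unit
```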

```r
summary(w)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#>                   Min.    Mean        Max.  Dim. OriginalName
#>     elevation [m] -1.0054 -1.7484e-06 1.015 100  -
#>
#> * Processing Log
#>     - 2019-06-16 11:44:31 UTC: Create oce object
#>     - 2019-06-16 11:45:38 UTC: oceSetMetadata(object = w, name = "serialNumber", value = 1234)
#>     - 2019-06-16 11:45:38 UTC: oceSetData(object = w, name = "time", value = t)
#>     - 2019-06-16 11:45:38 UTC: oceSetData(object = w, name = "elevation", value = e)
#>     - 2019-06-16 11:45:38 UTC: oceSetData(object = w, name = "elevation", value = e, unit = list(unit = expression(m),     scale = ""))
```

Notice that the elevation now has a unit, and also that the processing log records that elevation was set twice. This processing-log feature is one of the big advantages of using oceSetData over direct insertion into an object.
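Because the processing log is stored as a character vector in the processingLog slot, repeated definitions can also be found programmatically. A minimal sketch, assuming oce is installed; the pattern match relies on log entries having the deparsed-call form seen in the summary output:

```r
library(oce)
setClass(Class="wave", contains="oce")
e <- sin(2 * pi * seq(0, 30, length.out=100) / 10)
w <- new("wave")
w <- oceSetData(w, "elevation", e)
w <- oceSetData(w, "elevation", e, unit=list(unit=expression(m), scale=""))
entries <- w@processingLog$value
entries
# Count the log entries that set the elevation item
sum(grepl('name = "elevation"', entries, fixed=TRUE))
```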

The most common method to add to a new class is a plot method. Since plot is a built-in generic function, we do not subclass it; instead, we define a new method for it that applies to wave objects. The details of doing this are provided by ?setMethod. Again, studying the documentation for that function would be worthwhile, but the gist is provided by a simple example, e.g.

```r
setMethod(f="plot",
          signature=signature("wave"),
          definition=function(x, which=1, ...) {
              if (which == 1) {
                  plot(x[["time"]], x[["elevation"]], ...)
              } else if (which == 2) {
                  hist(x[["elevation"]], ...)
              } else {
                  stop("which must be 1 or 2")
              }
          })
```

Here, the signature argument tells R that plot() called with a wave object as its first argument ought to use the indicated function. That function takes two named arguments: the object to be plotted, and which, an indication of the desired plot type. Any further arguments are collected in ... and handed on to the underlying plotting call.
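Whether the method was registered, and what happens for an unsupported which, can both be checked directly. A minimal sketch, assuming oce is installed; the method body repeats the one above so the sketch runs on its own:

```r
library(oce)
setClass(Class="wave", contains="oce")
setMethod(f="plot",
          signature=signature("wave"),
          definition=function(x, which=1, ...) {
              if (which == 1) {
                  plot(x[["time"]], x[["elevation"]], ...)
              } else if (which == 2) {
                  hist(x[["elevation"]], ...)
              } else {
                  stop("which must be 1 or 2")
              }
          })
# A method is now registered for plot() with a wave first argument
existsMethod("plot", "wave")
w <- new("wave")
# An unsupported 'which' reaches the stop() branch and signals an error
res <- try(plot(w, which=3), silent=TRUE)
inherits(res, "try-error")
```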

For example, since which defaults to 1, we can get a time-series plot with

```r
plot(w)
```

Note the simplicity of this action. The user has no reason to state what kind of object this is, because R detects the type and dispatches to the specialized wave-plotting function. This may seem like a small thing in the present context, but imagine an analyst writing code to analyse a wide variety of data types: it is very convenient to have a simple function call that works for each.

Since the ... argument is passed to both plot() and hist(), graphical parameters can be supplied in the call. For example, a cleaner time-series plot might be created with

```r
plot(w, type="l", xlab="Time [s]", ylab="Elevation [m]")
```
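The same pass-through works for the histogram branch, where arguments such as breaks and main reach hist(). A sketch, assuming oce is installed; drawing goes to a null pdf device so nothing is displayed or written, and the method's return value (the histogram object from hist()) is captured to confirm the call went through:

```r
library(oce)
setClass(Class="wave", contains="oce")
setMethod(f="plot",
          signature=signature("wave"),
          definition=function(x, which=1, ...) {
              if (which == 1) {
                  plot(x[["time"]], x[["elevation"]], ...)
              } else if (which == 2) {
                  hist(x[["elevation"]], ...)
              } else {
                  stop("which must be 1 or 2")
              }
          })
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
e <- sin(2 * pi * as.numeric(t) / 10) + rnorm(length(t), sd=0.01)
w <- new("wave")
w <- oceSetData(w, "time", t)
w <- oceSetData(w, "elevation", e)
pdf(NULL)  # null device: no window and no file
# type, xlab and ylab travel through ... to plot()
plot(w, type="l", xlab="Time", ylab="Elevation [m]")
# breaks, main and xlab travel through ... to hist()
h <- plot(w, which=2, breaks=20, main="Elevation histogram", xlab="Elevation [m]")
invisible(dev.off())
sum(h$counts)  # every sample lands in some bin
```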

## 2.1 Initializing objects with data

The following lets the user specify time and elevation when the object is created. It also permits a specification of units, with metres being the default.

```r
setMethod(f="initialize",
          signature="wave",
          definition=function(.Object, time, elevation, units) {
              # 'units', if supplied, describes the elevation unit, e.g.
              # list(unit=expression(m), scale=""); metres are the default
              if (missing(units))
                  units <- list(unit=expression(m), scale="")
              .Object@metadata$units <- list(elevation=units)
              .Object@data$time <- if (missing(time)) NULL else time
              .Object@data$elevation <- if (missing(elevation)) NULL else elevation
              .Object@processingLog$time <- presentTime()
              .Object@processingLog$value <- "create 'wave' object"
              return(.Object)
          }
)
```

A test proves that this works as hoped for.

```r
ww <- new("wave", time=t, elevation=e)
summary(ww)
#> * Time ranges from 2019-01-01 to 2019-01-01 00:00:30 with 100 samples and mean increment 0.3030303 s
#> * Data
#>
#>                   Min.    Mean        Max.  Dim. OriginalName
#>     elevation [m] -1.0054 -1.7484e-06 1.015 100  -
#>
#> * Processing Log
#>     - 2019-06-16 14:45:38 UTC: create 'wave' object
```

Notice that the units now appear without any extra work by the user. Oh, and this object now knows that it is a wave object, not an oce object. Our little class is getting smarter by the minute!
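The units argument can be exercised in the same way. The sketch below assumes oce is installed and uses a self-contained variant of the initialize method in which units, if supplied, is taken to be the elevation unit (an assumption of this sketch); the feet conversion is purely illustrative:

```r
library(oce)
setClass(Class="wave", contains="oce")
# Variant initialize: 'units' (assumed to be the elevation unit) defaults to metres
setMethod(f="initialize",
          signature="wave",
          definition=function(.Object, time, elevation, units) {
              if (missing(units))
                  units <- list(unit=expression(m), scale="")
              .Object@metadata$units <- list(elevation=units)
              .Object@data$time <- if (missing(time)) NULL else time
              .Object@data$elevation <- if (missing(elevation)) NULL else elevation
              .Object@processingLog$time <- presentTime()
              .Object@processingLog$value <- "create 'wave' object"
              .Object
          })
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
e <- sin(2 * pi * as.numeric(t) / 10) + rnorm(length(t), sd=0.01)
wm <- new("wave", time=t, elevation=e)           # default unit: metres
wf <- new("wave", time=t, elevation=e / 0.3048,  # same signal in feet
          units=list(unit=expression(ft), scale=""))
wm@metadata$units$elevation$unit
wf@metadata$units$elevation$unit
```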

## 2.2 Specializing the [[ operator

As in the previous section, the key to specializing how [[ works is to use setMethod(), but this time the function is named "[[". Suppose we want to permit e.g. w[["peak"]] as a way to find the maximum elevation. This becomes a call to the "[[" function, with w as the first argument and "peak" as the second. We can handle this with:

```r
setMethod(f="[[",
          signature(x="wave", i="ANY", j="ANY"),
          definition=function(x, i, j, ...) {
              if (i == "peak") {
                  return(max(x[["elevation"]], na.rm=TRUE))
              } else {
                  callNextMethod()
              }
          }
)
```

(The details of the signature definition are explained in the documentation provided by ?setMethod, and readers ought to study that material before changing the signature definition.)

The important thing to focus on is the if block in the function definition. Calling e.g. w[["peak"]] sets i to "peak", so the return value will be the maximum elevation. In all other instances, however, the return value is provided by callNextMethod(), which digs one level deeper for a way to handle [[. At that deeper level, it finds the oce definition, the details of which can be found with ?"[[,oce-method".
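The same if/else pattern extends to further computed quantities. A sketch, assuming oce is installed; the item name waveHeight (the crest-to-trough range of elevation) is hypothetical and not part of oce. Using identical() rather than == guarantees that the test is a single TRUE or FALSE, so any other request simply falls through to callNextMethod():

```r
library(oce)
setClass(Class="wave", contains="oce")
setMethod(f="[[",
          signature(x="wave", i="ANY", j="ANY"),
          definition=function(x, i, j, ...) {
              if (identical(i, "peak")) {
                  max(x[["elevation"]], na.rm=TRUE)
              } else if (identical(i, "waveHeight")) {
                  # hypothetical item: crest-to-trough range of elevation
                  diff(range(x[["elevation"]], na.rm=TRUE))
              } else {
                  callNextMethod()  # defer to the oce method
              }
          })
t <- as.POSIXct("2019-01-01 00:00:00", tz="UTC") + seq(0, 30, length.out=100)
e <- sin(2 * pi * as.numeric(t) / 10) + rnorm(length(t), sd=0.01)
w <- new("wave")
w <- oceSetData(w, "time", t)
w <- oceSetData(w, "elevation", e)
w[["peak"]]
w[["waveHeight"]]
```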

The test

```r
w[["peak"]]
#> [1] 1.014971
str(w[["elevation"]])
#>  num [1:100] 0.00451 0.20649 0.37584 0.53922 0.70422 ...
```

verifies that our new code works for getting the peak value, and that it falls back to the oce code for other calls.