Cypripedium candidum raw MPMs

Richard P. Shefferson, Shun Kurokawa, and Johan Ehrlén

This document was built in Markdown in R 4.0.3, and covers package lefko3 version 3.1.0.



In this vignette, we will focus on a demographic dataset for a North American population of the white lady’s slipper, Cypripedium candidum. This species is an herbaceous perennial in the orchid family, and is very long-lived. It is also of conservation concern, and the population is located within a state nature preserve located in northeastern Illinois, USA. The population was monitored annually from 2004 to 2009, with two monitoring sessions per year. More information about this population and its characteristics is given in Shefferson et al. (2001) and Shefferson et al. (2017).

Population matrix projection modeling requires an appropriate life history model showing how all stages and transitions are related. The figure below shows a very general life history model detailing these relationships in Cypripedium candidum. The first stage of life is a dormant seed stage, although an individual may germinate in the year following seed production. The first germinated stage is a protocorm, which is an underground, mycoheterotrophic stage unique to the families Orchidaceae and Pyrolaceae. There are three years of protocorm stages, followed by a seedling stage, and finally a set of stages that comprise the size-classified adult portion of life. The figure shows 49 such stages, each for a different number of stems (including 0 for vegetative dormancy) and one of two reproductive statuses. These stages may be compressed for different circumstances (more on this later).

Figure 1. Life history model of Cypripedium candidum.

We can see a variety of transitions within this figure. The juvenile stages have fairly simple transitions. New recruits may enter the population directly from germination of a seed produced the previous year, in which case they start in the protocorm 1 stage, or they may begin as dormant seed. Dormant seed may remain dormant, die, or germinate into the protocorm 1 stage. Protocorms exist for up to 3 years, yielding the protocorm 1, 2, and 3 stages, without any possibility of staying within each of these stages for more than a single year. Protocorm 3 leads to a seedling stage, in which the plant may persist for many years before becoming mature. Here, maturity does not really refer to reproduction per se, but rather to a morphology indistinguishable from a reproductive plant except for the lack of a flower. The first mature stage is usually either vegetative dormancy (dorm), during which time the plant does not sprout, or a small, non-flowering adult (1V). Once in this portion of the life history, the plant may transition among 49 mature stages, including vegetative dormancy, 1-24 shoots without flowers, or 1-24 shoots with at least one flower.

The horizontal dataset cypdata, and the ahistorical vertical dataset cypvert which is the same as cypdata but is structured differently, both include only data for the adult stages, and so later we will need to set juvenile transitions to constants.


We will analyze these data in two different ways to illustrate the utility of package lefko3:

  1. through the estimation of raw MPMs using a simplified life history; and

  2. through the estimation of function-based MPMs using a count-based size metric and the general life history model shown above.

We will not estimate an IPM because size is measured as a count variable in this case.

In this vignette, we will focus on analysis (1).

Analysis 1. Raw matrix estimation

In this example, we will create raw matrices with these data. Here, we use the term ‘raw’ to refer to the fact that we will estimate matrix elements as exact proportions of individuals surviving and transitioning to different stages. This requires us to develop a life history model that is both biologically realistic, statistically meaningful, and parsimonious. The first requirement means that stages need to be defined in biologically meaningful ways. The second requirement means that stages should correlate strongly with the underlying demography. The final requirement means that we need to design our life stages in such a way that most years include some individuals in each stage. We also need to consider the fact that very low numbers of stages appear to result in biased matrix analyses, so we want to make sure that we have at least 7 stages in the final model (Salguero-Gómez and Plotkin 2010).

Steps 1 and 2a. Life history model development, and horizontal dataset organization

First let’s wipe the memory, load lefko3, and then load the data.



The dataset that we have provided is organized in horizontal format, meaning that rows correspond to unique individuals and columns correspond to stage in particular years. Looking at the original Excel spreadsheet (below), you will note a repeating pattern in the names of the columns. Package lefko3 includes functions to handle data in horizontal format, as well as functions to handle vertically formatted data (i.e. data for individuals is broken up across rows, where each row is a unique combination of individual and year in time t).

Figure 2. Organization of the Cypripedium dataset, as viewed in Microsoft Excel.

In this dataset, there are 77 individuals, so there are 77 rows with data (not counting the header). There are 27 columns. Note that the first 3 columns are variables giving identifying information about each individual, with each individual’s data entirely restricted to one row. This is followed by a number of sets of 4 columns, each named Inf2.XX, Inf.XX, Veg.XX, and Pod.XX. The XX in each case corresponds to a specific year, which are organized consecutively. Thus, columns 4-7 refer to year 04 (short for 2004), columns 8-11 refer to year 05, columns 12-15 refer to year 06, columns 16-19 refer to year 07, columns 20-23 refer to year 08, and columns 24-27 refer to year 09. To properly conduct this exercise, we need to know the exact number of years used, which is six years here (includes all years from 2004 to 2009). Note that each year MUST utilize exactly the same number and pattern of columns.

Now we will move on to the assessment of size. The full sizes of individuals are actually the sums of columns (representing sprouts) within years. We will take these sums, and then assess the distribution of individual sizes across years. We will look at all years and look for general patterns and abnormalities.

size.04 <- cypdata$Inf2.04 + cypdata$Inf.04 + cypdata$Veg.04
size.05 <- cypdata$Inf2.05 + cypdata$Inf.05 + cypdata$Veg.05
size.06 <- cypdata$Inf2.06 + cypdata$Inf.06 + cypdata$Veg.06
size.07 <- cypdata$Inf2.07 + cypdata$Inf.07 + cypdata$Veg.07
size.08 <- cypdata$Inf2.08 + cypdata$Inf.08 + cypdata$Veg.08
size.09 <- cypdata$Inf2.09 + cypdata$Inf.09 + cypdata$Veg.09

summary(c(size.04, size.05, size.06, size.07, size.08, size.09))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>   1.000   1.000   2.000   3.581   5.000  24.000      97

The minimum size noted is 1, while the maximum is 24. There are 97 NAs, which includes cases in which plants were not alive as well as cases in which plants were vegetatively dormant. In the latter case, the individual is alive but not observable, which can be interpreted as an aboveground size of 0. Let’s quickly plot the size distribution of sprouting individuals.

plot(density(c(size.04, size.05, size.06, size.07, size.08, size.09), na.rm = TRUE), 
     main = "", xlab = "Size (# of sprouts)", bty = "n")

Figure 21. Density plot of plant size

This exercise gives us a reasonable idea of size classes to use for adult stages. We will have a dormant class (size = 0 shoots), extra small class (1 shoot), small class (2-3 shoots), medium class (4-5 shoots), large class (6-10 shoots), and extra large class (>10 shoots). Let’s define a stageframe summarizing this.

sizevector <- c(0, 0, 0, 0, 0, 0, 1, 2.5, 4.5, 8, 17.5)
stagevector <- c("SD", "P1", "P2", "P3", "SL", "D", "XSm", "Sm", "Md", "Lg", "XLg")
repvector <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
obsvector <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
immvector <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
indataset <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 0, 0, 0, 0, 0.5, 0.5, 1, 1, 2.5, 7)

cypframe_raw <- sf_create(sizes = sizevector, stagenames = stagevector, 
                      repstatus = repvector, obsstatus = obsvector, 
                      matstatus = matvector, propstatus = propvector, 
                      immstatus = immvector, indataset = indataset, 
                      binhalfwidth = binvec)

Now we will add some comments to the stageframe for our later use in interpretation.

cypframe_raw$comments[(cypframe_raw$stagenames == "SD")] <- "Dormant seed"
cypframe_raw$comments[(cypframe_raw$stagenames == "P1")] <- "1st yr protocorm"
cypframe_raw$comments[(cypframe_raw$stagenames == "P2")] <- "2nd yr protocorm"
cypframe_raw$comments[(cypframe_raw$stagenames == "P3")] <- "3rd yr protocorm"
cypframe_raw$comments[(cypframe_raw$stagenames == "SL")] <- "Seedling"
cypframe_raw$comments[(cypframe_raw$stagenames == "D")] <- "Dormant adult"
cypframe_raw$comments[(cypframe_raw$stagenames == "XSm")] <- "Extra small adult (1 shoot)"
cypframe_raw$comments[(cypframe_raw$stagenames == "Sm")] <- "Small adult (2-3 shoots)"
cypframe_raw$comments[(cypframe_raw$stagenames == "Md")] <- "Medium adult (4-5 shoots)"
cypframe_raw$comments[(cypframe_raw$stagenames == "Lg")] <- "Large adult (6-10 shoots)"
cypframe_raw$comments[(cypframe_raw$stagenames == "XLg")] <- "Extra large adult (>10 shoots)"

Type cypframe_raw at the R prompt to see what this structure looks like.

Next we will create the vertical dataset. Because we are lumping reproductive and non-reproductive individuals into the non-dormant adult classes, we need to set NRasRep = TRUE. Otherwise, verticalize3() will attempt to use the reproductive status of individuals in classification, and will fail due to the presence of non-reproductive adults. We also need to set NAas0 = TRUE to make sure that NA values in size are turned into 0 entries where necessary, and so aid in the assignment of the vegetative dormancy stage. After this runs, type summary(cypraw_v1) to get a summary of the new dataset.

cypraw_v1 <- verticalize3(data = cypdata, noyears = 6, firstyear = 2004, 
                         patchidcol = "patch", individcol = "plantid", 
                         blocksize = 4, sizeacol = "Inf2.04", sizebcol = "Inf.04", 
                         sizeccol = "Veg.04", repstracol = "Inf.04", 
                         repstrbcol = "Inf2.04", fecacol = "Pod.04", 
                         stageassign = cypframe_raw, stagesize = "sizeadded", 
                         NAas0 = TRUE, NRasRep = TRUE)

Step 2b. Vertical dataset organization

We may also wish to see how to proceed if our original dataset is already in vertical, but ahistorical, format. This package also includes dataset cypvert, which is the same dataset as cypdata but set in ahistorical vertical format. The structure of the dataset can be seen below. Note that individual histories are split across multiple rows.