Predictive analytics in manufacturing uses models extracted from historical order and production records to support the optimization of manufacturing processes. For instance, forecasting the overall amount of incoming orders is crucial for a) properly scheduling and sizing raw material and part orders and b) configuring processes such as final assembly or testing. Loosely based on a specific subproblem in one of our ongoing R&D projects, this how-to showcases the process of building an order amount predictor in R.
Predictive analytics for supporting manufacturing
Manufacturing is a field where predictive analytics, when applied properly, can have a significant impact on operational efficiency and risk. For instance, without trustworthy guidance on the expected amount of orders to be fulfilled in the near future, a significant stockpile of parts/raw materials and spare (process) capacities is usually necessary to compensate for the stochastic variance in the volume of incoming orders. In contrast, with good predictive models, existing capacities can be exploited better and material overhead can be reduced, despite the usually hectic nature of the order arrival process.
For almost a year, our department has been conducting a production modeling and optimization R&D project at a major hardware equipment manufacturer offering a rich variety of hardware configuration options to consumers. Experience shows that the order flow of the offerings is surprisingly seasonal and highly stochastic.
In such settings, accurate historical operational data covering most aspects of the manufacturing and supporting processes is typically rigorously recorded; our main challenge was (and is) extracting knowledge from the logs that can guide continuous improvement of operations and their associated risks. One aspect of this activity is building a forecasting approach for the amount of incoming orders with a weekly granularity.
In this article we present an approach that builds an order amount predictor based solely on historical order information; we exploit the fact that there are apparent (and hidden) regularities in the time series. This example is self-contained enough to show how "pure intelligence" (or, a bit less sonorously, the application of predictive analytics) can increase productivity and profitability at virtually no cost. (At least compared to the cost of excess or depleted inventory in our given industrial setting.)
Time series data
We will work on the weekly aggregates of the worldwide amount of orders, normalized into the [0,1] interval. The data is contained in a single data frame named “orderts” (available at http://www.mit.bme.hu/~ikocsis/rcontest/):
> head(orderts)
  year quarter week    orders
1 2009      Q1    1 0.1620370
2 2009      Q1    2 0.0350988
3 2009      Q1    3 0.1109399
4 2009      Q1    4 0.1491900
5 2009      Q1    5 0.1209814
6 2009      Q1    6 0.1044528
The values of the week and quarter columns are relative to the year.
Exploring the time series data
One of the major strengths of R is that it contains (or integrates with) a number of tools and packages that facilitate exploratory data analysis. It is certainly convenient that the data to be explored can be quickly transformed and various statistical tests can be performed on it in an interactive way; however, the strong support for visual data exploration has to be emphasized in its own right. For instance, one can use the iplots package or the integration with ggobi (through rggobi) for interactive multivariate data exploration. For exploring our univariate time series we will use the ggplot2 package. ggplot2 is based on a "grammar of graphics" that allows for composing various views of the data in an agile and quite rapid way.
First and foremost, let us compare the two full years 2009 and 2010 and the first quarter of 2011 (for an explanation of the ggplot2 calls, please refer to the package documentation or the cited references):
library(ggplot2)
qplot(week, orders, data = orderts, colour = as.factor(year), geom = "line")
It is apparent that there is considerable similarity between the two full years and the one full quarter we have data for; also, there seems to be strong periodicity across the quarters. This suggests an underlying structure in the data that can be used for forecasting. However, there seems to be a strange one-week difference between the apparent peaks of the two full years. A by-quarter histogram of the week values quickly uncovers the cause:
qplot(week, data=orderts, colour=as.factor(year), binwidth=0.5) + facet_wrap(~ quarter)
There was one more business week in the first quarter of 2009 than in the first quarters of 2010 and 2011; consequently, there is a consistent one-week difference between the last weeks of the quarters. Due to the sudden surge of orders, from the business point of view this is exactly the week that counts the most.
The model building approaches we will use later can usually cope with such a one-week offset; for quick and dirty data exploration, let us simply get rid of the 13th week of 2009 Q1. Also, to support visualization, we introduce a "week of current quarter" variable.
orderts2 <- cbind(orderts[-13,], weekinq=c(1:117))
prev <- orderts2[1,]
runvar <- 1
for(i in 2:nrow(orderts2)){
  current <- orderts2[i,]
  orderts2[i,"weekinq"] <- ifelse(prev$quarter == current$quarter, runvar+1, 1)
  runvar <- ifelse(prev$quarter == current$quarter, runvar+1, 1)
  prev <- current
}
rm(prev, current, runvar, i)
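The loop above simply restarts a counter whenever the quarter changes. As a language-neutral cross-check, the same reset logic can be sketched in Python (the quarter labels below are illustrative):

```python
from itertools import groupby

def week_in_quarter(quarters):
    """For each observation, return its 1-based position within its quarter run."""
    out = []
    for _, run in groupby(quarters):
        out.extend(range(1, sum(1 for _ in run) + 1))
    return out

# Quarters for six consecutive weeks: the counter restarts when Q2 begins.
print(week_in_quarter(["Q1", "Q1", "Q1", "Q1", "Q2", "Q2"]))
# [1, 2, 3, 4, 1, 2]
```

Consecutive years pose no problem: a year's Q4 is followed by the next year's Q1, so the label always changes at the boundary and the counter resets correctly.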
Let us compare the orders now on a quarter by quarter basis:
qplot(weekinq, orders, data = orderts2, colour = as.factor(year), geom = "line") +
facet_wrap(~quarter)
These plots tell us the following:
- At the very end of the quarter, there is always a significant spike in the orders; for Q3 and Q4, even the exact amounts are very close.
- During the quarters, there is also significant similarity between the orders.
- On the whole, the time series seems to be stationary and highly periodic; therefore, it should be worthwhile to analyse its characteristics in the frequency domain.
Frequency domain observations
Visual inspection hinted at a strong periodicity in the time series at the quarter, half-year and year intervals. To confirm this suspicion, we now perform a brief power spectrum analysis of the first two years of the data. 2011 Q1 is omitted here; we would like to look at full years to uncover possible yearly periodicity. Also, this analysis will lead us to the training data for the forecasting approach, and 2011 Q1 is kept back for testing purposes.
The following code performs the common Fast Fourier Transform (FFT), extends the results with a frequency index and plots the complex modulus against that index. Note that the zero-frequency component is omitted. Also, as the input signal is real-valued, the spectrum is symmetric in the remaining frequencies; therefore, only the lower half of the spectrum is plotted.
(A note from the authors: if you are not familiar with frequency domain sorcery, please do not feel intimidated; with a fair amount of simplification, all that happens is that we take a supposedly periodic, complicated, real valued signal and compute the magnitude and phase of its constituent frequencies. Please refer to Wikipedia, any standard textbook on signal processing or control theory, or the book "Time Series Analysis with Applications in R". If, however, you happen to be intimately familiar with frequency domain analysis, you are kindly asked to excuse us the lack of mathematical rigor in the following; it would have been out of place in an R howto.)
f <- data.frame(coef = fft(orderts2[1:104, "orders"]), freqindex = c(1:104))
qplot(freqindex, Mod(coef), data = f[2:53,], geom = "line")
This results in the following “power spectrum” plot that uses the complex modulus to measure “magnitude”:
It is apparent that a number of frequencies have significantly higher amplitude than the others, hinting at a strong underlying periodic nature in the data. Let’s identify these peaks:
f[Mod(f$coef) > 3 & f$freqindex < 53, "freqindex"] - 1
The result:
[1] 0 8 16 24 32 40 48
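This peak structure is easy to reproduce on synthetic data. A NumPy sketch (the signal below is made up for illustration: a fundamental with 8 cycles over 104 weekly samples plus one harmonic and a little noise):

```python
import numpy as np

n = 104                                # two years of weekly data
t = np.arange(n)
# Synthetic stand-in for the order series: 8 cycles per 104 weeks
# (one per quarter), its first harmonic, and some noise.
rng = np.random.default_rng(0)
signal = (np.sin(2*np.pi*8*t/n)
          + 0.4*np.sin(2*np.pi*16*t/n)
          + 0.05*rng.standard_normal(n))

spectrum = np.abs(np.fft.fft(signal))
# Strongest non-zero frequency in the lower half of the spectrum:
fundamental = np.argmax(spectrum[1:n//2]) + 1
print(fundamental)   # 8
```

A pure sinusoid with k cycles over n samples puts all its energy into bin k (and its mirror), which is exactly why the quarterly cycle shows up at index 8 here.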
What we see is that a component with a frequency of 8/104 (2 × 4 quarters in the two years) and its harmonics (note the difference of exactly 8 between the peaks) seem to dominate the signal. However, these frequencies alone are insufficient to capture the time series to the precision we would like. To demonstrate this, the following code segment eliminates the frequencies with "small magnitude" in a copy of the Fourier transform. The remaining ones are transformed back to the time domain (an inverse FFT call) to be compared with the original time series.
peaks <- Mod(f$coef) > 3
ffilt <- f
ffilt[!peaks, "coef"] <- 0
ffilt <- data.frame(index=ffilt$freqindex,
                    value=Re(fft(ffilt$coef, inverse=TRUE))/104,
                    type=rep("filtered", times=104))
ffilt <- rbind(ffilt, data.frame(index=seq(1:104),
                                 value=orderts2[1:104,"orders"],
                                 type=rep("original", times=104)))
Forecasting approach
The signal we deal with shows strong regularity but is at the same time highly complex (and decidedly nonlinear). Therefore, for forecasting purposes we split it into simpler periodic time series and train a neural network for the finite-time-window forecasting of each simplified component. The "splitting" is based on a non-overlapping partitioning of the frequency domain.
Our prediction approach is the following. "New" time series data is concatenated to the training set as it becomes available. This new, extended time series is again split with the same band-pass filtering used for the training data. The new set of simplified time series (whose elements are extended with the images of the new observations) is used to exercise the corresponding forecasting neural networks. The order forecast for the unsplit signal is obtained by summing the outputs of the component-forecasting neural networks.
Signal decomposition for training and forecasting
We split the frequency domain of the time series into intervals so that each interval contains either the fundamental frequency of the strong periodic signal or one of its harmonics. This is effectively a band-pass-filtering-based decomposition:
midindex <- ceiling((length(f$coef)-1)/2) + 1
peakind <- f[abs(f$coef) > 3 & f$freqindex > 1 & f$freqindex < midindex,]
lindex <- length(f$coef)
lowerind <- 1
subsignals <- lapply(c(peakind$freqindex, midindex+1), function(x){
  upperind <- x
  fsub <- f
  notnullind <- ((fsub$freqindex >= lowerind
                  & fsub$freqindex < upperind)
                 |
                 (fsub$freqindex > (lindex - upperind + 2)
                  & fsub$freqindex <= (lindex - lowerind + 2)))
  fsub[!notnullind, "coef"] <- 0
  lowerind <<- upperind
  Re(fft(fsub$coef, inverse=TRUE)/length(fsub$coef))
})
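Since the bands partition the spectrum without overlap, the sub-signals must sum back exactly to the original series (the Fourier transform is linear). A NumPy sketch of the same band-pass splitting, with band edges made up for illustration:

```python
import numpy as np

def bandpass_split(x, edges):
    """Split a real signal into time-domain components, one per frequency band.

    edges: ascending band boundaries over the lower half-spectrum; each band
    [lo, hi) is kept together with its conjugate-mirrored bins so that the
    inverse FFT of every component stays real."""
    n = len(x)
    coefs = np.fft.fft(x)
    parts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        keep = np.zeros(n, dtype=bool)
        keep[lo:hi] = True
        # mirrored bins: index k (k >= 1) mirrors to n - k
        keep[n - hi + 1 : n - max(lo, 1) + 1] = True
        parts.append(np.fft.ifft(np.where(keep, coefs, 0)).real)
    return parts

x = np.random.default_rng(1).standard_normal(104)
parts = bandpass_split(x, [0, 5, 13, 53])   # three bands up to the Nyquist bin
print(np.allclose(sum(parts), x))           # True
```

`np.allclose(sum(parts), x)` holding for any input confirms that no spectral content is lost or duplicated by the split, which is what makes summing the per-band forecasts legitimate later on.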
The code produces the time series signals belonging to the frequency bands defined by the harmonics. For the above two-year time series and harmonic set, this yields the following decomposition in the time domain:
library(grid)
grid.newpage()
pushViewport(viewport(layout=grid.layout(4,2)))
vplayout <- function(x,y)
  viewport(layout.pos.row = x, layout.pos.col = y)
psig <- function(x, y, z){
  h <- data.frame(index = c(1:length(subsignals[[x]])),
                  orders = subsignals[[x]])
  lab <- paste("Subseries ", as.character(x), sep="")
  print(qplot(index, orders, data = h, geom = "line", main=lab), vp = vplayout(y,z))
  TRUE
}
psig(1,1,1); psig(2,1,2); psig(3,2,1); psig(4,2,2); psig(5,3,1); psig(6,3,2); psig(7,4,1)
Neural network training
We use the “Multilayer Perceptron” (MLP) feedforward artificial neural network model to map a finite time window of order observations onto predictions of the orders that can be expected in the future. We preferred nonlinear modeling over linear predictors, since we assumed the latter could not capture the structure underlying our data. Some other notable advantages of MLP models are the following.
- Universal function approximation: theoretically, any function can be learned by an MLP.
- Only a small number of parameters (in our case, the number of neurons in the so-called "hidden layer") has to be chosen.
- Powerful support in R (via the nnet package).
We will have a number of neural networks, each one responsible for learning and predicting one frequency band of the original signal. We assume that "cutting up" the original signal along frequency bands results in subsignals that are fundamentally easier to predict. (Note that in the following, we work with a slightly different time series of 105 observations and the main frequency/harmonics 9, 17, 25, 33, 41, 49, 50 in the Fourier series.)
The number of hidden neurons was selected by trial and error. As the prediction of the decomposed signals is not a complicated task, simple networks, i.e. a few hidden neurons, suffice. The final neuron numbers were selected as follows:
# number of hidden neurons per component network
nn.sizes <- c(4,2,3,3,3,2,2,2)
One additional parameter has to be chosen: the size of the time window used for prediction. This is not used directly by the networks, as we use static networks; rather, it determines how the time series has to be transformed to serve as network input. The width of the time window was chosen to be 4 for all time series. This means we have to bring our data into a form amenable to neural network training: as the neural networks learn 4-tuples of observations (x(t), x(t-1), x(t-2), x(t-3)), we need multiple time-shifted versions of each component signal ready:
numofsubs <- length(subsignals)
twindow <- 4
offsettedsubdfs <- lapply(1:numofsubs, function(x){
  singleoffsets <- lapply(0:(twindow-1), function(y){
    subsignals[[x]][(twindow-y):(length(subsignals[[x]])-y-1)]
  })
  a <- Reduce(cbind, singleoffsets)
  names <- lapply(1:twindow, function(y){
    paste("TS", as.character(x), "_", as.character(y), sep = "")
  })
  b <- as.data.frame(a)
  colnames(b) <- names
  b
})
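The same sliding-window embedding can be sketched in NumPy: each row pairs `width` consecutive observations with the value that immediately follows them (function and variable names are illustrative):

```python
import numpy as np

def windowed(series, width=4):
    """Return (inputs, targets): rows of `width` consecutive values, and
    for each row the observation that immediately follows the window."""
    x = np.asarray(series, dtype=float)
    rows = len(x) - width
    inputs = np.stack([x[i:i + width] for i in range(rows)])
    targets = x[width:]
    return inputs, targets

X, y = windowed([1, 2, 3, 4, 5, 6], width=4)
print(X.tolist())   # [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
print(y.tolist())   # [5.0, 6.0]
```

A series of length n thus yields n - width training samples, matching the R code above, where each component of length 105 produces 101 rows.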
For later use we calculate the number of samples in our dataset.
sample.number <- length(offsettedsubdfs[[1]][,1])
After this, we use the input to actually train the neural networks.
# the neural networks
library(nnet)
nns <- lapply(1:length(offsettedsubdfs), function(i)
{
  nn <- nnet(offsettedsubdfs[[i]][1:sample.number,],   # the training samples
             subsignals[[i]][(twindow+1):length(subsignals[[i]])],  # the targets
                             # corresponding to the training samples
             size=nn.sizes[i],  # number of hidden neurons
             maxit=1000,        # maximum number of iterations
             linout=TRUE)       # the neuron in the output layer should be linear
  # plot the fit of the trained network
  plot(subsignals[[i]][(twindow+1):length(subsignals[[i]])], type="l")
  lines(nn$fitted.values, type="l", col="red")
  nn
})
The results of the neural network training for the different decomposed signals can be seen in the figure below. The red line is the response of the network, and the black line is the original time series. The training data is fitted well, and the response of the neural networks at the training samples is accurate: the MSE (mean squared error) on the training data is smaller than 10^(-5) for all components.
Now we have a set of neural network predictors, each of which has learned a "part" of the original signal, and quite well at that. As the Fourier transform is a linear operation, the sum of the individual predictions gives back the prediction for the full time series.
Forecasting
At this point we have a composite predictor that is able to predict the order amount for the first week of 2011. Actually, it is able to predict any week of the year, provided that the observations for the previous weeks are available. We simply have to augment the historical data, split it into component time series for the individual neural networks based on the earlier defined frequency bands, exercise the individual networks with the new component time series, and sum up the prediction results. This is effectively a short-term prediction of the order time series.
We are also able to produce mid- and long-term predictions. In this case we predict one week, use the result of the prediction to augment the historical data, and then exercise the predictor with this new input again. Theoretically, this feedback strategy can be used to predict over arbitrarily long horizons.
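The feedback strategy itself is model-agnostic; a minimal Python sketch with a stand-in predictor (`predict_next` is a placeholder for any fitted model taking the last 4 observations):

```python
def forecast(history, predict_next, steps):
    """Iterated one-step-ahead forecasting: each prediction is appended to
    the history and fed back in as if it were an observation."""
    window = list(history)
    out = []
    for _ in range(steps):
        nxt = predict_next(window[-4:])   # 4-wide input window
        out.append(nxt)
        window.append(nxt)
    return out

# Stand-in predictor: continues a linear trend from the last two points.
trend = lambda w: 2*w[-1] - w[-2]
print(forecast([1, 2, 3, 4], trend, steps=3))   # [5, 6, 7]
```

Note that prediction errors compound under this scheme: each step consumes earlier predictions as inputs, which is why accuracy degrades with the horizon.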
We mentioned earlier that we used only the first 8 quarters as training data. Now we will use the data of the last, ninth quarter to check how well we can predict over a one-quarter horizon:
number.of.predict <- 14
# long term prediction
long.predictions <- lapply(1:length(offsettedsubdfs), function(i)
{
  prediction <- vector(length=number.of.predict, mode="numeric")
  # initial input
  input <- offsettedsubdfs[[i]][sample.number,]
  for (j in 1:number.of.predict)
  {
    prediction[j] <- predict(nns[[i]], input)
    input <- c(prediction[j], input[1:(length(input)-1)])
  }
  # plot the fitted values together with the prediction
  plot(c(nns[[i]]$fitted.values, prediction), type="l", col="red")
  lines(subsignals[[i]][(twindow+1):length(subsignals[[i]])])
  prediction
})
The forecasts continue from the training samples; the corresponding figure can be seen below. The results of the component predictions are plausible. However, the most interesting part is the prediction of the original time series, for which we have to sum up the signals shown in the figure above. This is presented in the next figure.
Here the error is calculated as the RMSE (root mean square error) over the predicted time series; its value is 0.08.
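For reference, the error measure is the plain root mean square deviation between predicted and observed values; a minimal Python sketch:

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p)**2 for a, p in zip(actual, predicted))
                     / len(actual))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # ≈ 1.1547
```

Since the order series is normalized into [0,1], an RMSE of 0.08 corresponds to roughly 8% of the full order-volume range per week.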
Beside comparing the original and predicted data, we may also want to know what each individual neural network should have predicted, i.e., which component forecasts would have resulted in a perfect prediction. This can be examined, or at least approximated, with the following method. First we plot the decomposed signals calculated from the first 8 quarters, then the signals as fitted and predicted by the networks, and finally the decomposed signals as if we had used the full 9 quarters. This can be seen in the figure below.
It is easy to see that the periodic signals were predicted more accurately, while the signals with a dominant trend, the first and the second, had the highest errors. The training signals are black, the fitted and predicted signals are red, and blue marks the signals that would have corresponded to a perfect prediction.
Application in the industrial context
The predictors extracted from historical data can effectively support production scheduling and smooth the production processes. At the beginning of a quarter, production can generate assets according to the predicted volume and composition of the features in the orders. As the number of incoming orders increases over time, priority can be given to the definite orders, and the slack production capacity can be used to compensate for the difference between the predicted and the already known order volume. In the final phase of a quarter, only actual orders are processed. This simple logic fits well into any optimization approach by assigning a gradually decreasing importance (weight) to the predictor. From the production point of view, the impact is that pre-manufacturing generates enough assets matching the composition of the later orders to fulfill them from stock during the peak load at the end of the quarter. This way, the utilization of manufacturing capacities can remain at a nearly constant level.
Brief outlook: first steps in mining multivariate supporting data
However nice a "technical" approach we devise for prediction, a true strategic planner will most probably demand more. What is the cause of the spikes? Can and should the prediction problem be structurally decomposed, e.g. geographically? If we want to actually explain the periodicity and its characteristics, the whole world is not only enough, but too much: is there maybe somewhere a "model country" that in itself predicts global orders? As a brief outlook concluding the article, we show a way of getting started on these questions.
Consider the following order time series for individual regions. (Speaking of pars pro toto modeling, using sector names of the planet Trantor from Asimov's Foundation books seems appropriate.)
It is apparent that Mycogen and the Imperial sector most probably contribute only a very small share of the overall amount. However, this does not mean that they could not be "model economies"; actually, we would be very lucky if we could extrapolate to the whole world from the behavior of sectors with such minuscule contributions. (The underlying economics can be expected to be much simpler.) How does the correlation matrix of the time series, treated as variables, look?
> cor(sectors)
week Dahl Imperial Streeling Mycogen Wye all
week 1.00000000 0.21147568 0.0147790996 0.01035854 0.0586567040 0.01822436 0.048532314
Dahl 0.21147568 1.00000000 0.5177571586 0.65752677 0.0826777918 0.38762459 0.749961161
Imperial 0.01477910 0.51775716 1.0000000000 0.62612006 0.0003132273 0.43375312 0.665914883
Streeling 0.01035854 0.65752677 0.6261200602 1.00000000 0.1348356653 0.70499846 0.965086630
Mycogen 0.05865670 0.08267779 0.0003132273 0.13483567 1.0000000000 0.09789273 0.001716141
Wye 0.01822436 0.38762459 0.4337531246 0.70499846 0.0978927252 1.00000000 0.794789408
all 0.04853231 0.74996116 0.6659148831 0.96508663 0.0017161411 0.79478941 1.000000000
What we see is that Streeling has a correlation coefficient of 0.965 with the overall amount. But is that not simply an effect of Streeling producing almost all the orders itself? Not really:
Although there is certainly scatter around the fitted curve (obtained with local regression, specifically a ggplot2 "stat_smooth" statistical overlay), in most cases it is not really significant. It seems we should look much deeper into the order process of sector Streeling; first, perhaps, with further geographic refinement.
What would you guess, which "Old World" country do we use as a model market?
Concluding remarks
Following a more general trend, exploratory data analysis and predictive analytics have been very hot topics in manufacturing planning for a while now. And rightfully so: as the presented examples have shown, although they are certainly not omnipotent (we can only take "educated guesses" at the future), using them intelligently can have an immense impact on many aspects of production, from inventory management to strategic planning.
R is certainly not the only option; we could have implemented the workflow that inspired this how-to in a number of other statistical/engineering environments. Actually, we did. What we found is that even if we disregard the cost factor (R being free; most other mature statistical environments, not so much), R has such a potent and well-integrated set of capabilities, built-in or readily available in packages, that switching to it sped up data exploration and algorithmic development considerably. (And here we did not even touch multivariate data exploration and analysis.)
Speaking of workflows: one apparent weakness of R is that when you get down to real work, the "workflow" (extract data, transform, load, transform again, visualize, check properties in the frequency domain, determine frequency bins, and so on) quickly becomes hard to follow. Also, as the length of your workflow increases, a certain amount of rigor is needed to keep it restartable at intermediate points. Scripting being what it is, this is definitely not an R-specific weakness; the same would hold for bash.
Luckily, there already are tools that can (visually) model and automate data processing workflows; and even in that area, the ability to integrate R seems to be becoming a must. Currently we are looking at KNIME, the Konstanz Information Miner. But that is a topic for another how-to.
References
- Home page of the ggplot2 package: http://had.co.nz/ggplot2/
- Another great reference on ggplot2: Hadley Wickham: "ggplot2: Elegant Graphics for Data Analysis", Springer, 2009.
- Neural network forecasting: http://www.neural-forecasting.com/
- Ajoy K. Palit, Dobrivoje Popovic: "Computational Intelligence in Time Series Forecasting", 1st edition, Springer, 2005.
- R. Rojas: "Neural Networks: A Systematic Introduction", Springer-Verlag, Berlin, 1996.
- A very good statistical textbook from Springer: J.D. Cryer, K.S. Chan: "Time Series Analysis with Applications in R", Springer, 2008.
- KNIME product page: http://www.knime.org/
Principal investigator: Prof. András Pataricza (pataric@mit.bme.hu)
Contact person: Imre Kocsis (ikocsis@mit.bme.hu)