Clustering Time Series

Visualising clusters (or hierarchies) of Time Series Data

6th June 2014 by Ian Hansel


This post shows you how to visualise multiple time series in R using ggplot2. It was inspired by Rob J Hyndman's blog post on time series, Measuring Time Series Characteristics (it's a great blog that's worth checking out). In particular, the last line of the article says of the visualisation below: "the dendrogram was generated in matlab, although that could now also be done in R using the ggdendro package for example."



I couldn't find any examples of this in R, so I decided to use ggplot2 + ggdendro + gridExtra to make the following.


Here's the code used to generate the plots - note that the clustering is not done here like the paper describes (that's another post). This is just to show you how to plot the time series in groupings in R.


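# Install Quandl from GitHub (only needed once), then load the necessary packages
# Current devtools takes a single 'user/repo' string
library(devtools)
install_github('quandl/R-package')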
library(Quandl)
library(ggplot2)
library(gridExtra)
library(ggdendro)
library(zoo)

# Get some data from QUANDL
amazon <- Quandl('GOOG/NASDAQ_AMZN', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
apple <- Quandl('GOOG/NASDAQ_AAPL', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
google <- Quandl('GOOG/NASDAQ_GOOG', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
qlikview <- Quandl('GOOG/NASDAQ_QLIK', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
sap <- Quandl('GOOG/NYSE_SAP', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
walmart <- Quandl('GOOG/NYSE_WMT', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
sears <- Quandl('GOOG/NASDAQ_SHLD', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')
jcpenny <- Quandl('GOOG/NYSE_JCP', start_date="2013-05-01", end_date='2014-05-01', collapse='weekly', type='zoo')

# Plot the time series
plot(amazon)
# Merge and plot the time series (just the closing price)
joined_ts <- cbind(amazon[,4], apple[,4], google[,4], qlikview[,4], sap[,4], walmart[,4], sears[,4], jcpenny[,4])
names(joined_ts) <- c('amazon', 'apple', 'google', 'qlikview', 'sap', 'walmart', 'sears', 'jcpenny')
plot(joined_ts)

# Scale the time series and plot
maxs <- apply(joined_ts, 2, max)
mins <- apply(joined_ts, 2, min)
joined_ts_scales <- scale(joined_ts, center = mins, scale = maxs - mins)
plot(joined_ts_scales)

# Sparklines & Dendrograms
# Transpose so each series is a row, then cluster with average linkage
hc <- hclust(dist(t(joined_ts_scales)), method = "average")
colours_hc <- cutree(hc, h=2) # colour the tree at different levels by changing the h value

### Plot
hcdata <- dendro_data(hc)
names_order <- hcdata$labels$label
# Use the following to remove the labels from the dendrogram so they don't double up - keep them when checking
hcdata$labels$label <- ''
p1 <- ggdendrogram(hcdata, rotate=TRUE, leaf_labels=FALSE)

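new_data <- joined_ts_scales[,rev(as.character(names_order))]
# Index the cluster ids by column name so they follow the reordered series,
# and repeat each id once per observation instead of hard-coding 53
p2 <- autoplot(new_data, facets = Series ~ . ) + 
  aes(colour=as.character(rep(colours_hc[colnames(new_data)], each=nrow(new_data))), linetype = NULL) +
  xlab('') + ylab('') + theme(legend.position="none")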

gp1<-ggplotGrob(p1)
gp2<-ggplotGrob(p2) 


grid.arrange(gp2, gp1, ncol=2, widths=c(4,2))
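
If you can't pull the Quandl data, here is a minimal sketch of the scaling and clustering steps on simulated data, using base R only. The series names and values are made up for illustration, and I cut the tree with k (a number of groups) instead of h (a height), which is a swap from the code above.

```r
# Simulated stand-in for the merged closing prices: 53 weekly rows, 4 series.
# All names and values here are invented for illustration.
set.seed(42)
n_weeks <- 53
trend <- seq(0, 1, length.out = n_weeks)
prices <- cbind(
  up_a   = 100 + 50 * trend + rnorm(n_weeks, sd = 2),
  up_b   = 20 + 10 * trend + rnorm(n_weeks, sd = 0.4),
  down_a = 80 - 30 * trend + rnorm(n_weeks, sd = 1.5),
  down_b = 10 - 4 * trend + rnorm(n_weeks, sd = 0.2)
)

# Min-max scale each column to [0, 1], same idea as in the post
mins <- apply(prices, 2, min)
maxs <- apply(prices, 2, max)
scaled <- scale(prices, center = mins, scale = maxs - mins)

# One row per series, then average-linkage clustering on Euclidean distance
hc <- hclust(dist(t(scaled)), method = "average")

# Cut into a fixed number of groups (k) rather than at a height (h)
groups <- cutree(hc, k = 2)
print(range(scaled))
print(groups)
```

Scaling first means the clustering compares the shape of each series rather than its price level, so the two rising series should land in one group and the two falling series in the other.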

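One detail that's easy to get wrong is lining up the cluster ids with the panels once the columns have been reordered to match the dendrogram. The sketch below shows the idea with a hypothetical named vector like the one cutree() returns. It assumes the plotting data is stacked one series after another in column order, which is my understanding of what autoplot() does for zoo objects.

```r
# Hypothetical cluster ids as a named vector, like cutree() returns
cluster_id <- c(amazon = 1, apple = 1, walmart = 2, sears = 2)

# Suppose the dendrogram order puts the panels in this column order
panel_order <- c("sears", "amazon", "walmart", "apple")
n_obs <- 3  # rows per series; use nrow() of the real data in practice

# Index by name first so each id follows its series, then repeat per row
row_colour <- rep(as.character(cluster_id[panel_order]), each = n_obs)

# Stacked layout: all rows of the first series, then the second, and so on
long <- data.frame(
  Series = rep(panel_order, each = n_obs),
  colour = row_colour,
  stringsAsFactors = FALSE
)
print(long)
```

Indexing by name before repeating keeps each colour attached to its own series whatever order the dendrogram puts them in.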

Hope you've enjoyed reading. If you have any suggestions, please comment below or message me on Twitter.

If you're interested in learning more about R or Statistics and are in Sydney, I recommend these courses or checking out the Sydney R user group.


Cheers,

Ian Hansel


