*In a previous post, I showed how to make a 3 panel chart displaying RSS temperature anomalies, El Nino- La Nina (NINO34 index) and volcanic event (SATO Index) time series data. While time series charts can give a visual impression of how time series are related, time series regression provides much more information, including leading – lagging relationships between the time series. *

*In this post, I analyze the RSS, NINO34 and SATO time series using a simple trend line and a multiple regression model with lag periods for the independent variables. The post demonstrates some of R ‘s regression analysis and data lagging capabilities. Links to the R script and data files are provided.
*

**Introduction**

The RSS time series chart shown in my previous post displays the RSS, NINO34 and SATO time series for the 1979-2008 period. From reviewing this chart we get the impression that there is a relationship between the NINO34 and SATO indexes and the RSS temperature anomaly. When the NINO34 index is high, RSS anomalies seem to rise, when NINO34 is low, RSS anomalies seem to fall. The 1998 period shows a high NINO34 and corresponding high RSS. The SATO index also seems to play a role, however, it is not as clear as NINO34 because of fewer dramatic rises in SATO index and the interplay between the NINO34 and SATO indexes and RSS.

What can we do to move beyond these visual impressions, can we make specific statements about the RSS time series, the role of NINI34 and SATO?

Lucia at the Blackboard blog regularly develops trend lines of global temperature series to calculate the rate of change in temperature series (degree C/ year). The trend line slope gives the rate of change per year. This is useful to get an overall sense of change and to compare comparable temperature time series. It does not provide specific information about the relationship between RSS and the NINO34 and SATO indexes.

Tamino at Open Mind demonstrated a multiple regression analysis of GISS temperature anomalies with time, El Nino-LaNina and volcanic activity in his Known Factors post . His multiple regression analysis provides information on the contribution that time, EL Nino – La Nina (MEI index) & volcanic events play in the temperature trend.

The combination of both simple trend analysis and multiple regression together provide considerably more information on the trends and the relationship between temperature and El Nino – La Nina & volcanic events than the time series plots by themselves.

**Time Series Regression**

I’ve developed several Excel trend analysis tools to help me analyze temperature time series. These tools are available on the Trend Analysis with Excel page, the most frequently visited page at my ProcessTrends.com site.

This post shows my first use of R’s regression analysis tools. This 2 panel chart shows the simple trend line and multiple regression plots for the RSS data.

The simple trend line regression has a slope of 0.0157 C per year and an adjusted r-squared of 0.382. The multiple regression shows that the RSS changes 0.0128 C per year and 0.128 per C change in NINO34 and -3.26 per SATO index change. The adjusted r-squared improves to 0.677 when the NINO34 and SATO indexes are included in the regression.

Since I used different time series than Tamino, my results are comparable but not exactly the same as his. I found the best lag periods for RSS, NINO34 and SATO to be -5 months for NINO34 and -3 months for SATO index versus Tamino’s 4 and 8 months for his MEI and volcanic parameters.

The take home messages:

- NINO34 and SATO indexes are good leading indicators for RSS temperture anomalies.
- R’s time series analysis and regression tools are powerful
- Showing regression results on time series chart significantly enhances the value of both the regression and the chart

**R Script Walkthrough**

*The R script is shown below in modules. I have extensively commented my code to make it easier for me to reuse it for other charts. The script file is here and the data file is here.*

**1: Setup: **This script provides introductory comments, defines the source data file link, specifies the plot parameters (par) and the plot text note font adjustment factor (t_cex). The par(mfrow=c(2,1)) specifies a multi-panel chart with 2 rows and 1 column.

# Time Series Regression of RSS Temp Anomaly Trend, SATO Index, NINO3.4 Index ## Monthly anomaly data ## D Kelly O'Day; http://processtrends.com; http://chartsgraphs.wordpress.com ## Feb 11-13, 2009 ########################################################## ## 1: Setup ########################################################## link <- "http://processtrends.com/R_Files/Consol_GISS_SATO_Nino34A_RSS.csv" options(digits=6) par(ps=9); par(cex.axis=c(0.85)); par(cex.lab=c(0.9));par(las="1") par(mfrow=c(2,1)) par(oma=c(3,3,4,2)); par(mar=c(0,5,0,0)); par(bty="n") t_cex = 0.9 # control plot text note font size

**2: Read Data: **This script opens the data file specified in the link (note it is an http:// link) and reads the source data into a raw_data data.frame. The column names are specified in the read.data() command.

Since the raw_data goes back to 1880 and we only want to analyze RSS data that starts in 1979, we subset() the data into a new data.frame (in_data) that we will use for our analysis.

########################################################## ## 2: Read Data ########################################################## raw_data <- read.csv(link, skip = 1, sep = ",", dec=".", row.names = NULL, header = FALSE, colClasses = rep("numeric",7), comment.char = "#", na.strings = c("*", "-",-99.9, -999.9), col.names = c("Yr", "Mo","Yr_frac", "GISS", "SATO", "NINO34", "RSS") ) in_data <- subset(raw_data, Yr >= 1979)

**3: Create time series variables:** In our analysis, we want to use NINO34 and SATO as independent variables and RSS as the dependent variable. Since we will be trying different lag periods for the independent variables, we create parameters n_lag and s_lag to allow us to vary the lags to see which set gives us the best fit based on manual review of resulting r squared.

We want to establish our 4 variables as time series objects so that we can take advantage of R’s time series object capabilities. We specify the start year and frequency (12 per year for monthly data) to create the time series objects for our variables.

We next use R’s lag() function to create the lagged NINO34 and SATO variables. Finally, we use R’s ts.union() function to generate m_d, our time series data set.

########################################################## ## 3: Create time series variables ########################################################## # Enter lag months for Nino34 and SATA n_lag <- -5 s_lag <- -3 # Try several lags to find maximum r2 ts_rss <- ts(RSS, start =c(1979,1), frequency = c(12)) ts_nino34 <- ts(NINO34,start = c(1979,1), frequency = c(12)) ts_sato <- ts(SATO, start = c(1979, 1), frequency = c(12)) ts_yr_frac <- ts(Yr_frac, start = c(1979,1), frequency = 12) ts_data <- data.frame(ts_rss, ts_nino34, ts_sato,ts_yr_frac) nino_l <- lag(ts_nino34, n_lag) sato_l <- lag(ts_sato, s_lag) m_d <-ts.union(ts_yr_frac, ts_rss, ts_nino34,nino_l, sato_l)

**4: Develop simple trend line regression model: **R has several liner model tools, including the lm() function. In this script I establish a trend_lm_fit object that includes the results of a least squares regression of ts_rss versus ts_yr_frac from data saved in m_d.

We can obtain the regression coefficients by assigning summary(trend_lm_fit) to trend_sum and then retrieving coef() [1] and [2] for a and b. We get the adjusted r squared by retrieving trend_sum$adj.r.squared.

########################################################## ## 4: Develop simple trend line regression model ########################################################## trend_lm_fit <- lm(ts_rss ~ ts_yr_frac, data = m_d) trend_sum <- summary(trend_lm_fit) # get tend line coefficients trend_a <- coef(trend_lm_fit)[1] trend_b <- coef(trend_lm_fit)[2] trend_r2 <- trend_sum$adj.r.squared

**5: Multiple Regression: **We use a similar approach for the multiple regression, we create a lm_fit object to hold the lm() function results for our multiple regression formula. We then retrieve the a, b, c and d coefficients by using coef(lm_fit)[1] for a, etc.

We retrieve the adjusted r squared the same way we did for the simple liner regression. Finally, we generate a multiple regression estimate (mr_est) by using the coefficients in our regression model.

########################################################## ## 5: Multiple Regression: RSS = a + b*yr_frac + c*NINO34_lag + d*SATO_lag ########################################################## lm_fit <- lm(ts_rss ~ ts_yr_frac + nino_l + sato_l, data = m_d) reg_sum <- summary(lm_fit) # get regression coefficients a <- coef(lm_fit)[1] b <- coef(lm_fit)[2] c <- coef(lm_fit)[3] d <- coef(lm_fit)[4] r2 <- reg_sum$adj.r.squared # produce multiple regression estimate mr_est <- a +b*ts_yr_frac + c*nino_l + d* sato_l

**6: Plot 1 – Simple Trend Line** We have previously specified that we want a 2 panel chart (par(mfrow=c(2,1)).

First, we develop a text string to document our actual regression formula with trend_f which retrieves the a and b coefficients (notice signif() controls number of digits displayed).

The plot command specifies the variable (ts_rss) and other plot parameters. I have adjusted the axes, added axis labels, the regression line and annotation text to fully document the regression results. Notice how I use the rect() function to mask the vertical grid lines in vicinity of annotation text.

Notice how I have added the vertical grid lines with the abline(v=c(1985, 1990, etc). By using this statement in both plots, we get nice vertical lines that help the reader navigate between both plots.

########################################################## ## 6: Plot 1: Simple Trend Line ########################################################## trend_f = paste("RSS = ", signif(trend_a, 3), " +", signif(trend_b,3), "*Yr_Frac") plot(ts_rss, col = "grey", xlim = c(1979, 2010), ylim = c(-1,1), xaxs="i", yaxs ="i", ylab = expression(paste("RSS Anomaly - ", degree, "C")), axes = FALSE) axis(1, at= FALSE, labels=FALSE, col = "grey") axis(2, labels = TRUE, col = "grey") axis(3, labels=FALSE, at = FALSE, col = "grey") abline(trend_lm_fit, col = "red") abline(v = c(1985, 1990, 1995, 2000,2005, 2010), col = "lightgrey") abline(h=1, col = "grey") abline(h=-1, col = "grey") rect(2001.,-0.5, 2006, -0.25, border = NA, col = "white") text(2008,-0.3, paste("adj r squared = ", signif(trend_r2,3)), cex = t_cex, pos = 2) rect( 1980, 0.7, 1989.5, 0.9, bor = NA, col = "white") text(1979, 0.8, "a) Simple Trend Line", col = "black", pos = 4,adj = 0) rect(1979.2, -0.9, 1992, -0.7, border = NA, col = "white") text(1982, -0.8, trend_f, col = "black", pos = 4, cex = 0.85, adj = 0) lines(c(1980, 1981.7), c(-0.8, -0.8), col = "red")

**7: Plot 2 – Multiple Regression:** The 2nd plot is comparable to the first, except that it displays the multiple regression results.

########################################################## ## 7: PLOT 2: Multiple Regression ########################################################## par(mar=c(1,5,0,0)) rss_f = paste("RSS = ", signif(a, 3), " +", signif(b,3), "*Yr_Frac + ",signif(c,3), "*NINO34 ", signif(d,3), "*SATO") plot(ts_rss, col = "grey", xlim = c(1979, 2010), ylim = c(-1, 1), xaxs="i", yaxs ="i", xlab = "", ylab = expression(paste("RSS Anomaly - ", degree, "C")), mgp = c(3,1,0)) abline(h=-1, col = "grey") axis(1, col = "grey", at= NULL, labels = TRUE, cex = 1) axis(2, col = "grey") abline(v = c(1985, 1990, 1995, 2000,2005, 2010), col = "lightgrey") points(mr_est, type ="l", col = "red") rect( 1980, 0.7, 1997, 0.9, bor = NA, col = "white") text(1979, 0.8, "b) Multiple Regression (Yr, NINO34, SATO)", pos = 4, adj = 0) rect(2000.2, -0.6, 2008, -0.2, border = NA, col = "white") text(2008,-0.3, paste("adj r squared = ", signif(r2,3)), cex = t_cex, pos = 2) text(2008,-0.4, paste("Nino34 lag ", n_lag, " months"), cex = t_cex, pos = 2) text(2008,-0.5, paste("SATO lag ", s_lag, " months"), cex = t_cex, pos = 2) rect(1979.2, -0.9, 2004, -0.7, border = NA, col = "white") text(1982, -0.8, rss_f, col = "black", pos = 4, adj = 0, cex = 0.85) lines(c(1980, 1981.7), c(-0.8, -0.8), col = "red")

**8: Overall Chart Title and footer**: I want the chart to be fully documented so that the chart title includes the month and year of last data range, and the footer includes the print data.

########################################################## ## 8: Overall Chart Title, footer ########################################################## # Calculate mo, yr & value for last readings nr <- nrow(ts_data) # Find number of data rows mo <- Mo[nr] # Find last month of data yr <- Yr[nr] # Find last year of data mo_names <- c("Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec") # Generate and add Title_1, Title_2 Title_1 <- "RSS Temperature Anomalies: a) Simple Trend Line, b) Multipe Regression" Title_2 <- paste("Monthly Data: Jan, 1979 through ", mo_names[mo], ",", yr) mtext(Title_1, 3,2, outer = TRUE) mtext(Title_2, 3,1, outer = TRUE, cex = 0.85) # Generate and add bottom footer KOD & System date notes mtext("D Kelly O'Day - http://chartgraphs.wordpress.com", 1,1, adj = 0, cex = 0.8, outer=TRUE) mtext(format(Sys.time(), "%m/%d/ %Y"), 1, 1, adj = 1, cex = 0.8, outer=TRUE) ##########################################################

That’s it.

What do you think?

Pingback: UAH Temperature Anomalies Following Predictable Pattern | Climate Charts & Graphs

Pingback: RClimate Script: NINO 3.4 SST Anomaly Trends « Climate Charts & Graphs

Pingback: Understanding the Science of CO2’s Role in Climate Change: 1 – Introduction « Climate Charts & Graphs

Pingback: Excel Chart Misrepresents CO2 – Temperature Relationship « Charts & Graphs with R

Pingback: Decadal Trend Rates in Global Temperature « Charts & Graphs

Interesting.

How would one analyze four data sets from 1992-2009?

Gistemp

HadCRU

RSS

UAH

The trends are very close to one another, but what does it mean?

Pingback: R Lets You Put Chart Inside Chart « Charts & Graphs