In this post I show how to use the R strucchange package to prepare a segmented regression of the annual GISS temperature anomaly data series.
Introduction
In this previous post, I showed how the monthly GISS temperature anomaly trend rates vary widely by decade. While decades make sense from a calendar standpoint, they are an arbitrary subdivision of the long term temperature trends from a climate standpoint.
In this post, I use the R package strucchange to evaluate the annual GISS data to find regime shift breakpoints and develop linear regression models of data subsets between the breakpoints.
Let’s take a look at the 1880 – 2008 trend chart.
The anomaly has risen in fits and starts over the 1880-2008 period. The 1880 – 1920 period seems rather flat with a jump in the early 1920s. There also seems to be a significant jump in the 1980s.
While this type of eyeball interpretation is quick, the stucchange package lets us find statistically significant changes in the mean or slope of our data.
strucchange Analysis of GISS Trends
The strucchange package has several functions to help in analyzing non-monotonic data series. The breakpoints() function finds the optimal breakpoins in the data series. For the annual GISS anomaly data, 1925 and 1986 are the optimal breakpoints points.
Here’s an segmented regression model GISS data using the 1925 and 1986 breakpoints. Notice the red confidence limits for the breakpoints and the vertical dashed lines for the estimated breakpoints.
R Script
Here’s the R script that retrieves the GISS data, determines the breakpoints, generates the trend plot and segmented linear models.
## Ann GISS Surface Temp Anomaly Breakpoint Analysis ########
### D Kelly O'Day, (7/20/09 -7/22/09) ###
###### Setup
library(ggplot2); library(strucchange)
rm(list=ls())
par(las=1); par(ps=10); par(oma=c(2,1,0,1)); par(mar=c(2,4,3,0))
link <-- "http://data.giss.nasa.gov/gistemp/graphs/Fig.A2.txt"
#### Read text file, create data.frame
url <- c("http://data.giss.nasa.gov/gistemp/graphs/Fig.A.txt")
file <- c("Fig.A.txt")
download.file(url, file)
## Determine the number of rows in a file, and exclude the last 1
x_rows <- length(readLines(file)) - 1
## Read file as char vector, one line per row, Exclude first 4 rows
lines <- readLines(file, n=x_rows)[5:x_rows]
## Convert the character vector to a dataframe
df <- read.table(
textConnection(lines), header=F, colClasses = "character")
closeAllConnections()
# We are only interested in first 2 columns
df <- df[,1:2]
names(df) <- c("yr", "anomaly")
# Convert all variables (columns) to numeric formatl
df <- colwise(as.numeric) (df)
GISS_a_ts <- ts(df$anomaly, start = 1880,freq = 1)
############ Working with breakpoints
bp.GISS <- breakpoints(GISS_a_ts ~1)
no_bp <- length(bp.GISS$breakpoints[])
## To get break dates
bd_GISS <- breakdates(bp.GISS)
## create vectors from[i] & to[i] for segment start & end pt
fr <- as.numeric(3)
to <- as.numeric(3)
fr[1] <- df$yr[1]
to[1] <- bd_GISS[1]
fr[2] <- bd_GISS[1] + 1
to[2] <- bd_GISS[2]
fr[3] <- bd_GISS[2] + 1
to[3] <- df$yr[length(df$yr)]
ts_yr <- as.vector(time(GISS_a_ts))
fm0 <- lm(GISS_a_ts ~ ts_yr)
overall_trend <- coef(fm0)[2]*100
fm2 <- lm(GISS_a_ts ~ breakfactor(bp.GISS, breaks = 2))
ci.GISS <- confint(bp.GISS, breaks = 2)
##### Generate Plot
plot(GISS_a_ts, main = "Annual GISS Land & Ocean Temperature Anomaly - C \n Segmented Regression",
xlab = "", ylab = "Annual Temperature Anomaly - C", col = "grey")
lines(ts(fitted(fm0), start = 1880), col ="blue")
#lines(ts(fitted(fm2), start = 1880), col = "red")
lines(ci.GISS, col = "red")
## Loop to subset into segments and prepare segment lm()
for (s in 1:3){
sub <- subset(df, df$yr > fr[s] & df$yr <=to[s])
dlm <- lm(sub$anomaly ~ sub$yr, data = sub)
#to get indvidual a & b factors for each segment
a <- coef(dlm)[1]
b <- coef(dlm)[2]
x_vals <- c(fr[s], to[s])
y_vals <- c(a+b*fr[s], a+b*to[s])
lines(x_vals, y_vals, col = "green")
note <- c(signif(b *100, 3))
note_x <- fr[s] + (to[s]-fr[s])/2
note_y <- -0.56
text(note_x, note_y, note, cex = 0.8, pos=4)
}
leg_x <- fr[1] + (to[3] - fr[1])/2
## Plot annotation
text(1930, -0.5, "Trend Rate - C/century", pos=4, cex = 0.75)
text(1885, 0.75, "Baseline: 1951-1980", cex = 0.75, pos = 4)
overall_tr_note <- paste("Overall Trend", signif(overall_trend,4), "C/century")
text(1930, 0.75, overall_tr_note, cex = 0.75, pos=4)
# Generate and add bottom footer KOD & System data notes
mtext("D Kelly O'Day - http://chartgraphs.wordpress.com", side=1,line=0, adj = 0, cex = 0.8, outer=TRUE)
mtext(format(Sys.time(), "%m/%d/ %Y"), side=1,line= 0, adj = 1, cex = 0.8, outer=TRUE)




6 responses so far ↓
Understanding the Science of CO2’s Role in Climate Change: 1 – Introduction « Climate Charts & Graphs // November 12, 2009 at 2:40 PM |
[...] regressions (link, link, [...]
Global Sea Surface Temperature Trends (1850-2009)/ « Charts & Graphs with R // September 4, 2009 at 9:04 AM |
[...] a number of land & ocean temperature anomaly trend charts in previous posts, including: here, here, here and here. These posts have dealt with atmospheric temperature anomalies. In this [...]
RE // August 14, 2009 at 2:16 PM |
Thanks for the fix. Works perfect; very, very helpful tutorial!
My apologies – You’re right there were only 74 lines; I think I pasted your script twice and completely miscounted.
RE // August 2, 2009 at 11:01 PM |
I am unable to run the subset segment loop (starting on line 107). Am I missing something?
Kelly // August 3, 2009 at 7:31 AM
Rodolfo:
Thanks for the feedback. Since my original R file only had 74 lines, I ignored your original comment.
When I got your 2nd comment, I checked my post script. I copied the script from my post and pasted it into Notepad and still only got 74 lines.
However, I reproduced your problem and found that I had corrupted the 1st command after the for loop. I’ve repaired the script so that it should now work for you. Please recopy the script to see if that resolves the issue.
If that doesn’t solve it, I’ll send you my original script file.
TodosLogos // July 28, 2009 at 1:53 PM |
Very useful tutorial!