Step Charts: R is Easier Than Excel

In this post, I show how to make a step chart with R. The chart also includes a lowess smoother and annotations. Readers can visit my ProcessTrends.com site to see how to make a step chart in Excel.

R’s Step Chart Capability

Here’s an R step chart that I made using R’s built-in step chart capabilities. It shows global temperature anomalies from 1880 to 2007, as calculated by NASA – GISS.

[Figure: step_chart1]

Why Use a Step Chart? 

Line charts connect two data points with a single straight line, while step charts connect the same two points with a horizontal and a vertical line, in stair-step fashion. Since our temperature chart shows annual average global temperature anomalies, a step chart is more appropriate than a line chart because each calculated Y value is constant over a one-year span.
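To make the stair-step idea concrete, here is a minimal sketch of the points a step chart passes through. The helper step_vertices() is my own hypothetical name, not a built-in R function, and the values are made up for illustration.

```r
# Expand (x, y) points into the vertices of a stair-step path:
# horizontal move first, then vertical (the path drawn by type = "s").
step_vertices <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n >= 2)
  data.frame(
    x = c(x[1], rep(x[-1], each = 2)),
    y = c(rep(y[-n], each = 2), y[n])
  )
}

# Made-up values, for illustration only
yr   <- c(2001, 2002, 2003)
anom <- c(0.4, 0.6, 0.5)
path <- step_vertices(yr, anom)
print(path)
```

Three data points become five vertices: each value is held flat until the next X value, then the path jumps vertically. Drawing path as an ordinary line chart should trace the same shape as a step chart of the original three points.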

Here’s a side-by-side comparison of a line chart and a step chart for our temperature data. The line chart is quite spiky. To me, the step chart is both more realistic and more attractive.

[Figure: line_step_compare]

Many time series reflect step changes rather than the gradual changes implied by a line chart. Postage stamp costs, for example, are constant until an increase on a specific date. Using a line chart for postal rates implies a gradual change over time, when the rates actually change in steps on specific dates.
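A quick way to see the difference is to ask what value each chart style implies between two change dates. This is only a sketch: the rate schedule below is made up for illustration (these are not real postal rates), and I use base R's approx() to mimic how each chart reads between points.

```r
# Hypothetical rate schedule (illustrative values only, not real postal rates)
year <- c(2000, 2004, 2009)
rate <- c(30, 35, 42)   # price in cents, made up

# Value implied for 2002 by each chart style
as_line <- approx(year, rate, xout = 2002, method = "linear")$y    # 32.5
as_step <- approx(year, rate, xout = 2002, method = "constant")$y  # 30

cat("Line chart implies:", as_line, "\n")
cat("Step chart implies:", as_step, "\n")
```

The line chart suggests a rate of 32.5 cents in 2002, a price that was never charged in this made-up schedule. The step chart holds the rate at 30 until the next change date.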

Excel does not provide a step chart option, so users need a workaround like the one I show on my site. As a long-time Excel user, I accepted Excel’s limitations and relied on workarounds like my step chart technique. As I look back, I was like the woodworker who got by with hand tools when he really needed a power tool. As I got better and better with my workarounds, I spent more time working on neat techniques than I did on my actual goal: producing an effective chart.

Overview of R Script 

The script image is shown below; text files of the script and data are available here.

[Figure: r_script]

Let’s walk through the script to see how R handles step charting, chart annotation, and adding a lowess smoother. The script is set up in the 4 steps I have described before. I have deliberately arranged the script to highlight the arguments for each function. This helps me reuse the script from chart to chart, and it helps both you and me understand the options for each function.

Step 1 – Setup 

## STEP 1: SETUP
setwd("C:/R_Home/Charts & Graphs Blog/GISS_Temp_Anom/")
link <- "GISS_Temp.txt"

These 2 lines of actual code (remember, # designates a comment line) set the working directory and define the variable link, which I use to specify the source data file.
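If you prefer not to change the working directory, one alternative is to build the full path and check that the file exists before reading it. This is only a sketch: tempdir() stands in for the real data folder, so substitute your own directory.

```r
# Hypothetical folder: tempdir() stands in for the real data directory
data_dir <- tempdir()
link <- file.path(data_dir, "GISS_Temp.txt")

# Report a readable message if the file is not where we expect it
if (!file.exists(link)) {
  message("Data file not found: ", link)
}
```

With this approach the rest of the script can still refer to link, so nothing else needs to change.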

Step 2 – Read Data 

# STEP 2: READ DATA

my_Data <- read.table(link,
             skip = 0, sep = "", dec = ".",
             row.names = NULL, header = FALSE,
             colClasses = rep("numeric", 3),
             comment.char = "#", na.strings = c("*", "-", "-99.9", "-999.9"),
             # With the default check.names = TRUE, "5_Yr_Mean" becomes X5_Yr_Mean
             col.names = c("Yr", "Ann_Anomaly", "5_Yr_Mean") )

This single read.table() call reads the data in the file defined by link. In this case, the skip argument is 0, the na.strings are (*, -, -99.9, -999.9), and the column names are specified as Yr, etc.

The nice thing about this script is that I can reuse it over and over, without worrying about file names. I just have to tweak the col.names and colClasses and adjust the na.strings arguments if I find an unusual missing character situation.
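To see how the na.strings and comment.char arguments behave, here is a small sketch that reads a few rows from a text string instead of a file. The rows are made up in the same three-column layout (they are not real GISS values), and the column name Five_Yr_Mean is my own choice for the demo.

```r
# Made-up rows in the same three-column layout (not real GISS values)
sample_txt <- "
# comment lines starting with # are skipped
1880  -0.25  *
1881  -0.20  -0.22
1882  -99.9  -0.21
"

demo <- read.table(text = sample_txt,
                   header = FALSE, sep = "",
                   colClasses = rep("numeric", 3),
                   comment.char = "#",
                   na.strings = c("*", "-", "-99.9", "-999.9"),
                   col.names = c("Yr", "Ann_Anomaly", "Five_Yr_Mean"))

print(demo)
```

The * in the first row and the -99.9 in the third row both come back as NA, so missing values do not end up plotted as real temperatures.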

Step 3 – Data Manipulation 

# STEP 3: MANIPULATE DATA
  # Construct chart title - use 2 lines w/ \n
  Title <- "Annual Surface Air Temperature Anomaly\nBased on Meteorological Stations (1880 - 2007)"

In this example, the only data manipulation is to construct the chart title.

Step 4 – Make Step Chart

## STEP 4: CREATE PLOT
   par(las = 1)       # Draw axis tick labels horizontally
   par(cex.main = 0.8, cex.sub = 0.7, cex.lab = 0.8, cex.axis = 0.75)   # Set text sizes
plot(Ann_Anomaly ~ Yr, my_Data,
         ylim = c(-1, 1),
         type = "s",       # "s" draws a step chart
         col = "dark grey",
         xlab = "",
         ylab = expression(paste("Annual Temperature Anomaly - ", degree, "C (Baseline: 1951-1980)")),
         main = Title,
         sub = "Source: NASA - GISS @ http://data.giss.nasa.gov/gistemp/graphs/Fig.A.txt")
lines(lowess(my_Data$Ann_Anomaly ~ my_Data$Yr, f = 0.15), col = "blue")   # lowess smoother
arrows(1951, -0.5, 1980, -0.5, code = 3, angle = 20, col = "dark grey")   # baseline period marker
abline(h = 0, col = "grey")          # horizontal line at 0.0
text(1965, -0.495, "Baseline\n Period\n1951-1980", cex = 0.6, pos = 3)
text(1910, -0.95, "Lowess smoother fit, f = 0.15", cex = 0.65, pos = 1)
points(c(1881, 1890), c(-1, -1), col = "blue", type = "l")   # short legend line for the smoother

This script sets the chart text sizes, makes the step chart, adds a lowess smoother line, adds a double-headed arrow to mark the 1951-1980 baseline period, adds text notes for the baseline period and the lowess smoother, and adds a short line that serves as the legend for the lowess line.
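The f argument to lowess() is the smoother span, the fraction of the data used for each local fit. Here is a small sketch of how it can be varied, using made-up data rather than the GISS series; the variable names are my own.

```r
set.seed(1)                       # reproducible made-up data
yr   <- 1880:2007
anom <- sin((yr - 1880) / 20) / 2 + rnorm(length(yr), sd = 0.1)

tight <- lowess(yr, anom, f = 0.15)   # small span: follows local wiggles
loose <- lowess(yr, anom, f = 0.67)   # larger span (the default is 2/3): smoother curve

# lowess() returns a list with sorted x and fitted y, ready for lines()
str(tight)
```

Either result can be passed straight to lines(), as in the script above. A smaller f tracks short-term swings more closely, while a larger f gives a smoother overall trend.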

How did I make my step chart? By using “s” in the type argument! That’s right, one letter changes the chart type. Incredibly simple and powerful.
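Here is a minimal sketch you can paste into R to compare the two chart types side by side. The values are made up for illustration; only the type argument differs between the two plot() calls.

```r
# Made-up annual values, for illustration only
yr   <- 2000:2009
anom <- c(0.3, 0.5, 0.6, 0.55, 0.5, 0.65, 0.6, 0.62, 0.5, 0.6)

op <- par(mfrow = c(1, 2))                       # two panels side by side
plot(yr, anom, type = "l", main = 'type = "l"')  # straight line between points
plot(yr, anom, type = "s", main = 'type = "s"')  # horizontal move, then vertical
par(op)                                          # restore previous settings
```

R also accepts an upper-case "S", which reverses the order of the moves: vertical first, then horizontal.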

