In this post, I walk through an R script to show how to: 1) read dynamic data from a text file on a web site; 2) determine the number of data rows in the file; 3) determine the date of the last measurement; 4) create a dynamic title; 5) bank to 45 degrees; and 6) create an XY plot. Excel users are encouraged to dust off their copy of R and try this script on their own PCs. Links to the source data and script are provided.
Example R Chart and R Script
The web-based source data file for this example, available at this link, contains the monthly CO2 measurements taken at the Mauna Loa Observatory in Hawaii since March 1958. The data is updated monthly.
Our goal is to develop a tool to update our chart monthly as new measurements are added to the source file.
The script image is shown below; a text file of the script is available here.
We will walk through the R script to learn how to read web based data files, handle dynamic data ranges, construct dynamic titles, and produce an XY plot.
Overview of R Script
I see most Excel or R chart tasks as a sequence of 4 steps:
- Set up the background information
- Read the source data
- Manipulate data for chart preparation
- Make the chart
The R script for this example is set up in these 4 steps. I’ve deliberately arranged the script so that each function argument sits on its own line, which helps both you and me understand the options available for each function.
Step 1 – Setup
## STEP 1: SETUP
setwd("C:/R_Home/CG")
library(lattice)
link <- url("ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt")
These 3 lines (remember, lines starting with # are comments) set the working directory, load the lattice package, and specify the link to the CO2 data text file.
Step 2 – Read Data
## STEP 2: READ DATA
CO2_Data <- read.table(link,
    sep = "",
    dec = ".",
    skip = 68,
    row.names = NULL,
    header = FALSE,
    colClasses = rep("numeric", 6),
    na.strings = "-99.99",
    col.names = c("Yr", "Mo", "Mo_Yr", "CO2", "Trnd", "X"))
Notice the skip argument: it tells R to skip the first 68 lines of the source file because they contain file documentation, not data. The na.strings argument tells R that -99.99 is the code for data not available.
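To see what skip and na.strings do without depending on the live data file, here is a minimal sketch that reads a small made-up text sample. The documentation lines, the values, and the name demo are invented for illustration; only the read.table() arguments mirror the script above.

```r
# Hypothetical sample: two documentation lines followed by three data rows.
# The values are made up; -99.99 marks a missing measurement.
sample_text <- "documentation line 1
documentation line 2
1958 3 1958.208 315.71
1958 4 1958.292 -99.99
1958 5 1958.375 317.51"

demo <- read.table(text = sample_text,
                   sep = "",                 # any whitespace separates fields
                   skip = 2,                 # skip the two documentation lines
                   header = FALSE,
                   colClasses = rep("numeric", 4),
                   na.strings = "-99.99",    # treat -99.99 as missing
                   col.names = c("Yr", "Mo", "Mo_Yr", "CO2"))

print(demo)
print(sum(is.na(demo$CO2)))   # count of missing CO2 values
```

The second data row should come back with NA in the CO2 column rather than -99.99, so a stray missing-value code cannot distort the chart.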
We now have a data frame named CO2_Data that is ready for data manipulation and charting.
Step 3 – Data Manipulation
## STEP 3: MANIPULATE DATA
c <- nrow(CO2_Data) # Find number of data rows
mo <- CO2_Data$Mo[c] # Find month for last data row
yr <- CO2_Data$Yr[c] # Find year for last data row
thru <- paste("Updated Through:", mo, "/", yr) # Note on last data point
Title <- expression(paste("Monthly C", O[2], " (ppmv) - Mauna Loa, Hawaii"))
To determine the last measurement’s month and year, we find the row number (c) of the last row, then look up mo and yr in that row.
To construct a note that shows the last month, we use the paste() function to construct a concatenated text string of plain text and the mo and yr variables. The title uses the expression() function to allow us to format the 2 in CO2 as a subscript.
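The last-row lookup and the paste() call can be tried on a tiny stand-in data frame. This is a sketch only: the three rows are made-up values, and the names demo, last, note_default and note_tight are mine, not part of the original script. It also shows that paste() puts a space between its pieces unless sep is set.

```r
# Hypothetical three-row stand-in for CO2_Data (values are made up).
demo <- data.frame(Yr  = c(2008, 2008, 2008),
                   Mo  = c(7, 8, 9),
                   CO2 = c(386.4, 384.2, 383.1))

last <- nrow(demo)     # row number of the last measurement
mo <- demo$Mo[last]    # month in the last row
yr <- demo$Yr[last]    # year in the last row

# paste() separates its pieces with a single space by default
note_default <- paste("Updated Through:", mo, "/", yr)

# sep = "" leaves the spacing entirely up to you
note_tight <- paste("Updated Through: ", mo, "/", yr, sep = "")

print(note_default)    # "Updated Through: 9 / 2008"
print(note_tight)      # "Updated Through: 9/2008"
```

Because last is recomputed from nrow() each time, the note follows the data automatically when a new month is appended to the file.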
Step 4 – Make XY Plot
## STEP 4: CREATE PLOT
xyplot(CO2 ~ Mo_Yr, CO2_Data,
    ylim = c(300, 400), # y axis limits
    xlim = c(1958, 2009), # x axis limits
    type = "l", # type of line
    col = "red", # color of line
    xlab = "", # no x axis label
    ylab = expression(paste(CO[2] - ppmv)), # y axis title w/ subscript for 2
    main = Title, # add main title using Title variable
    sub = thru, # add chart note using variable thru
    par.settings = list(axis.text = list(cex = 0.8),
        fontsize = list(text = 10)), # font sizes
    aspect = "xy") # banking to 45
This xyplot() call includes arguments for the x and y axis limits, line type, line color, x and y axis titles, font sizes, and banking to 45 degrees.
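To get a feel for what aspect = "xy" does, the sketch below builds the same kind of line plot twice from synthetic data, once banked and once with lattice's default "fill" aspect. The series is generated for illustration and is not real CO2 data; the names time_axis, demo, p_banked and p_fill are hypothetical.

```r
library(lattice)

# Synthetic stand-in for the CO2 series: a rising trend plus a seasonal cycle.
# These numbers are generated for illustration, not real measurements.
time_axis <- seq(1958, 2008, by = 1/12)
demo <- data.frame(Mo_Yr = time_axis,
                   CO2 = 315 + 1.4 * (time_axis - 1958) +
                         3 * sin(2 * pi * time_axis))

# aspect = "xy" asks lattice to pick the aspect ratio by banking to 45 degrees
p_banked <- xyplot(CO2 ~ Mo_Yr, data = demo, type = "l", aspect = "xy")

# aspect = "fill" (the default) stretches the panel to fill the device
p_fill <- xyplot(CO2 ~ Mo_Yr, data = demo, type = "l", aspect = "fill")

print(p_banked)   # draw the banked version; print(p_fill) draws the other
```

Printing the two objects one after the other makes the difference in panel shape easy to compare.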
Try It For Yourself
The R script file includes the source data file link so you have everything you need to make this chart.
If you have R set up on your computer, you are ready to go. Why put off trying R until next month or next year?
Try it and let me know how you do.


8 responses so far ↓
Ruslan // November 20, 2008 at 5:18 AM |
The download function doesn’t work for me because I use a web proxy, I guess.
dkodpe // October 28, 2008 at 3:25 PM |
Robert
Both suggestions are good! Thanks.
I checked your comment.char, it works just as you describe.
The attach() function lets you simplify variable names. You can use Mo rather than CO2_Data$Mo.
Robert Kosara // October 28, 2008 at 2:56 PM |
Nice walkthrough! I don’t think you need to skip those lines though, have you looked at the comment.char parameter to read.table? It’s set to # by default, and I guess it should ignore any line starting with that character. I haven’t tried this out yet, but this would make this a bit more robust (in case the number of lines for the data description changes). Also, if you attach the data frame, you can access the columns directly by name (I’d also give them longer names, but that’s just me).
dkodpe // October 28, 2008 at 1:02 PM |
Hadley:
Thanks.
I know that “<-” and “=” are equivalent for function assignments. The “<- ” seems to work for the arguments in my example.
Is it convention or required to use “=” in arguments?
Hadley Wickham // October 28, 2008 at 12:49 PM |
You have a small error in your read.table function – you’re using <- instead of = for col.names and colClasses. I’d also describe library() as loading the lattice package, not installing it.