In this post, I examine an R script to show how to: 1) Read dynamic data from a web site text file, 2) Determine number of data rows in file; 3) Determine date of last measurement, 4) Create dynamic title; 5) Bank to 45, and 6) Create an XY Plot. Excel users are encouraged to dust off their R program and try this script on your own PC. Links to the source data and script are provided. (more…)
Entries from October 2008
Anatomy of an R Chart Script
October 28, 2008 · 9 Comments
Categories: Climate Change · R Learning Curve
Tagged: Climate Trends, Excel and R, R scripts
Using R to Enhance an Excel Chart
October 22, 2008 · 5 Comments
In this post I show how to add a loess fit from R to an Excel chart. (more…)
Categories: Chart Principles
Tagged: R and Excel, R scripts
Dot Plots versus Stacked Bar Charts – Update 3
October 16, 2008 · 11 Comments
This post includes three updates. I have added to the original material to show sequence of my thinking in the chart development based on reader input and charts. (more…)
Categories: Chart Principles · R Learning Curve
Tagged: R scripts
R Learning Curve for an Excel User
October 13, 2008 · 4 Comments
The only difference is that with R, I have a much more powerful tool with 100’s of world class programmers developing new tools all the time.
R has many multivariate chart and analysis capabilities not included in Excel:
- Dot plots
- Step charts
- Box & Whisker plots
- Trellis – Lattice – Small Multiples
- Lowess smooothing
I have been able to develop rudimentary versions of the 4 advanced chart types, however, I have not been able to prepare a lowess smoothing program in Excel – VBA. Once I started comparing R to my Excel VBA tools, I realized that I could produce a better, more advanced chart with one line of R code than with dozens of lines of VBA code. By switching to R for my multivariate charts, I gain from the work of 100’s of world class statistical programmers, letting me concentrate on producing advanced charts rather than developing Excel VBA code that just handles part of what the R function does.
R Learning Strategy
Having tried to learn R several time before, I’ve developed an R learning strategy that I think will work this time:
- Use Excel charts for all my simple, straight forward charts.
- Use Excel for all of my data manipulation. Since I can do just about everything I need in Excel, why relearn data manipulation in R
- Use a friendly R interface tool. I selected TINN-R, a simple and effective tool for writing my scripts.
- Use the free, Chapters 9 & 10 from Introduction to Data Technologies written by one of the leading statistical charting programmers, Paul Murrell.
- Transfer data to R through CSV files. It’s easy and it reduces my learning curve.
- Learn just one of R graphics packages, in my case, Lattice. It does a great job on trellis – lattice, multivariate charts, Excel’s Achilles heal.
- Use a code snippet tool (VBCC in my case) to store my R code snippets in an easily retrievable form. I used this same tool to save my VBA code snippets, saving me lots of time on my VBA learning curve.
- Keep going! Now that I have a few scripts under my belt, I feel much more comfortable.
Categories: R Learning Curve
Tagged: R scripts
R Lattice Plot Beats Excel Stacked Column Chart – Update 3
October 9, 2008 · 24 Comments
What’s Wrong With Excel’s Stacked Column Chart This is my 2nd post on BP Oil Statistical Review of World Energy – June 2008, post 1 is here. In this post, I discuss Excel’s stacked column charts, using BP’s Regional Consumption Pattern 2007 as an example.
First, the good news. While I’m not sure what chart package BP used, this chart looks a lot like an Excel stacked column chart with some color, Y axis position and label enhancements. The bad news is that BP used a stacked column chart which is the Excel way of showing three variables on a 2D display. In this case, the variables are: Region of World, Fuel type and percent of energy use by fuel in that region. It is difficult to interpret the values for the internal fuels on the stacked column chart because they do not have a common baseline. Here’s the same data in an R Lattice chart.
To me, the R Lattice – trellis chart helps me to see the patterns more clearly than the stacked column chart. Notice how Asia Pacific use of coal sticks out! S & C America have the largest portion of hydroelectric use. Natural gas is used widely, with Middle East having the greatest portion. Nuclear is relatively small, with Europe leading and N America close behind. Finally, look at oil use, Middle East is greatest user, followed by S & C America and Africa. Could you see these details in BP’s stacked column chart? Why Are Excel Chart Users Still Using Excel for Multivariate Charts If R is so good and free, then why are Excel charters still using Excel for multivariate charts? For me, there are two reasons:
- At first, I didn’t know any better. I knew about small multiples from Tufte’s writings, however, I didn’t know there was a free tool that could do trellis – lattice type small multiples.
- R Learning Curve - Naomi Robbins book, Creating More Effective Graphs, introduced me to R. I got excited about R’s capabilities, however, I found the learning curve daunting. It was easier to slip back into comfortable Excel charts rather than learn a new – better way to make charts.
R Lattice Flexibility In a comment to this post, Tony said ” .. I typically like to see all of the charts either in a row or column so it’s easier to compare.” Here the lattice plot the way Tony prefers.
The original 2 x 3 matrix was the default. by adding a simple layout control, I changed it to a 1x 6 display.
Update Hadley Wickham, an R heavyweight (that’s a good thing), suggested in his comment that “.. you might also want to use the reorder function to reorder the factor levels in terms of the highest use”. Since it sounded like a good idea from a really experienced R programmer, I decided to give it a try. Here’s my revised chart based on Hadley’s suggested reorder of both the fuel and region factors.
I like it! The fuel panels are now sorted by median percent energy use, with nuclear the lowest and oil the highest. The regions are sorted by magnitude of oil use, with Africa at low end and N America at high end.
Update 2 In my 1st update, I sorted the panels by magnitude of energy use, however, I was not able to sort by regional energy use within panels. I tried several sorting, ordering and reordering approaches to no avail. I finally asked the R Graphics expert, Paul Murrell, the author of R Graphics for help. Thanks again Paul.
Here’s the plot the way I really wanted it.
“Whilst the charts look nice isn’t the fact that the x-axis doesn’t begin at 0 confusing the message?
Percentage of nuclear energy usage in Middle-East is not that intuitive…I would recommend forcing the x-axis to begin at 0.”
Here’s the chart with the x-axis starting at 0.
Categories: Multivariate Charts
Tagged: R and Excel, R Beats Excel, R scripts
R Lattice Plot Beats Excel Stacked Area Trend Chart
October 5, 2008 · 3 Comments
Start of Series on Switch to R for Advanced Charts
I have been a long time Excel chart user, with dozens of techniques and tools for advanced Excel charts. As I pointed out in this post, Excel’s multivariate data visualization limitations are severe. For me, the best strategy is to switch to R for advanced charts and continue to use Excel for my data analysis and simple charts.
This post starts a series on my transition to R that may be of interest to those Excel chart users who struggle with Excel’s limited multivariate chart capabilities. I will post videos and provide source data files as well as R scripts on my ProcessTrends.com website for those who want to try R for themselves.
Working Example
In learning R graphics, I will be using the BP Statistical Review of World Energy – June 2008 report for data and examples of ineffective Excel like charts. BP does a great service for all of us by publishing their data rich report and workbook on world energy trends, an incredibly important topic from economic, environmental and security standpoints. The 45 page report includes 8 charts types and a total of 26 charts. From a data visualization standpoint, 4 of these chart types are acceptable and 4 are ineffective.
BP seems to use Excel – PowerPoint or a similarly weak data visualization tool for their charts. The telling point is the reliance on doughnut, stacked area trend charts, stacked column charts and clustered column charts to show multivariate data. I have previously discussed BP’s chart selection in my Chart Doctor page.
Stacked Area Trend Charts
BP’s Statistical Review includes 11 stacked area trend charts. Let’s take a look at one of them.
This multivariate chart shows 3 variables: oil consumption (Y axis) by year (X axis) and region (stacked area). To add the 3rd dimension to this 2D display, BP stacked the regions with a stacked area chart.
We can see the North American and world total values clearly, however, it is quit difficult to interpret the specific trends for the 5 other regions because their baselines change over time.
I’ve made this R lattice chart to show the oil consumption trends for the 6 regions. Notice that I have added the 1965-1982 data and have not included the worldwide total consumption.

This lattice plot adds the 3rd variable by making a small multiple chart for each region. In this example, the small multiples have the same X and Y scales so comparisons are very straightforward.
The lattice plot shows the individual regional trends clearly, we can see that Asia Pacific consumption has increased dramatically, now equivalent to N American consumption. We can also see that Europe – Eurasia consumption peaked in the late 1970s and has been drifting downward since. We see similar local peaks in N America and Asia Pacific consumption in the late 1970s. We can also see the smaller market share and increasing trends for Africa, S & C America and Middle East.
The R lattice plot lets me see both the trends and market share for each region much more clearly than the Excel like stacked area trend chart.
Excel’s Missing Multivariate Chart Capabilities
Why use stacked area trend charts? Did BP’s charters use stacked area trend charts because their chart tool doesn’t have small multiples, trellis – lattice chart capabilities?
Excel charters often use stack & cluster techniques on their their multivariate data because Excel doesn’t have built-in small multiple, trellis – lattice tools. To work around Excel’s multivariate chart limitations, I started making panel charts in 2006. Once I started looking into R, I quickly realized that Excel work-arounds, tricks, neat techniques aren’t enough. The tool does matter. You need real multivariate graphical tools to effectively visualize multivariate data. Since Excel’s chart engine doesn’t support effective multivariate data displays, it’s time to switch to a tool that does.
After spending so much time mastering Excel, it just makes me wonder, “.. why the h*** doesn’t Excel have trellis – lattice capabilities? “ Tufte wrote about “small multiples” in his 1983 book, “The Visual Display of Quantitative Information“. That’s 25 years ago. Has the Microsofts Excel Team read Tufte, Few, Robbins, Cleveland?
I’d like to hear what you think about multivariate charting with Excel.
Categories: Multivariate Charts · R Learning Curve
Tagged: R and Excel, R Beats Excel, R Graphics, R scripts





