What’s Wrong With Excel’s Stacked Column Chart

This is my 2nd post on BP Oil Statistical Review of World Energy – June 2008; post 1 is here. In this post, I discuss Excel’s stacked column charts, using BP’s Regional Consumption Pattern 2007 as an example.

First, the good news. While I’m not sure what chart package BP used, this chart looks a lot like an Excel stacked column chart with some color, Y axis position and label enhancements.

The bad news is that BP used a stacked column chart, which is the Excel way of showing three variables on a 2D display. In this case, the variables are region of the world, fuel type, and percent of energy use by fuel in that region. It is difficult to interpret the values for the internal fuels on the stacked column chart because they do not share a common baseline: only the bottom segment of each column starts at zero.

Here’s the same data in an R Lattice chart.

To me, the R lattice (trellis) chart shows the patterns more clearly than the stacked column chart. Notice how Asia Pacific’s use of coal sticks out! S & C America has the largest portion of hydroelectric use. Natural gas is used widely, with the Middle East having the greatest portion. Nuclear is relatively small, with Europe leading and N America close behind. Finally, look at oil use: the Middle East is the greatest user, followed by S & C America and Africa. Could you see these details in BP’s stacked column chart?

Why Are Excel Chart Users Still Using Excel for Multivariate Charts

If R is so good and free, then why are Excel charters still using Excel for multivariate charts? For me, there are two reasons:
- At first, I didn’t know any better. I knew about small multiples from Tufte’s writings, but I didn’t know there was a free tool that could do trellis (lattice) small multiples.
- R Learning Curve - Naomi Robbins’s book, Creating More Effective Graphs, introduced me to R. I got excited about R’s capabilities, but I found the learning curve daunting. It was easier to slip back into comfortable Excel charts than to learn a new, better way to make charts.
R Lattice Flexibility

In a comment to this post, Tony said, “.. I typically like to see all of the charts either in a row or column so it’s easier to compare.” Here is the lattice plot the way Tony prefers. The original 2 x 3 matrix was the default. By adding a simple layout control, I changed it to a 1 x 6 display.
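The post does not show the code behind this change, so here is a minimal sketch of what such a layout control can look like, assuming the chart is drawn with lattice's dotplot. The data frame `energy`, its column names, and the random placeholder percentages are all hypothetical; they are not BP's figures. In lattice, `layout` is given as `c(columns, rows)`.

```r
library(lattice)

# Hypothetical stand-in for the BP table: random placeholder percentages,
# NOT the real figures. The column names are assumptions as well.
set.seed(1)
energy <- expand.grid(
  region = c("Africa", "Asia Pacific", "Europe", "Middle East",
             "N America", "S & C America"),
  fuel   = c("Coal", "Hydro", "Natural gas", "Nuclear", "Oil")
)
energy$percent <- round(runif(nrow(energy), min = 0, max = 60), 1)

# One panel per fuel. layout = c(columns, rows), so this puts
# every panel side by side in a single row.
p <- dotplot(region ~ percent | fuel, data = energy,
             layout = c(nlevels(energy$fuel), 1),
             xlab = "Percent of energy use (placeholder values)")
print(p)
```

Swapping the two numbers, `layout = c(1, nlevels(energy$fuel))`, would stack the panels in a single column instead.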
Update

Hadley Wickham, an R heavyweight (that’s a good thing), suggested in his comment that “.. you might also want to use the reorder function to reorder the factor levels in terms of the highest use”. Since it sounded like a good idea from a really experienced R programmer, I decided to give it a try. Here’s my revised chart based on Hadley’s suggested reorder of both the fuel and region factors.
I like it! The fuel panels are now sorted by median percent energy use, with nuclear the lowest and oil the highest. The regions are sorted by magnitude of oil use, with Africa at the low end and N America at the high end.
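For readers who want to try this kind of reordering, here is a rough sketch of one way to do it. It is not the code used for the chart above, which the post does not show. The data frame `energy`, its column names, and the random placeholder values are hypothetical, so the resulting order will not match the real chart. `reorder` is the base R function Hadley mentions; ordering the regions by their oil value is done here by resetting the factor levels.

```r
library(lattice)

# Hypothetical placeholder data, NOT BP's figures.
set.seed(1)
energy <- expand.grid(
  region = c("Africa", "Asia Pacific", "Europe", "Middle East",
             "N America", "S & C America"),
  fuel   = c("Coal", "Hydro", "Natural gas", "Nuclear", "Oil")
)
energy$percent <- round(runif(nrow(energy), min = 0, max = 60), 1)

# Order the fuel levels (and so the panels) by median percent across regions.
energy$fuel <- reorder(energy$fuel, energy$percent, FUN = median)

# Order the region levels by each region's oil percentage, lowest first.
oil <- subset(energy, fuel == "Oil")
energy$region <- factor(energy$region,
                        levels = as.character(oil$region[order(oil$percent)]))

p <- dotplot(region ~ percent | fuel, data = energy,
             layout = c(nlevels(energy$fuel), 1),
             xlab = "Percent of energy use (placeholder values)")
print(p)
```

Because the region order is a property of the factor, every panel shares the same row order; this sketch does not sort the regions separately within each panel.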
Update 2

In my 1st update, I sorted the panels by magnitude of energy use; however, I was not able to sort by regional energy use within panels. I tried several sorting, ordering and reordering approaches, to no avail. I finally asked the R graphics expert Paul Murrell, the author of R Graphics, for help. Thanks again, Paul.
Here’s the plot the way I really wanted it.
“Whilst the charts look nice isn’t the fact that the x-axis doesn’t begin at 0 confusing the message?
Percentage of nuclear energy usage in Middle-East is not that intuitive…I would recommend forcing the x-axis to begin at 0.”
Here’s the chart with the x-axis starting at 0.
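A minimal sketch of forcing the axis to start at 0, assuming a lattice dotplot like the ones above. As before, the data frame `energy`, its column names, and the random placeholder values are hypothetical, and this is not necessarily how the chart above was made. The `xlim` argument sets the axis limits explicitly.

```r
library(lattice)

# Hypothetical placeholder data, NOT BP's figures.
set.seed(1)
energy <- expand.grid(
  region = c("Africa", "Asia Pacific", "Europe", "Middle East",
             "N America", "S & C America"),
  fuel   = c("Coal", "Hydro", "Natural gas", "Nuclear", "Oil")
)
energy$percent <- round(runif(nrow(energy), min = 0, max = 60), 1)

# Start the x-axis at 0 and leave a little headroom above the largest value.
p <- dotplot(region ~ percent | fuel, data = energy,
             layout = c(nlevels(energy$fuel), 1),
             xlim = c(0, max(energy$percent) * 1.05),
             xlab = "Percent of energy use (placeholder values)")
print(p)
```

With an explicit `xlim`, a value of exactly 0 sits on the panel border, so a slightly negative lower limit may be easier to read if zeros matter.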