**What’s Wrong With Excel’s Stacked Column Chart ** This is my 2nd post on BP Oil Statistical Review of World Energy – June 2008, post 1 is here. In this post, I discuss Excel’s stacked column charts, using BP’s Regional Consumption Pattern 2007 as an example. ** ** First, the good news. While I’m not sure what chart package BP used, this chart looks a lot like an Excel stacked column chart with some color, Y axis position and label enhancements. The bad news is that BP used a stacked column chart which is the Excel way of showing three variables on a 2D display. In this case, the variables are: **Region** of World, **Fuel** type and **percent** of energy use by fuel in that region. It is difficult to interpret the values for the internal fuels on the stacked column chart because they do not have a common baseline. Here’s the same data in an R Lattice chart. To me, the R Lattice – trellis chart helps me to see the patterns more clearly than the stacked column chart. Notice how Asia Pacific use of coal sticks out! S & C America have the largest portion of hydroelectric use. Natural gas is used widely, with Middle East having the greatest portion. Nuclear is relatively small, with Europe leading and N America close behind. Finally, look at oil use, Middle East is greatest user, followed by S & C America and Africa. Could you see these details in BP’s stacked column chart? **Why Are Excel Chart Users Still Using Excel for Multivariate Charts** If R is so good and free, then why are Excel charters still using Excel for multivariate charts? For me, there are two reasons:

- At first, I didn’t know any better. I knew about small multiples from Tufte’s writings, however, I didn’t know there was a free tool that could do trellis – lattice type small multiples.
- R Learning Curve – Naomi Robbins book, Creating More Effective Graphs, introduced me to R. I got excited about R’s capabilities, however, I found the learning curve daunting. It was easier to slip back into comfortable Excel charts rather than learn a new – better way to make charts.

**R Lattice Flexibility** In a comment to this post, Tony said ” .. *I typically like to see all of the charts either in a row or column so it’s easier to compare*.” Here the lattice plot the way Tony prefers. The original 2 x 3 matrix was the default. by adding a simple layout control, I changed it to a 1x 6 display.

** Update** Hadley Wickham, an R heavyweight (that’s a good thing), suggested in his comment that “..

**Here’s my revised chart based on Hadley’s suggested reorder of both the fuel and region factors.**

*you might also want to use the reorder function to reorder the factor levels in terms of the highest use”. Since it sounded like a good idea from a really experienced R programmer, I decided to give it a try.*I like it! The fuel panels are now sorted by median percent energy use, with nuclear the lowest and oil the highest. The regions are sorted by magnitude of oil use, with Africa at low end and N America at high end.

**Update 2** In my 1st update, I sorted the panels by magnitude of energy use, however, I was not able to sort by regional energy use within panels. I tried several sorting, ordering and reordering approaches to no avail. I finally asked the R Graphics expert, Paul Murrell, the author of R Graphics for help. Thanks again Paul.

Here’s the plot the way I really wanted it.

*“Whilst the charts look nice isn’t the fact that the x-axis doesn’t begin at 0 confusing the message?*

*Percentage of nuclear energy usage in Middle-East is not that intuitive…I would recommend forcing the x-axis to begin at 0.” *

Here’s the chart with the x-axis starting at 0.

Pingback: R ggplot2 Plot Beats Excel Stacked Column Chart « Charts & Graphs

I’ve sent my script to your processtrends.com email address.

You’ll notice that ggplot2 leaves less margin around the edges and gray’s the axis tick labels, to focus attention on the data itself.

At first, the gray background (which can be changed easily) reminded me a bit of Excel’s gray background, but the difference is that Excel’s is dark enough to cut overall contrast — making it harder to read — while ggplot’s is lighter.

(P.S. The graph works fine in 0.8 of ggplot2, but a bug fix was released in the last couple of days: 0.8.1.)

Wayne:

Thanks for the tip on ggplot2. I have it on my list to try out. Your comment has convinced me to move it up to the top.

Your script is being chewed up by WordPress, when it sees the “” it thinks you are sending html tags.

You can get around this by usign tag or tag.

Since I’d like to put your ggplot2 in the post as update 4, please send me a simple text file of your script and I’ll add the plot to the post.

Kelly

One last try, then I have to kick myself and quit. The less-than and greater-thans are messing it up…

#########

library (ggplot2)

energy$Fuel.median <- with(energy, rev()reorder(Fuel, Percent, mean))

qplot(Region, Percent, data=energy, geom=”bar”, main=”Regional Energy Consumption Patterns – 2007″, xlab=”Percent of Regional Energy Use”, ylab=”Region”) + coord_flip() + facet_grid (Fuel.median ~ .)

#########

Hmmm… got mis-formatted. Let me try again…

> library (ggplot2)

> energy$Fuel.median qplot(Region, Percent, data=energy, geom=”bar”, main=”Regional Energy Consumption Patterns – 2007″, xlab=”Percent of Regional Energy Use”, ylab=”Region”) + coord_flip() + facet_grid (Fuel.median ~ .)

I think you will like ggplot2. It takes a different mindset from other R graphing alternatives, but it really makes sense as you proceed to charts that are more and more complex. A quick start on your Update 3 trellis chart:

> library (ggplot2)

> energy energy$Fuel.median qplot (Region, Percent, data=energy, geom=”bar”, main=”Regional Energy Consumption Patterns – 2007″, xlab=”Percent of Regional Energy Use”, ylab=”Region”) + coord_flip() + facet_grid (Fuel.median ~ .)

Pingback: Is it “..too hard” to change R plot defaults? Update 1 « Charts & Graphs

Andreas

Your line of script may have been disrupted by WordPress. It looks like you are calling the function xyplot() rather than plot(). Can you send me the actual r script and data files?

Send them to koday**at**processtrends**dot*com with appropriate corrections to remove characters that I’ve added to fool the robotic e-mail scalpers.

Kelly

Hi! thanks for your posts. I’ trying to learn R and will come back ofte :-)

Right now I wonder about this: When doing a normal (not lattice) plot, xlim doesent really force the axis to start at 0

> x y plot(y~x, xlim=c(0,10),ylim=c(0,10))

there is still some kind of inner margin. Any idea how to remove it?

sincerely

I didn’t notice the fact that you are reordering regional energy use _within_ each panel. I’m not convinced this is a good idea because it’s very easy to miss (I did!) if you just quickly skim the axis labels and assume that they are all the same.

I’m disturbed that the problem with the bars not starting at 0 occurs in all of the bar charts. This is documented in panel.barchart: “Otherwise, defaults to ‘NULL’, in which case bars start at

the left (or bottom) end of a panel. This choice is somewhat unfortunate, as it can be misleading, but is the default for historical reasons.” So the moral is to always set origin to 0.

Hadley:

I’ve started looking at your ggplot2 documentation. I am impressed and will give it a try very soon.

Here’s the code I used to produce my lattice plot.

#######

library(lattice)

my_data <- read.table(“http://processtrends.com/Files/2007_energy_use_by_region_type.csv”,

, sep =”,”, header=TRUE,

skip=””,

colClasses=c(“factor”, “factor”, “numeric”))

# Reorder fuel factor by median % use

fuel_median <- with(my_data, reorder(Fuel, Percent, mean))

barchart(my_data$Region ~ Percent | fuel_median, data = my_data,

main=”Regional Energy Consumption Patterns – 2007 “,

sub = “Data Source: BP Statistical Review of World Energy 2008″,

xlab=”Percent of Regional Energy Use “,

# xlim=c(0,55),

ylab=”Region”,

col=”grey”,

border=”NA”,

par.settings=list(axis.text=list(cex=0.85), fontsize=list(text=10)),

par.strip.text=list(cex=0.9),

par.strip.col=”white” ,

layout = c(1,6),

aspect = (0.3),

# Code provided by Paul Murrell to order by fuel% within each fuel panel

scales=list(y=list(relation=”free”)),

prepanel=function(x, y, …) {

list(ylim=levels(reorder(y, x, sum)))},

panel=function(x, y, …) {

panel.barchart(sort(x), reorder(y, x, mean), …) })

#########

Once I got Jaanus’s comment, I added the the xlim(0,55) argument which solved the problem.

When I run the script with the xlim() active, the chart plots strating at 0.

I suspect the problem is with the prepanel() call.

What lattice code did you use to produce the graph? I’m surprised that the bars didn’t start at zero already (they would if you used ggplot2 ;)

Jaanus

I’ve adjusted the x-axis scale to start at 0.

Whilst the charts look nice isn’t the fact that the x-axis doesn’t begin at 0 confusing the message?

Percentage of nuclear energy usage in Middle-East is not that intuitive…

I would recommend forcing the x-axis to begin at 0.

I’ve updated the lattice plot to sort by both fuel and region within fuel, as suggested by Tony.

One approach would be to use the plyr package and do something like:

ddply(mydf, .(fuel), transform, region = reorder(region, perc))

Hadley

Hadley:

Thanks for the tip on the reorder function. I was able to reorder based on fuel; however, I have been stumped on how to reorder on both fuel and region within fuel.

Tony:

Thanks for feedback. As my note to Hadley shows, I’m still trying to figure our how to do a 2-way reorder based on fuel then region within fuel.

Stay tunned. I’ll get it eventually.

I like the most recent version with the reordering. One change that may help even more is reordering each energy type so the bars go from highest to lowest.

I think this change would make it easier to read and provide just a little more value that what’s already available.

I’m looking forward to learning R.

For those graphics, you might also want to use the reorder function to reorder the factor levels in terms of the highest use.

And it’s nice to see someone selling R to excel users! It really is much easier to do data analysis in R, once you get the hang of it.

You might also want to have a look at my ggplot2 package. It offers similar capabilities to lattice, but based around Wilkinson’s grammar of graphics, which means it’s generally easier to build up complex graphics a piece at a time. There some nice side effects of this approach, including automatic legends that get it right 99% of the time. See http://had.co.nz/ggplot2 for more info and examples.

Ahh, I like that version better. It is not that easy to make that same change in Excel.

You could also add another series next to this vertical chart that shows energy type on the y-axis and segmented chart by region. This will allow you to see the region detail vs. energy type.

Nice job Kelly!

Tony

Good point. I’ve added a vertical version to show how easy it is to adjust lattice chart layouts in R.

I also agree and have written about the ineffectiveness of stacked column charts. If I was trying to create this in Excel, I would go with a panel chart, which is similar to your R Lattice chart. I typically like to see all of the charts either in a row or column so it’s easier to compare.

I’m not familiar with R but Origin can make the lattice chart also (but not automatically). You can create several individual charts and then use the layout function to combine them. Origin is not free though. I’m going to check out R.