Excel’s Missing Factor

In this post I show how R’s factor data type allows users to produce effective charts much more quickly and simply than Excel. Since Excel does not include a “factor” data type, users need to use tricks and workarounds to categorize and display factor type data.

Excel Example Data Set

In  his “VBA to Split Data Range into Multiple Chart Series”  (May, 2008) , Jon Peltier writes:

A common problem is to make a chart from a list like this one. You want separate series for each item in one column (e.g., the cities in the first column), but your list is different every time, and you have to spend an hour assigning data to each series in the chart.

This is a job for VBA. It’s possible to write a simple loop that reads the first column of the range, grouping rows together by item.”

Here’s Jon’ finished chart.

Jon Peltier's Excel Chart with VBA Data Range Split

A new R user at the Learning R blog wrote a post on March 16, 2009 using Jon’s example. He/she used Hadley Wickham’s  ggplot2 to reproduce Jon’s chart.  Here’s a screen shot of the Learning R version.

learn_r_chart

They look pretty similar.

Using R’s “factor”  Data Type to subset() the data

R has a  built-in “factor” data type that is great for categorizing data into sub-groups.  Male/Female, smoker/non smoker, income ranges ( < $50k, $50 – 100K,  > $100K), even cities  are examples of categorical variables that take on a limited set of values.  We often want to subset our data by categories and then summarize or plot the subset results. Excel provides sumif(), sumproduct(), array formulas and pivot tables as partial solutions to  subsetting by category. However, users are on their own when it comes to plotting factor type conditional data.

Since Excel does not handle factors directly, Jon Peltier developed a custom technique to subset his data set by category name and plot each subset as a separate data series. Jon’s VBA code assumes the categorical variable is in the first column and he groups the rows by the unique items in the 1st column. This is a nice workaround to a serious Excel limitation , as long as the categorical variable is in the 1st column. If the categorical variable is in columns 2, 3 or other, then Jon needs to reorganize his data set or adjust his VBA code. What about problems with 2 or even more factors? Suppose we wanted to look at disease incidence by gender, smoking, weight status? Jon will be still be busy slicing his data set with VBA while the R user will be looking at his panel charts and interpreting his/her results

R’s factor data type provides a more universal solution to this data subsetting problem than a VBA workaround. Here’s my R script for Jon’s example.

link <-  "C:\\R_Home\\Charts & Graphs Blog\\Data_org\\city_data.txt"
 par(las=1)
 my_data <- read.table(link, header=T,
	colClasses= c("factor", rep("numeric", 2)), sep=" ",
	col.names=c("City", "X","Y"))
 attach(my_data)
 plot(X, Y, type = "n",
	xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o",
	xaxs="i", yaxs="i" )
for (i in 1:nlevels(City)) {
  my_sub <- subset(my_data, City == levels(City)[i])
  points(my_sub$X, my_sub$Y, col = i,pch=14+i)
  text(my_sub$X, my_sub$Y,  levels(City)[i], cex=0.85,pos=4)
 }

Here’s my R chart version.

kod_chart_sn2

In this short script, I read in Jon’s example data, specify City as a factor, subset, plot and label the City values with a simple for() loop.  Clean,  simple and quick!

Excel users who work with multivariate data and find that you are spending time looking for and/or developing Excel tricks and work arounds should seriously consider adding R to your analytical toolkit. Why reinvent statistical charting techniques in Excel when they are readily available in R?

About these ads

3 responses to “Excel’s Missing Factor

  1. Pingback: ggplot2: Don’t Try This With Excel « Learning R

  2. Pingback: Don’t Try This With Excel « Charts & Graphs

  3. That’s impressive. Nice script.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s