Don’t Try This With Excel

In this post I show 4 charts of the same data to demonstrate  what Excel chart users are missing by not having a more powerful charting tool. This post,  building on my previous discussion of using factors for conditional formatting,  shows the potential advantages of plotting summary values and bounding area polygons . These analytical displays are not readily available to even advanced Excel users.

Introduction

In my Excel’s Missing Factors post, I showed how R’s built-in factor data type and subset()  function can do the  same thing as the VBA code that Jon Peltier illustrated in his May, 2008 post. The point of that post was to show that …

Excel users who work with multivariate data and find that you are spending time looking for and/or developing Excel tricks and work arounds should seriously consider adding R to your analytical toolkit. Why reinvent statistical charting techniques in Excel when they are readily available in R?

In this post, I want to extend my discussion to show 2 additional charts, a total of 4 on a page, to demonstrate the power that R brings to the analyst. R’s functions all work together seamlessly. While Excel users can find examples, tips and tools on advanced chart web sites,  the Excel  techniques do not work together in a data analysis system.

4 Displays of the Same  Data

Here’s Jon’s data in a 4 panel chart.

dont_try

Panel 1 shows the basic Excel type plot. Panel 2 shows conditional formatting with VBA. Panel 3 shows conditional formatting of the averages for each City.

When I made Panel 3 I felt that it actually improved my understanding of the data, however, I felt something was missing.  Oh yes, data loss aversion.

I then decided to add polygons to see if showing the data ranges could help. I’ve been fascinated with polygons in Excel but haven’t had a reason to use them. Andy Pope has a very interesting example of finding the bounding area of XY data. He uses the convex hull algorithm. His VBA code is very impressive. Jon Peltier has also done some work with polygons, using VBA.

While Andy and Jon’s VBA code examples are impressive, they are not in a form where I could drop them into a workbook and quickly make my chart.

R has a convex hull (chull) function that lets the user plot the bounding area of the data quickly and easily.  After reading the documentation, I added the bounding areas in Panel 4.

Panel 4 really helped me see to the data. I noticed several things that were not evident in the earlier panels:

  • Atlanta only had 2 data points
  • Chicago had 3 data points, all on the same line
  • Boston and Detroit had a wider range than the other 2

Interesting, But What’s the Point?

As I work with R,  I keep getting the feeling that it is incredibly well organized for data analysis. Panel charts let me display several charts on the same page, letting me see the data from several angles. With a few lines of R script, I can easily summarize and display my data without having to create extra work sheets or helper columns that clutter my workbook.

While I can find Excel VBA tools to do many tasks, these tools are usually single purpose and don’t work well with other tools. I can either spend my time tweaking VBA code or use R’s  built-in functions which work together in a data analysis system. If Excel VBA programmers had joined in a cooperative venture like the R programmers, maybe I’d still be using Excel VBA. Maybe?

Here’s the R script:

## Don't Try This With Excel #######################################
##Script to 4 versions of J Peltier's May, 2008 condional format post chart
## STEP 1: SETUP - Source File
 rm(list=ls())
 oldpar<- par(no.readonly=T)
 script<- "C:/R_Home/Charts&Graphs Blog/Data_org/Dont_try_w_Excel.R"
 link <-  "C:\\R_Home\\Charts & Graphs Blog\\Data_org\\city_data.txt"
 par(las=1, mfcol = c(4,1), oma=c(2,1,4,1), pty="m")
 par(mar=c(0,4,0,4),mgp=c(1,0.25,0),tcl=-0.15)
 par(ps=12, xaxs="i",yaxs="i")
 my_data <- read.table(link, header=T,
   colClasses= c("factor", rep("numeric", 2)), sep=" ",
   col.names=c("City", "X","Y"))
 attach(my_data)
## Plot 1: Basic Excel Type Chart ###########################
 plot_1 <-  plot(X,Y, pch=19, xlim = c(0,20), ylim=c(0,20), axes=F, ann=F)
 axis(4, cex.axis=0.9)
 box("plot", col = "black")
 text(17,17, "1: Simple XY Plot", font=4, cex=0.9)
## Plot 2 - all data points w/ Conditional Formatting ###############
 avg_x <- numeric(nlevels(City))   # create vector for City avgs - x
 avg_y <- numeric(nlevels(City))   # create vector for city avg = y
 plot_2 <-  plot(X, Y, type = "n",
   xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o",
   xaxs="i", yaxs="i" , axes=F, cex.axis=0.9,  ann=T)
 axis(2, cex.axis =0.9)
 box("plot", col = "black")
 for (i in 1:nlevels(City)) {
  my_sub <- subset(my_data, City == levels(City)[i])
  points(my_sub$X, my_sub$Y, col = i,pch=14+i)
  text(my_sub$X, my_sub$Y,  levels(City)[i], cex=0.9,pos=4) }
 text(17,17, "2: Conditional Format", font=4, cex=0.9)
## Plot 3 - just averages ################################
 plot_3<-  plot(X, Y, type = "n",
    xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o",
    xaxs="i", yaxs="i",cex.axis=0.9,axes=F )
 axis(4, cex.axis =0.9)
 box("plot", col = "black")
 for (j in 1:nlevels(City)) {
  my_sub <- subset(my_data, City == levels(City)[j])
  avg_x <- tapply(my_sub$X, INDEX = my_sub$City, mean)
  avg_y <- tapply(my_sub$Y, INDEX = my_sub$City, mean)
  text(avg_x, avg_y,  levels(City)[j],cex=0.9, pos=4)
  points(avg_x, avg_y, col = j,pch=14+j, cex=1.25)  }
 text(17,17, "3: City Averages", font=4, cex=0.9)
## Plot 4 - Polygons ###################################
  par(mar=c(2,4,0,4))
 plot_4<-  plot(X, Y, type = "n",
    xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o",
    xaxs="i", yaxs="i",cex.axis=0.9 )
 box("plot", col = "black")
 for (k in 1:nlevels(City)) {
   my_sub <- subset(my_data, City == levels(City)[k])
  avg_x <- tapply(my_sub$X, INDEX = my_sub$City, mean)
  avg_y <- tapply(my_sub$Y, INDEX = my_sub$City, mean)
  points(my_sub$X, my_sub$Y, col = k,pch=14+k, cex = 1)
  hull <- chull(my_sub$X, my_sub$Y)
  polygon(my_sub$X[hull], my_sub$Y[hull],col = k,
    density = 15,border = "grey" )
  text(avg_x, avg_y,  levels(City)[k],cex=0.9, pos=4) }
  text(17,17, "4: City Hulls", font=4, cex=0.9)
## Outer margin annotation
  mtext("Don't Try This With Excel", side = 3,font=4,  outer = T, line=2.5)
  mtext("Based on Jon Peltier Example - May, 2008", side = 3,line = 0.5 ,
     font = 1,outer    = T, cex = 0.7)
  my_date <- format(Sys.time(), "%m/%d/%y")
  mtext("DK O'Day", side = 1, line = 0.5, cex = 0.7, outer = T, adj =0.8)
  mtext(script, side = 1, line = .5, cex=0.7, outer = T, adj = 0)
  mtext(my_date, side = 1, line =.5, cex = 0.7, outer = T, adj = 1)
## STEP 5: CLOSE
 par(oldpar)
detach(my_data)
About these ads

5 responses to “Don’t Try This With Excel

  1. Pingback: R Lets You Put Chart Inside Chart « Charts & Graphs

  2. Pingback: ggplot2: Don’t Try This With Excel (Revised) « Learning R

  3. ps. the RSS from your Process Trends site has the wrong URL for this blog post (missing the “s” in chart(s)graphs.wordress.com/….)

  4. Pingback: ggplot2: Don’t Try This With Excel « Learning R

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s