*In this post I show 4 charts of the same data to demonstrate what Excel chart users are missing by not having a more powerful charting tool. This post, building on my previous discussion of using factors for conditional formatting, shows the potential advantages of plotting summary values and bounding area polygons . These analytical displays are not readily available to even advanced Excel users. *

**Introduction**

In my Excel’s Missing Factors post, I showed how R’s built-in factor data type and subset() function can do the same thing as the VBA code that Jon Peltier illustrated in his May, 2008 post. The point of that post was to show that …

*Excel users who work with multivariate data and find that you are spending time looking for and/or developing Excel tricks and work arounds should seriously consider adding R to your analytical toolkit. Why reinvent statistical charting techniques in Excel when they are readily available in R?*

In this post, I want to extend my discussion to show 2 additional charts, a total of 4 on a page, to demonstrate the power that R brings to the analyst. R’s functions all work together seamlessly. While Excel users can find examples, tips and tools on advanced chart web sites, the Excel techniques do not work together in a data analysis system.

**4 Displays of the Same Data **

Here’s Jon’s data in a 4 panel chart.

Panel 1 shows the basic Excel type plot. Panel 2 shows conditional formatting with VBA. Panel 3 shows conditional formatting of the averages for each City.

When I made Panel 3 I felt that it actually improved my understanding of the data, however, I felt something was missing. Oh yes, data loss aversion.

I then decided to add polygons to see if showing the data ranges could help. I’ve been fascinated with polygons in Excel but haven’t had a reason to use them. Andy Pope has a very interesting example of finding the bounding area of XY data. He uses the convex hull algorithm. His VBA code is very impressive. Jon Peltier has also done some work with polygons, using VBA.

While Andy and Jon’s VBA code examples are impressive, they are not in a form where I could drop them into a workbook and quickly make my chart.

R has a convex hull (chull) function that lets the user plot the bounding area of the data quickly and easily. After reading the documentation, I added the bounding areas in Panel 4.

Panel 4 really helped me see to the data. I noticed several things that were not evident in the earlier panels:

- Atlanta only had 2 data points
- Chicago had 3 data points, all on the same line
- Boston and Detroit had a wider range than the other 2

**Interesting, But What’s the Point?**

As I work with R, I keep getting the feeling that it is incredibly well organized for data analysis. Panel charts let me display several charts on the same page, letting me see the data from several angles. With a few lines of R script, I can easily summarize and display my data without having to create extra work sheets or helper columns that clutter my workbook.

While I can find Excel VBA tools to do many tasks, these tools are usually single purpose and don’t work well with other tools. I can either spend my time tweaking VBA code or use R’s built-in functions which work together in a data analysis system. If Excel VBA programmers had joined in a cooperative venture like the R programmers, maybe I’d still be using Excel VBA. Maybe?

Here’s the R script:

## Don't Try This With Excel ####################################### ##Script to 4 versions of J Peltier's May, 2008 condional format post chart ## STEP 1: SETUP - Source File rm(list=ls()) oldpar<- par(no.readonly=T) script<- "C:/R_Home/Charts&Graphs Blog/Data_org/Dont_try_w_Excel.R" link <- "C:\\R_Home\\Charts & Graphs Blog\\Data_org\\city_data.txt" par(las=1, mfcol = c(4,1), oma=c(2,1,4,1), pty="m") par(mar=c(0,4,0,4),mgp=c(1,0.25,0),tcl=-0.15) par(ps=12, xaxs="i",yaxs="i") my_data <- read.table(link, header=T, colClasses= c("factor", rep("numeric", 2)), sep=" ", col.names=c("City", "X","Y")) attach(my_data) ## Plot 1: Basic Excel Type Chart ########################### plot_1 <- plot(X,Y, pch=19, xlim = c(0,20), ylim=c(0,20), axes=F, ann=F) axis(4, cex.axis=0.9) box("plot", col = "black") text(17,17, "1: Simple XY Plot", font=4, cex=0.9) ## Plot 2 - all data points w/ Conditional Formatting ############### avg_x <- numeric(nlevels(City)) # create vector for City avgs - x avg_y <- numeric(nlevels(City)) # create vector for city avg = y plot_2 <- plot(X, Y, type = "n", xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o", xaxs="i", yaxs="i" , axes=F, cex.axis=0.9, ann=T) axis(2, cex.axis =0.9) box("plot", col = "black") for (i in 1:nlevels(City)) { my_sub <- subset(my_data, City == levels(City)[i]) points(my_sub$X, my_sub$Y, col = i,pch=14+i) text(my_sub$X, my_sub$Y, levels(City)[i], cex=0.9,pos=4) } text(17,17, "2: Conditional Format", font=4, cex=0.9) ## Plot 3 - just averages ################################ plot_3<- plot(X, Y, type = "n", xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o", xaxs="i", yaxs="i",cex.axis=0.9,axes=F ) axis(4, cex.axis =0.9) box("plot", col = "black") for (j in 1:nlevels(City)) { my_sub <- subset(my_data, City == levels(City)[j]) avg_x <- tapply(my_sub$X, INDEX = my_sub$City, mean) avg_y <- tapply(my_sub$Y, INDEX = my_sub$City, mean) text(avg_x, avg_y, levels(City)[j],cex=0.9, pos=4) points(avg_x, avg_y, col = j,pch=14+j, cex=1.25) } text(17,17, "3: City Averages", font=4, cex=0.9) ## Plot 4 - Polygons ################################### par(mar=c(2,4,0,4)) plot_4<- plot(X, Y, type = "n", xlim = c(0,20), ylim=c(0, 20),xlab="", ylab="", bty="o", xaxs="i", yaxs="i",cex.axis=0.9 ) box("plot", col = "black") for (k in 1:nlevels(City)) { my_sub <- subset(my_data, City == levels(City)[k]) avg_x <- tapply(my_sub$X, INDEX = my_sub$City, mean) avg_y <- tapply(my_sub$Y, INDEX = my_sub$City, mean) points(my_sub$X, my_sub$Y, col = k,pch=14+k, cex = 1) hull <- chull(my_sub$X, my_sub$Y) polygon(my_sub$X[hull], my_sub$Y[hull],col = k, density = 15,border = "grey" ) text(avg_x, avg_y, levels(City)[k],cex=0.9, pos=4) } text(17,17, "4: City Hulls", font=4, cex=0.9) ## Outer margin annotation mtext("Don't Try This With Excel", side = 3,font=4, outer = T, line=2.5) mtext("Based on Jon Peltier Example - May, 2008", side = 3,line = 0.5 , font = 1,outer = T, cex = 0.7) my_date <- format(Sys.time(), "%m/%d/%y") mtext("DK O'Day", side = 1, line = 0.5, cex = 0.7, outer = T, adj =0.8) mtext(script, side = 1, line = .5, cex=0.7, outer = T, adj = 0) mtext(my_date, side = 1, line =.5, cex = 0.7, outer = T, adj = 1) ## STEP 5: CLOSE par(oldpar) detach(my_data)

Pingback: R Lets You Put Chart Inside Chart « Charts & Graphs

Pingback: ggplot2: Don’t Try This With Excel (Revised) « Learning R

ps. the RSS from your Process Trends site has the wrong URL for this blog post (missing the “s” in chart(s)graphs.wordress.com/….)

B

Thanks for the tip. It should be correct now.

Pingback: ggplot2: Don’t Try This With Excel « Learning R