Climate Charts & Graphs

Entries from September 2008

Data Loss Aversion II – R Lattice Plot

September 29, 2008 · 8 Comments


This post continues Jorge Camoes’s discussion on data loss aversion. In my first post on this topic, I used a dot plot of the 1967 and 2005 values to summarize relative shifts in households by total household income bracket. Derek, giving in to data loss aversion, used a logarithmic axis to show all data for the intervening years between 1967 and 2005. Andreas Lipphardt used small multiples to show the 1967 and 2005 values as well as the overall change by bracket. We can combine Derek’s “giving in to loss aversion” and Andreas’s “small multiples” approaches to show all data for the 9 series in a compact trellis chart by using the R Lattice package. I have a brief discussion on using R for advanced charting here. This R lattice plot has several advantages:

  • Shows all data  
  • All plots share common X and Y axes, reducing axis labeling
  • Plot uses banking to 45° so that the slopes of the lines are easier to judge
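Banking to 45° means choosing the plot's aspect ratio so that the line segments are, on the whole, oriented near 45°. As a rough illustration only, here is a minimal Python sketch of one common variant, the median-absolute-slope rule. The function name and the sample data are hypothetical, and this is not necessarily the calculation behind the plot in this post.

```python
from statistics import median

def bank_to_45(xs, ys):
    """Return a height/width aspect ratio for which the median absolute
    slope of the line segments is 1 (45 degrees) on the page.

    Assumption of this sketch: vertical and flat segments are skipped.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        dy = ys[i + 1] - ys[i]
        if dx != 0 and dy != 0:
            slopes.append(abs(dy / dx))
    # A data-unit slope s appears on the page as s * (x_range / y_range) * aspect.
    # Setting the median page slope to 1 and solving for aspect gives:
    return y_range / (x_range * median(slopes))

# Made-up series, not the household income data discussed in this post
example_aspect = bank_to_45([0, 1, 2, 3], [0, 2, 1, 3])
straight_line_aspect = bank_to_45([0, 1, 2], [0, 10, 20])
```

With the made-up series, the median data slope is 2 and both ranges are 3, so the sketch suggests a plot half as tall as it is wide. A straight line gets an aspect ratio of 1, which puts it on the diagonal.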
The terms trellis, lattice and panel charts seem to be used interchangeably, depending on which software was used to develop the chart. Small multiples is a more generic term that applies to Tufte’s approach of making a series of small, similar charts.
 
Excel has limited tools for effective multivariate charts: no trellis or lattice charts and no built-in small multiples capabilities. While we can use clever Excel workarounds like panel charts or manually generated small multiples, I find that it is wise to move beyond Excel’s chart limits for multivariate charts.
 
R, a powerful and free statistical analysis and graphing package, has excellent multivariate charting tools. The R learning curve is well worth it for those Excel charters who want to move beyond Excel to the wider world of advanced charting.               

Categories: Multivariate Charts
Tagged:

Logarithmic Scale for Time Series Charts

September 25, 2008

In posting about Loss Aversion, Derek at Information Ocean presented an alternative to the dot plot approach I suggested in a previous post. Derek got the idea from a post by Dr. Nicolas Bissantz on his blog. Nicolas asked the question, “Do time series charts really compare time series?”, a very important question considering the widespread use of time series charts. Nicolas has done us a great favor by discussing the critical role of the Y axis scale in time series data and reminding us of high school math that some of us may have forgotten. If you use time series charts, I suggest you read his post.

Derek used a chart with a logarithmic Y axis scale to evaluate the data set on the % distribution of households by total household income that Jorge Camoes, Derek, Andreas at XLCubed and I have been discussing recently. Derek and Andreas have both shed considerable light on this data set, helping to show why chart type selection is so critical for effective charting.

While I’ll have more posts on chart selection later, right now I’d like to concentrate on the use of logarithmic scales in time series charts.

Derek applied a logarithmic scale to the cumulative % distribution of household income to produce this chart.

Derek’s discussion of his chart: “The surprising result is that the <$5k income level contains almost as many households in 2005 as in 1967, and significantly more than in the 70s, after a fall in the 60s. After a modest fall in the 90s, it rises again after 2000, as do all the other income levels below $100k.”

Derek’s log scale chart definitely shows fluctuations in the <$5,000 series. Here’s my chart to help take a closer look at just the <$5,000 series to better understand what happened.

  Let’s review Derek’s observations with my chart:

  • “<$5k income level contains almost as many households in 2005 as in 1967” - 1967 shows 4.9% and 2005 shows 3.3%, roughly a 32% decline in the proportion of households with income below $5,000.
  • “significantly more than in the 70s” - True.
  • “After a modest fall in the 90s” - The 1990s started at 2.7%, increased to 3.2% in 1993, then dropped to 2.5% in 1999. This seems to continue the see-saw pattern that started in 1974.
  • “it rises again after 2000” - It rises to 3.4% in 2004, then drops to 3.3% in 2005.
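The first comparison above mixes two kinds of change that are easy to confuse: the change in percentage points and the relative change. Here is a minimal Python sketch of the arithmetic, using the rounded 1967 and 2005 values quoted above. The function and variable names are hypothetical.

```python
def relative_change_pct(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0

share_1967 = 4.9  # % of households below $5,000 in 1967 (rounded, from the chart)
share_2005 = 3.3  # % of households below $5,000 in 2005 (rounded, from the chart)

# Change in percentage points: simple difference of the two shares
point_change = share_2005 - share_1967

# Relative change: the difference as a fraction of the starting share
relative_change = relative_change_pct(share_1967, share_2005)

print(f"{point_change:+.1f} percentage points")
print(f"{relative_change:+.1f}% relative change")
```

With these rounded inputs, the share falls by 1.6 percentage points, which is a relative decline of about one third. The exact figure depends on how the underlying values were rounded.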
Based on his comments on several data visualization sites, I think that Derek is a careful chart reviewer. What happened with his interpretation of this data series?
 
No Y axis grid lines?
 
Interpreting a log scale axis is more challenging than interpreting a linear scale because the change represented by a unit of length (cm, inch) varies by location along the axis. This can lead to misinterpretation of the magnitude of changes.
 
Grid lines and axis labels are important when using a logarithmic scale: grid guidelines and intermittent labels across each log cycle help the chart reader adjust to the non-linear scale. This is particularly true if you use Excel’s default log scale without a custom axis.
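To make the non-linearity concrete, here is a small Python sketch under simple assumptions: a base-10 axis, with position measured in decades (log cycles). It shows that equal ratios occupy equal lengths on the axis while equal absolute changes do not, and where the intermediate grid lines for 2 through 9 fall within one cycle. The function name is hypothetical.

```python
import math

def log_axis_distance(a, b):
    """Length between values a and b on a base-10 log axis, in decades."""
    return math.log10(b) - math.log10(a)

# A doubling takes up the same length anywhere on the axis
doubling_low = log_axis_distance(1, 2)
doubling_high = log_axis_distance(10, 20)

# The same absolute change (+1) takes up far less length higher up the axis
plus_one_low = log_axis_distance(1, 2)
plus_one_high = log_axis_distance(10, 11)

# Positions of grid lines for 1..9 within one log cycle (0 = start of cycle)
minor_grid = [round(math.log10(k), 3) for k in range(1, 10)]
print(minor_grid)
```

The grid positions bunch together toward the top of each cycle. That bunching is why labeled intermediate grid lines help a reader judge values between the powers of ten.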

Categories: Time Series Charts
Tagged:

Is A Poor Chart Worse Than No Chart?

September 23, 2008

Kaiser at Junk Charts posted on a New York Times chart from the Sept. 6, 2008 op-ed piece “Let’s Talk About Sex”. Kaiser and his commenters agreed that the chart was not effective.

The chart, partially reproduced to the right, shows 4 teenage sex indicators for 28 countries. The chart designer chose to use bubble size to compare the other country rates to the USA rate. The result is a confusing chart.

I have several concerns with this chart: 

1. Relationship to the article - I read the article to see how the chart fit into the writer’s discussion.  To my surprise, the writer did not mention the chart at all.  The chart stands on its own, with no relationship to the article.

2.  Chart Design - Using bubble size to compare rates is a poor charting technique. I created this dot plot of 1970 and 1998 teenage birthrates as an alternative to the Times chart.

3. Missing Data Analysis - Charts are an important tool in data analysis, but they are not the same as data analysis. We need to interpret, evaluate and synthesize our data to gain understanding. The Times’ article and chart do not provide any interpretation, analysis or synthesis. Why have a chart if we are going to ignore it?

There are a number of important findings in the data that the author could have pointed out:

  • All countries except Ireland had a decrease in teenage birthrate from 1970 to 1998
  • 1998 teenage birthrates varied from a low of 4.6 births per 1,000 women in Japan to a high of 52.1 in the US
  • The US 1998 teenage birthrate was nearly 70% greater than those of the next closest countries, New Zealand and Britain
My blog post title asks the question: “Is a poor chart worse than no chart?” The New York Times’ article was not improved by the graphic. Since the graphic was so poor, it likely took focus from the author’s words for many readers without providing any insight into the issue being discussed. In this case, the poor chart was definitely worse than no chart.

Categories: Chart Principles
Tagged:

Data Loss Aversion

September 22, 2008 · 6 Comments

I’m joining an ongoing discussion chain among four data and chart blogs that I follow. Andrew Gelman started the chain with a post on The Monkey Cage about a New York Times article on data visualization sites like Many Eyes. Andrew pointed out that the NYT example chart “.. is just horrible” and that “It’s a classic example of a graph that looks cool but is just confusing.”

Kaiser Fung of JunkCharts followed up with a post on Loss Aversion raising concerns about “cramming as much data into the chart as possible”. Kaiser points out that this tendency is “..taking Tufte’s concept of maximizing data-ink ratio to the extreme.” In discussing the original NYT graph, Kaiser says “.. Every piece of data is given equal footing, which results in nothing standing out.”

Jorge Camoes, following up on Kaiser’s post, points to a Tufte corollary, “..To clarify, add detail”, which supports the loss aversion tendency. Jorge shows an example chart with nine time series and asks “does it make any sense to add those nine series to a single chart?”

Andreas Lipphardt of XLCubed followed up Jorge’s question on how to best show this chart data by adding an elegant set of grouped colors. 

Does Andreas’s color coding solve the readability issue? No! While it helps, it does not significantly clarify the data. We need to rethink our chart: what are we trying to show? There are three factors in this data set: year, income class and % of households in the class. What are we most interested in? Do we really need to show the data for each year, or are we more interested in the long-term trend?

To me, the most important information is the long-term shift in income distribution by income group, not the year-to-year changes. Let’s use a dot plot and directly compare the 1967 and 2005 distributions.


The dot plot clarifies the situation by showing the distribution by income class for just 2 years so that we can compare changes by class. The % of households in the top 3 income classes was much higher in 2005, the $50-74,900 class stayed the same, and the % of households with total income less than $49,900 decreased.

In this case, changing chart type improved the chart more than enhanced color coding. We need to make sure we select the most appropriate chart before we try to optimize the chart format.

Kaiser’s Loss Aversion concerns raise an important charting principle: clarity in our chart purpose is critical to making an effective chart. More data or better colors won’t help a poor chart type selection.

 

Source data file link.

Categories: Chart Principles
Tagged: