
Book: Visual Display of Quantitative Information

You don't need me to tell you that Edward Tufte's Visual Display of Quantitative Information is a good book. I've posted my outline notes below (mostly for my own benefit, as they are of little use without Tufte's pretty [and ugly] examples).

General notes:

Tufte's preferences
- Graphical displays should show lots of information in a small amount of space. Those with little data can often be better represented in words.
- Favors juxtaposing many data series (multivariate displays).
- Prefers maximizing the amount expressed in a minimal amount of space, without becoming simplistic.
- Graphical displays should contain both broad and fine-grained information.
- Data maps are a good way of conveying information quickly; however, they have the flaw of correlating data with the area of a region rather than with its population.
- Graphics should not contain more dimensions than the actual data. A common error is to use 2-D or 3-D objects to represent univariate data (e.g. oil barrels to represent price).


Chapter 1: Graphical excellence

History of graphical illustration of quantitative information
- 10th/11th-century illustration of planetary orbits from a monastery school is the oldest known example
- time-series charts do not appear in scientific literature until late 1700s
- Edmond Halley's trade wind and monsoon chart in 1686 is one of the first known data maps.
- first economic time-series plotted in 1786
- formalized by Playfair and Lambert
- 1930s-1970s: dullard years. Concerned with 2 purposes: 1) making graphs "alive" and 2) fighting deception.
- late 1960s: John Tukey made statistical graphics respectable once more.

Types of statistical graphs:
- data map
- Edmond Halley's trade wind and monsoon chart in 1686
- Charles Joseph Minard's world map of French wine exports
- time series
- most frequently used.
- Tufte, in general, views these as less sophisticated since they often don't show any relation other than time
- space-time narrative designs
- Minard's map of Napoleon's march into Russia
- relational graphics
- advanced by Lambert and Playfair. Broke graphical design free from analogies to physical world.

Small multiple:
- repeat the same design over and over so that people can quickly compare data sets, e.g. drawing each data set on its own copy of the same base map.


Chapter 2: Graphical Integrity

Tables vs. graphics
- tables usually outperform graphics for small data sets of 20 numbers or fewer

Misperceptions:
- perceived area of a circle grows more slowly than actual area:
perceived area = (actual area)^x, where x = 0.8 ± 0.3
- perceptions change with experience, context, and other people's opinions
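To get a feel for the perceived-area relation, here is a minimal Python sketch, assuming the power-law form perceived area = (actual area)^x with the central estimate x = 0.8. Because (k·A)^x / A^x = k^x, the sketch works directly on the ratio between two areas. The function name is hypothetical, and the exponent is an empirical estimate rather than a constant.

```python
def perceived_ratio(actual_ratio: float, exponent: float = 0.8) -> float:
    """Perceived change in area for a given actual change, assuming the power law
    perceived = actual ** exponent (exponent estimated at roughly 0.8 +/- 0.3)."""
    if actual_ratio <= 0:
        raise ValueError("area ratio must be positive")
    return actual_ratio ** exponent


if __name__ == "__main__":
    # Doubling a circle's area is perceived as only about a 1.74x increase at x = 0.8.
    for x in (0.5, 0.8, 1.1):
        print(f"x = {x}: doubled area looks {perceived_ratio(2.0, x):.2f}x as large")
```

Across the ±0.3 band, a doubled area reads as anything from about 1.41x to 2.14x, which is one reason area-based encodings are hard to read precisely.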

Distortions:
- Lie factor = (size of effect in graphic)/(size of effect in data)
- Overstating: lie factor > 1
- Understating: lie factor < 1
- Most distortions are overstating
- Lie factors of 2 to 5 are common
- design variation: changes in representation of data, such as changing scales or perspective or size:data ratios.
- tips:
- use real (inflation-adjusted) instead of nominal units for money
- don't use more dimensions than your data has
- don't use multidimensional objects for univariate data (e.g. barrels of oil for price, dollar bills for price)
- 6 principles of graphical integrity:
- The representation of number, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented
- Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
- Show data variation, not design variation.
- In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
- The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
- Graphics must not quote data out of context.
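Two of the integrity checks above are simple arithmetic. A minimal Python sketch, assuming "size of effect" means the relative change from a first value to a second, and assuming a generic price index (such as a consumer price index, base = 100) for deflating money. The function names and all numbers in the usage lines are hypothetical.

```python
def size_of_effect(first: float, second: float) -> float:
    """Relative change from the first value to the second (100 -> 150 gives 0.5)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / first


def lie_factor(graphic: tuple[float, float], data: tuple[float, float]) -> float:
    """(size of effect shown in graphic) / (size of effect in data).

    1.0 is a faithful graphic; > 1 overstates the change, < 1 understates it.
    """
    data_effect = size_of_effect(*data)
    if data_effect == 0:
        raise ValueError("the data show no change, so the lie factor is undefined")
    return size_of_effect(*graphic) / data_effect


def deflate(nominal: float, price_index: float, base_index: float = 100.0) -> float:
    """Convert a nominal money amount into real (base-year) units via a price index."""
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return nominal * base_index / price_index


if __name__ == "__main__":
    # Hypothetical: data rise 50% (100 -> 150) but bars grow 100% (1.0 cm -> 2.0 cm).
    print(lie_factor(graphic=(1.0, 2.0), data=(100, 150)))  # 2.0
    # Hypothetical: $200 in a year whose index is 125, in base-year (index 100) dollars.
    print(deflate(200.0, price_index=125.0))                # 160.0
```

Measuring the graphic's change in the dimension the viewer actually reads (length, or area when the symbol scales in two dimensions) is the judgment call; the arithmetic itself is trivial.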


Chapter 3: Sources of graphical integrity and sophistication

Statistical data is not boring:
- common failure of graphical display is assuming that the purpose of the display is to be non-boring or to dumb data down for readers.
- attempts to dress up data lead to mistakes such as small data sets, forcing multivariate data into univariate displays, overdecoration, lack of fine detail.

Sophistication:
- Tufte defines graphical sophistication of a publication to be the percentage of statistical graphics based on more than one variable, but not a time-series or map.
- New York Times graphics have lower sophistication than high school textbooks, yet reading level is very high.
- Japan ranks highly in graphical sophistication, possibly due to educational focus on statistics at an early age.
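The sophistication measure is a simple percentage. A minimal Python sketch, assuming each graphic has already been classified by hand; the `Graphic` type and its `kind` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Graphic:
    kind: str          # hypothetical labels, e.g. "time-series", "map", "relational"
    n_variables: int   # number of variables the graphic encodes


def sophistication(graphics: list[Graphic]) -> float:
    """Percent of graphics that relate more than one variable and are
    neither time series nor maps."""
    if not graphics:
        return 0.0
    relational = [
        g for g in graphics
        if g.n_variables > 1 and g.kind not in ("time-series", "map")
    ]
    return 100.0 * len(relational) / len(graphics)
```

For example, a publication whose four graphics are a time series, a map, one relational scatterplot, and a single-variable bar chart scores 25 percent.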

Graphical competence requires:
- substantive skills
- statistical skills
- artistic skills

Part II Theory of Data Graphics

Chapter 4: Data-Ink and Graphical Redesign

Data-Ink Ratio = (data-ink)/(total ink used to print graphic) = 1.0 - (proportion of the graphic that can be erased without loss of data-information)
- There should be no wasted ink in a graphic
- grids are a waste of ink
- bar charts are a big waste of ink: the left edge, right edge, and top of each bar all show the same value
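The two forms of the data-ink ratio are complements of each other. A minimal Python sketch, assuming "ink" can be quantified in some consistent unit (for instance, a count of inked pixels in a rendered image); the function names and the numbers below are hypothetical.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that shows non-redundant data (1.0 is the ideal)."""
    if total_ink <= 0 or not 0 <= data_ink <= total_ink:
        raise ValueError("need total_ink > 0 and 0 <= data_ink <= total_ink")
    return data_ink / total_ink


def erasable_share(data_ink: float, total_ink: float) -> float:
    """Share of ink that could be erased without losing data-information."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)
```

With 30 units of data-ink out of 120 units total, the ratio is 0.25, meaning three quarters of the ink is a candidate for erasing.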

symmetry can often be eliminated
- studies show that viewers barely examine the other half, other than to note that it's symmetrical

Redundancy can be useful
- helps with periodic data that wraps around
- extending train chart by half cycle allows each train's line to be a continuous line, instead of having to wrap from the right margin back to the beginning.

Tufte's editing error:
- pg. 101
- reduced graph missing labels (pre GS/post GS)

Five principles of data graphics:
(1) Above all else show the data
(2) Maximize the data-ink ratio (within reason)
(3) Erase non-data ink (within reason)
(4) Erase redundant data-ink (within reason, except when it helps)
(5) Revise and edit

Chapter 5: Chartjunk: Vibrations, Grids, and Ducks

Moire effect
- fine hatching and patterns create an optical illusion of vibration
- frequently appears even in scientific publications

Grids
- dark grid lines camouflage data information
- grids are mostly needed during initial plotting
- if grid necessary, then make it gray

Self-promoting graphics
- the Big Duck: a duck-shaped building in Flanders, Long Island
- in the context of graphical design, a "duck" is a graphic whose overall design emphasizes graphical style over quantitative information
- 3D designs are an example of this

Worst graphic ever?
- pg. 118

Chapter 6: Data-Ink Maximization and Graphical Design

Applying data-ink maximization theory to redesigns

Revision of the box plot
- normal box plots are a waste of data ink
- Tufte's redesigns
- faster to draw by hand, since the ruler doesn't have to be repositioned
- conveys same information

Revision of the bar chart
- Tufte not a fan of the bar chart
- replace tick marks on axis with white grid lines
- eliminate axis

Revision of the scatterplot
- range-frame
- erase parts of the frame to convey more information
- start/stop frame lines at minimum/maximum
- can also show quartile data in frame lines using offset or line weights
- can also show median using white grid line in frame line
- dot-dash-plot
- show marginal distribution in frame
- tick marks in axis correspond with plotted data
- can read each frame axis independently to see the marginal distribution of a single variable
- rugplot
- connected dot-dash plots
- dashed line connects data points across plots
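The range-frame and dot-dash-plot ideas above reduce to a few summary numbers per axis. A minimal Python sketch using the standard `statistics` module; the `RangeFrame` type is hypothetical, and the quartiles use the module's default "exclusive" method, so other quartile conventions would give slightly different values.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RangeFrame:
    start: float        # frame line starts at the minimum observed value
    end: float          # ... and stops at the maximum
    q1: float           # quartiles: could be drawn as an offset or a heavier line weight
    median: float       # could be drawn as a white gap in the frame line
    q3: float
    ticks: list[float]  # dot-dash plot: one tick mark per observation


def range_frame(values: list[float]) -> RangeFrame:
    """Summarize one axis of a scatterplot for a range-frame or dot-dash display."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return RangeFrame(
        start=min(values),
        end=max(values),
        q1=q1,
        median=statistics.median(values),
        q3=q3,
        ticks=sorted(values),
    )
```

Computing this once for the x values and once for the y values gives everything needed to draw both frame lines.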

Chapter 7: Multifunctioning Graphical Elements

Multifunctioning principles
- use data-ink to show multiple pieces of information, if possible
- leave no non-data ink

Stem-and-leaf plot
- shows the distribution of the data as a histogram built from the data values themselves
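A stem-and-leaf plot is easy to generate. A minimal Python sketch, assuming non-negative integers where the stem is the tens part and the leaf is the last digit; the function name is hypothetical. Empty stems between the minimum and maximum are kept so that gaps in the distribution stay visible.

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> list[str]:
    """Return one text row per stem; each leaf is the last digit of a value."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    rows: dict[int, list[int]] = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    if not rows:
        return []
    width = len(str(max(rows)))  # right-align stems of differing length
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # include empty stems
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaves}")
    return lines
```

For example, `stem_and_leaf([23, 12, 38, 21, 15, 23])` gives the rows `1 | 25`, `2 | 133`, and `3 | 8`: the row lengths form the histogram, and every digit is still a data value.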

Ayres time-series of divisions in France
- data shows:
(1) how many divisions in France at a given point in time
(2) what divisions were in France at a given point in time
(3) how long each division was in France

Chernoff faces
- faces distinguishable even with small feature detail

Graph of painted line strips on pavement
- line strips are the data/graph

Data-based grids
- data can sometimes form grid, or grid can be element of interest
- e.g. grid showing Mitchell map vs. present-day map
- e.g. Playfair's marking of major events as vertical grid lines

Double-functioning labels
- convert frames into range-frames and relabel the ends of each frame with the minimum/maximum
- labels should be data-based
- e.g. tungsten research plot that labels each point with its chronological order

Puzzles and hierarchy in graphics
- color can often confuse, requiring additional explanation
- humans don't have a total ordering for color, only partial orderings (e.g. red vs. other colors)
- shades of gray can be more effective than color
- graphics can have at least three viewing depths:
(1) overall (macro) structure, first glance
(2) fine (micro) structure
(3) implicit structure
- e.g. census map of the United States: (1) US map, (2) population distribution, (3) effect of geography on population distribution
- visual angles
- for multifunctioning data, it can be good to line up one data display along a single sight line.
- e.g. Ayres' divisions in France display.
- horizontal: Duration of stay lines
- profile: number of divisions over time
- vertical: divisions at a given point in time

Chapter 8: Data Density and Small Multiples

Data density
- data density of a graphic = (number of entries in a data matrix)/(area of data graphic)
- human eye can distinguish a large number of distinctions in a small area. Tufte mentions 625 points in one square inch as an example.
- thus, marginal cost of interpreting more information in graphics is low
- graphics should take advantage of human capabilities
- example data densities:
- simple bar chart: .02 numbers/sq. cm.
- NY weather history: 181 numbers/sq. cm.
- galaxy map (current record for statistical graphics): 17,000 numbers/sq. cm.
- maps tend to have higher data density than statistical graphics
- USGS topographic quadrangle estimated at 40,000 numbers/sq. cm.
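Data density is a ratio, but these notes quote it in both square centimeters and square inches, so a unit conversion helps when comparing figures. A minimal Python sketch; the function names and the example numbers are hypothetical, and the conversion relies only on an inch being exactly 2.54 cm.

```python
SQ_CM_PER_SQ_INCH = 2.54 ** 2  # an inch is exactly 2.54 cm, so about 6.4516 sq. cm


def data_density(entries: int, area: float) -> float:
    """Entries in the data matrix per unit of graphic area (in whatever unit area uses)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return entries / area


def per_sq_inch(density_per_sq_cm: float) -> float:
    """Convert numbers per sq. cm to numbers per sq. inch."""
    return density_per_sq_cm * SQ_CM_PER_SQ_INCH
```

For a hypothetical 20-number bar chart drawn in a 10 cm by 5 cm box, `data_density(20, 50.0)` is 0.4 numbers per sq. cm, or about 2.58 numbers per sq. inch.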

Data density in publications
- median data density of graphics in publications is low
- Nature is the highest with 48 numbers per sq. inch
- Pravda the lowest with 0.2 numbers per sq. inch
- average probably between 10 and 20 numbers per sq. inch
- Wall Street Journal, Asahi, and The Times have data densities in graphics similar to the Journal of the American Statistical Association

High-information graphics
- data graphics should:
- be based on large data matrices
- have high data density

Small multiples
- same combination of variables indexed by change in another variable (e.g. time)
- e.g. twenty-three hours of LA air pollution
- e.g. human vs. simian chromosomes
- Principles of small multiples
- inevitably comparative
- deftly multivariate
- shrunken, high-density graphics
- usually based on a large data matrix
- drawn almost entirely with data-ink
- efficient in interpretation
- often narrative in context, showing shifts in the relationship between variables as the index variable changes (thereby revealing interaction or multiplicative effects)
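Small multiples only invite comparison if every panel uses the same design and the same scale. A text-only Python sketch, assuming sparkline-style panels drawn with Unicode block characters as a stand-in for a real plotting library; the function name and any sample data are hypothetical. The key step is computing one minimum and maximum across all panels rather than per panel.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest


def small_multiples(panels: dict[str, list[float]]) -> list[str]:
    """Render each series as a one-line text panel on a scale shared by all panels."""
    values = [v for series in panels.values() for v in series]
    if not values:
        return []
    lo, hi = min(values), max(values)  # one scale for every panel
    span = (hi - lo) or 1.0  # avoid dividing by zero when every value is equal
    lines = []
    for label, series in panels.items():
        marks = "".join(
            BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in series
        )
        lines.append(f"{label:<8}{marks}")
    return lines
```

With a shared scale, a panel whose values are all small stays visibly low next to one whose values are large; scaling each panel separately would make them look identical.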

Chapter 9: Aesthetics and Technique in Data Graphical Design

Good design is found in simplicity of design and complexity of data

Guides to good aesthetics
- have a properly chosen format and design
- use words, numbers, and drawing together
- reflect a balance, a proportion, a sense of relevant scale
- display an accessible complexity of detail
- often have a narrative quality, a story to tell about the data
- are drawn in a professional manner, with the technical details of production done with care
- avoid content-free decoration, including chartjunk

Design choices
- Three choices: sentences, tables, and graphics
- sentences are poor at showing more than two pieces of data and don't invite comparison
- never use pie charts
- low data density
- multiple pie charts are hard to compare with one another
- supertable
- elaborate table with many pieces of data
- better than hundreds of bar charts
- wordy data graphics work well for highly labelled data

Making Complexity Accessible: Combining Words, Numbers, and Pictures
- words and pictures should be combined
- often they are separated due to printing limitations
- data graphics are paragraphs about data and should be treated as such
- text and graphics should run together
- avoid ruled-line separators between text and graphics
- use same typefaces in graphics and text
- advances in printing have widened the split between text and graphics
- graphics often handed off to graphics specialists
- machinery separates text and graphics printing mechanisms
- early scientific works (da Vinci) had text flowing around statistical graphics
- words should tell viewer how to read the graphic, not what to read

Accessible Complexity: the Friendly Data Graphic

Friendly vs. unfriendly:
- graphic should not abuse abbreviations as they require additional effort to interpret
- words should run left to right instead of vertically
- graphics should have an appropriate amount of labelling to make interpretation easy
- no legend should be required to understand encoding
- graphic should be attractive, rather than filled with chartjunk
- type should be clear
- type should not be in all uppercase
- reading is more difficult when letters are all capitals, because each letter is the same height and occupies similar volume.
- graphics should be accessible to color-blind individuals

Proportion and Scale: Line Weight and Lettering
- need to control relative weights of elements in graphic so that they appear to be in harmony.
- adjust line weighting as appropriate
- lines in graphics should be thin
- 18th and 19th century graphics are attractive because they used copper plate engraving, which produces thin lines
- use different line weights in perpendicular intersecting lines to convey data
- use heavier line for actual data measure

Proportion and Scale: The Shape of Graphics
- graphics should be wider than tall
- analogous to the horizon
- easier to read words from left to right than vertically
- causal effects easier to understand left to right
- high contrast good for understanding
- shading should be calm (no moire effect)
- some claim that the golden rectangle is the preferred aesthetic, but this is not strongly supported.
- pg. 189: at least 5 other rectangles have simple mathematical properties that could be used to claim an aesthetic advantage.
- studies show the preferences fall within the range of the golden ratio, but the standard deviation is high
- Shape Principles
- If the nature of the data suggests the shape of the graphic, follow that suggestion.
- Otherwise, move towards horizontal graphics about 50 percent wider than tall.
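The default shape rule is simple arithmetic. A minimal Python sketch, assuming you start from the width available (say, a 15 cm column) and want the matching height; the function name is hypothetical, and the golden ratio is included only for comparison.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # about 1.618, shown for comparison only


def height_for_width(width: float, aspect: float = 1.5) -> float:
    """Height of a horizontal graphic that is `aspect` times wider than tall
    (the default 1.5 means about 50 percent wider than tall)."""
    if width <= 0 or aspect <= 0:
        raise ValueError("width and aspect must be positive")
    return width / aspect
```

A 15 cm wide graphic comes out 10 cm tall at 1.5:1 and about 9.27 cm tall at the golden ratio; the difference is small, which fits the weak evidence for any single preferred rectangle.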

Epilogue: Designs for the Display of Information
- Tufte weakens the book by offering a broad disclaimer that in the end you need to rely on your own judgement.


SUMMARY

Principles:
- Principles of graphical excellence:
- Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design
- Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency
- Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space
- Graphical excellence is nearly always multivariate
- Graphical excellence requires telling the truth about the data
- Six principles of graphical integrity:
(1) The representation of number, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented
(2) Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
(3) Show data variation, not design variation.
(4) In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
(5) The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
(6) Graphics must not quote data out of context.
- Five principles of data graphics:
(1) Above all else show the data
(2) Maximize the data-ink ratio (within reason)
(3) Erase non-data ink (within reason)
(4) Erase redundant data-ink (within reason, except when it helps)
(5) Revise and edit
- Forgo chartjunk, including moire vibration, the grid, and the duck
- Mobilize every graphical element, perhaps several times over, to show the data
- graphics can have at least three viewing depths:
(1) overall (macro) structure, first glance
(2) fine (micro) structure
(3) implicit structure
- for multifunctioning data, line up data variations (functions) along individual sight lines (visual angles)
- maximize data density and size of data matrix, within reason
- Shrink Principle: graphics can be shrunk way down
- Principles of small multiples
- inevitably comparative
- deftly multivariate
- shrunken, high-density graphics
- usually based on a large data matrix
- drawn almost entirely with data-ink
- efficient in interpretation
- often narrative in context, showing shifts in the relationship between variables as the index variable changes (thereby revealing interaction or multiplicative effects)
- Good design is found in simplicity of design and complexity of data
- Data graphics are paragraphs about data and should be treated as such
- Words should tell viewer how to read the graphic, not what to read
- Shape Principles
- If the nature of the data suggests the shape of the graphic, follow that suggestion.
- Otherwise, move towards horizontal graphics about 50 percent wider than tall

Statistical Definitions:
- Lie factor = (size of effect in graphic)/(size of effect in data)
- graphical sophistication = % of statistical graphics based on more than one variable, but not a time-series or map.
- Data-Ink Ratio = (data-ink)/(total ink used to print graphic) = 1.0 - (proportion of the graphic that can be erased without loss of data-information)
- data density of a graphic = (number of entries in a data matrix)/(area of data graphic)

This page contains a single entry from kwc blog posted on May 28, 2003 11:51 PM.
