July 30, 2009

The Visual Display of Quantitative Information - By Edward R. Tufte.

(This book was recommended to me by a person who got degrees from Princeton, Stanford and NYU)

Graphical display should

- show the data
- induce the viewer to think about the substance rather than about methodology, graphics design, the technology of graphical design or something else
- avoid distorting what the data have to say
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail, from a broad overview to the fine structure
- serve a reasonably clear purpose: description, exploration, tabulation, or decoration
- be closely integrated with the statistical and verbal descriptions of a data set.

Principles of Graphical Excellence
- Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics and of design.
- Graphical excellence consists of complex ideas communicated with clarity, precision and efficiency
- Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
- Graphical excellence is nearly always multivariate
- Graphical excellence requires telling the truth about the data

Graphic misrepresentation is measured by the lie factor:
Lie factor = size of effect shown in graphic / size of effect in data

If the lie factor is 1, the graphic is probably doing a reasonable job of accurately representing the underlying numbers. A value noticeably below or above 1 indicates that the graphic distorts the data.
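The lie factor can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up for these notes, and it assumes the "size of effect" is the relative change between a first and a second value.

```python
def effect_size(first, second):
    """Relative change from the first value to the second (assumed definition)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no change, so the lie factor is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars drawn for them grow from 1.0 to 3.0 units (a 200% increase).
print(lie_factor(1.0, 3.0, 10.0, 15.0))
```

In this made-up case the graphic overstates the change fourfold. A graphic whose drawn lengths grow in step with the data would give a lie factor of 1.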

Graphical integrity is more likely to result, if these 6 principles are followed

1. The representation of numbers as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.

2. Clear, detailed and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.

3. Show data variation, not design variation

4. In time-series displays of money, deflated and standardized units of monetary measurements are nearly always better than nominal units.

5. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data

6. Graphics must not quote data out of context.
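
Principle 4 (deflated rather than nominal money) can be illustrated with a small Python sketch. The function name, the years and the price index values below are all hypothetical, chosen only to show the arithmetic; a real series would need a real price index.

```python
def deflate(nominal_by_year, price_index_by_year, base_year):
    """Convert nominal amounts into base-year money using a price index."""
    base = price_index_by_year[base_year]
    return {
        year: amount * base / price_index_by_year[year]
        for year, amount in nominal_by_year.items()
    }


# Hypothetical numbers, for illustration only.
nominal = {1990: 100.0, 2000: 150.0}      # spending in each year's own money
price_index = {1990: 80.0, 2000: 100.0}   # assumed price level per year

real = deflate(nominal, price_index, base_year=2000)
print(real)
```

In nominal units the made-up series grows by 50%, but in base-year money it goes from 125 to 150, a growth of only 20%. Plotting the nominal series would exaggerate the change.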

Good book on graphics - Herdeg's Graphis/Diagrams

The conditions under which many data graphics are produced guarantee graphic mediocrity. As a result, too many graphics
1. Lie
2. Employ only the simplest designs, often un-standardized time-series based on a small handful of data points
3. Miss the real news actually in the data.

Graphical competence demands 3 quite different skills.
1. Substantive
2. Statistical
3. Artistic

5 principles in the theory of data graphics produce substantial changes in graphical design.
1. Above all else show the data
2. Maximize the data-ink ratio
3. Erase non-data-ink
4. Erase redundant data-ink
5. Revise and edit.

Data-ink ratio = data-ink / total ink used to print the graphic
= proportion of a graphic’s ink devoted to the non-redundant display of data-information
= 1.0 - proportion of a graphic that can be erased without loss of data-information.
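
As a rough sketch of the ratio above in Python: the function name and the ink amounts are assumptions for illustration (ink could be counted in pixels or any other consistent unit), not measurements from a real chart.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that carries non-redundant data-information."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data-ink must lie between 0 and the total ink")
    return data_ink / total_ink


# Hypothetical chart: 300 units of ink draw the data,
# 900 more go to grid lines, borders and decoration.
ratio = data_ink_ratio(data_ink=300, total_ink=1200)
erasable = 1.0 - ratio  # proportion that could be erased without losing data
print(ratio, erasable)
```

For this made-up chart only a quarter of the ink shows data, so three quarters could be erased without loss of data-information.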

Data density & small multiples:
Our eyes can make a remarkable number of distinctions within a small area.

Well-designed small multiples are
- inevitably comparative
- deftly multivariate
- shrunken, high-density graphics
- usually based on a large data matrix
- drawn almost entirely with data-ink
- efficient in interpretation
- often narrative in content, showing shifts in the relationship between variables as the index variable changes

Small multiples reflect much of the theory of data graphics:
For non-data-ink, less is more
For data-ink, less is a bore

Attractive displays of statistical information
- have a properly chosen format and design
- use words, numbers, and drawing together
- reflect a balance, a proportion, a sense of relevant scale
- display accessible complexity of detail
- often have a narrative quality, a story to tell about the data
- are drawn in a professional manner with the technical details of production done with care
- avoid content-free decoration, including chartjunk.

The conventional sentence is a poor way to show more than 2 numbers because it prevents comparisons within the data.

Tables are clearly the best way to show exact numerical values although the entries can also be arranged in semi-graphical form.

The principle of data/text integration is: 'data graphics are paragraphs about the data and should be treated as such.'

Friendly graphics:
- words are spelled out; mysterious and elaborate encoding is avoided
- words run from left to right, the usual direction for reading occidental languages
- little messages help explain the data
- elaborately encoded shadings, cross-hatching and colors are avoided; instead labels are placed on the graphic itself; no legend is required
- the graphic attracts the viewer and provokes curiosity
- colors, if used, are chosen so that the color-deficient and color-blind can make sense of the graphic
- type is clear, precise and modest; lettering may be done by hand
- type is upper-and-lower case, with serifs.

The shape of the graphics.

Graphics should tend toward the horizontal, greater in length than height.

Several lines of reasoning favor horizontal over vertical displays:

Horizontally stretched time-series are more accessible to the eye.

The analogy to the horizon also suggests that a shaded high contrast display might occasionally be better than the floating snake.

Ease of labeling - It is easier to write and to read words that run from left to right on a horizontally stretched plotting field (e.g. writing 'some labels' on one line rather than on 2 lines with one word each).

Emphasis on causal influence (effect on y-axis (vertical) & cause on x-axis (horizontal))

A longer horizontal axis helps to elaborate the workings of the causal variable in more detail.

If the nature of the data suggests the shape of the graphic, follow that suggestion.
If not, move towards horizontal graphics about 50% wider than tall.
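
The "about 50% wider than tall" rule of thumb is easy to turn into numbers. A tiny Python sketch, where the function name and the default ratio of 1.5 are just one reading of that rule:

```python
def horizontal_size(height, width_to_height=1.5):
    """Return (width, height) for a graphic about 50% wider than tall."""
    if height <= 0:
        raise ValueError("height must be positive")
    return (height * width_to_height, height)


# Example: a plot 4 units tall (inches, centimetres, whatever is in use).
print(horizontal_size(4.0))
```

A graphic 4 units tall would then be about 6 units wide. When the data themselves suggest another shape, that suggestion takes priority over this default.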