Hi there! We had a one-week break at school because inclement weather forced us to cancel class last week.
Here are the lecture slides from this class: Beyond Charts: High-Information Graphics.
In this third lecture I introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display of Quantitative Information. For the first time, I decided to introduce this concept very early in the course because I noticed students have a very hard time conceptualizing visual representations where lots of information is visible in a single view. In the past I have seen lots of students squeeze million-item data sets into a four-bar bar chart. Literally.
The Aggregation Twitch
I coined the term aggregation twitch hoping my students would remember the concept in the future. The aggregation twitch is the tendency to overaggregate data through summary statistics. When confronted with a data table many think: “How can I reduce this to a few numbers?” I think Tufte captured the phenomenon just right:
“Data-rich designs give context and credibility to statistical evidence. Low-information designs are suspect: what is left out, what is hidden, why are we shown so little?”
Then, commenting on the difference between high- and low-information designs:
“Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design.”
I love this last sentence because, in its simplicity, it suggests some kind of stance or attitude in designing visualization.
In order to make the concept more explicit I presented an example from one of my past students. He was assigned the task of creating a visualization from the Aid Data data set, which contains more than a million items and several attributes like donor, recipient, date, purpose, etc. His first implementation was a funny (in some perverse way, admittedly) line plot with four lines and a lot of options to decide which data segments to display. I was stunned! Since then I have kept thinking about that example and how pervasive this aggregation attitude is.
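To make the aggregation twitch a bit more tangible, here is a minimal sketch in Python. The numbers are made up for illustration (they are not taken from the Aid Data set), and the `aggregate` and `sparkline` helpers are hypothetical names of my own. The point is only that two very different series can collapse into the same summary number, while a tiny picture that keeps every value still tells them apart.

```python
from statistics import mean

# Hypothetical yearly aid amounts for two donors (made-up numbers).
steady = [50, 50, 50, 50, 50, 50, 50, 50]
bursty = [0, 0, 0, 200, 0, 0, 200, 0]

BLOCKS = "▁▂▃▄▅▆▇█"


def aggregate(values):
    """Collapse a whole series into one summary number (the 'twitch')."""
    return mean(values)


def sparkline(values):
    """Draw one block character per value, so every data point stays visible."""
    top = max(values) or 1  # avoid dividing by zero for an all-zero series
    return "".join(BLOCKS[round(v / top * (len(BLOCKS) - 1))] for v in values)


if __name__ == "__main__":
    # Both donors have the same mean, so a bar chart of means shows two equal bars.
    print(aggregate(steady), aggregate(bursty))
    # The sparklines take barely more space and reveal the difference.
    print(sparkline(steady))
    print(sparkline(bursty))
```

The summary can always be computed from the detailed view, but not the other way around, which is how I read Tufte’s “nowhere to go” remark.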
My students seem to have grasped the concept, even though I regret I did not provide any positive example. I spent quite some time explaining why I think this is a limited way of doing visualization but I forgot to prepare and show counterexamples. Not good.
The query paradigm and the notion of overview
My student’s example gave me the opportunity to discuss a related problem I often see: relying excessively on data querying. That’s the way most students think about data visualization initially: create one simple chart and provide lots of options to select which statistical aggregates to display. Interestingly, this is the same way most data portals present their data by default and, by the way, why most have failed to produce anything interesting for many, many years.
The problem with this approach is that there is very limited space for data comparison and rich “graphical inference”, which is exactly what our brain is good at. What many don’t get is that as soon as you change parameters the old chart is no longer visible, and you have to rely on memory rather than perception to relate what you see now to what you saw before. But visualization is so powerful precisely because the information you need is right there in front of you and can be accessed at any time. Colin Ware expresses this concept fantastically in his book Visual Thinking for Design when he writes: “the world is its own memory”.
In order to make the distinction clearer I proposed to summarize the concept through this simple dichotomy:
Query paradigm: ask first, then present.
Visualization paradigm: present first, then ask.
The query paradigm forces you to start the analysis by thinking about what you want first. The hard way. But visualization, for the most part, works in reverse: you first see what is in the data and then you are kind of forced to ask questions, because you detect interesting patterns you feel compelled to interpret and explain.
At this point one of my students jumped up and said: “No, wait a minute … in order to create a data visualization you have to have some kind of question first!” I fully agree. Visualization should be built with a purpose in mind. I think the difference is more in whether the current design provides an overview of your data set or not. The query paradigm chops data into sealed segments that one can only see individually, one at a time. The visualization paradigm tries to build a whole map of your data and lets you navigate through this entire space.
Note that I am not necessarily claiming one is better than the other! There are many great uses of query interfaces. What worries me the most, to be honest, is that the query paradigm is so pervasive that it ends up being the only solution people consider when approaching visualization problems for the first time.
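The contrast between the two paradigms can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `DATA` dictionary holds made-up numbers, and `query_view` and `overview` are hypothetical names, not part of any real tool. The first function answers one question at a time; the second lays every series out side by side on a shared scale, so comparison happens with the eyes rather than with memory.

```python
# Hypothetical toy data: yearly totals per donor (made-up numbers).
DATA = {
    "Donor A": [1, 2, 4, 8, 16],
    "Donor B": [16, 8, 4, 2, 1],
    "Donor C": [5, 5, 5, 5, 5],
}

BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values, top):
    """Encode a series as one block character per value, scaled to `top`."""
    return "".join(BLOCKS[round(v / top * (len(BLOCKS) - 1))] for v in values)


def shared_top(data):
    """Largest value across all series, so every row uses the same scale."""
    return max(max(series) for series in data.values())


def query_view(data, donor):
    """Query paradigm: ask first, then present. Only one slice is visible."""
    return [f"{donor}: {sparkline(data[donor], shared_top(data))}"]


def overview(data):
    """Visualization paradigm: present first, then ask. All slices are visible."""
    top = shared_top(data)
    return [f"{donor}: {sparkline(series, top)}" for donor, series in data.items()]


if __name__ == "__main__":
    print("\n".join(query_view(DATA, "Donor A")))  # one row; the others are hidden
    print()
    print("\n".join(overview(DATA)))  # every row at once, ready for comparison
```

Each row returned by `query_view` also appears in `overview`; the difference is that the overview never makes the other rows disappear.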
Where does the aggregation twitch come from?
Why do students have a hard time assimilating these concepts? Why are high-information graphics so foreign to most of them? I think there are at least two main issues at play here:
- Underestimation of visual perception. When I work with students, in or out of my class, it always amazes me how afraid they are to make their charts smaller. They fear the charts will be too hard to see, and I keep pushing them to make the damn thing smaller. Much, much smaller. The human eye is an incredibly powerful device, but it looks like most people do not realize how powerful it is. Probably because we take it for granted. Colin Ware has a nice section on visual acuity in his Information Visualization book, which I suggest everyone read. It’s such a fascinating piece of research! For instance, take this: a monitor has about 40 pixels per centimeter, and the human eye can judge whether lines are collinear to within about 1/10 of a pixel.
- Overestimation of human (short-term) memory. As I said above, most people approach data visualization with a query paradigm: one big chart and a lot of options to decide what to put there. This may work in some cases, but it enormously limits the amount of reasoning we can do with it. We humans can hold only a very small set of objects in working memory at any given time; that’s the famous “magical number seven” (note: it’s actually more complicated than that, but it works for this example). Therefore, when a chart changes, we can no longer relate the previous set to the current one. Visual perception is orders of magnitude more powerful than memory. That’s why visualization shines.
There is actually a third issue which did not occur to me until I presented these ideas in class: visual literacy and familiarity (I have become obsessed with this issue lately). Most of the fancy visualization techniques we develop are totally unfamiliar to most people out there. Not only do they need to spend time learning how to decode them, but they may also be totally overwhelmed by the information density carried by these pictures. This became totally clear to me when I presented a treemap in class.
One of my students raised his hand with a facial expression between disgust and pain: “Prof., that’s too much information at once, I cannot bear it”. That’s the thing: while some people (me included) seem to take pleasure in looking at the intricate patterns high-information graphics make, other people just cannot bear it. Question: is that a learned behavior, or is it rooted in individual differences among us humans? I don’t know.
That’s all folks … Now I need to prepare for my next lecture (and a whole bunch of other stuff by the way :))
Ware, Colin. Visual Thinking: For Design. Morgan Kaufmann, 2010.
Ware, Colin. Information Visualization. Morgan Kaufmann, 2013 (third edition).