Demystifying Cargo Cult Visualization: You Cannot Visualize 3 Variables by Mixing 3 Colors

You may have noticed that last week there was a spike of interest around a “new” visualization technique proposed by GOOD Magazine in which 3 colors are used to represent 3 aspects of a demographic data set. I originally answered a question posted on Twitter by Moritz Stefaner in which he asked what we thought about it. Surprisingly, the whole thing spread like a virus: new blog posts popped up here and there, and people came up with all kinds of sophisticated explanations and arguments about what is good, what is bad, what could be good if, what could be done better, etc. If your radar didn’t catch these signals, take a look at Andy Kirk’s very well-crafted post, which pretty much summarizes the whole thing.

I won’t sugarcoat it: in my humble opinion this is plain BS … or better, it is what I call Cargo Cult Visualization. I’ll describe what I mean by this term, how visualization theory predicts that the technique is plain wrong, and why you’d better study some basic theory before attempting new “inventions”.

Why you cannot visualize 3 variables with 3 colors

Of course I have nothing against experimentation and it’s totally fine to explore crazy ideas with the purpose of learning something new out of it. But here we have a new technique sold as an invention when in fact the technique is not new and science predicts it doesn’t work.

When I first read Moritz’s invitation to comment on this technique, all of a sudden it reminded me of a couple of pages from Colin Ware’s Information Visualization book. The theory behind it is called Integral-Separable Dimensions and it explains why this cannot work.

Integral-Separable Dimensions Theory

Colin Ware (on page 177) explains the theory and why it is important in the visual encoding of data dimensions. When we build visualizations we map data features to visual features (size, color, shape, etc.) and we expect to see similarities and relationships between these objects visually. The problem is that the choice of which visual features are used together to encode the various data features greatly affects the way they are perceived. All features influence each other to some extent, but some more than others. For instance, if you use color and size to encode two data features, the way color is perceived will be affected by the size of the object (as well as by a number of other contextual factors). It turns out that this effect has been the object of several studies in vision science and we know quite well how certain features interact.

We say that two dimensions (features) are integral when they are perceived holistically, that is, it’s hard to visually decode the value of one independently from the other. Picture a series of rectangles in a scatter plot where the height is mapped to one data feature and the width to another:

integral separable dimensions

Can you easily spot all the rectangles with the same width? No, it’s not fast.

And what about all those with the same height? The same.

Here are those with the same width:

integral separable dimensions

And here those with the same height:

integral separable dimensions

By contrast, if you use color and shape the task is easier.

integral separable dimensions

You can more easily spot yellow or black dots. And you can also spot circles or squares. It’s not super fast but it’s better, right? Shape and color are in fact more separable than width and height.

Again, in the book you can find a clarifying example (on page 181); this is actually the picture that first came to my mind. The dimensions are ordered from the most integral, at the top, to the most separable, at the bottom.

Integral-Separable Dimension from Colin Ware's Book

Notice anything? What is at the top? Color channels. You see, it takes reading a couple of pages to demystify a cargo cult technique. And what the map in GOOD Magazine proposes is not even what Colin Ware suggests here, because he suggests using some specific color channels.

How the idea could be implemented better

So the problem with this map is not only the choice of using 3 colors for 3 dimensions but also the bad execution of the idea. Color theory in fact teaches us that colors can be described by 3 channels. There are a fairly large number of ways to describe color, and they all use 3 channels. Why didn’t the authors try at least one of those? Some of them won’t work anyway, but at least it would make more sense. Also, two of the original data features, high school graduates and college graduates, could easily be combined to answer their question, and then more visual options would be available. Color might still not be optimal, but it would be interesting to explore!
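To make the “3 channels” point concrete, here is a minimal sketch using Python’s standard colorsys module. The function names encode_hsv and describe are mine, and so is the choice of mapping the first variable to hue, the second to saturation, and the third to value; that assignment is only an assumption for illustration, not a recommendation and not what GOOD Magazine did.

```python
import colorsys


def encode_hsv(a, b, c):
    """Map three values in [0, 1] to hue, saturation, value; return RGB.

    The hue/saturation/value assignment is an arbitrary, illustrative choice.
    """
    for x in (a, b, c):
        if not 0.0 <= x <= 1.0:
            raise ValueError("each value must be normalized to [0, 1]")
    return colorsys.hsv_to_rgb(a, b, c)


def describe(rgb):
    """Express one RGB color in two other standard 3-channel models."""
    r, g, b = rgb
    return {
        "hsv": colorsys.rgb_to_hsv(r, g, b),
        "hls": colorsys.rgb_to_hls(r, g, b),
    }


orange = (1.0, 0.5, 0.0)
print(describe(orange))           # the same color, two other 3-channel descriptions
print(encode_hsv(0.0, 1.0, 1.0))  # three data values in, one RGB color out
```

Whichever model you pick, you always end up with three numbers per color. Whether a reader can decode those three numbers separately is exactly the integral-separable question discussed above.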

Another experiment that could be run is based on Opponent Process Theory (sorry, I know I am getting too technical here; I don’t want to impress you, I just want to demonstrate how experimentation should be guided by knowledge). I won’t explain the theory in detail; again, you can find it in Colin Ware’s book (page 110). The theory says a simple thing: our visual system and its internal circuitry are built in such a way that we naturally have 3 embedded channels: yellow-blue, red-green, black-white. If these are our natural channels, why not try them? That would be interesting to explore! I wouldn’t expect to get an enlightening map out of it, but at least it is worth trying.
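Here is a rough sketch of what such an experiment could start from. It assumes a deliberately simplified linear opponent model (luminance as the mean of R, G, B; red-green as R minus G; yellow-blue as the mean of R and G minus B). These formulas are my own simplification for illustration and are not taken from Colin Ware’s book; a serious attempt would probably use a perceptual color space instead. The function names are hypothetical too.

```python
def clamp(x):
    """Keep a channel inside the displayable range [0, 1]."""
    return max(0.0, min(1.0, x))


def rgb_to_opponent(r, g, b):
    """Simplified opponent channels (assumed model, not a standard)."""
    luminance = (r + g + b) / 3
    red_green = r - g
    yellow_blue = (r + g) / 2 - b
    return (luminance, red_green, yellow_blue)


def opponent_to_rgb(luminance, red_green, yellow_blue):
    """Invert the simplified model above, clamping out-of-range results."""
    r = luminance + red_green / 2 + yellow_blue / 3
    g = luminance - red_green / 2 + yellow_blue / 3
    b = luminance - 2 * yellow_blue / 3
    return (clamp(r), clamp(g), clamp(b))


# Three data values, one per opponent channel:
# mid brightness, leaning red, neutral on the yellow-blue axis.
print(opponent_to_rgb(0.5, 0.5, 0.0))
```

Note that many combinations of the three values fall outside the displayable range and get clamped, so the mapping loses information at the extremes. That is one more thing the experiment would have to deal with.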

Thanks to Alan MacEachren I also discovered that Cynthia Brewer, one of the major experts in color use for data visualization (if you don’t know her ColorBrewer, go there NOW), actually tried a similar scheme in the mid-90s. I found an example online (thanks to Robert Roth, who posted it), which I reproduce below.

Brewer (1994: 133) on Twitpic

The map uses a trivariate color mapping to show the “percent of labor force employed in each of the three sectors”. At first sight it might seem like the same idea, only proposed by one of the most authoritative people in the field. But it’s not; it is fundamentally different. In this case, the three color dimensions represent the proportions of the same variable (i.e., labor force) across three different categories (services, agriculture, industry). The mapping is in turn fundamentally a categorical mapping, and you still need to refer back to the legend in order to accurately decode the map.
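The difference is easier to see with a small sketch. When the three numbers are shares of one whole, they always sum to 1, so only two of them are free to vary. The code below is a toy illustration only: the function names, the example counts, and the pure red/green/blue sector colors are all my assumptions, not Brewer’s actual scheme.

```python
def proportions(services, agriculture, industry):
    """Turn three sector counts into shares of the same whole."""
    total = services + agriculture + industry
    if total <= 0:
        raise ValueError("total labor force must be positive")
    return (services / total, agriculture / total, industry / total)


def blend(shares, hues=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))):
    """Weighted mix of one assumed hue per sector, weighted by its share."""
    return tuple(
        sum(share * hue[channel] for share, hue in zip(shares, hues))
        for channel in range(3)
    )


shares = proportions(50, 25, 25)  # made-up counts, not real data
print(shares, sum(shares))        # the shares always add up to 1
print(blend(shares))
```

Three independent variables, like the ones in the GOOD map, have no such constraint, which is part of why the two cases are not comparable.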

What is cargo cult visualization and why we don’t need it

Really guys … at the risk of appearing academic or orthodox or whatever: we don’t need cargo cult visualization. But what is cargo cult visualization? The physicist Richard Feynman coined the term cargo cult science in a famous lecture … (source: Wikipedia)

“… to negatively characterize research in the soft sciences (psychology and psychiatry in particular) – arguing that they have the semblance of being scientific, but are missing ‘a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty’.”

I immediately thought about it when I came up with the concept of this post. Cargo cult vis is not just chart junk, it’s more insidious. In chart junk there is “only” the bad or creative use of standard charts in ways that basically hide the message behind the glitter. But here we have a more courageous step: a method proposed as if it were new when in fact it is not new at all and it’s badly executed. Cargo cult visualization is trying to invent new techniques without even a minimal knowledge of the basics. That’s dangerous and can deceive novices who are interested in visualization.

If you read this blog regularly you know how much I care about giving honest and solid information to people who want to become data visualization experts. And I am a big big fan of experimentation and of putting things into practice from day one. But if you are ambitious and want to come up with new techniques, you’d better watch out and do your homework if you don’t want to shoot yourself in the foot. There are some classic knowledge sources in infovis and it doesn’t take much to acquire the basics.

I know I might appear elitist or even arrogant, but self-celebration is not my intent here. I just want to make the point that knowledge is pretty much accessible, and there are no excuses if you are too lazy to acquire it and still expect to make great things. Just that. If you don’t do your little homework you are just walking in the dark.

… but don’t get me wrong: bad ideas are a fabulous tool!

There is one last thing that I want to clarify because I think it is useful. This post of mine might give the impression that I think that trying out bad ideas is bad. No, no, no! On the contrary, deliberately experimenting with mechanisms you know are clearly wrong is not bad. I learned this lesson several years ago from my dear friend and renowned professor of HCI, Alan Dix, who once taught me the value of creating bad ideas deliberately. I suggest you take a look at his page about bad idea generation and its role in design. The whole thing boils down to the fact that when we create bad ideas deliberately we free ourselves from judgment and criticism and learn new aspects of the problem we are trying to solve. I remember very clearly a couple of great bad ideas: the glass hammer and the inflatable dart board :-) But bad idea generation, in order to be useful, has to be done on purpose. I doubt this is the case for the map we have analyzed here.

I really hope this post was useful for understanding the value of basic knowledge in visualization, and for convincing people that acquiring it is really worth it, and necessary.

Of course it is totally possible that I missed something or that I am plain wrong on something. If this is the case, please let me know what you think: send me a message on Twitter or comment below. I’d love to hear from you.
