… and finally stop polluting our eyes I’d say!
I was talking with Ilya, a new PhD student in our department, the other day, and in front of a prototype he had developed he said something like: “oh yes, and I should find the right color mapping here but … how?” Oh well … good question! Originally I wanted to write a whole new post on it, but after some thought I came to the conclusion that not only is it a daunting task but also, and more importantly, I don’t know enough to seriously teach about it.
But wait a minute, does that mean I cannot help him and the ever-increasing pool of poor color choosers? No, there is at least one thing I can do: share my list of favorite sources of information on color. And maybe add some tips and rules of thumb I often use myself.
So, no more excuses to use poor color schemes. Here is my annotated list of resources, plus some personal tips.
Here is the list of papers I found most useful for understanding color in use. Some of
them are written for the general public, while others require quite
some effort to understand. Together, however, they cover a very large part of what
should be learned, and the effort pays off handsomely.
Color Use Guidelines for Data Representation. Brewer, C. A., Proceedings of the Section on Statistical Graphics, American Statistical Association,
Alexandria VA. pp. 55-60 (1999).
[ If you can read only one, read this ]
If you don’t have time to read and you need one single source for practical
advice, stop here. This is the best and most concise explanation of how to use color in visualization you’ll ever find. Cynthia Brewer is a
cartographer and has focused much of her work on color in geographical
data, but her suggestions apply broadly to any kind of data. You can
see the result of her work in Color Brewer,
an online tool for learning how to select color scales. The tool alone is
an eye-opener for those who don’t know anything about the topic.
How NOT to lie with visualization. BE Rogowitz, LA Treinish, S Bryson, Computers in Physics (1996).
[ More into color for SciVis but still very useful and great examples ]
This is another classic, quite short and easy to read. I like it especially
for its focus on how harmful color can be if not used properly. The use
of color is discussed more in the context of scientific visualization
where continuous shades of color are often the case, like in medical
images and geographical mapping, but the results can be applied to any
other visualization. Especially interesting is the notion that different
color mapping strategies can, and sometimes should, be used according to the task at
hand (e.g., segmentation, highlighting, etc.).
Designing pixel-oriented visualization techniques: Theory and applications. DA Keim, IEEE Transactions on Visualization and Computer Graphics (2000).
[ Discussion (and code) of a “perceptually” optimal color scale ]
Though this is not only about color, the paper contains a very useful section on color and on how to build a perceptually optimal color scale. The color scale is called HSI (Hue, Saturation, Intensity) and is a variation on the more common RGB, HSB, etc.
What makes it so valuable is that it is a very rare example of an
article where both color theory and practical implementations are
discussed in the same place. The HSI color scale can be easily re-implemented by following the code they provide in a related paper: Issues in visualizing large databases. DA Keim, HP Kriegel – Proc. Conf. on Visual Database Systems, VDB’95 (1995).
Color Scales for Image Data. H. Levkowitz, G. T. Herman, IEEE Computer Graphics and Applications (12):1 pp.72 – 80 (1992).
[ Some relevant psychophysics theory and its relevance in color scale design ]
This is a purely theoretical paper. I included it because it contains some
information that is difficult to find elsewhere. And also because I
find it especially intriguing. Here we learn that (1) not all
differences in color intensity are perceived by our eyes and (2) that a
linear increase in color intensity is not necessarily perceived
linearly. The concept of Just Noticeable Difference (JND) is
introduced and applied to color scale design. One practical consequence
is that no matter how well we map our data to color,
some differences will always be lost.
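Point (2) can be illustrated with a small sketch. It assumes the standard CIE 1976 lightness formula (L*) as the model of perceived lightness, which is not necessarily the model used in the paper, and the function name `cie_lightness` is just illustrative. Equal steps in physical luminance produce very unequal steps in perceived lightness:

```python
def cie_lightness(y):
    """CIE 1976 L* (0 to 100) for a relative luminance y in [0, 1].

    Assumption: L* stands in for "perceived intensity" here; the paper
    discussed above may rely on a different perceptual model.
    """
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)                   # cube-root branch
    else:
        f = y / (3 * delta ** 2) + 4 / 29  # linear branch near black
    return 116 * f - 16


# Eleven equally spaced luminance values from 0.0 to 1.0
steps = [i / 10 for i in range(11)]
lightness = [cie_lightness(y) for y in steps]

# Perceived lightness gained by each equal luminance step
gains = [b - a for a, b in zip(lightness, lightness[1:])]

if __name__ == "__main__":
    for y, l_star in zip(steps, lightness):
        print(f"luminance {y:.1f} -> L* {l_star:5.1f}")
```

Under this model the first luminance step near black adds far more perceived lightness than the last step near white, which is why a scale that is linear in the data is not necessarily linear to the eye.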
Choosing Effective Colours for Data Visualization. Healey, C. G., Proceedings IEEE Visualization ’96, pp. 263-270 (1996).
[ Not easy read, hard-core experimentation, but unique info on categorical colors ]
This is even more theoretical than the paper above. And be warned, it is not
an easy read! Anyway, I put it in the list because it is the only
“serious” reference I know where the selection of categorical colors,
that is, colors that represent categories and not quantities, is
discussed in fine detail and an algorithm for their selection is
presented. Here we learn that color is not as powerful as we may think. The
maximum number of distinguishable colors we can use to label data is
around 12. Not so many indeed!
Information visualization: perception for design (Chapter 4: Color) by Colin Ware.
Ware’s book is simply the best resource for anything concerning
perception theory applied to visualization. Arguably, this is
probably the best book on visualization ever. Chapter 4 is all about
color theory and its content is obviously great. Theory and practice
are well balanced and useful examples are illustrated throughout the
chapter. The only thing I think it is missing is practical advice on how to implement
the suggestions, but ok, maybe this would be outside the
scope of the book.
Envisioning Information (Chapter 5: Color and Information) by Edward Tufte.
I don’t think this book needs any introduction. It is part of Tufte’s
famous trilogy and of course it contains some indications on color
use. Many of the things found here are also discussed in other
books and papers, though here in a usefully summarized version, but it also contains
some unique content in Tufte’s usual original style. A great piece
of knowledge is given right away as the chapter opens. Tufte
summarizes color uses in information design as: to label, to measure, to represent or imitate reality, and to enliven or decorate. These few tasks provide a useful framework for the work of a visualization designer.
Show Me the Numbers (Chapter 6: Visual perception and quantitative communication) by Stephen Few.
This chapter by Stephen Few is the best summary I have ever seen of
visual perception theory applied to visualization. Here you will find
not only how to use color effectively but also how to boil down basic
theory on how human vision works to a few simple rules to apply in visual
design. In a way it can be considered a sort of Colin Ware’s book
compressed into one pill. So again, if you don’t have enough time to read,
pick this one and study this chapter. You won’t regret your choice.
Tips and rules of thumb
Finally, let me add something of my own. This is just a random list of rules I learned the hard way, by doing.
- Don’t overestimate the power of color
– Color is attractive and powerful and, let’s admit it, it is what makes
most of our visualizations pretty and nice to see. But for any serious
use it is important to realize how limited it is. The number of colors
we can easily distinguish is incredibly low (you can learn this from
the refs above). For instance, it is estimated that the maximum number
of categorical colors we can easily detect in a representation is
around 12. Similar figures hold when presenting continuous data.
Compared to other visual features like position, length, and size, color is
perceived less efficiently. So just don’t believe color
mapping will do wonders; it is useful within its bounds.
- Always provide a color legend
– I think this one goes in the list of the most common mistakes in
visualization: some data feature is represented with color but then
there’s nothing in the interface that tells you what this color
represents. A color legend is always needed, and not only for labeling. As
an example, when it represents quantitative data it must also tell us
which numbers the brightest and darkest colors map to. So in short,
please do your homework: provide a legend.
- Use color with extreme care and parsimony (above all do no harm!)
– This is a sort of repetition of the first point but from a different
angle. As color is added to an interface it soon becomes noise. Learn
to use it with extreme care and parsimony. It is important for instance
to realize that if color is used to represent a data feature it is
extremely hard to use it for some other elements in the interface.
In the end, what Tufte says is extremely important: “above all do no harm”.
- Learn to love grays and gray scales (grids!)
– The best lesson one can learn about color is how powerful
colorless graphics are. In particular, shades of gray are so useful in
data representation that I am surprised there are so few, if any,
specialists advocating their use (Tufte mentions it, by the way). Take a look around, pick the best
known and best crafted tools and you’ll see that most of the time
their design is based on shades of gray. Gray is especially useful for
segmenting the visualization space and organizing it into regions. The most
obvious example is the use of grids in charts and alternating rows in
tables (Stephen Few shows excellent examples in Show Me the Numbers) but
the same principle applies to thousands of other visualization components. So
in short: learn to love gray and gray scales; they can do wonders and rarely do harm.
- Don’t represent unordered data with ordered colors
– This is self-explanatory but I see it so often that I think it’s
worth adding. Also, I think not everybody would agree with me on
this. Some people use different intensities of the same “hue” to
represent categories. In my opinion this is a poor use of color and opens
the door to false interpretations. Ordered colors are automatically
read as “there’s some order here” by our brain. Why would we want to
fool our mind when there are better solutions? Use distinguishable hues
and, if possible, make them of the same intensity. This will work best.
- Keep an eye on skewed distributions
– Personally I always run into this problem in my data visualizations and I
am surprised it is not discussed more. When the dimension you map to
color has a skewed distribution the result is incredibly poor: there
are a few items represented by the highest intensity and all the others are
flattened into the lowest. In short, there’s nothing really useful to see apart from the fact that there are two or three items with
very high values. In this case one option is to adopt a nonlinear
mapping between data feature and color. Common solutions are
logarithmic or square root functions, which alleviate the problem and
make it possible to show a full progression of values.
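To make the last tip concrete, here is a minimal Python sketch comparing linear, square root and logarithmic mappings on a made-up skewed sample. The function names (`normalize`, `to_gray`), the sample values and the gray range are my own assumptions for illustration, and the sketch assumes non-negative data:

```python
import math


def normalize(values, transform="linear"):
    """Rescale non-negative values to [0, 1] after an optional transform."""
    transforms = {
        "linear": lambda v: v,
        "sqrt": math.sqrt,
        "log": math.log1p,  # log(1 + v), so a value of 0 is still defined
    }
    f = transforms[transform]
    transformed = [f(v) for v in values]
    lo, hi = min(transformed), max(transformed)
    if hi == lo:
        # All values identical: map everything to the low end of the scale
        return [0.0 for _ in transformed]
    return [(t - lo) / (hi - lo) for t in transformed]


def to_gray(t):
    """Map t in [0, 1] to a hex gray: light for low values, dark for high."""
    level = round(230 - t * 200)  # hypothetical range: 230 (light) to 30 (dark)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


if __name__ == "__main__":
    sample = [1, 2, 3, 5, 8, 1000]  # hypothetical data with one huge outlier
    for name in ("linear", "sqrt", "log"):
        shades = [to_gray(t) for t in normalize(sample, name)]
        print(name, shades)
```

With the linear mapping the five small values end up in nearly the same shade, while the logarithmic mapping spreads them over a visible part of the scale. Whichever transform you pick, the legend should state it, otherwise readers will assume the scale is linear.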
That was my list and … oh, before I forget, there is one last major one!
- Don’t use the (infamous) rainbow color scale
– Maybe someone would laugh at this advice as too obvious but then, thanks to Ilya, I
discovered that there is nothing to laugh about. If you are not
convinced, see this study on the uses of the rainbow color scale and
discover how many professionals and researchers still believe it has
some value: Rainbow Color Map (Still) Considered Harmful.
If you want to design great visualizations, learning to use color properly
and effectively cannot be avoided. The whole system is as weak as the
weakest link, therefore if color is used badly your design will suffer a
lot. Take your time, read as many of these references as you can, and
you won’t regret it. They come from top-class researchers and
designers; you can trust their words. Your visualizations will improve,
your clients will thank you, and the visual world will definitely and
finally be less polluted.