Had a fantastic visit at ProPublica yesterday (thanks Alberto for inviting me and Scott for having me, you have an awesome team!) and we discussed lots of interesting things at the intersection of data visualization, literacy, statistics, and journalism. But one thing really caught my attention. Lena very patiently (thanks Lena!) showed me some of the nice visualizations she created and then asked:
How do you evaluate visualization?
How do you know if you have done things right?
Heck! This is the kind of question I should be able to answer. I did have some suggestions for her, yet I realized there are no established methodologies. This came as a bit of a surprise to me, as I have been organizing the BELIV Workshop on Visualization Evaluation for a long time and I have been running user studies myself for quite some time now.
Yet, when we are confronted with the task of evaluating visualization for communication purposes and for a wide audience, what is the best way to go? I am not aware of established practices or methodologies that address this problem. Traditionally, academic work has focused more on exploratory data analysis conducted by experts, or on very narrow experimental work on graphical perception.
But let’s see what the main issues and options are …
1) Expert Review or User Study? This is a classic problem in usability evaluation. Should we ask an expert to look at our visualization and suggest improvements, or involve users and see how they perform? Both are valid and not necessarily mutually exclusive options. Typically, expert reviews are less costly, so they are used in the early phases of the development process to iterate fast on the design. User studies involve a (hopefully) representative sample of people who are exposed to the visualization, plus some sort of qualitative or quantitative data collection about their experience. The unique problem of visualization, as opposed to the more generic problem of user-interface design, is that there are not many experts out there. Plus, the experts do not use an established methodology, so the whole process does not scale. But if you want to run user studies, your life does not get easier. User studies are a huge mess, and if you don’t have experience running them you can do lots of things wrong. Very wrong.
2) Representative Sample? Assuming you want to run a user study, what is a representative sample? Once again, I think visualization poses unique challenges here. The problem is that visual literacy is quite low in the population, so it’s not clear what you should shoot for. If you want to communicate to the layman, you might end up not using visuals at all! But at the same time, if every agency out there plays it safe, we won’t see any progress and we cannot expect visual literacy to increase. It’s a catch-22: if we don’t use advanced graphics people don’t learn, but if we use these visuals they might not be able to read our message. So we are left with the question of what a representative sample is. I think the main question here is: representative of what? One way to go is to create a profile and recruit people who match it, or to try to cover a whole spectrum of profiles, which of course might be much more costly and time-consuming.
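To make the profile idea a bit more concrete, here is a minimal sketch of quota-based recruitment across visual-literacy profiles. The profile labels and quota sizes are entirely hypothetical; this just illustrates the mechanics of filling a panel from a candidate pool.

```python
# Hypothetical sketch: quota-based recruitment across visual-literacy
# profiles. The profile labels and quota sizes are illustrative only.

QUOTAS = {"low_literacy": 10, "medium_literacy": 10, "high_literacy": 10}

def recruit(candidates, quotas):
    """Fill each profile quota from a stream of (participant_id, profile)
    pairs, skipping candidates whose quota is already full."""
    filled = {profile: 0 for profile in quotas}
    sample = []
    for pid, profile in candidates:
        if profile in quotas and filled[profile] < quotas[profile]:
            sample.append((pid, profile))
            filled[profile] += 1
    return sample

# A made-up candidate pool cycling through the three profiles:
pool = [(i, p) for i, p in enumerate(
    ["low_literacy", "high_literacy", "medium_literacy"] * 20)]
panel = recruit(pool, QUOTAS)
print(len(panel))  # 30 participants, 10 per profile
```

Covering a whole spectrum of profiles would simply mean defining more (and finer-grained) quota buckets, which is where the cost and recruitment time grow quickly.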
3) Data Collection / Benchmark Tasks or What? OK, now that we have a representative sample of our readers, how do we test our visualization with them? One might try to adopt established methods from usability evaluation, but the problem is that usability evaluation is mostly based on the concept of a “task”: I show my study participants my interface and ask them to do something with it. Is that a good method for vis? I am not sure. Communication-oriented visualization is not really about performing a specific task to achieve a well-defined goal; it is more about information transfer. How do we measure information transfer? Maybe we show the visualization first and then ask questions afterwards to see what information people have retained? That’s a viable approach, but it does not capture the visualization process itself, that is, what and how the user thinks during his or her interaction with the visuals. Another option is a “think-aloud” protocol: you sit next to your users and ask them to vocalize what they are thinking. This gives you direct experience of what is going on. But, once again, this is easier said than done: the way you interact with your participants, what you ask them, when and how, can heavily influence the outcome. So you have to be very careful there too.
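As a rough illustration of the “show first, ask afterwards” idea, here is a small sketch that scores a post-exposure recall quiz as a crude proxy for information transfer. The question IDs and answer key are made up for the example; a real study would need carefully designed questions and a more nuanced notion of “correct.”

```python
# Hypothetical sketch: scoring a post-exposure recall quiz as a rough
# proxy for "information transfer". Question IDs and answers are made up.

ANSWER_KEY = {"q1": "2008", "q2": "increase", "q3": "midwest"}

def retention_score(responses, key=ANSWER_KEY):
    """Fraction of quiz questions a participant answered correctly
    after the visualization was taken away (from memory, no notes)."""
    correct = sum(
        1 for q, answer in key.items()
        if responses.get(q, "").strip().lower() == answer
    )
    return correct / len(key)

# One participant's answers, collected after viewing the visualization:
p1 = {"q1": "2008", "q2": "Increase", "q3": "northeast"}
print(round(retention_score(p1), 2))  # 2 of 3 correct -> 0.67
```

Note that a score like this only captures what was retained, not the reading process itself, which is exactly the limitation discussed above and the reason a think-aloud protocol can be a useful complement.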
There are probably many more issues, but the common thread seems to be that while there are established methods and methodologies one may adopt from traditional usability testing, visualization poses some unique challenges that are not solved yet.
On a side note, we also discussed the use of crowdsourcing platforms like Amazon Mechanical Turk to evaluate visualization. This is another viable route. It may solve the sampling problem, but it does not solve the others. Actually, the others get even more complicated when you have limited interaction with your target population.
And you? Do you have any experience doing evaluation in this area? Are there other important issues or solutions worth mentioning?
Once again thanks Scott, Alberto and Lena for the inspiring discussion that triggered this blog post. There is so much more work that needs to be done!