Had a fantastic visit at ProPublica yesterday (thanks Alberto for inviting me and Scott for having me, you have an awesome team!) and we discussed lots of interesting things at the intersection of data visualization, literacy, statistics, journalism, etc. But there is one thing that really caught my attention. Lena very patiently (thanks Lena!) showed me some of the nice visualizations she created and then asked:

How do you evaluate visualization?

How do you know if you have done things right?

Heck! This is the kind of question I should be able to answer. I did have some suggestions for her, yet I realized there are no established methodologies. This came as a bit of a surprise to me as I have been organizing the BELIV Workshop on Visualization Evaluation for a long time and I have been running user studies myself for quite some time now.

Yet, when we are confronted with the task of evaluating visualization for communication purposes and for a wide audience what is the best way to go? I am not aware of established practices or methodologies that address this problem. Traditionally, academic work has focussed more on exploratory data analysis problems conducted by experts or very narrow experimental work on graphical perception.

But let’s see what the main issues and options are …

1) Expert Review or User Study? This is a classic problem in usability evaluation. Should we ask an expert to look at our visualization and give suggestions on how to improve it, or involve users and see how they perform? Both are very valid and not necessarily mutually exclusive options. Typically, expert reviews are less costly and as such they are used in the early phases of the development process to iterate fast on the design. User studies involve a (hopefully) representative sample of people who get exposed to the visualization and some sort of qualitative or quantitative data collection about their experience. The unique problem of visualization, as opposed to the more generic problem of user-interface design, is that there are not so many experts out there. Plus, the experts do not use an established methodology, so the whole process does not scale. But if you want to run user studies your life does not get easier. User studies are a huge mess and if you don’t have experience running them you can do lots of things wrong. Very wrong.

2) Representative Sample? Assuming you want to run a user study, what is a representative sample? Once again, I think visualization poses unique challenges here. The problem is that visual literacy is quite low in the population so it’s not clear what you should shoot for. If you want to communicate to the layman you might end up not using visuals at all! But at the same time if every agency out there plays it safe we won’t see any progress and we cannot expect visual literacy to increase. It’s a catch-22: if we don’t use advanced graphics people don’t learn, but if we use these visuals they might not be able to read our message. So we are left with the question of what is a representative sample. I think the main question here is: representative of what? One way to go is to create a profile and recruit people with this profile, or try to cover a whole spectrum of profiles, which of course might be much more costly and time-consuming.

3) Data Collection / Benchmark Tasks or What? Ok now we have a representative sample of our readers, how do we test our visualization with them? One might try to adopt established methods from usability evaluation but the problem here is that usability evaluation is mostly based on the concept of “task”, that is, I show my study participants my interface and ask them to do something with it. Is that a good method for vis? I am not sure. Communication-oriented visualization is not really about performing a specific task to achieve a well-defined goal. Visualization is more about information transfer. How do we measure information transfer? Maybe we show the visualizations first and then we ask questions afterwards to see what information people have retained? That’s a viable way but it does not capture the visualization process itself, that is, what and how the user thinks during his or her interaction with the visuals. Another way to go is to use a “think-aloud” protocol: you sit next to your users and ask them to vocalize what they are thinking. This way you have a direct experience with what is going on. But, once again, this is easier said than done, as the way you interact with your participants, what you ask them, when and how, can heavily influence the outcome. So you have to be very careful there too.

There are probably many many more issues here but the common thread seems to be that while there are established methods and methodologies one may be able to adopt from traditional usability testing, visualization poses some unique challenges that are not solved yet.

On a side note, we also discussed the use of crowdsourcing platforms like Amazon Mechanical Turk to evaluate visualization. This is another viable way. It may (maybe) solve the sampling problem but it does not solve the others. Actually the others get even more complicated when you have limited interaction with your target population.

And you? Do you have any experience doing evaluation in this area? Are there other important issues or solutions worth mentioning?

Once again thanks Scott, Alberto and Lena for the inspiring discussion that triggered this blog post. There is so much more work that needs to be done!

Take care.


I could not resist writing this short blog post after having such a nice conversation with Scott Davidoff yesterday. Scott is a manager at the Human Interfaces Group at NASA JPL and he leads a group of people that takes care of big data problems at NASA (I mean big big data such as those coming from telescopes and missions).

While on the phone he said:

“You know Enrico … the way I see it is that we are mechanics for scientists … the same way Formula 1 has mechanics for their cars”.

What a brilliant metaphor! Irresistible. It matches perfectly my philosophy and at the same time, sorry to say, I think it does not match very well with the way most people see vis right now.

It reminds me of the brilliant “The Computer Scientist as Toolsmith”, the fantastic essay written by Fred Brooks (ACM Turing Award) which I adopted a long time ago as my personal manifesto. Fred Brooks advocated for a different way to see the role of Computer Science (one I am sure many of my colleagues reject) as an engineering discipline whose purpose is to provide services to scientists. He famously stated that:

IA > AI (Intelligence Amplification can beat Artificial Intelligence).

That is, a machine and a mind can beat a mind-imitating machine working by itself.

And this all reminds me why I do what I do and why I think we should do more. Much more. In 2011 I was invited to Visualizing Europe, an event organized by Visualizing.org, and I gave a talk that pretty much covered the same ground: “Data Visualization is NOT Useful. It’s Indispensable”.

Talking with Scott, once again I realized how many people out there need our help. These are the people who may discover the next cure for cancer, help us go to Mars, find a way to preserve our planet, prevent terrorist attacks or disasters, just to name a few. You may think these people already have the necessary knowledge, means and skills to tackle big data problems on their own but you are wrong. These people are busy with their science, and for a good reason!

All these people need us! Let me repeat it: all these people need us! It’s up to us to show them what they can do with our tools and skills. Most of them simply do not imagine how powerful some of the things we do may be for them.

Let me tell you one thing: I have collaborated with a few scientists in my career so far and they love it when we make their life easier. Often they are blown away by simple tricks we take for granted.

So if you are passionate about data and data visualization I urge you to think about this: you can decide to tackle hard problems with data. You can decide to make a big difference by pairing up with people who deal with hard scientific problems and helping them make progress. It’s up to you to make this choice.

C’mon!

My biggest ambition is to be a mechanic. A mechanic for the Formula 1 of science.

And you?

 


… or whatever we want to call it.

Yin Shanyang writes on Twitter in response to my last post on vis as a bidirectional channel:

[Image: screenshot of Yin Shanyang’s tweet]

This comment really hits a nerve with me as I have been thinking about this issue quite a lot lately. I must confess I am no longer satisfied with the word “visualization”. And I am even less satisfied by all the other paraphernalia people like to use: data visualization, interactive visualization, information visualization, visual analytics, infographics, etc.

The reason is that I think all these words do not describe well the work I and many other people do. While visualization seems to be appropriate when the main purpose is data presentation, I don’t think it captures the value of visualization when it is used as a data sensemaking tool.

When used for this purpose interaction is crucial. Analysis looks more like a continuous loop between these steps:

  1. specify to the computer what you want to see and how (the specific visual representation)
  2. detect patterns, interpret the results and generate questions
  3. ask the computer to change the data and/or the visualization to accommodate the new question(s)
  4. assess the results … repeat …
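To make the loop a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the toy sales data, the `render` and `analytical_loop` names, and the two kinds of user actions (`group_by` and `exclude`). It is not taken from any real tool. Step 2 (detecting patterns and generating questions) happens in the analyst's head, so the code only models the machine side of the discourse.

```python
from collections import defaultdict

# Hypothetical toy data: (region, product, sales). Invented for this sketch.
ROWS = [
    ("North", "A", 120), ("North", "B", 80),
    ("South", "A", 200), ("South", "B", 40),
    ("West", "A", 10), ("West", "B", 15),
]

COLUMN_INDEX = {"region": 0, "product": 1}


def render(rows, group_by, excluded=frozenset()):
    """Step 1: turn a view specification into what would be drawn.

    Here the "visual representation" is simply total sales per group.
    """
    index = COLUMN_INDEX[group_by]
    totals = defaultdict(int)
    for row in rows:
        if row[index] in excluded:
            continue  # the analyst asked not to see this item
        totals[row[index]] += row[2]
    return dict(totals)


def analytical_loop(rows, actions):
    """Replay user actions; each one respecifies the view (steps 3 and 4)."""
    spec = {"group_by": "region", "excluded": frozenset()}
    history = [render(rows, **spec)]
    for kind, value in actions:
        if kind == "group_by":
            # New question: look at the data along another dimension.
            spec = {"group_by": value, "excluded": frozenset()}
        elif kind == "exclude":
            # New question: what does the view look like without this item?
            spec = {**spec, "excluded": spec["excluded"] | {value}}
        history.append(render(rows, **spec))
    return history


if __name__ == "__main__":
    for view in analytical_loop(ROWS, [("exclude", "West"), ("group_by", "product")]):
        print(view)
```

Note how little of this code is about drawing anything: most of it is about keeping track of what the human asked for, which is the point of the discussion that follows.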

Analytical discourse is a term I saw used in the visual analytics agenda a few years back and I think it captures this concept very well: this whole interplay and discourse between the machine and the human. This is what many of us are after and I am not sure the term visualization is able to express this concept in its entirety. The value of these tools is not exclusively in the visual representation; interaction plays a major role.

This became even more apparent to me while teaching my InfoVis course this semester. I teach a lot of things about visual representation but when students come down to building software for their projects, what they are really working on is a fully-fledged user interface. They have multiple linked views, search boxes, dynamic query sliders and all the rest. It’s interactive user interface design they end up doing, not visualization. And user interface design carries a lot of additional challenges that go beyond visual representation. Sure, designing the appropriate representation is still very important but many other choices impact the final results.

For instance all my students’ projects have multiple interactive views, maybe sometimes just a main visualization, a list of terms and a couple of query sliders for dynamic filtering, but what do you call that? I call that visualization but in practice it’s a complex user interface. Or a “data interface” as suggested by Yin.

One last note. While thinking about this whole idea I recalled that Jeff Heer’s lab at UW is called Interactive Data Lab and I think he’s got it right. Interaction with the data is the main thing, visualization is the medium we use to create part of this interaction.

What do you think? Too heretical? Too much of a hassle?


I am preparing a presentation for a talk I am giving next week and I have a slide I always use at the beginning that asks this question:

How do we get information from the computer into our heads?

This works as a motivation to introduce the idea that regardless of the data-crunching power we are going to produce in the future, the real bottleneck, in many applications, will always be the human mind. Getting information across from what our computers accumulate and generate to our heads, and being able to understand it, is the real challenge. Visualization is the tool we use to deal with this problem. By using effective visual representations of data we tap into the human brain, with all its incredible powers that we have not yet been able to reproduce and synthesize in a machine (I leave the discussion of whether this is possible or even desirable to others).

When I present this slide I normally quote the great Fred Brooks’ “The Computer Scientist as Toolsmith” and add this image from the paper:

[Image: figure from Fred Brooks’ paper]

But today for the first time I realized that when we talk about visualization we always talk about it as a one-way channel, from the computer (or other media) to the human, when in fact there is a lot of knowledge flowing from the human to the machine.

When we use an interactive visualization tool we decide which data segments we want to attend to (think how Tableau works). This is derived from our knowledge and questions which we implicitly use to make choices about what to visualize next and how. When we use dynamic queries we use our knowledge to tell the computer that we are interested in a specific segment of the data and that we want to see it now.

There is a simple but effective function in Tableau that I love and that is a good example of what I am trying to say here: the “exclude” function, which allows you to remove a data item from the visualization completely because it is not interesting or just annoying. When we do that, we are transferring our specific knowledge to the computer to tell it that we don’t need to see that data point anymore.

All in all it seems to boil down to interaction and how it is the only way to translate our intentions into instructions our computers can interpret. I think that what I really want to say is that we tend to forget how powerful this channel is and how limited it is to think about visualization exclusively as a 1-way communication tool. Sure, we can keep considering visualization this way but I think it’s much more exciting to think about it as a “visual thinking tool” where information flows in both directions.

And I think there is even more than that. While interaction in visualization is currently limited to giving instructions about what to see next, nothing prevents interaction from being used as a tool to transfer pieces of human knowledge directly to the computer. Classic examples where this has been attempted in the area of machine learning and related fields are relevance feedback mechanisms and active learning. Both techniques rest on the idea of asking a human to judge a decision made by the computer and using the result as a way to improve the computation. This is only one example but I think there are many unexplored ways to input our knowledge back into the computer to make it smarter and I think visualization should play a much larger role there.
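For readers who have never seen relevance feedback in action, here is a minimal sketch of one classic formulation, the Rocchio update from information retrieval. This is my choice of illustration under simplifying assumptions, not the only way to do it: items and the query are plain lists of numbers, the function name `rocchio_update` is made up, and the default weights are just illustrative values. The human marks some results as relevant and some as not, and the query vector is nudged toward the former and away from the latter.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a new query vector adjusted by human relevance judgments.

    new_query = alpha * query
                + beta  * centroid(relevant items)
                - gamma * centroid(non-relevant items)

    All vectors are equal-length lists of floats. The weights are
    illustrative defaults, not tuned values.
    """
    def centroid(vectors):
        # Mean vector of the judged items; zeros if the user judged none.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel = centroid(relevant)
    non = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


if __name__ == "__main__":
    # Hypothetical two-dimensional example: the user likes an item that is
    # strong on the second dimension and dislikes one strong on the first.
    updated = rocchio_update(
        query=[1.0, 0.0],
        relevant=[[0.0, 1.0]],
        non_relevant=[[1.0, 0.0]],
        alpha=1.0, beta=0.5, gamma=0.5,
    )
    print(updated)
```

In a visualization tool the "judgments" would not need to be explicit ratings: selecting, excluding or brushing items could play the same role.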

That’s all for now. Thoughts?


[Be warned: this is me in a somewhat depressive state after the deep stress I have endured by submitting too many papers to VIS'14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel the urge to share this with you. I will probably regret it in a few days :)]

I happen to click on one of the latest links in one of the popular visualization blogs. I am excited. The title looks cool, the data looks cool and the design of the visualization looks super cool: sleek and clean, the way I like it. I take a look at the demo and you know what? There’s nothing there to see. Empty. No new knowledge, nothing to learn, nothing you can absorb. Nada.

This is not an isolated case. And that’s the reason why I am not happy to disclose which particular project I am talking about. First, because it would not be fair (I hate throwing shit at people). Second, because, as I said, this is not an isolated case. Third, because this particular project is only an expedient to talk about something much larger.

The way I see visualization is as a super powerful discovery tool. Stealing words from Fred Brooks, visualization for me is, ultimately, an “intelligence amplification” tool: interactive user interfaces to observe the unobservable (or think the unthinkable?).

But many many visualizations out there show nothing. They are like modern food: empty calories. We, as a community, spent and still spend lots of energy debating whether one particular way of representing a given piece of information is better than another but we seem to forget that what is really important is what we decide to show in the first place. Ultimately, the yardstick should be: did you learn something watching this? Is there any kind of nutrient that enters your brain?

Let’s put it this way: if it was possible to observe exactly what kind of changes happen in the brain of a person when exposed to some new piece of information, through visualization, what would you like to see there? I would like to see a Pollock-like explosion of spreading activation followed by a difference. A delta. A sweet and tiny new brick of knowledge.

I see too much ambiguity out there. We talk about telling stories, about beautiful visualizations, and we talk a lot about wrong ways to visualize data. But what I would like to talk more about is: are we making a difference? Not a difference in the market or on Twitter or whatever. A difference in people’s minds. In their brains actually.

I think the answer is mostly yes. I think … I believe … Or I like to believe. But sometimes I fear we are not. The biggest fear I have, and this is the real sense of this post, is that if we are not able to teach people how to create nutritious visualizations we may become irrelevant. Maybe it’s just a stupid thought, I don’t know, but that’s the way I feel when I get depressed by watching empty-calorie visualizations (btw, maybe this should have been the real title of this post). The allure of pretty pictures will one day end and I am not sure what will be left to see.

Creating visualizations to change people’s brains significantly is not an easy task but it’s also the only thing that really excites me about visualization [Added note: Alberto and Gregor in the comments pointed out there is no way NOT to change your brain anyway when you are exposed to a visualization. They are right. So this is more of a colorful image than a good representation of what happens in reality. Yet, I like the concept anyway. Just don't take it too literally!]. And now that I think about it, maybe I am writing this post more for myself than for you. I want to remind myself that my ultimate goal is to help people do remarkable things with visualization. It’s so easy to forget it in the day-to-day. I want to be able to literally change those neurons and synapses and make a difference in people’s brains. That’s what counts for me. Isn’t that a more than worthy and magnificent goal?

And what is your goal by the way?


Take care,
Enrico.


Course Diary #3: Beyond Charts: Dynamic Visualization

March 7, 2014

This is the last lecture of the introductory part of my course where I give a very broad (and admittedly shallow) overview of some key visualization concepts I hope will stick in my students’ heads. After talking about basic charts and high-information graphics I introduce dynamic visualization as visual representations that can change through user […]


Course Diary #2: Beyond Charts: High-Information Graphics

February 28, 2014

Hi there! We had a one-week break at school as the inclement weather forced us to cancel the class last week. Here are the lecture slides from this class: Beyond Charts: High-Information Graphics. In this third lecture I have introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display […]


Course Diary #1: Basic Charts

February 10, 2014

Starting from this week and during the rest of the semester I will be writing a new series called “Course Diary” where I report on my experience while teaching Information Visualization to my students at NYU. Teaching them is a lot of fun. They often challenge me with questions and comments which force me […]


The Role of Algorithms in Data Visualization

January 28, 2014

It’s somewhat surprising to me to notice how little we discuss the more technical side of data visualization. I often say that visualization is something that “happens in your head” to emphasize the role of perception and cognition and to explain why it is so hard to evaluate visualization. Yet, visualization happens a […]


Data with a Soul and a Few More Lessons I Have Learned About Data

January 15, 2014

I don’t know if this is true for you but I certainly used to take data for granted. Data are data, who cares where they come from. Who cares how they are generated. Who cares what they really mean. I’ll take these bits of digital information and transform them into something else (a visualization) using […]
