Category Archives: Thoughts

Ideas, thoughts, open issues, questions.

What do we do this VIS thing for? Towards a data visualization ethos.

I often find myself asking: “What do we do this Data Visualization thing for?” Of course I do it mostly because it’s fun, and I bet it’s the same for you. Yet, is there a way we can find some deeper meaning in it? Are there some higher-level purposes we can identify? Meaning often comes in relation to the impact one can have on other people’s lives, so here is a tentative list, off the top of my head, of how vis can impact people’s lives (feel free to add yours in the comments below).

Mechanics for the Formula 1 of Science

I could not resist writing this short blog post after having such a nice conversation with Scott Davidoff yesterday. Scott is a manager at the Human Interfaces Group at NASA JPL and he leads a group of people that takes care of big data problems at NASA (I mean big big data, such as those coming from telescopes and missions).

While on the phone he said:

“You know Enrico … the way I see it is that we are mechanics for scientists … the same way Formula 1 has mechanics for their cars.”

What a brilliant metaphor! Irresistible. It matches my philosophy perfectly and at the same time, sorry to say, I think it does not match very well the way most people see vis right now.

Visualization as a bidirectional channel

I am preparing a presentation for a talk I am giving next week and I have a slide I always use at the beginning that asks this question:

How do we get information from the computer into our heads?

This works as a motivation to introduce the idea that, regardless of the data-crunching power we are going to produce in the future, the real bottleneck, in many applications, will always be the human mind. Getting information across from what our computers accumulate and generate to our heads, and being able to understand it, is the real challenge. Visualization is the tool we use to deal with this problem. By using effective visual representations of data we tap into the power of the human brain, with all the incredible abilities we have not yet been able to reproduce and synthesize in a machine (I leave the discussion of whether this is possible or even desirable to others).

When I present this slide I normally quote the great Fred Brooks’ The Computer Scientist as Toolsmith and add this image from the paper:

[Image: screenshot of a figure from Brooks’ paper]

But today, for the first time, I realized that when we talk about visualization we always talk about it as a one-way channel, from the computer (or other media) to the human, when in fact there is a lot of knowledge flowing from the human to the machine.

When we use an interactive visualization tool we decide which data segments we want to attend to (think how Tableau works). This is derived from our knowledge and questions which we implicitly use to make choices about what to visualize next and how. When we use dynamic queries we use our knowledge to tell the computer that we are interested in a specific segment of the data and that we want to see it now.

There is a simple but effective function in Tableau that I love and that is a good example of what I am trying to say here: the “exclude” function, which allows you to remove a data item from the visualization completely because it is not interesting or just annoying. When we do that, we are transferring our specific knowledge to the computer to tell it that we don’t need to see that data point anymore.
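To make the two kinds of interaction above concrete, here is a minimal sketch in Python. It is not Tableau's API: the `View` class, its `set_range` and `exclude` methods, and the toy records are all hypothetical, made up only to show how each user action hands a small piece of human knowledge to the program.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical stand-in for the state of an interactive chart."""
    records: list
    excluded: set = field(default_factory=set)
    value_range: tuple = (float("-inf"), float("inf"))

    def set_range(self, low, high):
        # Dynamic query: the user says which segment of the data matters now.
        self.value_range = (low, high)

    def exclude(self, name):
        # "Exclude": the user says this item is not worth seeing anymore.
        self.excluded.add(name)

    def visible(self):
        # The chart only draws what survives both pieces of user knowledge.
        low, high = self.value_range
        return [r for r in self.records
                if r["name"] not in self.excluded and low <= r["value"] <= high]


# Toy records, invented for illustration only.
records = [
    {"name": "A", "value": 3},
    {"name": "B", "value": 12},
    {"name": "C", "value": 7},
    {"name": "D", "value": 950},
]

view = View(records)
view.set_range(5, 1000)   # "I only care about values of 5 or more"
view.exclude("D")         # "D is an annoying outlier, hide it"
visible_names = [r["name"] for r in view.visible()]
print(visible_names)      # ['B', 'C']
```

Nothing in the data says that D is uninteresting; that judgment lives in the user's head until the interaction moves it into the program's state.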

All in all it seems to boil down to interaction and how it is the only way to translate our intentions into instructions our computers can interpret. I think that what I really want to say is that we tend to forget how powerful this channel is and how limited it is to think about visualization exclusively as a 1-way communication tool. Sure, we can keep considering visualization this way but I think it’s much more exciting to think about it as a “visual thinking tool” where information flows in both directions.

And I think there is even more than that. While interaction in visualization is currently limited to giving instructions about what to see next, nothing prevents interaction from being used as a tool to transfer pieces of human knowledge directly to the computer. Classic examples where this has been attempted in the area of machine learning and related fields are relevance feedback mechanisms and active learning. Both techniques rest on the idea of asking a human to judge a decision made by the computer and using the result as a way to improve the computation. This is only one example but I think there are many unexplored ways to input our knowledge back into the computer to make it smarter and I think visualization should play a much larger role there.
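As a rough illustration of the active learning idea (relevance feedback is not covered here), the sketch below learns a one-dimensional cut-off by always asking the human about the item it is least sure of. Everything in it is an assumption made for the example: the `oracle` function stands in for the human, the hidden rule `x >= 6.2` is invented, and the smallest and largest values are assumed to be "not interesting" and "interesting" respectively.

```python
def oracle(x):
    """Stand-in for the human judge; the hidden rule is invented for this example."""
    return x >= 6.2


def active_learn(pool, ask, max_questions=10):
    """Estimate a cut-off by querying the item closest to the current guess.

    Assumption: the smallest value in the pool is 'not interesting' and the
    largest is 'interesting', so the cut-off lies somewhere in between.
    """
    values = sorted(pool)
    low, high = values[0], values[-1]
    unlabeled = values[1:-1]
    questions = 0
    while questions < max_questions:
        # Only items between the known negative and known positive are uncertain.
        candidates = [x for x in unlabeled if low < x < high]
        if not candidates:
            break
        midpoint = (low + high) / 2
        # Uncertainty sampling: pick the item nearest the current boundary guess.
        query = min(candidates, key=lambda x: abs(x - midpoint))
        unlabeled.remove(query)
        questions += 1
        if ask(query):
            high = query   # human says "interesting": boundary is at or below it
        else:
            low = query    # human says "not interesting": boundary is above it
    return (low + high) / 2, questions


pool = [i * 0.5 for i in range(21)]   # 0.0, 0.5, ..., 10.0
threshold, questions = active_learn(pool, oracle)
print(threshold, questions)           # 6.25 4
```

With 21 items in the pool, four well-chosen questions are enough to pin the cut-off between 6.0 and 6.5, which is the point of the technique: the human's time is spent only where the machine is uncertain.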

That’s all for now. Thoughts?

My (stupid) fear we may, one day, become irrelevant

[Be warned: this is me in a somewhat depressive state after the deep stress I have endured by submitting too many papers to VIS’14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel the urge to share this with you. I will probably regret it in a few days :)]

I happen to click on one of the latest links in one of the popular visualization blogs. I am excited. The title looks cool, the data looks cool and the design of the visualization looks super cool: sleek and clean, the way I like it. I take a look at the demo and you know what? There’s nothing there to see. Empty. No new knowledge, nothing to learn, nothing you can absorb. Nada.

This is not an isolated case. And that’s the reason why I am not happy to disclose which particular project I am talking about. First, because it would not be fair (I hate throwing shit at people). Second, because, as I said, this is not an isolated case. Third, because this particular project is only an expedient to talk about something much larger.

The way I see visualization is as a super powerful discovery tool. Stealing words from Fred Brooks, visualization for me is, ultimately, an “intelligence amplification” tool: interactive user interfaces to observe the unobservable (or think the unthinkable?).

But many many visualizations out there show nothing. They are like modern food: empty calories. We, as a community, spent and still spend lots of energy debating whether one particular way of representing a given piece of information is better than another but we seem to forget that what is really important is what we decide to show in the first place. Ultimately, the yardstick should be: did you learn something watching this? Is there any kind of nutrient that enters your brain?

Let’s put it this way: if it was possible to observe exactly what kind of changes happen in the brain of a person when exposed to some new piece of information, through visualization, what would you like to see there? I would like to see a Pollock-like explosion of spreading activation followed by a difference. A delta. A sweet and tiny new brick of knowledge.

I see too much ambiguity out there. We talk about telling stories, about beautiful visualizations, and we talk a lot about wrong ways to visualize data. But what I would like to talk more about is: are we making a difference? Not a difference in the market or on Twitter or whatever. A difference in people’s minds. In their brains, actually.

I think the answer is mostly yes. I think … I believe … Or I like to believe. But sometimes I fear we are not. The biggest fear I have, and this is the real sense of this post, is that if we are not able to teach people how to create nutritious visualizations we may become irrelevant. Maybe it’s just a stupid thought, I don’t know, but that’s the way I feel when I get depressed by watching empty-calorie visualizations (btw, maybe this should have been the real title of this post). The allure of pretty pictures will one day end and I am not sure what will be left to see.

Creating visualizations that change people’s brains significantly is not an easy task but it’s also the only thing that really excites me about visualization [Added note: Alberto and Gregor in the comments pointed out there is no way NOT to change your brain anyway when you are exposed to a visualization. They are right. So this is more of a colorful image than a good representation of what happens in reality. Yet, I like the concept anyway. Just don’t take it too literally!]. And now that I think about it, maybe I am writing this post more for myself than for you. I want to remind myself that my ultimate goal is to help people do remarkable things with visualization. It’s so easy to forget it in the day-to-day. I want to be able to literally change those neurons and synapses and make a difference in people’s brains. That’s what counts for me. Isn’t that a more than worthy and magnificent goal?

And what is your goal by the way?


Take care,
Enrico.

Data with a Soul and a Few More Lessons I Have Learned About Data

I don’t know if this is true for you but I certainly used to take data for granted. Data are data, who cares where they come from. Who cares how they are generated. Who cares what they really mean. I’ll take these bits of digital information and transform them into something else (a visualization) using my black magic and show it to the world.

I no longer see it this way. Not after attending a whole three-day event called the Aid Data Convening, a conference organized by the Aid Data Consortium (ARC) to talk exclusively about data. Not just data in general but a single data set: the Aid Data, a curated database of more than a million records collecting information about foreign aid.

The database keeps track of financial disbursements made from donor countries (and international organizations) to recipient countries for development purposes: health and education, disasters and financial crises, climate change, etc. It spans a time range from 1945 to the present and includes hundreds of countries and international organizations.

Aid Data users are political scientists, economists, social scientists of many sorts, all devoted to a single purpose: understanding aid. Is aid effective? Is aid allocated efficiently? Does aid go where it is most needed? Is aid influenced by politics (the answer is of course yes)? Does aid have undesired consequences? Etc.

Isn’t that incredibly fascinating? Here is what I have learned during these few days I have spent talking with these nice people.

Data are not always abundant or easy to get. In the big data era we are so used to data abundance that we end up forgetting how some data, data crucial for important human endeavors, may be very hard to get. It’s not just like writing the next Python script and scraping a million records in 24 hours. No, it’s a super-painful process. For instance, the Aid Data folks have a whole team of data gatherers and a multistep process which includes: setting up an agreement with a foreign country, having people fly to remote places and convince officials to make their information available, obtaining a whole bunch of documents and files, transforming these files into a common format and adding geographical coordinates (geocoding) where necessary, cross-checking data with multiple coders, etc. How far is this from writing a bunch of Python code?

Data granularity can be a game-changer. It took me a while to understand why Aid Data users are so excited by the new release of the database which features, for the first time, data at the sub-national rather than only national level. This means that financial disbursements are geocoded at a higher level of granularity, that is, instead of knowing only that a certain amount has flowed from one country to another you can now know which region it has gone to. To my eyes this seemed like a minor thing, but as I went through a few presentations of people doing real research with these data I suddenly realized it is a huge change! Picture this: you know aid is flowing from the US to Uganda but you have no idea where it goes once it lands there. All of a sudden researchers can ask a whole lot of new and more interesting questions. In turn, this makes me think about how this extends to many other data sets: small changes can have huge impacts. A little more detail may pave the way to much bigger questions. How can we make existing data systematically more valuable by adding crucial information? And what is this crucial information, by the way?

Questions are much more important than data. I did not need to attend this conference to realize how true this is. Yet, after attending it I am even more convinced. One of the highest peaks of the event for me was listening to all the diverse and interesting questions researchers have on this single data set. There are all kinds of flavors: aid effect on health, democratic processes, recovery from disasters and violence or, vice versa, how specific events or conditions influence aid. Even if data are a critical asset to answer these questions, and to substantiate them with hard numbers, the real value comes from the questions, not from the data. And questions come from the most important asset we have: our brain. Data without brain is useless. Brain without data may still be somewhat useful I guess.

Interesting questions are causal. It’s stunning for me to see how most visualization projects are organized around the detection and depiction of trends, patterns, outliers, groupings, and so seldom around causation. Yet, in most scientific endeavors causal relationships are what matter the most. While detecting trends is still important, ultimately researchers want to see how A has an effect on B (well, it may be much more complicated than that but you get the point): Does aid have an effect on child mortality? Does aid reduce conflicts? Does aid to region A displace resources from region B? It’s extremely surprising to me, after working in visualization for many years, to realize how agnostic visualization is to causation and causal models, when in fact virtually every scientific question subsumes a causal relationship. How can we make progress and systematically explore how visualization can help uncover or present causal relationships?

OMG data bias! It was sometime halfway through the conference, after hearing all sorts of praise for Aid Data, that one of the attendees bravely stood up and said something along the lines of: “Hey wait a moment folks … these data have a huge bias! If we include only countries which accept to provide their data, we have a big selection bias problem. How is this going to affect our research?” (kudos to Bruce for raising this question). This reminded me that data always comes with all sorts of intricacies and problems. It can be bias, it can be missing values, it can be errors, it can be a lot of other hidden things that may totally invalidate our findings. If there is one lesson to learn here it is this: while it is easy to get super-excited about data and the endless opportunities they present, it is hard to acknowledge that data are limited and may even be useless in some circumstances. Rather than sweeping these problems under the carpet, we’d better develop some sort of “data braveness” or “data mindfulness” and admit that data, after all, may have all sorts of bugs.
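A tiny worked example may help show why this matters. The sketch below is purely illustrative: the six "countries", the `reports_data` flag, and the outcome scores are made-up numbers, not Aid Data records. It compares the average outcome computed only from countries that share their data with the average over all of them.

```python
from statistics import mean

# Made-up toy numbers for illustration only; not real Aid Data records.
countries = [
    {"name": "P", "reports_data": True,  "outcome": 8.0},
    {"name": "Q", "reports_data": True,  "outcome": 7.0},
    {"name": "R", "reports_data": True,  "outcome": 9.0},
    {"name": "S", "reports_data": False, "outcome": 3.0},
    {"name": "T", "reports_data": False, "outcome": 4.0},
    {"name": "U", "reports_data": False, "outcome": 2.0},
]

# What the researcher can see: only countries that agreed to provide data.
observed_mean = mean(c["outcome"] for c in countries if c["reports_data"])

# What is true of the whole population (normally unknown to the researcher).
true_mean = mean(c["outcome"] for c in countries)

# Selection bias: the gap caused purely by who ended up in the sample.
bias = observed_mean - true_mean

print(f"observed: {observed_mean}, true: {true_mean}, bias: {bias}")
```

In this toy setup the willingness to report is correlated with the outcome, so the observed average (8.0) overstates the population average (5.5). Whether anything like this holds for the real database is exactly the open question raised at the conference.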

Communities of practice and visualization as a cultural artifact. During the course of the conference I had the opportunity to see lots of charts, graphs, diagrams. Visualization is definitely part of this community: they love maps and enjoy presenting their ideas through colorful visual representations. Earlier last year I had the opportunity to work with a group of climate scientists on a different project and similarly I have seen them using lots of charts, diagrams, graphs. What I am starting to notice, after seeing so many people using visualization for their own purposes, is that visualization is a cultural artifact. Communities of practice go through an interesting evolutionary process where tools like data visualization are adopted, transformed and consolidated, forming numerous implicit and explicit defaults, conventions and expectations. With Aid Data for instance most people need to visually correlate two main variables: amount of aid and an outcome variable, both in geographical space. Most of them end up using a choropleth map with bubbles on top. Is that the best representation? I don’t know. But I know this is familiar to everyone and this is what most of them expect and are used to seeing. How much do we know about these communities of practice? How can research in visualization develop a better understanding of how people use visualization in real-world settings? What could we gain by doing that?

Behind data there might be a “soul”. Finally, the last thing I learned is the most important one. Data is just a signal, only a dry description of something that is much more important: real people, phenomena, events. It’s way too easy, when you are used to working with lots of different data and big piles of them, to forget what lies behind all these bits; what these bits really are. Aid Data and the stories I have heard reminded me that behind data there can be profound desperation, joy, struggles, good and bad intentions, failures and successes. In a word, there can be real humans and their lives. I think it is really important for us not to lose this connection. Not to completely detach from what these data represent. Next time you start a project try to pause for a moment and think: behind data there might be a soul.

That’s all I had to say. This has been an extremely enriching experience for me and I hope these few thoughts will spark some new ideas and feelings in you. As usual, feel free to comment and react. I’d love to hear your voice!

Take care.

The myth of the aimless data explorer

There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what question to ask of their data.

I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a tool to help people come up with questions when they have no idea what to do with their data. Isn’t that great? It sounded right or at least cool.

But as soon as I started working on more applied projects, with real people, real problems, real data they care about, I discovered all this excitement for data exploration is just not there. People working with data are not excited about “playing” with data, they are excited about solving problems. Real problems. And real problems have questions attached, not just curiosity. There’s simply nothing like undirected data exploration in the real world.

Digging a little deeper into the issue, I realize that after all this is natural and somewhat obvious: why should people explore data for the sake of it? Sure, some people like us (yes, the hopeless data geeks) do take pleasure in looking into a bunch of data, but we are a minority and I am not sure we should take ourselves as the model of reference for what we do.

The reason why I decided to write about this thing is that I think this myth is somewhat pervasive and it’s not limited to visualization. While I am not a Data Mining or Machine Learning expert I know some people in the area and I know some of them also promote “knowledge discovery” as the science of finding good questions.

But wait a moment, you might say … when we use knowledge discovery tools (yes, vis is a knowledge discovery tool) sometimes we do stumble into unanticipated questions and these questions may in fact be the real value of the whole process! I agree. And I have experienced this effect multiple times myself. Yet, I think this does not contradict my point: what I am arguing is not that we should not help people come up with new questions as a collateral effect of data analysis or that coming up with new questions is not valuable. What I am arguing here is that we should be very careful in selling visualization as a tool for people who don’t know what question to ask. This is simply not true. Everyone has a question and actually I even believe everyone should start with a question.

There are a couple of words I like more when talking about visualization: hypothesis and explanation. These are great words! They describe much better what visualization is good for. You might actually have a good question to start with but not a good hypothesis or explanation for what is going on there (some patients develop unexpected complications after receiving a particular treatment and you don’t know why). And visualization can for sure help you come up with one. Visualization is a “hypothesis booster”. It’s actually so effective that it could even be dangerous in this respect (it may bias you toward some explanation)!

So next time you talk about visualization, refrain from selling it as a tool to help people aimlessly explore some data. And when you hear someone saying that, please send him or her to this post. I’d be happy to defend my position :)

Am I missing something here? Am I totally wrong in some sense? I know there are some people out there who would strongly disagree with me. Feel free to let me hear your voice!

 

 

Data Visualization Semantics

A few days ago I had this nice chat with Jon Schwabish while sipping some iced tea at Think Coffee in downtown Manhattan: what elements of a graphic design give meaning to a visualization? How do the graphical marks, their aesthetics, and their contextual components translate into meaningful concepts we can store in our head?

Everything started from us discussing the role of text in visualization and how labels and annotations play a big role in this sense. Try to think about visualization with no text at all: where does the meaning come from?

I think interpretation depends at least on these two main factors: (1) background knowledge in the reader and (2) semantic cues in the graphics. Interpretation is a sort of “dance” between these two elements: what we have in our head influences what we see in the graphics (this is a very well known fact in vision science) and what we see in the graphics influences what we think.

Background Knowledge. No interpretation can happen if we do not connect what we see with information that is already stored in our head. This is how Colin Ware puts it in his “Visual Thinking for Design”:

“When we look at something, perhaps 95 percent of what we consciously perceive is not what is “out there” but what is already in our heads in long-term memory. The reason we have the impression of perceiving a rich and complex environment is that we have rich and complex networks of meaning stored in our brains” [Ch.6, p.116]

And:

“… we have been discussing about objects and scenes as pure visual entities. But scenes and objects have meaning largely through links to other kinds of information stored in a variety of specialized regions of the brain” [Ch.6, p.114]

We are so fixated on data today that we end up forgetting data is merely a (dry) representation of a much more complex phenomenon, and that people need to have their own internal representation of this phenomenon in order to reason about it. This is independent of the data and it plays a big role in how people interpret and interact with a visualization. Sure, one could always analyze a graph syntactically and say that something is increasing or decreasing over time, or that some “objects” cluster together, etc. But is that useful at all?

Of course, interpretation is subjective and biases pop up all the time, but how does the designer’s intent interact with all the preconceptions, biases and skills of any given reader? This is a huge topic and I don’t see anything around that can help us sort these things out.

Interestingly, I see two opposite cases taking place in visualization use and practice. When visualization is used mainly as a communication tool, that is, to convey a predefined message the designer has crafted for the reader, the reader has to be educated before interpretation takes place.

But when visualization is used as an exploratory or decision making tool developed for a group of domain experts, we have an opposite kind of gap: the designer is typically ignorant about the deep meaning of the data and needs to be educated before good design takes place. Without a very tight collaboration between the designer and the domain scientist it’s practically impossible to build something really useful. I have experienced that myself many times. Unfortunately, such a tight collaboration does not happen easily and it’s very hard to establish in the first place.

Semantic Cues. The way visualization itself is designed can support or hinder the semantic association between graphical elements and concepts. The minimum requirement is that the user understands how the graphics works and what it represents. Some charts are easier to interpret because people are familiar with them, some others are fancier and need additional explanations.

But even when a chart is familiar, explanations are needed to understand what the graphical objects represent. I have seen this problem so many times in presentations, especially when some fancy visualization technique is used: the presenter does not describe the semantic associations well enough and the audience gets totally lost.

Other than showing trends and quantities, visualization needs to make clear how to create a mental link between the objects stored in your head and those perceived in the visualization: the “what”, “who”, and “where” elements. The theory of visual encoding is so heavily based on the accurate representation of quantitative information that it seems like we have totally forgotten how important it is to employ effective encodings for the what/where channels. This is perhaps why visualization of geographical data is often on a map. Keeping the geographical metaphor intact might not be the “best” visual encoding for the task at hand, yet it carries such a high degree of semantics that it’s hard to shy away from it.

Finally, going back to the original idea of this post, text is king when we talk about interpretation. Seriously, think about visualization with and without text. Text makes visualization come alive. It gives meaning to what you see. The most common textual elements you can find in a visualization are axis labels, legend labels, item labels, titles, and annotations, but I guess there are many unused/under-researched aspects. Labeling is tricky and not well studied yet (except for label placement, a quite extensively developed niche). For instance, when more than a handful of data labels are shown there is a high risk of cluttering up the screen and no obvious solutions exist.

Also, there is a limited understanding of how best to integrate visualization and text in a more natural and seamless way, which goes beyond simply attaching labels to objects and very well beyond the scope of this post.

And you? What do you think? Did you ever think about how meaning is conveyed in visualization? Anything to add?

Thanks for reading. Take care.

What’s the best way to *teach* visualization?

Yes teach, not learn. I have written about ways to learn visualization multiple times (here, here, here, here, here, and here) and others have done so too, but I am now more interested in how to best teach visualization. I taught a whole new Information Visualization course last semester and I honestly have several doubts about what’s the best way to teach visualization.

First, I need your feedback.

I will get to the main doubts I have in a moment but before that I want to ask you a favor: if you teach or have ever taught a course, I would be very happy to hear your opinions and experiences with teaching visualization. I know, as a matter of fact, that every teacher has big questions and doubts about best teaching practices. I’d love to hear from you the ups and downs of your teaching experiences.

I would also be very happy to get feedback from people who have attended visualization courses in the past. Did you ever attend a visualization course? If yes, what do you think is the best way to be taught about visualization? Is there anything that worked especially well or badly? What is the most challenging part? Theory, practice, tools, examples? What does or doesn’t work in class?

My InfoVis Course

My Information Visualization course is held at NYU-Poly and it’s open to students of every level (undergrads, grads, and PhDs). The course is organized around lectures, reading assignments, exercises (mostly in class) and a (big) final project. The project is the most important part of the course and my students work on it for more than half of the time of the whole course (about 2.5 months).

The course focusses mainly on visualization theory (mostly perceptual issues and visual encoding) and on the visualization design process. I have two main goals for my students: (1) make sure they can, for any given problem, explore a very large set of solutions (rather than focus on the first one that comes to mind), (2) make sure they can predict as much as possible what works and what does not work, that is, design and implement effective visualizations.

Issues with teaching visualization.

Here is a list of some specific issues I have with my courses.

Visual literacy. I have noticed this problem multiple times in my courses: I show a wide array of visualization examples early on in the course and then I quickly focus on the nuts and bolts of visualization design. The students seem to understand what I teach but they simply have not experienced enough visualization design to really internalize and fully understand what I say. Also, when we come to the problem of designing some visualizations for a data set they have never seen before, they don’t have enough of the visualization design space in mind to explore all the interesting possibilities. They are mostly anchored to what comes first (usually from experience). How do you develop visual literacy early on in the course? I am not sure but I feel students need a much deeper immersion into the visualization design space before they are able to work on it.

Tools. I usually give total freedom to my students to choose whatever tool they want (I usually make the bad joke that they can code visualization in assembler if they want to). I give some detailed advice on what I think are the best tools around but then they choose and learn the tools on their own. I used to think that tool choice is a secondary aspect of teaching visualization but I totally changed my mind. Visualization, like any other design craft, is totally dependent on and shaped by the tools one uses, both consciously and unconsciously. The tool you use will give a certain shape and frame of mind to the visualization you produce (I first learned about the idea of how technology and context shape the creative process from David Byrne’s book How Music Works). Furthermore, there is the issue of coding vs. non-coding. I know a lot of people do great visualization without writing a single line of code. Yet, I think coding gives much more freedom. I now believe it is much more effective to give tools the role they deserve and teach one (max two) core tools in my future courses. How do you use tools in your course? Is it an accessory or fundamental aspect of your course?

Projects. I think there’s no doubt visualization can only be learned by doing a lot of practice. I often repeat that visualization can only be judged when you see it, not when you think about it. Projects are a great way to put into practice what you learn and solve some interesting and challenging problems. Yet, how do you split the time between the project part and the lectures? How early should the students work on their projects? Ideally they should start very early on so that they have enough time, but shouldn’t they first acquire some basic knowledge? But how can they acquire this knowledge without practice? Again, I used to believe the best way is to split the course into two main periods: the lectures period and the project period. Now I am no longer sure this is the best way to go. How about mini-projects? Projects with a much shorter time span but nicely interleaved with the lectures? It sounds like a nice option. Yet, when are the students confronted with a more realistic medium-size project? Can one afford to have mini-projects AND one big project? I am not sure.

I have many other issues, but these are the most pressing ones.

Now it’s your turn! What’s the best way to teach visualization? I’d love to hear your opinions and experiences. Thanks!

Take care.

 

 

Where are the data visualization success stories?

I see a lot of visualization around me now and I am extremely excited about it. Yet, are we making any real difference? I mean, are we having any real impact on people’s lives other than telling them beautiful stories?

Yes I know, impact could be defined in a million different ways and it may be hard to capture. But why? Why do I never stumble upon an article or blog post showing, I don’t know, for instance, how visualization helped a group of doctors do something remarkable?

Is it just because this stuff does not get reported or what?

Here are a few possible explanations:

  • Explanation #1: Impactful visualization is hidden. The people who are using visualization successfully, who have a real impact, are too busy to report their success.
  • Explanation #2: Visualization is just a fragment of a much larger process. When visualization is not used as a communication/storytelling tool, it is part of a much larger process, which includes many other steps and tools, so success is simply not ascribed to visualization.
  • Explanation #3: Visualization impact has yet to come. Maybe we just have to wait a bit longer and we’ll get all the success we want.

What do you think? Do you have other explanations? Is my question just too pretentious? Or did I just miss a ton of success stories and this post is total nonsense?

P.S. 1 On a side note: other areas of data analysis, especially automatic approaches like machine learning and data mining, have plenty of stories to tell. Why? Food for thought …

P.S. 2 After writing this post I discovered my friend Andy Kirk has written a much longer post on this issue.

What Is Progress In Visualization?

Being a visualization researcher means a very large part of my work revolves around pushing the boundaries of visualization further. I do that mostly by developing innovative techniques but also by trying to better understand how humans interact with this amazing tool we call visualization.

You might think I have at least a rough idea of what progress means in visualization then, but in fact I don’t. And I guess I am not alone: researchers are trained to dive into tiny details and speculate for ages. The purpose of this post is to explore bigger questions:

  • What is progress in visualization?
  • How do we make progress in visualization?
  • And how do we measure it?

I ask that because honestly I don’t see a direction in what we are doing. We researchers are mostly focussed on developing yet another technique, practitioners on (understandably) satisfying their customers. But what is our ultimate goal? Here I propose a few ways we can look at progress in visualization.

Progress As Real-World Impact

First and foremost I propose that progress in visualization is the extent to which we are able to help people do remarkably useful things with data. This is for me the gold standard, the holy grail. It is a broad and vague definition but it helps. When I say “remarkably useful” I mean: can we say visualization played a critical role in curing or preventing diseases? Reducing poverty? Solving or preventing economic crises? Making people richer or happier? Etc. Think about it, why not? Why do we do visualization if not for these purposes?

Apart from a few isolated cases I don’t see this happening now. We should keep our eyes open and focus more on having an impact in the real world. Visualization has this potential, I am sure, and progress is made, I believe, when we help people do remarkable things. The VisWeek conference used to host a very nice session called Discovery Exhibition with the specific intent to showcase success stories. Unfortunately (it hurts to admit it), I think it was quite a failure. I remember a similarly frustrated post from Stephen Few some years ago: “True Stories about the Benefits of Data Visualization”. And I have yet to see persuasive answers to his call.

Progress As Knowledge Construction

I have to admit measuring progress exclusively in terms of impact and success stories might be a bit fuzzy, not very practical and ultimately a bit subjective. Another possibility is to define progress as the accumulation of knowledge that allows us to build more effective visualizations. But what do we need to know that we don’t know yet? Broadly speaking we need to know:

  1. How humans work.
  2. How to translate knowledge about humans into visualization design.

Are we doing that right now? Partly, in academic environments and a bit outside, but not enough in my opinion. It’s surprising to see how much more foundational work was done in the past and how little is done today. We have a rough idea of how visual variables (position, length, color, size, etc.) work in isolation but very little understanding of how they interact in complex environments. We have alternative visualizations for the same kind of data and little understanding of how they influence information extraction (parallel coordinates vs. scatter plot matrix? node-link diagrams vs. matrices? maps or abstract representations? animation or small multiples?). And we have not even started scratching the surface of muddier issues like semantics, influence, persuasion, etc.

Progress As Technical Achievement

I don’t even know if I need to comment on this one, it’s pretty straightforward: technical achievement is the development of visualization and interaction techniques that solve unsolved technical problems or improve performance over existing solutions. Typically this takes the following form:

  • New visualization or interaction design.
  • Faster and/or more accurate algorithms.
  • Increased scalability in terms of data size and dimensionality.
  • Accommodation of new data formats and tasks.

I think it’s safe to say academic research is mostly focused on this. I am not sure whether technical achievement translates into real benefits in real-world applications, but from time to time we have really useful stuff coming out. Edge bundling and horizon graphs are the first things that come to my mind. Are we making progress in this area? Yes. Would I like to see more? Yes and no … In a way sometimes I feel like we are spinning our wheels (please note that I include myself in this description and I am not immune to many, many faults) so I’d like to see fewer wheel-spinning technical contributions and more useful stuff. But I also realize we cannot invent a new edge bundling every year. Progress happens with valleys and peaks.

Progress As Education and Adoption

Maybe this is the most neglected kind of progress, yet it is very close to my heart. The last way to define progress in visualization I propose is the extent to which we are able to teach people how to judge and use visualization effectively, and how many people use visualization in their work. We need to reach more people (visualization at school?) but more importantly we need to teach proper visualization. We need courses, seminars, teaching material, web sites, and a whole army of evangelists. I am lucky enough to know quite a bunch of them but we need more.

I want to measure progress in a few years by counting how many people are able to criticize a chart. I also want to measure progress by assessing whether visualization has become part of the standard toolbox of scientists, businesspeople and decision makers around the world.

Conclusion

This is what I had to say about progress. I know it’s not perfect, it’s just a draft. And now it’s your turn. How do you define progress in visualization? Are we making progress? How would you measure progress in visualization, let’s say, 5 or 10 years from now?

And by the way, do you care about making progress? Why not? It is not necessary to be “a researcher” to make progress, you can make progress in a thousand ways. The only thing we need is to bring more focus. Or maybe we just have to let things happen and have some fun? I am looking forward to hearing from you guys. Thanks for reading.

On a side note: I have been off the scene with FILWD for a very long while. There are good reasons why that happened (I’ll tell you more about that later) but I want to assure you FILWD is not going to fade away. On the contrary, I have many plans on how to grow it further and offer a better service. If you are still there reading me after so much time, well … thank you so much from the bottom of my heart! -Enrico