Category Archives: Thoughts

Ideas, thoughts, open issues, questions.

What do we talk about when we talk about “Data Exploration”?

There is an old “adage” in the InfoVis / Visual Analytics community I have heard a zillion times: “visualization is needed/useful when people don’t have a specific question in mind”. For many years I took this as gospel. Then, over time, as I grew more experienced, I started questioning the whole concept: why would someone look at a given data set if he or she has no specific goal or question in mind? It does not make sense.

This is an aspect of visualization that has puzzled me for a long time. An interesting conundrum that I believe is still largely unsolved. One of those things many say, but nobody really seems to have grasped in full depth.

Here is my humble attempt at putting some order into this matter. Let’s start with definitions:

The definition introduces a couple of very important features: familiarity (“through an unfamiliar area”) and learning (“in order to learn about it”). If we take this definition as our main guide, we can say that data visualization is particularly helpful when we use it to look into some unfamiliar data to learn more about something.

I suspect there are (at least) three main situations in which this can happen.

  1. Need to familiarize yourself with a new data set (“what does it look like?”). Anyone who dabbles with data a bit goes through this: you receive or find a new data set and the first thing you need to do is figure out what information it contains. How many fields? What type of fields? What is their meaning? Are there any missing values? Is there anything I don’t actually understand? How are the values distributed? Is there any temporal or geographical information? Are we actually in the presence of some kind of network or relational structure? Etc. One crucial, and often overlooked, aspect of this activity is “data semantics”. I personally find that understanding the meaning of the various fields and the values they contain is such a crucial and hard activity at the beginning. An activity that often requires many, many back-and-forth discussions and clarifications with domain experts and data collectors.
  2. Hunting for “something” interesting (“is there anything interesting here?”). I suspect this is what people mostly mean when they talk about “data exploration”: the feeling that something interesting may be hidden there and that some exploratory work is needed to figure it out. But … when does this actually happen? What kind of real-world activities are characterized by this desire to find “something”? I am not sure I have an all-encompassing answer to that, but I am familiar with at least two examples: data journalism and quantified self. In data journalism it is very common to first get your hands on some potentially “juicy” data set and then try to figure out what interesting stories may be hiding there (Panama Papers, Clinton’s emails, etc.). I observed this in our collaboration with ProPublica when hunting for stories about how people review doctors on Yelp. In quantified self you often want to look at your data to see if you can detect anything unexpected. I experienced the same when looking at personal data I have collected about my deep work habits (or lack thereof). Sometimes we know there must be something interesting in a given data set, and visualization guides us in formulating unexpressed questions. The interesting aspect of this activity is that the outcome is often more (and better) questions, not answers.
  3. Going off on a tangent (“oh … this may be interesting too!”). There is one last, subtler, kind of data exploration. You start with a specific question in mind but, as you go about it, you find something interesting that triggers an additional question you had not anticipated. This is the power of visual data analysis: it makes you notice something new and you have to follow the path. This happens to me all the time (and I hope it’s not just a sign of my ADD). Some of these are useless diversions. Some of them actually lead to some pretty unique gems!
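As a rough illustration of the first situation, the mechanical part of “what does it look like?” (how many fields, what types, missing values, how values are distributed) can be answered with a quick profiling pass. What follows is a minimal sketch in Python using only the standard library; the `profile` function and the tiny sample CSV are hypothetical, and the numeric/categorical split is a deliberately naive heuristic. Note that it says nothing about data semantics, which still requires talking with domain experts and data collectors.

```python
import csv
import io
from collections import Counter

def is_number(value):
    """Naive type check: treat a value as numeric if float() accepts it."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def profile(rows):
    """Summarize each field of a list of dict records: missing count, naive type, distinct values, top values."""
    fields = list(rows[0].keys()) if rows else []
    summary = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v not in (None, "")]
        kind = "numeric" if present and all(is_number(v) for v in present) else "categorical"
        summary[field] = {
            "missing": len(values) - len(present),
            "type": kind,
            "distinct": len(set(present)),
            "top": Counter(present).most_common(3),
        }
    return summary

# Hypothetical sample data; the second row has an empty "amount" field.
SAMPLE = """country,year,amount
Uganda,2010,1200
Kenya,2010,
Uganda,2011,900
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for field, stats in profile(rows).items():
    print(field, stats)
```

A profile like this only opens the conversation: it can flag that `amount` has a missing value, but not what a missing amount means for this particular data set.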

These three modalities can of course overlap a lot. I am also sure there are other situations we can describe as data exploration which I am not covering here (in case you have some suggestions please let me know!).

I want to conclude by saying that this is an incredibly under-explored area of data visualization. More advances are needed in at least three directions.

  • First, we need a much better understanding of data exploration as a process and, if possible, models able to describe it in useful abstract terms. In visualization research we often refer to Card and Pirolli’s “Sensemaking Loop” to describe this kind of open-ended and incremental activity but, for some reason, every time I try to use it, it does not seem to describe what I actually observe in practice (this deserves its own post).
  • Second, we need to develop more methods, techniques and tools to support interactive data exploration. I bet there are lots of “latent needs” waiting to be discovered out there. This is another area where I believe we visualization researchers have made surprisingly little progress. We have built a lot of narrow solutions that work for 3-5 people but very few general-purpose methods and techniques. We need more of that (this also deserves its own post)!
  • Third, we need to find ways to teach exploratory data analysis systematically, in ways that make the process as effective as possible. I am appalled at how little guidance and material there is out there on teaching people how to do the actual analysis work. Statisticians are fixated on confirmatory analysis and regard exploration as a second-class citizen. Visualization researchers are too busy building stuff and have done too little to teach others how to do the actual groundwork. This is a problem we need to solve. It’s for this reason that next semester I will be teaching a new course with this specific purpose. Stay tuned.

That’s all I had to say about Data Exploration.

And you? What is your take? What is data exploration for you? And how can we improve it?

Take care.

What do we do this VIS thing for? Towards a data visualization ethos.

I often find myself asking: “What do we do this Data Visualization thing for?”. Of course I do it mostly because it’s fun, and I bet it’s the same for you. Yet, is there a way we can find some deeper meaning in it? Are there some higher-level purposes we can identify? Meaning often comes in relation to the impact one can have on other people’s lives, so here is a tentative list, off the top of my head, of how vis can impact people’s lives (feel free to add yours in the comments below).

Mechanics for the Formula 1 of Science

I could not resist writing this short blog post after having such a nice conversation with Scott Davidoff yesterday. Scott is a manager at the Human Interfaces Group at NASA JPL and he leads a group of people that takes care of big data problems at NASA (I mean really big data, like the data coming from telescopes and missions).

While on the phone he said:

“You know Enrico … the way I see it is that we are mechanics for scientists … the same way Formula 1 has mechanics for their cars.”

What a brilliant metaphor! Irresistible. It perfectly matches my philosophy and at the same time, sorry to say, I think it does not match very well the way most people see vis right now.

Visualization as a bidirectional channel

I am preparing a presentation for a talk I am giving next week and I have a slide I always use at the beginning that asks this question:

How do we get information from the computer into our heads?

This works as a motivation to introduce the idea that, regardless of the data-crunching power we are going to produce in the future, the real bottleneck in many applications will always be the human mind. Getting information across from what our computers accumulate and generate into our heads, and being able to understand it, is the real challenge. Visualization is the tool we use to deal with this problem. By using effective visual representations of data we tap into the human brain and all its incredible powers that we have not yet been able to reproduce and synthesize in a machine (I leave the discussion of whether this is possible, or even desirable, to others).

When I present this slide I normally quote the great Fred Brooks’s “The Computer Scientist as Toolsmith” and add this image from the paper:


But today, for the first time, I realized that when we talk about visualization we always talk about it as a one-way channel, from the computer (or other media) to the human, when in fact there is a lot of knowledge flowing from the human to the machine.

When we use an interactive visualization tool we decide which data segments we want to attend to (think of how Tableau works). These choices derive from our knowledge and questions, which we implicitly use to decide what to visualize next and how. When we use dynamic queries we use our knowledge to tell the computer that we are interested in a specific segment of the data and that we want to see it now.

There is a simple but effective function in Tableau that I love and that is a good example of what I am trying to say here: the “exclude” function, which allows you to remove a data item from the visualization completely because it is uninteresting or just annoying. When we do that, we are transferring our specific knowledge to the computer to tell it that we don’t need to see that data point anymore.

All in all it seems to boil down to interaction and how it is the only way to translate our intentions into instructions our computers can interpret. What I really want to say is that we tend to forget how powerful this channel is, and how limiting it is to think about visualization exclusively as a one-way communication tool. Sure, we can keep considering visualization this way, but I think it’s much more exciting to think about it as a “visual thinking tool” where information flows in both directions.

And I think there is even more to it than that. While interaction in visualization is currently limited to giving instructions about what to see next, nothing prevents interaction from being used as a tool to transfer pieces of human knowledge directly to the computer. Classic examples of attempts in this direction, in machine learning and related fields, are relevance feedback mechanisms and active learning. Both techniques rest on the idea of asking a human to judge a decision made by the computer and using the result to improve the computation. This is only one example, but I think there are many unexplored ways to feed our knowledge back into the computer to make it smarter, and I think visualization should play a much larger role there.
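To make the active learning idea a bit more tangible, here is a toy sketch in Python of one common variant, uncertainty sampling: the machine repeatedly picks the item it is least sure about, asks the human for a judgment, and refits its model with the answer. Everything here is an assumption for illustration: the one-dimensional items, the sigmoid “classifier” around a threshold, the crude refitting rule (threshold halfway between the largest “no” and the smallest “yes”), and the `ask_human` callback that stands in for a person clicking in an interface. Relevance feedback is a related but different mechanism and is not shown.

```python
import math

def predict_proba(threshold, x):
    """Toy model: probability that item x is 'interesting', rising smoothly around a threshold."""
    return 1.0 / (1.0 + math.exp(-(x - threshold)))

def most_uncertain(threshold, pool):
    """Uncertainty sampling: pick the item whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(threshold, x) - 0.5))

def active_learning(items, ask_human, rounds=3):
    """Query the human on the most uncertain item each round and refit the toy threshold model."""
    pool = list(items)
    threshold = (min(pool) + max(pool)) / 2  # arbitrary starting guess
    labeled = {}
    for _ in range(rounds):
        if not pool:
            break
        x = most_uncertain(threshold, pool)
        pool.remove(x)
        labeled[x] = ask_human(x)  # human knowledge flowing back into the machine
        yes = [v for v, label in labeled.items() if label]
        no = [v for v, label in labeled.items() if not label]
        # Crude refit, assuming labels are monotone in x.
        if yes and no:
            threshold = (max(no) + min(yes)) / 2
        elif yes:
            threshold = min(yes) - 1
        else:
            threshold = max(no) + 1
    return threshold, labeled

# Hypothetical "human" who finds every item >= 5 interesting.
threshold, labels = active_learning(range(11), lambda x: x >= 5, rounds=3)
print(threshold, labels)
```

The point of the design is that the human only answers the questions the machine is most confused about, which is exactly the kind of knowledge transfer a well-designed interactive visualization could make effortless.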

That’s all for now. Thoughts?

My (stupid) fear we may, one day, become irrelevant

[Be warned: this is me in a somewhat depressive state after the deep stress I endured by submitting too many papers to VIS’14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel the urge to share this with you. I will probably regret it in a few days :)]

I happen to click on one of the latest links in one of the popular visualization blogs. I am excited. The title looks cool, the data looks cool and the design of the visualization looks super cool: sleek and clean, the way I like it. I take a look at the demo and you know what? There’s nothing there to see. Empty. No new knowledge, nothing to learn, nothing you can absorb. Nada.

This is not an isolated case. And that’s why I would rather not disclose which particular project I am talking about. First, because it would not be fair (I hate throwing shit at people). Second, because, as I said, this is not an isolated case. Third, because this particular project is only a pretext to talk about something much larger.

The way I see it, visualization is a super powerful discovery tool. To borrow Fred Brooks’s words, visualization for me is, ultimately, an “intelligence amplification” tool: interactive user interfaces to observe the unobservable (or think the unthinkable?).

But many, many visualizations out there show nothing. They are like modern food: empty calories. We, as a community, have spent and still spend lots of energy debating whether one particular way of representing a given piece of information is better than another, but we seem to forget that what is really important is what we decide to show in the first place. Ultimately, the yardstick should be: did you learn something by looking at this? Is there any kind of nutrient that enters your brain?

Let’s put it this way: if it were possible to observe exactly what kind of changes happen in the brain of a person exposed to some new piece of information through visualization, what would you like to see there? I would like to see a Pollock-like explosion of spreading activation followed by a difference. A delta. A sweet and tiny new brick of knowledge.

I see too much ambiguity out there. We talk about telling stories, about beautiful visualizations, and we talk a lot about wrong ways to visualize data. But what I would like to talk about more is this: are we making a difference? Not a difference in the market or on Twitter or whatever. A difference in people’s minds. In their brains, actually.

I think the answer is mostly yes. I think … I believe … or I like to believe. But sometimes I fear we are not. The biggest fear I have, and this is the real point of this post, is that if we are not able to teach people how to create nutritious visualizations we may become irrelevant. Maybe it’s just a stupid thought, I don’t know, but that’s the way I feel when I get depressed by looking at empty-calorie visualizations (by the way, maybe this should have been the real title of this post). The allure of pretty pictures will one day fade and I am not sure what will be left to see.

Creating visualizations that change people’s brains significantly is not an easy task, but it’s also the only thing that really excites me about visualization [Added note: Alberto and Gregor in the comments pointed out that there is no way NOT to change your brain when you are exposed to a visualization. They are right. So this is more of a colorful image than a good representation of what happens in reality. Yet, I like the concept anyway. Just don’t take it too literally!]. And now that I think about it, maybe I am writing this post more for myself than for you. I want to remind myself that my ultimate goal is to help people do remarkable things with visualization. It’s so easy to forget it in the day-to-day. I want to be able to literally change those neurons and synapses and make a difference in people’s brains. That’s what counts for me. Isn’t that a more than worthy and magnificent goal?

And what is your goal by the way?


Take care,
Enrico.

Data with a Soul and a Few More Lessons I Have Learned About Data

I don’t know if this is true for you but I certainly used to take data for granted. Data are data, who cares where they come from. Who cares how they are generated. Who cares what they really mean. I’ll take these bits of digital information and transform them into something else (a visualization) using my black magic and show it to the world.

I no longer see it this way. Not after attending a three-day event called the Aid Data Convening, a conference organized by the Aid Data Research Consortium (ARC) to talk exclusively about data. Not just data in general but a single data set: Aid Data, a curated database of more than a million records collecting information about foreign aid.

The database keeps track of financial disbursements made from donor countries (and international organizations) to recipient countries for development purposes: health and education, disasters and financial crises, climate change, etc. It spans the period from 1945 to the present and includes hundreds of countries and international organizations.

Aid Data users are political scientists, economists, and social scientists of many sorts, all devoted to a single purpose: understanding aid. Is aid effective? Is aid allocated efficiently? Does aid go where it is most needed? Is aid influenced by politics (the answer is of course yes)? Does aid have undesired consequences? Etc.

Isn’t that incredibly fascinating? Here is what I have learned during these few days I have spent talking with these nice people.

Data are not always abundant or easy to get. In the big data era we are so used to data abundance that we end up forgetting how some data, data crucial for important human endeavors, may be very hard to get. It’s not like writing yet another Python script and scraping a million records in 24 hours. No, it’s a super-painful process. For instance, the Aid Data folks have a whole team of data gatherers and a multistep process which includes: setting up an agreement with a foreign country, sending people to remote places to convince officials to make their information available, obtaining a whole bunch of documents and files, transforming these files into a common format and adding geographical coordinates (geocoding) where necessary, cross-checking data with multiple coders, etc. How far is this from writing a bunch of Python code?

Data granularity can be a game-changer. It took me a while to understand why Aid Data users are so excited by the new release of the database, which features, for the first time, data at the sub-national rather than only the national level. This means that financial disbursements are geocoded at a finer level of granularity, that is, instead of knowing only that a certain amount has flowed from one country to another, you can now know which region it went to. To my eyes this seemed like a minor thing, but as I went through a few presentations by people doing real research with these data I suddenly realized it is a huge change! Picture this: you know aid is flowing from the US to Uganda but you have no idea where it goes once it lands there. All of a sudden researchers can ask a whole lot of new and more interesting questions. In turn, this makes me think about how this extends to many other data sets: small changes can have huge impacts. A little more detail may pave the way to much bigger questions. How can we make existing data systematically more valuable by adding crucial information? And what is this crucial information, by the way?
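To make the granularity point concrete, here is a toy sketch in Python: the same disbursement records rolled up at the national level and at the sub-national level. The record layout, the region names and the amounts are invented purely for illustration and are not taken from the actual Aid Data schema.

```python
from collections import defaultdict

# Invented records: (donor, recipient country, recipient region, amount in USD millions).
RECORDS = [
    ("US", "Uganda", "Northern", 300.0),
    ("US", "Uganda", "Central", 700.0),
    ("UK", "Uganda", "Northern", 200.0),
]

def aggregate(records, by_region):
    """Sum amounts per (donor, recipient) or, if by_region is True, per (donor, recipient, region)."""
    totals = defaultdict(float)
    for donor, recipient, region, amount in records:
        key = (donor, recipient, region) if by_region else (donor, recipient)
        totals[key] += amount
    return dict(totals)

national = aggregate(RECORDS, by_region=False)
subnational = aggregate(RECORDS, by_region=True)

# A question only the finer granularity can answer: how much aid reached the Northern region?
northern = sum(v for (_, _, region), v in subnational.items() if region == "Northern")
print(national)
print(northern)
```

With only the national roll-up, the question about the Northern region cannot even be asked; the extra region field is what turns a “how much” question into a “where, and with what effect” question.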

Questions are much more important than data. I did not need to attend this conference to realize how true this is. Yet, after attending it, I am even more convinced. One of the high points of the event for me was listening to all the diverse and interesting questions researchers have about this single data set. There are all kinds of flavors: the effect of aid on health, democratic processes, recovery from disasters and violence or, vice versa, how specific events or conditions influence aid. Even if data are a critical asset to answer these questions, and to substantiate them with hard numbers, the real value comes from the questions, not from the data. And questions come from the most important asset we have: our brain. Data without brain is useless. Brain without data may still be somewhat useful, I guess.

Interesting questions are causal. It’s stunning for me to see how most visualization projects are organized around the detection and depiction of trends, patterns, outliers and groupings, and so seldom around causation. Yet, in most scientific endeavors causal relationships are what matter most. While detecting trends is still important, ultimately researchers want to see how A has an effect on B (well, it may be much more complicated than that, but you get the point): does aid have an effect on child mortality? Does aid reduce conflicts? Does aid to region A displace resources from region B? It’s extremely surprising to me, after working in visualization for many years, to realize how agnostic visualization is to causation and causal models, when in fact virtually every scientific question implies a causal relationship. How can we make progress and systematically explore how visualization can help uncover or present causal relationships?

OMG data bias! It was sometime halfway through the conference, after hearing all sorts of praise for Aid Data, that one of the attendees bravely stood up and said something along the lines of: “Hey wait a moment folks … these data have a huge bias! If we include only countries which agree to provide their data, we have a big selection bias problem. How is this going to affect our research?” (kudos to Bruce for raising this question). This reminded me that data always come with all sorts of intricacies and problems. It can be bias, it can be missing values, it can be errors, it can be a lot of other hidden things that may totally invalidate our findings. If there is one lesson to learn here, it is this: while it is easy to get super-excited about data and the endless opportunities they present, it is hard to acknowledge that data are limited and may even be useless in some circumstances. Rather than sweeping these problems under the carpet, we’d better develop some sort of “data braveness” or “data mindfulness” and admit that data, after all, may have all sorts of bugs.

Communities of practice and visualization as a cultural artifact. During the course of the conference I had the opportunity to see lots of charts, graphs and diagrams. Visualization is definitely part of this community: they love maps and enjoy presenting their ideas through colorful visual representations. Last year I had the opportunity to work with a group of climate scientists on a different project and, similarly, I saw them using lots of charts, diagrams and graphs. What I am starting to notice, after seeing so many people using visualization for their own purposes, is that visualization is a cultural artifact. Communities of practice go through an interesting evolutionary process where tools like data visualization are adopted, transformed and consolidated, forming numerous implicit and explicit defaults, conventions and expectations. With Aid Data, for instance, most people need to visually correlate two main variables, amount of aid and an outcome variable, both in geographical space. Most of them end up using a choropleth map with bubbles on top. Is that the best representation? I don’t know. But I know this is familiar to everyone and this is what most of them expect and are used to seeing. How much do we know about these communities of practice? How can research in visualization develop a better understanding of how people use visualization in real-world settings? What could we gain by doing that?

Behind data there might be a “soul”. Finally, the last thing I learned is the most important one. Data is just a signal, only a dry description of something that is much more important: real people, phenomena, events. It’s way too easy, when you are used to working with lots of different data and big piles of them, to forget what lies behind all these bits; what these bits really are. Aid Data and the stories I heard reminded me that behind data there can be profound desperation, joy, struggles, good and bad intentions, failures and successes. In a word, there can be real humans and their lives. I think it is really important for us not to lose this connection. Not to completely detach from what these data represent. Next time you start a project, try to pause for a moment and think: behind data there might be a soul.

That’s all I had to say. This has been an extremely enriching experience for me and I hope these few thoughts will spark some new ideas and feelings in you. As usual, feel free to comment and react. I’d love to hear your voice!

Take care.

The myth of the aimless data explorer

There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what question to ask of their data.

I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a tool to help people come up with questions when they have no idea what to do with their data. Isn’t that great? It sounded right or at least cool.

But as soon as I started working on more applied projects, with real people, real problems and real data they care about, I discovered that all this excitement for data exploration is just not there. People working with data are not excited about “playing” with data, they are excited about solving problems. Real problems. And real problems have questions attached, not just curiosity. There’s simply no such thing as undirected data exploration in the real world.

Digging a little deeper into the issue, I realized that after all this is natural and somewhat obvious: why should people explore data for the sake of it? Sure, some people like us (yes, the hopeless data geeks) do take pleasure in looking into a bunch of data, but we are a minority and I am not sure we should take ourselves as the model of reference for what we do.

The reason I decided to write about this is that I think this myth is somewhat pervasive and it’s not limited to visualization. While I am not a Data Mining or Machine Learning expert, I know some people in the area and I know some of them also promote “knowledge discovery” as the science of finding good questions.

But wait a moment, you might say … when we use knowledge discovery tools (yes, vis is a knowledge discovery tool) sometimes we do stumble upon unanticipated questions, and these questions may in fact be the real value of the whole process! I agree. And I have experienced this effect multiple times myself. Yet, I think this does not contradict my point: I am not arguing that we should not help people come up with new questions as a collateral effect of data analysis, or that coming up with new questions is not valuable. What I am arguing here is that we should be very careful in selling visualization as a tool for people who don’t know what question to ask. This is simply not true. Everyone has a question and I actually believe everyone should start with a question.

There are a couple of words I like better when talking about visualization: hypothesis and explanation. These are great words! They describe much better what visualization is good for. You might actually have a good question to start with but not a good hypothesis or explanation for what is going on (some patients develop unexpected complications after receiving a particular treatment and you don’t know why). And visualization can certainly help you come up with one. Visualization is a “hypothesis booster”. It’s actually so effective that it could even be dangerous in this respect (it may bias you toward some explanation)!

So next time you talk about visualization, refrain from selling it as a tool to help people aimlessly explore some data. And when you hear someone saying that, please send him or her to this post. I’d be happy to defend my position :)

Am I missing something here? Am I totally wrong in some sense? I know there are some people out there who would strongly disagree with me, feel free to let me hear your voice!

Data Visualization Semantics

A few days ago I had this nice chat with Jon Schwabish while sipping some iced tea at Think Coffee in downtown Manhattan: what elements of a graphic design give meaning to a visualization? How do the graphical marks, their aesthetics, and their contextual components translate into meaningful concepts we can store in our heads?

Everything started with us discussing the role of text in visualization and the big role labels and annotations play in this sense. Try to imagine a visualization with no text at all: where does the meaning come from?

I think interpretation depends at least on these two main factors: (1) background knowledge in the reader and (2) semantic cues in the graphics. Interpretation is a sort of “dance” between these two elements: what we have in our head influences what we see in the graphics (this is a very well known fact in vision science) and what we see in the graphics influences what we think.

Background Knowledge. No interpretation can happen if we do not connect what we see with information that is already stored in our head. That’s the way Colin Ware puts it in his “Visual Thinking for Design”:

“When we look at something, perhaps 95 percent of what we consciously perceive is not what is “out there” but what is already in our heads in long-term memory. The reason we have the impression of perceiving a rich and complex environment is that we have rich and complex networks of meaning stored in our brains” [Ch.6, p.116]

And:

“… we have been discussing about objects and scenes as pure visual entities. But scenes and objects have meaning largely through links to other kinds of information stored in a variety of specialized regions of the brain” [Ch.6, p.114]

We are so fixated on data today that we end up forgetting that data is merely a (dry) representation of a much more complex phenomenon, and that people need to have their own internal representation of this phenomenon in order to reason about it. This is independent of the data and it plays a big role in how people interpret and interact with a visualization. Sure, one could always analyze a graph syntactically and say that something is increasing or decreasing over time, or that some “objects” cluster together, etc. But is that useful at all?

Of course, interpretation is subjective and biases pop up all the time, but how does the designer’s intent interact with all the preconceptions, biases and skills of any given reader? This is a huge topic and I don’t see anything around that can help us sort these things out.

Interestingly, I see two opposite cases taking place in visualization use and practice. When visualization is used mainly as a communication tool, that is, to convey a predefined message the designer has crafted for the reader, the reader has to be educated before interpretation takes place.

But when visualization is used as an exploratory or decision making tool developed for a group of domain experts, we have an opposite kind of gap: the designer is typically ignorant about the deep meaning of the data and needs to be educated before good design takes place. Without a very tight collaboration between the designer and the domain scientist it’s practically impossible to build something really useful. I have experienced that myself many times. Unfortunately, such a tight collaboration does not happen easily and it’s very hard to establish in the first place.

Semantic Cues. The way visualization itself is designed can support or hinder the semantic association between graphical elements and concepts. The minimum requirement is that the user understands how the graphics works and what it represents. Some charts are easier to interpret because people are familiar with them, some others are fancier and need additional explanations.

But even when a chart is familiar, explanations are needed to understand what the graphical objects represent. I have seen this problem so many times in presentations, especially when some fancy visualization technique is used: the presenter does not describe the semantic associations well enough and the audience gets totally lost.

Beyond showing trends and quantities, visualization needs to make clear how to create a mental link between the objects stored in your head and those perceived in the visualization: the “what”, “who” and “where” elements. The theory of visual encoding is so heavily based on the accurate representation of quantitative information that it seems like we have totally forgotten how important it is to employ effective encodings for the what/where channels. This is perhaps why geographical data is so often visualized on a map. Keeping the geographical metaphor intact might not be the “best” visual encoding for the task at hand, yet it carries such a high degree of semantics that it’s hard to shy away from it.

Finally, going back to the original idea of this post, text is king when we talk about interpretation. Seriously, think about visualization with and without text. Text makes visualization come alive. It gives meaning to what you see. The most common textual elements in a visualization include axis labels, legend labels, item labels, titles, and annotations, but I suspect many aspects remain unused or under-researched. Labeling is tricky and not well studied yet (except for label placement, a fairly extensively developed niche). For instance, when more than a handful of data labels are shown, there is a high risk of cluttering the screen, and no obvious solutions exist.

Also, there is limited understanding of how best to integrate visualization and text in a more natural and seamless way, one that goes beyond simply attaching labels to objects. That, however, is well beyond the scope of this post.

And you? What do you think? Have you ever thought about how meaning is conveyed in visualization? Anything to add?

Thanks for reading. Take care.

What’s the best way to *teach* visualization?

Yes, teach, not learn. I have written about ways to learn visualization multiple times (here, here, here, here, here, and here), and others have done so too, but now I am more interested in how best to teach visualization. I taught a brand-new Information Visualization course last semester, and I honestly have several doubts about the best way to teach visualization.

First, I need your feedback.

I will get to my main doubts in a moment, but before that I want to ask you a favor: if you teach or have ever taught a visualization course, I would be very happy to hear your opinions and experiences. I know for a fact that every teacher has big questions and doubts about best teaching practices. I'd love to hear about the ups and downs of your teaching experiences.

I would also be very happy to get feedback from people who have attended visualization courses in the past. Have you ever attended a visualization course? If so, what do you think is the best way to be taught visualization? Is there anything that worked especially well or badly? What is the most challenging part: theory, practice, tools, examples? What does or doesn't work in class?

My InfoVis Course

My Information Visualization course is held at NYU-Poly and is open to students at every level (undergrads, grads, and PhD students). The course is organized around lectures, reading assignments, exercises (mostly in class), and a (big) final project. The project is the most important part of the course, and my students work on it for more than half of the course (about 2.5 months).

The course focuses mainly on visualization theory (mostly perceptual issues and visual encoding) and on the visualization design process. I have two main goals for my students: (1) make sure that, for any given problem, they can explore a very large set of solutions (rather than settle on the first one that comes to mind), and (2) predict as much as possible what works and what does not, that is, design and implement effective visualizations.

Issues with teaching visualization.

Here is a list of some specific issues I have with my courses.

Visual literacy. I have noticed this problem multiple times in my courses: I show a wide array of visualization examples early on in the course and then quickly focus on the nuts and bolts of visualization design. The students seem to understand what I teach, but they simply have not experienced enough visualization design to really internalize and fully understand what I say. Also, when it comes to designing visualizations for a data set they have never seen before, they don't have enough of the visualization design space in mind to explore all the interesting possibilities. They are mostly anchored to whatever comes to mind first (usually from experience). How do you develop visual literacy early on in the course? I am not sure, but I feel students need a much deeper immersion in the visualization design space before they are able to work in it.

Tools. I usually give my students total freedom to choose whatever tool they want (I usually make the bad joke that they can code visualizations in assembler if they want to). I give some detailed advice on what I think are the best tools around, but then they choose and learn the tools on their own. I used to think that tool choice is a secondary aspect of teaching visualization, but I have totally changed my mind. Visualization, like any other design craft, depends on and is shaped by the tools one uses, both consciously and unconsciously. The tool you use will give a certain shape and frame of mind to the visualization you produce (I first learned about how technology and context shape the creative process from David Byrne's book How Music Works). Furthermore, there is the issue of coding vs. non-coding. I know a lot of people create great visualizations without writing a single line of code. Yet, I think coding gives much more freedom. I now believe it is much more effective to give tools the role they deserve and teach one (max two) core tools in my future courses. How do you use tools in your course? Are they an accessory or a fundamental aspect of your course?

Projects. I think there's no doubt that visualization can only be learned through a lot of practice. I often repeat that a visualization can only be judged when you see it, not when you think about it. Projects are a great way to put into practice what you learn and to solve interesting and challenging problems. Yet, how do you split the time between the project and the lectures? How early should students start working on their projects? Ideally they should start very early so that they have enough time, but shouldn't they first acquire some basic knowledge? And how can they acquire this knowledge without practice? Again, I used to believe the best way was to split the course into two main periods: the lecture period and the project period. Now I am no longer sure this is the best way to go. How about mini-projects, with a much shorter time span, nicely interleaved with the lectures? It sounds like a nice option. Yet, when would the students be confronted with a more realistic, medium-size project? Can one afford to have mini-projects AND one big project? I am not sure.

I have many other issues, but these are the most pressing ones.

Now it’s your turn! What’s the best way to teach visualization? I’d love to hear your opinions and experiences. Thanks!

Take care.

 

 

Where are the data visualization success stories?

I see a lot of visualization around me now and I am extremely excited about it. Yet, are we making any real difference? I mean, are we having any real impact on people's lives other than telling them beautiful stories?

Yes, I know, impact could be defined in a million different ways and it may be hard to capture. But why? Why do I never stumble upon an article or blog post showing, for instance, how visualization helped a group of doctors do something remarkable?

Is it just because this stuff does not get reported or what?

Here are a few possible explanations:

  • Explanation #1: Impactful visualization is hidden. The people who are using visualization successfully, who have a real impact, are too busy to report their success.
  • Explanation #2: Visualization is just a fragment of a much larger process. When it is not used as a communication/storytelling tool, visualization is part of a much larger process that includes many other steps and tools, so success is simply not ascribed to visualization.
  • Explanation #3: Visualization impact has yet to come. Maybe we just have to wait a bit longer and we’ll get all the success we want.

What do you think? Do you have other explanations? Is my question just too pretentious? Or did I just miss a ton of success stories, and is this post total nonsense?

P.S. 1 On a side note: other areas of data analysis, especially automatic approaches like machine learning and data mining, have plenty of stories to tell. Why? Food for thought …

P.S. 2 After writing this post I discovered my friend Andy Kirk has written a much longer post on this issue.