[Be warned: this is me in a somewhat depressive state after the deep stress of submitting too many papers to VIS'14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel the urge to share this with you. I will probably regret it in a few days :)]

I happened to click on one of the latest links in one of the popular visualization blogs. I am excited. The title looks cool, the data looks cool and the design of the visualization looks super cool: sleek and clean, the way I like it. I take a look at the demo and you know what? There’s nothing there to see. Empty. No new knowledge, nothing to learn, nothing you can absorb. Nada.

This is not an isolated case. And that’s the reason why I am not happy to disclose which particular project I am talking about. First, because it would not be fair (I hate throwing shit at people). Second, because, as I said, this is not an isolated case. Third, because this particular project is only a pretext to talk about something much larger.

The way I see visualization is as a super powerful discovery tool. Stealing words from Fred Brooks, visualization for me is, ultimately, an “intelligence amplification” tool: interactive user interfaces to observe the unobservable (or think the unthinkable?).

But many, many visualizations out there show nothing. They are like modern food: empty calories. We, as a community, spent and still spend lots of energy debating whether one particular way of representing a given piece of information is better than another, but we seem to forget that what really matters is what we decide to show in the first place. Ultimately, the yardstick should be: did you learn something by looking at this? Is there any kind of nutrient that enters your brain?

Let’s put it this way: if it were possible to observe exactly what kind of changes happen in the brain of a person exposed to some new piece of information through visualization, what would you like to see there? I would like to see a Pollock-like explosion of spreading activation followed by a difference. A delta. A sweet and tiny new brick of knowledge.

I see too much ambiguity out there. We talk about telling stories, about beautiful visualizations, and we talk a lot about wrong ways to visualize data. But what I would like to talk about more is this: are we making a difference? Not a difference in the market or on Twitter or whatever. A difference in people’s minds. In their brains, actually.

I think the answer is mostly yes. I think … I believe … Or I like to believe. But sometimes I fear we are not. The biggest fear I have, and this is the real sense of this post, is that if we are not able to teach people how to create nutritious visualizations we may become irrelevant. Maybe it’s just a stupid thought, I don’t know, but that’s the way I feel when I get depressed watching empty-calorie visualizations (by the way, maybe that should have been the real title of this post). The allure of pretty pictures will end one day and I am not sure what will be left to see.

Creating visualizations that significantly change people’s brains is not an easy task, but it’s also the only thing that really excites me about visualization [Added note: Alberto and Gregor in the comments pointed out there is no way NOT to change your brain when you are exposed to a visualization. They are right. So this is more of a colorful image than a good representation of what happens in reality. Yet, I like the concept anyway. Just don’t take it too literally!]. And now that I think about it, maybe I am writing this post more for myself than for you. I want to remind myself that my ultimate goal is to help people do remarkable things with visualization. It’s so easy to forget it in the day-to-day. I want to be able to literally change those neurons and synapses and make a difference in people’s brains. That’s what counts for me. Isn’t that a more than worthy and magnificent goal?

And what is your goal by the way?


Take care,
Enrico.

{ 13 comments }

This is the last lecture of the introductory part of my course, where I give a very broad (and admittedly shallow) overview of some key visualization concepts I hope will stick in my students’ heads. After talking about basic charts and high-information graphics, I introduce dynamic visualizations: visual representations that can change through user interaction.

Here are the lecture slides: Beyond Charts: Dynamic Visualization.

That’s the magic of computer graphics! The visual representation can respond and change according to our actions. Isn’t that great? Yes it is, but what is it for? This is what I asked my students at the beginning of this class. I ask because I have the impression that interaction in many visualizations comes as an afterthought: let’s put a little bit of hovering here and a nice animated zoom there. But interaction is an integral part of the well-reasoned choices a designer has to make in order to make a visualization effective; it’s not just an additional layer one can slap on to add a couple of cool functions.

Interaction is the element of a visualization design that allows people to reason about data, and that’s the way I presented it in class. It’s only through interaction that you can smoothly go through a long series of loops: (1) detect something interesting in the data; (2) trigger a question; (3) change the representation in order to answer that question. Here is the (almost embarrassingly simplified) diagram I used:

[Figure: the interaction loop in visualization (detect, question, change the representation)]

Interaction is basically about reasoning with data through many of these intricate loops, not about making it cool. Even though, admittedly, interaction does make visualization cool. But I guess you want to go beyond the coolness factor, right? That’s almost too easy to achieve.
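
To make the loop a little more concrete, here is a minimal Python sketch of a single pass through it; the data set, the columns, and the filtering threshold are all made up for illustration, and in a real tool the re-rendering would of course be driven by an interactive control rather than a new line of code.

    import pandas as pd
    import matplotlib.pyplot as plt

    # (1) Detect: plot an overview and notice, say, a cluster of expensive items.
    df = pd.DataFrame({"price": [5, 7, 6, 120, 130, 125],
                       "rating": [3.1, 3.4, 2.9, 4.8, 4.6, 4.9]})
    df.plot.scatter(x="price", y="rating")

    # (2) Question: are the expensive items also the highly rated ones?

    # (3) Change the representation: filter and re-plot to answer the question,
    #     which usually triggers the next question in turn.
    df[df["price"] > 100].plot.scatter(x="price", y="rating")
    plt.show()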

Next, I introduce Donald Norman’s 7 stages of action. The model describes the stages humans go through when they interact with the world to achieve a specific goal. Here is a sketch of the model:

[Figure: a sketch of Norman’s seven stages of action]

The model was designed to describe things as simple as opening a door or turning up the volume of your speakers, but it works equally well for complex user interfaces. The pedagogical value of the model, in my opinion, is that it makes explicit the fact that interactive visualization is largely about translation: (1) translating the goals we have in our head into actions and visual search tasks we perform with our hands and eyes, and (2) translating (actually, decoding and giving meaning to) the changed visual representation in front of us after we have altered it through our actions. Our role as visualization designers is to make these translations as smooth and natural as possible. Norman calls these critical points the “gulf of execution” and the “gulf of evaluation”. Easy and effective.

The comments I received after the lecture in our internal forum confirmed that the model does help students wrap their heads around the role of interaction in visualization, so I am glad I included it. One student commented: “It is really interesting to see a process, which we all manage unconsciously, broken down to separate steps, where we can surprisingly easily relate those steps to our own experiences.” Another one wrote: “I was really intrigued with Norman’s 7 Stages of Action. It seems like a really logical way to think holistically about interaction design.”

During the rest of the lecture I described this paper: Yi, Ji Soo, et al. “Toward a deeper understanding of the role of interaction in information visualization.” IEEE Transactions on Visualization and Computer Graphics 13.6 (2007): 1224-1231. This is a super useful paper if you want to learn more about the role of interaction in visualization. The thing I like most about it is that it describes interaction techniques in terms of “intent” rather than how they are implemented. I like this approach because it abstracts away from the technicalities of each technique and creates a more direct connection between interaction and reasoning. These are the categories (see the little code sketch after the list):

Mark something as interesting (Select)
Show me something else (Explore)
Show me a different arrangement (Reconfigure)
Show me a different representation (Encode)
Show me more or less detail (Abstract/Elaborate)
Show me something conditionally (Filter)
Show me related items (Connect)
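
As a toy illustration of what describing interaction “by intent” might look like in code, here is a minimal Python sketch; the enum and the example technique lists are mine, not from the paper.

    from enum import Enum, auto

    class Intent(Enum):
        SELECT = auto()       # mark something as interesting
        EXPLORE = auto()      # show me something else
        RECONFIGURE = auto()  # show me a different arrangement
        ENCODE = auto()       # show me a different representation
        ABSTRACT = auto()     # show me more or less detail (Abstract/Elaborate)
        FILTER = auto()       # show me something conditionally
        CONNECT = auto()      # show me related items

    # The same intent can be served by very different techniques:
    techniques = {
        Intent.FILTER: ["dynamic query sliders", "legend toggling", "brushing"],
        Intent.ENCODE: ["switch bar chart to line chart", "remap the color scale"],
    }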

If you have never read this paper, I suggest you give it a look; it’s a very good read. Another very good one on the same topic is the more recent: Heer, Jeffrey, and Ben Shneiderman. “Interactive dynamics for visual analysis.” Queue 10.2 (2012): 30.

One of my students raised a question in the forum about complexity: by introducing all this interaction, don’t we risk making visualization too hard to use and understand? Yes, I think there is a very high risk of making things too complex, and more interaction does increase how much users need to learn before they can use the system. It’s wise to adopt a parsimony principle when we talk about interaction in visualization. Cramming twenty different techniques into one system for the sake of it is not going to work. Interaction is a dangerous tool and it must be used with great care. It is at its best when it blends smoothly into the visual representation and makes important questions easy to answer.

Overall I think we still have a lot to learn about interaction. Most visualizations on the web are static, and most of the interactive ones are either not very well designed or very limited. While little interaction may be necessary for visual data presentations, richer and well-integrated interaction is crucial for analytical reasoning. If we want to help people reason about data and derive useful insights, we have to better understand how to support this complex process.

That’s all for now. Thanks for reading.

{ 2 comments }

[Figure: a visualization of a million items]

Hi there! We had a one-week break at school, as the inclement weather forced us to cancel class last week.

Here are the lecture slides from this class: Beyond Charts: High-Information Graphics.

In this third lecture I introduced the concept of “high-information graphics”, a term I stole from Tufte’s Visual Display of Quantitative Information. For the first time, I decided to introduce this concept very early in the course, because I noticed students have a very hard time conceptualizing visual representations where lots of information is visible in a single view. In the past I have seen lots of students squeeze a million-item data set into a four-bar bar chart. Literally.

The Aggregation Twitch

I coined the term aggregation twitch hoping my students will remember the concept in the future. The aggregation twitch is the tendency to over-aggregate data through summary statistics. When confronted with a data table, many think: “how can I reduce this to a few numbers?”. I think Tufte captured the phenomenon just right:

Data-rich designs give context and credibility to statistical evidence. Low-information designs are suspect: what is left out, what is hidden, why are we shown so little?

Then, commenting on the difference between high- and low-information designs:

Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design.

I love this last sentence because, in its simplicity, it suggests a certain stance or attitude in designing visualizations.
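
To see what the twitch costs in practice, here is a small made-up example in pandas; the column names and numbers are invented, but the point is general: the summary throws away exactly the structure a data-rich display would let you see.

    import pandas as pd

    # A tiny, invented slice of an aid-like data set.
    df = pd.DataFrame({
        "donor":  ["A", "A", "A", "B", "B", "B"],
        "amount": [1, 1, 298, 99, 101, 100],
    })

    # The aggregation twitch: reduce everything to a few numbers.
    print(df.groupby("donor")["amount"].mean())
    # Donors A and B both average 100, yet their distributions could not be more
    # different. The summary hides what plotting every single item would make
    # visible at a glance, and there is "nowhere to go" from the two averages.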

In order to make the concept more explicit I presented an example from one of my past students. He was assigned the task of creating a visualization from the Aid Data data set, which contains more than a million items and several attributes like donor, recipient, date, purpose, etc. His first implementation was a funny (in some perverse way, admittedly) line plot with four lines and a lot of options to decide which data segments to display. I was stunned! But since then I have kept thinking about that example and about how pervasive this aggregation attitude is.

My students seem to have grasped the concept, even though I regret not providing any positive examples. I spent quite some time explaining why I think this is a limited way of doing visualization, but I forgot to prepare and show counterexamples. Not good.

The query paradigm and the notion of overview

My student’s example gave me the opportunity to discuss a related problem I often see: relying excessively on data querying. That’s the way most students think about data visualization initially: create one simple chart and provide lots of options to select which statistical aggregates to display. Interestingly, this is also the way most data portals present their data by default; and, by the way, it is why most have failed to produce anything interesting for so many years.

The problem with this approach is that it leaves very limited space for data comparison and rich “graphical inference”, which is exactly what our brain is good at. What many don’t get is that as soon as you change the parameters, the old chart is no longer visible and you have to rely on memory rather than perception to relate what you see now to what you saw before. But the very reason why visualization is so powerful is exactly that the information you need is there in front of you and can be accessed at any time. A concept fantastically expressed by Colin Ware in his book when he writes: “the world is its own memory” [1].

In order to make the distinction clearer I proposed to summarize the concept through this simple dichotomy:

Query paradigm: ask first, then present.
Visualization paradigm: present first, then ask.

The query paradigm forces you to initiate the analysis by thinking about what you want first. The hard way. Visualization, for the most part, works in reverse: you first see what is in the data, and then you are almost forced to ask questions as you detect interesting patterns you feel compelled to interpret and explain.
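
Here is a minimal sketch of the dichotomy with a tiny invented “flights” table (the columns and values are made up); the second half is a crude stand-in for an overview, one small panel per carrier.

    import pandas as pd
    import matplotlib.pyplot as plt

    flights = pd.DataFrame({
        "carrier": ["XY", "XY", "XY", "ZW", "ZW", "ZW"],
        "month":   [1, 2, 3, 1, 2, 3],
        "delay":   [10, 35, 50, 5, 7, 6],
    })

    # Query paradigm: ask first, then present.
    # You must already know which slice you care about.
    flights[flights.carrier == "XY"].plot.bar(x="month", y="delay", title="XY only")

    # Visualization paradigm: present first, then ask.
    # Show the whole space at once (one panel per carrier) and let the
    # interesting carriers and months jump out, prompting the next question.
    flights.pivot_table(index="month", columns="carrier", values="delay").plot(subplots=True)
    plt.show()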

At this point one of my students jumped up and said: “no, wait a minute … in order to create a data visualization you have to have some kind of question first!”. I fully agree. Visualization should be built with a purpose in mind. I think the difference lies more in whether the current design provides an overview of your data set or not. The query paradigm chops data into sealed segments one can see only individually, one at a time. The visualization paradigm, instead, tries to build a whole map of your data and lets you navigate through this entire space.

Note that I am not necessarily claiming one is better than the other! There are many great uses of query interfaces. What worries me most, to be honest, is that the query paradigm is so pervasive that it ends up being the only solution people consider when approaching visualization problems for the first time.

Where does the aggregation twitch come from?

Why do students have such a hard time assimilating these concepts? Why are high-information graphics so foreign to most of them? I think there are at least two main issues at play here:

  1. Underestimation of visual perception. When I work with students, in or out of my class, it always amazes me how fearful they are of making their charts smaller. They fear the charts will be too hard to see, and I keep pushing them to make the damn things smaller. Much, much smaller. The human eye is an incredibly powerful device, but it looks like most people do not realize how powerful it is, probably because we take it for granted. Colin Ware has a nice section on visual acuity in his Information Visualization book [2], which I suggest everyone read. It’s such a fascinating piece of research! For instance, take this: a typical monitor has on the order of 40 pixels per centimeter, and the human eye can distinguish deviations from line collinearity at a resolution as fine as 1/10 of a pixel.
  2. Overestimation of human (short-term) memory. As I said above, most people approach data visualization with a query paradigm: one big chart and a lot of options to decide what to put there. This may work in some cases, but it enormously limits the amount of reasoning we can do. We humans can hold only a very small set of objects in working memory at any given time (that’s the famous “magical number seven”; tip: it’s actually more complicated than that, but it works for this example), so when a chart changes we can no longer relate the previous set to the current one. Visual perception is orders of magnitude more powerful than memory. That’s why visualization shines.

There is actually a third issue, which did not occur to me until I presented these ideas in class: visual literacy and familiarity (I have started getting obsessed with this issue lately). Most of the fancy visualization techniques we develop are totally unfamiliar to most people out there. Not only do they need to spend time learning how to decode them, but they may also be totally overwhelmed by the information density these pictures carry. This became totally clear to me when I presented this treemap in class (click to see a bigger version):

[Figure: a dense treemap]

One of my students raised his hand with a facial expression somewhere between disgust and pain: “Prof., that’s too much information at once, I cannot bear it”. That’s the thing: while some people (me included) seem to take pleasure in looking at the intricate patterns high-information graphics make, other people just cannot bear them. Question: is that a learned behavior, or is it rooted in individual differences? I don’t know.

That’s all folks … Now I need to prepare for my next lecture (and a whole bunch of other stuff, by the way :))

[1] Ware, Colin. Visual Thinking for Design. Morgan Kaufmann, 2010.

[2] Ware, Colin. Information Visualization: Perception for Design. Morgan Kaufmann, 2013 (third edition).

{ 6 comments }

Course Diary #1: Basic Charts

by Enrico on February 10, 2014

in Course Diary

Starting this week and continuing through the rest of the semester, I will be writing a new series called “Course Diary”, where I report on my experience teaching Information Visualization to my students at NYU. Teaching them is a lot of fun. They often challenge me with questions and comments that force me to think more deeply about visualization. Here I’ll report some of my experiences and reflections on the course.

Lecture slides for this class: http://bit.ly/infovis14-l2

In the second lecture of my course (the first was a broad introduction to infovis) I introduced basic charts: bar charts, line charts, scatter plots, and some of their variants. These basic charts give me the opportunity to talk about two important concepts: the relationship between data type and graph type (even though in a somewhat primitive way) and graphical perception.

In order to let students absorb graphical perception I spend a lot of time playing graphical tricks rather than talking about theory (I’ll do that extensively later on). For instance, I show the “barless bar chart”, a bar chart with dots in place of bars:

[Figure: the barless bar chart]

But I don’t limit myself to showing that these are sub-optimal charts; I invite the students to think about why, and I’ve found this very nicely and naturally introduces broader and more relevant concepts. Let me explain with an example: a line chart without lines.

[Figure: a time series plotted as dots, without lines]

It’s easy to argue this does not work well, especially when you show it paired with a proper line chart. But then you ask: why? Why does it not work as well as the version with lines? I’ve found that students have to stretch their minds and think much more deeply about the issue. Heck, I have to think much more deeply myself!

For instance, I realized while discussing this example in class that a line chart without lines is a very good example of why and when visualization works best: when data understanding is supported by perceptual rather than cognitive processes. A line chart without lines forces us to trace a line between the points. We desperately need that line! It’s not that we don’t use that line at all; it’s that we draw it in our head rather than seeing it with our eyes. We can still judge the slope and detect patterns, of course, but it’s much, much harder (slower and less accurate)! This simple concept can be applied everywhere in visualization. You get it here, with a simple time line, and you can re-apply it in a thousand different new cases.
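
For anyone who wants to recreate the demo, here is a minimal matplotlib sketch of the paired comparison; the time series is random noise, and Tableau of course works just as well.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=60))   # a made-up time series

    fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(9, 3))
    ax1.plot(y, "o", markersize=3)       # the "line chart without lines"
    ax1.set_title("Points only: you draw the line in your head")
    ax2.plot(y, "-o", markersize=3)      # the proper line chart
    ax2.set_title("With the line: the trend is seen, not computed")
    plt.show()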

Another example I showed, which spurred some interesting discussion, is the “colorless divided bar chart”.

[Figure: the colorless divided bar chart]

Once again, this one forces you to think more deeply about graphical perception. A divided bar chart with color is clearly better, right? But why? Why is it better? Most students said the reason is that it’s easier to detect which bar is which: red to red, blue to blue, etc. And then I say: yes, ok, but why? Why is color helping you here? After all, each bar has its own position, and position is a pretty strong visual primitive for encoding data (hint: it’s actually the best one). And then I explain that position here is overloaded, that is, it’s used to encode two things at the same time: the groups and the categories within each group, and without color they get mixed up more easily. That’s when everyone in class nodded.

At some point I showed four different ways to display a time series (there are many more of course):

[Figure: four alternative time-series designs]

When I showed that, a couple of students raised some interesting questions. One was about the line chart vs. the area chart: the area chart looks pretty good, so when is it a good idea to use it? One student suggested the area chart has more contrast than the line chart, and this made me think that area charts are probably very good when we have lots of them in a small-multiples fashion, as they create a closed shape and as such are easier to compare.

By using this technique of starting from a basic chart and stripping it of some fundamental design elements, I have found I can teach a lot. I almost stumbled into this technique by chance, but I think it’s very effective and I will use it over and over again in my course.

Besides dissecting charts, another recurring question I get in class is: how do we judge whether one visualization is better than another? That’s a super hard question and I am glad I get it all the time. There’s not enough space here to articulate the answer, but there is one thing I stress a lot in class: you cannot judge a visualization without specifying a purpose.

I think everyone in this field has a tendency to judge visualizations in absolute terms, without considering their context (I have done that multiple times too). Too many believe data visualization is only about “data + visualization” (hence the name, right?), forgetting that a visualization without a purpose attached is impossible to judge. And again, basic charts, in all their simplicity, offer several opportunities to expose this concept: divided or stacked bar charts? area or line chart? multiple superimposed time lines or small multiples? There’s no absolute best here.

I have only one last comment from this lecture: Tableau is awesome. Coming up with examples and quickly tweaking them by adding and removing graphical properties saved me hours and hours. I was initially tempted to draw these examples on my whiteboard and take pictures; then I tried Tableau and it made me smile. A big smile. This also makes me think that Tableau, besides being a great analysis and presentation tool, can be an excellent didactic tool.

That’s all for this week. Wish me luck for my next class!

{ 10 comments }

The Role of Algorithms in Data Visualization

by Enrico on January 28, 2014

in Research

It’s somewhat surprising to me how little we discuss the more technical side of data visualization. I often say that visualization is something that “happens in your head”, to emphasize the role of perception and cognition and to explain why it is so hard to evaluate visualization. Yet visualization also happens, to a large extent, in the computer, and what happens there can be extremely fascinating too.

So, today I want to talk about algorithms in visualization. What’s the use of algorithms in visualization? When do we need them? Why do we need them? What are they for? Surprisingly, even in academic circles I have noticed we tend either to avoid the question completely or to take the answer for granted. Heck, even the few books we have out there: how many of them teach the algorithmic side of visualization? None.

I have grouped algorithms into four broad classes. For each one I am going to give a brief description and a few examples.

Spatial Layout. The most important perceptual component of every visualization is how data objects are positioned on the screen, that is, the logic and mechanism by which each data element is uniquely positioned on the spatial substrate. Most visualization techniques have closed formulations, based on coordinate systems, that permit us to uniquely, and somewhat trivially, find a place for each data object. Think scatter plots, bar charts, line charts, parallel coordinates, etc. Some other visualization techniques, however, have more complex logic and require algorithms to find the “right” position for each data element. A notable example is treemaps, which, starting from the somewhat simple initial formulation called “slice-and-dice”, evolved into more complex formulations like squarified treemaps and Voronoi treemaps. A treemap is not based on coordinates; it requires a logic. One of my favorite papers ever is “Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies”, where alternative treemap algorithms are proposed and rigorously evaluated. Another example is the very wide class of graph layout algorithms called force-directed layouts, where nodes and edges are placed according to some iterative optimization procedure. This class is so wide that dedicated conferences and books exist just to study new graph layouts. Many other examples exist: multidimensional scaling, self-organizing maps, flow map layouts, etc. A lot has been done in the past, but a lot still needs to be done, especially in better understanding how to scale these methods up to much larger numbers of elements.
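
To make the “a treemap requires a logic, not coordinates” point concrete, here is a minimal Python sketch of the original slice-and-dice layout; it handles a toy nested-list hierarchy with made-up weights, and none of the later refinements (squarified, ordered, quantum) are included.

    def slice_and_dice(node, x, y, w, h, vertical=True):
        """Minimal slice-and-dice treemap: a node is either a number (a leaf's
        weight) or a list of child nodes. Returns (x, y, w, h) leaf rectangles."""
        if isinstance(node, (int, float)):        # a leaf fills its whole rectangle
            return [(x, y, w, h)]

        def weight(n):
            return n if isinstance(n, (int, float)) else sum(weight(c) for c in n)

        total = weight(node)
        rects, offset = [], 0.0
        for child in node:
            frac = weight(child) / total
            if vertical:                          # slice the width at this level
                child_rect = (x + offset * w, y, frac * w, h)
            else:                                 # slice the height at this level
                child_rect = (x, y + offset * h, w, frac * h)
            rects += slice_and_dice(child, *child_rect, vertical=not vertical)
            offset += frac
        return rects

    # A tiny invented hierarchy: two branches with leaf weights 6, 2 and 1, 1.
    print(slice_and_dice([[6, 2], [1, 1]], 0, 0, 100, 100))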

(Interactive) Data Abstraction. There are many cases where data need to be processed and abstracted before they can be visualized. Above all, there is the need to deal with very large data sets (see “Extreme visualization: squeezing a billion records into a million pixels” for a glimpse of the problem). It does not matter how big your screen is; at some point you are going to hit a limit. One class of data abstraction is binning and data cubes (Tableau, for instance, is largely based on them), which summarize and reduce the data by grouping values into intervals. Every visualization based on density has some sort of binning or smoothing behind the scenes, and the mechanism can turn out to be complex enough to require a clever algorithm. More interesting is the case of data visualizations that have to adapt to user interaction. Even the most trivial abstraction mechanism may require a complex algorithm to make sure the visualization updates in less than a second when the user navigates from one abstraction level to another. A great recent example of this kind of work is “imMens: Real-time Visual Querying of Big Data”. Of course, binning is not the only data abstraction mechanism needed in visualization. For instance, all sorts of clustering algorithms have been used in visualization to reduce data size. Notably, graph clustering algorithms can (sometimes) turn a huge “spaghetti mess” into a more intelligible picture. For an overview of aggregation techniques in visualization you can read “Hierarchical Aggregation for Information Visualization: Overview, Techniques, and Design Guidelines”, a very useful survey on the topic.
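
As a toy illustration of the simplest abstraction mechanism, here is a minimal numpy sketch: a million synthetic points reduced to a small grid of counts that can actually be drawn, and cheaply re-binned when the user zooms. This is only the flavor of the idea; systems like imMens do far more clever things with precomputed multivariate tiles.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = rng.normal(size=1_000_000)                 # a million synthetic points
    y = 0.6 * x + rng.normal(scale=0.5, size=x.size)

    # Abstraction: reduce one million rows to a 50 x 50 grid of counts.
    counts, xedges, yedges = np.histogram2d(x, y, bins=50)

    # Plot density instead of raw points; re-binning a zoomed range is cheap,
    # which is what makes interactive navigation over big data feasible.
    plt.imshow(counts.T, origin="lower",
               extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
    plt.show()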

Smart Encoding. Every single visualization can be seen as an encoding procedure in which one has to decide how to map data features to visual features. To build a bubble chart you have to decide which variable to map to the x-axis, the y-axis, color, size, etc. You get the picture. This process may become tedious or too costly when the number of data dimensions increases. Also, some users may not have a clue about how to “correctly” visualize their data. Encoding algorithms can do some of the encoding for you, or at least guide you through the process. This kind of approach never became very popular in practice, but visualization researchers have spent quite some time developing smart encoding techniques. Notable examples are Jock Mackinlay’s seminal work “Automating the design of graphical presentations of relational information” and the later implementation of the “Show Me” function in Tableau (Show Me: Automatic Presentation for Visual Analysis). Other examples exist, but they tend to be more on the academic-speculation side. One thing I have never seen, though, is the use of smart encoding as an artistic tool. Why not let the computer explore a million different encodings and see what you get? That would be a fun project.
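
To give a flavor of what an encoding recommender does, here is a deliberately crude Python sketch in the spirit of Mackinlay’s approach: rank visual channels by effectiveness for each field’s data type and greedily assign the best free channel to the most important fields. The rankings are a rough paraphrase for illustration, not the actual APT or Show Me rules.

    # Channel effectiveness by data type, best first (a rough paraphrase of the
    # perceptual rankings behind APT / "Show Me", not the real rule base).
    RANKING = {
        "quantitative": ["position", "length", "angle", "area", "color"],
        "ordinal":      ["position", "color", "size", "shape"],
        "nominal":      ["position", "color", "shape", "size"],
    }

    def recommend(fields):
        """fields: list of (name, data_type) pairs, most important first.
        Greedily give each field the best channel still available."""
        used, assignment = set(), {}
        for name, dtype in fields:
            for channel in RANKING[dtype]:
                if channel not in used:
                    assignment[name] = channel
                    used.add(channel)
                    break
        return assignment

    print(recommend([("price", "quantitative"),
                     ("rating", "quantitative"),
                     ("category", "nominal")]))
    # {'price': 'position', 'rating': 'length', 'category': 'color'}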

Quality Measures. Even if this may seem a bit silly at first, algorithms can be used to supplement or even substitute for humans in judging the quality of a visualization. If you go back to the classes I described above, you will realize that each of them may contain some little mechanism of quality judgment. Layout algorithms (especially the nondeterministic ones) may need to routinely check the quality of the current layout. The same goes for sorting algorithms, like those needed to find meaningful orderings in matrices and heatmaps. Data abstraction algorithms may need to automatically find the right parameters for the abstraction. And smart encoding algorithms may need to separate the wheat from the chaff by suggesting only encodings whose quality is above a given threshold. A couple of years back I wrote a paper on quality metrics, “Quality metrics in high-dimensional data visualization: An overview and systematization”, to create a systematic description of how they are used in visualization. The topic is arguably a little academic, but I can assure you it’s a fascinating one with lots of potential for innovation.
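
As a tiny, made-up example of such a measure, here is a sketch that scores a scatter plot by how much of a coarse grid its points actually occupy, a crude proxy for overplotting and clutter; it assumes the data lives in the unit square and is only meant to show the kind of number a layout or encoding algorithm could use to rank candidate views.

    import numpy as np

    def occupancy_score(x, y, bins=32):
        """Fraction of grid cells containing at least one point (data assumed
        in the unit square). Higher = points spread out, less overplotting."""
        counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
        return (counts > 0).mean()

    rng = np.random.default_rng(2)
    spread = rng.uniform(size=(2, 5000))                    # well spread out
    clumped = rng.normal(scale=0.05, size=(2, 5000)) + 0.5  # heavily overplotted
    print(occupancy_score(*spread), occupancy_score(*clumped))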

These are the four classes of algorithms I have identified in visualization so far. Are there more out there? I am sure there are, and that’s partly why I have written this post. If there are other uses of algorithms I did not list here, please comment on this post and feel free to suggest more. That would help me build a better picture. There is much more to say on this topic.

Take care.

{ 2 comments }

Data with a Soul and a Few More Lessons I Have Learned About Data

January 15, 2014

I don’t know if this is true for you but I certainly used to take data for granted. Data are data, who cares where they come from. Who cares how they are generated. Who cares what they really mean. I’ll take these bits of digital information and transform them into something else (a visualization) using […]

Read the full article →

The myth of the aimless data explorer

January 7, 2014

There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what questions to ask of their data. I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a […]

Read the full article →

Data Visualization Semantics

July 15, 2013

A few days ago I had this nice chat with Jon Schwabish while sipping some iced tea at Think Coffee in downtown Manhattan: what elements of a graphic design give meaning to a visualization? How do the graphical marks, their aesthetics, and their contextual components translate into meaningful concepts we can store in our head? Everything started […]

Read the full article →

What’s the best way to *teach* visualization?

July 1, 2013

Yes teach, not learn. I have been writing about ways to learn visualization multiple times (here, here, here, here, here, and here) and others have done it multiple times too, but I am more interested in questions about how to best teach visualization now. I have been teaching a whole new Information Visualization course last […]

Read the full article →

The (new) VIS 2013 Industry and Government Track

June 24, 2013

It always sounds weird to me when I have to explain how the VIS Conference (formerly VisWeek) is not only for academics. Until now, I always had to use a number of convoluted arguments to explain it, but now I have one more tool in my Swiss Army knife: the industry and government track. What does […]

Read the full article →