
Course Diary #2: Beyond Charts: High-Information Graphics

Visualization of a million items.

Hi there! We had a one-week break at school, as inclement weather forced us to cancel last week’s class.

Here are the lecture slides from this class: Beyond Charts: High-Information Graphics.

In this third lecture I introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display of Quantitative Information. For the first time, I decided to introduce this concept very early on in the course, because I noticed students have a very hard time conceptualizing visual representations where lots of information is visible in a single view. In the past I have seen lots of students squeezing million-item data sets into a four-bar bar chart. Literally.

The Aggregation Twitch

I coined the term aggregation twitch hoping my students would remember the concept in the future. The aggregation twitch is the tendency to over-aggregate data through summary statistics. When confronted with a data table, many think: “how can I reduce this to a few numbers?”. I think Tufte captured the phenomenon just right:

Data-rich designs give context and credibility to statistical evidence. Low-information designs are suspect: what is left out, what is hidden, why are we shown so little?

Then, commenting on the difference between high- and low-information designs:

Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design.

I love this last sentence because, in its simplicity, it suggests some kind of stance or attitude in designing visualization.

In order to make the concept more explicit I presented an example from one of my past students. He was assigned the task of creating a visualization from the Aid Data data set, which contains more than a million items and several attributes like donor, recipient, date, purpose, etc. His first implementation was a funny (in some perverse way, admittedly) line plot with four lines and a lot of options to decide which data segments to display. I was stunned! But ever since, I have kept thinking about that example and how pervasive this aggregation attitude is.

My students seem to have grasped the concept, even though I regret I did not provide any positive example. I spent quite some time explaining why I think this is a limited way of doing visualization but I forgot to prepare and show counterexamples. Not good.

The query paradigm and the notion of overview

My student’s example gave me the opportunity to discuss a related problem I often see: relying excessively on data querying. That’s the way most students think about data visualization initially: create one simple chart and provide lots of options to select which statistical aggregates to display. Interestingly, this is the same way most data portals present their data by default; and, incidentally, it is why so many of them have failed to produce anything interesting for years.

The problem with this approach is that there is very limited space for data comparison and rich “graphical inference”, which is exactly what our brain is good at. What many don’t get is that as soon as you change the parameters the old chart is no longer visible, and you have to rely on memory rather than perception to relate what you see now to what you saw before. But the very reason visualization is so powerful is that the information you need is right there in front of you and can be accessed at any time. A concept fantastically expressed by Colin Ware in his book when he writes: “the world is its own memory” [1].

In order to make the distinction clearer I proposed to summarize the concept through this simple dichotomy:

Query paradigm: ask first, then present.
Visualization paradigm: present first, then ask.

The query paradigm forces you to initiate the analysis by thinking about what you want first. The hard way. But visualization, for the most part, works in reverse: you first see what is in the data, and then you are kind of forced to ask questions as you detect interesting patterns you feel compelled to interpret and explain.

At this point one of my students jumped up and said: “no wait a minute … in order to create a data visualization you have to have some kind of question first!”. I fully agree. Visualization should be built with a purpose in mind. I think the difference lies more in whether the current design provides an overview of your data set or not. The query paradigm chops data into sealed segments that can be seen only individually, one at a time. The visualization paradigm, instead, tries to build a whole map of your data and lets you navigate through this entire space.

Note that I am not necessarily claiming one is better than the other! There are many great uses of query interfaces. What worries me the most, to be honest, is that the query paradigm is so pervasive that it ends up being the only solution people consider when approaching visualization problems for the first time.

Where does the aggregation twitch come from?

Why do students have such a hard time assimilating these concepts? Why are high-information graphics so foreign to most of them? I think there are at least two main issues at play here:

  1. Underestimation of visual perception. When I work with students, in or out of my class, it always amazes me how fearful they are of making their charts smaller. They fear the charts will be too hard to see, and I keep pushing them to make the damn things smaller. Much, much smaller. The human eye is an incredibly powerful device, but it looks like most people do not realize how powerful it is. Probably because we take it for granted. Colin Ware has a nice section on visual acuity in his Information Visualization book [2] which I suggest everyone read. It’s such a fascinating piece of research! For instance, take this: a monitor has about 40 pixels per centimeter, and the human eye can distinguish line collinearity at a resolution as fine as 1/10 of a pixel.
  2. Overestimation of human (short-term) memory. As I said above, most people approach data visualization with a query paradigm: one big chart and a lot of options to decide what to put there. This may work in some cases, but it enormously limits the amount of reasoning we can do with it. We humans can hold only a very small set of objects in our working memory at any given time: that’s the famous “magical number seven” (tip: it’s actually more complicated than that, but it works for this example). Therefore, when a chart changes, we can no longer relate the previous set to the current one. Visual perception is orders of magnitude more powerful than memory. That’s why visualization shines.

There is actually a third issue, which did not occur to me until I presented these ideas in class: visual literacy and familiarity (I have started getting obsessed with this issue lately). Most of the fancy visualization techniques we develop are totally unfamiliar to most people out there. Not only do they need to spend time learning how to decode them, but they may also be totally overwhelmed by the information density these pictures carry. This became totally clear to me when I presented this Treemap in class:

Treemap

One of my students raised his hand with a facial expression somewhere between disgust and pain: “Prof., that’s too much information at once, I cannot bear it”. That’s the thing: while some people (me included) seem to take pleasure in looking at the intricate patterns high-information graphics make, some others just cannot bear them. Question: is that a learned behavior, or is it rooted in individual differences we humans have? I don’t know.

That’s all folks … Now I need to prepare for my next lecture (and a whole bunch of other stuff, by the way :))

[1] Ware, Colin. Visual Thinking for Design. Morgan Kaufmann, 2010.

[2] Ware, Colin. Information Visualization: Perception for Design. 3rd ed. Morgan Kaufmann, 2013.

Course Diary #1: Basic Charts

Starting this week, and for the rest of the semester, I will be writing a new series called “Course Diary” in which I report on my experience teaching Information Visualization to my students at NYU. Teaching them is a lot of fun. They often challenge me with questions and comments that force me to think more deeply about visualization. Here I’ll report some of my experiences and reflections on the course.

Lecture slides for this class: http://bit.ly/infovis14-l2

In the second lecture of my course (the first was a broad introduction to infovis) I introduced basic charts: bar charts, line charts, scatter plots, and some of their variants. These basic charts give me the opportunity to talk about two important concepts: the relationship between data type and graph type (even though in a somewhat primitive way) and graphical perception.

In order to let students absorb graphical perception I spend a lot of time playing graphical tricks rather than talking about theory (I’ll do that extensively later on). For instance, I show the “barless bar chart”, a bar chart with dots in place of bars:

barless-barchart

But I don’t limit myself to showing that these are sub-optimal charts; I invite the students to think about why, and I’ve found this very nicely and naturally introduces broader and more relevant concepts. Let me explain with an example: a line chart without lines.

time-series-dot-plot

It’s easy to argue this does not work well. Especially when you show it paired with a proper line chart. But then you ask: why? Why does it not work as well as the version with lines? I’ve found that students have to stretch their minds and think much more deeply about the issue. Heck, I have to think much more deeply myself!

For instance, I realized while discussing this example in class that a line chart without lines is a very good example of why and when visualization works best: when data understanding is supported by perceptual rather than cognitive processes. A line chart without lines forces us to trace a line between the points. We desperately need that line! It’s not that we don’t use that line at all; it’s that we draw it in our heads rather than seeing it with our eyes. We can still judge the slope and detect patterns, of course, but it’s much, much harder (slower, less accurate)! This simple concept can be applied everywhere in visualization. You get it here, with a simple time line, and you can re-apply it in a thousand different new cases.

Another example I have shown, which spurred some interesting discussion, is the “colorless divided bar chart”.

colorless-divided-barchart

Once again, this one forces you to think more deeply about graphical perception. A divided bar chart with color is clearly better, right? But why? Why is it better? Most students said the reason is that it’s easier to detect which bar is which: red to red, blue to blue, etc. And then I say: yes, ok, but why? Why is color helping you here? After all, each bar has its own position, and position is a pretty strong visual primitive for encoding data (hint: it’s actually the best one). And then I explain that position here is overloaded, that is, it’s used to encode two things at the same time: the groups and the categories within each group, and the two get mixed up more easily without color. That’s when everyone in class nodded.

At some point I showed four different ways to display a time series (there are many more of course):

time-series-designs

When I showed that, a couple of students raised some interesting questions. One was about the line chart vs. the area chart. The area chart looks pretty good; when is it a good idea to use it? One student suggested the area chart has more contrast than the line chart, and this made me think that area charts are probably very good when we have lots of them in a small-multiple fashion, as they create a closed shape and as such are easier to compare.

By using this technique of starting from a basic chart and stripping away some of its fundamental design elements, I have found I can teach a lot. I almost stumbled into this technique by chance, but I think it’s very effective and I will use it over and over again in my course.
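If you want to reproduce the strip-down exercise programmatically rather than in a tool like Tableau, here is a minimal Python/matplotlib sketch of the idea (the data is made up, and these are not the exact charts from the lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly series, just to have something to strip down.
rng = np.random.default_rng(1)
t = np.arange(36)
values = np.cumsum(rng.normal(size=36)) + 10

fig, axes = plt.subplots(1, 3, figsize=(9, 2.5), sharey=True)
axes[0].plot(t, values)                        # the full line chart
axes[0].set_title("line")
axes[1].plot(t, values, "o", markersize=3)     # same data, lines stripped away
axes[1].set_title("dots only")
axes[2].fill_between(t, values, values.min())  # same data as a filled area
axes[2].set_title("area")
plt.tight_layout()
plt.show()
```

Showing the three variants side by side is usually enough to start the “why does one work better?” discussion.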

Besides dissecting charts, another recurring question I get in class is: how do we judge if a visualization is better than another? That’s a super hard question and I am glad I get it all the time. There’s not enough space here to articulate the answer but there is one thing I stress a lot in class: you cannot judge visualization without specifying a purpose.

I think everyone in this field has a tendency to judge visualizations in absolute terms, without considering their context (I have done that multiple times too). Too many believe data visualization is only about “data + visualization” (hence the name, right?), forgetting that visualization without a purpose attached is impossible to judge. And again, basic charts, in all their simplicity, offer several opportunities to expose this concept: divided or stacked bar charts? area or line chart? multiple superimposed time lines or small multiples? There’s no absolute best here.

I have only one last comment from this lecture: Tableau is awesome. Coming up with examples and quickly tweaking them by adding and removing graphical properties saved me hours and hours of time. I was initially tempted to draw these examples on my whiteboard and take pictures; then I tried Tableau and it made me smile. A big smile. This also makes me think that Tableau, besides being a great analysis and presentation tool, can be used as an excellent didactic tool.

That’s all for this week. Wish me luck for my next class!

The Role of Algorithms in Data Visualization

It’s somewhat surprising to me how little we discuss the more technical side of data visualization. I like to say that visualization is something that “happens in your head”, to emphasize the role of perception and cognition and to explain why it is so hard to evaluate visualization. Yet visualization also happens, to a large extent, in the computer, and what happens there can be extremely fascinating too.

So, today I want to talk about algorithms in visualization. What’s the use of algorithms in visualization? When do we need them? Why do we need them? What are they for? Surprisingly, even in academic circles I have noticed we tend to either avoid the question completely or take the answer for granted. Heck, even the few books we have out there: how many of them teach the algorithmic side of visualization? None.

I have grouped algorithms into four broad classes. For each one I am going to give a brief description and a few examples.

Spatial Layout. The most important perceptual component of every visualization is how data objects are positioned on the screen, that is, the logic and mechanism by which a data element is uniquely positioned on the spatial substrate. Most visualization techniques have closed formulations, based on coordinate systems, that permit us to uniquely, and somewhat trivially, find a place for each data object. Think scatter plots, bar charts, line charts, parallel coordinates, etc. Some other visualization techniques, however, have more complex logics that require algorithms to find the “right” position for each data element. A notable example is treemaps, which, starting from the fairly simple initial formulation called “slice-and-dice”, evolved into more complex formulations like squarified treemaps and Voronoi treemaps. A treemap is not based on coordinates; it requires a layout logic. One of my favorite papers ever is “Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies”, where alternative treemap algorithms are proposed and rigorously evaluated. Another example is the very wide class of graph layout algorithms called force-directed layouts, where nodes and edges are placed according to an iterative optimization procedure. This class is so wide that specific conferences and books exist just to study new graph layouts. Many other examples exist: multidimensional scaling, self-organizing maps, flow map layouts, etc. A lot has been done in the past, but a lot remains to be done, especially in better understanding how to scale these methods up to much higher numbers of elements.
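To give a flavor of what a layout “logic” looks like in code, here is a minimal, hypothetical sketch of the basic slice-and-dice treemap rule (not the code from any of the papers mentioned above): alternate the split direction at each level of the hierarchy, and give every child a strip of the parent’s rectangle proportional to its total size.

```python
def subtree_size(node):
    # Total weight of a node: its own size plus everything below it.
    return node.get("size", 0) + sum(subtree_size(c) for c in node.get("children", []))

def slice_and_dice(node, x, y, w, h, depth=0):
    """Return a list of (name, x, y, width, height) rectangles for the hierarchy."""
    rects = [(node["name"], x, y, w, h)]
    children = node.get("children", [])
    if not children:
        return rects
    total = sum(subtree_size(c) for c in children)
    offset = 0.0
    for child in children:
        frac = subtree_size(child) / total
        if depth % 2 == 0:   # even depth: slice along the x axis
            rects += slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1)
        else:                # odd depth: dice along the y axis
            rects += slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1)
        offset += frac
    return rects

# Toy hierarchy, just to show the call.
tree = {"name": "root", "children": [
    {"name": "a", "size": 6},
    {"name": "b", "children": [{"name": "b1", "size": 2}, {"name": "b2", "size": 1}]},
    {"name": "c", "size": 3},
]}
for name, rx, ry, rw, rh in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{name}: x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f}")
```

Squarified and Voronoi treemaps replace this simple alternating rule with optimization procedures that produce better aspect ratios, which is exactly where the algorithmic work comes in.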

(Interactive) Data Abstraction. There are many cases where data need to be processed and abstracted before they can be visualized. Above all is the need to deal with very large data sets (see “Extreme visualization: squeezing a billion records into a million pixels” for a glimpse of the problem). It does not matter how big your screen is; at some point you are going to hit a limit. One class of data abstraction is binning and data cubes (Tableau is mostly based on this, for instance), which summarize and reduce the data by grouping values into intervals. Every visualization based on density has some sort of binning or smoothing behind the scenes, and the mechanism can turn out to be complex enough to require a clever algorithm. More interesting is the case of data visualizations that have to adapt to user interaction. Even the most trivial abstraction mechanism may require a complex algorithm to make sure the visualization is updated in less than one second when the user needs to navigate from one abstraction level to another. A recent great example of this kind of work is “imMens: Real-time Visual Querying of Big Data”. Of course, binning is not the only data abstraction mechanism needed in visualization. For instance, all sorts of clustering algorithms have been used in visualization to reduce data size. Notably, graph clustering algorithms can (sometimes) turn a huge “spaghetti mess” into a more intelligible picture. For an overview of aggregation techniques in visualization you can read “Hierarchical Aggregation for Information Visualization: Overview, Techniques, and Design Guidelines”, a very useful survey on the topic.
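As a tiny illustration of binning as a data-abstraction step (made-up data, not the mechanism of any of the systems cited above), consider how a million records collapse into a small grid of counts that is cheap to draw:

```python
import numpy as np

# Hypothetical data: a million (x, y) points that would overplot badly.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(scale=0.8, size=1_000_000)

# Bin into a coarse grid; each cell stores a count (a tiny "data cube").
counts, xedges, yedges = np.histogram2d(x, y, bins=64)

# Only the 64x64 grid needs to be rendered (e.g., as a density map), so drawing
# cost depends on the number of bins, not on the million records. Interactive
# systems precompute grids at several resolutions to keep navigation fast.
print(counts.shape, int(counts.sum()))
```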

Smart Encoding. Every single visualization can be seen as an encoding procedure where one has to decide how to map data features to visual features. To build a bubble chart you have to decide which variable you want to map to the x-axis, y-axis, color, size, etc. You get the picture. This kind of process may become tedious or too costly when the number of data dimensions increases. Also, some users may not have a clue about how to “correctly” visualize some data. Encoding algorithms can do some of the encoding for you, or at least guide you through the process. This kind of approach never became very popular in practice, but visualization researchers have spent quite some time developing smart encoding techniques. Notably, Jock Mackinlay’s seminal work “Automating the design of graphical presentations of relational information” and the later implementation of the “Show Me” function in Tableau (Show Me: Automatic Presentation for Visual Analysis). Other examples exist, but they tend to be more on the academic speculation side. One thing I have never seen, though, is the use of smart encoding as an artistic tool. Why not let the computer explore a million different encodings and see what you get? That would be a fun project.
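Just to make the idea concrete, here is a hypothetical, rule-of-thumb encoder in the spirit of this line of work (it is not Mackinlay’s actual APT system, nor Show Me): rank visual channels by effectiveness for each data type and greedily assign the best free channel to each field.

```python
# Rough effectiveness ordering of channels per data type (an assumption made
# for this sketch, loosely inspired by the graphical-perception literature).
CHANNEL_RANKING = {
    "quantitative": ["position_x", "position_y", "length", "area", "color_luminance"],
    "ordinal":      ["position_x", "position_y", "color_luminance", "size"],
    "nominal":      ["position_x", "position_y", "color_hue", "shape"],
}

def suggest_encoding(fields):
    """fields: list of (name, data_type) tuples, ordered by importance.
    Greedily assign each field the most effective channel still available."""
    used, plan = set(), {}
    for name, dtype in fields:
        for channel in CHANNEL_RANKING[dtype]:
            if channel not in used:
                plan[name] = channel
                used.add(channel)
                break
    return plan

print(suggest_encoding([("gdp", "quantitative"),
                        ("life_expectancy", "quantitative"),
                        ("continent", "nominal")]))
# {'gdp': 'position_x', 'life_expectancy': 'position_y', 'continent': 'color_hue'}
```

A real system does much more (it enumerates whole chart designs and checks expressiveness, not just effectiveness), but the skeleton is the same: a search over encodings guided by ranking rules.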

Quality Measures. Even if this may seem a bit silly at first, algorithms can be used to supplement or substitute humans in judging the quality of a visualization. If you go back to the previous classes I have described, you will realize that in every one of them there might be some little mechanism of quality judgment. Layout algorithms (especially the nondeterministic ones) may need to routinely check the quality of the current layout. The same goes for sorting algorithms, like those needed to find meaningful orderings in matrices and heatmaps. Data abstraction algorithms may need to automatically find the right parameters for the abstraction. And smart encoding algorithms may need to separate the wheat from the chaff by suggesting only encodings with quality above a given threshold. A couple of years back I wrote a paper on quality metrics titled “Quality metrics in high-dimensional data visualization: An overview and systematization” to create a systematic description of how they are used in visualization. The topic is arguably a little academic, but I can assure you it’s a fascinating one with lots of potential for innovation.
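As a toy example of what a quality measure can look like, here is a hypothetical “overdraw” metric for scatter plots: the fraction of points hidden behind other points once the plot is rasterized. One could compute it for several candidate designs and keep the one with the lowest score. This is my own simplification, not a metric from the paper cited above.

```python
import numpy as np

def overdraw_ratio(x, y, width=200, height=200):
    """Fraction of points hidden behind other points when the scatter plot is
    rasterized onto a width x height grid. Lower is better; 0 means every
    point gets its own pixel."""
    def to_pixels(v, n):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        if span == 0:
            return np.zeros(len(v), dtype=int)
        return ((v - v.min()) / span * (n - 1)).astype(int)
    xi, yi = to_pixels(x, width), to_pixels(y, height)
    occupied = len(set(zip(xi.tolist(), yi.tolist())))
    return 1.0 - occupied / len(xi)

# Example: score two candidate designs of the same (skewed) data.
rng = np.random.default_rng(0)
x = rng.lognormal(sigma=1.5, size=10_000)
y = x * rng.lognormal(sigma=0.5, size=10_000)
print(overdraw_ratio(x, y))                  # raw axes: points pile up near the origin
print(overdraw_ratio(np.log(x), np.log(y)))  # log axes: usually much less overdraw
```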

These are the four classes of algorithms I have currently identified in visualization. Are there more out there? I am sure there are and that’s partly the reason why I have written this post. If there are other uses for algorithms which I did not list here please comment on this post and feel free to suggest more. That would help me build a better picture. There is much more to say on this topic.

Take care.

Data with a Soul and a Few More Lessons I Have Learned About Data

I don’t know if this is true for you but I certainly used to take data for granted. Data are data, who cares where they come from. Who cares how they are generated. Who cares what they really mean. I’ll take these bits of digital information and transform them into something else (a visualization) using my black magic and show it to the world.

I no longer see it this way. Not after attending a whole three-day event called the Aid Data Convening, a conference organized by the Aid Data Consortium (ARC) to talk exclusively about data. Not just data in general but a single data set: the Aid Data, a curated database of more than a million records collecting information about foreign aid.

The database keeps track of financial disbursements made by donor countries (and international organizations) to recipient countries for development purposes: health and education, disasters and financial crises, climate change, etc. It spans from 1945 to the present day and includes hundreds of countries and international organizations.

Aid Data users are political scientists, economists, and social scientists of many sorts, all devoted to a single purpose: understanding aid. Is aid effective? Is aid allocated efficiently? Does aid go where it is most needed? Is aid influenced by politics (the answer is of course yes)? Does aid have undesired consequences? Etc.

Isn’t that incredibly fascinating? Here is what I have learned during these few days I have spent talking with these nice people.

Data are not always abundant or easy to get. In the big data era we are so used to data abundance that we end up forgetting how some data, data crucial for important human endeavors, may be very hard to get. It’s not like writing the next Python script and scraping a million records in 24 hours. No, it’s a super-painful process. For instance, the Aid Data folks have a whole team of data gatherers and a multistep process which includes: setting up an agreement with a foreign country, having people fly to remote places and convince officials to make their information available, obtaining a whole bunch of documents and files, transforming these files into a common format and adding geographical coordinates (geocoding) where necessary, cross-checking data with multiple coders, etc. How far is this from writing a bunch of Python code?

Data granularity can be a game-changer. It took me a while to understand why Aid Data users are so excited by the new release of the database, which features, for the first time, data at the sub-national rather than only the national level. This means that financial disbursements are geocoded at a finer level of granularity; that is, instead of knowing only that a certain amount has flowed from one country to another, you can now know which region it has gone to. To my eyes this seemed like a minor thing, but as I went through a few presentations by people doing real research with these data I suddenly realized it is a huge change! Picture this: you know aid is flowing from the US to Uganda but you have no idea where it goes once it lands there. All of a sudden researchers can ask a whole lot of new and more interesting questions. In turn, this makes me think about how this extends to many other data sets: small changes can have huge impacts. A little more detail may pave the way to much bigger questions. How can we make existing data systematically more valuable by adding crucial information? And what is this crucial information, by the way?

Questions are much more important than data. I did not need to attend this conference to realize how true this is. Yet, after attending it I am even more convinced. One of the highest peaks of the event for me was listening to all the diverse and interesting questions researchers have about this single data set. There are all kinds of flavors: aid’s effect on health, democratic processes, recovery from disasters and violence, or, vice versa, how specific events or conditions influence aid. Even if data are a critical asset to answer these questions, and to substantiate them with hard numbers, the real value comes from the questions, not from the data. And questions come from the most important asset we have: our brain. Data without brain is useless. Brain without data may still be somewhat useful, I guess.

Interesting questions are causal. It’s stunning to me to see how most visualization projects are organized around the detection and depiction of trends, patterns, outliers, and groupings, and so seldom around causation. Yet, in most scientific endeavors causal relationships are what matter the most. While detecting trends is still important, ultimately researchers want to see how A has an effect on B (well, it may be much more complicated than that, but you get the point): does aid have an effect on child mortality? does aid reduce conflicts? does aid to region A displace resources from region B? It’s extremely surprising to me, after working in visualization for many years, to realize how agnostic visualization is to causation and causal models, when in fact virtually every scientific question subsumes a causal relationship. How can we make progress and systematically explore how visualization can help uncover or present causal relationships?

OMG data bias! It was sometime halfway through the conference, after hearing all sorts of praise for Aid Data, that one of the attendees bravely stood up and said something along these lines: “Hey, wait a moment folks … these data have a huge bias! If we include only countries that agree to provide their data, we have a big selection bias problem. How is this going to affect our research?” (kudos to Bruce for raising this question). This reminded me that data always come with all sorts of intricacies and problems. It can be bias, it can be missing values, it can be errors, it can be a lot of other hidden things that may totally invalidate our findings. If there is one lesson to learn here, it is this: while it is easy to get super-excited about data and the endless opportunities they present, it is hard to acknowledge that data are limited and may even be useless in some circumstances. Rather than sweeping these problems under the carpet, we’d better develop some sort of “data braveness” or “data mindfulness” and admit that data, after all, may have all sorts of bugs.

Communities of practice and visualization as a cultural artifact. During the course of the conference I had the opportunity to see lots of charts, graphs, and diagrams. Visualization is definitely part of this community: they love maps and enjoy presenting their ideas through colorful visual representations. Last year I had the opportunity to work with a group of climate scientists on a different project, and similarly I saw them using lots of charts, diagrams, and graphs. What I am starting to notice, after seeing so many people using visualization for their own purposes, is that visualization is a cultural artifact. Communities of practice go through an interesting evolutionary process in which tools like data visualization are adopted, transformed, and consolidated, forming numerous implicit and explicit defaults, conventions, and expectations. With Aid Data, for instance, most people need to visually correlate two main variables, the amount of aid and an outcome variable, both in geographical space. Most of them end up using a choropleth map with bubbles on top. Is that the best representation? I don’t know. But I know it is familiar to everyone, and it is what most of them expect and are used to seeing. How much do we know about these communities of practice? How can research in visualization develop a better understanding of how people use visualization in real-world settings? What could we gain by doing that?

Behind data there might be a “soul”. Finally, the last thing I learned is the most important one. Data is just a signal, only a dry description of something that is much more important: real people, phenomena, events. It’s way too easy, when you are used to working with lots of different data sets and big piles of them, to forget what lies behind all these bits; what these bits really are. Aid Data and the stories I have heard reminded me that behind data there can be profound desperation, joy, struggles, good and bad intentions, failures and successes. In a word, there can be real humans and their lives. I think it is really important for us not to lose this connection. Not to completely detach from what these data represent. Next time you start a project, try to pause for a moment and think: behind data there might be a soul.

That’s all I had to say. This has been an extremely enriching experience for me and I hope these few thoughts will spark some new ideas and feelings in you. As usual, feel free to comment and react. I’d love to hear your voice!

Take care.

The myth of the aimless data explorer

There is a sentence I have heard or read multiple times in my journey into (academic) visualization: visualization is a tool people use when they don’t know what questions to ask of their data.

I have always taken this sentence as a given and accepted it as it is. Good, I thought, we have a tool to help people come up with questions when they have no idea what to do with their data. Isn’t that great? It sounded right or at least cool.

But as soon as I started working on more applied projects, with real people, real problems, and real data they care about, I discovered that all this excitement about data exploration is just not there. People working with data are not excited about “playing” with data; they are excited about solving problems. Real problems. And real problems have questions attached, not just curiosity. There’s simply no such thing as undirected data exploration in the real world.

Digging a little deeper into the issue, I realize that after all this is natural and somewhat obvious: why should people explore data for the sake of it? Sure, some people like us (yes, the hopeless data geeks) do take pleasure in looking into a bunch of data, but we are a minority and I am not sure we should take ourselves as the reference model for what we do.

The reason why I decided to write about this is that I think this myth is somewhat pervasive, and it’s not limited to visualization. While I am not a Data Mining or Machine Learning expert, I know some people in the area and I know some of them also promote “knowledge discovery” as the science of finding good questions.

But wait a moment, you might say … when we use knowledge discovery tools (yes, vis is a knowledge discovery tool) sometimes we do stumble into unanticipated questions, and these questions may in fact be the real value of the whole process! I agree. And I have experienced this effect multiple times myself. Yet, I think this does not contradict my point: what I am arguing is not that we should not help people come up with new questions as a side effect of data analysis, or that coming up with new questions is not valuable. What I am arguing is that we should be very careful in selling visualization as a tool for people who don’t know what question to ask. This is simply not true. Everyone has a question, and I actually believe everyone should start with a question.

There are a couple of words I like more when talking about visualization: hypothesis and explanation. These are great words! They describe much better what visualization is good for. You might have a good question to start with but not a good hypothesis or explanation for what is going on (some patients develop unexpected complications after receiving a particular treatment and you don’t know why). And visualization can for sure help you come up with one. Visualization is a “hypothesis booster”. It’s actually so effective that it could even be dangerous in this respect (it may bias you toward some explanation)!

So next time you talk about visualization, restrain yourself from selling it as a tool to help people aimlessly explore some data. And when you hear someone saying that, please send him or her to this post. I’d be happy to defend my position :)

Am I missing something here? Am I totally wrong in some sense? I know there are some people out there who would strongly disagree with me, feel free to let me hear your voice!


Data Visualization Semantics

A few days ago I had a nice chat with Jon Schwabish while sipping some iced tea at Think Coffee in downtown Manhattan: what elements of a graphic design give meaning to a visualization? How do the graphical marks, their aesthetics, and their contextual components translate into meaningful concepts we can store in our heads?

Everything started with us discussing the role of text in visualization and how much labels and annotations contribute in this sense. Try to think about a visualization with no text at all: where does the meaning come from?

I think interpretation depends at least on these two main factors: (1) background knowledge in the reader and (2) semantic cues in the graphics. Interpretation is a sort of “dance” between these two elements: what we have in our head influences what we see in the graphics (this is a very well known fact in vision science) and what we see in the graphics influences what we think.

Background Knowledge. No interpretation can happen if we do not connect what we see with information that is already stored in our head. That’s the way Colin Ware puts it in his “Visual Thinking for Design“:

“When we look at something, perhaps 95 percent of what we consciously perceive is not what is “out there” but what is already in our heads in long-term memory. The reason we have the impression of perceiving a rich and complex environment is that we have rich and complex networks of meaning stored in our brains” [Ch.6, p.116]

And:

“… we have been discussing about objects and scenes as pure visual entities. But scenes and objects have meaning largely through links to other kinds of information stored in a variety of specialized regions of the brain” [Ch.6, p.114]

We are so fixated on data today that we end up forgetting data is merely a (dry) representation of a much more complex phenomenon, and that people need to have their own internal representation of this phenomenon in order to reason about it. This is independent of the data, and it plays a big role in how people interpret and interact with a visualization. Sure, one could always analyze a graph syntactically and say that something is increasing or decreasing over time, or that some “objects” cluster together, etc. But is that useful at all?

Of course, interpretation is subjective and biases pop up all the time, but how does the designer’s intent interact with all the preconceptions, biases, and skills of any given reader? This is a huge topic and I don’t see anything around that can help us sort these things out.

Interestingly, I see two opposite cases taking place in visualization use and practice. When visualization is used mainly as a communication tool, that is, to convey a predefined message the designer has crafted for the reader, the reader has to be educated before interpretation takes place.

But when visualization is used as an exploratory or decision making tool developed for a group of domain experts, we have an opposite kind of gap: the designer is typically ignorant about the deep meaning of the data and needs to be educated before good design takes place. Without a very tight collaboration between the designer and the domain scientist it’s practically impossible to build something really useful. I have experienced that myself many times. Unfortunately, such a tight collaboration does not happen easily and it’s very hard to establish in the first place.

Semantic Cues. The way a visualization itself is designed can support or hinder the semantic association between graphical elements and concepts. The minimum requirement is that the user understands how the graphic works and what it represents. Some charts are easier to interpret because people are familiar with them; others are fancier and need additional explanation.

But even when a chart is familiar, explanations are needed to understand what the graphical objects represent. I have seen this problem so many times in presentations, especially when some fancy visualization technique is used: the presenter does not describe the semantic associations well enough and the audience gets totally lost.

Other than showing trends and quantities, a visualization needs to make clear how to create a mental link between the objects stored in your head and those perceived in the visualization: the “what”, “who”, and “where” elements. The theory of visual encoding is so heavily based on the accurate representation of quantitative information that it seems we have totally forgotten how important it is to employ effective encodings for the what/where channels. This is perhaps why visualization of geographical data is so often done on a map. Keeping the geographical metaphor intact might not be the “best” visual encoding for the task at hand, yet it carries such a high degree of semantics that it’s hard to shy away from it.

Finally, going back to the original idea of this post, text is king when we talk about interpretation. Seriously, think about visualization with and without text. Text makes visualization alive. It gives meaning to what you see. Among the most common textual elements you can find in a visualization are axis labels, legend labels, item labels, titles, and annotations, but I suspect there are many unused and under-researched possibilities. Labeling is tricky and not well studied yet (except for label placement, a quite extensively developed niche). For instance, when more than a handful of data labels is shown there is a high risk of cluttering the screen, and no obvious solutions exist.

Also, there is limited understanding of the best way to integrate visualization and text in a more natural and seamless way, which goes beyond simply attaching labels to objects, and well beyond the scope of this post.

And you? What do you think? Did you ever think about how meaning is conveyed in visualization? Anything to add?

Thanks for reading. Take care.

What’s the best way to *teach* visualization?

Yes, teach, not learn. I have written about ways to learn visualization multiple times (here, here, here, here, here, and here), and others have done it multiple times too, but now I am more interested in questions about how to best teach visualization. I taught a whole new Information Visualization course last semester and I honestly have several doubts about what the best way to teach visualization is.

First, I need your feedback.

I will get to the main doubts I have in a moment, but before that I want to ask you a favor: if you teach or have ever taught a course, I would be very happy to hear your opinions and experiences with teaching visualization. I know, as a matter of fact, that every teacher has big questions and doubts about best teaching practices. I’d love to hear from you the ups and downs of your teaching experiences.

I would also be very happy to get feedback from people who have attended visualization courses in the past. Did you ever attend a visualization course? If yes, what do you think is the best way to be taught about visualization? Is there anything that worked especially well or especially badly? What is the most challenging part? Theory, practice, tools, examples? What does or doesn’t work in class?

My InfoVis Course

My Information Visualization course is held at NYU-Poly and it’s open to students of every level (undergrads, grads, and PhDs). The course is organized around lectures, reading assignments, exercises (mostly in class), and a (big) final project. The project is the most important part of the course, and my students work on it for more than half of the course (about 2.5 months).

The course focuses mainly on visualization theory (mostly perceptual issues and visual encoding) and on the visualization design process. I have two main goals for my students: (1) make sure they can, for any given problem, explore a very large set of solutions (rather than focus on the first one that comes to mind), and (2) predict as much as possible what works and what does not work, that is, design and implement effective visualizations.

Issues with teaching visualization.

Here is a list of some specific issues I have with my courses.

Visual literacy. I have noticed this problem multiple times in my courses: I show a wide array of visualization examples early on in the course and then I quickly focus on the nuts and bolts of visualization design. The students seem to understand what I teach, but they simply have not experienced enough visualization design to really internalize and fully understand what I say. Also, when we come to the problem of designing visualizations for a data set they have never seen before, they don’t have enough of the visualization design space in mind to explore all the interesting possibilities. They are mostly anchored to what comes first (usually from experience). How do you develop visual literacy early on in the course? I am not sure, but I feel students need a much deeper immersion into the visualization design space before they are able to work in it.

Tools. I usually give total freedom to my students to choose whatever tool they want (I usually make the bad joke that they can code visualization in assembler if they want to). I give some detailed advice on what I think are the best tools around, but then they choose and learn the tools on their own. I used to think that tool choice is a secondary aspect of teaching visualization, but I have totally changed my mind. Like any other design craft, visualization is totally dependent on and shaped by the tools one uses, both consciously and unconsciously. The tool you use will give a certain shape and frame of mind to the visualization you produce (I first learned about the idea of how technology and context shape the creative process from David Byrne’s book How Music Works). Furthermore, there is the issue of coding vs. not coding. I know a lot of people do great visualization without writing a single line of code. Yet, I think coding gives much more freedom. I now believe it is much more effective to give tools the role they deserve and teach one (at most two) core tools in my future courses. How do you use tools in your course? Are they an accessory or a fundamental aspect of your course?

Projects. I think there’s no doubt visualization can only be learned through a lot of practice. I like to repeat that visualization can only be judged when you see it, not when you think about it. Projects are a great way to put into practice what you learn and to solve some interesting and challenging problems. Yet, how do you split the time between the project part and the lectures? How early should the students start working on their projects? Ideally they should start very early on so that they have enough time, but shouldn’t they first acquire some basic knowledge? And how can they acquire this knowledge without practice? Again, I used to believe the best way is to split the course into two main periods: the lectures period and the project period. Now I am no longer sure this is the best way to go. How about mini-projects? Projects with a much shorter time span but nicely interleaved with the lectures? It sounds like a nice option. Yet, when are the students confronted with a more realistic medium-size project? Can one afford to have mini-projects AND one big project? I am not sure.

There are many other issues, but these are the most pressing ones.

Now it’s your turn! What’s the best way to teach visualization? I’d love to hear your opinions and experiences. Thanks!

Take care.


The (new) VIS 2013 Industry and Government Track

It always sounds weird to me when I have to explain that the VIS Conference (formerly VisWeek) is not only for academics. Until now, I always had to use a number of convoluted arguments to explain it, but now I have one more tool in my Swiss army knife: the industry and government track. What does it mean? Well, it basically means that if you want to show your work at the conference you have a very specific track for industry and government work.

I think this is very useful for people who are not in academia and important for increasing the dialogue between researchers and practitioners, so I decided to ask the track chairs Danyel Fisher, David Gotz, and Bill Wright a few questions. In particular, I wanted them to explain how it works and why you should participate. And also how to convince your boss!

[Note: the deadline is very, very close: June 27. I apologize for posting this so late, but I hope you can still submit something. Also, submission deadlines get postponed from time to time, so keep an eye on it!]

The new deadline is July 3rd, 2013.

What is the Industry and Government Track?
The Industry and Government Track is ideal for people who work with visualization in their day-to-day jobs: whether building a visualization dashboard for a business intelligence application, putting together charts and graphs to explain their products or policies to customers, or just exploring their own data. We want to help them participate in, learn from, and teach the IEEE VIS community.

The Track consists of several components: a poster session for practitioners; a panel of invited talks from industrial visualization-users; and a series of other events through the conference—workshops, tutorials, panels, and papers—that are likely to be particularly interesting to practitioners. In addition, companies that use or create visualizations are invited to exhibit in our exhibition hall; a discount “startup” package helps small companies get exhibition space for a song.

Why this new track at VIS 2013?
The VIS conference has traditionally been an academically-oriented conference, with some of the most innovative work in information visualization and visual analytics. But it can also be a little bit insular: many of the cool visualizations don’t make it outside our community, and we aren’t necessarily aware of the challenges that drive the outside world.

That’s certainly not always true, of course: the VAST conference does a great job of watching how data analytics works in the real world; tools like Many Eyes and D3.js have made a substantial impact; and the conference attracts attendees from Microsoft, IBM, and Google, as well as a variety of government agencies and smaller companies. However, VIS 2013 would like to increase the amount of mixing between these communities. We believe that sharing ideas and building connections across these artificial boundaries would be beneficial to both sides. We think that it is a great time for us to share our research with a broader audience—and we’d like to learn from the outside, too.

Who should submit to it and why? What can one get out of presenting a poster at VIS 2013?
Anyone who has solved an interesting problem with visualization—in the way they share it, or show it, or the angle they take on the data—is welcome to submit a poster. So is anyone who is working their way through a broad problem. A detailed Call for Participation with submission instructions can be found on the industry and government track web page. The deadline for submissions is June 27, 2013.

Posters are a great way to share your work with the conference attendees. Posters will be displayed in a prominent location for several days at the conference, right next to the research and student posters that have been a traditional part of the VIS conference over the years. Attendees can browse the posters throughout the conference, and a formal poster reception brings large audiences to the poster gallery. In addition, there will be a poster “fast-forward” held during a general session in front of all attendees. During the fast-forward, poster presenters can speak briefly about the main idea behind their poster. Finally, poster abstracts will appear in the published conference proceedings.

With all of these events, presenting a poster should give participants a context to more easily meet other interested conference-goers, and will get them broad exposure to the community. And of course posters aren’t the only reason to participate. We also encourage folks to join in the many other IEEE VIS events throughout the week.

Ok sounds great but … how do I convince my boss to fund my trip?
The best way to change someone’s mind about funding your trip is to focus on the business value that you’ll get from attending. And we hope that you’ll find it to be a pretty easy argument to make.

For experienced visualization practitioners, attending IEEE VIS can connect you with new contacts from around the world. Leading experts from industry, government, and academic research centers all descend on IEEE VIS to talk about visualization, foster collaboration, and learn from each other. The variety of expertise that gathers from around the world makes VIS a great networking opportunity. But it isn’t just experts. VIS can also be a fantastic place to recruit fresh new talent as many of the top visualization students from around the world come to showcase their latest projects. And speaking of projects, VIS is a great place to learn about new developments in the field. Research talks showcase the latest work emerging from labs across the world. Learning from these presentations can help you keep your projects fresh and cutting edge as the field continues to evolve.

For those who are newer to the field, an added benefit is the opportunity to learn from experts from a huge range of backgrounds. Formal tutorials and workshops are a key part of the program and offer lessons or discussions on specific focused topics. Panel discussions are another great place to gain insights by listening to leading researchers or practitioners share their perspectives. The program includes several social functions and coffee breaks where you can borrow the ear of experienced visualization researchers and practitioners to gain insights into the problems you are facing in your own work.

And finally, if you are like many IEEE VIS attendees, you’ll come away inspired and overflowing with new ideas to bring back home. So tell your boss about all of the great things you’ll learn, the contacts you’ll make, the skills you’ll develop, and the energy and innovation that you’ll bring back home with you after the conference. That sounds like a winning argument to me!

Thanks Danyel, David, and Bill! I hope you’ll get fantastic submissions.


Smart Visualization Annotation

There are three research papers that have drawn my attention lately. They all deal with automatic annotation of data visualizations, that is, adding labels to a visualization automatically.

It seems to me that annotations, as an integral part of a visualization design, have received relatively little attention in comparison to other components of a visual representation (shapes, layouts, colors, etc.). A quick check of the books on my bookshelf kind of supports my hypothesis. The only exception I found is Colin Ware’s Information Visualization book, which has a whole section on “Linking Text with Graphical Elements”. This is weird because, think about it, text is the most powerful means we have to bridge the semantic gap between the visual representation and its interpretation. With text we can clarify, explain, give meaning, etc.

Smart annotation is an interesting area of research because not only can it reduce the burden of manually annotating a visualization, but it can also reveal interesting patterns and trends we might not know about or that, worse, may go unnoticed. Here are the three papers.

Paper #1: “Just-in-time annotation of clusters, outliers, and trends in point-based data visualizations.” Kandogan, Eser. IEEE Conference on Visual Analytics Science and Technology (VAST), 2012.


This annotation technique works on point-based visualizations. The system detects trends automatically by analyzing the visual information displayed on the screen (that is, patterns are detected in the visual space, not the data space) and tries to find a description for the observed trends. Once a description is found, the system overlays labels that convey this information. So, for instance, in the image above the algorithm finds visual clusters (groupings) and annotates them with the data values that best explain the trend (data dimensions and values that have a distinctive distribution in the cluster). The paper does not focus only on clusters; it provides techniques to annotate trends and outliers as well, and it describes the whole framework in a way that makes it easy to imagine how it can be extended to other domains and visualizations.
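To make the general idea concrete, here is a rough Python sketch of the cluster-annotation step (my own simplification, not the algorithm from the paper): find clusters in screen space, then describe each cluster with the data dimension that deviates most from the overall distribution.

```python
import numpy as np
from sklearn.cluster import KMeans

def annotate_clusters(screen_xy, data, dim_names, n_clusters=3):
    """Find clusters in the 2D screen positions, then label each cluster with
    the data dimension whose mean deviates most from the overall mean
    (measured in standard-deviation units)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(screen_xy)
    mu, sigma = data.mean(axis=0), data.std(axis=0) + 1e-9
    notes = {}
    for c in range(n_clusters):
        rows = data[labels == c]
        deviation = np.abs(rows.mean(axis=0) - mu) / sigma
        d = int(np.argmax(deviation))
        notes[c] = f"{dim_names[d]} ~ {rows.mean(axis=0)[d]:.2f}"
    return labels, notes

# Made-up example: 300 points in three spatial blobs, four data dimensions;
# the second blob (rows 100-199) is boosted on dimension "d2".
rng = np.random.default_rng(0)
xy = np.vstack([rng.normal(c, 0.3, (100, 2)) for c in ((0, 0), (3, 0), (0, 3))])
table = rng.normal(size=(300, 4))
table[100:200, 2] += 5
print(annotate_clusters(xy, table, ["d0", "d1", "d2", "d3"])[1])
```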

Paper #2: “Contextifier: Automatic Generation of Annotated Stock Visualizations.” Hullman, Jessica, Nicholas Diakopoulos, and Eytan Adar. ACM Conference on Human Factors in Computing Systems (CHI), May 2013.

Contextifier automatically annotates stock market timelines (like the one shown above) by discovering salient trends in the chart (peaks and valleys) and corresponding news that might be relevant to explain them. The system is based on an input article and a news corpus. The input article is used as a query to find relevant news in the corpus and to match them against salient features in the graph. Articles and trends are matched to decide which time points should be annotated. These points are then annotated with the most relevant news in the corresponding time frame. The paper also contains a very interesting analysis of how visualization designers annotate their visualizations. The outcome of this analysis is used to inform the design of the annotation engine.
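Here is a deliberately simplified sketch of the underlying idea (not Contextifier’s actual pipeline, and with made-up data): detect salient points in the price series, then attach the closest headline within a small time window.

```python
import numpy as np

def salient_days(prices, window=5, top_k=3):
    """Score each day by its deviation from a centered moving average
    and keep the top-k days as candidates for annotation."""
    prices = np.asarray(prices, dtype=float)
    smooth = np.convolve(prices, np.ones(window) / window, mode="same")
    return np.argsort(np.abs(prices - smooth))[-top_k:]

def annotate(prices, dated_headlines, max_gap=3):
    """dated_headlines: {day_index: headline}. Attach the closest headline
    (within max_gap days) to each salient day."""
    notes = {}
    for day in salient_days(prices):
        candidates = [(abs(d - day), h) for d, h in dated_headlines.items()
                      if abs(d - day) <= max_gap]
        if candidates:
            notes[int(day)] = min(candidates)[1]
    return notes

# Made-up example: a price spike at day 30 and a headline two days earlier.
prices = list(np.full(60, 100.0))
prices[30] = 140.0
print(annotate(prices, {28: "Company X announces surprise earnings"}))
```

The real system does the matching with text relevance (the input article as a query against a news corpus) rather than by date proximity alone, but the skeleton is the same: salience detection plus evidence retrieval.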

Paper #3: “Graphical Overlays: Using Layered Elements to Aid Chart Reading.” Kong, Nicholas, and Maneesh Agrawala. IEEE Transactions on Visualization and Computer Graphics 18(12) (2012): 2631–2638. [Sorry, no free access to this one.]

Graphical Overlays actually does much more than annotate a chart with text; it’s a whole system for adding information on top of existing charts to aid reading. So, for instance, besides adding notes to a chart to identify potentially interesting trends, it also adds grids, highlights elements of a specific type (e.g., one set of bars in a bar chart), and adds summary statistics (like an average line in a time chart). The system works entirely on image data, which means it does not require direct access to the original data used to create the chart. In the authors’ words: “Our approach is based on the insight that generating most of these graphical overlays only requires knowing the properties of the visual marks and axes that encode the data, but does not require access to the underlying data values. Thus, our system analyzes the chart bitmap to extract only the properties necessary to generate the desired overlay.”

These three papers present very clever mechanisms to annotate visualizations in different contexts and with different purposes. I suggest you take a look at the papers because they provide numerous interesting technical details. Beyond the technical aspects, though, I find it interesting that several researchers are independently focusing on visualization annotation. Annotation is extremely important, and I think we have not spent enough energy exploring its potential and challenges. I also think there is an educational gap we should fill, that is, how do we teach our students when, how, and why a visualization should be annotated?

I am curious to hear from you what you think. What do you think about the papers I presented? And what do you think about annotation in general? How do you deal with annotations yourself?

Take care.

Visualization Papers at CHI 2013

I just came back from CHI 2013, the premier conference on human-computer interaction (Paris was chilly and expensive. Yet, dramatically beautiful, as always). Here is a selection of interesting visualization papers I picked up from the program.

Using fNIRS Brain Sensing to Evaluate Information Visualization Interfaces. Interesting study from Tufts University on the feasibility of using brain scanning techniques to study mental workload in visualization.

Weighted Graph Comparison Techniques for Brain Connectivity Analysis. Excellent study on the ever-lasting battle between node-link graphs and matrices (to visualize weighted graphs in this case). Matrices win over node-link diagrams in almost every task. Very good example of exploration and evaluation of a specific design space. A lot to learn here.

The Challenges of Specifying Intervals and Absences in Temporal Queries: A Graphical Language Approach. Visual and interaction design study to allow end-users (doctors in this case) to specify complex temporal queries without writing a single line of code. It makes me think about how visualization can and should be used not only as an output device but also as a way to facilitate inputting data into a system.

Evaluating the Efficiency of Physical Visualizations. User study comparing 2D and 3D bar charts on a standard computer display to physical bar charts fabricated with a laser cutter. Physical 3D is more effective than on-screen 3D. Why? See the paper. (Side note: we featured this work in a Data Stories episode on Data Sculptures)

Contextifier: Automatic Generation of Annotated Stock Visualizations. Automatic annotation of stock market line graphs by extracting text from news articles. Annotation has been neglected for a while in vis (maybe because text is not considered part of the visualization?) but I think it’s super important. This is a great first step in the right direction.

Motif Simplification: Improving Network Visualization Readability with Fan, Connector, and Clique Glyphs. We all know how easily graphs can turn into hairballs. Motif simplification is a smart way to reduce the complexity of graphs by aggregating nodes into predefined glyphs.

Evaluation of Alternative Glyph Designs for Time Series Data in a Small Multiple Setting. User study on the comparison of icon-sized time-series visualizations. Two aspects are evaluated: layout (circular, linear) and value coding (length, color intensity). The study leads to a number of design guidelines (and hey … I am one of the co-authors here :))

I hope you’ll enjoy reading these papers. There is a lot of food for thought here. Comments, requests, criticism: always welcome.

Take care.