Author Archives: Enrico Bertini

Telling Stories Or Solving Problems? Towards “Problem-Solving” Visualization

Yesterday I stumbled upon a recently published and excellent visual analytics case study: the Circle Line Rogue Train Case. The article describes how data scientists at GovTech’s Data Science Division in Singapore used visualization to discover the origin of recurring and mysterious disruptions of Singapore’s MRT Circle Line.

This study represents my ideal case for data visualization: the visual exploration of a given data set to solve an important problem somebody has.

Why is it so rare to read about problem-solving visualization?

Besides deeply enjoying reading the story and the data analysis behind it, this study got me thinking: why is it so rare to read about this kind of project? Is it because not many people actually carry out this kind of work? Or maybe because nobody dares to write about it? Or maybe these projects are out there but we never notice?

Beyond hedonistic and narrative visualization.

When I look at the data visualization landscape I recognize that the very large majority of projects we see around fall into two main categories:

  • Hedonistic Visualization: showing how cool something is.
  • Narrative Visualization: supporting a narrative (often journalistic, sometimes scientific).

Before I move on, let me clarify I have nothing against these two categories. I am myself as “guilty” as anyone else of being a purveyor of these two kinds of visualization. They are at the root of the enormous success visualization has experienced in recent years.

Yet, when I look at examples of “problem-solving visualization” I can’t help thinking: “Yes! That’s what really matters to me!”, “That’s what I want to see people doing more”, and “That’s what I want to teach my students how to do”.

The limits of Data Journalism.

As much as I love data journalism and I consider it a cornerstone of an era in which possibly people can use reason to discuss issues affecting our society, I do believe it is limited. The biggest limit of data journalism resides in its “storytelling bent”. Journalists want to tell you a story and they need to entertain you. This, in turn, poses some important problems. First, it’s hard to analyze data without thinking of the story one has in mind. Second, not all important problems out there are journalistic problems. Many problems are practical issues people have and that can hopefully be solved by analyzing some data. We don’t see much of that but virtually every organization needs it.

The limits of Scientific Communication.

It may seem weird at first sight, but I do believe scientists suffer from the same problem journalists have. Probably even more: they want to show you their scientific story. I know many will read this as a heresy but it’s pretty much a matter of fact. This becomes even clearer when one considers the cornerstone of scientific endeavors: “hypothesis testing”. Scientists don’t have a neutral stance: they want to demonstrate the hypothesis they generated before carrying out the science, and they have a strong incentive to convince you they are right. But I am digressing …

Data Science as “Problem-Solving” Visualization.

With the advent of Data Science it has become imperative for us to figure out how to extract actionable information out of data. I must confess I am concerned with equating Data Science with Machine Learning and automation. Not because I do not believe they are important or worth pursuing, but because an excessive focus on automation may prevent us from working more on the important skill of augmenting humans in ways that can help them reason better on important issues using data. This is a crucial skill we have to cherish and develop further. I really hope Data Science programs throughout the country will emphasize this aspect more. As the “Rogue Train” case demonstrates, there are cases in the world where solutions really need to come from a human taking care of the problem.

Towards a better pedagogy of “thinking with data”.

I want to conclude by saying that I truly believe we, as a community, have to develop the right curricula and infrastructure to teach these crucial skills to aspiring data scientists. It may be less fancy than learning how to implement the latest deep learning technology but not less important.

I, for one, am trying to add a little piece to this puzzle. Next semester I will be teaching a new course at NYU called “Data Sensemaking”. My goal there is to teach students how to carry out projects like the “Rogue Train” case.

I have to confess I don’t know how to teach it. It’s going to be a big challenge! In any case I will be reporting my experience here as I make my way through the weeds.

And you?

What do you think? Am I making sense here? Did I miss anything important? I’d be happy to hear …

Quantifying and Visualizing “Deep Work” (reposted)

A few days ago I posted my first article on Medium on “Quantifying and Visualizing “Deep Work”“. This is a little personal visualization project I developed over the holidays.

I have been collecting personal data for the whole year on what I call “deep work sessions”: times when I deliberately decide to work with maximum focus and no interruptions or distractions on something.

At the end of the year, I decided to take a look at the data and build the story of my “deep work” in 2016. In the article you’ll also find some reflections on the process and what I have learned by doing it.

I hope you’ll enjoy it! Feel free to comment here or on Medium if you wish.

p.s. My first experience with Medium was so good that I am considering switching entirely. I still have to figure out what the advantages and disadvantages may be but I for sure loved the writing experience and the feedback I received.

InfoVis Course Diary: The Course Recap Exercise

On the last day of my course this year I decided to try a new experiment: the “course recap exercise”. I asked each student (individually) to create a new Google Doc in our class folder and to answer questions about what they had learned during the course.

It occurred to me that if I have done my job right, students should be able to remember the most important ideas and principles presented during the course.

I also wanted the students to have an opportunity to actively reflect on what knowledge and skills the course provided to them, with the hope that in so doing they would cement them even further.

The exercise turned out to be incredibly instructive, both for me and for my students.

Here is what I asked precisely:

  • Question 1: What are the top ten most important concepts/ideas you have learned in this course? I asked the students to avoid talking with their peers or looking at our study material, and just to try to recall concepts from memory.
  • Question 2: What is the most important idea/concept you have learned about each of the following topics (focus on 1 only): Data Abstraction, Fundamental Charts, Visual Encoding, Color and Other Channels, Visualization Guidelines, Interaction and Multiple Views, Spatial Data. In this second step I asked the students to recall one specific idea or concept for each of the major modules we worked on (no, we did not cover networks and trees and the world did not collapse). This second step ensured they would try to recall at least one concept from each of the main topics we covered.
  • Question 3: What is the most important lesson you have learned in each of these types of activity: Projects, Data Analysis and Presentation, Chart Decomposition, Vis Design Workshops. In this final question I aimed at facilitating mental processing of lessons learned while doing practical work.

The results of this exercise were way better than I had imagined. I was really surprised by how many concepts, ideas and principles the students recalled. At the end of each question I also spent considerable time probing the students with more detailed questions, trying to understand whether they had understood the concepts only superficially, parroting the things that I kept repeating throughout the course, or had internalized them in ways that allow them to reason productively. To my surprise, most of the probing went pretty well and confirmed to me that understanding visualization concepts is not particularly hard.

On a negative note, however, all this knowledge does not translate into being particularly proficient in developing effective visualizations. As I will explain in future posts I did not see huge improvements in the way students designed and developed their projects. I keep seeing a big gap between visualization theory and practice.

On the value of nongraded assessments.

In any case, this little exercise was another opportunity for me to test a concept I started experimenting with this year: the idea that assessment and grading do not need to go together and that actually coupling them together can also be detrimental.

All students seemed to be pretty relaxed in answering the questions I posed and discussing their answers allowed us to discuss many details in much more depth.

I don’t know why it occurred to me to assign this exercise only at the end of the course but I believe the same structure can be used at regular intervals during the course to better understand what students are learning and how to fix potential gaps.

InfoVis Course Diary: Developing Visualization Design Workshops

During the last four/five weeks of the course I have been assigning visualization design exercises in class. The main idea is to assign practical design problems to students to solve in class during a workshop of about 2.5 hours. Here is a description of how I am organizing the workshops and what I have learned so far.


I decided to create and assign design exercises in class because in past years I have always been frustrated with how little students learned by listening to my lectures, and struck by how quickly they learned once we worked together on the design problems they had to solve for their group project. The philosophy behind the workshops is therefore to bring more of this experience into class, together with the advantage of having the whole class working on the same problem in a much more structured way (in group projects each team solves a different problem independently). Another advantage of workshops is that when I give feedback to one group of students all the other students can listen and relate what I am saying to their own solution.

Preparation (Reverse Engineering Vis Projects)

To prepare these exercises I decided to use the following strategy: reverse engineer existing visualization projects I like. I start from an existing project and I effectively go from the solution back to the original problem. Inspiration can come from many sources: papers published at IEEE VIS or ACM CHI; projects developed by some established visualization designer or newspaper; class projects developed during past editions of the course.

(Note: I recently discovered Shiqing He and Eytan Adar use the same strategy in their beautiful and admittedly much more advanced VizItCards method)

How do I do that? I focus on the following elements:

  • Problem Statement: What is the original problem they wanted to solve? Who has this problem? Why is it interesting and important?
  • Data Set: What data set did they use? Is the data set available? If not, can I use a similar one?
  • Questions: What are the driving questions they want to answer by looking at the visualization?


Once I get in class, I typically follow this sequence:

  1. I ask my students to form groups (basically the same every week).
  2. I read the whole text of the exercise and make sure everything is clear to everyone.
  3. I ask students to read everything on their own.
  4. I ask students to create individual solutions first.
  5. I ask students to discuss their solutions and create a group solution.
  6. Students put their solutions in a shared Google Doc, one for each team (I don’t care about possible cheating).
  7. I create a set of slides (on the fly) with students’ solutions and comment on them in front of the class.

What I noticed during these last few weeks is that there are many implementation details one needs to work on to tune the execution of the workshop.

  • Group and individual work. I still have to find the right balance between individual and group work. But after some experimentation I believe both are highly needed.
  • Material and techniques for creating mockups. I am not particularly happy with the unstructured way I let students create their mockups. Some are very good, some are very bad. In the future I want to give more precise instructions and unify the way mockups are developed.
  • Using whiteboard and markers to create mockups. I discovered, almost by chance, that using whiteboards is a much better way for groups to generate and discuss mockups. If you can afford having multiple whiteboards or, even better, you have the luxury of having “whiteboard walls” in your classroom you should try this out. The biggest difference is that with whiteboards everyone can see what is happening and participation and feedback happen much more naturally.
  • Deriving patterns and principles from the solutions. Ideally, at the end of each workshop we should be able to derive general principles students can apply to other cases, beyond the specifics of the exercise. I truly believe this is an important pedagogical step. I started collecting some of these ideas and patterns at the end of each workshop but I’d like to find a better and more systematic way to do it in the future. Ideally, these patterns and principles may be reused in future workshops as the material develops further.

Sharing workshop material

I plan to share all of the exercises I created for my course later on in 2017 and make it available for everyone to use. I just need to give it a more decent shape. I’d be more than happy to develop these exercises further together. Just stay tuned!

Developing a “Data Sensemaking” Course

As we approach the end of the Fall semester I have started thinking about the new course I’ll have to teach in Spring. I decided to design and give a new course I am really excited about. Its name is “Data Sensemaking” and I will be teaching my students how to derive knowledge out of data using interactive visualization and data processing tools. I am writing this blog post to share some of the ideas I am “brewing” and hopefully to get some feedback from you to see if you can help me perfect my plans.

What is “Data Sensemaking”?

The main purpose of the course is to teach students how to use a variety of exploratory data analysis methods to extract information and hypotheses from data. Some people call this Exploratory Data Analysis (EDA), but I preferred to use Data Sensemaking because EDA carries somewhat heavy baggage from years-old statistical debates. I also like the name “data sensemaking” because it refers more to the outcome than the process.

Why a course on “Data Sensemaking?”

First, because after teaching Information Visualization for a number of years I realized that students lack the very basic knowledge needed to reason with data and this is what matters the most today. It does not matter how cool a given visualization tool or solution is, ultimately we need people to extract valuable and possibly actionable information out of data.

Second, because people are fixated on complex statistical models and machine learning (which are mostly applied blindly to data problems) when in fact what we need to perfect is our ability to reason with data. There is no ready-made recipe for this and because of that I find teaching this topic fascinating and enriching. Our brain and rational thinking are the biggest assets we have and I am appalled at how little information and how few training programs exist to develop our ability to reason effectively with data.

Third, because I want to learn more myself and the best way to learn I know is to teach. While there are some skills in data sensemaking that I have been honing for several years, there are others I feel I need to perfect and develop much much further.

What will I teach in the course?

This is still work in progress but I have a tentative set of topics in mind I’d like to share with you to see what you think about it. Here is my current set of topics:

  1. Defining (worthwhile) data problems. When is a data problem a good problem? How does one go about defining its requirements and constraints?
  2. Asking good and effective questions. How can one systematically transform a foggy data problem into a set of well defined analytical questions? What is a good question?
  3. Finding, generating, and manipulating data. How does one find or generate data useful towards solving the stated problem? What kind of data manipulation and integration are needed? How does one know if the data are good enough?
  4. Exploring data to generate answers and questions. How does one explore data with interactive data visualizations to investigate the stated questions? How does one deal with the new questions/ideas inevitably arising during data analysis?
  5. Biases, lies and data malpractice. How does one avoid being fooled by data and visualization? What are the major traps out there? What is the best mindset for “data investigators”?
  6. Creating effective data narratives. Once the analysis is ready, how do you organize and re-design the results to communicate them in a way that people are willing to listen/read and actually understand your message?

Major doubts/hurdles.

When looking at the list above there are a few elements I am still uncertain about:

  • What makes a data problem a good problem? How do I teach students what a good problem is? We are so much used to thinking about solutions and so little about problems. But, in the end generating worthwhile problems is the most important thing to do. Do you know of any resources on how to pursue and define good problems?
  • What statistical fallacies, biases, etc., should I focus on? There is an endless list of biases and fallacies humans are prone to commit. How do I prioritize them? What is the core set of concepts and examples I should focus on? How can I uniformly cover problems with thinking, statistical and visualization fallacies?
  • How do I teach the narrative side of visualization? I know very well how to teach which chart is “right” for a given communication task. But this is only a fraction of what one needs to learn to communicate effectively. The narrative style, the sequence and the interrelation between text and graphics is also very important. How do I teach this? Is there any book or resource I can use to cover these aspects?

What else?

What else should I teach? If you were one of my students, what else would you like me to teach you? Note that I omitted the technical side of it but I will be teaching some practical tools to use. I plan to focus mostly on Rstat and Tableau. I also plan to have each student set up a blog in which they have to create a number of data sensemaking mini-projects.

Let me know what you think!!!

What do we talk about when we talk about “Data Exploration”?

There is an old “adage” in the InfoVis / Visual Analytics community I have heard a zillion times: “visualization is needed/useful when people don’t have a specific question in mind”. For many years I took this as gospel. Then, over time, as I have grown more experienced, I have started questioning the whole concept: why would someone look at a given data set if he or she has no specific goal or question in mind? It does not make sense.

This is an aspect of visualization that has puzzled me for a long time. An interesting conundrum that I believe is still largely unsolved. One of those things many say, but nobody really seems to have grasped in full depth.

Here is my humble attempt at putting some order into this matter. Let’s start with definitions:

The definition introduces a couple of very important features: familiarity (“through an unfamiliar area“) and learning (“in order to learn about it“). If we take this definition as main guidance, we can say that data visualization is particularly helpful when we use it to look into some unfamiliar data to learn more about something.

I suspect there are (at least) three main situations in which this can happen.

  1. Need to familiarize with a new data set (“what does it look like?”). Anyone who dabbles with data a bit goes through this: you receive or find a new data set and the first thing you need to do is to figure out what information it contains. How many fields? What type of fields? What is their meaning? Are there any missing values? Is there anything I don’t actually understand? How are the values distributed? Is there any temporal or geographical information? Are we actually in the presence of some kind of network or relation structure? Etc. One crucial, and often overlooked, aspect of this activity is “data semantics”. I personally find that understanding the meaning of the various fields and the values they contain is such a crucial and hard activity at the beginning. An activity that often requires many many back-and-forth discussions and clarifications with domain experts and data collectors.
  2. Hunting for “something” interesting (“is there anything interesting here?”). I suspect this is what people mostly really mean when they talk about “data exploration”: the feeling that something interesting may be hidden there and that some exploratory work is needed to figure it out. But … When does this actually happen? What kind of real-world activities are characterized by this desire to find “something”? I am not sure I have an all-encompassing answer to that, but I am familiar with at least two examples: data journalism and quantified self. In data journalism it is very common to first get your hands on some potentially “juicy” data set and then try to figure out what interesting stories may hide there (Panama Papers, Clinton’s Emails, etc.). I have observed this in our collaboration with ProPublica when hunting for stories about how people review doctors on Yelp. In quantified self you often want to look at your data to see if you can detect anything unexpected. I have experienced the same when looking at personal data I have collected about my deep work habits (or lack thereof). Sometimes we know there must be something interesting in a given data set, and visualization guides us in the formulation of unexpressed questions. The interesting aspect of this activity is that the outcome is often more (and better) questions, not answers.
  3. Going off on a tangent (“oh … this may be interesting too!”). There is one last, subtler, kind of data exploration. You start with a specific question in mind but, as you go about it, you find something interesting that triggers an additional question you had not anticipated. This is the power of visual data analysis: it forces you to notice something new and you have to follow the path. This happens to me all the time (and I hope it’s not just a sign of my ADD). Some of these are useless diversions. Some of them actually lead to some pretty unique gems!

These three modalities can of course overlap a lot. I am also sure there are other situations we can describe as data exploration which I am not covering here (in case you have some suggestions please let me know!).
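To make the first situation a bit more concrete, here is a minimal sketch of the kind of “first look” questions listed above (how many fields, what type, any missing values). It is only an illustration under stated assumptions: the function names `profile_table` and `is_number` are hypothetical, and the three sample rows are made up for this example.

```python
import csv
import io
from collections import Counter


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_table(rows):
    """Summarize each column: rough type, missing count, distinct values."""
    summary = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        numeric = bool(present) and all(is_number(v) for v in present)
        summary[col] = {
            "type": "numeric" if numeric else "text",
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return summary


# Made-up toy data, only to exercise the sketch (not from any real data set).
SAMPLE = """city,category,amount
Springfield,A,4
Shelbyville,A,
Springfield,B,7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
profile = profile_table(rows)
for column, stats in profile.items():
    print(column, stats)
```

A summary like this answers only the mechanical questions. The “data semantics” part (what the fields actually mean) still requires talking to the people who collected the data.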

I want to conclude by saying that this is an incredibly under-explored area of data visualization. More advances are needed in at least three directions.

  • First, we need to understand data exploration much better as a process and, if possible, create models able to describe it in useful abstract terms. In visualization research we often refer to Card and Pirolli’s “Sensemaking Loop” to describe this kind of open-ended and incremental activity but for some reason every time I try to use it, it does not seem to describe what I actually observe in practice (this deserves its own post).
  • Second, we need to develop more methods, techniques and tools to support interactive data exploration. I bet there are lots of “latent needs” waiting to be discovered out there. This is another area where I believe we, visualization researchers, have made surprisingly little progress. We have built a lot of narrow solutions that work for 3-5 people but very few general purpose methods and techniques. We need more of that (this also deserves its own post)!
  • Third, we need to find ways to teach exploratory data analysis systematically to others in ways that make the process as effective as possible. I am appalled at how little guidance and material there is out there on teaching people how to do the actual analysis work. Statisticians are fixated with confirmatory analysis and regard exploration as a second-class citizen. Visualization researchers are too busy building stuff and have done too little to teach others how to do the actual ground work. This is a problem we need to solve. It’s for this reason that next semester I will be teaching a new course with this specific purpose. Stay tuned.

That’s all I had to say about Data Exploration.

And you? What is your take? What is data exploration for you? And how can we improve it?

Take care.

11 (Papers + Talks) Highlights from IEEE VIS’16


Hey, it took me a while to create this list! But better late than never. Here is my personal list of 11 highlights from the IEEE VIS’16 Conference.

If you did not have a chance to attend the conference you can start from here and then look into the following links:


Surprise! Bayesian Weighting for De-Biasing Thematic Maps.
Michael Correll, Jeffrey Heer.

Did you ever stumble into one of those choropleth maps in which the distribution of a given quantity is shown (say, the number of cars from a given manufacturer) but the only signal you can see is actually population density? This is the kind of problem Surprise! addresses. It deals with situations in which the quantity one wants to depict is confounded by another variable. To solve this problem Surprise! uses an underlying Bayesian model of how the quantity should be distributed and visualizes deviations from the model rather than the quantity itself (hence the name Surprise!).

I think this is a brilliant idea which addresses a super common problem. I have seen people stumble into this problem countless times and I am glad we finally have a paper that explains the phenomenon and proposes a solution. The only issue is that visualizing surprise is not as natural as visualizing the actual quantity, which is normally what people would expect. One open challenge then is how to communicate both values at the same time.
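To give a feel for the general idea of plotting deviations instead of raw counts, here is a deliberately simplified sketch. It is not the Bayesian weighting described in the paper: it assumes a single naive model in which counts are proportional to population, and it reports the relative deviation from that expectation. The function name, the region labels and the numbers are all made up for illustration.

```python
def deviation_from_expected(observed, population):
    """Relative deviation of each region's count from a population-proportional expectation.

    Assumption (mine, not the paper's): the only model is "counts scale with
    population". The paper combines several models in a Bayesian way instead.
    """
    total_observed = sum(observed.values())
    total_population = sum(population.values())
    deviations = {}
    for region, count in observed.items():
        expected = total_observed * population[region] / total_population
        deviations[region] = (count - expected) / expected
    return deviations


# Made-up toy numbers: region A dominates the raw counts only because it is big.
observed = {"A": 900, "B": 100, "C": 200}
population = {"A": 9000, "B": 1000, "C": 1000}

deviations = deviation_from_expected(observed, population)
largest_raw = max(observed, key=observed.get)
largest_deviation = max(deviations, key=deviations.get)
print("largest raw count:", largest_raw)
print("largest deviation:", largest_deviation)
```

In this toy case a map of raw counts would highlight region A, while a map of deviations would highlight region C, which is the point of showing the deviation rather than the quantity.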

Vega-Lite: A Grammar of Interactive Graphics.
Arvind Satyanarayan, Dominik Moritz, Kanit “Ham” Wongsuphasawat, Jeffrey Heer.

The IDL team has done over the years an astounding job at developing an ecosystem of frameworks and tools to make the development of advanced visualizations easier and faster. Vega-Lite builds on top of Vega, which they presented last year, and proposes a much simpler language and extremely powerful functions to generate interactive graphics (with linked views, selections, filters, etc.). Arvind and Dominik gave a live demo and I have to say I am really impressed. While most existing frameworks focus on the representation part of visualization, this one focuses on interaction and as such it covers a really big gap. I am curious to see what people will manage to build using Vega-Lite. If you built some interactive visualizations in the past you certainly know that the interaction part is by far the hardest and messiest one. Vega-Lite seems to make it much simpler and more straightforward than it used to be. I am looking forward to trying it out!

PROACT: Iterative Design of a Patient-Centered Visualization for Effective Prostate Cancer Health Risk Communication.
Anzu Hakone, Lane Harrison, Alvitta Ottley, Nathan Winters, Caitlin Guthiel, Paul KJ Han, Remco Chang.

PROACT is a simple visualization dashboard that helps patients with prostate cancer understand their disease and make informed decisions about choosing between a conservative solution or surgery. The paper does a great job at describing the context and the challenges associated with such a delicate kind of situation and how visualization systems can be used by doctors and patients to enhance communication.

I consider this paper super relevant. While the images won’t impress you with fancy colorful views and interactions, the system has been shown to be really effective in a very important and critical setting. It also raises awareness about issues we rarely discuss in visualization; especially how to deal with emotions and how to design systems that inform while being careful with the impact such knowledge may have on the viewers.

TextTile: An Interactive Visualization Tool for Seamless Exploratory Analysis of Structured Data and Unstructured Text.
Cristian Felix, Anshul Pandey, Enrico Bertini.

This is the latest product coming out of my lab. I plan to write a separate blog post on it later on. TextTile stems from multiple interactions we had with journalists and data analysts who need to look into data sets containing textual data together with tabular data (e.g., product reviews and surveys). In TextTile we propose a model that describes systematically how one can interactively query data starting from text and reflecting the results on the data table and vice-versa. The tool realizes this model in an interactive visual user interface with a mechanism similar to what is found in Tableau: the user creates queries and plots by dragging data fields to a predefined set of operations. I suggest you try it on your own! You can find a demo here:

Evaluating the Impact of Binning 2D Scalar Fields.
Lace Padilla, P. Samuel Quinan, Miriah Meyer, and Sarah H. Creem-Regehr.

I chose to include this paper because I found its message extremely inspiring. In visualization research we often cite a principle (proposed by Jock Mackinlay) called the “expressiveness principle”. The principle states that “a visual encoding should express all of the relationships in the data, and only the relationships in the data”. This paper shows that this principle may actually not always hold. The paper describes experiments in which performance improves when a continuous value is presented with discrete color steps rather than a continuous scale; a solution that breaks the expressiveness principle. This may seem a minor detail but I believe it demonstrates a much bigger idea: there is lots of conventional wisdom ready to be debunked and it is up to us to hunt for this kind of research. Every single scientific endeavor is a loop of construction and destruction of past theories and ideas. This paper is a great example of the destruction part of the cycle. We need more papers like this one!
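For readers who have never built one, here is a minimal sketch of what “discrete color steps” means in practice: a continuous value is snapped to the center of one of a few equal-width bins before being turned into a color. This is my own illustration, not the encoding used in the paper; the function names and the gray ramp are assumptions chosen to keep the example short.

```python
def bin_value(value, low, high, steps):
    """Snap a value in [low, high] to the center of one of `steps` equal-width bins.

    Returns a number in (0, 1) that can be fed to any color ramp.
    """
    if steps < 1 or high <= low:
        raise ValueError("need steps >= 1 and high > low")
    fraction = (value - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range values
    index = min(int(fraction * steps), steps - 1)  # top edge goes in last bin
    return (index + 0.5) / steps


def gray_hex(level):
    """Turn a level in [0, 1] into a gray hex color (a stand-in color ramp)."""
    channel = round(level * 255)
    return "#{0:02x}{0:02x}{0:02x}".format(channel)


# With 5 steps, nearby values collapse onto the same color.
for value in (0.02, 0.19, 0.21, 0.5, 1.0):
    level = bin_value(value, 0.0, 1.0, steps=5)
    print(value, "->", gray_hex(level))
```

The binned version deliberately throws away small differences within a step, which is exactly why it breaks the expressiveness principle as stated above.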

VizItCards: A Card-Based Toolkit for Infovis Design Education.
Shiqing He and Eytan Adar.

What a lovely lovely project! If you have ever tried to teach visualization you know how hard it is. Students just don’t get it if you give lectures and lots of theory. Visualization needs to be learned by doing. But organizing a course on doing in a systematic way is hard. Damn hard! Shiqing and Eytan have done an amazing job at making this process systematic and easy to adopt. They developed a toolkit and a set of cards instructors can use to guide students during a series of design workshops. One aspect I like a lot, other than the cards idea, is that many exercises have been ideated starting from an existing data visualization project and “retrofitted” to their original “amorphous” status of having a bunch of data and a vague goal. This is what the students are shown at the beginning and at the end of the process they can compare their results with the results developed in the original project. You can find the toolkit here: I am planning to adopt some of it myself next time I teach my course (too late for this semester).

Colorgorical: Creating discriminable and preferable color palettes for information visualization.
Connor C. Gramazio, David H. Laidlaw, Karen B. Schloss.

Creating categorical color palettes is a hard task and if you want to do it manually it’s even harder. Colorgorical is a new color selection tool that enables you to build new categorical color palettes using a lot of useful and interesting parameters, including: perceptual difference, name difference, pair preference, and name uniqueness. An internal algorithm tries to optimize all the desired parameters and generates a new color palette for you. You can also add starting colors to make sure some colors you want to have are actually present in the final color palette. I strongly suggest you play with it! They have a nice web site explaining all the parameters and a simple interface to generate new palettes.


An Empire Built On Sand: Reexamining What We Think We Know About Visualization.
Robert Kosara.

Robert’s talk was more of a performance than a talk. I really really enjoyed it. His talk at BELIV was all focused on the idea that we in vis regard some ideas as truth and keep repeating them even if evidence for them is actually weak or nonexistent. Robert kept repeating, in a wonderfully coordinated sequence, “how do we know that?” … “how do we know that?” … “how do we know that?”. I loved it. Too bad the talk was not recorded. But you can find the accompanying paper here. Kudos to Robert for assuming the role of contrarian at vis. We really need people like him who do not hold back, speak with candor, and are ready to yell “the emperor has no clothes”.

We Should Never Stop BELIVing: Reflections on 10 Years of Workshops on the Esoteric Art of Evaluating Information Visualization.
Enrico Bertini.

Here is another one from yours truly. I started the BELIV workshop on evaluation in vis in 2006 with Giuseppe Santucci (my PhD advisor) and Catherine Plaisant, and the organizers kindly asked me to give a keynote for the 10-year anniversary. If you click on the URL above you can watch the entire talk. I tried to be funny and also to give a sense of how much progress we have made and what may come next. Evaluation in visualization is a continuously evolving endeavor and there is much to learn and perfect. The vis community has been receptive to new ideas on how to conduct empirical research and I predict we will see a lot of innovation in coming years. Let me know what you think if you watch the video!

Capstone Talk: The three laws of communication.
Jean-Luc Doumont.

Wow! I had absolutely no idea who Jean-Luc was before I entered the room and started listening to his talk. This is by far one of the best capstone talks I have ever attended at VIS, if not the best. Jean-Luc gave a talk on how to convey messages effectively and organized it around a number of principles he developed through his years of training people on effective communication. This guy knows what he is talking about. His body language, the way he expresses his thoughts, the quality and density of information in what he says, the style of his slides, etc., everything is great. His work can inform any professional who needs to communicate information better, be it visual or verbal. He has a fantastic book which looks very much like Tufte’s but more on general communication. If you have never heard of him take a look at his work, he is amazing … and super fun!

Communicating Methods, Results, and Intentions in Empirical Research.
Jessica Hullman.

Jessica is doing some of the most interesting work in visualization. Her blend of core statistical concepts and visualization is very much needed and one of the most interesting recent trends in vis: how to use vis to communicate statistics better and, at the same time, how to use statistics to do better vis research. In her short talk Jessica raised a number of important points on how we communicate research, not only to others but also to ourselves, and how we can introduce practices that may reduce the chances we are fooling ourselves. The world of experimental research and statistics is changing very fast and we are witnessing a wave of great self-criticism and reform. While this is true for science in general, the world of visualization research is also very receptive to what is happening and Jessica is one of the few vis people who are helping us make sense of it.

That’s all folks! I hope you’ll find these projects inspiring!

InfoVis Course Diary: Basic Charts Need to Be Learned First

In the third week of my course I introduce fundamental charts. These are charts that are super common and most people are familiar with: bar charts, histograms, line charts, scatter plots, heat maps, etc. This is a major departure from the textbook I use, which introduces methods to visualize tabular data only in later chapters.

Why do I start with basic charts?

It’s pedagogically better to start with basic rather than advanced charts. I truly believe one must first learn how to use basic charts properly before moving on to other more exotic territories. Within their constrained space, there is a lot to learn and there are infinite variations and tweaks one can apply.

These charts, in their most basic format, cover all possible combinations of two attributes, thus giving students a manageable and yet powerful mental model to think about how pairs of attributes can be combined: a scatter plot is made of 2 quantitative attributes; a bar chart is made of 1 categorical and 1 quantitative attribute; a line chart is also made of 2 quantitative attributes but one is special as it represents time;  a heat map is a combination of two categorical attributes and their frequencies, etc.

These charts are also infinitely “tweakable” while retaining simplicity. For instance, what happens to these charts when you need to map an additional 3rd attribute? In scatter plots you can use color and/or size. In bar charts you can use stacked or grouped bars. In line charts you can add multiple lines.

Take this scatter plot below in which I am plotting data from the USDA food nutrients database (each dot is a food, axes are: amounts of monounsaturated and polyunsaturated fats).

Scatter plots are infinitely “tweakable”. Here we progressively encode 2, 3 and 4 attributes with position, color hue and size.

In its most basic format it encodes two quantitative attributes (the amounts of two types of fats). The next one encodes a third categorical attribute (food type) using color hue. And the final one encodes a fourth attribute (amount of water) with size. That’s a very gentle and yet solid way of introducing fundamental visualization concepts. In class I am actually cycling through many of these examples, starting from the most basic chart and asking students to accommodate more data or needs.
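
For readers who think better in code, here is a minimal sketch of the same progression in Python. It is only an illustration under my own assumptions: the rows, the attribute names, the palette and the `encode` helper are all hypothetical, and the values are made up rather than taken from the USDA database. The point is that the chart stays the same while attributes are attached to more visual channels.

```python
from dataclasses import dataclass

# Hypothetical rows: made-up values, NOT taken from the USDA database.
FOODS = [
    {"name": "food A", "group": "dairy", "mono_fat": 2.0, "poly_fat": 0.5, "water": 80.0},
    {"name": "food B", "group": "nuts", "mono_fat": 30.0, "poly_fat": 12.0, "water": 5.0},
    {"name": "food C", "group": "dairy", "mono_fat": 8.0, "poly_fat": 1.0, "water": 50.0},
    {"name": "food D", "group": "vegetables", "mono_fat": 0.1, "poly_fat": 0.2, "water": 95.0},
]

# Arbitrary hues, one per category (reused cyclically if there are more categories).
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]


@dataclass
class Mark:
    """One dot of the scatter plot, described by its visual channels."""
    x: float
    y: float
    color: str = "#555555"  # default: every dot has the same color
    size: float = 4.0       # default: every dot has the same radius


def encode(rows, x, y, hue=None, size=None):
    """Map data attributes to visual channels.

    x and y (quantitative) are required; hue (categorical) and
    size (quantitative) are optional extra channels.
    """
    categories = sorted({r[hue] for r in rows}) if hue else []
    values = [r[size] for r in rows] if size else []
    lo, hi = (min(values), max(values)) if values else (0.0, 1.0)

    marks = []
    for r in rows:
        mark = Mark(x=r[x], y=r[y])
        if hue:
            mark.color = PALETTE[categories.index(r[hue]) % len(PALETTE)]
        if size:
            # Linear rescaling of the attribute to a radius between 2 and 12.
            t = (r[size] - lo) / (hi - lo) if hi > lo else 0.5
            mark.size = 2.0 + 10.0 * t
        marks.append(mark)
    return marks


two_attributes = encode(FOODS, "mono_fat", "poly_fat")
three_attributes = encode(FOODS, "mono_fat", "poly_fat", hue="group")
four_attributes = encode(FOODS, "mono_fat", "poly_fat", hue="group", size="water")
```

The three calls at the end correspond to the three versions of the scatter plot: position only, position plus hue, and position plus hue plus size. The sketch scales the radius linearly for simplicity; whether radius or area should carry the value is a design decision of its own.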

With these basic charts one can also start introducing examples of problematic design solutions. For instance, it’s easy to talk about the “truncated axis” and the “dual axis” problems.

See, for instance, the infamous Planned Parenthood hearing charts that made the news last year.

Misleading chart in which the dual-axis method has been used to give a false impression about the data.

Another aspect is that basic charts are an amazing toolbox for the visualization designers because the very large majority of existing problems can be solved with them. It’s very rare to find a visualization problem that cannot be solved, at least in a first approximation, with these basic designs. Plus, whenever needed, you can always try to apply little modifications to conform them to your needs. I really believe there are way more interesting designs one can generate by tweaking these basic charts than trying to come up with something entirely new.

Finally, basic charts are the most familiar ones. If in your project familiarity is an important aspect you don’t want people to spend time figuring out how to decode your charts. This is an aspect that is often overlooked in visualization. When presented with a new visualization the first thing the reader/user needs to figure out is “how do I actually read this?“.

Some questions from the students …

Before I conclude I want to briefly touch upon a few recurring questions students asked in their reading response exercise (I’m paraphrasing).

Q: “Are pie charts really so bad? I like them!”

Students are always puzzled when they see that pie charts are so heavily criticized by some people. I am personally not interested at all in the pie charts debate. I am not because I do not think it’s really consequential. In any case what I tell my students is that they should never use rules blindly. So pie charts are a good example to exercise their own good judgment. When are they appropriate? When are they not? This is what matters the most to me. I am myself not a big fan of pie charts but I do not believe they are evil, and I do believe there are situations in which it may be reasonable to use them. Robert Kosara has published a good number of interesting blog posts and papers on the topic.

Q: “How do we deal with people’s subjective preference for some charts?”

That’s actually a big one. Students at the beginning of the course are still puzzled by the idea that some charts may be objectively better than others in some contexts and for some tasks. Because of that, they feel lost when they think that some people reading their charts may actually find them not exactly of their favorite “taste”.

This is too long a subject to be developed fully here. But the main thing to learn is that charts need to be effective before being “pleasurable”. Aesthetics plays a role of course but it cannot subsume effectiveness. Therefore one needs to learn what is effective and what is not and then find a way to “inject” the right aesthetic sense within these constraints.

The best data visualizations out there (and the best designers by the way) are those that are great at finding this fine balance.

But there is more to say on the topic. A very important principle, dear to user experience designers, is that good design is about giving people what they need, not necessarily what they tell you they need. It’s your responsibility as a designer and as an engineer to figure out what the best solution for your audience is; you can’t rely exclusively on what they tell you they want or need.

Q: “How do we deal with large data sets? These charts do not scale!”

Correct. Many of the basic charts do not scale to large data sets or large number of values. But this is exactly the point! By realizing what the limitations of basic charts are, one is forced to think about what the alternatives are. It’s a very useful step from the pedagogical point of view.

But even before moving to more exotic solutions, students first need to figure out how scalable basic charts really are. A very consistent trend I found in students is that they are afraid of making charts small and, because of that, they largely underestimate how scalable they really are!

As Tufte’s work on “sparklines” demonstrates, charts can be incredibly small and still convey a lot of information. So, while there are cases in which standard charts may actually not be sufficiently scalable, I consider it crucial to first show students how to make charts small. Very small.
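
To give a feel for how small a chart can get, here is a rough text-only approximation of a sparkline in Python. This is my own sketch, not Tufte’s method: it uses the eight Unicode block characters as tiny bars, so a whole series fits in the space of a single word, and the `sparkline` function name is an assumption made for this example.

```python
# Eight Unicode block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line string of tiny bars."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no variation to show: use the lowest bar throughout.
        return BARS[0] * len(values)
    span = hi - lo
    chars = []
    for v in values:
        # Rescale to 0..7 and round to the nearest bar height.
        index = int((v - lo) / span * (len(BARS) - 1) + 0.5)
        chars.append(BARS[index])
    return "".join(chars)


print(sparkline([3, 5, 9, 4, 2, 8, 8, 1, 6]))
```

Each data point costs exactly one character, so a table with hundreds of rows can carry a trend for every row. Eight height levels are obviously coarse, which is a nice starting point for discussing what is lost and what survives when a chart shrinks.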

That’s all for now. Hope this helps!

InfoVis Course Diary: Update on Class Flipping

My adventure on flipping my InfoVis class is moving on. Here is a brief update on what is happening in class and a few small lessons learned during the first four weeks of the course.

Please keep in mind this is work in progress and pretty much a brain dump. I may in the future find that some of these ideas are not as brilliant as they look right now.

Students need more time than you think to develop their exercises in class. During the first three weeks I have consistently underestimated the time it takes students to develop solutions for their exercises. My class lasts two hours and a half and it is barely enough time to develop an exercise fully. In the first two weeks I planned to have time for three different activities; now I know I have to basically focus on only one. Yesterday, I assigned a data analysis and presentation exercise (to be developed with Tableau) based on a real-world data set and the two hours and a half have been barely enough.

The exercises need to be broken down into smaller (timed) steps. It is very hard for students to mentally “digest” a complex exercise as a single unit. A much better strategy is to break the exercise down into smaller steps that lead to the final solution. A big advantage of this strategy is that at the end of each step I can comment on what the students have just done and how it connects to higher level concepts. I try to enable this generalization by providing comments and also asking a few direct questions. This seems to work really well. For instance, I broke the data analysis and presentation exercise down to: (1) familiarize with the data and its meaning; (2) generate analytical questions; (3) do data analysis in Tableau; (4) collect the results and organize them in a narrative; (5) write a final document. At each of these steps there are plenty of comments one can insert. I have also found that timing these steps and making the timing explicit and visible to the students upfront helps organize the work and makes sure that groups are mostly in sync.

Individual or group work? So far I have experimented exclusively with group work. None of the exercises I have assigned in class require students to work on something individually. My biggest fear is that group dynamics may actually play against students who are particularly shy or simply do not feel like doing the required work for that day. This is an aspect I still need to perfect. What I found so far is that simply walking around and offering help seems to keep the students alert and active. I also try to actively spot students who seem to be on the verge of disengagement and encourage them to participate more. Yet, at the same time, in a few instances I found that my intervention was not needed and that students have a very clever and natural way to alternate between solitary reflection and sharing with the group. If anything, what I am learning is to trust the process and my students much more and give them freedom to organize their work as they wish.

Am I useful/needed after all? Let’s admit it. Standing in class and walking around without uttering a word does not feel as good as doing the somewhat self-righteous work of giving a lecture. I am no longer the main deal. Students don’t look at me, they have their heads buried in their problems. Is that bad? No. It’s actually good. But I keep asking myself, as the class unfolds, what is the best way for me to be useful here? It feels like swimming in a new kind of ocean. My sense is that I still have a lot to learn. For now what I am doing is the following: (1) I make sure to be available for everyone and to devote the same amount of time to every group; (2) I check that no one is having major problems and/or being overshadowed by other students within a group; (3) I follow in real time what students are producing (see note on GDoc below) and make sure to stop at regular intervals to comment on what they are doing right/wrong and to channel their thinking towards elements of the exercise that generalize to other cases. An important note: I am not allowing myself to check my emails or use my phone; it’s so easy to be sucked in and lose focus. I am now some kind of coach and need to be totally focused on the class dynamics.

Room space and arrangement is crucial. In my room there is not enough space and we don’t have tables, only chairs with folding tables (which by the way I always hated as a student also). This is definitely not optimal. So, if you are reading this and are making plans for your class, try to get a spacious room with tables! In my case I literally force students to turn their chairs in a way that they are arranged in circles. I truly believe this is crucial to make sure students are actually working together. I actually tell them they should do it because I am worried they would develop neck pain if they do not turn their chairs, but it’s more because I want them to face each other :)

How about theory and principles (a.k.a. do they get it?)? I was explaining my flipping experiment to a colleague the other day and her concern was: “but are they getting the theory right?”. Good point. Is all of this active work going to translate into higher-level knowledge students will be able to internalize and transfer to other situations? The honest answer is: I don’t know. Yet, a few reflections are due here. First, there’s tons of education research in support of active learning. Students do get the higher level knowledge that is needed. Second, my experience is that students won’t get it when you lecture them either! This is the real reason why I am doing this. My hope is that applying the principles in practical exercises is going to cement the knowledge they acquire while reading and watching the video lectures. I am planning for some additional assessment later on in the course. Hopefully the results will be positive.

Real-time exercise development in Google Docs works like magic. I talked about how I use Google Docs in class before. I totally love it. Let me describe this again. Every single exercise I assign in class requires producing material that goes in a GDocs file (one for each group). I create a folder and ask students to put their files there. Since GDoc files update in real time, I can literally follow what each group is doing without interfering too much with their work. This is such a powerful tool! It feels like magic. Watching how the exercise develops enables me to figure out what is happening and intervene when necessary. Not just that, I can also use examples from one group to show all the other groups positive or negative aspects of a given solution. And this is priceless.

Tableau is great for teaching. This week the exercise we developed in class requires the students to develop a solution with Tableau. While Tableau is not necessarily super intuitive all the time, it has the big advantage that taking the first steps is extremely easy. Students can very quickly produce initial charts. Then, when they get stuck with something, I explain how to solve it, and some little learning happens. Another practical advantage of Tableau is that students feel like learning it is a good investment of their time: Tableau is very popular and highly requested in job applications. Finally, Tableau teaches students to think in terms of mapping data attributes to visual channels, which is the foundation of visual encoding; by far the most important piece of knowledge taught in a visualization course.

Ok … That’s all for now. I really hope you’ll find this useful. Your feedback and questions are very welcome!

InfoVis Course Diary: Dabbling with Data Abstraction

In the second week of my course we start directly with the “data abstraction” chapter taken from Tamara’s book, which is the official textbook in my course.

Data abstraction is mostly about describing data in a way that is instrumental to visualization design. That is, by detecting certain structures and information within them, you can think ahead about how to navigate the visualization design space. This is where you learn, among other things, the extremely valuable concept of attribute type: categorical, ordinal and quantitative.

Tamara did an excellent job at creating a consistent catalog of data abstractions; which covers an extremely wide set of cases and situations. Here is an excerpt from the data abstraction summary you can find in the book chapter (side note: I absolutely love the way the book provides diagrams summarizing the content at the beginning of each chapter):


Summary of “Data Abstraction” chapter from “Visualization Analysis and Design” book.

That said, there are a few things my students and I are always struggling with. Here is an account of the problems and possible solutions.

  1. Some data abstractions are way too complex and cover rare cases. It is clear that Tamara tried to be as complete as possible and, as such, her effort is highly laudable. The need for completeness however creates a tension with simplicity. As you can see in the diagram, some abstractions are very familiar: table, network, tree, etc. But then things get way muddier with grids, spatial fields, geometry, clusters and sets. I have been through this many times and invariably when we get to this chapter my students are confused. I must confess I am confused too. Here are examples of questions I received this semester: “I don’t really understand the concept of field and geometry, especially the difference between them” or “While looking at the continuous field example about sea temperature of locations on the planet, it somehow seems like to be geometry dataset type?”. What is the solution? I don’t know. As I said, I like the completeness and the consistent approach, but I am also concerned with how much students will be able to retain. Maybe we can create a “data abstraction light”? My data abstraction light would be something along these lines. There are two data set types: tables and networks. Each of these contains attributes of three possible types: categorical, ordinal and quantitative. Some of these may represent time and/or geography. Too simplistic?
  2. Is data abstraction a matter of “describing” or “designing” data structures? Whenever I teach data abstraction, I feel like there is one important part I should be teaching and it’s not developed enough: the art of “sculpting” your data so that it has the “shape” it needs to solve the problem you want to solve. To be fair, Tamara does talk about this, but I believe this part is not developed/structured enough. While data abstraction helps you describe what kind of information your data contain (description), you also need to figure out how your data can and should be molded to get to the desired solution (design). This is a crucial vis design activity and it’s never acknowledged enough. Let me state it in a different way: your role as a visualization designer is not only to find how to represent the data you have, but also what to represent in the first place. This is a huge aspect of visualization design which is very often overlooked! Now … In how many ways can data be manipulated? And which of these ways do we need to teach? Above all, I believe that anything resembling an SQL query (or Pivot Table in Excel-ese) is a fundamental step (this is why I believe Tableau is so successful: ultimately it’s a database query system). So, fundamental operations in a table are: selecting attributes, filtering rows in a principled manner and aggregating them according to aggregate operations. In a way, every chart can be described as an SQL query over a data set. Other fundamental operations are those that transform a data set or attribute from one type to another: from table to network, from quantitative to categorical, from place names to their coordinates, etc. I believe this is so crucial because there is never a unique way of looking at a given data set. In a way rather than calling this step “data abstraction” I would even call it “data interpretation”. Or … “data design”? Or … “data sculpting”? For instance, in class I often use the Aid Data data set. It’s a fantastic data set recording information about financial disbursements between countries for aid purposes over time. The data is stored as a table/spreadsheet and contains, among other fields: origin, destination, time, amount, purpose. Now, if I select origin, transform it into spatial coordinates, and aggregate over amount, I can create a nice bubble chart map of donors. But if I select origin, destination and amount, I can create a weighted node-link diagram. Whereas, if I select purpose and aggregate by amount, I can create a bar chart of how disbursements distribute across purposes. You see how crucial this is?
  3. Not enough emphasis on the relationship between analytical questions and “data shapes”. This is somewhat related to my last observation. Data abstraction and transformation do not happen in a vacuum; they are instrumental to achieving a data analysis and presentation goal. But data analysis and presentation presuppose looking for answers to a series of questions. Ultimately, this is what happens in practice: finding the right “shape” for your data is guided by your desire to pursue some questions and goals. This is easier said than done. One caveat in visualization is that not all questions are perfectly laid out in front of us. Sometimes we “discover” new questions as we proceed in our analysis. In any case, it is true that virtually every single visual representation has one or more possible questions attached, that is, questions that can be answered by looking at it (and new questions that cannot be solved by looking at it by the way). I now feel that this tight relationship between questions, data sculpting and representation needs to be highlighted and trained. I have done a bit of this in class already, through exercises, and my students seem to have learned a lot. One student at the end of the class told me: “Prof. I loved this exercise today!”. On a side note, this is why I am so happy I no longer need to give lectures in class. I can afford teaching students concepts through practice. They end up developing a sense of what these concepts really mean, not only intellectually, but also in practical (more internalized) ways.
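
To make the “every chart is a query” idea from point 2 a bit more tangible, here is a small sketch using Python’s built-in sqlite3 module. Everything specific in it is an assumption made for illustration: the rows are invented and are not real Aid Data records, and the table name, column names, `SHAPES` dictionary and `sculpt` helper are hypothetical. Each query selects and aggregates the same table into a different “shape”, and each shape suggests a different chart.

```python
import sqlite3

# Hypothetical rows, invented for illustration; NOT real Aid Data records.
# Columns: origin, destination, year, amount, purpose.
ROWS = [
    ("USA", "Kenya", 2005, 120.0, "health"),
    ("USA", "Peru", 2005, 80.0, "education"),
    ("Japan", "Kenya", 2006, 50.0, "health"),
    ("Japan", "Vietnam", 2006, 200.0, "infrastructure"),
    ("USA", "Kenya", 2006, 30.0, "education"),
]

# One query per "shape"; the comment names the chart each shape could feed.
SHAPES = {
    # Total per donor: after geocoding origin, a bubble map of donors.
    "donor_totals": (
        "SELECT origin, SUM(amount) FROM aid "
        "GROUP BY origin ORDER BY origin"
    ),
    # Total per donor/recipient pair: weighted edges of a node-link diagram.
    "flows": (
        "SELECT origin, destination, SUM(amount) FROM aid "
        "GROUP BY origin, destination ORDER BY origin, destination"
    ),
    # Total per purpose: a bar chart of disbursements by purpose.
    "purpose_totals": (
        "SELECT purpose, SUM(amount) FROM aid "
        "GROUP BY purpose ORDER BY purpose"
    ),
}


def build_table():
    """Load the hypothetical rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aid (origin TEXT, destination TEXT, "
        "year INTEGER, amount REAL, purpose TEXT)"
    )
    conn.executemany("INSERT INTO aid VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def sculpt(conn, shape):
    """Return the rows of the requested shape as a list of tuples."""
    return conn.execute(SHAPES[shape]).fetchall()


conn = build_table()
for name in SHAPES:
    print(name, sculpt(conn, name))
```

The sketch leaves out the step that turns place names into coordinates, which would need a geocoding table or service. It also makes point 3 visible: each query is really an analytical question written down (“who gives the most?”, “who gives to whom?”, “what is the money for?”).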

That’s all for now. My next post will be on the third module I teach, which is “fundamental variations of charts”. Let me know what you think! This is very much work in progress!

Thanks for reading.