Category Archives: Course Diary

InfoVis Course Diary: The Course Recap Exercise

On the last day of my course this year I decided to try a new experiment: the “course recap exercise”. I asked each student (individually) to create a new Google Doc in our class folder and to answer questions about what they had learned during the course.

It occurred to me that if I have done my job right, students should be able to remember the most important ideas and principles presented during the course.

I also wanted the students to have an opportunity to actively reflect on what knowledge and skills the course provided to them, with the hope that in so doing they would cement them even further.

The exercise turned out to be incredibly instructive, both for me and for my students.

Here is what I asked precisely:

  • Question 1: what are the top ten most important concepts/ideas you have learned in this course? I asked the students to avoid talking with their peers and looking into our study material but just try to recall concepts from their mind.
  • Question 2: What is the most important idea/concept you have learned about each of the following topics (focus on 1 only): Data Abstraction, Fundamental Charts, Visual Encoding, Color and Other Channels, Visualization Guidelines, Interaction and Multiple Views, Spatial Data. In this second step I asked the students to recall one specific idea or concept for each of the major modules we worked on (no, we did not cover networks and trees and the world did not collapse). This second step ensured that they would try to recall at least one concept from each of the main topics we covered.
  • Question 3: What is the most important lesson you have learned in each of these types of activity: Projects, Data Analysis and Presentation, Chart Decomposition, Vis Design Workshops. In this final question I aimed at facilitating mental processing of lessons learned while doing practical work.

The results of this exercise went way better than I had imagined. I was really surprised by how many concepts, ideas and principles the students recalled. At the end of each question I also spent considerable time probing the students with more detailed questions, trying to understand whether they had only understood the concepts superficially, parroting the things that I kept repeating throughout the course, or had internalized them in ways that allow them to reason productively. To my surprise, most of the probing went pretty well and confirmed for me that understanding visualization concepts is not particularly hard.

On a negative note, however, all this knowledge does not translate into being particularly proficient in developing effective visualizations. As I will explain in future posts I did not see huge improvements in the way students designed and developed their projects. I keep seeing a big gap between visualization theory and practice.

On the value of nongraded assessments.

In any case, this little exercise was another opportunity for me to test a concept I started experimenting with this year: the idea that assessment and grading do not need to go together and that actually coupling them together can also be detrimental.

All students seemed to be pretty relaxed in answering the questions I posed and discussing their answers allowed us to discuss many details in much more depth.

I don’t know why it occurred to me to assign this exercise only at the end of the course but I believe the same structure can be used at regular intervals during the course to better understand what students are learning and how to fix potential gaps.

InfoVis Course Diary: Developing Visualization Design Workshops

During the last four/five weeks of the course I have been assigning visualization design exercises in class. The main idea is to assign practical design problems to students to solve in class during a workshop of about 2.5 hours. Here is a description of how I am organizing the workshops and what I have learned so far.

Philosophy

I decided to create and assign design exercises in class because in past years I have always been frustrated by how little students learned by listening to my lectures, and struck by how quickly they learned once we worked together on the design problems they had to solve for their group project. The philosophy behind the workshops is therefore to bring more of this experience in class, together with the advantage of having the whole class working on the same problem in a much more structured way (in group projects each team solves a different problem independently). Another advantage of workshops is that when I give feedback to one group of students all the other students can listen and relate what I am saying to their own solution.

Preparation (Reverse Engineering Vis Projects)

To prepare these exercises I decided to use the following strategy: reverse engineer existing visualization projects I like. I start from an existing project and I effectively go from the solution back to the original problem. Inspiration can come from many sources: papers published at IEEE VIS or ACM CHI; projects developed by established visualization designers or newspapers; class projects developed during past editions of the course.

(Note: I recently discovered Shiqing He and Eytan Adar use the same strategy in their beautiful and admittedly much more advanced VizItCards method)

How do I do that? I focus on the following elements:

  • Problem Statement: What is the original problem they wanted to solve? Who has this problem? Why is it interesting and important?
  • Data Set: What data set did they use? Is the data set available? If not, can I use a similar one?
  • Questions: What are the driving questions they want to answer by looking at the visualization?

Execution

Once I get in class, I typically follow this sequence:

  1. I ask my students to form groups (basically the same every week).
  2. I read the whole text of the exercise and make sure everything is clear to everyone.
  3. I ask students to read everything on their own.
  4. I ask students to create individual solutions first.
  5. I ask students to discuss their solutions and create a group solution.
  6. Students put their solutions in a shared Google Doc, one for each team (I don’t care about possible cheating).
  7. I create a set of slides (on the fly) with students’ solutions and comment on them in front of the class.

What I noticed during these last few weeks is that there are many implementation details one needs to work on to tune the execution of the workshop.

  • Group and individual work. I still have to find the right balance between individual and group work. But after some experimentation I believe both are highly needed.
  • Material and techniques for creating mockups. I am not particularly happy with the unstructured way I let students create their mockups. Some are very good, some are very bad. In the future I want to give more precise instructions and unify the way mockups are developed.
  • Using whiteboard and markers to create mockups. I discovered, almost by chance, that using whiteboards is a much better way for groups to generate and discuss mockups. If you can afford having multiple whiteboards or, even better, you have the luxury of having “whiteboard walls” in your classroom, you should try this out. The biggest difference is that with whiteboards everyone can see what is happening, and participation and feedback happen much more naturally.
  • Deriving patterns and principles from the solutions. Ideally, at the end of each workshop we should be able to derive general principles students can apply to other cases, beyond the specifics of the exercise. I truly believe this is an important pedagogical step. I started collecting some of these ideas and patterns at the end of each workshop but I’d like to find a better and more systematic way to do it in the future. Ideally, these patterns and principles may be reused in future workshops as the material develops further.

Sharing workshop material

I plan to share all of the exercises I created for my course later on in 2017 and make them available for everyone to use. I just need to give them a more decent shape. I’d be more than happy to develop these exercises further together. Just stay tuned!

InfoVis Course Diary: Basic Charts Need to Be Learned First

In the third week of my course I introduce fundamental charts. These are charts that are super common and most people are familiar with: bar charts, histograms, line charts, scatter plots, heat maps, etc. This is a major departure from the textbook I use, which introduces methods to visualize tabular data only in later chapters.

Why do I start with basic charts?

It’s pedagogically better to start with basic rather than advanced charts. I truly believe one must first learn how to use basic charts properly before moving on to other more exotic territories. Within their constrained space, there is a lot to learn and there are infinite variations and tweaks one can apply.

These charts, in their most basic format, cover all possible combinations of two attributes, thus giving students a manageable and yet powerful mental model to think about how pairs of attributes can be combined: a scatter plot is made of 2 quantitative attributes; a bar chart is made of 1 categorical and 1 quantitative attribute; a line chart is also made of 2 quantitative attributes but one is special as it represents time;  a heat map is a combination of two categorical attributes and their frequencies, etc.
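The mental model above can be written down as a tiny lookup table. This is only a sketch of the pairs listed in the previous paragraph: the type labels, the `BASIC_CHARTS` table and the `basic_chart` function are names made up for illustration, and treating time as its own "temporal" label is a simplifying assumption (in the text it is a special quantitative attribute).

```python
# Hypothetical labels for attribute types; "temporal" stands for the
# special quantitative attribute that represents time.
CATEGORICAL = "categorical"
QUANTITATIVE = "quantitative"
TEMPORAL = "temporal"

# Pairs of attribute types mapped to the basic chart named in the text.
# Keys are sorted tuples so that the order of the two attributes does not matter.
BASIC_CHARTS = {
    (QUANTITATIVE, QUANTITATIVE): "scatter plot",
    (CATEGORICAL, QUANTITATIVE): "bar chart",
    (QUANTITATIVE, TEMPORAL): "line chart",
    (CATEGORICAL, CATEGORICAL): "heat map",  # cells show frequencies
}


def basic_chart(first: str, second: str) -> str:
    """Return the basic chart for a pair of attribute types, if listed."""
    key = tuple(sorted((first, second)))
    return BASIC_CHARTS.get(key, "not covered by this sketch")


if __name__ == "__main__":
    print(basic_chart(QUANTITATIVE, CATEGORICAL))
    print(basic_chart(TEMPORAL, QUANTITATIVE))
```

Pairs that the paragraph does not mention (for example categorical and temporal) deliberately fall through to the default, since the table is meant as a memory aid rather than a complete chart chooser.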

These charts are also infinitely “tweakable” while retaining simplicity. For instance, what happens to these charts when you need to map an additional 3rd attribute? In scatter plots you can use color and/or size. In bar charts you can use stacked or grouped bars. In line charts you can add multiple lines.

Take the scatter plot below, in which I am plotting data from the USDA food nutrients database (each dot is a food; the axes are the amounts of monounsaturated and polyunsaturated fats).

Scatter plots are infinitely “tweakable”. Here we progressively encode 2, 3 and 4 attributes with position, color hue and size.

In its most basic format it encodes two quantitative attributes (the amounts of two types of fat). The next one encodes a third categorical attribute (food type) using color hue. And the final one encodes a fourth attribute (amount of water) with size. That’s a very gentle and yet solid way of introducing fundamental visualization concepts. In class I actually cycle through many of these examples, starting from the most basic chart and asking students to accommodate more data or needs.

With these basic charts one can also start introducing examples of problematic design solutions. For instance, it’s easy to talk about the “truncated axis” and the “dual axis” problems.

See, for instance, the infamous Planned Parenthood hearing charts that made the news last year.

Misleading chart in which the dual-axis method has been used to give a false impression about the data.

Another aspect is that basic charts are an amazing toolbox for visualization designers because the vast majority of existing problems can be solved with them. It’s very rare to find a visualization problem that cannot be solved, at least in a first approximation, with these basic designs. Plus, whenever needed, you can always try to apply little modifications to adapt them to your needs. I really believe there are way more interesting designs one can generate by tweaking these basic charts than by trying to come up with something entirely new.

Finally, basic charts are the most familiar ones. If familiarity is an important aspect of your project, you don’t want people to spend time figuring out how to decode your charts. This is an aspect that is often overlooked in visualization. When presented with a new visualization the first thing the reader/user needs to figure out is “how do I actually read this?”.

Some questions from the students …

Before I conclude I want to briefly touch upon a few recurring questions students asked in their reading response exercise (I’m paraphrasing).

Q: “Are pie charts really so bad? I like them!”

Students are always puzzled when they see that pie charts are so heavily criticized by some people. I am personally not interested at all in the pie charts debate, because I do not think it’s really consequential. In any case what I tell my students is that they should never use rules blindly. So pie charts are a good example to exercise their own good judgment. When are they appropriate? When are they not? This is what matters the most to me. I am myself not a big fan of pie charts but I do not believe they are evil, and I do believe there are situations in which it may be reasonable to use them. Robert Kosara has published a good number of interesting blog posts and papers on the topic.

Q: “How do we deal with people’s subjective preference for some charts?”

That’s actually a big one. Students at the beginning of the course are still puzzled by the idea that some charts may be objectively better than others in some contexts and for some tasks. Because of that, they feel lost when they think that some people reading their charts may actually find them not exactly of their favorite “taste”.

This is too long a subject to be developed fully here. But the main thing to learn is that charts need to be effective before being “pleasurable”. Aesthetics plays a role of course but it cannot subsume effectiveness. Therefore one needs to learn what is effective and what is not and then find a way to “inject” the right aesthetic sense within these constraints.

The best data visualizations out there (and the best designers by the way) are those that are great at finding this fine balance.

But there is more to say on the topic. A very important principle, dear to user experience designers, is that good design is about giving people what they need, not necessarily what they tell you they need. It’s your responsibility as a designer and as an engineer to figure out what the best solution for your audience is; you can’t rely exclusively on what they tell you they want or need.

Q: “How do we deal with large data sets? These charts do not scale!”

Correct. Many of the basic charts do not scale to large data sets or large number of values. But this is exactly the point! By realizing what the limitations of basic charts are, one is forced to think about what the alternatives are. It’s a very useful step from the pedagogical point of view.

But even before moving to more exotic solutions, students first need to figure out how scalable basic charts really are. A very consistent trend I found in students is that they are afraid of making charts small and, because of that, they largely underestimate how scalable they really are!

As Tufte’s work on “sparklines” demonstrates, charts can be incredibly small and still convey a lot of information. So, while there are cases in which standard charts may actually not be sufficiently scalable, I consider it crucial to first show students how to make charts small. Very small.

That’s all for now. Hope this helps!

InfoVis Course Diary: Update on Class Flipping

My adventure in flipping my InfoVis class is moving on. Here is a brief update on what is happening in class and a few small lessons learned during the first four weeks of the course.

Please keep in mind this is work in progress and pretty much a brain dump. I may in the future find that some of these ideas are not as brilliant as they look right now.

Students need more time than you think to develop their exercises in class. During the first three weeks I have consistently underestimated the time it takes students to develop solutions for their exercises. My class lasts two hours and a half and it is barely enough time to develop an exercise fully. In the first two weeks I planned to have time for three different activities; now I know I have to basically focus on only one. Yesterday, I assigned a data analysis and presentation exercise (to be developed with Tableau) based on a real-world data set and the two hours and a half were barely enough.

The exercises need to be broken down into smaller (timed) steps. It is very hard for students to mentally “digest” a complex exercise as a single unit. A much better strategy is to break the exercise down into smaller steps that lead to the final solution. A big advantage of this strategy is that at the end of each step I can comment on what the students have just done and how it connects to higher level concepts. I try to enable this generalization by providing comments and also asking a few direct questions. This seems to work really well. For instance, I broke the data analysis and presentation exercise down to: (1) familiarize with the data and its meaning; (2) generate analytical questions; (3) do data analysis in Tableau; (4) collect the results and organize them in a narrative; (5) write a final document. At each of these steps there are plenty of comments one can insert. I have also found that timing these steps and making the timing explicit and visible to the students upfront helps organize the work and makes sure that groups are mostly in sync.

Individual or group work? So far I have experimented exclusively with group work. None of the exercises I have assigned in class require students to work on something individually. My biggest fear is that group dynamics may actually play against students who are particularly shy or simply do not feel like doing the required work for that day. This is an aspect I still need to perfect. What I found so far is that simply walking around and offering help seems to keep the students alert and active. I also try to actively spot students who seem to be on the verge of disengagement and encourage them to participate more. Yet, at the same time, in a few instances I found that my intervention was not needed and that students have a very clever and natural way to alternate between solitary reflection and sharing with the group. If anything, what I am learning is to trust the process and my students much more and give them freedom to organize their work as they wish.

Am I useful/needed after all? Let’s admit it. Standing in class and walking around without uttering a word does not feel as good as doing the somewhat self-righteous work of giving a lecture. I am no longer the main deal. Students don’t look at me, they have their heads buried in their problems. Is that bad? No. It’s actually good. But I keep asking myself, as the class unfolds, what is the best way for me to be useful here? It feels like swimming in a new kind of ocean. My sense is that I still have a lot to learn. For now what I am doing is the following: (1) I make sure to be available for everyone and to devote the same amount of time to every group; (2) I check that no one is having major problems and/or being overshadowed by other students within a group; (3) I follow in real time what students are producing (see note on GDoc below) and make sure to stop at regular intervals to comment on what they are doing right/wrong and to channel their thinking towards elements of the exercise that generalize to other cases. An important note: I am not allowing myself to check my emails or use my phone; it’s so easy to be sucked in and lose focus. I am now some kind of coach and need to be totally focused on the class dynamics.

Room space and arrangement is crucial. In my room there is not enough space and we don’t have tables, only chairs with folding tables (which by the way I always hated as a student also). This is definitely not optimal. So, if you are reading this and are making plans for your class, try to get a spacious room with tables! In my case I literally force students to turn their chairs in a way that they are arranged in circles. I truly believe this is crucial to make sure students are actually working together. I actually tell them they should do it because I am worried they would develop neck pain if they do not turn their chairs, but it’s more because I want them to face each other :)

How about theory and principles (a.k.a. do they get it?)? I was explaining my flipping experiment to a colleague the other day and her concern was: “but are they getting the theory right?”. Good point. Is all of this active work going to translate into higher-level knowledge students will be able to internalize and transfer to other situations? The honest answer is: I don’t know. Yet, a few reflections are due here. First, there’s tons of education research in support of active learning. Students do get the higher level knowledge that is needed. Second, my experience is that students won’t get it when you lecture them either! This is the real reason why I am doing this. My hope is that applying the principles in practical exercises is going to cement the knowledge they acquire while reading and watching the video lectures. I am planning for some additional assessment later on in the course. Hopefully the results will be positive.

Real-time exercise development in Google Docs works like magic. I talked about how I use Google Docs in class before. I totally love it. Let me describe this again. Every single exercise I assign in class requires producing material that goes in a GDocs file (one for each group). I create a folder and ask students to put their files there. Since GDoc files update in real time, I can literally follow what each group is doing without interfering too much with their work. This is such a powerful tool! It feels like magic. Watching how the exercise develops enables me to figure out what is happening and intervene when necessary. Not just that, I can also use examples from one group to show all the other groups positive or negative aspects of a given solution. And this is priceless.

Tableau is great for teaching. This week the exercise we developed in class required the students to develop a solution with Tableau. While Tableau is not necessarily super intuitive all the time, it has the big advantage that taking the first steps is extremely easy. Students can very quickly produce initial charts. When they then get stuck with something, I explain how to solve it, and some little learning happens. Another practical advantage of Tableau is that students feel like learning it is a good investment of their time: Tableau is very popular and highly requested in job applications. Finally, Tableau teaches students to think in terms of mapping data attributes to visual channels, which is the foundation of visual encoding, by far the most important piece of knowledge taught in a visualization course.

Ok … That’s all for now. I really hope you’ll find this useful. Your feedback and questions are very welcome!

InfoVis Course Diary: Dabbling with Data Abstraction

In the second week of my course we start directly with the “data abstraction” chapter taken from Tamara Munzner’s book, which is the official textbook in my course.

Data abstraction is mostly about describing data in a way that is instrumental to visualization design. That is, by detecting certain structures and information within them, you can think ahead about how to navigate the visualization design space. This is where you learn, among other things, the extremely valuable concept of attribute type: categorical, ordinal and quantitative.

Tamara did an excellent job at creating a consistent catalog of data abstractions, which covers an extremely wide set of cases and situations. Here is an excerpt from the data abstraction summary you can find in the book chapter (side note: I absolutely love the way the book provides diagrams summarizing the content at the beginning of each chapter):

Summary of “Data Abstraction” chapter from “Visualization Analysis and Design” book.

That said, there are a few things my students and I are always struggling with. Here is an account of the problems and possible solutions.

  1. Some data abstractions are way too complex and cover rare cases. It is clear that Tamara tried to be as complete as possible and, as such, her effort is highly laudable. The need for completeness, however, creates a tension with simplicity. As you can see in the diagram, some abstractions are very familiar: table, network, tree, etc. But then things get way muddier with grids, spatial fields, geometry, clusters and sets. I have been through this many times and invariably when we get to this chapter my students are confused. I must confess I am confused too. Here are examples of questions I received this semester: “I don’t really understand the concept of field and geometry, especially the difference between them” or “While looking at the continuous field example about sea temperature of locations on the planet, it somehow seems like to be geometry dataset type?”. What is the solution? I don’t know. As I said, I like the completeness and the consistent approach, but I am also concerned with how much students will be able to retain. Maybe we can create a “data abstraction light”? My data abstraction light would be something along these lines. There are two data set types: tables and networks. Each of these contains attributes of three possible types: categorical, ordinal and quantitative. Some of these may represent time and/or geography. Too simplistic?
  2. Is data abstraction a matter of “describing” or “designing” data structures? Whenever I teach data abstraction, I feel like there is one important part I should be teaching and it’s not developed enough: the art of “sculpting” your data so that it has the “shape” it needs to solve the problem you want to solve. To be fair, Tamara does talk about this, but I believe this part is not developed/structured enough. While data abstraction helps you describe what kind of information your data contain (description), you also need to figure out how your data can and should be molded to get to the desired solution (design). This is a crucial vis design activity and it’s never acknowledged enough. Let me state it in a different way: your role as a visualization designer is not only to find how to represent the data you have, but also what to represent in the first place. This is a huge aspect of visualization design which is very often overlooked! Now … In how many ways can data be manipulated? And which of these ways do we need to teach? Above all, I believe that anything resembling an SQL query (or Pivot Table in Excel-ese) is a fundamental step (this is why I believe Tableau is so successful: ultimately it’s a database query system). So, fundamental operations in a table are: selecting attributes, filtering rows in a principled manner and aggregating them according to aggregate operations. In a way, every chart can be described as an SQL query over a data set. Other fundamental operations are those that transform a data set or attribute from one type to another: from table to network, from quantitative to categorical, from place names to their coordinates, etc. I believe this is so crucial because there is never a unique way of looking at a given data set. In a way, rather than calling this step “data abstraction” I would even call it “data interpretation”. Or … “data design”? Or … “data sculpting”? For instance, in class I often use the Aid Data data set. It’s a fantastic data set recording information about financial disbursements between countries for aid purposes over time. The data is stored as a table/spreadsheet and contains, among other fields: origin, destination, time, amount, purpose. Now, if I select origin, transform it into spatial coordinates, and aggregate over amount, I can create a nice bubble chart map of donors. But if I select origin, destination and amount, I can create a weighted node-link diagram. Whereas, if I select purpose and aggregate by amount, I can create a bar chart of how disbursements distribute across purposes. You see how crucial this is?
  3. Not enough emphasis on the relationship between analytical questions and “data shapes”. This is somewhat related to my last observation. Data abstraction and transformation do not happen in a vacuum; they are instrumental to achieving a data analysis and presentation goal. But data analysis and presentation presuppose looking for answers to a series of questions. Ultimately, this is what happens in practice: finding the right “shape” for your data is guided by your desire to pursue some questions and goals. This is easier said than done. One caveat in visualization is that not all questions are perfectly laid out in front of us. Sometimes we “discover” new questions as we proceed in our analysis. In any case, it is true that virtually every single visual representation has one or more possible questions attached, that is, questions that can be answered by looking at it (and new questions that cannot be answered by looking at it, by the way). I now feel that this tight relationship between questions, data sculpting and representation needs to be highlighted and trained. I have done a bit of this in class already, through exercises, and my students seem to have learned a lot. One student at the end of the class told me: “Prof. I loved this exercise today!”. On a side note, this is why I am so happy I no longer need to give lectures in class. I can afford to teach students concepts through practice. They end up developing a sense of what these concepts really mean, not only intellectually, but also in practical (more internalized) ways.
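To make the “every chart is a query” idea from the second point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Everything in it is an assumption made for illustration: the rows are invented toy values (not real Aid Data records), the table name `aid`, the column name `year` (standing in for the time field) and the three function names are hypothetical, and the geocoding step that would turn donor names into map coordinates is left out.

```python
import sqlite3

# Toy rows invented for illustration only; NOT real Aid Data records.
# Columns: origin, destination, year, amount, purpose
ROWS = [
    ("A", "X", 2000, 10.0, "health"),
    ("A", "Y", 2001, 5.0, "education"),
    ("B", "X", 2000, 7.0, "health"),
    ("B", "X", 2001, 3.0, "education"),
]


def build_db(rows):
    """Load the toy table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aid (origin TEXT, destination TEXT, "
        "year INTEGER, amount REAL, purpose TEXT)"
    )
    conn.executemany("INSERT INTO aid VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def donor_totals(conn):
    """Select origin, aggregate amount: the data behind a bubble map of donors."""
    return conn.execute(
        "SELECT origin, SUM(amount) FROM aid GROUP BY origin ORDER BY origin"
    ).fetchall()


def weighted_edges(conn):
    """Select origin and destination, aggregate amount: a weighted edge list."""
    return conn.execute(
        "SELECT origin, destination, SUM(amount) FROM aid "
        "GROUP BY origin, destination ORDER BY origin, destination"
    ).fetchall()


def purpose_totals(conn):
    """Select purpose, aggregate amount: the data behind a bar chart."""
    return conn.execute(
        "SELECT purpose, SUM(amount) FROM aid GROUP BY purpose ORDER BY purpose"
    ).fetchall()


conn = build_db(ROWS)

if __name__ == "__main__":
    print(donor_totals(conn))
    print(weighted_edges(conn))
    print(purpose_totals(conn))
```

The same table yields three differently shaped results, and each shape suggests a different chart, which is the point of the Aid Data example: the choice of what to select and how to aggregate is already a design decision.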

That’s all for now. My next post will be on the third module I teach, which is “fundamental variations of charts”. Let me know what you think! This is very much work in progress!

Thanks for reading.

InfoVis Course Diary: Flipping My Class and Other Innovations

Last week I started a new edition of “Information Visualization”, the course I give at NYU Tandon, my university. With this blog post I am starting a new series, similar to the one I did a couple of years back, in which I write about the course and ideas that originate from my experience while teaching it.

In this first post I want to talk about new ideas I am implementing, in the hope this will be inspiring and helpful to other instructors like me. Last summer, I spent quite some time reviewing education research and reading books on the topic (while preparing my NSF CAREER), and I felt I really needed to change a few things in the way I am teaching. Here are some of the most prominent ones.

Flipped classroom model. For the first time this semester I will be using a flipped classroom model. In this model students read assigned material and watch my recorded lectures (which I had recorded in previous semesters) at home. Time in class is entirely spent on feedback, discussions and exercises.

Why do I do that? First, because throughout the years of teaching I learned that I am the most valuable when I am giving feedback to my students, not when I am lecturing. They get the best out of me when I show them a better way to solve a given problem. Second, because lectures allow (encourage?) students to be totally passive and to disengage (a weird kind of luxury given how much it costs to be an NYU student). When this is then compounded with how pervasive digital distractions are in today’s classroom, this is a perfect recipe for disaster. Finally, I am just bored of spending two and a half hours talking and seeing everyone progressively “fade away” (together with my voice).

Last week I gave my first “flipped class” and I believe it went very well. There are a few things I’ll need to fix, but overall it seems to work. It’s work in progress and I expect to learn a lot.

Each week I ask my students to answer a few questions about the readings/videos I assigned them to read/watch and I use their answers to provide feedback to everyone at the beginning of each class. Three benefits: (1) I now have time to review all students’ responses before going in class (as opposed to preparing the lecture); (2) I can cluster them into major themes/issues and use this information to improve the course; and (3) since I discuss this in front of the class, everyone can hear what I have to say, react to it and benefit from it. Next week, I’ll assign the first hands-on exercise in class. I am hopeful this will also work well.

Decoupling grading and assessment. This is another no-brainer I figured out only a few months ago: assessment and grading do not need to coincide. The main goal of assessment is not to assign a grade to each student, but: (1) to inform the instructor about how much and what students are learning and (2) to inform the students about what aspects of the assigned material they still need to work on. When you see it this way, it completely transforms the way you think about assignments and exercises. I am relieved of the burden of turning everything into a grade and at the same time I now see assessment as my little tool to decide what to talk about in class and how to tweak it.

In my new model, assessment and grading are decoupled. I use assessment to guide myself and the students, and grading, through a test later on, to give a final grade to a given set of modules.

Of course, I still need to check that students are actually doing the work I ask them to do at home. For this reason all my assignments are mandatory and graded only on a pass/fail scale. My students know that their final grade will be affected only slightly by these assignments but at the same time they know they must submit and take it seriously. In short, it’s hard to fail and your grade is rarely affected, unless you do not take these assignments seriously (in which case you fail badly).

Promoting self-awareness and ownership. This is a hard one. One I am still struggling with. What I noticed in the way we teach is that we give too little responsibility to students to “own” the burden of becoming knowledgeable in a given topic. While structure, syllabi, grading, etc., have their role, they also have a dark side: they allow students to just blindly follow whatever they are asked to do, like automatons, without self-reflection and criticism. This is not a good model. Life out of school does not work this way. It does not matter how good your grades are, what really matters is learning how to be responsible for your own education and empowerment. In the real world, nobody is feeding you with a spoon, you have to figure things out on your own (jobs where this happens tend to be crap by the way) and this is a way bigger lesson than actually learning the very content of the course.

So now … How do we encourage self-awareness and ownership? It’s hard. And I can’t say I have figured it all out. But I am carefully inserting little exercises and questions to encourage students to reflect more on what they are learning and why. For instance I would ask in the reading response things like: “why do you think this is important?” or “how are you going to apply this in the future?” or “how does this fit in your current and future knowledge?”. This is an area where I want to learn more. For now, I am just taking baby steps. If you have some ideas to share with me, I would be more than happy to learn!

Putting even more emphasis on “vis basics” and UI design. As a visualization designer and researcher, what I have learned throughout the years is that: (1) it is very rare for an “unconventional” chart to beat a conventional one; (2) it’s much more effective to learn how to use (and tweak) a few basic charts than to explore a very large set of new fancy solutions as a first step; (3) you always need to start from the basics to realize they don’t work, and then change them. With this in mind, I am putting even more emphasis than usual on teaching students how to use well the most basic charts that cover fundamental combinations of 2 or 3 attributes (most charts out there don’t go beyond 3).

Other than basic charts such as bar charts, heat maps, line charts etc., I’ll spend some time on maps because they are pervasive and are used in virtually every single data visualization job out there. Whereas I am growing way more doubtful about networks and trees. The heretic side of me is willing to heavily reduce the amount of time I’ll spend on them or even cut them out completely. I know many will harshly protest about this, but that’s the way I am thinking right now. Apologies.

A second aspect of my course is my focus on “analytical interfaces“: I teach students how to build interactive visual applications to help people make sense of data. I am teaching CSE students and I want them to acquire a special skill very few people have.

What I noticed in past editions of my course however is that students inevitably end up in a situation where they know how to design a single good chart/view, but have no idea whatsoever how to tie multiple charts and interactive components together in one complex UI. When you think about it, it’s astounding how little information and guidance exists out there on how to do this properly for analytical interfaces! This semester therefore I’ll put even more emphasis on this aspect and I’ll design new material to address this gap.

Asynchronous real-time communication tools to use in class. I started using Slack to communicate with my students inside and outside the classroom. I was fed up with the jurassic tools my university provides to communicate with students and I felt I needed a more direct connection. Slack is just perfect. Now I have a specific team for the course and receiving messages from students does not clutter up my (sacred) email inbox anymore.

But the most surprising advantage of Slack is that I can use it to communicate with my students in class! I know … It looks counterintuitive. But the thing is that in class, especially now that everyone is very active, there is often the need for me to: (1) send a link or document of some sort to everyone and (2) ask students to provide information to me asynchronously but in real-time. This second use is particularly useful. I can for instance ask a question and tell students to write an answer in Slack. This way they have time to work on it and I have time to process the results and produce feedback of some sort while they are working.

I am doing something similar with Google Docs: I assign an exercise to groups of students and ask them to produce the results in a Google Doc. As they do that, I can monitor what they are doing and intervene when necessary. For instance, I can identify issues with one group and help them directly, or sometimes I can simply write a comment in the document and they’ll see it coming from me. I will experiment much more with these tools. So far, my experience is extremely positive.

Book suggestion. There is much more I have learned about new/better ways of teaching over the summer. I could go on forever. I just want to conclude by suggesting an amazing book I read which inspired many of the changes I am making in the course. The book is “McKeachie’s Teaching Tips”. There are many good books around, but this one covers a lot in a small space. It’s a fantastic starting point to get the basics on a given aspect related to teaching. It provides an overview and lots of references you can then use to dig deeper if you want. It’s highly recommended.

This concludes my first entry for this semester’s course diary. I’ll write again as soon as I have more to say.

Hope you’ll enjoy it!

Course Diary #3: Beyond Charts: Dynamic Visualization

This is the last lecture of the introductory part of my course where I give a very broad (and admittedly shallow) overview of some key visualization concepts I hope will stick in my students’ heads. After talking about basic charts and high-information graphics I introduce dynamic visualization as visual representations that can change through user interaction.

Here are the lecture slides: Beyond Charts: Dynamic Visualization.

That’s the magic of computer graphics! The visual representation can respond and change according to our actions. Isn’t that great? Yes it is, but what is it for? This is what I asked my students at the beginning of this class. I ask because I have the impression interaction in many visualizations comes as an afterthought: let’s put a little bit of hovering here and a nice animated zoom there. But interaction is an integral part of the well-reasoned choices a designer has to make in order to make a visualization effective, it’s not just an additional layer one can put on top to add a couple of cool functions.

Interaction is the element of a visualization design that allows people to reason about data and that’s the way I presented it in class. It’s only through interaction that you can smoothly go through a long series of loops of: (1) detect something interesting in the data; (2) trigger a question; (3) change the representation in order to answer that question. Here is the (almost embarrassingly simplified) diagram I have used:

Interaction in visualization

Interaction is basically about reasoning with data through many of these intricate loops, not making it cool. Even though admittedly interaction does make visualization cool. But I guess you want to go beyond the coolness factor, right? That’s almost too easy to achieve.

Next, I introduce Donald Norman’s 7 stages of action. The model describes the stages humans go through when they interact with the world to achieve a specific goal. Here is a sketch of the model:

[Figure: sketch of Norman’s 7 stages of action]

The model has been designed to describe things as simple as opening a door or turning up the volume of your speakers but it works equally well with complex user interfaces. The pedagogical value of the model in my opinion is that it makes explicit the fact that interactive visualization is a lot about translation: (1) translating the goals we have in our head into actions and visual search tasks we perform with our hands and eyes and (2) translating (actually decoding and giving a meaning to) the changed visual representation we have in front of us after changing it through our actions. Our role as visualization designers is to make these translations as smooth and natural as possible. Norman calls these critical points “gulf of execution” and “gulf of evaluation”. Easy and effective.

The comments I received after the lecture in our internal forum confirmed that the model does help students wrap their heads around the role of interaction in visualization so I am glad I included it. One student commented: “It is really interesting to see a process, which we all manage, unconsciously broken down to separate steps, where we can surprisingly easily relate those steps to our own experiences.” Another one wrote: “I was really intrigued with Norman’s 7 Stages of Action. It seems like a really logical way to think holistically about interaction design.”

During the rest of my lecture I described this paper: Yi, Ji Soo, et al. “Toward a deeper understanding of the role of interaction in information visualization.” Visualization and Computer Graphics, IEEE Transactions on 13.6 (2007): 1224-1231. This is a super useful paper if you want to learn more about the role of interaction in visualization. The thing I like the most about it is that it describes interaction techniques in terms of “intent” rather than how they are implemented. I like this approach because it abstracts away from the technicalities of the technique and creates a more direct connection between interaction and reasoning. These are the categories:

Mark something as interesting (Select)
Show me something else (Explore)
Show me a different arrangement (Reconfigure)
Show me a different representation (Encode)
Show me more or less detail (Abstract/Elaborate)
Show me something conditionally (Filter)
Show me related items (Connect)
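
To make the "intent rather than implementation" idea concrete, here is a minimal Python sketch. The seven intents and their descriptions come from the list above; the technique names in `TECHNIQUE_INTENTS` and the `uncovered_intents` helper are my own hypothetical examples, not something taken from the paper.

```python
from enum import Enum


class Intent(Enum):
    """The seven intent categories, with the descriptions listed above."""
    SELECT = "Mark something as interesting"
    EXPLORE = "Show me something else"
    RECONFIGURE = "Show me a different arrangement"
    ENCODE = "Show me a different representation"
    ABSTRACT_ELABORATE = "Show me more or less detail"
    FILTER = "Show me something conditionally"
    CONNECT = "Show me related items"


# Hypothetical technique names, chosen for illustration only.
TECHNIQUE_INTENTS = {
    "click to highlight a point": Intent.SELECT,
    "pan the view": Intent.EXPLORE,
    "sort rows by a column": Intent.RECONFIGURE,
    "switch from bars to lines": Intent.ENCODE,
    "hover for details": Intent.ABSTRACT_ELABORATE,
    "range slider on a date": Intent.FILTER,
    "brushing across linked views": Intent.CONNECT,
}


def uncovered_intents(techniques):
    """Return the intents that none of the given techniques support."""
    covered = {TECHNIQUE_INTENTS[t] for t in techniques}
    return [intent for intent in Intent if intent not in covered]


# A design described only by its techniques...
design = ["hover for details", "pan the view"]
# ...can be audited in terms of what the user is still unable to ask for.
for intent in uncovered_intents(design):
    print(f"{intent.name}: {intent.value}")
```

Describing a design this way shifts the question from "which widgets do we have?" to "which kinds of questions can the user not ask yet?".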

If you have never read this paper I suggest you give it a look, it’s a very good read. Another very good read on the same topic is the more recent: Heer, Jeffrey, and Ben Shneiderman. “Interactive dynamics for visual analysis.” Queue 10.2 (2012): 30. That’s a very good one too.

One of my students in the forum raised a question about complexity: by introducing all this interaction don’t we risk making visualization too hard to use and understand? Yes, I think there is a very high risk of making things too complex, and more interaction does increase the need for users to learn how to use the system. It’s wise to adopt a parsimony principle when we talk about interaction in visualization. Cramming twenty different techniques in one system for the sake of it is not going to work. Interaction is a dangerous tool and it must be used with great care. The best is when it blends smoothly into the visual representation and makes important questions easy to answer.

Overall I think we still have a lot to learn about interaction. Most visualizations on the web are static, and most of the interactive ones are either not very well designed or very limited. While little interaction may be necessary for visual data presentations, richer and well-integrated interaction is crucial for analytical reasoning. If we want to help people reason about data and derive useful insights we have to better understand how to support this complex process.

That’s all for now. Thanks for reading.

Course Diary #2: Beyond Charts: High-Information Graphics

Visualization of a million items.

Hi there! We had a one week break at school as the inclement weather forced us to cancel the class last week.

Here are the lecture slides from this class: Beyond Charts: High-Information Graphics.

In this third lecture I have introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display of Quantitative Information. For the first time, I decided to introduce this concept very early on in the course because I noticed students have a very hard time conceptualizing visual representations where lots of information is visible in one single view. In the past I have seen lots of students squeezing million-item data sets into a four-bar bar chart. Literally.

The Aggregation Twitch

I coined the term aggregation twitch hoping my students will remember the concept in the future. The aggregation twitch is the tendency to overaggregate data through summary statistics. When confronted with a data table many think: “how can I reduce this to a few numbers?”. I think Tufte captured the phenomenon just right:

Data-rich designs give context and credibility to statistical evidence. Low-information designs are suspect: what is left out, what is hidden, why are we shown so little?

Then, commenting on what’s the difference between high vs. low information designs:

Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design.

I love this last sentence because, in its simplicity, it suggests some kind of stance or attitude in designing visualization.

In order to make the concept more explicit I presented an example from one of my past students. He was assigned the task to create a visualization from the Aid Data data set, which contains more than a million items and several attributes like donor, recipient, date, purpose, etc. His first implementation was a funny (in some perverse way admittedly) line plot with four lines and a lot of options to decide what data segments to display. I was stunned! But since then I kept thinking about that example and how pervasive this aggregation attitude is.

My students seem to have grasped the concept, even though I regret I did not provide any positive example. I spent quite some time explaining why I think this is a limited way of doing visualization but I forgot to prepare and show counterexamples. Not good.
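
A minimal Python sketch of what the twitch costs, using only the standard library. The data is synthetic and the names (`make_donations`, `summarize`, `histogram`) are hypothetical; this is not the Aid Data set, just two made-up groups whose means are nearly identical while their shapes are completely different.

```python
import random
import statistics


def make_donations(seed=0):
    """Build two synthetic groups with nearly the same mean but different shapes."""
    rng = random.Random(seed)
    steady = [rng.gauss(100, 5) for _ in range(5000)]
    # Half small values, half large values: a two-humped distribution.
    split = ([rng.gauss(50, 5) for _ in range(2500)]
             + [rng.gauss(150, 5) for _ in range(2500)])
    return {"steady": steady, "split": split}


def summarize(values):
    """The aggregation twitch: collapse everything to a single rounded mean."""
    return round(statistics.mean(values))


def histogram(values, lo=0, hi=200, bins=10):
    """Keep more of the data: counts per bin instead of one summary number."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts


data = make_donations()
for name, values in data.items():
    print(name, "mean:", summarize(values), "histogram:", histogram(values))
```

The single number makes the two groups look interchangeable, while even a coarse ten-bin histogram shows one hump versus two. That is the "nowhere to go" problem in Tufte's quote: the histogram can always be reduced to a mean later, but not the other way around.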

The query paradigm and the notion of overview

My student’s example gave me the opportunity to discuss a related problem I often see: relying excessively on data querying. That’s the way most students think about data visualization initially: create one simple chart and provide lots of options to select what statistical aggregates to display. Interestingly, this is the same way most data portals present their data by default; and, by the way, why most have failed to produce anything interesting for many, many years.

The problem with this approach is that there is very limited space for data comparison and rich “graphical inference”, which is exactly what our brain is good for. What many don’t get is that as soon as you change parameters the old chart is not visible anymore and you have to rely on memory rather than perception to relate what you see now to what you saw before. But the very reason why visualization is so powerful, is exactly because the information you need is there in front of you, and can be accessed any time. A concept fantastically expressed by Colin Ware in his book when he writes: “the world is its own memory” [1].

In order to make the distinction clearer I proposed to summarize the concept through this simple dichotomy:

Query paradigm: ask first, then present.
Visualization paradigm: present first, then ask.

The query paradigm forces you to initiate the analysis by thinking about what you want first. The hard way. But visualization, for the most part, works in reverse: you first see what is in the data and then you are kind of forced to ask some questions as you detect interesting patterns you feel compelled to interpret and explain.

At this point one of my students jumped up and said: “no wait a minute … in order to create a data visualization you have to have some kind of question first!”. I fully agree. Visualization should be built with a purpose in mind. I think the difference is more in whether the current design provides an overview over your data set or not. The query paradigm chops data in sealed segments one can see only individually; one at a time. But the visualization paradigm tries to build a whole map of your data and let you navigate through this entire space.
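
The difference between sealed segments and a whole map can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `aid` numbers are made up, and `query`, `overview` and `sparkline` are hypothetical helpers that draw text sparklines rather than real charts (values are assumed to be non-negative).

```python
BLOCKS = " ▁▂▃▄▅▆▇█"


def sparkline(values, top):
    """Render a series as one row of block characters, scaled to the maximum `top`."""
    return "".join(BLOCKS[round(v / top * (len(BLOCKS) - 1))] for v in values)


def query(data, key):
    """Query paradigm: ask first, then present a single segment on its own scale."""
    top = max(data[key])
    return f"{key}: {sparkline(data[key], top)}"


def overview(data):
    """Visualization paradigm: present every segment at once, on a common scale."""
    top = max(max(series) for series in data.values())
    width = max(len(key) for key in data)
    return "\n".join(
        f"{key:<{width}} {sparkline(series, top)}" for key, series in data.items()
    )


# Made-up yearly totals per purpose, purely illustrative.
aid = {
    "health": [2, 3, 5, 8, 9, 7],
    "education": [4, 4, 4, 5, 5, 4],
    "transport": [9, 6, 3, 2, 1, 1],
}

print(query(aid, "health"))  # one segment; the others must be remembered
print(overview(aid))         # all segments visible and comparable at once
```

With `query`, comparing health to transport means running two queries and holding the first result in memory. With `overview`, the comparison is done by the eyes, since both rows sit on the same scale in the same view.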

Note that I am not necessarily claiming one is better than another! There are many great uses of query interfaces. What worries me the most, to be honest, is that the query paradigm is so pervasive that it ends up being the only solution people may consider when approaching visualization problems for the first time.

Where does the aggregation twitch come from?

Why do students have a hard time assimilating these concepts? Why are high-information graphics so foreign to most of them? I think there are at least two main issues at play here:

  1. Underestimation of visual perception. When I work with students, in or out of my class, it always amazes me how fearful they are to make their charts smaller. They fear they will be too hard to see and I keep pushing them to make the damn thing smaller. Much much smaller. The human eye is an incredibly powerful device but it looks like most people do not realize how powerful it is. Probably because we take it for granted. Colin Ware has a nice section in his Information Visualization book on visual acuity [2] which I suggest everyone read. It’s such a fascinating piece of research! For instance, take this: a monitor has about 40 pixels per square inch and the human eye can distinguish line collinearity at a resolution as low as 1/10 of a pixel.
  2. Overestimation of human (short-term) memory. As I said above, most people approach data visualization with a query paradigm: one big chart and a lot of options to decide what to put there. This may work in some cases but it limits enormously the amount of reasoning we can do with it. We humans can hold a very small set of objects in our working memory at any given time, that’s the famous “magical number seven” (tip: it’s actually more complicated than that but it works for this example), therefore when a chart changes, we can no longer relate the previous set to the current one. Visual perception is orders of magnitude more powerful than memory. That’s why visualization shines.

There is actually a third issue which did not occur to me until I presented these ideas in class: visual literacy and familiarity (I started getting obsessed with this issue lately). Most of the fancy visualization techniques we develop are totally unfamiliar to most people out there. Not only do they need to spend time learning how to decode them, but they may also be totally overwhelmed by the information density carried by these pictures. This became totally clear to me when I presented this Treemap in class:

Treemap

One of my students raised his hand with a facial expression between disgust and pain: “Prof., that’s too much information at once, I cannot bear it”. That’s the thing: while some people (me included) seem to take pleasure from looking at the intricate patterns high-information graphics make, some other people just cannot bear it. Question: is that a learned behavior or is it more rooted in individual differences we humans have? I don’t know.

That’s all folks … Now I need to prepare for my next lecture (and a whole bunch of other stuff by the way :))

[1] Ware, Colin. Visual thinking: For design. Morgan Kaufmann, 2010.

[2] Ware, Colin. Information Visualization. Morgan Kaufmann, 2013 (third edition)

Course Diary #1: Basic Charts

Starting from this week and during the rest of the semester I will be writing a new series called “Course Diary” where I report about my experience while teaching Information Visualization to my students at NYU. Teaching them is a lot of fun. They often challenge me with questions and comments which force me to think more deeply about visualization. Here I’ll report about some of my experiences and reflections on the course.

Lecture slides for this class: http://bit.ly/infovis14-l2

In the second lecture of my course (the first was a broad introduction to infovis) I introduced basic charts: bar charts, line charts, scatter plots, and some of their variants. These basic charts give me the opportunity to talk about two important concepts: the relationship between data type and graph type (even though in a somewhat primitive way) and graphical perception.

In order to let students absorb graphical perception I spend a lot of time playing graphical tricks rather than talking about theory (I’ll do that later on extensively). For instance, I show the “barless bar chart”, a bar chart with dots in place of bars:

barless-barchart

But I don’t limit myself to showing these are sub-optimal charts, I invite the students to think about why, and I’ve found this very nicely and naturally introduces broader and more relevant concepts. Let me explain with an example: a line chart without lines.

time-series-dot-plot

It’s easy to argue this does not work well. Especially when you show it paired with a proper line chart. But then you ask: why? Why does it not work as well as the version with lines? I’ve found that students have to stretch their minds and think much more deeply about the issue. Heck I have to think much more deeply myself!

For instance, I realized while discussing this example in class, that a line chart without lines is a very good example of why and when visualization works best: when data understanding is supported by perceptual rather than cognitive processes. A line chart without lines forces us to trace a line between the points. We desperately need that line! It’s not that we don’t use that line at all, it’s more that we draw it in our head rather than seeing it with our eyes. We can still judge the slope and detect patterns of course but it’s much much harder (slower/less accurate)! This simple concept can be applied everywhere in visualization. You get it here, with a simple time line, and you can re-apply it in a thousand different new cases.

Another example I have shown which spurred some interesting discussion is the “colorless divided bar chart”.

colorless-divided-barchart

Once again, this one forces you to think more deeply about graphical perception. A divided bar chart with color is clearly better right? But why? Why is it better? Most students said the reason is that it’s easier to detect which bar is which: red to red, blue to blue, etc. And then I say: yes ok, but why? Why is color helping you here? After all each bar has its own position, and position is a pretty strong visual primitive to encode data (hint: it’s actually the best one). And then I explain that position here is overloaded, that is, it’s used to encode two things at the same time: the groups and the categories within each group, and they get mixed up more easily without color. That’s when everyone nodded in class.
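
The overloading of position can be written down as a small Python sketch. The layout rule, the function names and the example labels are my own assumptions for illustration; real charting tools compute bar positions in their own ways.

```python
def bar_positions(groups, categories, gap=1):
    """Map each (group, category) pair to an x slot in a divided bar chart."""
    positions = {}
    for g, group in enumerate(groups):
        for c, category in enumerate(categories):
            # One number carries two facts: which group and which category.
            positions[(group, category)] = g * (len(categories) + gap) + c
    return positions


def decode(x, n_categories, gap=1):
    """Recover (group index, category index) from an x slot."""
    return divmod(x, n_categories + gap)


# Hypothetical labels: two groups, three categories per group.
pos = bar_positions(["2012", "2013"], ["red", "blue", "green"])
x = pos[("2013", "blue")]
print(x, decode(x, 3))
```

Without color, the reader has to perform the equivalent of `decode` by eye for every bar: count slots, find the group, then find the offset within the group. Color hands the category to a separate channel, so position only has to answer one question at a time.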

At some point I showed four different ways to display a time series (there are many more of course):

time-series-designs

When I showed that, a couple of students raised some interesting questions. One was about the line chart vs. the area chart. The area chart looks pretty good, when is it a good idea to use it? One student suggested the area chart has more contrast than the line chart and this made me think that area charts are probably very good when we have lots of them in a small multiple fashion, as they create a closed shape and as such are easier to compare.

By using this technique of starting from a basic chart and stripping it down of some fundamental design elements I have found I can teach a lot. I almost stumbled into this technique by chance but I think it’s very effective and I will use it over and over again in my course.

Besides dissecting charts, another recurring question I get in class is: how do we judge if a visualization is better than another? That’s a super hard question and I am glad I get it all the time. There’s not enough space here to articulate the answer but there is one thing I stress a lot in class: you cannot judge visualization without specifying a purpose.

I think everyone in this field has a tendency to judge visualizations in absolute terms, without considering their context (I have done that multiple times too). Too many believe data visualization is only about “data + visualization” (hence the name right?), forgetting that visualization without a purpose attached is impossible to judge. And again, basic charts, in all their simplicity, offer several opportunities to expose this concept: divided or stacked bar charts? area or line chart? multiple superimposed time lines or small multiples? There’s no absolute best here.

I have only one last comment from this lecture: Tableau is awesome. Coming up with examples and quickly tweaking them by adding and removing graphical properties saved me hours and hours of time. I was initially tempted to draw these examples on my whiteboard and take pictures, then I tried Tableau and it made me smile. A big smile. This also makes me think that Tableau, besides being a great analytic and presentation tool, can also be used as an excellent didactic tool.

That’s all for this week. Wish me luck for my next class!