Developing a “Data Sensemaking” Course

As we approach the end of the Fall semester, I start thinking about the new course I'll teach in Spring. I have decided to design a new course I am really excited about. Its name is “Data Sensemaking,” and in it I will teach my students how to derive knowledge from data using interactive visualization and data processing tools. I am writing this blog post to share some of the ideas I am “brewing” and, hopefully, to get some feedback from you that will help me perfect my plans.

What is “Data Sensemaking”?

The main purpose of the course is to teach students how to use a variety of exploratory data analysis methods to extract information and hypotheses from data. Some people call this Exploratory Data Analysis (EDA), but I prefer “Data Sensemaking” because EDA carries somewhat heavy baggage from years-old statistical debates. I also like the name “data sensemaking” because it refers more to the outcome than to the process.

Why a course on “Data Sensemaking”?

First, because after teaching Information Visualization for a number of years I have realized that students lack the very basic knowledge needed to reason with data, and this is what matters most today. It does not matter how cool a given visualization tool or solution is; ultimately, we need people to extract valuable and possibly actionable information from data.

Second, because people are fixated on complex statistical models and machine learning (which are mostly applied blindly to data problems) when in fact what we need to perfect is our ability to reason with data. There is no ready-made recipe for this, and that is why I find teaching this topic fascinating and enriching. Our brain and rational thinking are the biggest assets we have, and I am appalled at how little information and how few training programs exist to develop our ability to reason effectively with data.

Third, because I want to learn more myself, and the best way to learn that I know of is to teach. While there are some data sensemaking skills I have been honing for several years, there are others I feel I need to perfect and develop much, much further.

What will I teach in the course?

This is still a work in progress, but I have a tentative set of topics in mind that I'd like to share with you to see what you think. Here is my current set of topics:

  1. Defining (worthwhile) data problems. When is a data problem a good problem? How does one go about defining its requirements and constraints?
  2. Asking good and effective questions. How can one systematically transform a foggy data problem into a set of well-defined analytical questions? What makes a question good?
  3. Finding, generating, and manipulating data. How does one find or generate data useful for solving the stated problem? What kinds of data manipulation and integration are needed? How does one know whether the data are good enough?
  4. Exploring data to generate answers and questions. How does one explore data with interactive visualizations to investigate the stated questions? How does one deal with the new questions and ideas that inevitably arise during analysis? (A minimal sketch of this exploratory loop appears after this list.)
  5. Biases, lies, and data malpractice. How does one avoid being fooled by data and visualization? What are the major traps out there? What is the best mindset for “data investigators”?
  6. Creating effective data narratives. Once the analysis is done, how does one organize and re-design the results to communicate them in a way that people are willing to listen to or read, and actually understand the message?
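
To make topic 4 concrete, here is a minimal sketch of the kind of exploratory loop I have in mind, written in R with the ggplot2 package and its built-in mpg dataset; the data and the questions are placeholders, not course material:

```r
# A toy exploratory loop: summarize first, plot a first question,
# then follow up on whatever the plot suggests.
library(ggplot2)

# Step 1: get a feel for the data (ggplot2 ships with the mpg dataset).
summary(mpg$hwy)
table(mpg$class)

# Step 2: a first question: how does engine size relate to mileage?
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")

# Step 3: a follow-up question the first plot inevitably raises:
# do the 2-seaters break the overall trend?
ggplot(subset(mpg, class == "2seater"), aes(x = displ, y = hwy)) +
  geom_point()
```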

Major doubts/hurdles.

Looking at the list above, there are a few elements I am still uncertain about:

  • What makes a data problem a good problem? How do I teach students what a good problem is? We are so used to thinking about solutions and so little about problems, but in the end generating worthwhile problems is the most important thing to do. Do you know of any resources on how to pursue and define good problems?
  • What statistical fallacies, biases, etc., should I focus on? There is an endless list of biases and fallacies humans are prone to. How do I prioritize them? What is the core set of concepts and examples I should focus on? How can I cover thinking, statistical, and visualization fallacies in a balanced way?
  • How do I teach the narrative side of visualization? I know very well how to teach which chart is “right” for a given communication task, but this is only a fraction of what one needs to learn to communicate effectively. The narrative style, the sequencing, and the interrelation between text and graphics are also very important. How do I teach this? Is there any book or resource I can use to cover these aspects?

What else?

What else should I teach? If you were one of my students, what else would you like me to teach you? Note that I have omitted the technical side here, but I will be teaching some practical tools: I plan to focus mostly on Rstat and Tableau. I also plan to have each student set up a blog in which they create a number of data sensemaking mini-projects.

Let me know what you think!!!

5 thoughts on “Developing a “Data Sensemaking” Course”

  1. Richard Dunks

    This is a great way to frame the concept of making sense of data. It’s a lot better than the overly complex notion of systematic deconstruction, which is what I associate with EDA.

    As for your questions: in my classes (http://training.datapolitan.com), I try to focus on questions first. I brainstorm questions among the participants in a value-free way that surfaces a wide range of potential questions, and then challenge each participant to pursue the question that is most interesting to them. That way, I'm not imposing an expert's prejudice about what makes a “good” question, and I encourage an iterative process of asking one question that leads to others, and so on down the line. This more closely models what I think makes for good analysis, namely the pursuit of understanding through the lens of simple curiosity.

    I think the key biases I work with in class (and professionally in my work) are the expertise bias (https://www.psychologytoday.com/blog/everybody-is-stupid-except-you/201008/the-expertise-bias) and confirmation bias (https://www.sciencedaily.com/terms/confirmation_bias.htm), though these are by no means the only ones we have to be mindful of (https://en.wikipedia.org/wiki/List_of_cognitive_biases). Both come into play in the analysis as well as in the visualization design. When I fall into them, I'm much less likely to ask the key questions for understanding, or to communicate my analysis clearly and concisely.

    As for teaching the narrative side of data, I think it's challenging, and I see it less as instruction and more as facilitation. I'm constantly challenging my students to articulate what story the data tell. I ask them to think for themselves and to practice the art of data storytelling, first through example, then through guided practice and reflection with their peers. I let the peer reflection happen first and add my own afterwards, so the discussion doesn't center on my perspective and biases. I'm a big believer that more is learned from failure than from success, so I give them examples of both and develop their skill in articulating a meaningful critique of what they encounter. My hope is that they can transfer a critique of others' work into a critique of their own, in order to improve what they do.

    As for what to teach, I think there is a lot of value in using data that students can see themselves in, whether it's NYC Open Data (like 311 complaints) or population data; anything that resonates with them and their interests helps motivate self-discovery and learning.

    Thanks for sharing your thoughts. Seeing your example of thoughtful reflection on your own work helps people like me improve what we do.

    1. Enrico Bertini (post author)

      Thanks so much, Richard! What an amazing set of tips! I hope we'll have the opportunity to talk more in person sometime. If you feel like sharing some of your material with me, I would be really grateful, especially if you have exercises and datasets to assign during in-class workshops.

  2. Bill Saltmarsh

    I love that you’re doing this. It is highly needed, on a wide scale. Regarding biases and fallacies to focus on: I sometimes give a talk called ‘Dysfunctional Data’, in which I try to address some of the common misconceptions and false presumptions that analytics can bring. Here are a few items I touch on: correlation vs. causation, the elusive nature of determining causality, Simpson’s paradox, how to determine the best ‘average’ for your dataset, and the understated role of randomness in a dataset. Hope that helps a little. Good luck!
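
    To make Simpson’s paradox concrete, here is a minimal sketch in R on made-up numbers (invented purely for illustration): the trend within each group is clearly negative, yet pooling the groups flips it to positive.

    ```r
    # Simpson's paradox on synthetic data: the within-group trend is
    # negative, but pooling the two groups makes the trend positive,
    # because group B has both larger x and larger y on average.
    set.seed(42)
    n <- 100
    group_a <- data.frame(group = "A", x = rnorm(n, mean = 2))
    group_a$y <- 1 - 0.8 * group_a$x + rnorm(n, sd = 0.3)
    group_b <- data.frame(group = "B", x = rnorm(n, mean = 6))
    group_b$y <- 6 - 0.8 * group_b$x + rnorm(n, sd = 0.3)
    pooled <- rbind(group_a, group_b)

    cor(group_a$x, group_a$y)  # strongly negative within group A
    cor(group_b$x, group_b$y)  # strongly negative within group B
    cor(pooled$x, pooled$y)    # positive once the groups are pooled
    ```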

    1. Enrico Bertini (post author)

      Thanks so much, Bill! Do you have anything I can read or watch on “Dysfunctional Data”?

