As we approach the end of the Fall semester, I start thinking about the new course I’ll have to teach in Spring. This time I decided to design and teach a new course I am really excited about. Its name is “Data Sensemaking” and I will be teaching my students how to derive knowledge out of data using interactive visualization and data processing tools. I am writing this blog post to share some of the ideas I am “brewing” and, hopefully, to get some feedback from you that will help me perfect my plans.
What is “Data Sensemaking”?
The main purpose of the course is to teach students how to use a variety of exploratory data analysis methods to extract information and hypotheses from data. Some people call this Exploratory Data Analysis (EDA), but I prefer to use Data Sensemaking because EDA carries somewhat heavy baggage from years-old statistical debates. I also like the name “data sensemaking” because it refers more to the outcome than the process.
Why a course on “Data Sensemaking?”
First, because after teaching Information Visualization for a number of years I realized that students lack the very basic knowledge needed to reason with data, and this is what matters the most today. It does not matter how cool a given visualization tool or solution is; ultimately we need people to extract valuable and possibly actionable information out of data.
Second, because people are fixated on complex statistical models and machine learning (which are mostly applied blindly to data problems) when in fact what we need to perfect is our ability to reason with data. There is no ready-made recipe for this, and because of that I find teaching this topic fascinating and enriching. Our brains and rational thinking are the biggest assets we have, and I am appalled at how little information and how few training programs exist to develop our ability to reason effectively with data.
Third, because I want to learn more myself and the best way to learn I know is to teach. While there are some skills in data sensemaking that I have been honing for several years, there are others I feel I need to perfect and develop much further.
What will I teach in the course?
This is still a work in progress, but I have a tentative set of topics in mind that I’d like to share with you to see what you think of them. Here is my current list:
- Defining (worthwhile) data problems. When is a data problem a good problem? How does one go about defining its requirements and constraints?
- Asking good and effective questions. How can one systematically transform a foggy data problem into a set of well defined analytical questions? What is a good question?
- Finding, generating, and manipulating data. How does one find or generate data useful towards solving the stated problem? What kind of data manipulation and integration are needed? How does one know if the data are good enough?
- Exploring data to generate answers and questions. How does one explore data with interactive data visualizations to investigate the stated questions? How does one deal with the new questions/ideas that inevitably arise during data analysis?
- Biases, lies and data malpractice. How does one avoid being fooled by data and visualization? What are the major traps out there? What is the best mindset for “data investigators”?
- Creating effective data narratives. Once the analysis is ready, how do you organize and re-design the results to communicate them in a way that people are willing to listen/read and actually understand your message?
When looking at the list above there are a few elements I am still uncertain about:
- What makes a data problem a good problem? How do I teach students what a good problem is? We are so used to thinking about solutions and so little about problems. But in the end, generating worthwhile problems is the most important thing to do. Do you know of any resources on how to pursue and define good problems?
- What statistical fallacies, biases, etc., should I focus on? There is an endless list of biases and fallacies humans are prone to. How do I prioritize them? What is the core set of concepts and examples I should focus on? How can I give even coverage to fallacies of thinking, statistics and visualization?
- How do I teach the narrative side of visualization? I know very well how to teach which chart is “right” for a given communication task. But this is only a fraction of what one needs to learn to communicate effectively. The narrative style, the sequence and the interrelation between text and graphics are also very important. How do I teach this? Is there any book or resource I can use to cover these aspects?
What else should I teach? If you were one of my students, what else would you like me to teach you? Note that I omitted the technical side, but I will be teaching some practical tools as well. I plan to focus mostly on Rstat and Tableau. I also plan to have each student set up a blog on which they will publish a number of data sensemaking mini-projects.
Let me know what you think!!!