Yesterday I stumbled upon an excellent, recently published visual analytics case study: the Circle Line Rogue Train Case. The article describes how data scientists at GovTech’s Data Science Division in Singapore used visualization to discover the origin of recurring and mysterious disruptions on Singapore’s MRT Circle Line.
This study represents my ideal case for data visualization: the visual exploration of a given data set to solve an important problem somebody has.
Why is it so rare to read about problem-solving visualization?
Besides deeply enjoying the story and the data analysis behind it, this study got me thinking: why is it so rare to read about this kind of project? Is it because not many people actually carry out this kind of work? Or maybe because nobody dares to write about it? Or maybe such projects are out there but we never notice them?
Beyond hedonistic and narrative visualization.
When I look at the data visualization landscape, I see that the vast majority of projects fall into two main categories:
- Hedonistic Visualization: showing how cool something is.
- Narrative Visualization: supporting a narrative (often journalistic, sometimes scientific).
Before I move on, let me clarify that I have nothing against these two categories. I am myself as “guilty” as anyone else of being a purveyor of these two kinds of visualization. They are at the root of the enormous success visualization has experienced in recent years.
Yet, when I look at examples of “problem-solving visualization” I can’t help thinking: “Yes! That’s what really matters to me!”, “That’s what I want to see people doing more”, and “That’s what I want to teach my students how to do”.
The limits of Data Journalism.
As much as I love data journalism and consider it a cornerstone of an era in which people can possibly use reason to discuss the issues affecting our society, I do believe it is limited. The biggest limit of data journalism resides in its “storytelling bent”. Journalists want to tell you a story and they need to entertain you. This, in turn, poses some important problems. First, it’s hard to analyze data without thinking of the story one has in mind. Second, not all important problems out there are journalistic problems. Many are practical issues people have that can hopefully be solved by analyzing some data. We don’t see much of that kind of work, but virtually every organization needs it.
The limits of Scientific Communication.
It may seem weird at first sight, but I do believe scientists suffer from the same problem journalists have, probably even more so: they want to show you their scientific story. I know many will read this as a heresy, but it’s pretty much a matter of fact. This becomes even clearer when one considers the cornerstone of scientific endeavors: “hypothesis testing”. Scientists don’t have a neutral stance: they want to demonstrate the hypothesis they generated before carrying out the science, and they have a strong incentive to convince you they are right. But I am digressing …
Data Science as “Problem-Solving” Visualization.
With the advent of Data Science it has become imperative for us to figure out how to extract actionable information out of data. I must confess I am concerned about equating Data Science with Machine Learning and automation. Not because I do not believe they are important or worth pursuing, but because an excessive focus on automation may prevent us from working on the important skill of augmenting humans in ways that help them reason better about important issues using data. This is a crucial skill we have to cherish and develop further. I really hope Data Science programs throughout the country will emphasize this aspect more. As the “Rogue Train” case demonstrates, there are cases in the world where the solution really needs to come from a human working on the problem.
Towards a better pedagogy of “thinking with data”.
I want to conclude by saying that I truly believe we, as a community, have to develop the right curricula and infrastructure to teach these crucial skills to aspiring data scientists. It may be less fancy than learning how to implement the latest deep learning technology, but it is no less important.
I, for one, am trying to add a little piece to this puzzle. Next semester I will be teaching a new course at NYU called “Data Sensemaking”. My goal there is to teach students how to carry out projects like the “Rogue Train” case.
I have to confess I don’t know how to teach it. It’s going to be a big challenge! In any case, I will be reporting on my experience here as I make my way through the weeds.
What do you think? Am I making sense here? Did I miss anything important? I’d be happy to hear …