Book: Statistics as Principled Argument

by Enrico on January 9, 2015

in Reviews

I just started reading Statistics as Principled Argument and I could not resist writing something about it because, simply stated, it’s awesome.

The reason I am so excited is that this is probably the first stats book I have found that focuses exclusively on the narrative and rhetorical side of statistics.

Abelson makes explicit what most people don’t seem to see, or be willing to admit: no matter how rigorous your data collection and analysis are (and, by the way, it’s very hard to be rigorous in the first place), every conclusion you draw from data is full of rhetoric.

I think this is a super important message not only for those who produce stories or arguments based on data, from scientists to journalists, but above all for the population at large. Too often people are impressed by the aura of scientific rigor and objectivity that numbers and technology provide. There is no such thing as total neutrality and objectivity. There are credible and not-so-credible arguments.

Here are a few sentences extracted from the book I’d like to share:

“… the presentation of the inferences drawn from statistical analysis importantly involves rhetoric” …

and then on the narrative role of stats:

“Beyond its rhetorical function, statistical analysis has a narrative role. Meaningful research tells a story with some point to it, and statistics can sharpen the story.”

and on interestingness:

“I have been led to consider what kind of claims a statistical story can make, and what makes a claim interesting. Interestingness seems to have to do with changing the audience’s belief about important relationships, often by articulating circumstances in which obvious explanations of things break down.”

and on the purpose of statistics:

“I have arrived at the theme that the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric.”

and then he brilliantly warns us that this does not mean we can do without the rigor of numbers and stats:

“The word principled is crucial. Just because rhetoric is unavoidable, indeed acceptable, in statistical presentations it does not mean that you should say anything you please.”

It looks like a great book everyone should read. I am on chapter four.

p.s. Thank you very much Alberto for suggesting the book to me in the first place and Stefania for reminding me.


Had a fantastic visit at ProPublica yesterday (thanks Alberto for inviting me and Scott for having me, you have an awesome team!) and we discussed lots of interesting things at the intersection of data visualization, literacy, statistics, journalism, etc. But there is one thing that really caught my attention. Lena very patiently (thanks Lena!) showed me some of the nice visualizations she created and then asked:

How do you evaluate visualization?

How do you know if you have done things right?

Heck! This is the kind of question I should be able to answer. I did have some suggestions for her, yet I realize there are no established methodologies. This comes as a bit of a surprise to me, as I have been organizing the BELIV Workshop on Visualization Evaluation for a long time and I have been running user studies myself for quite some time now.

Yet, when we are confronted with the task of evaluating visualization for communication purposes and for a wide audience, what is the best way to go? I am not aware of established practices or methodologies that address this problem. Traditionally, academic work has focused more on exploratory data analysis problems conducted by experts, or on very narrow experimental work on graphical perception.

But let’s see what the main issues and options are …

1) Expert Review or User Study? This is a classic problem in usability evaluation. Should we ask an expert to look at our visualization and give suggestions on how to improve it, or involve users and see how they perform? Both are valid, and not necessarily mutually exclusive, options. Typically, expert reviews are less costly, and as such they are used in the early phases of the development process to iterate fast on the design. User studies involve a (hopefully) representative sample of people who get exposed to the visualization, plus some sort of qualitative or quantitative data collection about their experience. The unique problem of visualization, as opposed to the more generic problem of user-interface design, is that there are not many experts out there. Plus, the experts do not use an established methodology, so the whole process does not scale. But if you want to run user studies, your life does not get easier. User studies are a huge mess, and if you don’t have experience running them you can do lots of things wrong. Very wrong.

2) Representative Sample? Assuming you want to run a user study, what is a representative sample? Once again, I think visualization poses unique challenges here. The problem is that visual literacy is quite low in the population, so it’s not clear what you should shoot for. If you want to communicate to the layman you might end up not using visuals at all! But at the same time, if every agency out there plays it safe we won’t see any progress, and we cannot expect visual literacy to increase. It’s a catch-22: if we don’t use advanced graphics people don’t learn, but if we do use them our audience might not be able to read our message. So we are left with the question of what a representative sample is. I think the main question here is: representative of what? One way to go is to create a profile and recruit people with this profile, or try to cover a whole spectrum of profiles, which of course might be much more costly and time-consuming.

3) Data Collection / Benchmark Tasks or What? OK, now we have a representative sample of our readers; how do we test our visualization with them? One might try to adopt established methods from usability evaluation, but the problem here is that usability evaluation is mostly based on the concept of a “task”: I show my study participants my interface and ask them to do something with it. Is that a good method for vis? I am not sure. Communication-oriented visualization is not really about performing a specific task to achieve a well-defined goal. Visualization is more about information transfer. How do we measure information transfer? Maybe we show the visualizations first and then ask questions afterwards to see what information people have retained? That’s a viable way, but it does not capture the visualization process itself, that is, what and how the user thinks during his or her interaction with the visuals. Another way to go is to use a “think-aloud” protocol: you sit next to your users and ask them to vocalize what they are thinking. This way you have direct experience of what is going on. But, once again, this is easier said than done, as the way you interact with your participants (what you ask them, when, and how) can heavily influence the outcome. So you have to be very careful there too.

There are probably many, many more issues here, but the common thread seems to be that while there are established methods and methodologies one may be able to adopt from traditional usability testing, visualization poses some unique challenges that are not solved yet.

On a side note, we also discussed the use of crowdsourcing platforms like Amazon Mechanical Turk to evaluate visualization. This is another viable way. It may solve the sampling problem, but it does not solve the others. Actually, the others get even more complicated when you have limited interaction with your target population.

And you? Do you have any experience doing evaluation in this area? Are there other important issues or solutions worth mentioning?

Once again thanks Scott, Alberto and Lena for the inspiring discussion that triggered this blog post. There is so much more work that needs to be done!

Take care.


I could not resist writing this short blog post after having such a nice conversation with Scott Davidoff yesterday. Scott is a manager of the Human Interfaces Group at NASA JPL, and he leads a group of people who take care of big data problems at NASA (I mean big big data, like the data coming from telescopes and missions).

While on the phone he said:

You know Enrico … the way I see it is that we are mechanics for scientists … the same way Formula 1 has mechanics for their cars.

What a brilliant metaphor! Irresistible. It matches my philosophy perfectly and, at the same time, sorry to say, I think it does not match very well the way most people see vis right now.

It reminds me of the brilliant “The Computer Scientist as Toolsmith”, the fantastic essay written by Fred Brooks (ACM Turing Award winner), which I adopted a long time ago as my personal manifesto. Fred Brooks advocated a different way to see the role of Computer Science (one I am sure many of my colleagues refuse): as an engineering discipline whose purpose is to provide services to scientists. He famously stated that:

IA > AI (Intelligence Amplification can beat Artificial Intelligence).

That is, a machine and a mind can beat a mind-imitating machine working by itself.

And this all reminds me why I do what I do and why I think we should do more. Much more. In 2011 I was invited to Visualizing Europe, an event organized by Visualizing.org, and I gave a talk that pretty much covered the same ground: “Data Visualization is NOT Useful. It’s Indispensable”.

Talking with Scott, once again I realized how many people out there need our help. These are the people who may discover the next cure for cancer, help us get to Mars, find a way to preserve our planet, or prevent terrorist attacks and disasters, just to name a few. You may think these people already have the necessary knowledge, means, and skills to tackle big data problems on their own, but you are wrong. These people are busy with their science, and for a good reason!

All these people need us! Let me repeat it: all these people need us! It’s up to us to show them what they can do with our tools and skills. Most of them simply do not imagine how powerful some of the things we do may be for them.

Let me tell you one thing: I have collaborated with a few scientists in my career so far, and they love it when we make their life easier. Often they are blown away by simple tricks we take for granted.

So if you are passionate about data and data visualization, I urge you to think about this: you can decide to tackle hard problems with data. You can decide to make a big difference by pairing up with people who deal with hard scientific problems and helping them make progress. It’s up to you to make this choice.

C’mon!

My biggest ambition is to be a mechanic. A mechanic for the Formula 1 of science.

And you?

 

{ 7 comments }

… or whatever we want to call it.

Yin Shanyang writes on Twitter in response to my last post on visualization as a bidirectional channel:

[Screenshot of Yin’s tweet]

This comment really hits a nerve, as I have been thinking about this issue quite a lot lately. I must confess I am no longer satisfied with the word “visualization”. And I am even less satisfied by all the other paraphernalia people like to use: data visualization, interactive visualization, information visualization, visual analytics, infographics, etc.

The reason is that I don’t think any of these words describes well the work that I and many other people do. While “visualization” seems appropriate when the main purpose is data presentation, I don’t think it captures the value of visualization when it is used as a data sensemaking tool.

When used for this purpose, interaction is crucial. Analysis looks more like a continuous loop through these steps:

  1. specify to the computer what you want to see and how (the specific visual representation)
  2. detect patterns, interpret the results and generate questions
  3. ask the computer to change the data and/or the visualization to accommodate the new question(s)
  4. assess the results … repeat …
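To make the loop concrete, here is a minimal sketch of it in plain Python, with a grouped-mean “view” standing in for the visual representation. Everything here (the data, the categories, the helper name `view_spec`) is made up for illustration:

```python
from statistics import mean
from collections import defaultdict

# Hypothetical data: (category, value) records
records = [("A", 3), ("B", 14), ("C", 5), ("A", 4), ("B", 16), ("C", 6)]

def view_spec(rows):
    """Step 1: specify what to see and how -- here, mean value per category."""
    groups = defaultdict(list)
    for cat, val in rows:
        groups[cat].append(val)
    return {cat: mean(vals) for cat, vals in groups.items()}

# Step 2: detect a pattern and generate a question -- one category dominates;
# is it hiding what happens in the others?
view = view_spec(records)
dominant = max(view, key=view.get)

# Step 3: change the data/visualization to accommodate the new question:
# exclude the dominant category and re-render
refined = view_spec([r for r in records if r[0] != dominant])

# Step 4: assess the results ... repeat ...
print(dominant)  # B
print(refined)   # {'A': 3.5, 'C': 5.5}
```

The point of the sketch is that the interesting part is not either view in isolation but the transition between them, driven by a question the first view generated.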

“Analytical discourse” is a term I saw used in the visual analytics agenda a few years back, and I think it captures this concept very well. It is all about the interplay and discourse between the machine and the human. This is what many of us are after, and I am not sure the term “visualization” is able to express this concept in its entirety. The value of these tools is not exclusively in the visual representation; interaction plays a major role.

This became even more apparent to me while teaching my InfoVis course this semester. I teach a lot of things about visual representation, but when students get down to building software for their projects, what they are really working on is a fully-fledged user interface. They have multiple linked views, search boxes, dynamic query sliders and all the rest. It’s interactive user interface design they end up doing, not visualization. And user interface design carries a lot of additional challenges that go beyond visual representation. Sure, designing the appropriate representation is still very important, but many other choices impact the final result.

For instance, all my students’ projects have multiple interactive views (maybe sometimes just a main visualization, a list of terms, and a couple of query sliders for dynamic filtering), but what do you call that? I call it visualization, but in practice it’s a complex user interface. Or a “data interface”, as suggested by Yin.

One last note. While thinking about this whole idea I recalled that Jeff Heer‘s lab at UW is called the Interactive Data Lab, and I think he’s got it right. Interaction with the data is the main thing; visualization is the medium we use to create part of this interaction.

What do you think? Too heretical? Too much of a hassle?


I am preparing a presentation for a talk I am giving next week and I have a slide I always use at the beginning that asks this question:

How do we get information from the computer into our heads?

This works as a motivation to introduce the idea that, regardless of the data-crunching power we are going to produce in the future, the real bottleneck, in many applications, will always be the human mind. Getting information across from what our computers accumulate and generate to our heads, and being able to understand it, is the real challenge. Visualization is the tool we use to deal with this problem. By using effective visual representations of data we tap into the power of the human brain, with all its incredible abilities we have not yet been able to reproduce and synthesize in a machine (I leave the discussion of whether this is possible, or even desirable, to others).

When I present this slide I normally quote the great Fred Brooks’ The Computer Scientist as Toolsmith and add this image from the paper:

[Figure from the paper]

But today, for the first time, I realized that when we talk about visualization we always talk about it as a one-way channel, from the computer (or other media) to the human, when in fact there is a lot of knowledge flowing from the human to the machine.

When we use an interactive visualization tool we decide which data segments we want to attend to (think of how Tableau works). This is driven by our knowledge and questions, which we implicitly use to make choices about what to visualize next and how. When we use dynamic queries we use our knowledge to tell the computer that we are interested in a specific segment of the data and that we want to see it now.

There is a simple but effective function in Tableau that I love, and it is a good example of what I am trying to say here: the “exclude” function, which allows you to remove a data item from the visualization completely because it is not interesting or is just annoying. When we do that, we are transferring our specific knowledge to the computer, telling it that we don’t need to see that data point anymore.
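A tiny sketch makes the knowledge-transfer point clear. This is hypothetical code, not Tableau’s API: every exclusion the user makes becomes a persistent filter, that is, a small piece of the user’s knowledge the tool now holds on to and applies to every future rendering:

```python
# The tool's accumulated knowledge: everything the user has excluded so far
excluded = set()

def exclude(item):
    """The user says: 'I don't need to see that data point anymore.'"""
    excluded.add(item)

def render(data):
    """Re-render the view with everything the user has excluded removed."""
    return [d for d in data if d not in excluded]

data = ["outlier", "A", "B", "C"]
exclude("outlier")
print(render(data))  # ['A', 'B', 'C']
```

The `excluded` set is the interesting part: it persists across renderings, so the state of the tool now encodes a judgment that only the human could make.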

All in all, it seems to boil down to interaction, and how it is the only way to translate our intentions into instructions our computers can interpret. What I really want to say is that we tend to forget how powerful this channel is, and how limited it is to think about visualization exclusively as a one-way communication tool. Sure, we can keep considering visualization this way, but I think it’s much more exciting to think about it as a “visual thinking tool” where information flows in both directions.

And I think there is even more than that. While interaction in visualization is currently limited to giving instructions about what to see next, nothing prevents interaction from being used as a tool to transfer pieces of human knowledge directly to the computer. Classic examples where this has been attempted in machine learning and related fields are relevance feedback mechanisms and active learning. Both techniques rest on the idea of asking a human to judge a decision made by the computer and using the answer to improve the computation. This is only one example, but I think there are many unexplored ways to feed our knowledge back into the computer to make it smarter, and I think visualization should play a much larger role there.
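To see what an active-learning loop looks like in miniature, here is a toy sketch using uncertainty sampling. Everything in it is made up for illustration: a one-dimensional threshold “model” on synthetic data whose true boundary sits at 50, with the “human” simulated by the hidden labels:

```python
# Toy active-learning loop: the machine asks the human to label the examples
# it is least sure about, and each answer transfers a piece of human
# knowledge back into the model.

pool = [(x, x >= 50) for x in range(100)]  # (feature, hidden true label)
labeled = [(10, False), (90, True)]        # two seed labels from the human

def fit(labeled):
    """Place the threshold midway between the highest False and lowest True."""
    lo = max(x for x, y in labeled if not y)
    hi = min(x for x, y in labeled if y)
    return (lo + hi) / 2

for _ in range(6):
    threshold = fit(labeled)
    # Query the unlabeled point closest to the decision boundary, i.e. the
    # one the current model is most uncertain about
    x, true_y = min(pool, key=lambda p: abs(p[0] - threshold))
    pool.remove((x, true_y))
    labeled.append((x, true_y))            # the "human" answers the query
print(fit(labeled))  # narrows in on the true boundary near 50
```

With only six queries the model closes in on the boundary, because each question is chosen exactly where the human’s answer is most informative; that is the same economy of human effort that relevance feedback aims for.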

That’s all for now. Thoughts?


My (stupid) fear we may, one day, become irrelevant

April 1, 2014

[Be warned: this is me in a somewhat depressive state after the deep stress I have endured by submitting too many papers to VIS ’14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel […]


Course Diary #3: Beyond Charts: Dynamic Visualization

March 7, 2014

This is the last lecture of the introductory part of my course, where I give a very broad (and admittedly shallow) overview of some key visualization concepts I hope will stick in my students’ heads. After talking about basic charts and high-information graphics I introduce dynamic visualization as visual representations that can change through user […]


Course Diary #2: Beyond Charts: High-Information Graphics

February 28, 2014

Hi there! We had a one-week break at school as the inclement weather forced us to cancel the class last week. Here are the lecture slides from this class: Beyond Charts: High-Information Graphics. In this third lecture I introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display […]


Course Diary #1: Basic Charts

February 10, 2014

Starting this week, and continuing for the rest of the semester, I will be writing a new series called “Course Diary” where I report on my experience teaching Information Visualization to my students at NYU. Teaching them is a lot of fun. They often challenge me with questions and comments which force me […]


The Role of Algorithms in Data Visualization

January 28, 2014

It’s somewhat surprising to me to notice how little we discuss the more technical side of data visualization. I often say that visualization is something that “happens in your head”, to emphasize the role of perception and cognition and to explain why it is so hard to evaluate visualization. Yet, visualization happens a […]
