Smart Visualization Annotation

There are three research papers which have drawn my attention lately. They all deal with automatic annotation of data visualizations, that is, adding labels to the visualization automatically.

It seems to me that annotations, as an integral part of a visualization design, have received relatively little attention in comparison to other components of a visual representation (shapes, layouts, colors, etc.). A quick check of the books on my bookshelf supports my hypothesis. The only exception I found is Colin Ware’s Information Visualization book, which has a whole section on “Linking Text with Graphical Elements”. This is weird because, think about it, text is the most powerful means we have to bridge the semantic gap between the visual representation and its interpretation. With text we can clarify, explain, give meaning, etc.

Smart annotation is an interesting area of research because not only can it reduce the burden of manually annotating a visualization, it can also reveal interesting patterns and trends we might not know about or, worse, that might go unnoticed. Here are the three papers.

Paper #1: “Just-in-time annotation of clusters, outliers, and trends in point-based data visualizations.” Kandogan, Eser. Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on. IEEE, 2012.


This annotation technique works on point-based visualizations. The system detects trends automatically by analyzing the visual information displayed on the screen (that is, patterns are detected in the visual space, not the data space) and tries to find a description for the observed trends. Once a description is found, the system overlays labels that convey this information. So, for instance, in the image above the algorithm finds visual clusters (groupings) and annotates them with the data values that best explain the trend (data dimensions and values that have a unique distribution in the cluster). The paper does not focus only on clusters: it provides techniques to annotate trends and outliers as well, and it describes the whole framework in a way that makes it easy to imagine how it could be extended to other domains and visualizations.
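To make the labeling step more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it assumes the clusters have already been found (the paper detects them in the visual space), and it uses a simple heuristic of my own, picking the attribute value whose share inside the cluster most exceeds its share in the whole dataset. The function name `describe_clusters`, the `min_share` parameter and the toy car records are all hypothetical.

```python
from collections import Counter

def describe_clusters(records, cluster_ids, min_share=0.5):
    """For each cluster, pick the (attribute, value) pair that is most
    over-represented inside the cluster compared to the whole dataset.
    Illustrative heuristic only, not the method from the paper."""
    n = len(records)
    overall = Counter((k, v) for r in records for k, v in r.items())
    labels = {}
    for cid in sorted(set(cluster_ids)):
        members = [r for r, c in zip(records, cluster_ids) if c == cid]
        inside = Counter((k, v) for r in members for k, v in r.items())
        best, best_gain = None, 0.0
        for pair, count in inside.items():
            share_in = count / len(members)   # how common inside the cluster
            share_all = overall[pair] / n     # how common overall
            gain = share_in - share_all
            if share_in >= min_share and gain > best_gain:
                best, best_gain = pair, gain
        labels[cid] = best  # None if nothing stands out
    return labels

# Toy data: cluster membership is assumed to be given.
records = [
    {"origin": "US", "cylinders": 8},
    {"origin": "US", "cylinders": 8},
    {"origin": "US", "cylinders": 6},
    {"origin": "Japan", "cylinders": 4},
    {"origin": "Japan", "cylinders": 4},
    {"origin": "Europe", "cylinders": 4},
]
cluster_ids = [0, 0, 0, 1, 1, 1]
labels = describe_clusters(records, cluster_ids)
```

With this toy data the first cluster gets labeled with its origin and the second with its cylinder count, which is the kind of "what makes this group special" label the paper overlays on the chart.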

Paper #2: “Contextifier: Automatic Generation of Annotated Stock Visualizations.” Hullman, Jessica, Nicholas Diakopoulos, and Eytan Adar. ACM Conference on Human Factors in Computing Systems (CHI). May, 2013.

Contextifier automatically annotates stock market timelines (like the one shown above) by automatically discovering salient trends in the charts (peaks and valleys) and corresponding news that might be relevant to explain the trend. The system is based on an input article and a news corpus. The input article is used as a query to find relevant news in the corpus and to match them against salient features in the graph. Articles and trends are matched to decide which time points should be annotated. These points are subsequently annotated with the most relevant news in the corresponding time frame. The paper also contains a very interesting analysis of how visualization designers annotate their visualizations. The outcome of this analysis is used to inform the design of the annotation engine.
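The general idea of matching salient points to news can be sketched in a few lines of Python. This is a heavily simplified illustration under my own assumptions, not Contextifier's actual pipeline: saliency is just the size of a local peak or valley relative to its neighbours, and each news item comes with a precomputed relevance score (in the real system relevance comes from querying the corpus with the input article). The names `salient_points` and `annotate`, the scores and the headlines are all made up.

```python
from datetime import date, timedelta

def salient_points(prices, k=2):
    """Indices of the k local peaks/valleys that differ most from their
    immediate neighbours (a crude stand-in for a saliency measure)."""
    scored = []
    for i in range(1, len(prices) - 1):
        left, mid, right = prices[i - 1], prices[i], prices[i + 1]
        is_peak = mid > left and mid > right
        is_valley = mid < left and mid < right
        if is_peak or is_valley:
            scored.append((abs(mid - left) + abs(mid - right), i))
    scored.sort(reverse=True)
    return sorted(i for _, i in scored[:k])

def annotate(dates, prices, news, window_days=1, k=2):
    """Attach to each salient date the highest-scoring headline published
    within window_days of it. news holds (date, headline, relevance)."""
    annotations = {}
    for i in salient_points(prices, k):
        nearby = [(score, headline) for d, headline, score in news
                  if abs((d - dates[i]).days) <= window_days]
        if nearby:
            annotations[dates[i]] = max(nearby)[1]
    return annotations

# Hypothetical week of closing prices and scored headlines.
dates = [date(2013, 1, 1) + timedelta(days=i) for i in range(7)]
prices = [10, 11, 16, 12, 13, 8, 10]
news = [
    (date(2013, 1, 2), "New store opens", 0.2),
    (date(2013, 1, 3), "Record quarterly earnings", 0.9),
    (date(2013, 1, 6), "Product recall announced", 0.8),
]
annotations = annotate(dates, prices, news)
```

Here the peak on January 3 and the valley on January 6 are the two most salient points, and each one picks up the most relevant headline from its time window.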

Paper #3: “Graphical Overlays: Using Layered Elements to Aid Chart Reading.” Kong, Nicholas, and Maneesh Agrawala. Visualization and Computer Graphics, IEEE Transactions on 18.12 (2012): 2631-2638. [Sorry no free access to this one.]

Graphical Overlays actually does much more than annotate a chart with text: it’s a whole system for adding information on top of existing charts to aid their reading. So, for instance, other than adding notes to a chart to identify potentially interesting trends, it also adds grids, highlights elements of a specific type (e.g., one set of bars in a bar chart), and adds summary statistics (like an average line in a time chart). The system works entirely on image data, which means it does not require direct access to the original data used to create the chart. In the authors’ words: “Our approach is based on the insight that generating most of these graphical overlays only requires knowing the properties of the visual marks and axes that encode the data, but does not require access to the underlying data values. Thus, our system analyzes the chart bitmap to extract only the properties necessary to generate the desired overlay.”
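A tiny example shows why the data values are not needed for something like an average line. The sketch below is my own illustration, not code from the paper: it assumes a bar chart with a linear vertical axis and assumes the pixel rows of the bar tops have already been extracted from the bitmap. The function names and the baseline and scale values are hypothetical.

```python
def mean_overlay_y(bar_top_pixels):
    """Pixel row at which to draw a mean line, computed from mark
    positions only (valid when the vertical axis is linear)."""
    return sum(bar_top_pixels) / len(bar_top_pixels)

def value_to_pixel(value, baseline_px=400, pixels_per_unit=3):
    """Hypothetical linear axis mapping, used here only to simulate a
    chart and to check the result; the overlay step never calls it."""
    return baseline_px - value * pixels_per_unit

# Simulate a chart: the overlay code only ever sees the pixel rows.
values = [20, 50, 80]
tops = [value_to_pixel(v) for v in values]
overlay_y = mean_overlay_y(tops)

# Where the mean would land if we did have the data.
expected_y = value_to_pixel(sum(values) / len(values))
```

Because the axis mapping is linear, the average of the pixel positions is the pixel position of the average, so `overlay_y` and `expected_y` coincide even though the overlay code never touched the underlying numbers.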

These three papers present very clever mechanisms to annotate visualizations in different contexts and with different purposes. I suggest you take a look at the papers because they provide numerous interesting technical details. Beyond the technical aspects, though, I believe it is interesting that several researchers are independently focusing on visualization annotation. Annotation is extremely important and I think we have not spent enough energy exploring its potential and challenges. I also think there is an educational gap we should cover, that is, how do we teach our students when, how and why a visualization should be annotated?

I am curious to hear what you think. What do you think about the papers I presented? And what do you think about annotation in general? How do you deal with annotations yourself?

Take care.