(Note: if you are new to this series, the DVBTK doesn’t teach you how to do visualization. Rather it is meant to help people find a less chaotic and more effective path towards the acquisition of the necessary skills to become a data visualization pro. To know more, make sure to read the introduction to the series first.)
The DVBTK #1 introduced books and study material to make sure you acquire the right knowledge in the right order. Studying is the first step and there’s no level of practice that can substitute for it.
That said, it is extremely important to realize that good visualization cannot happen without practice. It’s not only that practice is a necessary complement to theory, but also that you will understand the theory only once you apply it for real.
But if you want to do visualization you need some tools, right? Right. And again, the web is a jungle and you might have trouble understanding which tool is right for you. You have probably heard a thousand names and acronyms but you cannot really decide; there are too many choices and too little guidance.
Here is the guidance. In the following, I propose a number of rules and factors you need to take into account when choosing a visualization tool. Furthermore, I introduce a number of “staple visualization tools”: established tools with which you can make great visualizations.
And there is more to come!
I felt you needed to know more about each tool, so I decided to interview (at least) one data visualization professional with proven and long-lasting experience with it. Be sure not to miss these interviews; I will be posting them over the next few weeks. And of course, be sure to send your remarks or questions in the comments below, so that I can address them in the upcoming posts.
Golden Rules of Visualization Tools
First of all you need some fundamental rules.
Rule #1: No tool will turn you into a pro. I think I stressed this point already in the past, but it’s worth going over it again. Given the rapid development of visualization technology, you might be tempted to adopt the latest technology thinking that it will turn you into a pro. This is not the case. No tool can make you a pro unless you develop your theoretical and design skills accordingly and organically. A great visualization designer is great regardless of the tool of choice. It’s basically the same as photography: the latest digital reflex camera may take crisper shots, but it won’t turn you into the next Ansel Adams.
Rule #2: First learn one single tool very well. Again, given the vast amount of choices and the endless production of new technologies, you might be tempted to go after all of them. Don’t get me wrong, experimentation and exploration are great, but what you need first is a tool that makes you feel at home, a safe place where you know you can always express yourself regardless of the complexity of the idea you have in mind. Choose one tool (see below how) and learn it very well first; you won’t regret it.
Rule #3: Choose tools you are totally in love with. Don’t choose a tool because it’s cool and everybody uses it; choose the one that makes you feel great, the one you can have an affair with. People give their best with tools when they are totally in love with them and just cannot stop exploring all their capabilities. If a tool doesn’t click, if you don’t crave to use it (at least at the beginning), it’s a bad sign: move on to the next one.
Let’s clear this up now: do you need to be a programmer?
Damn it! I was almost going to take the safe route and write down a politically-correct and well-balanced answer but … sincerely? Yes, I think you need to be able to write code. I mean, of course you can get away without coding, and below I propose tools which do not require you to write code, but why the hell do you want to limit yourself to such an extent?
I get asked this question quite often, and I have come to the conclusion that the cost-benefit ratio is so skewed that I cannot see a reason not to code. And the reason lies not only in the benefit part of the ratio but also, and more importantly, in the cost. If you are scared of code, it’s time for you to realize that writing code is nothing special and it’s not too difficult either. We all learned to write essays at school, and writing good ones is much more difficult than writing a few lines of code.
A large segment of our culture has promoted the view that writing code (together with science and engineering in general) is the sole province of engineers and geeks. Hey, you know what? I am terrible at technical things and yet I managed to get a PhD in Computer Engineering, and I can write code for the things I have in mind. If I can do it, you can do it.
You don’t need to become a software engineer. The most complex stuff comes when you want to design and develop full applications with lots of interaction and many interconnected modules. But in most cases this is not what you are required to do, and in any case you can always acquire more advanced skills once you find that you need them.
So, choose a language, grab a copy of a good tutorial or book, and learn to code. And hey, why not learn it by doing visualization?! Some of the tools outlined below are just perfect for this purpose (especially Processing and its sketchbook approach). That’s a win-win situation.
How to choose the “right” tool
There is no absolute “right” tool. The best tool is the one you can do great things with, the one you love. However, there are a number of factors to keep in mind when making your choice.
- Maturity. Is the tool the latest fancy technology on the market, with an uncertain future, or has it been used consistently and successfully for quite some time? It’s not a strict rule, but if you bet on the latest technology, chances are it will be abandoned in the future. This is especially true for visualization, where technology is evolving very rapidly. When in doubt, go for the proven and trusted.
- Community. If your tool doesn’t have a large and stable community of enthusiastic visualization people, it’s a bad sign. Every great tool has a big community and a community is the most important factor in learning. It doesn’t matter how good the documentation is, you are going to need some help (and inspiration) from others.
- Documentation. This is a very relevant and critical one. Good documentation is notoriously rare. To some extent a good community can alleviate the problems caused by limited or bad documentation, but you don’t want to wait for a reply in a forum to move on with your project, especially in its very early stages.
- Examples. There are two main reasons why examples are important. First, you can use examples as a reality check: if people are not producing great visualizations with your tool of choice there must be a reason. Second, having great examples around you is a perfect method to learn fast. Learning by example is extremely powerful and should always be used in conjunction with more structured material. I know people who learn only through examples and they are great!
- Cognitive Fit. I cannot stress this one enough. You have to choose the best tool for YOU, and this is a little bit like buying a suit: you have to feel comfortable and cool in it. If not, it’s not for you. The best tools are those with a low “friction factor”: they make it natural and easy for you to translate your ideas into pictures.
- Target Platform. Not all tools are created equal in the way they produce their output. Some are specifically targeted to the web, some allow easy conversion to static documents, and some allow for the creation of full desktop applications. Make sure to clarify what kind of output you want to produce before making a decision.
- Interaction and Performance. If you want to create interactive visualizations, you have to make sure the tool you select allows for rich interaction. Also, when large datasets are involved, you have to make sure your environment performs smoothly.
Staple Data Visualization Tools
Staple data visualization tools are tools with which you cannot go wrong. These are the tools I feel confident suggesting, especially if you are starting out. Of course, this list is very personal and you might find other tools you like. As I said above, if you are in love with a tool, go with it. But if you don’t know where to start, this list is a very safe bet.
Processing is the mother of all data visualization environments. Ben Fry and Casey Reas created it in 2001, out of their work at MIT, to help data designers create visualization sketches. Today it is one of the most established tools I can think of, maybe the most established. It has a huge user base and it has been used for every conceivable data visualization project (a lot for artistic purposes, but for “serious” stuff too). The library is based on Java, which means that in order to use it you need to learn at least bits of the language. But, given the handy functions Processing provides, this can also be considered a gentle introduction to Java itself.
If you are willing to write code, you want total freedom in terms of design, and you want a solid platform, I cannot think of anything better than Processing. You just need to download the software (it is totally free), take a look at the amazing learning material, and start writing code.
Big Pluses: totally free, lots of learning material, very flexible, lots of examples, can be extended with any available Java library, can generate many kinds of output, can achieve high performance through OpenGL integration.
Few Minuses: requires learning a new language if you don’t know Java, you need to write code even for very simple charts, limited support for advanced user interface components, not conceived for the web.
If you have never heard of R, you are in trouble. I think there’s no way for a data professional to ignore it today. R is a programming language and environment, and it is the de facto standard for anything concerning data crunching, visualization included. R is not just a visualization tool; it is much, much more. It comes with a standard and comprehensive library of data manipulation and statistical functions, plus a huge set of ever-growing libraries available on the web.
Data visualization can be done by writing very simple statements with the standard graphics library R comes equipped with, or with any of the additional libraries people use, like the fantastic ggplot2.
Normally people use it through the standard console, where you write your statements to process data and generate graphics. While R certainly requires programming skills, technically you don’t need to write full programs; rather, you need to write a few statements in the console. But the line between the two can blur.
If you are not too inclined toward learning a full programming language like Java, going with R could be a good compromise. The big plus of learning R is that with a single tool you can cover the full data manipulation and transformation pipeline, which is not true of the other tools mentioned here. Plus, knowing R for data manipulation is a terrific skill you will need anyway.
On the downside, R gives you less flexibility in generating exactly the visualization you have in mind, if you are thinking of anything too fancy. Also, as far as I know, it is extremely limited if you want to generate custom interactive visualizations; it is best suited to generating static charts out of your data.
It’s worth noting that many people post-process the charts generated with R with programs like Illustrator to make the output a bit prettier (check out Visualize This by Nathan Yau if you want to know more). But don’t worry: I have seen people do incredible things with R, and I am sure you can do the same with a bit of practice.
Big Pluses: the most established tool for data manipulation in the world, integrated statistical and data manipulation functions, can handle very big data, huge library of additional functions, huge community, good visualization defaults.
Some Minuses: need to write statements in a console to “draw” visualizations, not as flexible as a general-purpose programming language.
D3 is a JavaScript library that lets you design custom visualizations with a few lines of code; it sits at the right level of abstraction yet remains powerful, has very good performance, and is specifically designed to run directly on the web. A technology like that is going to stay with us for a while, and it deserves a lot of consideration.
D3 already has aficionados everywhere who just love the technology, and the documentation is pretty amazing. Also, people have started showing off examples here and there, so learning from others won’t be a problem.
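To give a sense of what “custom visualizations with a few lines of code” means, here is a plain-JavaScript sketch of the core idea D3 is built around: mapping data values to visual attributes through a scale function. Note that this is not D3 code itself (the real library adds selections, data joins, transitions, and much more), and the dataset and pixel sizes here are invented purely for illustration.

```javascript
// Plain-JavaScript sketch of the idea behind D3: map data values to
// visual attributes (here, the widths of SVG bars) through a scale.
// The dataset and pixel sizes are made up for illustration.
var data = [4, 8, 15, 16, 23, 42];

// A linear scale from the data domain [0, max] to the pixel range [0, 300].
var max = Math.max.apply(null, data);
function barWidth(d) {
  return Math.round((d / max) * 300);
}

// One SVG <rect> per data value, stacked vertically.
var bars = data.map(function (d, i) {
  return '<rect x="0" y="' + (i * 20) + '" width="' + barWidth(d) +
         '" height="18"></rect>';
});

var svg = '<svg width="300" height="' + (data.length * 20) + '">' +
          bars.join('') + '</svg>';
console.log(svg);
```

In D3 proper, the hand-rolled scale and the rect-per-datum loop are handled for you by the library’s built-in scales and data binding, but the mental model (data in, visual attributes out) is the same.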
Big Pluses: visualizations delivered directly through a web browser, compact code, good community size and excellent documentation.
Some Minuses: the code is a bit tricky and requires some getting used to, it is not as widespread as other technologies (but this is going to change soon), and it might be discontinued in the future the same way Protovis was.
Notable Examples: Jan Willem Tulp’s Urban Water | D3 Examples Page
Finally, an advanced data visualization tool that non-programmers can use! Let me say it right away: Tableau is one of the biggest things that has happened in visualization in recent years, and I love it. It lets you load and display data in a matter of seconds, simply by dragging data fields into the view and pushing a few buttons here and there.
What is striking about Tableau is that, while it is not as flexible as a programming language, it allows for pretty sophisticated visualization designs. Also, thanks to its powerful interface it is possible to explore a very large number of designs in a snap.
It takes some time to get used to its internal model and mechanisms, but once you understand how it works, it is incredibly fast and powerful. I have been using it for a while, and it amazes me how easy it is to go from one view to another, which is especially important in the early stages of a visualization project.
Sure, the level of customization you can achieve with programming-based alternatives is not reachable with Tableau, but you can do pretty sophisticated things, and I cannot think of a single better tool if you decide not to write code.
Other Tableau features I love are the ability to export static and interactive dashboards and the ease with which it loads a very large number of data formats.
There is one huge blemish, however: Tableau is not free, and it’s quite expensive. You can still use Tableau Public, a somewhat limited version of Tableau devised to create visualizations that go directly on the web, and it’s free. I know a lot of people who use Tableau only through the public version, and they seem to be happy with it.
Big pluses: can create visualizations in a snap, very easy to explore many alternative views of the same data, does not require programming, very large user base.
Some minuses: not as flexible as using a programming language, it’s expensive, takes some time to understand how it works.
Notable examples: Tableau Software’s visual gallery | Clearly and Simply’s Tableau Posts
Excel?! Yes, Excel. You might be surprised to see it in the list of staple data visualization tools. It took me a long time to decide whether to include it or not. I consulted with trusted friends and pondered it for a while, and I came to the conclusion that it deserves its own spot.
Why? Because Excel is a standard and it’s everywhere. Plus, people have been doing pretty amazing stuff with it.
If you happen to work in an organization of any kind, chances are Excel is what everyone uses and trusts (I have seen it everywhere, especially working with my fellow biologists). This means this is the material you have to work with, whether you like it or not. People are naturally skeptical about change (and for good reason!), so they won’t like you introducing a new technology just because you want to spread the data visualization wisdom.
Plus, Excel is a pretty amazing piece of software, which has probably, and unfairly, inherited the bad reputation Microsoft products have in general. Being able to use Excel to draw effective charts can be a tremendous asset for you, with the advantage of working on an almost universal platform.
The main and biggest problem with Excel is getting rid of the defaults. They are crap, a perfect gallery of junk charts. But once you learn how to bypass them, you are in the realm of effective and advanced charts. You don’t believe me? Take a look at what Jorge Camoes and John Peltier are able to do with it. And hey, if you want to learn something about Excel, be sure to read their web sites from top to bottom.
I think the choice of whether to invest in Excel or not depends very much on your situation. If you are totally free and independent, it might not be the right choice; but if you expect to work within the constraints of your organization, or with clients in the BI area or similar, being able to work within their comfort tool can be a huge advantage.
Big pluses: universal platform, everybody understands Excel, practically free, easy to go from data to chart, integrated with the spreadsheet functionalities.
Some minuses: the defaults are crap, harder to go beyond standard charts, slow with big data.
Notable examples: anything from Excel Charts gurus Jorge Camoes and John Peltier.
There is more to come: interviews are on the way!
I hope the information above will be sufficient to make a well-reasoned decision. In any case, there is more material to come: for each tool I conducted at least one interview with a real expert who has a proven track record of successful visualizations with that environment. Stay tuned! I will be posting them in the upcoming weeks.
This series is meant to help you, so whatever doubt or question you have, feel free to ask by writing a comment below, sending a message on Twitter, or writing me an email directly. And please, if you find this post and the series useful, don’t forget to share it with your friends. Thanks!