The Data Visualization Beginner’s Toolkit #2: Visualization Tools

by Enrico on August 25, 2011

in Guides

(Note: if you are new to this series, the DVBTK doesn’t teach you how to do visualization. Rather, it is meant to help people find a less chaotic and more effective path toward acquiring the skills needed to become a data visualization pro. To know more, make sure to read the introduction to the series first.)

The DVBTK #1 introduced books and study material to make sure you acquire the right knowledge in the right order. Studying is the first step and there’s no level of practice that can substitute for it.

That said, it is extremely important to realize that good visualization cannot happen without practice. It’s not only that practice is a necessary complement to theory, but also that you will understand the theory only once you apply it for real.

But if you want to do visualization you need some tools, right? Right. And again, the web is a jungle and you might have trouble understanding which tool is right for you. You have probably heard a thousand names and acronyms but you cannot really decide; there are too many choices and too little guidance.

Here is the guidance. In the following, I propose a number of rules and factors you need to take into account when choosing a visualization tool. Furthermore, I introduce a number of “staple visualization tools”: established tools with which you can make great visualizations.

And there is more to come!

I felt you needed to know more about each tool, so I decided to interview (at least) one data visualization professional with proven and long-lasting experience with each one. Be sure not to miss these interviews; I will be posting them over the next weeks. And of course, be sure to send your remarks or questions in the comments below, so that I can address them in the upcoming posts.

Golden Rules of Visualization Tools

First of all you need some fundamental rules.

Rule #1: No tool will turn you into a pro. I think I have stressed this point in the past but it’s worth going over it again. Given the rapid development of visualization technology, you might be tempted to adopt the latest tool thinking that it will turn you into a pro. This is not the case. No tool can make you a pro unless you develop your theoretical and design skills accordingly and organically. A great visualization designer is a great designer regardless of the tool of choice. It’s basically the same as photography. The latest digital reflex camera may take crisper shots, but it won’t turn you into the next Ansel Adams.

Rule #2: First learn one single tool very well. Again, given the vast amount of choices and the endless production of new technologies, you might be tempted to go after all of them. Don’t get me wrong, experimentation and exploration are great, but what you need first is a tool that makes you feel at home, a safe place where you know you can always express yourself regardless of the complexity of the idea you have in mind. Choose one tool (see below how) and learn it very well first; you won’t regret it.

Rule #3: Choose tools you are totally in love with. Don’t choose a tool because it’s cool and everybody uses it; choose the one that makes you feel great, the one you can have an affair with. People give their best with tools when they are totally in love with them and just cannot stop exploring all their capabilities. If a tool doesn’t click, if you don’t crave to use it (at least at the beginning), it’s a bad sign; move on to the next one.

Let’s clear this up now: do you need to be a programmer?

Damn it!  I was almost going to take the safe route and write down a politically-correct and well-balanced answer but … sincerely? Yes, I think you need to be able to write code. I mean, of course you can get away without coding, and below I propose tools which do not require you to write code, but why the hell do you want to limit yourself to such an extent?

I get asked this question quite often, and I have come to the conclusion that the cost-benefit ratio is so skewed that I cannot see a reason not to code. And the reason is not only in the benefit part of the ratio but also, and more importantly, in the cost. If you are scared of code, it’s time for you to realize that writing code is nothing special and it’s not too difficult either. We all learned to write essays at school, and writing good ones is much more difficult than writing a few lines of code.

A large segment of our culture promotes the view that writing code (together with science and engineering in general) is the sole province of engineers and geeks. Hey, you know what? I am terrible at technical things and yet I managed to get a PhD in Computer Engineering and I can code the things I have in mind. If I can do it, you can do it.

You don’t need to become a software engineer. The most complex stuff comes when you want to design and develop full applications with lots of interaction and many interconnected modules. But in most cases this is not what you are required to do, and in any case you can always acquire more advanced skills once you find that you need them.

So, choose a language, grab a copy of a good tutorial or book, and learn to code. And hey, why not learn it by doing visualization?! Some of the tools outlined below are just perfect for this purpose (especially Processing and its sketchbook approach). That’s a win-win situation.
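To make the “coding is nothing special” point concrete, here is a tiny sketch (in Python, purely as a neutral illustration; it is not one of the tools reviewed below, and the numbers and file name are made up for the example): a dozen lines of standard-library code are enough to turn a list of values into an SVG bar chart you can open in any browser.

```python
# Made-up sample data for the illustration.
values = [12, 25, 7, 18]

bar_width, gap, height = 40, 10, 100
bars = []
for i, v in enumerate(values):
    x = i * (bar_width + gap)
    # SVG's y-axis grows downward, so taller bars start higher up.
    bars.append(
        f'<rect x="{x}" y="{height - v}" width="{bar_width}" height="{v}" fill="steelblue"/>'
    )

svg = (
    f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="{height}">'
    + "".join(bars)
    + "</svg>"
)

with open("chart.svg", "w") as f:  # open chart.svg in any browser
    f.write(svg)
```

That is the whole program: no framework, no build step, and the result is a real chart. The specialized tools below mostly save you from writing this kind of plumbing yourself.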

How to choose the “right” tool

There is no absolute “right” tool. The best tool is the one you can do great things with, the one you love. However, there are a number of factors to keep in mind when making your choice.

  • Maturity. Is the tool the latest fancy, cool technology on the market with an uncertain future, or has it been used consistently and successfully for quite some time? It’s not a strict rule, but if you bet on the latest technology, chances are it will be abandoned in the future. This is especially true for visualization, where technology is evolving very rapidly. When in doubt, go for the proven and trusted.
  • Community. If your tool doesn’t have a large and stable community of enthusiastic visualization people, it’s a bad sign. Every great tool has a big community and a community is the most important factor in learning. It doesn’t matter how good the documentation is, you are going to need some help (and inspiration) from others.
  • Documentation. This is a very relevant and critical one. Good documentation is notoriously rare. To some extent a good community can alleviate the problems caused by limited or bad documentation, but you don’t want to wait for a reply in a forum to move on with your project, especially in its very early stages.
  • Examples. There are two main reasons why examples are important. First, you can use examples as a reality check: if people are not producing great visualizations with your tool of choice there must be a reason. Second, having great examples around you is a perfect method to learn fast. Learning by example is extremely powerful and should always be used in conjunction with more structured material. I know people who learn only through examples and they are great!
  • Cognitive Fit. I cannot stress this one enough. You have to choose the best tool for YOU and this is a little bit like buying a suit: you have to feel comfortable and cool with it. If not, it’s not for you. The best tools are those with a low “friction factor”, that is, it is natural and easy for you to translate your ideas into pictures.
  • Target Platform. Not all tools are created equal in the way they produce their output. Some are specifically targeted at the web, some allow easy conversion to static documents, some allow for the creation of full desktop applications. Make sure to clarify what kind of output you want to produce before making a decision.
  • Interaction and Performance. If you want to create interactive visualizations, you have to make sure the tool you select allows for rich interaction. Also, when large data sets are involved, you have to make sure your environment performs smoothly.

Staple Data Visualization Tools

Staple data visualization tools are tools with which you cannot go wrong. These are the tools I feel confident to suggest, especially if you are starting out. Of course, this list is very personal and you might find other tools you like. As I said above, if you are in love with a tool go with it. But if you don’t know where to start this list is a very safe bet.

Processing

Processing is the mother of all data visualization environments. Ben Fry and Casey Reas created it in 2001, out of their work at MIT, to help designers create visualization sketches. Today it is one of the most established tools I can think of, maybe the most established. It has a huge user base and it has been used for every conceivable data visualization project (a lot for artistic purposes, but for “serious” stuff too). The library is based on Java, which means that in order to use it you need to learn at least bits of the language. But, given the handy functions Processing provides, this could also be considered a gentle introduction to Java itself.

If you are willing to write code, you want total freedom in terms of design, and you want a solid platform, I cannot think of anything better than Processing. You just need to download the software (it is totally free), take a look at the amazing learning material, and start writing code.

Processing does not have a rich set of user interface widgets, but frankly I don’t think this is too limiting a factor. Interaction can be very smooth, and if you need high performance you can always use OpenGL, which is nicely integrated into the library. If you want to generate output for the web you can also use processing.js, which generates browser-readable JavaScript code.

Big Pluses: totally free, lots of learning material, very flexible, lots of examples, can be extended with any available Java library, can generate many kinds of output, can achieve high performance through the OpenGL integration.
Some Minuses: requires learning a new language if you don’t know Java, you need to write code even for very simple charts, limited support for advanced user interface components, not conceived for the web.
Notable Examples: any project from Ben Fry | amazing “serious” bio-applications from Miriah Meyer.

R

If you have never heard of R, you are in trouble. I think there’s no way for a data professional to ignore it today. R is a programming language and environment, and it is the de facto standard for anything concerning data crunching, visualization included. R is not just a visualization tool; it is much, much more. It comes with a standard and comprehensive library of data manipulation and statistical functions, plus a huge set of ever-growing libraries available on the web.

Data visualization can be done by writing very simple statements with the standard graphics library it comes equipped with or with any of the additional libraries people use, like the fantastic ggplot2.

Normally people use it through the standard console, where you write your statements to process data and generate graphics. While R certainly requires programming skills, technically you don’t necessarily need to write full programs; rather, you need to write a few statements in the console. But the difference may become blurred.

If you are not too inclined toward learning a full programming language like Java, going with R could be a good compromise. The big plus of learning R is that with a single tool you are able to cover the full data manipulation and transformation pipeline, which is not true of the other tools mentioned here. Plus, knowing R for data manipulation is a terrific skill you would need anyway.

On the downside, R gives you less flexibility in generating exactly the visualization you have in mind, if you are thinking of anything too fancy. Also, as far as I know, it is extremely limited if you want to generate custom interactive visualizations; R is best for generating static charts out of your data.

It’s worth noticing that many people post-process the charts generated with R in programs like Illustrator to make the output a bit prettier (check out Visualize This by Nathan Yau if you want to know more). But don’t worry: I have seen people doing incredible things with R, and I am sure you can do the same with a bit of practice.

Big Pluses: the most established tool for data manipulation in the world, integrated statistical and data manipulation functions, can handle very big data, huge library for additional functions, huge community, good visualization defaults.
Some Minuses: need to write statements in a console to “draw” visualizations, not as flexible as a general-purpose programming language.

D3

D3 is the creation of Mike Bostock and Jeff Heer from Stanford. Its primary feature is to permit the creation of complex interactive data visualizations through very compact code that can be delivered through a web browser. It is based on JavaScript and SVG and provides a number of handy functions that make constructing visualizations a lot easier.

Some of you might be surprised to see such a young technology included in my list of staple visualization tools, but D3 is not as new as you might think. Jeff Heer and (later) Mike Bostock are top-class researchers who have been developing visualization libraries for a long time, always pushing the technology further (Prefuse, Flare, Protovis, D3). D3 in particular was born from the ashes of Protovis, a first attempt to create a visualization library in JavaScript.

A data visualization language that lets you design custom visualizations with a few lines of code, at the right level of abstraction yet powerful, with very good performance, and specifically designed to run directly on the web, is something that is going to stay with us for a while, and it deserves a lot of consideration.

D3 already has aficionados everywhere who just love the technology, and the documentation is pretty amazing. Also, people are starting to show off examples here and there, so learning from others won’t be a problem.

If you are inclined toward web programming, you like JavaScript (I personally have a strong aversion to it), and you are familiar with web technologies like CSS and SVG, D3 could be just the right choice for you. I don’t have any experience with it, but all my geek visualization friends are super-excited about it and they swear it is the best data visualization technology ever created.

Big Pluses: visualizations delivered directly through a web browser, compact code, good community size and excellent documentation.
Some Minuses: the code is a bit tricky and requires some getting used to, it is not as widespread as other technologies (but this is going to change soon), it might be discontinued in the future the same way Protovis was.
Notable Examples: Jan Willem Tulp’s Urban Water | D3 Examples Page

Tableau

Finally, an advanced data visualization tool that non-programmers can use! Let me say it right away: Tableau is one of the biggest things to happen in visualization in recent years, and I love it. It lets you load and display data in a matter of seconds, simply by dragging data fields into the view and pushing a few buttons here and there.

What is striking about Tableau is that, while it is not as flexible as a programming language, it allows for pretty sophisticated visualization designs. Also, thanks to its powerful interface it is possible to explore a very large number of designs in a snap.

It takes some time to get used to its internal model and mechanisms, but once you understand how it works it is incredibly fast and powerful. I have been using it for a while and it amazes me how easy it is to go from one view to another, which is especially important in the early stages of a visualization project.

Sure, the level of customization you can achieve with programming-based alternatives is not reachable with Tableau, but you can do pretty sophisticated things, and I cannot think of a single better tool if you decide not to write code.

Other Tableau features I love are the ability to export static and interactive dashboards and the ease with which it loads a very large number of data formats.

There is one big catch, however: Tableau is not free, and it’s quite expensive. However, you can still use Tableau Public, a somewhat limited version of Tableau devised to create visualizations that go directly on the web, and it’s free. I know a lot of people who use Tableau only through the public version and they seem to be happy with it.

Big pluses: can create visualizations in a snap, very easy to explore many alternative views of the same data, does not require programming, very large user base.
Some minuses: not as flexible as using a programming language, it’s expensive, takes some time to understand how it works.
Notable examples: Tableau Software’s visual gallery | Clearly and Simply’s Tableau Posts

Excel

Excel?! Yes, Excel. You might be surprised to see it in the list of staple data visualization tools. It took me a long time to decide whether to include it or not. I consulted with trusted friends and pondered it for a while, and I came to the conclusion it deserves its own spot.

Why?

Because Excel is a standard and it’s everywhere. Plus, people have been doing pretty amazing stuff with it.

If you happen to work in an organization of any kind, chances are Excel is what everyone uses and trusts (I have seen it everywhere, especially working with my fellow biologists). This means that this is the material you have to work with, whether you like it or not. People are naturally skeptical about change (and for good reason!), so they won’t like you introducing a new technology just because you want to spread the data visualization wisdom.

Plus, Excel is a pretty amazing piece of software, which probably unfairly inherited the bad reputation Microsoft products have overall. Being able to use Excel to draw effective charts can be a tremendous asset for you, with the advantage of using an almost universal platform.

The main and biggest problem with Excel is getting rid of the defaults. They are crap, a perfect gallery of junk charts. But once you learn how to bypass them, you are in the realm of effective and advanced charts. You don’t believe me? Take a look at what Jorge Camoes and Jon Peltier are able to do with it. And hey, if you want to learn something about Excel, be sure to read their websites from top to bottom.

I think the choice of whether to invest in Excel is very much dependent on your situation. If you are totally free and independent, it might not be the right choice; but if you expect to work within the constraints of your organization, or with clients in the BI area or similar, being able to work in the context of their comfort tool can be a huge advantage.

Big pluses: universal platform, everybody understands Excel, practically free, easy to go from data to chart, integrated with the spreadsheet functionality.
Some minuses: the defaults are crap, it’s harder to go beyond standard charts, slow with big data.
Notable examples: anything from Excel charts gurus Jorge Camoes and Jon Peltier.

There is more to come: interviews are on the way!

I hope the information above will be sufficient for you to make a well-reasoned decision. In any case, there is more material to come: for each tool I conducted at least one interview with a real expert who has a proven track record of successful visualizations with that environment. Stay tuned! I will be posting them in the upcoming weeks.

This series is meant to help you guys, so whatever doubts or questions you have, feel free to ask by writing a comment below, sending a message on Twitter, or writing me an email directly. And please, if you find this post and the series useful, don’t forget to share it with your friends. Thanks!

Take care,
Enrico.

  • http://www.visualisingdata.com Andy Kirk

    Great work as ever Enrico, particularly the added-value of the introductory rules and considerations for choosing the right tool for the right task/capability.

  • http://www.janwillemtulp.com Jan Willem Tulp

    Really good post Enrico. Thank you so much for all the effort you’ve put into writing this. It’s a very good overview for anyone willing to get started in data visualization!

  • Enrico

    Thanks guys. We really need, as a community, to build a thriving environment for people to grow. That’s the only way to ensure we all grow together. Let’s keep this in mind. Your support is extremely valuable to me. Thanks!

  • Sharon

    Thanks for the suggestions, Enrico. I agree with the importance of learning code; I’ve been coding on the Web since the ’90s. However, those of us who work for a media site and want to create visualizations for publication aren’t always able to write code in something like D3 for publication, because our site uses a CMS that just displays basic HTML (and only developers are given access to the underlying code).

  • http://vallandingham.me Jim V

    Good to see your recommendations.
    I might add to the Processing section ruby-processing:
    https://github.com/jashkenas/ruby-processing/wiki

    Its Processing in Ruby!

    Seems like a powerful tool with having the visualization capabilities of Processing and the data processing capabilities of Ruby.

  • http://blog.pointlineplane.co.uk Tom P

    To the list I’d add Flash/Actionscript.
    Advantages:
    Massive installed user base
    Large well supported and documented libraries for UI and general graphics tasks
    Active community

    Disadvantages: Off the shelf visualisation libraries are thin on the ground (similar situation to Processing)
    Won’t deal with massive data sets (similar to D3 and other JS libraries)

  • http://blog.pointlineplane.co.uk Tom P

    … also, totally agree with your Excel comments. Its really valuable to have some skill in that area as it’s the common thread between many different disciplines and for 90% (made up stat) or more of data sets provides a really good place to start exploring/ sketching .

  • http://www.excelcharts.com/blog/ Jorge Camoes

    Enrico, thanks for the reference. Great post as usual. Let me add my two cents. I absolutely agree with the first two “golden rules”. You use tools to make you more efficient, but training is often focused on making you more efficient at using the tools, and if you don’t know what you are doing with them, you can always use the templates…

    I don’t think you must fall in love with the tool, though. You must feel comfortable with it, sure, but you can have a simple and rational relationship with it. I wrote this recently: data visualization is a language and each tool has its own style. You should find a tool that matches your style. For example, if you want to make 3D pie charts you should use Dundas instead of Tableau. This is a rational decision, you dont have to love Dundas (God forbid).

    Our community often preaches to the choir. We need to clearly differentiate our audience. Not everyone wants to become a data visualization expert. I don’t want to learn English to write a novel, I just want to communicate. I chose Excel instead of a more glamorous tool because I want to share my experience with corporate users like me. Users that want to make better charts but cannot/don’t want to forget their role in the organization.

    Excel is a nice tool to learn data visualization. Because of its wrong defaults, it’s easier to understand what good visualization is about. But at some point you must switch, unless you want to become an Excel MVP. From Excel to Tableau, from Tableau to D3. From data to information, from information to knowledge…

    I think two tools are missing in your list: Illustrator and Spotfire.

    • ragaar

      We could include Photoshop/Gimp, 3Ds Max, Autocad, ProE, and LaTeX to your list as well. I’d defend Enrico’s above choices as it depends on usage, and personal experience.

    • http://vislives.com Chris Pudney

      Spotfire is a very powerful visualization tool but it’s pricey. Panopticon and Omniscope have similar capabilities.

  • m-b

    Thanks for the list. As for Excel being ‘slow with big data’ you should check out PowerPivot which is a free add-in for Excel and enables you to work with millions of rows of data.

  • ragaar

    Rule 2, in my opinion, is spot on.
    Although it may cause our opinions to become biased, data visualization doesn’t require one to have a vast array of programming language experience.
    I find that Matlab is gaining more and more ground amongst the programming communities. By proxy, Octave: the open-source alternative.

  • Noah

    Great work, I like that D3 is included. It’s certainly my favourite.

    I actually find it easy to use after having read through a few tutorials. From there on in the API documentation has covered any new learning requirements.

  • Pingback: The Data Visualization Beginner’s Toolkit #2: Visualization Tools | VizWorld.com

  • Pranav

    Thanks for the wonderful post. I just had one question in mind. Assuming, i have a huge data set and i need to do some pre-processing to the data and make a customized visualization for this particular data set. Among the tools that you’ve mentioned in the post, where does Adobe Illustrator rank and which among the above (inclusive of Illustrator) is a better tool for the problem that i just mentioned?
    Thanks,
    Pranav

  • Pascal

    Dear enrico,

    thanks for this post.
    I am currently using R in my day to day work for data processing and visualisation (static).
    I have looked a bit at Processing and capabilities, in my leisure time.

    I am wondering why there is not much discusions on linking both, where would be the benefits and what kind of projects are using both ?
    might be that this is a non-question, with the linking made really easy (eg, viewed from a R perspective : https://r-forge.r-project.org/projects/rprocessing/) ?

    any comments ?

    • Enrico

      Thanks for your message. I think the main reason would be to combine the flexibility and interactive capabilities of Processing with the full data manipulation power of R. It seems a very reasonable solution to me if you have big data and you need fancy custom visualizations. I personally don’t know of any project using them both.

  • Pingback: Time to spice up your visualization skills? «

  • Pingback: Episode #5 – How To Learn Data Visualization (with Andy Kirk) | Data Stories

  • waleed_al_hadban

    thanks a lot for the these notes and guidelines, I found them very helpful for someone just starting with InfoVis. I also wanted to share with you a book that I read and I found it very good too.
    it is Information Visualization: Design for Interaction (2nd Edition) by Robert Spence.
    thanks again.
    yours truly,
    waleed al hadban.

  • Radarthreat

    I prefer Python+Numpy+Pandas+Scikit over R
