How to become a data visualization ninja with 3 free tools for non-programmers

by Enrico on December 17, 2010

in Guides

We have noted many times between the lines of this blog how data visualization is all the hype and how this trend keeps growing. That’s good news, guys! It’s fun and it’s … success! But as more and more people join this wild bunch, we have to take care of those who are not as skilled as we are yet. There are many people out there who love data visualization but think they are out of this business because they cannot code. I personally think this is a problem and that we have to be as inclusive as we can. Intelligence is distributed everywhere, and we need to “grab” it wherever we find it.

As data visualization develops, we can expect people to build tools that are more accessible and easier to use for non-programmers. After all, think about it: why should a person have to code in order to become a data visualization expert? Think for a moment about other successful technologies. Do we need to program to become skilled graphic designers? Or to build animations? Or to design blueprints with CAD? Folks, I would even say that we could measure the maturity of our field by the level of democratization we manage to achieve in the future.

By the way, if you are one of those wannabe non-programmer data enthusiasts, I have good news for you: the future is here! There are already at least three free and powerful tools for you. But let me introduce the problem first, because we need a larger perspective.

Data Visualization is 80% data, 20% visualization

I don’t think I will find anyone arguing against this sentence. One of the first things to understand is that in order to come up with a great visualization, you first have to break your back on data gathering and preprocessing. In my experience the proportion is around 80% vs. 20%. I know this might appear frustrating at first, but there’s no other way.

And there is a subtler issue here too. Many people think that the power of visualization resides in the visualization itself. But a visualization turns out to be great if and only if the data you show has some real value, some clear and interesting message to convey, some story to tell. You can build the most beautiful visualization in the world, but if the data is dull you will have a very hard time impressing people.

So here it is. You need the following:

  • Tools for data gathering: you can spend your afternoons wandering the web like a zombie to find something pretty, or pull data from data.gov or the OECD repository, but you are better off building your own dataset out of the web. For this reason, the best approach is to scrape data from the millions of websites at your fingertips. The limit is only your imagination, and you achieve total freedom.
  • Tools for data manipulation: you’d better realize it from day one: data is dirty, and data manipulation is a dirty job. You won’t like it (well … I like it, but that’s another story) but you will have to do it, especially if you gather stuff from the web. You have missing values, outliers, formats you don’t like, data you want to transform, aggregate, sample, etc.
  • Tools for data visualization: and here comes the sweet part. But what if you have never written a single line of code? That’s tough. Ok, let me tell you something: writing lines of code is not that hard, and I wholeheartedly suggest everyone try it out (see the sketch right after this list). The power you get into your hands is immeasurable. But if you are too scared, simply too lazy, or just don’t have time to invest, you need a visualization design tool. Something to transform your data into pixels. Possibly something with a gentle learning curve.
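For the curious, here is what these three steps can look like if you do decide to write a few lines of code. This is just a minimal Python sketch, assuming the pandas and matplotlib libraries are installed; the URL and column names are hypothetical:

    import pandas as pd
    import matplotlib.pyplot as plt

    # 1. Gather: load a table straight from the web (hypothetical URL)
    df = pd.read_csv("http://example.com/population.csv")

    # 2. Manipulate: drop missing values and aggregate by a column
    df = df.dropna()
    totals = df.groupby("country")["population"].sum()

    # 3. Visualize: turn the numbers into pixels
    totals.sort_values().plot(kind="barh")
    plt.tight_layout()
    plt.show()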

The Ninja’s Toolbox

OutWit Hub (data scraping)

OutWit Hub is a fantastic little Firefox add-on. When you run it, a new browser window opens with lots of functions. It analyzes the current web page, extracts all the elements it contains, and categorizes them by type: tables, text, images, links, documents, etc. How do you use it for your purposes? There are three key functions:

  • Export HTML tables into .csv files of a similar format.
  • Extract data items enclosed between opening and closing HTML tags.
  • Automate the application of a scraper to a series of links.

Isn’t it fantastic? The web is in your hands, and the limit is only your imagination. I’ve been playing with it for a while, and it’s a piece of cake. You do need a small initial investment to understand how it works, even if many of the functions are self-explanatory. If you want to know more, you can follow the OutWit blog, which contains several useful tutorials.
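To give you an idea of what OutWit’s first two functions boil down to, here is a minimal Python sketch. The URL is hypothetical, and I’m assuming the pandas and BeautifulSoup libraries; OutWit, of course, does all of this without any code:

    import urllib.request
    import pandas as pd
    from bs4 import BeautifulSoup

    # Export an HTML table to .csv: read_html returns one DataFrame
    # per <table> element found on the page
    tables = pd.read_html("http://example.com/stats.html")  # hypothetical URL
    tables[0].to_csv("stats.csv", index=False)

    # Extract data items enclosed between opening and closing tags,
    # here every <li> element on the page
    html = urllib.request.urlopen("http://example.com/stats.html").read()
    soup = BeautifulSoup(html, "html.parser")
    items = [li.get_text(strip=True) for li in soup.find_all("li")]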

One last word: there is a free version and a paid version. I must admit the free version is a bit limited, even though you can start doing some fun things with it. If you want more, especially in terms of automation and data pattern specification, you need the pro version, which, by the way, comes at a very small price (note: this recommendation is not sponsored; it comes only from my appreciation of the tool).

KNIME (data processing)

KNIME stands for Konstanz Information Miner, and it’s a fairly famous data mining tool. It has a large following among pharmaceutical companies, but I suspect it is little known to data visualization experts. The whole issue of what role data mining plays in visualization deserves a blog post of its own, and I plan to write about it at length. But here KNIME is suggested for two main reasons:

  • Intuitive user interface based on a workflow model
  • Handy data preprocessing and transformation tools

To use KNIME you don’t need any programming skills. The application is organized around a workflow paradigm: you select the nodes you need to process your data and connect them so that your data takes the shape you want. For instance, there are nodes for data input and output, to load .csv files or pull data from a database. Then there are nodes to filter, aggregate, and normalize data columns, and nodes to take care of missing values. And if you are a little experimental and adventurous, there is a whole bunch of mining nodes to apply things like clustering, classification, association rules, etc. There are also some data visualization functions, but they are not KNIME’s strong point.
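For comparison, here is roughly what such a workflow amounts to if you write it as code instead: a minimal pandas sketch with hypothetical file and column names, where each step plays the role of one KNIME node:

    import pandas as pd

    # Input node: load a .csv file (hypothetical file and columns)
    df = pd.read_csv("sales.csv")

    # Missing-value node: fill gaps in a numeric column with its mean
    df["revenue"] = df["revenue"].fillna(df["revenue"].mean())

    # Row-filter node: keep only the rows you care about
    df = df[df["year"] >= 2005]

    # Normalization node: rescale a column to the 0-1 range
    rev = df["revenue"]
    df["revenue_norm"] = (rev - rev.min()) / (rev.max() - rev.min())

    # Aggregation node: group and summarize
    summary = df.groupby("region")["revenue"].sum()

    # Output node: write the result back out
    summary.to_csv("revenue_by_region.csv")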

Tableau Public (data visualization)

I don’t know if I really have to introduce Tableau Public. Tableau is one of the fastest-growing hi-tech companies, and its product is revolutionizing the way data is visualized in business environments and on the web. Tableau Public is a free version of Tableau that lets you publish visualizations on the web. The main strengths of Tableau are:

  • Its easy-to-use interface based on drag-and-drop operations.
  • The ability to create publishable visualizations.

You can load a data table (from multiple formats) and start experimenting with various kinds of charts in a matter of seconds. Just select a data column, drag it onto the visualization canvas, and Tableau guesses the best representation. Similarly, data columns can be mapped to color, size, shape, etc. In a few minutes you can explore dozens of different design solutions and choose the one that best suits the goal in your head. Once the best design has been found, you can publish it on the web in a dashboard format.
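Tableau does all of this through drag and drop, but the underlying idea, mapping data columns to visual properties, is easy to see in code too. Here is a minimal matplotlib sketch with hypothetical file and column names (region_code is assumed to be numeric):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("countries.csv")  # hypothetical file

    # Map columns to visual properties: x, y, color, and size
    plt.scatter(df["gdp"], df["life_expectancy"],
                c=df["region_code"],       # a numeric column mapped to color
                s=df["population"] / 1e6,  # a column mapped to marker size
                cmap="viridis", alpha=0.7)
    plt.xlabel("GDP")
    plt.ylabel("Life expectancy")
    plt.show()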

Conclusion

Download all of them and start experimenting. And most of all, HAVE FUN! As usual, each tool requires some level of investment, but they all have a very gentle learning curve, and you can get cool stuff out of them from the very beginning. Let me know if you have any questions or problems; I’d be happy to help.

If you liked this post you can do at least three things to make me happy: comment below, subscribe to my RSS feed, or retweet it. Thanks!

  • http://goo.gl/yZ8Xk Kaizer

Thank you for this post. I found it useful. I do not know how to go about it, but now I have a starting point.
    Kaizer.

    • Enrico

      Thanks for your comment Kaizer, I am glad to hear this is useful to someone. Please let me know how I can help you … I might come up with a new post if I have more info on what people need. All my best.

  • Gonzalo

Thanks Enrico for your post. I think these three tools are very useful for people who are starting out with this (like me). I have some doubts about how to start designing visualizations: what features should I consider when starting the design of a web application, for example? I’ve been searching for something like a checklist or methodology about design, but I haven’t found anything yet. Do you know where I can keep searching? Best regards, I’ve already subscribed to your RSS.

    • Enrico

      Thanks for your message. I am planning to provide some more structured information soon. In the meantime the best book I can suggest for starters is Stephen Few’s Show Me the Numbers. You can’t go wrong with it, believe me.

  • http://xvisr.wordpress.com radosveta

    Thanks for the useful post! I think OutWit Hub will be very useful to me, I was just wondering how I can transfer data from a table on a site into a more workable format without having to copy/paste and correct weird formatting for hours. Will try it right away.

Another question that’s been wandering around my mind for quite some time now: how does one spot discrepancies in their data? I know it will strongly depend on the nature of the data and the purpose we want to use it for, but I was wondering if there were any general rules about that. Or would you advise people who are just starting out to read up on the literature on data mining and cleansing?

    • Enrico

Radosveta, I am glad you found it useful. Data cleansing, reconciliation, integration, etc. is a HUGE topic. I am not really a specialist in it. What people normally do is struggle with it manually and learn a number of tricks and patterns. You might want to take a look at Jeff Heer’s Wrangler: http://vis.stanford.edu/papers/wrangler. I’ve heard good things about Google Refine too: http://code.google.com/p/google-refine/.

  • http://www.probiotixfoods.com Modi

    Very useful post – Outwit Hub is extremely useful not only to SEOs but to web developers too.

  • http://twitter.com/fx86 Fibinse Xavier

    I guess you should add Scraperwiki to the list, as well. It’s a good tool for getting data off websites which might not have good APIs.

  • Disconnected Press

Tableau Public is Windows-only, correct? Is there an OS X alternative that you’d suggest? Thank you :)

    - NX
