We have noted many times, between the lines of this blog, how data visualization is all the hype and how the trend keeps growing. That's good news, guys! It's fun and it's … success! But as more and more people join this wild bunch, we have to take care of those who are not yet as skilled as we are. There are many people out there who love data visualization but think they are out of the business because they cannot code. I personally think this is a problem and that we have to be as inclusive as we can. Intelligence is distributed everywhere, and we need to "grab" it wherever we find it.
As data visualization develops, we can expect people to build tools that are more accessible and easier to use for non-programmers as well. After all, think about it: why should a person have to code in order to become a data visualization expert? Think for a moment about other successful technologies. Do we need to program to become skilled graphic designers? Or to build animations? Or to design blueprints with CAD? Folks, I would even say that we could measure the maturity of our field by the level of democratization we manage to achieve in the future.
By the way, if you are one of those wannabe non-programmer data enthusiasts, I have good news for you: the future is here! There are already at least three free and powerful tools for you. But let me introduce the problem first, because we need a larger perspective.
Data Visualization is 80% data, 20% visualization
I don’t think I will find anyone arguing against this sentence. One of the first things to understand is that in order to come up with a great visualization, you first need to break your back on data gathering and preprocessing. In my experience the proportion is around 80% vs. 20%. I know this might seem frustrating at first, but there’s no other way.
And there is also a subtler issue here. Many people think that the power of visualization resides in the visualization itself. But a visualization turns out to be great if and only if the data you show has some real value, some clear and interesting message to convey, some story to tell. You can build the most beautiful visualization in the world, but if the data is dull you will have a very hard time impressing people.
So here it is. You need the following:
- Tools for data gathering: you can spend your afternoons wandering the web like a zombie hoping to stumble on something pretty, or browse data.gov and the OECD repository, but you are better off building your own dataset out of the web. For this reason, the best approach is to scrape data from the millions of websites at your fingertips. The only limit is your imagination, and you achieve total freedom.
- Tools for data manipulation: you’d better realize it from day one: data is dirty, and data manipulation is a dirty job. You won’t like it (well … I like it, but that’s another story), but you will have to do it, especially if you gather stuff from the web. You will have missing values, outliers, formats you don’t like, data you want to transform, aggregate, sample, etc.
- Tools for data visualization: and here comes the sweet part. But what if you have never written a single line of code? That’s tough. Ok, let me tell you something: writing lines of code is not that hard, and I heartily suggest everyone try it out. The power you get into your hands is immeasurable. But if you are too scared, simply too lazy, or just don’t have time to invest, you need a visualization design tool: something to transform your data into pixels, possibly with a slow and gentle learning curve.
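And if you ever do get curious about what that "dirty job" looks like in code, here is a tiny sketch in plain Python. The records are invented sample data; it just shows the kind of cleanup the tools below automate: skipping missing values, normalizing formats, and aggregating.

```python
# Toy cleanup step on invented sample data: drop missing values,
# normalize number formats, and aggregate by country.
raw = [
    {"country": "Italy",  "year": "2010", "value": "12.5"},
    {"country": "France", "year": "2010", "value": ""},      # missing value
    {"country": "Italy",  "year": "2011", "value": "14,0"},  # comma decimal
]

clean = []
for row in raw:
    if not row["value"]:                               # skip missing values
        continue
    value = float(row["value"].replace(",", "."))      # normalize the format
    clean.append({"country": row["country"], "year": int(row["year"]), "value": value})

# Aggregate: total value per country
totals = {}
for row in clean:
    totals[row["country"]] = totals.get(row["country"], 0.0) + row["value"]

print(totals)  # {'Italy': 26.5}
```

Ten lines of logic, and this is already most of what a real cleaning pass does; the tools below wrap exactly these operations in a friendlier interface.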
The Ninja’s Toolbox
Outwit Hub (data scraping)
Outwit Hub is a small, fantastic Firefox add-on. When you run it, a new browser window opens with lots of functions. It analyzes the current web page, extracts all the elements it contains, and categorizes them by type: tables, text, images, links, documents, etc. How do you use it for your purposes? There are three key functions:
- Export HTML tables into .csv files that preserve their structure
- Extract data items enclosed between opening and closing HTML tags
- Automate the application of a scraper to a series of links
Isn’t it fantastic? The web is in your hands, and the only limit is your imagination. I’ve been playing with it for a while and it’s a piece of cake. You need an initial investment to understand how it works, even though many of the functions are self-explanatory. If you want to know more, you can follow the Outwit blog, which contains several useful tutorials.
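Outwit Hub does all of this through its GUI, of course, but if you are curious what "export an HTML table to .csv" means under the hood, here is a minimal sketch using only Python's standard library (the page snippet is an invented example, not a real site):

```python
import csv
import io
from html.parser import HTMLParser

# Minimal HTML-table scraper: collect the text of each <td>/<th> cell, row by row.
class TableParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

# Invented page snippet standing in for a real web page
page = "<table><tr><th>City</th><th>Pop</th></tr><tr><td>Rome</td><td>2873000</td></tr></table>"
parser = TableParser()
parser.feed(page)

# Write the extracted rows as CSV (to a string here; use open("out.csv", "w") for a file)
out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```

That is the whole trick: find the repeating tags, grab what sits between them, and write rows out. The add-on simply lets you point and click instead of writing the parser yourself.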
One last word: there is a free version and a paid version. I must admit the free version is a bit limited, even though you can start doing some fun things with it. If you want more, especially in terms of automation and data-pattern specification, you need the pro version, which, by the way, comes at a very small price (note: this link is not sponsored; it comes only from my appreciation of the tool).
KNIME (data processing)
KNIME stands for Konstanz Information Miner, and it’s a quite famous data mining tool. It has a large following among pharmaceutical companies, but I suspect it is little known to data visualization experts. The whole question of the role of data mining in visualization deserves a blog post of its own, and it’s in my plans to talk about it at length. But here KNIME is suggested for two main reasons:
- Intuitive user interface based on a workflow model
- Handy data preprocessing and transformation tools
You don’t need any programming skills to use KNIME. The application is organized around a workflow paradigm: you select the nodes you need to process your data and connect them so that your data takes the shape you want. For instance, there are nodes for data input and output, to load .csv files or read from a database. Then there are nodes to filter, aggregate, and normalize data columns, and nodes to take care of missing values. And if you are a little experimental and adventurous, there is a whole bunch of mining nodes for things like clustering, classification, association rules, etc. There are also some data visualization functions, but they are not really KNIME’s strong suit.
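KNIME itself is point-and-click, but its workflow idea maps nicely onto code: each node is a function that takes a table and returns a new one, and the wires are just function composition. Here is a rough sketch of a "fill missing → filter → aggregate" chain; the node names and the data are invented for illustration, not KNIME's actual node catalog:

```python
# Each "node" takes a table (a list of dicts) and returns a new one,
# so a workflow is simply a chain of function calls.
def missing_value(table, column, default):
    """Replace missing values in a column with a default."""
    return [{**r, column: default if r[column] is None else r[column]} for r in table]

def row_filter(table, column, minimum):
    """Keep rows whose column value is at least `minimum`."""
    return [r for r in table if r[column] >= minimum]

def group_by_sum(table, key, column):
    """Aggregate: sum `column` within each group of `key`."""
    out = {}
    for r in table:
        out[r[key]] = out.get(r[key], 0) + r[column]
    return out

data = [
    {"region": "north", "sales": 10},
    {"region": "north", "sales": None},
    {"region": "south", "sales": 7},
]

# Wire the nodes together, workflow-style
filled = missing_value(data, "sales", 0)
kept = row_filter(filled, "sales", 5)
result = group_by_sum(kept, "region", "sales")
print(result)  # {'north': 10, 'south': 7}
```

In KNIME you drag these steps onto a canvas and draw the arrows instead, which is exactly why the workflow model feels so natural even if you have never programmed.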
Tableau Public (data visualization)
I don’t know if I really have to introduce Tableau Public. Tableau is one of the fastest-growing hi-tech companies, and its product is revolutionizing the way data is visualized in business environments and on the web. Tableau Public is a free version of Tableau that lets you publish visualizations on the web. The main strengths of Tableau are:
- Its easy-to-use interface based on drag-and-drop operations.
- The possibility to create publishable visualizations.
You can load a data table (from multiple formats) and start experimenting with various chart forms in a matter of seconds. Just select a data column and drag it to the visualization space, and Tableau guesses the best representation. Similarly, data columns can be mapped to color, size, shape, etc. In a few minutes you can explore dozens of different design solutions and choose the one that best suits the goal in your head. Once you have found the best design, you can publish it on the web in a dashboard format.
Download all of them and start experimenting. And most of all, HAVE FUN! As usual, each tool requires some level of investment, but they all have a very gentle learning curve, and you can get cool stuff out of them from the very beginning. Let me know if you have any questions or problems; I’d be happy to help.
If you liked this post you can do at least three things to make me happy: comment below, subscribe to my RSS, re-tweet it. Thanks!