When I think of visualization and Excel, two names come to mind: Jorge Camoes and Jon Peltier. If you want to do serious data visualization with Excel, stop here: these are the names. Since I was more familiar with Jorge’s work and had more opportunities to talk with him, I decided to interview him to cover the Excel part of this series, but you can take a look at Jon’s web site if you have any additional questions.
Jorge has been developing visualizations in Excel for a long time now, and I still remember the first time I saw one of his dashboards in Excel: “Wow, can Excel do that?” Take a look at his dashboard courses. Pretty amazing, isn’t it?
I have been following Jorge’s blog for a long time now and I have often enjoyed his short and catchy blog posts. If you are not following him, give it a try. It’s worth it.
How did you start using Excel?
I started my professional career as a desk researcher. I had to create information products with lots of charts using market and socio-demographic data, and Microsoft Office was the only tool available. Like everyone else, I had no data visualization training, so you can imagine how bad those charts were. On the one hand, that’s very depressing, from a personal point of view. On the other hand, this proves that data visualization skills are easily acquired, once you become aware of what data visualization is all about.
What’s the best and worst aspect of Excel?
We must emphasize that Excel is not a data visualization tool, so you cannot directly compare it to other tools. That said, you can learn and practice sound data visualization principles using Excel. Its chart gallery is poor, but you can make new charts using some more or less clever tricks. Check Jon Peltier’s site to see how you can extend the Excel chart gallery. So, its flexibility and general availability are the best aspects.
Unfortunately, defaults that emphasize marketing and sales pitches are responsible for a generation of users who don’t really know what a chart is. That’s the worst aspect of Excel. Also, because of its flexibility, many users do not recognize that they need stronger data management skills.
How is the learning curve vs. return-on-investment of Excel?
Most business users have access to Excel training. They just need to be brainwashed to remove all they think they know about charts. :) Corporate data visualization culture is so poor that applying simple rules can greatly improve insights and ROI, and you can do it using Excel.
Ok, I am a beginner and I want to learn Excel, where do I start?
What other tools would you recommend other than Excel?
90% of all charts you need in a business environment can be done in Excel. But if it takes a full day to code a chart that you can do in minutes using a different tool you have a good argument to make the switch. I would recommend Tableau, Qlikview or Spotfire. They are well-aligned with currently accepted data visualization best practices and they force you to learn more about structuring your data.
Some comments from Jorge … and my answers:
J: Business users hate programming. You can’t explain to a product manager that a simple recorded Excel macro can make all the difference. You can’t tell them that they need programming skills to make a chart. E: I think this is totally fine and probably a reason behind the big success of Tableau.
J: If it can be done in Excel, managers will not spend more money getting a new tool. E: Ok, but then they have to be ready to pay someone to make Excel do the job, right? I see a great potential for consultants here.
J: But managers are becoming aware that they need a serious (visual) reporting tool; Tableau is one of the options; traditional BI tools are moving fast. If a BI tool supports sparklines that’s a good starting point. E: I think managers will feel more and more pressure as visualization becomes mainstream. I think we just have to wait a little to see some stuff flourishing. I am not too pessimistic.
J: I believe tools matter, and matter a lot. Tools are not neutral (Tufte says that regarding PowerPoint). If you have to fight them they’ll make your life miserable. Try to apply Tufte’s principles to Crystal Xcelsius. I already wrote about this in my blog using the fable of the scorpion and the frog (“it’s in my nature”). E: Sure, tools matter. Especially if you know how to switch from one to another according to your needs. Nonetheless, I still believe principles come first. And in order to select the “right” tool and understand its limitations you have to have a clearer idea of what you want to achieve.
J: Life is short: I would argue that it’s better to learn about perception, statistics, data management and graphic design. Delegate the programming part. I’ve been making some dashboards and I spend more time programming than exploring better ways to show the data. Hate that. E: Cannot agree more. Even though I think we are still in a phase where it’s really hard to separate the designer from the implementer. The two things are so intertwined that trying to outsource the implementation may very easily lead to unsatisfactory results. But sure, the real skill is in the design IMO.
J: If you don’t include R in your list you’ll get into trouble :) E: Sure! It’s in the pipeline :-)
When I first saw a visualization developed by Jan, the Ghost Counties, I was totally fascinated. It’s brilliant. It took me a while to understand how it works, but once I got it I could not help but admire the strange mix of complexity and simplicity it provides.
Although he looks so serious in the picture on the left, he has a big smile and he is fun. I met him for the first time at Visualizing Europe and since then we have exchanged many emails. Plus, he is a regular commenter here (and everywhere) and I love him for that.
I don’t know how much I have to add to convince you that his advice is valuable. Just take a look at his portfolio and judge for yourself. He is IMO one of the most interesting data visualization freelancers to have recently appeared on the scene.
I know from talking with him that he is proficient with several technologies, but he has a passion for D3.
How did you start using Protovis/D3?
I’ve always been someone interested in the latest technologies. So, since I follow the data viz community very closely, I was aware of Protovis very early on, and I was aware of the development of D3 even before it was released to the public. I have a software development background, so I don’t have too much trouble finding my way in new programming languages, and since it excites me to work with new technologies and frameworks, I just started playing with Protovis and D3 as soon as they became available.
What’s the best and worst aspect of Protovis/D3?
The best aspect of Protovis is that it is a domain-specific declarative language, which means that it is fairly easy to start writing code, using visualization-related keywords and functions. The best aspects of D3 are its flexibility (more direct integration with SVG) and better performance. The worst aspect of both D3 and Protovis is that it’s hard or impossible to get them working on older browsers, and the learning curve for D3 may be somewhat steeper than for Protovis.
Ok, I am a beginner and I want to learn Protovis/D3, where do I start?
How is the learning curve vs. return-on-investment of Protovis/D3?
Protovis is really a good way to learn visualization and programming at the same time. Protovis is a language that is really geared towards the visualization (and diagrams) domain, so it really makes sense to talk about axes, marks, lines, bars, pies, etc. Also, there are quite a few good examples on the Protovis website, so it’s fairly easy to get started. However, right now Protovis is not supported anymore, and people are really moving to D3 now, so getting support may become a little tricky. Also, Protovis does not perform as well as D3 with very complex graphs, for instance, and compared to D3 you’re a little bit limited in the animations you can achieve. So, overall, it is a good way to start and also good for making some nice standard diagrams and visualizations, but if you really want to do ‘heavy’ visualization stuff, you might consider moving on to D3.
D3 is more powerful, more flexible and seems to have more capabilities (and better performance) than Protovis. The flip side is that in order to have a more flexible programming language, the language is also more abstract. Though many concepts of Protovis are also implemented in D3, and there are also quite a few predefined visualization layouts, it’s also more useful (compared to Protovis) to gain some knowledge of SVG, since it’s more likely that you will do some low-level stuff. D3 does give you much better animation capabilities, better performance and more flexibility, so once you get the hang of D3 and some SVG, you’re able to create some very compelling interactive visualizations.
What other tools would you recommend other than Protovis/D3?
A recommendation I’d like to add: when I work on Protovis or D3 visualizations, I use TextMate on the Mac. This allows you to open a preview window which renders your visualization in near real time as you type your code. I’m sure there are similar tools that do this. This is really great for getting immediate feedback while you’re coding.
I am really excited to announce my first interview for the “Tools from the Pros” series! We start with a very good one: Miriah Meyer talks about Processing.
Miriah is an assistant professor at the University of Utah. I met her only briefly during a couple of conferences but I am a huge fan of her research work on interactive visualization systems for biological data analysis (be sure to check them out!) Her tools are a rare example of well-crafted design studies in interactive data visualization and, as far as I understand, they are all developed in Processing.
I really like this interview because it covers many of the things beginners (and more advanced users) need to know. One above all: the rapid prototyping approach Processing makes possible and the whole mindset behind copying and pasting code to explore alternative designs.
Thanks Miriah! I think people have a lot to gain by reading this interview.
How did you start using Processing?
I started using Processing in 2008 when I helped design a new undergraduate visualization course at Harvard. We chose Processing as the language for the course, and I learned the core bits of the language putting together homework assignments. I quickly came to appreciate how Processing got rid of all the annoying parts of graphics programming — setting up a rendering window, registering callback functions, dealing with linking and libraries and compiling to multiple platforms, that ridiculous gluPickMatrix, and not to mention the headache of type.
We had Ben Fry come to the class to give a guest lecture that spring, after which we went out to lunch. I’ll never forget his answer to my question of why he created Processing. He said (well, I’m paraphrasing here) that he wanted a sandbox to play in, to quickly develop prototypes without getting bogged down in the architecture of the code. He emphasized Processing as a language to try different designs, with real data and with real interaction. And that cutting and pasting code in Processing is totally cool if it gets a design up and going faster. He wanted a language that lets people totally focus on the visualization concept and design without having to think too hard about the code underneath.
Well, that sounded great to me. And I quickly became a total convert, cutting and pasting code until things got so messy that I had to just rewrite an entire project. I found this philosophy totally liberating and that my work benefited immensely from rapid prototyping. Processing is a language that supports this style of development.
What’s the best and worst aspect of Processing?
In short, the best aspect of Processing is the amount of code it takes to get a simple scene with callbacks going — it is a small fraction of what it would take with OpenGL. Simple primitives like circles, squares, text, etc. are nicely abstracted into one-line function calls. Mouse and keyboard callbacks are automatically handled. There is a wide variety of common graphics helper functions available, like lerp-ing colors. Full-screen apps work without having to grab weird OS handles. The PDF library that exports the current scene as vector-graphics has forever changed figures for papers for me. And the ability to export an application to a variety of operating systems in a single go is absolutely invaluable when working with users on a variety of platforms.
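As a rough illustration of what “lerp-ing colors” means, here is a minimal Python sketch of linear interpolation between two RGB colors. It is not Processing’s own implementation: the function names `lerp` and `lerp_color` and the tuple-of-integers color representation are assumptions made for this example.

```python
def lerp(a, b, t):
    """Linearly interpolate between numbers a and b; t=0 gives a, t=1 gives b."""
    return a + (b - a) * t


def lerp_color(c1, c2, t):
    """Blend two (r, g, b) tuples channel by channel, rounding to integers."""
    return tuple(round(lerp(x, y, t)) for x, y in zip(c1, c2))


red = (255, 0, 0)
blue = (0, 0, 255)
# Five evenly spaced steps from red to blue, e.g. for a simple color ramp.
ramp = [lerp_color(red, blue, i / 4) for i in range(5)]
print(ramp)
```

Interpolating each channel independently in RGB is the simplest option; other color spaces can give smoother-looking ramps, but that is beyond this sketch.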
Despite all the simplification of the underlying graphics library, Processing still feels like you are in complete control of every mark you make on the screen. I almost never feel like I need to find a way around a function to get the sort of control I want. The design decisions that went into creating the Processing API are fabulous. Really.
As for the drawbacks, there aren’t any really great libraries (yet) for basic user interface widgets. Which for me is ok because I’m kinda neurotic about how my scroll bars look and act. But for graphics beginners this can be a real time-sink. Same goes for more sophisticated types of visual representations like basic charts, maps, and networks. Other languages like Protovis provide built-in algorithms for handling these very common types of representations. In Processing, you’ll have to implement your own graph layout algorithm (or, find one on the web). Again, this can be a hurdle for people with less programming experience.
And as a small gripe — Processing has implementations of Bezier and Catmull-Rom curves … but where is the love for b-splines???
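For readers wondering what rolling your own would involve, here is a minimal Python sketch (not Processing code) that evaluates a uniform cubic B-spline using the standard basis polynomials. The function names `bspline_point` and `bspline_curve` and the `steps` parameter are hypothetical, chosen only for this illustration.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    # Standard uniform cubic B-spline basis functions; they always sum to 1.
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def bspline_curve(points, steps=8):
    """Sample the curve defined by each run of four consecutive control points."""
    curve = []
    for i in range(len(points) - 3):
        for s in range(steps):
            curve.append(bspline_point(*points[i:i + 4], s / steps))
    return curve
```

Unlike a Catmull-Rom curve, a B-spline generally does not pass through its control points; it is pulled towards them, which gives a smoother result.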
Ok, I am a beginner and I want to learn Processing, where do I start?
On the Learning page you’ll find a whole series of tutorials and examples that can walk you through the basic functions of Processing. The next step is to peruse the inspiring demos in the Exhibition, many of which will include example code. When you see a function you don’t understand, the Reference page has wonderful documentation for the language.
If you are new to programming or graphics programming the two books I recommend are:
You can work through the Getting Started book in a day. It’s short and sweet. If you find that you need more help, the Shiffman book includes more details on how to program and lots of paper-and-pencil and coding exercises. Daniel Shiffman wrote this book from course notes he created while teaching design students at NYU about coding and Processing. It’s an intro to programming via Processing.
If you are an experienced graphics programmer all you need is what you can find on the Processing website.
How is the learning curve vs. return-on-investment of Processing?
If you know OpenGL and are familiar with Java, the learning curve is super short and shallow. If you are new to graphics, it will take you less time to wrap your head around Processing than OpenGL. And if you are new to programming, Processing is a really fun way to learn the basics.
With that said, it is still a programming language. Reading in data from a file requires basic coding skills, as does just about any interesting interactive visualization. You have to be comfortable with for-loops and arrays. Processing makes graphics programming way easier, but it doesn’t automatically generate visual representations of data. You have to code that.
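To give a feel for the kind of code this implies, here is a small sketch in Python (rather than Processing) that parses a table and uses a for-loop to turn each value into a bar height. The data, the column names and the function names `load_rows` and `bar_heights` are all made up for illustration; in a real project the text would come from a file on disk.

```python
import csv
import io

# Hypothetical data standing in for the contents of a CSV file.
RAW = """label,value
A,10
B,25
C,40
"""


def load_rows(text):
    """Parse CSV text into a list of (label, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["label"], float(row["value"])) for row in reader]


def bar_heights(rows, max_height=200):
    """Scale each value so that the largest one is max_height pixels tall."""
    largest = max(value for _, value in rows)
    heights = []
    for label, value in rows:
        heights.append((label, value / largest * max_height))
    return heights


rows = load_rows(RAW)
for label, height in bar_heights(rows):
    print(label, round(height))
```

Drawing the bars is then one rectangle call per entry; the point is that the mapping from data to marks is something you write yourself.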
If you want control over every aspect of your visualization and interaction designs, then you really just have to program. Processing is one of the best languages to use for that. If you just want to see what your data looks like, then there are other tools that can do this quickly with built in visual representations (like Tableau, ManyEyes, Matlab, R, etc).
What other tools would you recommend other than Processing?
I’d recommend any of the tools and languages I’ve mentioned previously. Another gem is ColorBrewer for selecting great colormaps.
Still, nothing beats OpenGL for truly understanding how graphics works. If you are serious about developing interactive visualizations, I think that taking an intro to graphics course that uses OpenGL is invaluable. Understanding the rendering pipeline and how it is implemented in a computer will make the seemingly quirky aspects of even a language like Processing make sense.
In my last post on data visualization tools I suggested a number of strategies to choose the best tool for you and I provided a list of those I think are the best bets currently available. Now, while I think this list is already very useful, I decided to give you more and I interviewed one data visualization expert for each tool mentioned in the list.
I will be publishing the interviews over the coming weeks. Some of them are still in preparation and the list might be expanded in the future as comments and requests come in (please feel free to ask!) What I can tell you for now is that I have the following interviews in the editing stage and that the first one will come very soon:
Each one is a real pro in his or her area and knows the tool he or she uses to make effective visualizations very well. I am sure you will get a lot of useful information out of them.
To each one I asked the following questions:
How did you start doing visualization with X?
What’s the best and worst aspect of X?
Ok, I am a beginner and I want to learn to do visualization with X, where do I start?
How is the learning curve vs. return-on-investment of X?
What other tools would you recommend other than X?
I hope you’ll enjoy it. Stay tuned! The first one is coming very soon.
Important: if you are an expert and are willing to answer these questions about your favorite tool I’d be happy to include you in the list!
More important: specific requests for other person/tool interviews are welcome! Who else would you like me to catch? About which tool? I cannot promise anything but I’d love to receive your requests.
(Note: if you are new to this series, the DVBTK doesn’t teach you how to do visualization. Rather it is meant to help people find a less chaotic and more effective path towards the acquisition of the necessary skills to become a data visualization pro. To know more, make sure to read the introduction to the series first.)
The DVBTK #1 introduced books and study material to make sure you acquire the right knowledge in the right order. Studying is the first step and there’s no level of practice that can substitute for it.
That said, it is extremely important to realize that good visualization cannot happen without practice. It’s not only that practice is a necessary complement to theory, but also that you will understand the theory only once you apply it for real.
But if you want to do visualization you need some tools, right? Right. And again the web is a jungle and you might have trouble understanding which tool is the one for you. You have probably heard a thousand names and acronyms but you cannot really decide; there are too many choices and too little guidance.
Here is the guidance. In the following, I propose a number of rules and factors you need to take into account when choosing a visualization tool. Furthermore I introduce a number of “staple visualization tools”: established tools which you can make great visualizations with.
And there is more to come!
I felt you needed to know more about each tool, so I decided to interview (at least) one data visualization professional with proven and long-lasting experience with it. Be sure not to miss these interviews; I will be posting them over the coming weeks. And of course be sure to send your remarks or questions in the comments below, so that I will be able to address them in the upcoming posts.
Golden Rules of Visualization Tools
First of all you need some fundamental rules.
Rule #1: No tool will turn you into a pro. I think I have stressed this point already in the past but it’s worth going over it again. Given the rapid development of visualization technology you might be tempted to adopt the latest technology thinking that it will turn you into a pro. This is not the case. There is no tool that can make you a pro, unless you develop your theoretical and design skills accordingly and organically. A great visualization designer is a great designer regardless of the tool of choice. It’s basically the same as photography. The latest digital SLR may take crisper shots but it won’t turn you into the next Ansel Adams.
Rule #2: First learn one single tool very well. Again, given the vast number of choices available and the endless production of new technologies, you might be tempted to go after all of them. Don’t get me wrong, experimentation and exploration are great but what you need first is a tool that makes you feel at home, a safe place where you know you can always express yourself regardless of the complexity of the idea you have in mind. Choose one tool (see below how) and learn it very well first; you won’t regret it.
Rule #3: Choose tools you are totally in love with. Don’t choose a tool because it’s cool and everybody uses it; choose the one that makes you feel great, the one you can have an affair with. People give their best with tools when they are totally in love with them and just cannot stop exploring all their capabilities. If a tool doesn’t click, if you don’t crave to use it (at least at the beginning), it’s a bad sign: move on to the next one.
Let’s clear this up now: do you need to be a programmer?
Damn it! I was almost going to take the safe route and write down a politically-correct and well-balanced answer but … sincerely? Yes, I think you need to be able to write code. I mean, of course you can get away without coding, and below I propose tools which do not require you to write code, but why the hell do you want to limit yourself to such an extent?
I get asked this question quite often and I have come to the conclusion that the cost-benefit ratio is so skewed that I cannot see a reason not to code. And the reason is not only in the benefit part of the ratio but also, and more importantly, in the cost. If you are scared by code it’s time for you to realize that writing code is nothing special and it’s not too difficult either. We all learned to write essays at school, and writing good ones is much more difficult than writing a few lines of code.
A large segment of our culture has promoted the view that writing code (together with science and engineering in general) is the sole right of engineers and geeks. Hey, you know what? I am terrible at technical things and yet I managed to get a PhD in Computer Engineering and I can express in code the things I have in mind. If I can do it, you can do it.
You don’t need to become a software engineer. The most complex stuff comes when you want to design and develop full applications with lots of interaction and many interconnected modules. But in most cases this is not what you are required to do, and in any case you can always acquire more advanced skills once you find that you need them.
So, choose a language, grab a copy of a good tutorial or book, and learn to code. And hey, why not learn it by doing visualization?! Some of the tools outlined below are just perfect for this purpose (especially Processing and its sketchbook approach). That’s a win-win situation.
How to choose the “right” tool
There is no absolute “right” tool. The best tool is the one you can do great things with, the one you love. However, there are a number of factors to keep in mind when making your choice.
Maturity. Is the tool one of the latest, coolest technologies on the market with an uncertain future, or has it been used consistently and successfully for quite some time? It’s not a strict rule, but if you bet on the latest technology, chances are it will be abandoned in the future. This is especially true for visualization, where technology is evolving very rapidly. When in doubt, go for the proven and trusted.
Community. If your tool doesn’t have a large and stable community of enthusiastic visualization people, it’s a bad sign. Every great tool has a big community and a community is the most important factor in learning. It doesn’t matter how good the documentation is, you are going to need some help (and inspiration) from others.
Documentation. That’s a very relevant and critical one. Good documentation is notoriously rare. To some extent a good community can alleviate the problems due to limited or bad documentation, but you don’t want to wait for a reply in a forum to move on in your project, especially at its very early stage.
Examples. There are two main reasons why examples are important. First, you can use examples as a reality check: if people are not producing great visualizations with your tool of choice there must be a reason. Second, having great examples around you is a perfect method to learn fast. Learning by example is extremely powerful and should always be used in conjunction with more structured material. I know people who learn only through examples and they are great!
Cognitive Fit. I cannot stress this one enough. You have to choose the best tool for YOU and this is a little bit like buying a suit: you have to feel comfortable and cool with it. If not, it’s not for you. The best tools are those with a low “friction factor”, that is, it is natural and easy for you to translate your ideas into pictures.
Target Platform. Not all tools are created equal in the way they produce their output. Some are specifically targeted at the web, some allow easy conversion to static documents, some allow for the creation of full desktop applications. You’d better clarify what kind of output you want to produce before making a decision.
Interaction and Performance. If you want to create interactive visualizations you have to make sure the tool you select allows for rich interaction. Also, when large data is involved you have to make sure your environment performs smoothly.
Staple Data Visualization Tools
Staple data visualization tools are tools with which you cannot go wrong. These are the tools I feel confident to suggest, especially if you are starting out. Of course, this list is very personal and you might find other tools you like. As I said above, if you are in love with a tool go with it. But if you don’t know where to start this list is a very safe bet.
Processing is the mother of all data visualization environments. Ben Fry and Casey Reas created it in 2001, out of their work at MIT, to help data designers create visualization sketches. Today it is one of the most established tools I can think of, maybe the most established. It has a huge user base and it has been used for every conceivable data visualization project (a lot for artistic purposes but for “serious” stuff too). The library is based on Java and this means that in order to use it you would need to learn at least bits of it. But, given the handy functions Processing provides, this could also be considered a gentle introduction to the language itself.
If you are willing to write code and you want total freedom in terms of design and a solid platform, I cannot think of anything better than Processing. You just need to download the software (it is totally free), take a look at the amazing learning material, and start writing code.
Big Pluses: totally free, lots of learning material, very flexible, lots of examples, can be extended with any Java library available, can generate many kinds of output, can afford high performance through the OpenGL integration.
Few Minuses: it takes learning a new language if you don’t know Java, need to write code even for very simple charts, limited support for advanced user interface components, not conceived for the web.
If you have never heard of R, you are in trouble. I think there’s no way for a data professional to ignore it today. R is a programming language and environment and it is the de facto standard for anything concerning data crunching; visualization included. R is not a visualization tool, it is much much more. It comes with a standard and comprehensive library of data manipulation and statistical functions, plus a huge set of ever growing libraries available on the web.
Data visualization can be done by writing very simple statements with the standard graphics library it comes equipped with or with any of the additional libraries people use, like the fantastic ggplot2.
Normally people use it through the standard console, where you write your statements to process data and generate graphics. While R certainly requires programming skills, technically you don’t necessarily need to write full programs; rather, you need to write a few statements in the console. But the difference may become blurred.
If you are not too inclined to learn a full programming language like Java, going with R could be a good compromise. The big plus of learning R is that with a single tool you are able to cover the full data manipulation and transformation pipeline, which is not true of the other tools mentioned here. Plus, knowing R for data manipulation is a terrific skill you would need anyway.
On the downside, R gives you less flexibility in generating exactly the visualization you have in mind, if you are thinking of anything too fancy. Also, as far as I know, it is extremely limited if you want to generate custom interactive visualizations; it is best at generating static charts out of your data.
It’s worth noting that many people post-process the charts generated with R in programs like Illustrator to make the whole output a bit prettier (check out Visualize This by Nathan Yau if you want to know more). But don’t worry, I have seen people doing incredible things with R and I am sure you can do the same with a bit of practice.
Big Pluses: the most established tool for data manipulation in the world, integrated statistical and data manipulation functions, can handle very big data, huge library for additional functions, huge community, good visualization defaults.
Some Minuses: need to write statements in a console to “draw” visualizations, not as flexible as a general-purpose programming language.
A data visualization language that lets you design custom visualizations with a few lines of code, at the right level of abstraction yet powerful, with very good performance, and specifically designed to run directly on the web, is something that is going to stay with us for a while and it deserves a lot of consideration.
D3 already has aficionados everywhere who just love the technology, and the documentation is pretty amazing. Also, people have started showing off examples here and there, so learning from others won’t be a problem.
Big Pluses: visualizations delivered directly through a web browser, compact code, good community size and excellent documentation.
Some Minuses: the code is a bit tricky and it requires some getting used to, it is not as widespread as other technologies (but this is going to change soon), it might be discontinued in the future the same way as Protovis was.
Notable Examples: Jan Willem Tulp’s Urban Water | D3 Examples Page
Finally an advanced data visualization tool that non-programmers can use! Let me say it right away: Tableau is one of the biggest things to have happened in visualization in recent years and I love it. It lets you load and display data in a matter of seconds simply by dragging data fields into the view and pushing a few buttons here and there.
What is striking about Tableau is that, while it is not as flexible as a programming language, it allows for pretty sophisticated visualization designs. Also, thanks to its powerful interface it is possible to explore a very large number of designs in a snap.
It takes some time to get used to its internal model and mechanisms, but once you understand how it works it is incredibly fast and powerful. I have been using it for a while and it amazes me how easy it is to go from one view to another, which is especially important in the early stages of a visualization project.
Sure, the level of customization you can achieve with alternatives based on programming is not reachable with Tableau but you can do pretty sophisticated things and I cannot think of a single better tool if you decide not to write code.
Other features I love in Tableau are the possibility to export static and interactive dashboards and the ease with which it loads a very large number of data formats.
There is one huge sore spot, however: Tableau is not free, and it’s quite expensive. You can still use Tableau Public, though, a somewhat limited but free version of Tableau devised to create visualizations that go directly on the web. I know a lot of people who are using Tableau only through the public version and they seem to be happy with it.
Big pluses: can create visualizations in a snap, very easy to explore many alternative views of the same data, does not require programming, very large user base. Some minuses: not as flexible as using a programming language, it’s expensive, takes some time to understand how it works. Notable examples: Tableau Software’s visual gallery | Clearly and Simply’s Tableau Posts
Excel?! Yes, Excel. You might be surprised to see it in the list of staple data visualization tools. It took me a long time to decide whether to include it or not. I consulted with trusted friends, pondered over it for a while, and came to the conclusion that it deserves its own spot.
Because Excel is a standard and it’s everywhere. Plus, people have been doing pretty amazing stuff with it.
If you happen to work in an organization of any kind, chances are Excel is what everyone uses and trusts (I have seen it everywhere, especially working with my fellow biologists). This means that this is the material you have to work with, whether you like it or not. People are naturally skeptical about changes (and for a good reason!) so they won’t like you introducing a new technology just because you want to spread the data visualization wisdom.
Plus, Excel is a pretty amazing piece of software, which probably unfairly inherited the overall bad light Microsoft products have. Being able to use Excel to draw effective charts can be a tremendous asset for you; with the advantage of using an almost universal platform.
The biggest problem with Excel is getting rid of the defaults. They are crap, a perfect gallery of junk charts. But once you learn how to bypass them you are in the realm of effective and advanced charts. You don’t believe me? Take a look at what Jorge Camoes and Jon Peltier are able to do with it. And hey, if you want to learn something about Excel be sure to read their web sites from top to bottom.
I think the choice of whether to invest in Excel or not is very much dependent on your situation. If you are totally free and independent, it might not be the right choice, but if you expect to work within the constraints of your organization or with clients in the BI area or similar, being able to work in the context of their comfort tool can be a huge advantage.
Big pluses: universal platform, everybody understands Excel, practically free, easy to go from data to chart, integrated with the spreadsheet functionalities. Some minuses: the defaults are crap, harder to go beyond standard charts, slow with big data. Notable examples: anything from Excel Charts gurus Jorge Camoes and Jon Peltier.
There is more to come: interviews are on the way!
I hope the information I provided above will be sufficient to make a well-reasoned decision. In any case there is more material to come: for each tool I conducted at least one interview with a real expert who has a proven track record of successful visualizations with the target environment. Stay tuned! I will be posting them in the upcoming weeks.
This series is meant to help you guys, so whatever doubt or question you have, feel free to ask by writing a comment below or sending a message on twitter or writing me an email directly. And please, if you find this post and the series useful don’t forget to share it with your friends. Thanks!
One of the main goals of this blog, other than challenging the status quo with reflections at the intersection between academics and practitioners, is to help people become data visualization experts. It’s not rare for me to receive emails from people who are enthusiastic about visualization but have little guidance about how to become an expert.
I have posted a few articles in the past with this specific goal, but I realized that they are too scattered and not organized in a way that makes them an organic resource for readers.
For this reason, I decided to create a series specifically designed to help those of you guys who are excited about visualization but really don’t know how and where to start. The series is meant to be part of a permanent collection in FILWD and it’s my first serious attempt to react to my own call to action: “When will we decide to provide lots of value?”.
Introducing the series
The Data Visualization Beginner’s Toolkit will function as an orientation guide for people who need guidance in finding the right resources to become data visualization experts. In the guide I will not be teaching visualization directly, nothing technical or theoretical about it (I have plans for this later), but I will show you the resources and one path.
Having such a guide is particularly important today because data visualization is really just like a jungle. There are plenty of opinions, blog posts, research papers, consumerist visualizations, books, etc., and it’s very hard to separate the wheat from the chaff.
When reading this series please keep in mind that this is my very personal view and, as such, is limited to my own experience. Also, whatever list I propose is certainly neither unique nor exhaustive. If you are looking for an exhaustive list of resources I highly recommend Andy Kirk’s collection of data visualization resources.
Here is a tentative list of topics I am planning to cover in the series (subject to changes):
Books and Other Resources
Programming Languages and Tools
Sources of Good Examples
Please, if there is anything else you would like covered, let me know! Send me a message or add a comment below.
Books about Visualization
There is a reason why I start the series with a list of books: if you don’t know the basics of data visualization you will always be an amateur. And what’s worse, visualization experts will notice it and will not take your work seriously.
Also, orienting yourself in the mess we have right now might prove discouraging and prone to errors. If you type “data visualization” in Amazon the result is a disaster, believe me.
Finally, even if you end up picking up very good books, it is definitely possible they are not the right ones given the amount of knowledge and expertise you currently have. Here I suggest the following path (in order).
Show Me the Numbers: Designing Tables and Graphs to Enlighten (To acquire solid foundations). This book teaches the basics of visualization by using only tables and simple charts. You won’t find fancy and colorful visualizations, only scatter plots, bar charts and stuff like that. But that’s the way to go! If you understand the basics then it’s a lot easier to spot the limitations of basic graphs and go beyond them. Plus the book contains the best summary of visual perception applied to visualization I know. It really is a true gem. Don’t make the mistake of being attracted by fancy stuff and skipping the basics; start here and you will have very solid foundations.
Readings in Information Visualization: Using Vision to Think (Chapter 1 only) (To go beyond simple charts). Once you understand how charts work and you have learned the basics of visual perception, you are ready to explore fancier stuff. Yet you need some guidance on how to explore the huge data visualization space. The first chapter of this book is the best self-contained piece of work I know. It provides all that’s needed to start thinking more creatively, but in a structured manner, about advanced visualizations. The book also has a strong emphasis on interaction, which is important. If you want to go beyond the first chapter, fine, but the book itself is a collection of papers and many of them are totally outdated. But wait a moment, this doesn’t mean you cannot find useful material there! In the collection you can find fundamental papers that are totally worth a read: the work of Jacques Bertin above all.
The Visual Display of Quantitative Information and the rest of Tufte’s books (To learn what “graphical excellence” is). People go crazy over Tufte’s books and I understand why: they are totally beautiful, the cover, the format, the colors, the content, everything. But regardless of their beauty, I have always thought it’s really hard to learn something out of them; they require you to think really deeply about what you see. Basically they are “just” a collection of images. The Visual Display of Quantitative Information is the first one and is the only one I truly recommend because it gives more guidance than the others. The others are wonderful but you will have to spend more time on them to translate their content into design practices.
Information Visualization: Perception for Design (To know what happens in our brain when we see a visualization). If you have read all the books cited above, congratulations! You have learned a great deal. Now, information visualization is deeply rooted in visual perception and cognition. If you want to master the art of visualization, at some point you will have to know these basics, especially if you aspire to design innovative visualizations that fit people’s needs. This book starts from the very basics of human vision (e.g., how the eyes work) up to how we think with visualizations. It’s a tough read but it’s totally worth it. You will have to spend quite some time thinking how these theories apply to your specific projects, but believe me, it’s a true investment. I experienced countless situations where a visualization design problem was deeply rooted in one of the issues discussed in this book. You will find yourself referring back to it all the time.
More Books about Visualization
Important: Are the books not mentioned above bad or not worth it? Absolutely not.
It is important to consider two factors: (1) there are several books I have never read or even skimmed through which might provide some additional value to you; (2) there are extremely valuable books I’ve not included just because they are either too advanced or don’t fit the progression of readings I am proposing here. Please keep in mind: I am suggesting that you read these books in the order I gave above.
A few additional books that come to mind and deserve at least a short mention are:
The monumental Semiology of Graphics by Jacques Bertin, which I did not include because it is still hard to get, even though a new edition has come out, and because it’s really a hard read for non-experts.
The extremely beautiful and information rich Visual Language for Designers by Connie Malamed, which I did not include because I haven’t finished reading it yet.
The deep and dense How Maps Work by Alan MacEachren, which despite the title teaches visualization and makes you think deeply about it.
The little-known gem Designing Visual Interfaces by Kevin Mullet and Darrell Sano, which teaches aesthetics in a functional and systematic manner.
Books NOT about Visualization
It’s important to acknowledge that not all the knowledge a visualization expert needs comes from data visualization books. I have no intention to write another long list of related disciplines’ books, but it’s important for you to know that a good data visualization expert may have strong foundations in areas such as: statistics and data mining, data management and manipulation, human-computer interaction and cognitive science.
I don’t want to scare you: you can start doing visualization without these, but little by little you likely will find yourself digging more into these areas.
Also, let me stress the importance of human-computer interaction and related areas. While the rest is normally acquired, at least on a superficial level, by using various technologies you encounter along the way, human-computer interaction has a less technical flavor and you might not learn anything of it unless you seek it.
Knowing how people reason and interact with user interfaces is a crucial skill, the real differentiator, that you’d better acquire if you want to become a pro. I cannot stress this point enough. Visualization, like any other user interface, happens in people’s minds, not in the computer! And if you want to design great ones you’d better learn how people’s minds work.
Unfortunately, other than the books I mentioned above, there are not many other sources from which you can really learn something. But luckily there are a few notable exceptions! Tamara Munzner and Jeff Heer, top researchers in the field, share the material of their courses freely on the web and you should not miss them for any reason.
These are university courses, with a specific target, but I cannot think of a more carefully and better organized set of information covering the whole theory and practice of information visualization. What is really unique in these courses and their material is the way this information is organized. Information visualization is still a young discipline and nobody really agrees yet on the content and order to use when teaching it. These two courses found in my opinion the perfect balance between coverage and organization.
Another great source for learning data visualization is Stephen Few’s articles and white papers, which teach a whole lot of fundamental data visualization skills in his usual concise and effective style.
Can I get all the knowledge I need with these books? No.
And there are two main reasons. First of all, one of the biggest and surprising gaps I see in the current literature is a book that teaches systematically how to design a visualization from scratch. I am really surprised. Ben Fry in his Visualizing Data has a few elements of it, but since the book essentially teaches also how to use Processing the whole thing is a bit too diluted. Apart from that, I am not aware of any book that fills this gap (please let me know in case you know one).
A second issue is that there is no book that can really teach you to be a great data visualization designer. The only way to become an expert is to actually design your own stuff and iterate over and over on it until you perfect your skills. Studying and reflecting is important, but doing is equally, if not more, important. The two things complement and enrich each other.
That’s all folks. I really hope this series will be useful to you. Let me know what you think and if it helps. Also, I’d really love it if you could enrich it with your suggestions. You can write comments below or send me a message on twitter at @FILWD.
Please do not forget to share this with your friends or people who might benefit from it. Its main purpose is to let you guys become better data visualization experts. Help me to spread the word around.
One of the most exciting and silent phenomena I have seen developing over the last months and years in data visualization is the growing number of people who are transitioning to, or have already succeeded in, making a living as data visualization freelancers.
Who is a data visualization freelancer? Basically a self-employed person who sells data visualization services to companies and institutions. What kind of services? I don’t know … the sky is the limit.
And it’s exactly by talking with Moritz that I came up with the idea of digging deeper into this fantastic world. The thing went more or less like this: I met Moritz at the airport, when we were both invited for a panel at Visualizing Europe, and during a casual chat he told me something along these lines: “You know Enrico … academia and research are cool but at some point I wanted to make real stuff for real people”. Yeah sure I understand. “… I always had small consultancy jobs during my studies so I had some experience … at some point I decided to become a data visualization freelancer”. Cool! “… so you know what? I work from home, I can plan the time myself and make sure I play with my kids and talk with my wife” Uh!? “… and of course I am being successful and I am invited here and there so I am travelling a bit around the world and meeting interesting people”.
Ok, are you salivating already or what? I must admit it, even though I really love my academic job, I felt a bit jealous.
Send your questions! I will interview Moritz next week.
Ok so … I will be interviewing Moritz next week about data visualization freelancing. I started collecting a number of questions for him but I need your help! What would you like to ask Moritz? What are you curious about? Is there a nasty question no one has the courage to ask? I think it would be much much better if you guys tell me what *you* want to know. So, don’t miss this opportunity. You might realize that being a data visualization freelancer is not just a dream. It’s definitely possible! And Moritz can tell you how or at least provide some indications.
(On a side note, I will be experimenting with skype-based video interviews for the first time. I am totally excited by this new format and I’d love to have your feedback once it is done. The only video I currently host here is my interview with JD Fekete on Jacques Bertin, but the quality is really bad. I did some initial tests and the results look amazing. I really hope you will like it.)
A few additional reflections.
1st – Freelancing and working from home are not exclusive to data visualization; they are part of a bigger trend, which is in my opinion absolutely awesome. The web is full of bloggers and small entrepreneurs who make a living by writing their blogs and offering their services while working from home. If you want to know more, read the super-successful book The 4-Hour Workweek. Note: some people love it and some people absolutely detest it, but no one can deny it captures a strong trend in our society. You can decide to ignore it or to read it and take the risk of having your life changed. You decide.
2nd – Being a professor used to be one of the best jobs in the world for the amount of freedom professors have (and it still has a large number of benefits in my opinion), but being a freelancer and working from home is a good competitor. Academia has its glow of knowledge and a little bit of mystery on its side but this whole segmentation of professions is going to disappear anyway. I see academia and freelancing as somewhat similar, as they both feature really special amounts of freedom. Academia maybe gives more opportunities to really do whatever comes to your mind, but freelancing, as far as I can understand, comes with considerably less bureaucracy to put up with.
3rd – Most importantly, I believe freelancing is just a perfect fit for data visualization. I am not a big fan of generalist visualization because I think people get the best out of it when it meets the specific needs of a project. And this happens when you have a competent person able to listen, understand, and offer a tailored solution. For this reason I am a strong proponent of data visualization freelancing. It pisses me off that nobody is talking about it because it’s really a great trend. Plus, the more competent visualization designers we have around, the more we will be able to show people great examples to take inspiration from.
Again: send your questions!
Ok, let me repeat it. I will be interviewing Moritz some time next week. This means the interview will appear here no sooner than about 10 days from now. I will be able to take your questions into account only if they come in during the next few days. Don’t miss this opportunity, send your questions or ideas to me and Moritz. The easiest way is to add a comment below. Otherwise you can contact us on twitter @FILWD and @moritz_stefaner or send me a private message.
Take care, have fun. We are waiting for your input!
It has been said that in order to create visualisations that matter, you need to ask good questions. But as a relative newcomer to this field, I’ve been wondering recently:
What questions should I be asking?
I come from a marketing angle as opposed to Business Intelligence, and in marketing as much as anywhere, the lack of good questions has given birth to a swarm of pointless infographics.
Too often, the only principles claimed to be upheld are ensuring the content is ‘interesting’ and ‘relevant’. And the questions posed, if any, are:
What’s interesting about this data?
What’s going on in this field right now?
And who is this content relevant to?
The big problem here is how vague the questions are. Instead of ‘what’s interesting’, how can you ask questions that dig into what matters most to your audience? How can you help them? How can your visualisations be really worthwhile?
We should be asking questions of ourselves, of the client, the audience, the data, the context, the media, the visualisation, and design.
By questioning ourselves, I mean to raise awareness of the intent behind our actions. Are you forcing the data to fit your agenda? What factors are influencing your decisions? What assumptions are you making about the process, the client, and the audience?
It’s important to realise not every question will be the killer one each time. In the same way a footballer won’t score with every shot, not every question will strike gold. Some will return answers that barely scrape the surface. But others could unearth a few gems.
If you’re passionate about what you do, you’ll want your visualisations to stand up to certain principles. Amongst others, the standards to which we hold our creations ensure they are:
Purposeful, Effective, Simple, Accurate, and Attractive
And these principles are intended to guide our choices. But simply having them in the back of your mind won’t guide your choices nearly as well as asking questions which force you to do so.
How can you make this easier for them to understand?
How important is this to them?
How well has this already been covered elsewhere?
What data can you gather to support this idea?
How was the data collected?
What assumptions are you making about the data?
What other data or information would make this more meaningful?
In what context will this information be delivered?
How can you deliver the message so it’s easy to follow?
Is any of the information you are presenting misleading or confusing?
Does the design grab you?
Does the design draw attention to the most important parts or is it distracting?
What can you remove without sacrificing clarity?
Without going far along this line of investigation, I can already tell it’s powerful. I have found the questions I’ve been gathering have me thinking differently. Ideas crop up more frequently, and I feel clearer about the tasks at hand.
The biggest revelation for me has been not simply focusing on the data in front of me, or the medium to be used, but turning my attention to the end user, and drilling down into what matters most to them.
Compared to a process chart, or a list of values pinned to the wall, questions are almost impossible to ignore. Like the name of that song you can’t remember that finally springs to mind long after the moment has passed. You just keep working on it.
As I mentioned at the start, I’m in the early stages of this exploration, and I’d love to get your input.
What killer questions do you ask at different stages of the project?
What other questions can you think of?
Please share your thoughts and experiences in the comments below. I’d love to create a big list of questions we can all use to get better at what we do.
Today I got the following email from Ragaar, one of my readers: “Congratulations! Your blog is in the big six, according to eagereyes.” I checked the link and found that Robert kindly included me in his Six Niche Visualization Blogs and said nice words about FILWD.
That’s fine and great of course, but the event also made me realize that new readers might not easily get a full picture of what FILWD is all about and what it has to offer. Plus I have the feeling that many of the regular readers might not know the idea behind it either, so I thought I could indulge, for once, in a self-referential post. But before I move on to the post, let me send two important messages.
@Robert: EagerEyes is the only blog that convinced me it was possible to write a successful blog about visualization without necessarily showing pictures all the time. Thanks!
@Ragaar: I cannot express with words what it means to me to receive such an enthusiastic email from a reader because MY blog was mentioned in a more famous one. That’s well beyond my expectations. You made my day.
Why should you read FILWD?
I know, I know … this looks like a message coming from the marketing department. But there’s no marketing department here, only me behind a screen.
The world is flooded by data.
People love visualization because it turns (boring) numbers into flashy images.
Thus they take data, throw colored pixels on the screen, and call it visualization.
But visualization is much more than eye-candy and few people know it or realize it.
Many are jumping on the wagon but don’t know where or how to start, so they just start and throw pixels.
If they type “visualization” in Google or anything similar they are redirected to famous blogs with thousands of flashy images but almost no guidance.
Academics have been studying visualization for a long time but people jumping on the wagon don’t know what they have to offer and how to access it.
Visualization users have specific needs that academics ignore. Plus academics have a deep faith in their knowledge but visualization theory is really really limited.
People need more guidance.
People need to be told that visualization is much much more than flashy images.
Visualization users and academics need a bridge.
FILWD aims at addressing these consequences, based on the aforementioned facts. If you like this plan jump on the wagon and let’s do this journey together.
Most popular posts
Now that you know what FILWD is all about let me guide you through the most popular posts I have had since I started 7 months ago. These might not be my favorites but I have a religious respect for the crowd and if people liked them so much there might be a reason. So here we go.
7 Classic Foundational Vis Papers You Might not Want to Publicly Confess you Don’t Know – When I saw how people reacted to this post I was blown away, I could not believe it. This is still by far the most successful post I have ever written and it continues to attract people every day. Currently it has 168 tweets and 22 comments. I cannot tell why it is so popular but I can tell that this is the prototype of the kind of contribution FILWD wants to give. From day one my focus has been on bridging the divide between what people do in academia and what people out there need in terms of visualization. If more people read this stuff, everyone will profit from it.
How to become a data visualization ninja with 3 free tools for non-programmers – This is a very practical post. I think part of its popularity is due to the sexy name. Nonetheless, the main message of the post, regardless of the fact that it points to some real, useful, and free tools, is that data visualization is a lot about data manipulation before anything can be visualized. The post just gives you an entry point to make the whole thing less cumbersome.
How to Become a Data Visualization Expert: A Recipe – I am sure there are lots of people who are interested in becoming visualization experts today. That is really great! But there is also so much noise that I think it is really really hard to trace your own path in such a mess. This post is intended to help a novice take the right path in this amazing world.
How do you visualize too much data? – This is a technical/academic post. A very large number of the visualizations we see on the web have very low data density. But people are confronted every day with tens of thousands or even millions of data records. How do you visualize them? The post gives some initial suggestions on which tactics can be used.
Demystifying Cargo Cult Visualization: You Cannot Visualize 3 Variables by Mixing 3 Colors – This is a “rant” kind of post, but supported by some hard visualization theory. I must admit I was a bit afraid when I pushed the “publish” button on this one. The whole post is an open criticism of a visualization published by SEED magazine (and it was pretty tough!) but it really was meant to be used as a concrete example to convey a deeper message: in order to design innovative visualizations you first have to know the science behind them.
Can visualization influence people? I mean can we prove it? – This is a recent one. A very candid one admittedly. The question was originally posed by a guy attending an invited talk I gave at the IDRC in Canada and to whom I could not give a satisfactory answer. So I thought I could just “crowdsource” it and see what the answer would be. Luckily, I received lots of comments. And these comments are the real wealth of the whole post.
Why Visualization Cannot Afford Ignoring Data Mining and Vice Versa – I think there are a lot of data mining guys out there who are increasingly looking into visualization as a way to solve some of the problems machines cannot solve. On the other hand visualization cannot really handle problems with very complex data without the use of automatic techniques. The post summarizes the main issues and suggests ways and reasons for a tighter integration.
A few reflections on the popular posts list
You guys seem to like it when I post things starting with “How to …”. This is not new. It’s a very well known fact that how-to posts tend to attract people. Nonetheless, it also demonstrates that many of you need to learn the basics of visualization, and this is another fundamental premise of FILWD.
The fact that the most popular post is one with a long list of research papers is a big revelation for me. I always suspected that people interested in visualization need to better know what research has to offer, but I could never imagine such an interest. This is a big call for those who, like me, write about visualization: there are people out there that are interested in more than flashy graphics and animated bubbles!
Most underrated posts
There are a couple of posts I think deserved more than the attention they received.
InfoVis Makes Us Cyborgs – In this post I was trying to explain how visualization is a natural development of the way the world is changing and that it can be seen, like other technologies, as an extension of our mind. My guess is that the article was too hard to digest and far from the needs of the readers. But if you would like to give it another try, I think it could be worth it.
When will we decide to provide lots of value? – This one is a call to action to all my fellow bloggers or the aspiring ones. As I said above, people are desperately in need of good quality sources and it’s our responsibility to make them available and easily accessible.
Why do I write FILWD?
There are a number of selfish and narcissistic reasons for having a blog of course, and I am not immune to that. But let me tell you that the biggest satisfaction I have had from writing this stuff so far is the number of enthusiastic messages I have received and keep receiving from people saying that they desperately needed some of the things written here. There is nothing better for me than knowing I am useful. In the end this is what FILWD is for me: doing something useful for other people.
What do you want from FILWD? Speak up!
Now it’s your turn to speak. What do you want to see in FILWD? Is there something special you would like to see here? Is there anything you did not like? Do you have suggestions on how to make FILWD more useful to you? I am happy to hear from you. You can write a comment below or send me a message on Twitter.
This is a follow-up post on how visualization can influence people. If you are reading this for the first time, the whole thing originated from a quite candid post of mine, inspired by a question I got during one of my talks: “Can visualization influence people? I mean can we prove it?”. The post seemed to touch the right nerve and quite a few people commented on it with interesting suggestions and interpretations. I summarized these ideas in a follow-up post, “Data Visualization and Influence (Part 1)”, and since I felt there could be more to say, I split it into two parts. Here we are with the second part.
Florence Nightingale’s Causes of Mortality in the Army in the East.
“She convinced military authorities, Parliament and Queen Victoria to carry out her reforms. In recognition of her statistical ability, she became the first female member of the Royal Statistical Society in 1859. Her insistence on good sanitation, fresh air and public health saved thousands of lives, both for soldiers and civilians, on battlefields and in hospitals.”
Jamie Oliver’s wheelbarrow full of sugar (from about 13:00 minutes in)
Technically this is not a visualization in the standard format, but I think it’s a good and inspiring example anyway. I am sure Jamie Oliver used it to convince lots of people how crazy the volume of sugar in our societies is.
Al Gore’s Inconvenient Truth
Maybe this one does not even need a comment. I don’t know how many people have been convinced after this video full of graphics that climate change is a serious thing, but for sure we cannot say that it generated indifference. For good or bad, this work had a strong influence.
If you have more examples to suggest, please write a comment below or send me a message. I’ll try to keep an updated list of influential visualizations. This will be useful for me and for all of you when talking with someone about influential visualizations, as we will have some examples ready to be shown right away.
– “Google Scholar nets the most promising hit for “visualization persuasion” in the paper of Sheppard, Shaw, Flanders and Burch about using visualization to combat climate change.” (jakob): Can Visualisation Save the World? (Research Paper)
– “… we recently started aiming at understanding how visualisation influence business decision makers.” (Zied M. Ouertani): Service Performance Information (Web Page)
– “… I guess it is useful to address the notion of opinion shaping … I wrote a blog post exploring this a few months ago” (Anthony Hamelle): Shaping Opinions (Blog Post)
– “I’m a bit skeptical about the HP paper that someone linked to; it leans rather heavily on the questionable assertion that we remember 80% of what we see and do …” (Chris Atherton): Forgetting Curve (Wikipedia Page)
– “Your comments reminded me of a crucial aspect which is the skills or qualities of the presenter … I remember Robertson et al. discussing this issue in their Effectiveness of animation in trend visualization.” (Enrico): Effectiveness of animation in trend visualization (Research Paper)
– “Then again, if you just want to find out “what works” rather than “why” at first, there might be a starting point in creating narratives … The power of narratives to sway people is well documented” (jakob): Couch it in a Narrative (Blog Post)
– “Persuading people is certainly a complex issue. One of my favourite examples is an experiment that found that people who had just gone up an escalator were twice as likely to give to charity as people approached after travelling downwards!” (@DaveAnalyst): Why Getting ‘High’ Increases Acts of Charity
This concludes this mini-series on visualization and influence. I thank you all again for posting so many good comments and links! I hope this will continue to inspire you all. Please don’t forget to retweet or comment if you find it useful and interesting.