11 essential features that visual analysis tools should have
Visual analysis tools are adding advanced analytics for big data
After recently playing with SAS Visual Analytics, I’ve been thinking about tools for visual analysis. By visual analysis I mean the type of analysis most recently popularized by Tableau, QlikView, and Spotfire: you encounter a data set for the first time and conduct exploratory data analysis, with the goal of discovering interesting patterns and associations. Having used a few visualization tools myself, here’s a quick list of features (culled from tools I’ve used or have seen in action).
Requires little (to no) coding
The viz tools I currently use require programming skills. Coding means switching back-and-forth between a visual (chart) and text (code). It’s nice to be able to customize charts via code, but when you’re in the exploratory phase not having to think about code syntax is ideal. Plus GUI-based tools allow you to collaborate with many more users.
Includes an expanded set of basic charts
Aside from statistical graphics (line, bar, scatter, histogram, bubble, boxplot,…), the ability to visualize hierarchical (treemap), financial (stock charts), longitudinal, geospatial (maps), and network data is essential these days.
Charts are easy to customize
It should be easy to tweak labels, colors, and other elements. There are times when default labels need to be resized or repositioned, to make them legible. You should also be able to adjust coloring schemes to your liking (colors are usually assigned based on category or, in the case of heat maps, value).
Templates can be created
Once you create a chart with your preferred color and labeling scheme, you should be able to templatize it for future projects. [Ideally templates support rule-based formatting (“if negative, color = red”), but this starts to involve some coding.]
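To make the rule-based formatting idea concrete, here is a minimal sketch of what a reusable template might look like underneath the GUI. The names (ChartTemplate, color_for, the default color) are hypothetical and not taken from any of the tools mentioned; the only rule encoded is the “if negative, color = red” example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A rule pairs a test on a data value with the color to use when it matches.
Rule = Tuple[Callable[[float], bool], str]


@dataclass
class ChartTemplate:
    """Hypothetical reusable style: a default color, a font size, and rules."""
    default_color: str = "steelblue"
    font_size: int = 10
    rules: List[Rule] = field(default_factory=list)

    def color_for(self, value: float) -> str:
        # The first matching rule wins; otherwise fall back to the default.
        for predicate, color in self.rules:
            if predicate(value):
                return color
        return self.default_color


# "if negative, color = red"
template = ChartTemplate(rules=[(lambda v: v < 0, "red")])
bar_colors = [template.color_for(v) for v in [12.5, -3.0, 7.1]]
print(bar_colors)
```

The same template object could be reused across projects, with each new chart asking it for colors rather than hard-coding them.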
Visual summaries are easy to generate (histograms, association matrix)
You’ll be exploring data sets that contain many observations (rows) and variables (columns). SAS Visual Analytics produces a quick summary (average, min/max, histogram) for each variable and displays the results in a compact, scrollable format. This is done entirely through a GUI and doesn’t require any coding.
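For comparison, this is roughly what such a per-variable summary computes. It is a sketch using only the Python standard library, with made-up column names and values; it is not how SAS Visual Analytics implements the feature.

```python
from statistics import mean


def summarize(values, bins=5):
    """Return the mean, min, max, and equal-width histogram counts."""
    lo, hi = min(values), max(values)
    # Guard against a zero-width range when every value is identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return {"mean": mean(values), "min": lo, "max": hi, "histogram": counts}


def summarize_table(columns, bins=5):
    """Summarize every variable (column) in a dict of name -> values."""
    return {name: summarize(vals, bins) for name, vals in columns.items()}


# Illustrative data only.
data = {
    "unit_sales": [10, 12, 15, 11, 40],
    "price": [2.0, 2.5, 2.5, 3.0, 3.0],
}
summary = summarize_table(data, bins=3)
for name, stats in summary.items():
    print(name, stats)
```

Even in this toy case the histogram for unit_sales makes the lone large value stand out, which is the kind of thing a visual summary is meant to surface.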
Drill-down to source points: identify, isolate, and fix minor data errors
Visual summaries alert you to potential problems with your data (outliers or errors). A few tools give you the ability to isolate outliers or fix simple data problems through a GUI. More generally, it’s nice to be able to drill down from the chart to examine (via dynamic rollover or other method) the underlying data.
In-place filtering
While exploring data, you need to be able to quickly filter by value or category – using checkboxes, drop-downs, sliders, …
Support for visual pivoting
Many business analysts are heavy users of pivot tables – a tabular summarization technique found in spreadsheets and reporting tools. Visual pivoting replaces tabular presentation with charts. My first experience using this type of visual exploration was through the Trellis graphs introduced in S/S-Plus. Thanks to Tableau’s easy-to-use interface, this form of visual analysis has become a popular way to explore data.
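Underneath, a visual pivot is the same grouping a pivot table does, except each group becomes a small chart instead of a table cell. A rough sketch of that grouping step, assuming hypothetical field names (region, quarter, sales) and leaving the actual drawing to whatever charting tool you use:

```python
from collections import defaultdict


def trellis_panels(rows, facet, x, y):
    """Split rows into one panel per facet value; within a panel, sum y by x."""
    panels = defaultdict(lambda: defaultdict(float))
    for row in rows:
        panels[row[facet]][row[x]] += row[y]
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {name: dict(series) for name, series in panels.items()}


# Illustrative records only.
rows = [
    {"region": "East", "quarter": "Q1", "sales": 100.0},
    {"region": "East", "quarter": "Q2", "sales": 120.0},
    {"region": "West", "quarter": "Q1", "sales": 80.0},
    {"region": "West", "quarter": "Q1", "sales": 20.0},
]
panels = trellis_panels(rows, facet="region", x="quarter", y="sales")
for region, series in panels.items():
    print(region, series)  # each entry would be drawn as its own small chart
```

Swapping the facet or x argument is the code equivalent of dragging a different field onto the rows or columns shelf in a GUI tool.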
Support for analytics
Many visualization tools lack analytic capabilities. From simple (error bar, quantiles) to advanced (clustering, forecasting, multidimensional scaling), analytic tools expand what users can do. Case in point, SAS Visual Analytics has tools for conducting sensitivity analysis and forecasting (GUI-based, no coding required). An example is to take a given time-series (unit sales), plot a forecast of its behavior for the next six time periods, and study how the forecast varies when other key variables (customer satisfaction) change.
Tools for sharing, collaboration, and replication
Several tools let you publish your static or interactive charts, and some tools even let you subscribe to the work of other users. For sharing, collaboration, and documentation, it should be possible to annotate your work. Being able to collaborate with others would be nice; at a minimum, one should be able to copy (and modify) the work of another user.
Big Data: Volume and Variety
A tool should produce charts quickly even when it’s hitting massive data sets. Simply put, it should be truly interactive. Several new tools target larger data sets, and some are geared specifically for Hadoop users (a partial list includes Datameer, Platfora, SiSense, and SAS Visual Analytics). But there will be occasions when you’ll be working with small data sets (or be offline). To that end, you should be able to visually explore small data (locally using your laptop) without having to connect to a more powerful environment (such as a cluster or a beefy server).
I haven’t come across great viz tools for exploring unstructured data, so I’ll interpret variety in a different way. Co-existence (usually of Hadoop & data warehouses) means data will continue to reside in different systems. Being able to connect to a variety of data sources is essential. (Among startups, Datameer does a good job of this.) Some tools include public data sets (e.g., US Census) and use them to generate examples.
Update (5/23/2012):
Recommend items worth investigating
When you first encounter a data set with lots of variables, it can be a bit overwhelming. Using simple pattern recognition techniques, tools should surface associations/patterns/anomalies worth investigating. Some tools in finance do this for time-series: trends, new highs/lows, and forecasts are drawn automatically. I’d love to have suggestions for what visual pivots (trellis charts) to draw.
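One simple way a tool could come up with such suggestions is to rank every pair of numeric variables by the strength of their linear correlation and show the top few. The sketch below does exactly that with the standard library; the function names and sample data are invented for illustration, and Pearson correlation is just one possible choice of “simple pattern recognition technique”.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suggest_pairs(columns, top=3):
    """Rank variable pairs by absolute correlation, strongest first."""
    scored = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    ]
    scored.sort(reverse=True)
    return [(a, b, round(r, 3)) for r, a, b in scored[:top]]


# Illustrative data only.
data = {
    "ad_spend": [1, 2, 3, 4, 5],
    "unit_sales": [2, 4, 6, 8, 10],
    "returns": [5, 1, 4, 2, 3],
}
suggestions = suggest_pairs(data, top=1)
print(suggestions)
```

A real tool would also need to handle categorical variables, missing values, and nonlinear relationships, none of which this sketch attempts.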