Data Visualization Course

Data Visualization

Course Description

This 8-hour class provides the foundations for creating better graphical information from potentially very large data sources, both to aid data discovery and to enhance reported results. Using example data sets, we will explore first principles and the human elements of information visualization, drawing on leading sources such as Edward Tufte and Stephen Few. We will discuss best practices for making your point effectively and efficiently. We will also examine common errors and make recommendations for aesthetics, including color, font, dimensionality, size, proportion, and scaling. Appropriate displays will be recommended for univariate and multivariate plots, time-dependent data, maps, networks, and animation. In this hands-on workshop, participants will create and dynamically modify graphs using Excel and R to the maximum extent practical, along with the latest trial versions of JMP and Tableau.

Course Goals/Objectives

A participant who successfully completes this course will:

  • Know the definitions of data visualization and information visualization
  • Be familiar with human perception and how to effectively use it to make better graphical displays
  • Understand the principles of graphical excellence
  • Avoid common mistakes in graphical displays
  • Be able to use software to create graphs that best convey information about the business problem
  • Know what displays are effective for univariate distributions, multivariate correlations and models, maps, and networks
  • Be able to create and export animated graphics

Course Outline

  1. Introduction to Information and Data Visualization
    1. Definitions and other closely related disciplines
    2. Data and Information Visualization Uses
    3. Examples from published sources and examples run on datasets using the course software
  2. First Principles (from leading information visualization texts by Stephen Few and Edward Tufte)
    1. Data integrity, sources and types—continuous or categorical (nominal, ordinal)
    2. Visual perception
    3. Graphical excellence, data-ink ratio, aesthetics (color, font, size, proportion, reference lines,…)
  3. Single Variable Methods
    1. Numerical summaries for continuous variables (measures of location and scale)
    2. Distribution analysis (histogram, box plot, QQ plot, stem-and-leaf plot, density plots, probability distributions)
    3. Part-to-whole methods (pie chart, doughnut chart, bar graphs)
    4. Outlier detection and impact
  4. Bivariate Methods
    1. Correlation methods—scatterplots, line graphs, density ellipse, color maps, contour plots
    2. Methods to detect differences across categorical levels—ANOVA, box plots
    3. Categorical analysis methods—measures of association, tree maps, mosaic plots
  5. Multivariate Methods
    1. Distributions—brushing, multivariate histogram/density plots
    2. Correlations—heatmaps, conditional correlation matrix, multivariate scatterplots, parallel plots, cell plots
    3. Dimension reduction—principal components
    4. Multivariate outliers
  6. Graphical Displays for Predictive Analytics
    1. Cluster analysis—dendrograms, cluster distances
    2. Regression analysis—factor and surface profilers, model diagnostic plots, variable interaction, outlier analysis
    3. Decision tree methods—decision tree, impurity measures, model performance, variable importance
    4. Model comparison—lift charts, ROC curves
  7. Time Series Methods
    1. Time series plots, bubble plots
    2. Autocorrelation and Partial Autocorrelation Function plots
    3. Smoothing, seasonality, and ARIMA model forecasts
  8. Text Mining Visualization
    1. Word clouds
    2. Document term matrix and principal components
    3. Word frequencies and corpus summaries
    4. Word association and relationships
  9. Dashboarding
    1. Examples
    2. Principles of useful dashboards
  10. Advanced Methods
    1. Mapping and geospatial methods
    2. Graph Builder in JMP (from SAS)
    3. Filtering and Animation
    4. Interactive visualization
    5. Network analysis
  11. Case Study: end-to-end exercise emphasizing the core principles of the class