About Stat 892/992 - Special Topics in Data Visualization
Course Description
Design and use of data visualizations for statistical communication. Topics include the grammar of graphics, methods for evaluating graphics for utility, visual inference and visual statistics, high-dimensional graphics, and exploratory data analysis methods. This course will be reading and writing intensive. Familiarity with R or Python programming (pandas, numpy) is expected.
Course Goals
1. Assess the consequences of different graphical design decisions, identifying primary and secondary comparisons and likely areas of user focus.
2. Compare the principles of the grammar of graphics with implementations in different software packages and identify the consequences of these implementation decisions.
3. Evaluate existing visualization studies and suggest designs for new experiments to evaluate the effectiveness of different visualization design decisions.
4. Discuss the similarities and differences between visual inference and classical Frequentist or Bayesian statistical inference procedures.
5. Use appropriate methods to view high-dimensional data and discuss the strengths and weaknesses of different approaches to high-dimensional data visualization, including nonlinear dimension reduction (t-SNE, UMAP) and interactive tours.
Course Objectives
(what you should be able to do at the end of this course)
A. Given a graphic, identify primary and secondary comparisons and likely areas of user focus, and describe the consequences of different design decisions. Suggest (and create) alternative graphics that address deficiencies in the original, and compare the strengths and weaknesses of the different versions of the chart before deciding on an optimal version. (Goals: 1)
B. Discuss or write a comparison between one or more software implementations of the grammar of graphics and the theoretical construct. Identify and critique the implementation decisions and assess the consequences of these decisions for usability and consistency within a syntactic framework. Examine defaults within each implementation and evaluate the pros and cons of these default design constructs when creating graphics. (Goals: 1, 2)
C. Develop a user study that evaluates a visualization design in a way that extends existing research in the field. Develop appropriate data sets and models, identify necessary experimental controls, and generate relevant stimuli for the experiment. In a mini-paper, motivate the experiment by connecting it to existing visualization literature and ground the study conceptually. If applicable, examine the role the grammar of graphics plays in the implementation of visualization research. (Goals: 3; 2, 4 optional)
D. Using a specific example, develop a visual inference experiment that answers a research question. Write a paper comparing and contrasting the goals and implementation of a statistical inference procedure and a visual inference procedure, evaluating what can be learned from each and what factors may complicate each procedure. (Goals: 3, 4)
E. Describe the differences between two high-dimensional data visualization techniques and what each can reveal. For a given dataset or scenario, suggest a procedure for navigating high-dimensional data to explore relationships between the variables. (Goals: 1, 5)
Textbooks
I will make every effort to ensure that the required textbooks for this course are available on electronic reserve through the library.
- Getting (more out of) Graphics by Antony Unwin
- Exploratory Data Analysis by John Tukey
- Visualization Analysis and Design by Tamara Munzner
- Fundamentals of Data Visualization by Claus Wilke
In addition to these books, we will also read a selection of journal articles that will be assigned on a week-by-week basis.