Eye Fitting Straight Lines in the Modern Era

Authors

Emily A. Robinson

Reka Howard

Susan VanderPlas

Abstract

How do statistical regression results compare to intuitive, visually fitted results? Fitting lines by eye through a set of points has been explored since the 20th century. Common methods of fitting trends by eye involve maneuvering a string, black thread, or ruler until the fit is suitable, then drawing the line through the set of points. In 2015, the New York Times introduced an interactive feature, called ‘You Draw It’, where readers were asked to input their own assumptions about various metrics and compare how these assumptions relate to reality. In this paper, we validate ‘You Draw It’ as a method for graphical testing, comparing results to the less technological method utilized in Mosteller et al. (1981) and extending that study with formal statistical analysis methods. Results were consistent with those found in the previous study; when shown points following a linear trend, participants tended to fit the slope of the first principal component rather than the slope of the least-squares regression line. This trend was most prominent when shown data simulated with larger variances. This study reinforces the differences between intuitive visual model fitting and statistical model fitting, providing information about human perception as it relates to the use of statistical graphics.

1 Introduction

We all use statistical graphics, but how do we know that the graphics we use are communicating properly? When creating a graphic, we must consider the design choices most effective for conveying the intended result. For instance, we may decide to highlight the relationship between two variables in a scatterplot by including a trend line, or adding color to highlight clustering (VanderPlas and Hofmann 2017). These design choices require that we understand the perceptual and visual biases that come into play when creating graphics, and as graphics are evaluated visually, we must use human testing to ground our understanding in empiricism.

Much of the research on the perception of visual features in charts has been conducted in psychophysics, testing accuracy and quantitative comparisons when reading a plot. Cleveland and McGill (1984) conducted a series of cognitive tasks designed to establish a hierarchy of visual components for making comparisons. For example, it is more effective to display information on an \(x\) or \(y\) axis rather than using color, reducing the visual effort necessary to make numerical comparisons. Cleveland and McGill (1985) found that assessing the position of points along an axis is easier than determining the slope of a line. Other studies focused on viewers’ ability to perceive the strength of the relationship between \(x\) and \(y\) coordinates in a scatterplot. For instance, when the data appear dense, viewers tend to overestimate the magnitude of the correlation coefficient (Cleveland, Diaconis, and McGill 1982; Lauer and Post 1989). Cleveland (1993) provided an argument for displaying cyclical patterns with an aspect ratio which sets the curve close to 45\(^{\circ}\). Kosslyn (2006) examined how Gestalt principles of perceptual organization are instrumental in extracting data from a chart. For example, Ciccione and Dehaene (2020) conducted a study supporting the principle that data points located closer together are more likely to be perceived as belonging to the same group, and Appelle (1972) found that it is easier to discriminate vertical and horizontal lines than oblique lines. The results of these cognitive tasks provide some consistent guidance for chart design; however, other methods of visual testing can further evaluate design choices and help us understand cognitive biases related to the evaluation of statistical charts.

1.1 Testing Statistical Graphics

We need human testing of graphics in order to draw broad conclusions, develop guidelines for graphical design, and improve graphical communication. Studies might ask participants to identify differences in graphs, read information off of a chart accurately, use data to make correct real-world decisions, or predict the next few observations. All of these types of tests require different levels of use and manipulation of the information presented in the chart. Early research studies considered graphs from a psychological perspective (Spence 1990; Lewandowsky and Spence 1989), testing participants’ abilities to detect a stimulus or a difference between two stimuli. Psychophysical methods have been used to test graphical perception, as in VanderPlas and Hofmann (2015a), which used the method of adjustment, a technique that requires participants to alter a changing stimulus to match a given constant stimulus (Gescheider 1997), to estimate the magnitude of the impact of the sine illusion. However, more modern testing methods have been developed since the heyday of psychophysics.

One major development in statistical graphics which led to more advanced testing methods is Wilkinson’s Grammar of Graphics (Wilkinson 2013). The Grammar of Graphics serves as a fundamental framework for data visualization, with the notion that graphics are built from the ground up by specifying exactly how to create a particular graph from a given data set. Visual representations are constructed from “tidy data”, characterized as a data set in which each variable is in its own column, each observation is in its own row, and each value is in its own cell (Wickham and Grolemund 2016). Graphics are viewed as a mapping from variables in a data set (or statistics computed from the data) to visual attributes such as the axes, colors, shapes, or facets of the canvas on which the chart is displayed. Software such as Hadley Wickham’s ggplot2 (Wickham 2011) implements this framework, constructing charts by explicitly specifying these mappings and layering geometric objects on top of them.
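As a concrete illustration of this mapping, the following minimal ggplot2 sketch uses the built-in mtcars data (not part of this study): variables in a tidy data frame are mapped to positional and color aesthetics, and point and trend layers are stacked on top of that mapping.

```r
# Minimal grammar-of-graphics sketch: map variables to aesthetics, then add layers.
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data-to-aesthetic mapping
  geom_point() +                                              # layer 1: raw observations
  geom_smooth(method = "lm", se = FALSE)                      # layer 2: fitted trend per group
```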

Combining the Grammar of Graphics with another tool for statistical graphics testing, the statistical lineup, yields a method for evaluating graphical design choices. Buja et al. (2009) introduced the lineup protocol to provide a framework for inferential testing. A statistical lineup is a plot consisting of smaller panels where the viewer is asked to identify the target panel containing the real data from among a set of decoy null plots which display data under the assumption there is no relationship. If the viewer can identify the target panel randomly embedded within the set of null panels, this suggests that the real data is visually distinct from data generated under the null model. Through experimentation, methods such as the lineup protocol allow researchers to conduct studies geared at understanding human ability to conduct tasks related to the perception of statistical charts such as differentiation, prediction, estimation, and extrapolation (VanderPlas and Hofmann 2017, 2015b; Hofmann et al. 2012). The advancement of graphing software provides the tools necessary to develop new methods of testing statistical graphics. While these testing methods are excellent, there is one particular subset of statistical graphics testing methods which we intend to develop further in this paper: assessing graphics by fitting statistical models “by eye”.
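To make the protocol concrete, the hand-rolled sketch below embeds one simulated "real" panel at a random position among nineteen decoys created by permuting \(y\), which destroys any x-y relationship; the nullabor R package provides a full implementation of this idea, and the slope and variance values here are arbitrary illustrative choices.

```r
# Hand-rolled lineup sketch: one target panel hidden among null (permuted) panels.
library(ggplot2)

set.seed(2021)
real <- data.frame(x = runif(30, 0, 20))
real$y <- 0.66 * real$x + rnorm(30, sd = 1.5)   # data with a genuine linear trend

pos <- sample(20, 1)                            # random position of the target panel
panels <- do.call(rbind, lapply(1:20, function(i) {
  d <- real
  if (i != pos) d$y <- sample(d$y)              # decoy: permute y to remove the trend
  d$panel <- i
  d
}))

ggplot(panels, aes(x, y)) +
  geom_point(size = 0.5) +
  facet_wrap(~ panel)
# Viewers who pick panel `pos` more often than chance provide evidence that the
# real data are visually distinguishable from the null model.
```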

2 Methods


2.1 Participants

Participants were recruited through Twitter, Reddit, and direct email in May 2021. A total of 35 individuals completed 131 unique ‘You Draw It’ task plots. Data were collected as part of a pilot study meant to test the applet; therefore, either voluntary participant dropout or disconnection from a server not designed to accommodate large numbers of participants resulted in missing plots in our data set for analysis. All participants had normal or corrected-to-normal vision and signed an informed consent form. The experimental tasks took approximately 15 minutes to complete. As this was a pilot study, participants recruited from Twitter and Reddit pages related to data visualization completed the study voluntarily and likely have an interest in fields related to statistics and a desire to help advance research in graphics. While this study does utilize a convenience sample, this is primarily a perceptual task, and previous results have found few differences between expert and non-expert participants in this context (VanderPlas and Hofmann 2015b). These data were collected to validate this method of graphical testing, with the hope of providing a new tool to assess graphical perception interactively. Participants completed the experiment on their own computers in an environment of their choosing. The experiment was conducted and distributed through a Shiny application (Chang et al. 2021) found at emily-robinson.shinyapps.io/you-draw-it-validation-applet.

2.2 ‘You Draw It’ Task

In the study, participants were shown an interactive scatterplot (Figure 1) along with the prompt, “Use your mouse to fill in the trend in the yellow box region.” The yellow box region moved along as the user drew their trend-line, providing a visual cue indicating where the user still needed to complete the trend-line. Once the entire domain had been visually estimated or predicted, the yellow shaded region disappeared, indicating the participant had completed the task. Data Driven Documents (D3; Bostock, Ogievetsky, and Heer 2011), a JavaScript-based graphing framework that facilitates user interaction, was used to create the ‘You Draw It’ visual. In order to allow for user interaction and data collection, we integrated the D3 visual into Shiny using the r2d3 package (Strayer, Luraschi, and Allaire 2022). While the interface is highly customized to this particular task, we hope to generalize the code and provide a Shiny widget in an R package soon.
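For readers interested in the wiring between Shiny and D3, the sketch below shows the general r2d3 pattern; the file name you_draw_it.js, the data passed to it, and the drawing logic it would contain are hypothetical placeholders rather than the study's actual code.

```r
# Sketch of embedding a D3 visual in a Shiny app via r2d3 (placeholder D3 script).
library(shiny)
library(r2d3)

ui <- fluidPage(
  d3Output("you_draw_it", height = "450px")        # container for the D3 visual
)

server <- function(input, output, session) {
  output$you_draw_it <- renderD3({
    sim <- data.frame(x = runif(30, 0, 20))
    sim$y <- 0.66 * (sim$x - mean(sim$x)) + 3.88 + rnorm(30, sd = 1.3)
    r2d3(data = sim, script = "you_draw_it.js")    # hypothetical D3 drawing script
  })
}

shinyApp(ui, server)
```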

Figure 1: ‘You Draw It’ task plot as shown to participants during the study. The first frame (left) illustrates what participants first saw, with the prompt “Use your mouse to fill in the trend in the yellow box region.” The second frame (middle) illustrates what the participant saw while completing the task; the yellow shaded region provided a visual cue indicating where the participant still needed to complete the trend-line. The last frame (right) illustrates the participant’s finished trend-line before submission.

2.3 Data Generation

All data processing was conducted in the R software environment for statistical computing and graphics (R Core Team 2021). A total of \(N = 30\) points \((x_i, y_i), i = 1,\dots,N\) were generated for \(x_i \in [x_{min}, x_{max}]\), where \(x\) and \(y\) have a linear relationship. Data were simulated based on the point-slope form of a linear model with additive errors: \[\begin{align} y_i &= \beta_1(x_i-\bar{x}) + y_{\bar{x}} + e_i, \\ \text{with } e_i &\sim N(0, \sigma^2). \nonumber \end{align}\]

Model equation parameters \(\beta_1\) and \(y_{\bar{x}}\), along with the parameter choice letter names (S, F, V, N), were selected to reflect the four data sets used and labeled in Mosteller et al. (1981); see Table 1. The mean of the generated \(x\) values, \(\bar x\), and the predefined \(y\) value at \(\bar x\), denoted \(y_{\bar x}\), were used in the point-slope equation of a line. Parameter choices S, F, and N were used to simulate data across a domain of 0 to 20. Parameter choice F produced a trend with a positive slope and a large variance, while N had a negative slope and a large variance. In comparison, S showed a trend with a positive slope and a small variance, while V yielded a steep positive slope with a small variance over the domain of 4 to 16. Figure 2 illustrates an example of simulated data for all four parameter choices, intended to reflect the trends in Mosteller et al. (1981). Aesthetic design choices were kept consistent across the interactive ‘You Draw It’ task plots. The y-axis range extended 10% beyond (above and below) the range of the simulated data points to allow users to draw outside the simulated data range and to avoid anchoring their lines to the corners of the plot.

Table 1: Designated model equation parameters for simulated data.
Parameter Choice \(y_{\bar{x}}\) \(\beta_1\) \(\sigma\) Domain
S 3.88 0.66 1.30 (0,20)
F 3.90 0.66 1.98 (0,20)
V 3.89 1.98 1.50 (4,16)
N 4.11 -0.70 2.50 (0,20)
Figure 2: Example of simulated data points displayed in a scatterplot illustrating the trends associated with the four selected parameter choices.
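A minimal sketch of this data-generating process is given below, with parameter values taken from Table 1; the exact scheme for sampling the \(x\) values over the domain (uniform draws here) and the function name are assumptions rather than the study's actual simulation code.

```r
# Sketch of the point-slope data-generating model with additive normal errors.
simulate_linear <- function(N = 30, beta1, y_xbar, sigma, domain) {
  x <- sort(runif(N, domain[1], domain[2]))      # assumed sampling scheme for x
  e <- rnorm(N, mean = 0, sd = sigma)            # additive errors e_i ~ N(0, sigma^2)
  data.frame(x = x, y = beta1 * (x - mean(x)) + y_xbar + e)
}

set.seed(42)
# Parameter choices "S" and "V" from Table 1
sim_S <- simulate_linear(beta1 = 0.66, y_xbar = 3.88, sigma = 1.30, domain = c(0, 20))
sim_V <- simulate_linear(beta1 = 1.98, y_xbar = 3.89, sigma = 1.50, domain = c(4, 16))
```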

2.4 Study Design

This experiment was conducted as part of a larger study of the perception of log and linear scales; for simplicity, we focus here on the study design and methods related to the current study. Each data set was generated randomly and independently for each participant at the start of the experiment and mapped to a scatterplot. Participants in the study were shown two ‘You Draw It’ practice plots in order to train them in the skills associated with executing the task; in particular, the responsiveness of the applet requires that participants draw a line at a certain speed, ensuring that all of the evenly spaced points along the hand-drawn line are filled in. During the practice session, participants were provided with instruction prompts accompanied by a .gif and a practice plot. Instructions guided participants to start at the edge of the yellow box, to make sure the yellow shaded region was moving along with their mouse as they drew, and noted that they could draw over their already-drawn line. The practice plots were then followed by one of each of the four ‘You Draw It’ task plots associated with the current study (S, F, V, and N). The order of the task plots was randomly assigned for each individual in a completely randomized design.
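For illustration, the sketch below mimics the per-participant randomization of the four task plots; the participant identifiers are hypothetical and the study's actual randomization code may differ.

```r
# Each participant sees one plot per parameter choice, in an independently randomized order.
set.seed(99)
participants <- paste0("p", 1:5)                 # hypothetical participant ids
trial_order <- do.call(rbind, lapply(participants, function(id) {
  data.frame(participant = id,
             position = 1:4,
             parameter_choice = sample(c("S", "F", "V", "N")))
}))
trial_order
```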

3 Results

3.1 Fitted Regression Lines

We compared the participant drawn line to two regression lines determined by ordinary least squares (OLS) regression and regression based on the principal axis (PA). Figure 3 illustrates the difference between an OLS regression line which minimizes the vertical distance of points from the line and a regression line based on the PA which minimizes the Euclidean distance of points (orthogonal) from the line.

Due to the randomness in the data generation process, the actual slope of the linear regression line fit through the simulated points could differ from the predetermined slope. Therefore, we fit an OLS regression to each scatterplot to obtain estimated parameters \(\hat\beta_{0,OLS}\) and \(\hat\beta_{1,OLS}\). Fitted values, \(\hat y_{k,OLS}\), were then obtained at 0.25 increments across the domain from the OLS regression equation, \(\hat y_{k,OLS} = \hat\beta_{0,OLS} + \hat\beta_{1,OLS} x_k\), for \(k = 1, \dots, 4 x_{max} + 1\). The PA regression slope, \(\hat\beta_{1,PA}\), and y-intercept, \(\hat\beta_{0,PA}\), were determined using the mcreg function in the mcr package in R (Manuilova, Schuetzenmeister, and Model 2021), which implements Deming regression (equivalent to a regression based on the slope of the first principal axis). Fitted values, \(\hat y_{k,PA}\), were then obtained at the same 0.25 increments from the PA regression equation, \(\hat y_{k,PA} = \hat\beta_{0,PA} + \hat\beta_{1,PA} x_k\), for \(k = 1, \dots, 4 x_{max} + 1\).
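The sketch below illustrates both reference fits for a single simulated data set. The paper obtains the PA fit via mcreg in the mcr package; here the principal-axis slope is computed directly from the first principal component, which is equivalent when the error variance ratio is one, and the simulated data are a stand-in for the study's data.

```r
# OLS and principal-axis (PA) reference fits for one simulated data set.
set.seed(1)
x <- runif(30, 0, 20)
dat <- data.frame(x = x, y = 0.66 * (x - mean(x)) + 3.88 + rnorm(30, sd = 1.30))

ols <- coef(lm(y ~ x, data = dat))                      # (intercept, slope) from OLS

pc <- prcomp(dat[, c("x", "y")])                        # centered, unscaled by default
beta1_pa <- pc$rotation["y", "PC1"] / pc$rotation["x", "PC1"]   # slope of first principal axis
beta0_pa <- mean(dat$y) - beta1_pa * mean(dat$x)        # PA line passes through the centroid

x_grid   <- seq(0, 20, by = 0.25)                       # fitted values at 0.25 increments
yhat_ols <- ols[1] + ols[2] * x_grid
yhat_pa  <- beta0_pa + beta1_pa * x_grid
```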

Figure 3: Comparison between an OLS regression line which minimizes the vertical distance of points from the line and a regression line based on the principal axis which minimizes the Euclidean distance of points (orthogonal) from the line.

4 Discussion and Conclusion

The intent of this research was to adapt ‘You Draw It’ from the New York Times feature as a tool and method for testing graphics, and to introduce a method for statistically modeling the participant-drawn lines. We provided support for the validity of the ‘You Draw It’ method by replicating the study found in Mosteller et al. (1981). Using generalized additive mixed models, we assessed the deviation of the participant-drawn lines from the statistically fitted regression lines. Our results indicate that when shown points following a linear trend, participants visually fit a regression line that mimics the first principal axis regression rather than ordinary least squares regression. Data simulated with larger variance provided especially strong support for participants’ tendency to visually fit the first principal axis. We utilized modern technology to replicate a study conducted 40 years ago and strengthened the original results with current analysis methods that allow for more flexibility and sophistication. Our results indicate that participants minimized the distance from their drawn regression line over both the \(x\) and \(y\) axes simultaneously.

We allowed participants to draw trend-lines that deviated from a straight line and thereby gained insight into the curvature the human eye perceives in a set of points. Researchers in cognitive and human movement sciences have found that human arm movement is a complex task (Miall and Haggard 1995; Rousset, Bérard, and Ortega 2015). The ‘You Draw It’ method described in this paper uses indirect interaction, in which the mouse position and the resulting visual line on the screen are dissociated. Therefore, curvature found in participant-drawn lines could potentially be explained by the lack of coordination resulting from the eye-hand dissociation of indirect drawing and by the distortion of visual perception affecting the curvature of movements. Additionally, there is a training effect related to the completion of the ‘You Draw It’ task: the movement of the line must be slow enough that the visual representation on the screen can accurately capture each movement. De Graaf, Sittig, and Gon (1991) conducted a study in which participants moved their hand slowly from an initial position in front of them to a visual target (movement task); they were then asked to repeat the task using different sizes of pointers (perceptual task). Their results indicated that deviations for the shortest pointers were comparable to those of the movement task, but that bias increased as the length of the pointer increased. While we suggested participants use a mouse to complete the study, we could not require its use; therefore, some participants may have used a track-pad, and results may have been influenced by the pressure placed on the track-pad (Easton and Falzett 1978).

5 Future Work

This study provided a basis for the use of ‘You Draw It’ as a tool for testing statistical graphics and introduced a method for statistically modeling participant-drawn lines using generalized additive mixed models. Additional studies related to the validation and use of the tool would be useful for providing insight into biases introduced by the task, such as the deviation from a straight line. For instance, a variation on the current study could compare manual adjustment methods, such as shifting and rotating a horizontal line segment until the fit is suitable, to the ‘You Draw It’ method on the same set of data. This might help explain the large deviation of the participant-drawn lines as \(x\) approaches 20. Another useful extension would be to compare the ‘You Draw It’ method conducted by direct interaction (using a digital pen on a tablet) to indirect interaction (using a computer mouse to control a pointer on the screen). Further extensions to this work might ask participants to draw a trend-line through scatterplots with one (or multiple) extreme outliers in order to evaluate the perceptual system’s resistance to outliers.

While the focus of this study was on drawing linear trend-lines, further investigation is necessary to implement this method in non-linear settings and with real data in order to facilitate scientific communication, a strength of the combination of the flexible ‘You Draw It’ method and the GAMM analysis method. This tool could also be used to evaluate human ability to extrapolate data from trends. In the future, we intend to create an R package designed for easy implementation of ‘You Draw It’ task plots in order to make this tool accessible to other researchers.

6 Supplementary Material

References

Aisch, Gregor, Amanda Cox, and Kevin Quealy. 2015. “You Draw It: How Family Income Predicts Children’s College Chances.” The New York Times. https://www.nytimes.com/interactive/2015/05/28/upshot/you-draw-it-how-family-income-affects-childrens-college-chances.html.
Appelle, Stuart. 1972. “Perception and Discrimination as a Function of Stimulus Orientation: The ‘Oblique Effect’ in Man and Animals.” Psychological Bulletin 78 (4): 266.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Bostock, Michael, Vadim Ogievetsky, and Jeffrey Heer. 2011. “D\(^3\): Data-Driven Documents.” IEEE Transactions on Visualization and Computer Graphics 17 (12): 2301–9.
Buchanan, Larry, Haeyoun Park, and Adam Pearce. 2017. “You Draw It: What Got Better or Worse During Obama’s Presidency.” The New York Times. https://www.nytimes.com/interactive/2017/01/15/us/politics/you-draw-obama-legacy.html.
Buja, Andreas, Dianne Cook, Heike Hofmann, Michael Lawrence, Eun-Kyung Lee, Deborah F Swayne, and Hadley Wickham. 2009. “Statistical Inference for Exploratory Data Analysis and Model Diagnostics.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367 (1906): 4361–83.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2021. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Ciccione, Lorenzo, and Stanislas Dehaene. 2020. “Grouping Mechanisms in Numerosity Perception.” Open Mind 4: 102–18.
———. 2021. “Can Humans Perform Mental Regression on a Graph? Accuracy and Bias in the Perception of Scatterplots.” Cognitive Psychology 128: 101406.
Cleveland, William S. 1993. Visualizing Data. Summit, NJ: Hobart Press.
Cleveland, William S, Persi Diaconis, and Robert McGill. 1982. “Variables on Scatterplots Look More Highly Correlated When the Scales Are Increased.” Science 216 (4550): 1138–41.
Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.
———. 1985. “Graphical Perception and Graphical Methods for Analyzing Scientific Data.” Science 229 (4716): 828–33.
De Graaf, JB, AC Sittig, and JJ van der Gon. 1991. “Misdirections in Slow Goal-Directed Arm Movements and Pointer-Setting Tasks.” Experimental Brain Research 84 (2): 434–38.
Deming, William Edwards. 1943. Statistical Adjustment of Data. New York, NY: John Wiley & Sons.
Easton, Randolph D, and Michelle Falzett. 1978. “Finger Pressure During Tracking of Curved Contours: Implications for a Visual Dominance Phenomenon.” Perception & Psychophysics 24 (2): 145–53.
Finney, DJ. 1951. “Subjective Judgment in Statistical Analysis: An Experimental Study.” Journal of the Royal Statistical Society: Series B (Methodological) 13 (2): 284–97.
Gescheider, George. 1997. Psychophysics: The Fundamentals. 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates.
Hofmann, Heike, Lendie Follett, Mahbubul Majumder, and Dianne Cook. 2012. “Graphical Tests for Power Comparison of Competing Designs.” IEEE Transactions on Visualization and Computer Graphics 18 (12): 2441–48.
Katz, Josh. 2017. “You Draw It: Just How Bad Is the Drug Overdose Epidemic?” The New York Times. https://www.nytimes.com/interactive/2017/04/14/upshot/drug-overdose-epidemic-you-draw-it.html.
Kosslyn, Stephen M. 2006. Graph Design for the Eye and Mind. New York, NY: Oxford University Press.
Lauer, Thomas W, and Gerald V Post. 1989. “Density in Scatterplots and the Estimation of Correlation.” Behaviour & Information Technology 8 (3): 235–44.
Lewandowsky, Stephan, and Ian Spence. 1989. “The Perception of Statistical Graphs.” Sociological Methods & Research 18 (2-3): 200–242.
Linnet, Kristian. 1998. “Performance of Deming Regression Analysis in Case of Misspecified Analytical Error Ratio in Method Comparison Studies.” Clinical Chemistry 44 (5): 1024–31.
Manuilova, Ekaterina, Andre Schuetzenmeister, and Fabian Model. 2021. Mcr: Method Comparison Regression. https://CRAN.R-project.org/package=mcr.
Martin, Robert F. 2000. “General Deming Regression for Estimating Systematic Bias and Its Confidence Interval in Method-Comparison Studies.” Clinical Chemistry 46 (1): 100–104.
Miall, RC, and PN Haggard. 1995. “The Curvature of Human Arm Movements in the Absence of Visual Experience.” Experimental Brain Research 103 (3): 421–28.
Mosteller, Frederick, Andrew Siegel, Edward Trapido, and Cleo Youtz. 1981. “Eye Fitting Straight Lines.” The American Statistician 35 (3): 150–52.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rousset, Élisabeth, François Bérard, and Michaël Ortega. 2015. “Study of the Effect of the Directness of the Interaction on Novice Users When Drawing Straight Lines.” In Proceedings of the 27th Conference on l’interaction Homme-Machine, 1–7.
Spence, Ian. 1990. “Visual Psychophysics of Simple Graphical Elements.” Journal of Experimental Psychology: Human Perception and Performance 16 (4): 683.
Strayer, Nick, Javier Luraschi, and JJ Allaire. 2022. R2d3: Interface to ’D3’ Visualizations. https://CRAN.R-project.org/package=r2d3.
Unwin, Antony, and Graham Wills. 1988. “Eyeballing Time Series.” In Proceedings of the 1988 ASA Statistical Computing Section, 263–68.
VanderPlas, Susan, and Heike Hofmann. 2015a. “Signs of the Sine Illusion—Why We Need to Care.” Journal of Computational and Graphical Statistics 24 (4): 1170–90.
———. 2015b. “Spatial Reasoning and Data Displays.” IEEE Transactions on Visualization and Computer Graphics 22 (1): 459–68.
———. 2017. “Clusters Beat Trend!? Testing Feature Hierarchy in Statistical Graphics.” Journal of Computational and Graphical Statistics 26 (2): 231–42.
Wickham, Hadley. 2011. “Ggplot2.” Wiley Interdisciplinary Reviews: Computational Statistics 3 (2): 180–85.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly Media.
Wilkinson, Leland. 2013. The Grammar of Graphics. New York, NY: Springer Science & Business Media.
Wood, Simon. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. New York, NY: Chapman & Hall/CRC.