Results of a visual inference study on reading residual plots of linear regression models misspecified by omitting Hermite polynomial terms

A dataset containing information on 160 subjects and their responses to 588 lineups. Lineups 577 to 588 are used as attention checks. Every subject evaluates 18 different lineups and two randomly assigned attention checks. Every lineup, except those used as attention checks, has been evaluated by five different subjects. Every lineup consists of 20 residual plots: one actual residual plot and 19 null residual plots drawn with rotated residuals.

Usage

polynomials

Format

A tibble with 3200 rows and 30 variables:

page: The page number of the study website
response_time: Time spent on a page, in milliseconds (1 second = 1000 milliseconds)
set: The set number or the subject ID
num: The lineup number in a set
selection: Selections made by the subject. Multiple selections are allowed and are separated by "_". "0" means the subject cannot tell the difference between plots
num_selection: Number of selections made by the subject
reason: The reason given by the subject for making the selections
confidence: Level of difference between the selected plots and the others, as reported by the subject
age_group: Age group of the subject
education: Educational background of the subject
pronoun: Preferred pronoun
previous_experience: Previous experience in any research that requires reading data graphs
lineup_id: Lineup ID
type: Type of the model
formula: The main formula of the model
shape: Shape of the Hermite polynomials; see POLY_MODEL$hermite
x_dist: Distribution of the variable x
include_z: Whether to include variable z in the model
e_dist: Distribution of error term e
e_sigma: The standard deviation of the error term e
name: Name of the model
k: Number of residual plots in a lineup
n: Number of observations in a residual plot
effect_size: Effect size of the actual residual plot
answer: The answer of the lineup
detect: Whether the subject selects the actual residual plot
conventional_p_value: P-value of the conventional test (F-test) comparing the null model (y ~ x) with the correct model (y ~ x + z)
weighted_detect: If detect == TRUE, weighted_detect = detect/num_selection. Otherwise, weighted_detect = 0.
prop_detect: Proportion of detection of a lineup. For a lineup, prop_detect = mean(weighted_detect).
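
The relationship between selection, answer, detect, weighted_detect and prop_detect can be sketched in base R. This is a minimal illustration on hypothetical toy responses, not study data and not the package's own code. It assumes that answer holds the position of the actual residual plot and that a "0" response counts as zero selections.

```r
# Hypothetical toy responses (not real study data): two lineups, three subjects each.
responses <- data.frame(
  lineup_id = c(1, 1, 1, 2, 2, 2),
  selection = c("7", "7_12", "0", "3", "5_9_14", "5"),
  answer    = c(7, 7, 7, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Split each response on "_", e.g. "7_12" becomes c("7", "12").
picks <- strsplit(responses$selection, "_", fixed = TRUE)

# Assumption: "0" (cannot tell the difference) counts as zero selections.
responses$num_selection <- vapply(picks, function(p) sum(p != "0"), integer(1))

# detect is TRUE when the actual residual plot is among the selected plots.
responses$detect <- mapply(
  function(p, a) as.character(a) %in% p,
  picks, responses$answer
)

# Each detection is down-weighted by the number of plots the subject selected.
responses$weighted_detect <- ifelse(responses$detect, 1 / responses$num_selection, 0)

# prop_detect is the mean of weighted_detect within a lineup.
responses$prop_detect <- ave(responses$weighted_detect, responses$lineup_id, FUN = mean)

print(responses)
```

In this sketch a subject who selects two plots, one of which is the actual residual plot, contributes 0.5 to the lineup's mean, which penalises guessing with many selections.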

Details

To reproduce the models, use poly_model().

For x_dist = "uniform", define x = rand_uniform(-1, 1).

For x_dist = "normal", define x = {stand_dist <- function(x) {(x - min(x))/max(x - min(x)) * 2 - 1}; raw_x <- rand_normal(sigma = 0.3); closed_form(~stand_dist(raw_x))}.

For x_dist = "lognormal", define x = {stand_dist <- function(x) {(x - min(x))/max(x - min(x)) * 2 - 1}; raw_x <- rand_lognormal(sigma = 0.6); closed_form(~stand_dist(raw_x/3 - 1))}.

For x_dist = "uniform_discrete", define x = rand_uniform_d(k = 5, even = TRUE).
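
The stand_dist() helper used above for the normal and lognormal cases rescales the raw draws linearly onto [-1, 1]. The following base R sketch shows that behaviour in isolation. It assumes rnorm() and rlnorm() are reasonable stand-ins for rand_normal() and rand_lognormal(), with a zero mean and zero log-mean respectively; the sample size and seed are arbitrary.

```r
set.seed(2024)  # arbitrary seed, only to make the illustration repeatable

# Rescale a numeric vector linearly so that its minimum maps to -1
# and its maximum maps to 1.
stand_dist <- function(x) {
  (x - min(x)) / max(x - min(x)) * 2 - 1
}

# Base R stand-ins for the package's random variable helpers (assumption).
raw_normal    <- rnorm(300, mean = 0, sd = 0.3)
raw_lognormal <- rlnorm(300, meanlog = 0, sdlog = 0.6)

x_normal    <- stand_dist(raw_normal)
x_lognormal <- stand_dist(raw_lognormal / 3 - 1)

# Both rescaled variables span the same interval as the uniform case.
print(range(x_normal))
print(range(x_lognormal))
```

Rescaling keeps the shape of each distribution (symmetric or right-skewed) while putting every x_dist on the same support, so differences between lineups come from the shape of x rather than its range.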

For example, if shape = 1, e_sigma = 1, include_z = TRUE and x_dist = "uniform", then the model can be defined as y = poly_model(shape = 1, sigma = 1, include_z = TRUE, x = rand_uniform(-1, 1)).
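
For intuition about what the subjects were asked to detect, the following base R sketch simulates a model of this general kind without using poly_model(). Everything specific in it is an assumption made for illustration: the omitted term z is taken to be the second probabilists' Hermite polynomial x^2 - 1 (the actual shapes are defined in POLY_MODEL$hermite), and the coefficients, sample size and seed are hypothetical. It also shows the kind of F-test described for conventional_p_value.

```r
set.seed(1)  # arbitrary seed for a repeatable illustration
n <- 300     # hypothetical number of observations in a residual plot

x <- runif(n, min = -1, max = 1)  # base R stand-in for rand_uniform(-1, 1)
z <- x^2 - 1                      # assumed omitted term, not taken from POLY_MODEL$hermite
e <- rnorm(n, mean = 0, sd = 1)   # normal errors with e_sigma = 1
y <- 1 + x + z + e                # hypothetical coefficients, chosen for illustration

# The null model omits z, so its residuals carry the curvature from z.
null_fit <- lm(y ~ x)
full_fit <- lm(y ~ x + z)

# Residuals of the misspecified fit: the content of an "actual" residual plot.
resid_null <- residuals(null_fit)

# Conventional F-test comparing y ~ x against y ~ x + z.
f_test  <- anova(null_fit, full_fit)
p_value <- f_test[["Pr(>F)"]][2]
print(p_value)
```

Plotting resid_null against fitted(null_fit) would show the systematic curvature that distinguishes the actual residual plot from the null plots in a lineup.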

Note that, because of randomness, the models will not reproduce exactly the lineups shown to subjects. Use the data from get_polynomials_lineup() instead.