Results of a visual inference study on reading residual plots of a misspecified linear regression model caused by missing Hermite polynomial terms
Source: R/data.R
polynomials.Rd
A dataset containing the responses of 160 subjects to 588 lineups. Lineups 577 to 588 are used as attention checks. Every subject evaluates 18 different lineups and two randomly assigned attention checks. Every lineup except those used as attention checks has been evaluated by five different subjects. Every lineup consists of 20 residual plots: one actual residual plot and 19 null residual plots drawn with rotated residuals.
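The study design figures above are mutually consistent, which can be checked with a few lines of base R. This is only an arithmetic sketch: the variable names are illustrative and are not part of the package.

```r
# Design figures quoted in the description (names are illustrative only).
n_subjects          <- 160
lineups_per_subject <- 18
checks_per_subject  <- 2
n_lineups           <- 588
n_checks            <- length(577:588)  # lineups 577 to 588 are attention checks
evals_per_lineup    <- 5

# Every subject contributes one row per evaluated lineup.
total_rows <- n_subjects * (lineups_per_subject + checks_per_subject)

# Evaluations of non-attention-check lineups, counted two ways.
regular_evals_by_subject <- n_subjects * lineups_per_subject
regular_evals_by_lineup  <- (n_lineups - n_checks) * evals_per_lineup

stopifnot(
  total_rows == 3200,
  regular_evals_by_subject == regular_evals_by_lineup
)
```

The first check matches the 3200 rows of the tibble, and the second confirms that 576 regular lineups evaluated five times each account for the 18 regular lineups seen by each subject.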
Format
A tibble with 3200 rows and 30 variables:
- page
The page number of the study website
- response_time
Time spent on a page, in milliseconds (1 second = 1000 milliseconds)
- set
The set number or the subject ID
- num
The lineup number in a set
- selection
Selections made by the subject. Multiple selections are allowed and separated by "_". "0" means the subject can't tell the difference between plots
- num_selection
Number of selections made by the subject
- reason
The reason for making the selections provided by the subject
- confidence
Level of difference between the selected plots and others provided by the subject
- age_group
Age group of the subject
- education
Educational background of the subject
- pronoun
Preferred pronoun
- previous_experience
Previous experience in any research that requires reading data graphs
- lineup_id
Lineup ID
- type
Type of the model
- formula
The main formula of the model
- shape
Shape of the Hermite polynomials; see POLY_MODEL$hermite
- x_dist
Distribution of the variable x
- include_z
Whether to include variable z in the model
- e_dist
Distribution of the error term e
- e_sigma
The standard deviation of the error term e
- name
Name of the model
- k
Number of residual plots in a lineup
- n
Number of observations in a residual plot
- effect_size
Effect size of the actual residual plot
- answer
The answer of the lineup
- detect
Whether the subject selects the actual residual plot
- conventional_p_value
P-value of the conventional test (F-test), obtained by comparing the null model (y ~ x) and the correct model (y ~ x + z)
- weighted_detect
If detect == TRUE, weighted_detect = detect/num_selection. Otherwise, weighted_detect = 0.
- prop_detect
Proportion of detection of a lineup. For a lineup, prop_detect = mean(weighted_detect).
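As a minimal base R sketch of how the selection, weighted_detect and prop_detect variables relate, the example below uses a small toy data frame. The values are made up for illustration and are not taken from the dataset; only the column names follow the variable list above.

```r
# Splitting a selection string on "_" gives the individual plot numbers.
picks <- lapply(strsplit(c("3_7_15", "0"), "_", fixed = TRUE), as.integer)

# Toy evaluations (hypothetical values, NOT real study data).
toy <- data.frame(
  lineup_id     = c(1, 1, 1, 2, 2),
  detect        = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  num_selection = c(1, 2, 1, 4, 3)
)

# weighted_detect: detect / num_selection when detected, otherwise 0.
toy$weighted_detect <- ifelse(toy$detect, toy$detect / toy$num_selection, 0)

# prop_detect: mean of weighted_detect within each lineup.
toy$prop_detect <- ave(toy$weighted_detect, toy$lineup_id, FUN = mean)

print(toy)
```

A correct single selection scores 1, while a correct selection made alongside other plots is down-weighted by the number of plots selected.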
Details
To reproduce the models, use poly_model().
For x_dist = "uniform", define x = rand_uniform(-1, 1).
For x_dist = "normal", define x = {stand_dist <- function(x) {(x - min(x))/max(x - min(x)) * 2 - 1}; raw_x <- rand_normal(sigma = 0.3); closed_form(~stand_dist(raw_x))}.
For x_dist = "lognormal", define x = {stand_dist <- function(x) {(x - min(x))/max(x - min(x)) * 2 - 1}; raw_x <- rand_lognormal(sigma = 0.6); closed_form(~stand_dist(raw_x/3 - 1))}.
For x_dist = "uniform_discrete", define x = rand_uniform_d(k = 5, even = TRUE).
For example, if shape = 1, e_sigma = 1, include_z = TRUE and x_dist = "uniform", then the model can be defined as y = poly_model(shape = 1, sigma = 1, include_z = TRUE, x = rand_uniform(-1, 1)).
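The mapping from a row's parameters to the model definition can be sketched as below. The helper build_poly_call() and the lookup table x_defs are hypothetical names introduced only for this illustration. The code builds the call as an unevaluated expression in base R, so it runs without the package attached; only the two single-call x definitions are included, and e_dist is not mapped because the example above does not cover it.

```r
# Hypothetical lookup: x_dist value -> documented definition of x (unevaluated).
x_defs <- list(
  uniform          = quote(rand_uniform(-1, 1)),
  uniform_discrete = quote(rand_uniform_d(k = 5, even = TRUE))
)

# Hypothetical helper: assemble the poly_model() call for one parameter set.
build_poly_call <- function(shape, e_sigma, include_z, x_dist) {
  stopifnot(x_dist %in% names(x_defs))
  bquote(poly_model(
    shape     = .(shape),
    sigma     = .(e_sigma),
    include_z = .(include_z),
    x         = .(x_defs[[x_dist]])
  ))
}

model_call <- build_poly_call(
  shape = 1, e_sigma = 1, include_z = TRUE, x_dist = "uniform"
)
call_text <- paste(deparse(model_call, width.cutoff = 500), collapse = "")
print(call_text)
```

With the package attached, eval(model_call) would be expected to construct the model object, since the call is the same as the one given in the example above.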
Note that, due to randomness, the models will not produce exactly the same lineups as those shown to subjects. The data stored in get_polynomials_lineup() should be used instead.