Uses and Abuses
Uses
Correlation and Regression
Correlation and regression analysis can be used
to determine whether there is a significant relationship between two variables.
When there is, you can use one of the variables to predict the value of the other
variable. For instance, educators have used correlation and regression analysis
to determine that there is a significant correlation between a student’s SAT
score and that student’s grade point average for the freshman year of college.
Consequently, many colleges and universities use SAT scores of high school
applicants as a predictor of the applicant’s initial success at college.
Abuses
Confusing Correlation and Causation
The most common abuse of correlation in studies is to confuse correlation
with causation
(see page 480). Good SAT scores do not cause good college grades. Rather, there
are other variables, such as good study habits and motivation, that contribute to
both. When a strong correlation is found between two variables, look for other
variables that are correlated with both.
Considering Only Linear Correlation
The correlation studied in this chapter
is linear correlation. When the correlation coefficient is close to 1 or close to -1,
the data points can be modeled by a straight line. A correlation coefficient can be
close to 0, however, even when there is a strong relationship of a different (nonlinear) type.
Consider the data listed in the table at the left. The value of the correlation
coefficient is 0. However, the data are perfectly described by the equation
$x^2 + y^2 = 1$ (a circle), as shown in the figure at the left.
Ethics
When data are collected, all of the data should be used when calculating
statistics. In this chapter, you learned that before finding the equation of a
regression line, it is helpful to construct a scatter plot of the data to check for
outliers, gaps, and clusters in the data. Researchers should not use only those data
points that fit their hypotheses or that show a significant correlation.
Although eliminating outliers may help a data set match predicted patterns or
fit a regression line more closely, it is unethical to amend data in this way. An
outlier, or any other point that influences a regression model, should be removed
only when its removal is properly justified.
In most cases, the best and sometimes safest approach is to present statistical
measurements both with and without the outlier. Doing this leaves the decision
about whether to recognize the outlier to the reader.