Math 210
Laboratory 6

Regression and Correlation

Least-squares regression describes a linear relationship between two variables, and correlation describes the strength of that relationship. In this lab, you will determine regression equations and correlation coefficients for two data sets and answer questions about either the regression equation or the correlation coefficient.
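
For reference, the quantities that Minitab reports can be computed by hand from the standard least-squares formulas. The following is a minimal Python sketch, not part of the lab procedure. The function name least_squares is hypothetical, and the numbers are made-up illustrative values, not the Wadlow or mile-run data.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return intercept, slope, r

# Made-up illustrative values only (assumption); replace with a real data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope, r = least_squares(xs, ys)
print(f"y = {intercept:.3f} + {slope:.3f} x, r = {r:.3f}")
```

The slope is the predicted change in y for a one-unit increase in x, and the intercept is the predicted y when x is 0.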
 

  1. Robert Pershing Wadlow was born in Alton, Illinois on February 22, 1918. At birth he weighed a normal eight pounds, six ounces. However, because of an overactive pituitary gland, he grew at an astounding rate and continued to grow his entire life. When he died at age 22 in Manistee, Michigan, he had reached a height of 8 feet 11.1 inches. This qualifies him as the tallest person in history, as recorded in the Guinness Book of Records. Get the Robert Wadlow data.  The age given in this data set is his age in years and the height is given in inches.
    1. Plot the data with age as the explanatory variable (or x-variable) and height as the response variable (or y-variable).  (Graph > Scatterplot)  Make sure the axes of your scatterplot are labeled correctly.  Give your scatterplot a title.  Do your points look fairly linear?
    2. Find the equation for the least-squares regression line where his age is the explanatory variable (or predictor) and his height is the response variable. (Stat > Regression > Regression)
    3. Describe what the y-intercept of your regression line means in terms of his age and height. Be specific and use the appropriate numbers in your explanation. Does this number make sense? Explain.
    4. Describe what the slope of your regression line means in terms of his age and height. Be specific.
    5. Find the correlation between Robert Wadlow's age and height. (Stat > Basic Statistics > Correlation)  Based on the correlation, is his age a good predictor of his height?
    6. Plot the data with age as the explanatory variable and height as the response variable along with the regression line. (Stat > Regression > Fitted Line Plot)
  2. The world record in the men's mile run was last set on July 7, 1999, by Hicham El Guerrouj of Morocco.  (El Guerrouj later won gold in the 1500 m and 5000 m runs at the 2004 Olympics in Athens and retired in 2006.) Get the mile run data. This table lists the year in which the record was broken, the runner, his nationality, and his time in seconds.
    1. Transform the year values so that they give years since 1900.  (Calc > Calculator.  Store the result in an open column.  Your expression should be year - 1900.)  Label your new column of data "years since 1900."  Plot the data with years since 1900 as the explanatory variable (x-variable) and record time as the response variable (y-variable).  Make sure your scatterplot is labeled appropriately.  Do these points fall in a fairly linear pattern?
    2. Find the equation for the least-squares regression line that fits these data. Your explanatory variable (or predictor) should be number of years since 1900 and your response variable should be the time in seconds.
    3. Describe what the y-intercept and the slope of your regression line mean in terms of the year and the record time. Be specific and use the appropriate numbers in your explanation.
    4. Plot the data with year (since 1900) as the explanatory variable and record time as the response variable along with the regression line.
    5. When the first person ran a mile in less than 4 minutes it made headlines around the world.  Looking at your data, who was the first person to run a mile in less than 4 minutes?  When did that occur?
    6. According to your regression equation, in what year would someone run a mile in exactly 4 minutes (240 seconds)?  Is the year the same as your answer to the previous question?  If not, why is there a difference?
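
The last question asks you to run the regression equation backwards: instead of predicting a time from a year, you solve for the year that gives a target time. The following is a minimal Python sketch of that algebra. The function name predictor_for_response is hypothetical, and the coefficients are placeholders chosen only to make the arithmetic easy, not the values Minitab will report for the mile run data.

```python
def predictor_for_response(intercept, slope, target):
    """Solve target = intercept + slope * x for x."""
    if slope == 0:
        raise ValueError("a horizontal line cannot be solved for x")
    return (target - intercept) / slope

# Placeholder coefficients (assumption); substitute your own Minitab output.
b0 = 300.0   # hypothetical intercept, in seconds
b1 = -0.5    # hypothetical slope, in seconds per year

# A 4-minute mile is 240 seconds.
years_since_1900 = predictor_for_response(b0, b1, 240.0)

# Undo the "year - 1900" transformation to get a calendar year.
calendar_year = 1900 + years_since_1900
print(years_since_1900, calendar_year)
```

Because the explanatory variable was transformed to years since 1900, remember to add 1900 back before comparing the result with the year you found in the data.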