Math 210
Laboratory 20

Old Faithful - Inference on Regression

The Old Faithful Geyser in Yellowstone National Park erupts every 35 to 120 minutes, and each eruption lasts 1½ to 5 minutes. Old Faithful is therefore not as faithful as one might expect: both the time between eruptions and the length of each eruption vary quite a bit. However, one can estimate the time of the next eruption quite accurately given the duration of the previous eruption.

In this lab we will determine whether there is a linear relationship between the duration of an eruption and the time between eruptions. The data set we will work with consists of the duration of eruption and the time between eruptions for 222 different eruptions of Old Faithful, recorded over a number of days in August 1978 and August 1979. (From Applied Linear Regression, 2nd Edition, by Sanford Weisberg, pp. 231 and 234.) The times given are in minutes. We will use the duration of an eruption to predict the length of time until the next eruption. The park rangers at Yellowstone do this, and their predictions are posted near the geyser and at the geyser's webcam site.

Copy the Old Faithful data into a Minitab worksheet and answer the following questions.

  1. We will first examine the data using descriptive methods.
    1. Make a scatterplot with duration on the horizontal axis and time between eruptions on the vertical axis.  Describe the relationship between the two variables.  (Graph > Scatterplots > Simple.)
    2. Find a regression equation where duration is the explanatory (predictor) variable and the time between eruptions is the response variable.  (Stat > Regression > Regression.)
    3. Use your regression equation to estimate the length of the time between eruptions when the duration is 4.0 minutes.
  2. We will now use the data set to do some inference.
    1. Test the hypotheses H0: β = 0 and Ha: β > 0, where β is the slope of the population regression line.  Report the test statistic and P-value, and state a conclusion.
    2. Give both a 95% confidence interval and a 95% prediction interval for the length of time between eruptions for a 4.0 minute duration.  (To do this go to Stat > Regression > Regression, click on Options then put 4.0 in the "Prediction intervals for new observations:" box.)
    3. Explain the difference between a confidence interval and a prediction interval.
    4. In the original data, there are 15 interval times given with a duration of 4.0 minutes.  How many of these are in your prediction interval from part 2?  How many (or what proportion) would you expect to be in your prediction interval?
    5. Give the residual plot for the regression equation.  By examining these residuals, does it appear that any regression assumptions are not met?  (To do this go to Stat > Regression > Regression, click on Graphs then put duration in the "Residuals versus the variable:" box.)
  3. The following table is used by rangers when predicting the length of time until the next eruption. (The table was taken from Jason Project VIII,  http://www.jasonproject.org.)
    If the eruption lasts:    To the starting time add:
    1.5 min                   51 min
    2.0 min                   58 min
    2.5 min                   65 min
    3.0 min                   71 min
    3.5 min                   76 min
    4.0 min                   82 min
    4.5 min                   89 min
    5.0 min                   95 min
    1. How does your prediction for the length of the interval for a 4.0 minute duration compare with the chart's?
    2. The margin of error for the predicted intervals in the chart is plus or minus 10 minutes.  Is this similar to the margin of error in your confidence interval or your prediction interval?
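
For readers who want to see the arithmetic behind the Minitab output, the following is a minimal Python sketch of simple linear regression with a slope t statistic, a confidence interval for the mean response, and a prediction interval for a single new observation. It is not part of the lab procedure. The function names (fit_line, slope_t_statistic, intervals_at) are made up for this illustration. Because the 222-eruption data set is not reproduced in this handout, the sketch uses the eight rows of the rangers' table above as stand-in data, so its numbers are not answers to the lab questions. The critical value 2.447 is the 97.5th percentile of the t distribution with 8 - 2 = 6 degrees of freedom; with the full data set a different critical value would be needed.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, plus summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    return {"n": n, "x_bar": x_bar, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s": s}

def slope_t_statistic(fit):
    """t statistic for H0: slope = 0, with n - 2 degrees of freedom."""
    se_slope = fit["s"] / math.sqrt(fit["sxx"])
    return fit["slope"] / se_slope

def intervals_at(fit, x0, t_crit):
    """Fitted value at x0, with confidence and prediction intervals."""
    y_hat = fit["intercept"] + fit["slope"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["s"] * math.sqrt(leverage)      # uncertainty in the mean response
    se_new = fit["s"] * math.sqrt(1 + leverage)   # adds scatter of one new observation
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_new, y_hat + t_crit * se_new)
    return y_hat, ci, pi

# Stand-in data: the rangers' table, NOT the 222 observations used in the lab.
duration = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
interval = [51, 58, 65, 71, 76, 82, 89, 95]

fit = fit_line(duration, interval)
# 2.447 is the 97.5th percentile of t with 6 degrees of freedom (n = 8).
y_hat, ci, pi = intervals_at(fit, 4.0, t_crit=2.447)

print(f"slope = {fit['slope']:.3f}, intercept = {fit['intercept']:.3f}")
print(f"t statistic for the slope = {slope_t_statistic(fit):.2f}")
print(f"fitted value at 4.0 min = {y_hat:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two intervals in this sketch is the extra 1 under the square root in se_new, which is why the prediction interval is always the wider of the two.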