Math 210
Laboratory 20
Old Faithful - Inference on Regression
The Old Faithful Geyser in Yellowstone National Park erupts every 35
to 120 minutes, and each eruption lasts 1½ to 5 minutes. Old Faithful
is therefore not as faithful as one might expect: the time between
eruptions and the length of each eruption both vary quite a bit.
However, one can estimate the time of the next eruption quite
accurately given the duration of the previous eruption.
In this lab we will determine whether there is a linear relationship
between the duration of an eruption and the time between eruptions.
The data set we will work with consists of the duration of eruption
and the time between eruptions for 222 different eruptions of Old
Faithful, recorded over a number of days in August 1978 and August
1979. (From Applied Linear Regression, 2nd Edition, by Sanford
Weisberg, pp. 231 and 234.) The times given are in minutes. We will
use the duration of an eruption to predict the length of time until
the next eruption. The park rangers at Yellowstone do this as well,
and their predictions are posted near the geyser and at the web cam
picture site.
Copy the Old Faithful data into a Minitab worksheet and answer the
following questions.
- We will first examine the data using descriptive methods.
  - (a) Make a scatterplot with duration on the horizontal axis and
    time between eruptions on the vertical axis. Describe the
    relationship between the two variables. (Graph > Scatterplots >
    Simple.)
  - (b) Find a regression equation where duration is the explanatory
    (predictor) variable and the time between eruptions is the
    response variable. (Stat > Regression > Regression.)
  - (c) Use your regression equation to estimate the length of the
    time between eruptions when the duration is 4.0 minutes.
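For readers who want to check the Minitab regression equation by hand, the least-squares calculation can be sketched in a few lines of Python. This is only an illustration: the function names `fit_line` and `predict` are made up for this sketch, and the toy values below are not the Old Faithful measurements (they lie exactly on the line y = 30 + 10x so the result is easy to verify). Replace them with the duration and interval columns from the worksheet.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x0):
    """Estimated response at explanatory value x0."""
    return intercept + slope * x0

# Toy values only, NOT the lab data: points chosen to lie on y = 30 + 10x.
toy_duration = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_interval = [40.0, 50.0, 60.0, 70.0, 80.0]

b0, b1 = fit_line(toy_duration, toy_interval)
estimate_at_4 = predict(b0, b1, 4.0)
print(b0, b1, estimate_at_4)
```

With the real data, `b0` and `b1` should agree with the coefficients Minitab reports, up to rounding.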
- We will now use the data set to do some inference.
  - (a) Test the hypotheses H0: β = 0 and Ha: β > 0, where β is the
    slope of the population regression line. Report the test
    statistic and P-value, and give a conclusion.
  - (b) Give both a 95% confidence interval and a 95% prediction
    interval for the length of time between eruptions for a 4.0
    minute duration. (To do this go to Stat > Regression >
    Regression, click on Options, then put 4.0 in the "Prediction
    intervals for new observations:" box.)
  - (c) Explain the difference between a confidence interval and a
    prediction interval.
  - (d) In the original data, there are 15 interval times given with
    a duration of 4.0 minutes. How many of these are in your
    prediction interval from part (b)? How many (or what proportion)
    would you expect to be in your prediction interval?
  - (e) Give the residual plot for the regression equation. By
    examining these residuals, does it appear that any regression
    assumptions are not met? (To do this go to Stat > Regression >
    Regression, click on Graphs, then put duration in the "Residuals
    versus the variable:" box.)
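The quantities Minitab reports for the inference questions above come from standard simple-regression formulas, and a rough Python sketch can make the difference between the two intervals concrete. Everything named here (`regression_inference`, `t_star`, the dictionary keys) is an assumption of this sketch, not Minitab output. The critical value `t_star` must be supplied by the reader from a t table with n - 2 degrees of freedom (for the 222 eruptions that is 220 degrees of freedom, where the 95% value is about 1.97). The sketch does not compute a P-value, and the toy numbers are not the lab data.

```python
from math import sqrt
from statistics import mean

def regression_inference(x, y, x0, t_star):
    """Slope t statistic plus confidence and prediction intervals at x0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residuals: observed minus fitted; plot these against x to check assumptions
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # regression standard error

    se_slope = s / sqrt(sxx)
    t_stat = b1 / se_slope  # test statistic for H0: slope = 0

    fit = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_margin = t_star * s * sqrt(leverage)      # for the MEAN response at x0
    pi_margin = t_star * s * sqrt(1 + leverage)  # for ONE new observation at x0

    return {
        "slope": b1, "intercept": b0, "s": s, "t_stat": t_stat,
        "residuals": residuals,
        "ci": (fit - ci_margin, fit + ci_margin),
        "pi": (fit - pi_margin, fit + pi_margin),
    }

# Toy values only, NOT the lab data. With n = 5 there are 3 degrees of
# freedom, and a t table gives 3.182 for 95% confidence.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [41.0, 49.0, 60.0, 71.0, 79.0]
result = regression_inference(toy_x, toy_y, x0=4.0, t_star=3.182)
print(result["t_stat"], result["ci"], result["pi"])
```

The only difference between the two margins is the extra 1 under the square root, which accounts for the scatter of an individual observation around the line. That is why the prediction interval is always the wider of the two.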
- The following table is used by rangers when predicting the length
  of time until the next eruption. (The table was taken from Jason
  Project VIII, http://www.jasonproject.org.)
| If the eruption lasts: | To the starting time add: |
| 1.5 min | 51 min |
| 2.0 min | 58 min |
| 2.5 min | 65 min |
| 3.0 min | 71 min |
| 3.5 min | 76 min |
| 4.0 min | 82 min |
| 4.5 min | 89 min |
| 5.0 min | 95 min |
  - (a) How does your prediction for the length of the interval for a
    4.0 minute duration compare to the chart's?
  - (b) The margin of error for the predicted intervals in the chart
    is plus or minus 10 minutes. Is this similar to the margin of
    error in your confidence interval or your prediction interval?
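To compare the ranger chart with a fitted line, it can help to see what slope the chart itself implies. The short Python sketch below uses only the values from the table above. The variable names are made up for this illustration, and the plus or minus 10 minutes is the margin of error quoted for the chart.

```python
# Ranger chart from the table above: (eruption duration in minutes,
# minutes to add to the starting time)
chart = [(1.5, 51), (2.0, 58), (2.5, 65), (3.0, 71),
         (3.5, 76), (4.0, 82), (4.5, 89), (5.0, 95)]

# Increase in the predicted wait for each extra half minute of eruption
steps = [later[1] - earlier[1] for earlier, later in zip(chart, chart[1:])]

# Average increase per full minute of eruption across the whole chart
avg_slope = (chart[-1][1] - chart[0][1]) / (chart[-1][0] - chart[0][0])

# Chart prediction at a 4.0 minute duration, with the stated margin of error
chart_at_4 = dict(chart)[4.0]
low, high = chart_at_4 - 10, chart_at_4 + 10

print(steps)      # [7, 7, 6, 5, 6, 7, 6]
print(avg_slope)
print(low, high)  # 72 92
```

The steps are nearly constant, so the chart behaves almost like a straight line whose slope can be set beside the slope of your regression equation. The range from `low` to `high` is the chart's interval to compare with your own intervals at 4.0 minutes.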