Math 210
Laboratory 22

Inference about Relationships: Chi-square, Regression, and ANOVA

We will be examining three types of relationships in this lab, each with its own test of significance. The first looks at a relationship between two categorical variables, the second looks at a relationship between two quantitative variables, and the third compares the means from five different populations.


Chi-Square Test: Heart disease and Baldness

In a study, researchers selected a sample of 663 male heart disease patients and a control group of 772 males not suffering from heart disease. Each was asked to classify his degree of baldness on a 5-point scale. The results are given in the following table. Use these data to answer the following questions.

                 Baldness
                 None   Little   Some   Much   Extreme
Heart_disease     251      165    195     50         2
Control           331      221    185     34         1

  1. We will first look at this relationship using descriptive methods.
    1. Of those in the control group, what percent claimed to have little or no baldness?  Some, much or extreme baldness?
    2. Of those with heart disease, what percent claimed to have little or no baldness?  Some, much or extreme baldness?
    3. At this stage in the investigation, state whether you think there is a relationship between heart disease and baldness. Explain your answer.
  2. We will now run a test of significance on this relationship.
    1. State the null and alternative hypotheses you would use to test whether there is a relationship between heart disease and baldness.
    2. Complete a chi-square test to determine if there is a relationship between heart disease and baldness.  (Put the data into Minitab then use Stat > Tables > Chi-square Test.)  Make sure you give the chi-square statistic, the P-value, and a conclusion.
    3. Do the results of your test show that baldness causes heart disease?  Or that heart disease causes baldness?
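
For readers who want to check the Minitab output by hand, the following is a minimal Python sketch of the chi-square computation for the table above. It is not part of the lab procedure. The function names are our own, and the P-value shortcut is an assumption-laden convenience: it uses a closed form that holds only when the degrees of freedom are even, which happens to be the case for a 2 x 5 table.

```python
import math

# Observed counts from the table above
# (rows: heart disease, control; columns: None, Little, Some, Much, Extreme).
observed = [
    [251, 165, 195, 50, 2],
    [331, 221, 185, 34, 1],
]


def chi_square_test(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total * column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail P-value using a closed form that is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut requires a positive, even df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


statistic, df, expected = chi_square_test(observed)
p_value = chi_square_p_value_even_df(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```

Printing the expected counts as well is a useful check, since the chi-square approximation is usually considered less reliable when some expected counts are small.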


Old Faithful - Inference on Regression

The Old Faithful Geyser in Yellowstone National Park erupts every 35 to 120 minutes, and each eruption lasts 1½ to 5 minutes. Notice that Old Faithful is not as faithful as one might expect: the time between eruptions and the length of each eruption both vary quite a bit. However, one can estimate the time of the next eruption quite accurately given the duration of the previous eruption.

In this part we will determine whether there is a linear relationship between the duration of an eruption and the time between eruptions. The data set consists of the duration of eruption and the time between eruptions for 222 eruptions of Old Faithful, recorded over a number of days in August 1978 and August 1979. (From Applied Linear Regression, 2nd Edition, by Sanford Weisberg, pp. 231 and 234.) The times given are in minutes. We will use the duration of an eruption to predict the time until the next eruption. The park rangers at Yellowstone do this, and their predictions are posted near the geyser and at the web cam picture site located here.

Copy the Old Faithful data into a Minitab worksheet and answer the following questions.

  1. We will first examine the data using descriptive methods.
    1. Make a scatterplot with duration on the horizontal axis and time between eruptions on the vertical axis.  Describe the relationship between the two variables. (Graph > Scatterplot > Simple.)
    2. Find a regression equation where duration is the explanatory variable and the time between eruptions is the response variable.  (Stat > Regression > Regression.)
    3. Use your regression equation to estimate the length of the time between eruptions when the duration is 4.0 minutes.
  2. We will now use the data set to do some inference.
    1. Is there a positive relationship between the duration of an eruption and the time until the next eruption? Use the data to test this hypothesis. Report the hypotheses, test statistic, P-value, and a conclusion. (You should already have obtained your test statistic and P-value from the regression output in question 1, part 2.)
    2. Give both a 95% confidence interval and a 95% prediction interval for the length of time between eruptions for a 4.0 minute duration.  (To do this go to Stat > Regression > Regression, click on Options then put 4.0 in the "Prediction intervals for new observations:" box.)
    3. Explain the difference between a confidence interval and a prediction interval.
    4. In the original data, there are 15 interval times given with a duration of 4.0 minutes. How many of these are in your prediction interval from part 2? How many (or what proportion) would you expect to be in your prediction interval?
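
The calculations that Minitab performs for these questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the x and y values below are made-up placeholder numbers, not the Old Faithful data, the function names are our own, and the critical value 2.776 is the 95% two-sided t value for 4 degrees of freedom read from a t table. With the real data the degrees of freedom, and therefore the critical value, would be different.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with the pieces needed for inference."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of regression
    return {"n": n, "x_bar": x_bar, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s": s}


def slope_t_statistic(fit):
    """t statistic for testing H0: slope = 0."""
    return fit["slope"] / (fit["s"] / math.sqrt(fit["sxx"]))


def intervals_at(fit, x0, t_crit):
    """Return (prediction, confidence interval, prediction interval) at x0."""
    y_hat = fit["intercept"] + fit["slope"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["s"] * math.sqrt(leverage)      # for the mean response
    se_pred = fit["s"] * math.sqrt(1 + leverage)  # for a single new observation
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi


# Placeholder data for illustration only (NOT the Old Faithful measurements).
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 4, 7, 8, 9]

fit = simple_regression(x, y)
t_stat = slope_t_statistic(fit)
# Assumed table value: 95% two-sided t critical value for n - 2 = 4 df.
y_hat, ci, pi = intervals_at(fit, x0=4.0, t_crit=2.776)
print(f"slope = {fit['slope']:.3f}, intercept = {fit['intercept']:.3f}, t = {t_stat:.3f}")
print(f"prediction = {y_hat:.3f}, CI = {ci}, PI = {pi}")
```

The only difference between the two standard errors is the extra 1 under the square root for the prediction interval, which accounts for the variability of an individual observation around the mean response.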

Cuckoo for ANOVA

Some bird species do not build nests and do not raise their own young. Instead, they lay their eggs in the nests of other "host" birds. In this country, the cowbird is the most common bird that will do this. In general, these types of birds are known as brood parasites. Because the parasitic bird is larger and hatches earlier, the hosts raise only the young of the parasitic bird, thereby losing their own clutch. It is somewhat comical to see a small bird such as a sparrow or a finch feeding its much larger "adopted" baby. To see a picture of a large cowbird egg in a nest of other smaller birds click here. To see a small host bird feeding a cowbird click here.

Cuckoos are another group of brood parasites. The data set that can be found here contains the lengths of cuckoo eggs that were found in nests of various other host birds. (Reference: L.H.C. Tippett, The Methods of Statistics, 4th Edition, John Wiley and Sons, Inc., 1952, p. 176.) The host birds represented are the Tree Pipit, Hedge Sparrow, Robin, Pied Wagtail, and Wren. All data are lengths in millimeters.

Copy the data into Minitab and do the following.

  1. Make side-by-side boxplots of the lengths.  (Graph > Boxplots > Multiple Y's > Simple.)  By looking at the boxplots, does it appear that the mean lengths of the cuckoo's eggs vary depending on the host bird?
  2. Complete an ANOVA for the mean lengths for the five different host birds. (Use Stat > ANOVA > One way (Unstacked), then put all five species into the Responses window.) Make sure you write out your hypotheses, give your test statistic and P-value, and write out your conclusion.
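
To see what the one-way ANOVA output is built from, here is a minimal Python sketch of the F statistic. It is an illustration only: the three small groups below are made-up placeholder numbers, not the cuckoo egg lengths, and the function name is our own. The sketch stops at the F statistic and its degrees of freedom; the P-value would still come from Minitab or an F table.

```python
def one_way_anova(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder groups for illustration only (NOT the cuckoo egg data).
toy_groups = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

f_stat, df_between, df_within = one_way_anova(toy_groups)
print(f"F = {f_stat:.3f} on {df_between} and {df_within} degrees of freedom")
```

With five host species the same function would take a list of five groups, giving 4 degrees of freedom between groups.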