Stats #03: Practice Exercises

These exercises refer to two data sets: the breast feeding data set (BF.SAV) and the housing data set (HOUSING.SAV).

You will find these data sets in the SPSS Program folder, located in the Classroom Examples folder.

1. In the breast feeding data set (BF.SAV), examine the relationship between the total number of apnea and bradycardia incidents (TOTAL_AB) and the age of the infant at discharge from the hospital (DC_AGE). Although a linear regression model is not ideal for this type of data, you will find some interesting and useful ideas from this analysis.

2. Using the same data set, examine the relationship between TOTAL_AB and the treatment group variable (FEED_TYP).

3. In the housing data set (HOUSING.SAV), examine the relationship between the square footage of a house (SQFT) and the sales price of the house (PRICE).

4. In the same data set, examine whether a house being custom built (CUST: 1=Yes, 0=No) influences its sales price.

5. You are concerned that custom built houses are more expensive not because they are custom built, but only because they are bigger. Fit a model that uses both SQFT and CUST to predict PRICE to check this.

6. Infants with low birth weights and early gestational ages tend to have more problems with apnea and bradycardia. Since birth weight and gestational age are so closely related, you are not sure how to separately account for the predictive ability of each variable.

7. Examine the assumptions of the regression model for the housing data, where you used SQFT and CUST to predict PRICE.

8. There are additional residual plots that you can use to check if additional variables should be included in your regression model.

9. One possible violation of the assumptions of the linear regression model occurs when the variation in the dependent variable is related to one of the fixed factors or to one of the covariates. To check for this, draw scatterplots and/or boxplots of the residuals versus the factors and covariates. If the variation in one part of the graph is much different from the variation in another part, you should investigate further. Generally, you need to look for a very large discrepancy: variation that is 2 or 3 times larger or smaller. Discrepancies of this size warrant further investigation and possible use of more complex regression models.

10. For the breast feeding data, fit a regression model using TOTAL_AB as the dependent variable and DC_AGE as the independent variable. As noted earlier, linear regression is not an ideal procedure here.
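
The rule of thumb in exercise 9 (residual variation that is 2 or 3 times larger in one group than in another) can also be checked numerically. The following is a minimal Python sketch, not the SPSS procedure these exercises are written for. It makes several assumptions: the data are synthetic stand-ins rather than the contents of HOUSING.SAV, "variation" is measured by the standard deviation, and the helper names fit_ols and spread_ratio are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

def spread_ratio(residuals, groups):
    """Ratio of the largest to the smallest residual SD across groups."""
    sds = [residuals[groups == g].std(ddof=1) for g in np.unique(groups)]
    return max(sds) / min(sds)

# Synthetic stand-in data (an assumption; NOT the values in HOUSING.SAV).
rng = np.random.default_rng(0)
n = 200
cust = rng.integers(0, 2, size=n)                    # 1 = custom built, 0 = not
sqft = rng.normal(2000, 400, size=n) + 300 * cust    # custom houses run larger
noise_sd = np.where(cust == 1, 60000, 20000)         # deliberately unequal spread
price = 50000 + 60 * sqft + 20000 * cust + rng.normal(0, 1, size=n) * noise_sd

# Regress PRICE on SQFT and CUST, then compare residual spread by CUST group.
coef, resid = fit_ols(np.column_stack([sqft, cust]), price)
ratio = spread_ratio(resid, cust)
print(f"Residual SD ratio across CUST groups: {ratio:.2f}")
if ratio >= 2:
    print("Spread differs by a factor of 2 or more: investigate further.")
```

The synthetic noise is built with unequal spread on purpose, so the sketch illustrates the kind of discrepancy the exercise asks you to look for. A boxplot of the residuals by CUST shows the same information graphically.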