Consider the real estate data on 60 homes in a middle-class neighborhood of Southwest Houston provided in the Excel data file. The variables are: Price (Y) in $1000s, Square Feet (X1), # of Rooms (X2), # of Bedrooms (X3), # of Bathrooms (X4), and Age (X5).
A). Construct a scatter plot of Price against each of the 5 independent variables (5 graphs), find the best-fit linear equation for each, and comment on the goodness of fit based on the Significance F values.
B). Construct the correlation matrix of all 6 variables and rank the independent variables by the absolute value of their correlation with Price.
C). Construct a full model using all 5 independent variables, find the regression equation, R-squared, and Significance F, and comment on the goodness of the model.
D). Using the output of part (C), rank the independent variables by their degree of contribution to the model. Pick the top 4 variables based on P-values and construct a 4-variable multiple linear regression model, including the residual output. Find the regression equation, R-squared, and Significance F, and comment on the goodness of the model.
E). Based on the model in part (D), find the predicted price of a 2500 sq. ft. home with 6 rooms, 2.5 baths, 3 bedrooms, and an age of 40.
F). Use the residual output from part (D) to identify the 5 most overvalued and 5 most undervalued homes in this neighborhood.
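For readers who want to check the spreadsheet results outside Excel, the correlation, regression, prediction, and residual-ranking steps can be sketched in plain Python. This is only a minimal illustration: the 10 homes below are made-up toy data, not the assignment's Excel file, and the choice of square feet, rooms, bathrooms, and age as the four predictors is an assumption, since the actual top 4 depend on the P-values in the real output. The helper names (`pearson`, `solve`, `ols`) are hypothetical. The sketch also assumes the usual convention that a residual is actual price minus predicted price, so a positive residual marks a home priced above the model (overvalued) and a negative one a home priced below it (undervalued).

```python
import math

# Hypothetical toy data (NOT the assignment's Excel file): 10 made-up homes.
# Columns: square feet, rooms, bathrooms, age; price is in $1000s.
homes = [
    (1800, 6, 2.0, 45), (2100, 7, 2.0, 38), (2400, 8, 2.5, 30),
    (2650, 8, 3.0, 25), (1950, 6, 2.0, 50), (2300, 7, 2.5, 35),
    (2800, 9, 3.0, 20), (2200, 7, 2.0, 42), (2550, 8, 2.5, 28),
    (2000, 7, 2.5, 40),
]
price = [142, 165, 188, 210, 150, 176, 225, 168, 199, 161]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares fit via the normal equations.

    Returns (coefficients with the intercept first, R-squared, residuals),
    where each residual is actual minus fitted.
    """
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    XtX = [[sum(r[i] * r[j] for r in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in Z]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum(e * e for e in resid)
    return beta, 1 - sse / sst, resid

# Correlation of two individual predictors with price.
sqft = [h[0] for h in homes]
age = [h[3] for h in homes]
r_sqft = pearson(sqft, price)
r_age = pearson(age, price)

# Four-variable model, then a prediction for one new home
# (square feet, rooms, bathrooms, age).
beta, r2, resid = ols(homes, price)
new_home = (2500, 6, 2.5, 40)
predicted = beta[0] + sum(b * v for b, v in zip(beta[1:], new_home))

# Rank homes by residual: most negative first (undervalued),
# most positive first (overvalued). Top 3 only, since the toy set is small.
order = sorted(range(len(resid)), key=lambda i: resid[i])
undervalued = order[:3]
overvalued = order[::-1][:3]
```

The sketch does not compute P-values or Significance F, which need the t and F distributions; Excel's Regression tool reports both directly. With the real 60-home data, the same ranking would simply take the first 5 indices from each end of `order` instead of 3.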